Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning

Jae Hoon Kim; Kong Joo Lee

The KIPS Transactions:PartB, Vol. 10, No. 1, pp. 47-56, Feb. 2003

DOI: 10.3745/KIPSTB.2003.10.1.47

Abstract

Like English, Korean delimits words with white space, but Korean words differ somewhat in structure from English ones. A white-space-delimited unit in English generally consists of a single word, whereas one in Korean is composed of one or more words and/or morphemes. Because of this difference, a unit between white spaces is called an Eojeol in Korean. We propose a method for segmenting and classifying Korean words and/or morphemes based on syllables using instance-based learning. The feature set for the instance-based learner consists of the previous syllable, the current syllable, the two following syllables, the final consonant of the current syllable, and the two previous categories. Our method achieves an F-measure of more than 97% for word segmentation on the ETRI and KAIST corpora.
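The feature set listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which similarity metric, value of k, or tag inventory was used, so the overlap metric, the majority vote, the padding symbol `_`, the tags `B-N`, `I-N`, `B-J`, and all function names below are assumptions made for illustration. The only fixed fact relied on is the Unicode layout of precomposed Hangul syllables, in which the final-consonant index is the syllable offset modulo 28.

```python
from collections import Counter

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable
PAD = "_"              # hypothetical padding symbol for missing context


def final_consonant_index(syllable):
    """Return the final-consonant index (0 = none) of a precomposed
    Hangul syllable, or -1 for any other character."""
    code = ord(syllable)
    if HANGUL_BASE <= code <= HANGUL_LAST:
        return (code - HANGUL_BASE) % 28
    return -1


def extract_features(syllables, i, prev_tags):
    """Build the seven features named in the abstract for position i:
    previous syllable, current syllable, two next syllables, final
    consonant of the current syllable, and two previous categories."""
    n = len(syllables)
    prev1 = syllables[i - 1] if i > 0 else PAD
    next1 = syllables[i + 1] if i + 1 < n else PAD
    next2 = syllables[i + 2] if i + 2 < n else PAD
    tag1 = prev_tags[-1] if len(prev_tags) >= 1 else PAD
    tag2 = prev_tags[-2] if len(prev_tags) >= 2 else PAD
    cur = syllables[i]
    return (prev1, cur, next1, next2, final_consonant_index(cur), tag2, tag1)


def build_memory(examples):
    """Store one (features, tag) instance per syllable.
    examples: list of (eojeol string, list of gold tags)."""
    memory = []
    for eojeol, tags in examples:
        syllables = list(eojeol)
        for i, tag in enumerate(tags):
            memory.append((extract_features(syllables, i, tags[:i]), tag))
    return memory


def overlap(a, b):
    """Assumed similarity: number of feature positions that match."""
    return sum(x == y for x, y in zip(a, b))


def classify(features, memory, k=1):
    """Majority tag among the k stored instances most similar to features."""
    ranked = sorted(memory, key=lambda item: overlap(features, item[0]),
                    reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


def tag_eojeol(eojeol, memory, k=1):
    """Tag syllables left to right, feeding predicted tags back in
    as the 'two previous categories' features."""
    syllables = list(eojeol)
    tags = []
    for i in range(len(syllables)):
        tags.append(classify(extract_features(syllables, i, tags), memory, k))
    return tags
```

With a hypothetical begin/inside tagging scheme, a `B-` tag marks the first syllable of a word or morpheme, so segmentation boundaries and categories can both be read off the predicted tag sequence. For example, `tag_eojeol("학교에", build_memory([("학교에", ["B-N", "I-N", "B-J"])]))` reproduces the stored tags, since each query matches a stored instance exactly.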

Cite this article

[IEEE Style]

J. H. Kim and K. J. Lee, "Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning," The KIPS Transactions:PartB, vol. 10, no. 1, pp. 47-56, 2003. DOI: 10.3745/KIPSTB.2003.10.1.47.

[ACM Style]

Jae Hoon Kim and Kong Joo Lee. 2003. Segmenting and Classifying Korean Words based on Syllables Using Instance-Based Learning. The KIPS Transactions:PartB, 10, 1, (2003), 47-56. DOI: 10.3745/KIPSTB.2003.10.1.47.