Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning


The KIPS Transactions:PartB, Vol. 18, No. 1, pp. 45-50, Feb. 2011
DOI: 10.3745/KIPSTB.2011.18.1.45

Abstract

Unknown-morpheme errors in Korean morphological analysis fall into two types: errors in which the morphological analyzer fails to return any morpheme sequence at all, and errors in which it returns an incorrect combination of known morphemes. Most previous unknown-morpheme estimation techniques have focused only on the former. This paper proposes an unknown-morpheme estimation method that can handle both types of error. The proposed method uses an SVM (Support Vector Machine) to detect Eojeols (Korean spacing units) that may contain unknown-morpheme errors. Then, using CRFs (Conditional Random Fields), it segments the detected Eojeols into morphemes and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed a conventional method based on longest matching of functional words. The experimental results indicate that the second type of error must be addressed in order to improve the performance of Korean morphological analysis.
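
The abstract does not say how the CRF output encodes both the morpheme boundaries and the POS tags. One common way to cast joint segmentation and tagging as sequence labeling is to give each syllable a B-/I- label carrying the tag. The sketch below illustrates only that encoding and its inverse; it is an assumption for illustration, not the authors' confirmed scheme. The function names `encode` and `decode`, the example Eojeol, and the Sejong-style tags `NNP` (proper noun) and `JKO` (object particle) are all hypothetical choices, and no SVM or CRF model is trained here.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, POS tag)


def encode(morphemes: List[Morpheme]) -> List[Tuple[str, str]]:
    """Turn a morpheme sequence into per-syllable (syllable, label) pairs.

    The first syllable of each morpheme gets "B-<tag>", the rest "I-<tag>".
    Assumes precomposed (NFC) Hangul, so one character is one syllable.
    """
    pairs = []
    for surface, tag in morphemes:
        for i, syllable in enumerate(surface):
            prefix = "B" if i == 0 else "I"
            pairs.append((syllable, f"{prefix}-{tag}"))
    return pairs


def decode(syllables: str, labels: List[str]) -> List[Morpheme]:
    """Rebuild (surface, tag) morphemes from per-syllable labels."""
    if len(syllables) != len(labels):
        raise ValueError("need exactly one label per syllable")
    morphemes: List[Morpheme] = []
    for syllable, label in zip(syllables, labels):
        prefix, tag = label.split("-", 1)
        # Start a new morpheme on "B", at the very beginning, or when an
        # "I" label disagrees with the tag of the morpheme being built.
        if prefix == "B" or not morphemes or morphemes[-1][1] != tag:
            morphemes.append((syllable, tag))
        else:
            surface, _ = morphemes[-1]
            morphemes[-1] = (surface + syllable, tag)
    return morphemes


# Hypothetical Eojeol: an out-of-dictionary noun followed by a particle.
eojeol = "아이폰을"
gold = [("아이폰", "NNP"), ("을", "JKO")]

pairs = encode(gold)
labels = [label for _, label in pairs]
restored = decode(eojeol, labels)
```

In a pipeline of the kind the abstract describes, labels like these would be predicted by the CRF only for Eojeols that the SVM detector has flagged, and `decode` would turn the predicted labels back into tagged morphemes. This syllable-aligned encoding cannot represent contractions in which one syllable spans two morphemes, so a real system would need a richer scheme for those cases.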



Cite this article
[IEEE Style]
M. S. Choi and H. S. Kim, "Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning," The KIPS Transactions:PartB, vol. 18, no. 1, pp. 45-50, 2011. DOI: 10.3745/KIPSTB.2011.18.1.45.

[ACM Style]
Maeng Sik Choi and Hark Soo Kim. 2011. Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning. The KIPS Transactions:PartB, 18, 1, (2011), 45-50. DOI: 10.3745/KIPSTB.2011.18.1.45.