Estimating the Number of Korean Words Based on Corpus


The Transactions of the Korea Information Processing Society (1994 ~ 2000), Vol. 5, No. 7, pp. 1774-1782, Jul. 1998
DOI: 10.3745/KIPSTE.1998.5.7.1774

Abstract

It is very hard to estimate the total number of words in a language. Recently, large corpora have been under construction; a corpus is a body of written, spoken, or other material that is regarded as representative of a language. It is therefore possible to estimate the number of words in a language based on a corpus. In this paper, we propose a method for estimating the number of Korean words from a Korean corpus and apply it to obtain an estimate. We also estimate the number of Korean names, which make up a large part of proper nouns. To estimate the total number of distinct Korean words and names, we applied a generalized linear estimation method. Using a corpus of 10 million phrases, the estimated number of Korean words is 1,062,392, and the estimated number of Korean names is 1,493,003.
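The abstract does not spell out the generalized linear estimation method, so the following is not the authors' procedure. As a rough illustration of what corpus-based vocabulary estimation can look like, here is a minimal sketch that swaps in a different, well-known technique: the Chao1 estimator, which extrapolates the number of unseen word types from how many types occur exactly once and exactly twice. The function name and the toy token list are hypothetical.

```python
from collections import Counter


def chao1_vocabulary_estimate(tokens):
    """Estimate the total number of distinct word types (seen plus unseen).

    Uses the Chao1 estimator, NOT the paper's generalized linear
    estimation method, which the abstract does not describe.
    """
    counts = Counter(tokens)                  # word -> frequency in the corpus
    observed = len(counts)                    # distinct types actually seen
    freq_of_freq = Counter(counts.values())   # frequency -> number of types
    f1 = freq_of_freq[1]                      # types seen exactly once
    f2 = freq_of_freq[2]                      # types seen exactly twice
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # Bias-corrected form for the case with no twice-seen types.
        unseen = f1 * (f1 - 1) / 2
    return observed + unseen


# Toy corpus (assumed data): 6 observed types, 3 singletons, 2 doubletons.
toy_tokens = "a a a b b c c d e f".split()
print(chao1_vocabulary_estimate(toy_tokens))  # 8.25
```

On a real corpus the input would be the full sequence of tokenized units (for Korean, for example, the phrases counted in the abstract), and the result is only a lower-bound-style estimate that depends heavily on how tokens are segmented and normalized.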


Cite this article
[IEEE Style]
S. K. Kim and G. S. Han, "Estimating the Number of Korean Words Based on Corpus," The Transactions of the Korea Information Processing Society (1994 ~ 2000), vol. 5, no. 7, pp. 1774-1782, 1998. DOI: 10.3745/KIPSTE.1998.5.7.1774.

[ACM Style]
Kim Sung Ki and Han Geun Shik. 1998. Estimating the Number of Korean Words Based on Corpus. The Transactions of the Korea Information Processing Society (1994 ~ 2000), 5, 7, (1998), 1774-1782. DOI: 10.3745/KIPSTE.1998.5.7.1774.