Comparison of Significant Term Extraction Based on the Number of Selected Principal Components


The KIPS Transactions:PartB, Vol. 13, No. 3, pp. 329-336, Jun. 2006
DOI: 10.3745/KIPSTB.2006.13.3.329

Abstract

In this paper, we propose a method for extracting significant terms from a document. The method uses Principal Component Analysis (PCA), a multivariate analysis technique. PCA can exploit the term-term relationships within a document through term-term correlations. We perform PCA on the correlation matrix between terms rather than on the covariance matrix. We also seek thresholds for both the number of components to select and the correlation coefficients between the selected components and the terms. Experimental results on 283 Korean newspaper articles show that using the first six components with a correlation coefficient threshold of 0.4 is the best condition for extracting sentences based on the selected significant terms.
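
The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the input is a matrix whose rows are observations (for example, sentences) and whose columns are term counts, every term has non-zero variance, and a term is kept when the absolute value of its correlation with any of the first `n_components` components reaches `threshold`. The function name `significant_terms` and its parameters are hypothetical. The sketch relies on the fact that, for PCA on a correlation matrix, the correlation between a term and a component equals the eigenvector entry scaled by the square root of the eigenvalue.

```python
import numpy as np


def significant_terms(X, terms, n_components=6, threshold=0.4):
    """Return terms correlated with any of the leading principal components.

    X is an (observations x terms) matrix; `terms` names its columns.
    Assumption: every column of X has non-zero variance.
    """
    X = np.asarray(X, dtype=float)

    # Term-term correlation matrix (columns are the variables).
    R = np.corrcoef(X, rowvar=False)

    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard tiny negative values
    eigvecs = eigvecs[:, order]

    # Loadings: correlation between each term and each retained component.
    k = min(n_components, len(terms))
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])

    # Assumed rule: keep a term if |correlation| >= threshold on any component.
    keep = (np.abs(loadings) >= threshold).any(axis=1)
    return [t for t, flag in zip(terms, keep) if flag]
```

The defaults of six components and a threshold of 0.4 mirror the best condition reported in the abstract. Taking the absolute value of the loadings is a design choice of this sketch, made because the sign of an eigenvector is arbitrary.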



Cite this article
[IEEE Style]
C. B. Lee, C. Y. Ock, H. R. Park, "Comparison of Significant Term Extraction Based on the Number of Selected Principal Components," The KIPS Transactions:PartB, vol. 13, no. 3, pp. 329-336, 2006. DOI: 10.3745/KIPSTB.2006.13.3.329.

[ACM Style]
Chang Beom Lee, Cheol Young Ock, and Hyuk Ro Park. 2006. Comparison of Significant Term Extraction Based on the Number of Selected Principal Components. The KIPS Transactions:PartB, 13, 3, (2006), 329-336. DOI: 10.3745/KIPSTB.2006.13.3.329.