Performance Improvement by a Virtual Documents Technique in Text Categorization


The KIPS Transactions:PartB, Vol. 11, No. 4, pp. 501-508, Aug. 2004
DOI: 10.3745/KIPSTB.2004.11.4.501

Abstract

This paper proposes a virtual relevant document technique for the learning phase of text categorization. The method applies a simple transformation to the relevant documents: it creates virtual documents by combining pairs of documents in the training set. A virtual document produced this way has an enriched term vector, with greater weights for terms that co-occur in the two relevant documents. Experimental results show a significant improvement over the baseline in micro-averaged F1, supporting the usefulness of the proposed method: a 71% improvement on the TREC-11 filtering test collection and an 11% improvement on the Reuters-21578 test set for topics with fewer than 100 relevant documents. Analysis of the results indicates that adding virtual relevant documents contributes to a steady improvement in performance.
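
As an illustration of the idea only, the pairing step might be sketched as follows. The abstract does not state how two documents are combined or how terms are weighted, so this sketch assumes raw term counts and simple summation; the function names `term_vector` and `make_virtual_documents` and the sample texts are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Build a bag-of-words term-count vector (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def make_virtual_documents(relevant_docs):
    """Combine every pair of relevant documents into one virtual document.

    Assumption: a pair is combined by summing the two term vectors, so a
    term that occurs in both documents ends up with a larger weight than
    a term that occurs in only one of them.
    """
    vectors = [term_vector(doc) for doc in relevant_docs]
    return [a + b for a, b in combinations(vectors, 2)]


# Hypothetical relevant documents for a single topic.
relevant = [
    "oil prices rise",
    "crude oil exports fall",
    "oil exports rise",
]

virtual = make_virtual_documents(relevant)

for vdoc in virtual:
    print(dict(vdoc))
```

With three relevant documents this produces three virtual documents, one per pair. In the first of them, "oil" has weight 2 because it appears in both source documents, while "prices" and "crude" keep weight 1. Under these assumptions the virtual documents would then be added to the training set alongside the original relevant documents.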



Cite this article
[IEEE Style]
K. S. Lee and D. U. An, "Performance Improvement by a Virtual Documents Technique in Text Categorization," The KIPS Transactions:PartB, vol. 11, no. 4, pp. 501-508, 2004. DOI: 10.3745/KIPSTB.2004.11.4.501.

[ACM Style]
Kyung Soon Lee and Dong Un An. 2004. Performance Improvement by a Virtual Documents Technique in Text Categorization. The KIPS Transactions:PartB, 11, 4, (2004), 501-508. DOI: 10.3745/KIPSTB.2004.11.4.501.