Automatic Text Categorization Using Passage-based Weight Function and Passage Type


The KIPS Transactions:PartB, Vol. 12, No. 6, pp. 703-714, Oct. 2005
DOI: 10.3745/KIPSTB.2005.12.6.703

Abstract

Research in text categorization has largely been confined to whole-document-level classification, probably because full-text test collections have been lacking. However, the large quantities of full-length documents available today have renewed interest in text classification. A document is usually written with an organized structure that presents its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. To reflect the sub-topic structure of a document, we propose a new passage-level (passage-based) text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges the passage categories into document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents range from tens to hundreds of kilobytes, we evaluated the proposed model, focusing on the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows perform best on all test collections used in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.
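
The three steps named in the abstract (passage splitting, per-passage categorization, and location-aware category merging) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the fixed-size non-overlapping windows stand in for "simple windows", `head_weight` is a hypothetical location weight that favors early passages, `keyword_scorer` is a toy stand-in for a real passage classifier, and the merge rule (a weighted average compared against a threshold) is only one plausible choice.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Toy category vocabulary (hypothetical, for illustration only).
KEYWORDS = {
    "grain": {"wheat", "corn"},
    "trade": {"tariff", "export"},
}


def split_into_windows(tokens: List[str], size: int) -> List[List[str]]:
    """Split tokens into non-overlapping windows; the last may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def head_weight(index: int, total: int) -> float:
    """Hypothetical location weight: 1.0 for the first passage,
    decaying linearly to 0.5 for the last one."""
    if total <= 1:
        return 1.0
    return 1.0 - 0.5 * index / (total - 1)


def keyword_scorer(passage: List[str]) -> Dict[str, float]:
    """Toy passage classifier: score 1.0 if any category keyword occurs."""
    words = set(passage)
    return {cat: 1.0 for cat, keys in KEYWORDS.items() if words & keys}


def categorize(
    tokens: List[str],
    score_passage: Callable[[List[str]], Dict[str, float]],
    size: int,
    weight: Callable[[int, int], float] = head_weight,
    threshold: float = 0.5,
) -> List[str]:
    """Split, score each passage, then merge scores with location weights."""
    passages = split_into_windows(tokens, size)
    totals: Dict[str, float] = defaultdict(float)
    weight_sum = 0.0
    for i, passage in enumerate(passages):
        w = weight(i, len(passages))
        weight_sum += w
        for cat, score in score_passage(passage).items():
            totals[cat] += w * score
    if weight_sum == 0:
        return []
    # Keep categories whose weighted-average score reaches the threshold.
    return sorted(c for c, t in totals.items() if t / weight_sum >= threshold)


doc = "wheat corn harvest rises export tariff talks stall".split()
print(categorize(doc, keyword_scorer, size=4))
print(categorize(doc, keyword_scorer, size=4, threshold=0.3))
```

In this toy document the "grain" keywords fall in the first window and the "trade" keywords in the second, so the location weight alone decides whether "trade" survives the merge at a given threshold. Swapping `head_weight` for a different function is the simplest way to experiment with how passage location affects the merged result.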



Cite this article
[IEEE Style]
W. K. Joo, J. S. Kim, K. S. Choi, "Automatic Text Categorization Using Passage-based Weight Function and Passage Type," The KIPS Transactions:PartB, vol. 12, no. 6, pp. 703-714, 2005. DOI: 10.3745/KIPSTB.2005.12.6.703.

[ACM Style]
Won Kyun Joo, Jin Suk Kim, and Ki Seok Choi. 2005. Automatic Text Categorization Using Passage-based Weight Function and Passage Type. The KIPS Transactions:PartB, 12, 6, (2005), 703-714. DOI: 10.3745/KIPSTB.2005.12.6.703.