Performance Improvement by Cluster Analysis in Korean-English and Japanese-English Cross-Language Information Retrieval


The KIPS Transactions:PartB, Vol. 11, No. 2, pp. 233-240, Apr. 2004
DOI: 10.3745/KIPSTB.2004.11.2.233

Abstract

This paper presents a method that implicitly resolves ambiguity using dynamic incremental clustering in Korean-to-English and Japanese-to-English cross-language information retrieval (CLIR). The main objective is to show that document clusters can effectively resolve the ambiguities that increase sharply in translated queries, while also taking into account the context of all the terms in a document. In the proposed framework, a query in Korean or Japanese is first translated into English by looking up bilingual dictionaries, and documents are then retrieved for the translated query terms using either the vector space retrieval model or the probabilistic retrieval model. For the top-ranked retrieved documents, query-oriented document clusters are created incrementally, and the weight of each retrieved document is recalculated using the clusters. In experiments on the TREC test collection, the method achieved improvements of 39.41% and 36.79% over translated queries without ambiguity resolution in Korean-to-English CLIR, and 17.89% and 30.46% in Japanese-to-English CLIR, with the vector space and probabilistic retrieval models, respectively. Compared with blind feedback in Korean-to-English CLIR, the method achieved a 12.30% improvement for all translation queries. These results indicate that cluster analysis helps to resolve ambiguity.
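The pipeline described in the abstract (dictionary translation, initial retrieval, incremental clustering of the top-ranked documents, re-weighting) can be illustrated with a small Python sketch. The abstract does not state the clustering criterion or the re-weighting formula, so everything specific here is an assumption made for illustration: raw term-frequency vectors with cosine similarity stand in for the vector space model, clustering is a single-pass scheme with a hypothetical similarity `threshold`, and the new document weight is a hypothetical linear mix (`alpha`) of the document's own score and the score of its cluster centroid. The function names `translate` and `rerank` are likewise invented for this example and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def translate(query_terms, dictionary):
    """Dictionary lookup that keeps every translation (no disambiguation)."""
    out = []
    for term in query_terms:
        out.extend(dictionary.get(term, []))
    return out


def rerank(query_terms, docs, top_k=10, threshold=0.3, alpha=0.5):
    """Retrieve, cluster the top_k documents incrementally, then re-weight.

    threshold and alpha are illustrative parameters, not values from the paper.
    """
    q = Counter(query_terms)
    vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}

    # Initial retrieval: rank by query-document cosine similarity.
    ranked = sorted(vecs, key=lambda d: cosine(q, vecs[d]), reverse=True)[:top_k]

    # Single-pass incremental clustering in rank order (assumed scheme).
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc ids]}
    for d in ranked:
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vecs[d], c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vecs[d]), "members": [d]})
        else:
            # Summed term counts act as the centroid; cosine ignores scale.
            best["centroid"].update(vecs[d])
            best["members"].append(d)

    # Re-weight: mix each document's own score with its cluster's score.
    scores = {}
    for c in clusters:
        cluster_score = cosine(q, c["centroid"])
        for d in c["members"]:
            scores[d] = alpha * cosine(q, vecs[d]) + (1 - alpha) * cluster_score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, a document that shares a cluster with other query-relevant documents has its weight pulled toward the cluster's score, while a document matched only through an unintended translation tends to end up in a small, low-scoring cluster. That is one plausible reading of how clusters could resolve translation ambiguity implicitly; the paper's actual weighting may differ.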


Cite this article
[IEEE Style]
L. G. Sun, "Performance Improvement by Cluster Analysis in Korean-English and Japanese-English Cross-Language Information Retrieval," The KIPS Transactions:PartB, vol. 11, no. 2, pp. 233-240, 2004. DOI: 10.3745/KIPSTB.2004.11.2.233.

[ACM Style]
Lee Gyeong Sun. 2004. Performance Improvement by Cluster Analysis in Korean-English and Japanese-English Cross-Language Information Retrieval. The KIPS Transactions:PartB, 11, 2, (2004), 233-240. DOI: 10.3745/KIPSTB.2004.11.2.233.