A Document Collection Method for More Accurate Search Engine


The KIPS Transactions:PartA, Vol. 10, No. 5, pp. 469-478, Oct. 2003
DOI: 10.3745/KIPSTA.2003.10.5.469

Abstract

Internet search engines use web robots to visit web servers connected to the Internet, either periodically or at irregular intervals. They extract and classify the collected data according to their own methods and build the databases on which web search is based. These procedures are repeated very frequently on the Web. Many search engine sites run this process strategically in order to become popular Internet portals that show users how to find information on the Web. A web search engine contacts many thousands of web servers, maintains its existing databases, and navigates the Web to collect data about newly connected servers. However, these jobs are decided on and carried out by the search engines alone: they run web robots to collect data without any knowledge of the state of the web servers. Each search engine issues many requests to web servers and receives their responses, which is one cause of increased Internet traffic. If each web server notified web robots with a summary of its public documents, and each web robot then used that summary to collect only the corresponding documents, unnecessary Internet traffic would be eliminated and the data held by search engines would become more accurate. The processing overhead of web-related jobs on both web servers and search engines would also decrease. This paper designs and implements a monitoring system for the web server that monitors the state of the server's documents, summarizes the changes to modified documents, and sends this summary to web robots that want to collect documents from the server. It also designs and implements an efficient web robot for the search engine that uses the notified summary to retrieve the corresponding documents from web servers, extract index information, and update its databases.
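
The notification scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary format (`ChangeSummary`), the snapshot representation (a mapping from document path to last-modified time), and the function names `summarize_changes` and `apply_summary` are all assumptions made for illustration. The server side compares two snapshots of its public documents, and the robot side fetches only the documents named in the resulting summary.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSummary:
    """Hypothetical summary a server sends to a robot (format is assumed)."""
    added: list = field(default_factory=list)
    modified: list = field(default_factory=list)
    deleted: list = field(default_factory=list)


def summarize_changes(previous, current):
    """Server side: compare two {path: last_modified} snapshots."""
    summary = ChangeSummary()
    for path, mtime in current.items():
        if path not in previous:
            summary.added.append(path)
        elif mtime > previous[path]:
            summary.modified.append(path)
    summary.deleted = [p for p in previous if p not in current]
    return summary


def apply_summary(index, summary, fetch):
    """Robot side: fetch only the documents named in the summary.

    `index` maps path -> indexed content; `fetch` is any callable that
    returns the content for a path (an HTTP GET in a real robot).
    Returns the number of fetches issued.
    """
    to_fetch = summary.added + summary.modified
    for path in to_fetch:
        index[path] = fetch(path)
    for path in summary.deleted:
        index.pop(path, None)  # drop documents the server no longer has
    return len(to_fetch)
```

In this sketch, a server with three indexed documents of which one changed, one was removed, and one was added costs the robot two fetches rather than a full recrawl, which is the traffic reduction the abstract argues for.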



Cite this article
[IEEE Style]
H. E. Yong, K. H. Yong, H. H. Yeong, "A Document Collection Method for More Accurate Search Engine," The KIPS Transactions:PartA, vol. 10, no. 5, pp. 469-478, 2003. DOI: 10.3745/KIPSTA.2003.10.5.469.

[ACM Style]
Ha Eun Yong, Kwon Hui Yong, and Hwang Ho Yeong. 2003. A Document Collection Method for More Accurate Search Engine. The KIPS Transactions:PartA, 10, 5, (2003), 469-478. DOI: 10.3745/KIPSTA.2003.10.5.469.