Practical Page Segmentation using Connected Components and Color Information


The Transactions of the Korea Information Processing Society (1994 ~ 2000), Vol. 7, No. 1, pp. 273-285, Jan. 2000
DOI: 10.3745/KIPSTE.2000.7.1.273

Abstract

Although page segmentation is an important step in document recognition, it has received relatively little research attention, and the segmentation of document elements in complicated or color documents still needs improvement. In this paper, I present a new page segmentation method that can segment pages containing multiple columns, dotted lines, graphics, and photographs. All connected components are extracted by contour following and then combined according to their size and position. For non-text color regions, a separate text location step extracts possible text lines. To evaluate the proposed method, experiments were run on 180 documents. Four commercial OCR programs were also tested, and the proposed method showed the best results.
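
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of extracting connected components and grouping them by position. It is not the paper's algorithm: components are found here with a breadth-first flood fill rather than contour following, and the grouping rule (vertical overlap plus a horizontal gap threshold) is an assumption. The function names `connected_components` and `merge_into_lines` and the parameter `max_gap` are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions in a binary image given as a list of 0/1 rows.

    Assumption: flood fill stands in for the contour following
    mentioned in the abstract; both yield the same bounding boxes.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # Visit all eight neighbours of the current pixel.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_into_lines(boxes, max_gap=2):
    """Greedily merge boxes that overlap vertically and lie within
    max_gap empty columns of an existing group (hypothetical rule)."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan left to right
        for i, line in enumerate(lines):
            overlap = min(box[2], line[2]) - max(box[0], line[0]) + 1
            gap = box[1] - line[3] - 1
            if overlap > 0 and gap <= max_gap:
                lines[i] = (min(line[0], box[0]), min(line[1], box[1]),
                            max(line[2], box[2]), max(line[3], box[3]))
                break
        else:
            lines.append(box)
    return lines


# Tiny synthetic page: two nearby blobs, one distant blob, one lower blob.
page = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
]
boxes = connected_components(page)
lines = merge_into_lines(boxes, max_gap=2)
```

On this toy input the two blobs at the top left are merged into one group because they share rows and are one column apart, while the blob at the far right and the blob on the bottom row stay separate. A real page would need thresholds scaled to the character size, which the abstract suggests by mentioning component size as a grouping criterion.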




Cite this article
[IEEE Style]
P. K. Kim, "Practical Page Segmentation using Connected Components and Color Information," The Transactions of the Korea Information Processing Society (1994 ~ 2000), vol. 7, no. 1, pp. 273-285, 2000. DOI: 10.3745/KIPSTE.2000.7.1.273.

[ACM Style]
Pyeoung Kee Kim. 2000. Practical Page Segmentation using Connected Components and Color Information. The Transactions of the Korea Information Processing Society (1994 ~ 2000), 7, 1, (2000), 273-285. DOI: 10.3745/KIPSTE.2000.7.1.273.