An Adaptive Grid-based Clustering Algorithm over Multi-dimensional Data Streams


The KIPS Transactions:PartD, Vol. 14, No. 7, pp. 733-742, Dec. 2007
DOI: 10.3745/KIPSTD.2007.14.7.733

Abstract

A data stream is a massive, unbounded sequence of data elements continuously generated at a rapid rate. For this reason, the memory used for data stream analysis must remain bounded even though new data elements are continuously generated. To satisfy this requirement, data stream processing sacrifices some accuracy in its analysis results by allowing some error. The old distribution statistics are diminished by a predefined decay rate as time goes by, so that the effect of obsolete information on the current clustering result can be eliminated without physically maintaining any data element. This paper proposes a grid-based clustering algorithm for a data stream. Given a set of initial grid cells, the dense range of a grid cell is recursively partitioned into smaller cells in a top-down manner, based on the distribution statistics of data elements, until the smallest cell, called a unit cell, is identified. Since only the distribution statistics of data elements are maintained by the dynamically partitioned grid cells, the clusters of a data stream can be found effectively without physically maintaining the data elements. Furthermore, the memory usage of the proposed algorithm adapts to the size of the confined memory space by flexibly resizing the unit cell. As a result, the confined memory space can be fully utilized to generate a clustering result that is as accurate as possible. The proposed algorithm is analyzed through a series of experiments to identify its various characteristics.
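The two mechanisms named in the abstract, decayed statistics and top-down partitioning of dense cells, can be illustrated with a minimal one-dimensional sketch. This is not the paper's algorithm: the abstract gives no formulas, so the exponential decay rule, the equal-width split, the uniform redistribution of a parent's count, and every name below (`Cell`, `DecayedGrid1D`, `split_support`, `unit_width`, `fanout`) are assumptions made for illustration only. In particular, the paper partitions the dense range of a cell using its distribution statistics, which this sketch replaces with a plain equal-width split.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    lo: float
    hi: float
    count: float = 0.0  # decayed number of elements seen in [lo, hi)
    last_t: int = 0     # time step of the last update to `count`

    def decayed(self, t: int, decay: float) -> float:
        # Assumed decay rule: the count shrinks by `decay` per elapsed step.
        return self.count * decay ** (t - self.last_t)


class DecayedGrid1D:
    """Hypothetical 1-D grid that keeps only per-cell decayed counts."""

    def __init__(self, lo, hi, n_cells, decay=0.99,
                 split_support=0.3, unit_width=1.0, fanout=2):
        width = (hi - lo) / n_cells
        self.cells = [Cell(lo + i * width, lo + (i + 1) * width)
                      for i in range(n_cells)]
        self.decay = decay
        self.split_support = split_support  # assumed density threshold
        self.unit_width = unit_width        # cells this narrow are unit cells
        self.fanout = fanout
        self.total = 0.0                    # decayed count of all elements
        self.t = 0

    def insert(self, x: float) -> None:
        self.t += 1
        self.total = self.total * self.decay + 1.0
        for i, c in enumerate(self.cells):
            if c.lo <= x < c.hi:
                # Decay lazily, only when the cell is touched.
                c.count = c.decayed(self.t, self.decay) + 1.0
                c.last_t = self.t
                dense = c.count / self.total >= self.split_support
                if dense and (c.hi - c.lo) > self.unit_width:
                    self.cells[i:i + 1] = self._split(c)
                return
        # Elements outside the grid range are ignored in this sketch.

    def _split(self, c: Cell) -> list:
        # Simplification: equal-width children, parent count shared uniformly.
        w = (c.hi - c.lo) / self.fanout
        share = c.count / self.fanout
        return [Cell(c.lo + k * w, c.lo + (k + 1) * w, share, c.last_t)
                for k in range(self.fanout)]
```

Only counts are stored, never the elements themselves, which is the property the abstract relies on for bounded memory. Under the same assumptions, enlarging `unit_width` stops splitting earlier and yields fewer cells, which mirrors the idea of resizing the unit cell to fit a confined memory budget.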



Cite this article
[IEEE Style]
N. H. Park and W. S. Lee, "An Adaptive Grid-based Clustering Algorithm over Multi-dimensional Data Streams," The KIPS Transactions:PartD, vol. 14, no. 7, pp. 733-742, 2007. DOI: 10.3745/KIPSTD.2007.14.7.733.

[ACM Style]
Nam Hun Park and Won Suk Lee. 2007. An Adaptive Grid-based Clustering Algorithm over Multi-dimensional Data Streams. The KIPS Transactions:PartD, 14, 7, (2007), 733-742. DOI: 10.3745/KIPSTD.2007.14.7.733.