Effect of Sampling for Multi-set Cardinality Estimation


KIPS Transactions on Computer and Communication Systems, Vol. 4, No. 1, pp. 15-22, Jan. 2015
DOI: 10.3745/KTCCS.2015.4.1.15

Abstract

Estimating the number of distinct values is a well-known problem in network data measurement, and many effective algorithms have been proposed. Recent works build on a technique called Linear Counting to solve the estimation problem for massive sets or spreaders in small memory. Sampling is used to reduce the amount of measurement data, and it is generally assumed to degrade accuracy. In this paper, however, we show that for multi-set estimation, CSE with sampling sometimes gives better results, in terms of accuracy and estimation range, than MCSE, which examines every packet without sampling. To support this claim, we present a mathematical analysis, conduct experiments with real data, and compare the results of CSE, MCSE, and CSES.
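For readers unfamiliar with Linear Counting, the following is a minimal sketch of the basic single-set estimator, not the CSE, MCSE, or CSES schemes compared in the paper. Each item is hashed to one bit of an m-bit bitmap, and the number of distinct items is estimated as -m ln(V), where V is the fraction of bits still zero. The function names, the bitmap size, and the optional hash-based sampling step (keep an item only if its hash falls below a threshold, then scale the estimate by 1/sample_rate) are assumptions made for illustration and are not taken from the paper.

```python
import hashlib
import math


def _hash64(item, salt):
    """Deterministic 64-bit hash of an item (hypothetical helper)."""
    digest = hashlib.blake2b(f"{salt}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def linear_counting_estimate(items, m=4096, sample_rate=1.0):
    """Estimate the number of distinct items with an m-bit bitmap.

    sample_rate in (0, 1] is an illustrative assumption: an item is kept
    only if its sampling hash is below the threshold, so every duplicate
    of an item is kept or dropped consistently.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    bitmap = [False] * m
    threshold = int(sample_rate * 2**64)
    for item in items:
        if _hash64(item, "sample") >= threshold:
            continue  # item not sampled
        bitmap[_hash64(item, "bitmap") % m] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        # No zero bits left: the estimator is undefined (out of range).
        raise ValueError("bitmap saturated; use a larger m or a lower sample_rate")
    # Linear Counting estimate, scaled back up by the sampling rate.
    return -m * math.log(zeros / m) / sample_rate


# Usage: 1000 distinct flow identifiers, each seen three times.
packets = [f"flow-{i}" for i in range(1000)] * 3
full = linear_counting_estimate(packets)
sampled = linear_counting_estimate(packets, sample_rate=0.5)
```

In this sketch, sampling sets fewer bits, so the bitmap saturates later for the same m. That gives one intuition for why sampling can widen the estimation range, although the paper's own analysis should be consulted for the actual argument.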


Cite this article
[IEEE Style]
D. Dao and D. H. Nyang, "Effect of Sampling for Multi-set Cardinality Estimation," KIPS Transactions on Computer and Communication Systems, vol. 4, no. 1, pp. 15-22, 2015. DOI: 10.3745/KTCCS.2015.4.1.15.

[ACM Style]
Dinhnguyen Dao and Dae Hun Nyang. 2015. Effect of Sampling for Multi-set Cardinality Estimation. KIPS Transactions on Computer and Communication Systems, 4, 1, (2015), 15-22. DOI: 10.3745/KTCCS.2015.4.1.15.