Effective Parallel Hash Join Algorithm Based on Histogram Equalization in the Presence of Data Skew


The Transactions of the Korea Information Processing Society (1994 ~ 2000), Vol. 4, No. 2, pp. 338-348, Feb. 1997
DOI: 10.3745/KIPSTE.1997.4.2.338

Abstract

In this paper, we first propose a data distribution framework that resolves load imbalance and bucket overflow in parallel hash joins. Using histogram equalization, the framework transforms the histogram of skewed data into a desired uniform distribution that reflects the relative computing power of the node processors in the system. We then propose an efficient parallel hash join algorithm that handles skewed data using this data distribution method. To compare our algorithm with other hash join algorithms, we run simulation experiments and actual executions on the COREDB database computer, which has an 8-node hypercube architecture. In these experiments, the skewed distribution of the join attribute is modeled with a Zipf-like distribution. The results indicate that our algorithm outperforms the other algorithms in the skewed cases.
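
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of histogram equalization applied to join-key partitioning, not the authors' method. It assumes range partitioning on the join attribute; the function names `equalized_boundaries` and `assign_node` and the `powers` list (relative computing power per node) are hypothetical and chosen for illustration.

```python
import bisect
from collections import Counter
from itertools import accumulate


def equalized_boundaries(keys, powers):
    """Pick upper key boundaries so that node i receives roughly
    powers[i] / sum(powers) of the tuples (illustrative sketch only).

    Returns len(powers) - 1 boundaries; the last node takes the rest.
    """
    hist = Counter(keys)                      # histogram of join-key values
    distinct = sorted(hist)
    cum = list(accumulate(hist[k] for k in distinct))  # cumulative counts
    total = cum[-1]
    targets = list(accumulate(powers))        # cumulative power shares
    power_sum = targets[-1]

    boundaries = []
    for t in targets[:-1]:
        goal = total * t / power_sum          # tuples nodes 0..i should hold
        idx = bisect.bisect_left(cum, goal)   # first key reaching that count
        boundaries.append(distinct[min(idx, len(distinct) - 1)])
    return boundaries


def assign_node(key, boundaries):
    """Map a join key to a node: keys <= boundaries[i] go to node i."""
    return bisect.bisect_left(boundaries, key)


# Skewed toy input: key 1 appears four times, key 2 twice, keys 3 and 4 once.
keys = [1] * 4 + [2] * 2 + [3, 4]

# Two nodes, the first assumed three times as powerful as the second.
bounds = equalized_boundaries(keys, powers=[3, 1])
loads = Counter(assign_node(k, bounds) for k in keys)
# bounds == [2]; node 0 receives 6 tuples and node 1 receives 2.
```

Because this sketch never splits a single key value across nodes, one extremely frequent key can still overload a node; handling that case would need an additional mechanism that the abstract does not describe.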



Cite this article
[IEEE Style]
U. K. Park, H. K. Choi, and T. G. Kim, "Effective Parallel Hash Join Algorithm Based on Histogram Equalization in the Presence of Data Skew," The Transactions of the Korea Information Processing Society (1994 ~ 2000), vol. 4, no. 2, pp. 338-348, 1997. DOI: 10.3745/KIPSTE.1997.4.2.338.

[ACM Style]
Park Ung Kyu, Choi Hwang Kyu, and Kim Tag Gon. 1997. Effective Parallel Hash Join Algorithm Based on Histogram Equalization in the Presence of Data Skew. The Transactions of the Korea Information Processing Society (1994 ~ 2000), 4, 2, (1997), 338-348. DOI: 10.3745/KIPSTE.1997.4.2.338.