Discretization of Continuous-Valued Attributes for Classification Learning


The Transactions of the Korea Information Processing Society (1994 ~ 2000), Vol. 4, No. 6, pp. 1541-1549, Jun. 1997
DOI: 10.3745/KIPSTE.1997.4.6.1541

Abstract

Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. Our method is context-sensitive in the sense that it takes into account the value of the target attribute. The amount of information each interval gives to the target attribute is measured using Hellinger divergence, and the interval boundaries are chosen so that each interval contains as nearly equal an amount of information as possible. In order to compare our discretization method with some current discretization methods, several popular classification data sets are selected for the experiments. We use the backpropagation algorithm and ID3 as classification tools to compare the accuracy of our discretization method with that of other methods.
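The idea sketched in the abstract can be illustrated in code. The sketch below is an assumption-laden reconstruction, not the paper's exact algorithm: it scores each distinct attribute value by the Hellinger divergence between its local class distribution and the global one (weighted by frequency), then places boundaries so each interval accumulates a roughly equal share of the total score. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    return math.sqrt(sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
                         for k in keys)) / math.sqrt(2)

def normalize(counts):
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def equal_information_boundaries(values, labels, n_intervals):
    """Pick cut points so intervals carry roughly equal cumulative divergence.

    This is a simplified interpretation of the paper's scheme: the
    "information" a value contributes is approximated here as the Hellinger
    divergence between its local class distribution and the global class
    distribution, weighted by how many examples take that value.
    """
    pairs = sorted(zip(values, labels))
    global_dist = normalize(Counter(labels))
    scored = []
    for v, group in groupby(pairs, key=lambda p: p[0]):
        group = list(group)
        local_dist = normalize(Counter(lab for _, lab in group))
        scored.append((v, len(group) * hellinger(local_dist, global_dist)))
    total = sum(s for _, s in scored)
    target = total / n_intervals  # equal share of information per interval
    boundaries, acc = [], 0.0
    for v, s in scored[:-1]:  # never cut after the last distinct value
        acc += s
        if acc >= target and len(boundaries) < n_intervals - 1:
            boundaries.append(v)
            acc = 0.0
    return boundaries
```

For example, with values `[1, 1, 2, 2, 2, 3, 3]` and labels `['a', 'a', 'a', 'b', 'b', 'b', 'b']`, the pure-`a` region and pure-`b` region each diverge strongly from the global class mix, while value 2 is close to it, so a two-interval split cuts after value 2. This is context-sensitive in exactly the sense the abstract describes: the cut points depend on the class labels, not just on the numeric values.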




Cite this article
[IEEE Style]
C. H. Lee, "Discretization of Continuous-Valued Attributes for Classification Learning," The Transactions of the Korea Information Processing Society (1994 ~ 2000), vol. 4, no. 6, pp. 1541-1549, 1997. DOI: 10.3745/KIPSTE.1997.4.6.1541.

[ACM Style]
Lee Chang Hwan. 1997. Discretization of Continuous-Valued Attributes for Classification Learning. The Transactions of the Korea Information Processing Society (1994 ~ 2000), 4, 6, (1997), 1541-1549. DOI: 10.3745/KIPSTE.1997.4.6.1541.