Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM


KIPS Transactions on Computer and Communication Systems, Vol. 11, No. 12, pp. 445-452, Dec. 2022
https://doi.org/10.3745/KTCCS.2022.11.12.445
Keywords: Machine Learning, Scaling, SMOTE, Light GBM, Imbalanced Classification
Abstract

Imbalanced class distributions are common in data across the digital world and are especially significant in cybersecurity, where abnormal activity hidden in imbalanced data must be detected and resolved. A system capable of tracking patterns across all transactions is needed. However, when machine learning is applied to imbalanced data, in which abnormal patterns typically form the minority, models can overlook the minority classes, performance on those classes degrades, and predictive models can become biased. In this paper, as an approach to imbalanced datasets, we predict target variables and improve accuracy by combining the Synthetic Minority Oversampling Technique (SMOTE) with the Light GBM algorithm. The experimental results were compared with those of logistic regression, decision tree, KNN, Random Forest, and XGBoost. Accuracy and recall were similar across the algorithms, but precision differed, with Random Forest at 80.76% and Light GBM at 97.16%, as did F1-score, with Random Forest at 84.67% and Light GBM at 91.96%. The experiment confirmed that Light GBM's performance was either comparable to that of the five comparison algorithms or better by up to 16%.
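The abstract does not give implementation details, so the following is only a minimal, dependency-free sketch of two ingredients it names: SMOTE-style oversampling of the minority class, and the precision, recall and F1-score metrics used in the comparison. The function names `smote` and `precision_recall_f1` are hypothetical, and this simplified SMOTE picks base samples at random rather than iterating over every minority sample; it is not the authors' code. In a full pipeline the resampled training set would then be passed to a classifier such as `lightgbm.LGBMClassifier`, and `imblearn.over_sampling.SMOTE` (via `fit_resample`) offers a production implementation of the oversampling step.

```python
import math
import random
from typing import List, Sequence, Tuple


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples (simplified SMOTE, hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1-score for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    new_points = smote(minority, n_new=6, k=2)
    print(len(new_points))  # 6
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```

Because every synthetic point is a convex combination of two real minority samples, SMOTE fills in the minority region instead of duplicating existing rows. As a general practice, oversampling is applied only to the training split, so that metrics such as precision and F1-score are measured on the original class distribution.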


Cite this article
[IEEE Style]
Y. Han and I. Joe, "Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM," KIPS Transactions on Computer and Communication Systems, vol. 11, no. 12, pp. 445-452, 2022. DOI: https://doi.org/10.3745/KTCCS.2022.11.12.445.

[ACM Style]
Young-Jin Han and In-Whee Joe. 2022. Imbalanced Data Improvement Techniques Based on SMOTE and Light GBM. KIPS Transactions on Computer and Communication Systems, 11, 12, (2022), 445-452. DOI: https://doi.org/10.3745/KTCCS.2022.11.12.445.