Distributed Processing System Design and Implementation for Feature Extraction from Large-Scale Malicious Code


KIPS Transactions on Computer and Communication Systems, Vol. 8, No. 2, pp. 35-40, Feb. 2019
https://doi.org/10.3745/KTCCS.2019.8.2.35
Keywords: Distributed Processing System, Malware Detection, Feature Extraction, Machine Learning
Abstract

Traditional malware detection struggles to detect malware that has been modified by polymorphism or obfuscation techniques. By learning the patterns embedded in malware code, machine learning algorithms can detect similar behaviors and replace current detection methods. Data must be collected continuously in order to learn malicious code patterns that change over time. However, storing and processing a large number of malware files incurs high space and time complexity. In this paper, an HDFS-based distributed processing system is designed to reduce space complexity and shorten feature extraction time. Using the distributed processing system, we extract two filtering-based API features, the 2-gram feature and the APICFG feature, and compare the generalization performance of ensemble learning models on them. In experiments, feature extraction was about 3.75 times faster than on a single computer, and space usage was about 5 times more efficient. Of the features compared, the 2-gram feature gave the best classification performance, but its learning time was long due to high dimensionality.
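
As a rough illustration of why a 2-gram API feature becomes high-dimensional, the following minimal sketch counts adjacent pairs of API names in an ordered call list. It is not the authors' implementation: the function names `api_2grams` and `to_vector`, the sample API names, and the use of a sorted vocabulary are all assumptions made for the example, and the filtering step mentioned in the abstract is not modeled.

```python
from collections import Counter
from typing import List, Tuple


def api_2grams(calls: List[str]) -> Counter:
    """Count adjacent pairs (2-grams) in an ordered list of API names."""
    # zip pairs each call with its successor; an empty or single-item
    # list yields no pairs.
    return Counter(zip(calls, calls[1:]))


def to_vector(counts: Counter, vocabulary: List[Tuple[str, str]]) -> List[int]:
    """Project 2-gram counts onto a fixed vocabulary of pairs."""
    return [counts.get(pair, 0) for pair in vocabulary]


# Hypothetical API call sequence for one sample (illustrative only).
sample = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]

counts = api_2grams(sample)
vocabulary = sorted(counts)          # assumed: vocabulary built from observed pairs
vector = to_vector(counts, vocabulary)
```

With V distinct API names, the vocabulary can hold up to V × V pairs, so the feature vector grows quadratically in V, which is consistent with the long learning time reported for the 2-gram feature.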



Cite this article
[IEEE Style]
H. Lee, S. Euh and D. Hwang, "Distributed Processing System Design and Implementation for Feature Extraction from Large-Scale Malicious Code," KIPS Transactions on Computer and Communication Systems, vol. 8, no. 2, pp. 35-40, 2019. DOI: https://doi.org/10.3745/KTCCS.2019.8.2.35.

[ACM Style]
Hyunjong Lee, Seongyul Euh, and Doosung Hwang. 2019. Distributed Processing System Design and Implementation for Feature Extraction from Large-Scale Malicious Code. KIPS Transactions on Computer and Communication Systems, 8, 2, (2019), 35-40. DOI: https://doi.org/10.3745/KTCCS.2019.8.2.35.