A Survey on Malware Detection Using Data Mining Techniques

29 June 2017

Journal article (survey), published by the Association for Computing Machinery (ACM) in ACM Computing Surveys

Vol. 50, No. 3, pp. 1-40
https://doi.org/10.1145/3073559

Abstract

In the Internet age, malware (such as viruses, trojans, ransomware, and bots) has posed serious and evolving security threats to Internet users. To protect legitimate users from these threats, anti-malware software products from vendors including Comodo, Kaspersky, Kingsoft, and Symantec provide the primary defense against malware. Unfortunately, driven by economic gain, the number of new malware samples has increased explosively: anti-malware vendors now confront millions of potential malware samples per year. To keep pace with this growth, there is an urgent need for intelligent methods that detect malware effectively and efficiently within the large volume of real samples collected every day. In this article, we first give a brief overview of malware and the anti-malware industry, and present the industry's needs for malware detection. We then survey intelligent malware detection methods, in which detection is usually divided into two stages: feature extraction and classification/clustering. The performance of such approaches depends critically on the extracted features and on the classification/clustering methods. We provide a comprehensive investigation of both the feature extraction and the classification/clustering techniques. We also discuss additional issues and challenges of malware detection using data mining techniques and, finally, forecast trends in malware development.
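The two-stage split described above (feature extraction, then classification) can be illustrated with a minimal sketch. It assumes API-call bigram counts as the features and a k-nearest-neighbour vote over cosine similarity as the classifier; these are choices made for illustration, not the specific method of this survey or of any vendor. The traces, labels, and function names (`extract_features`, `cosine`, `knn_classify`, `TRAINING`) are hypothetical toy data.

```python
from collections import Counter
from math import sqrt


def extract_features(api_calls, n=2):
    """Stage 1 (hypothetical): turn an API-call trace into n-gram counts."""
    grams = zip(*(api_calls[i:] for i in range(n)))
    return Counter("->".join(g) for g in grams)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(sample, labeled, k=3):
    """Stage 2 (hypothetical): majority vote of the k most similar samples."""
    ranked = sorted(labeled, key=lambda item: cosine(sample, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up API-call traces; real systems use far richer features.
TRAINING = [
    (["OpenFile", "ReadFile", "CloseFile"], "benign"),
    (["OpenFile", "WriteFile", "CloseFile"], "benign"),
    (["RegOpenKey", "RegSetValue", "CreateRemoteThread"], "malware"),
    (["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"], "malware"),
    (["RegOpenKey", "RegSetValue", "VirtualAlloc"], "malware"),
]
LABELED = [(extract_features(trace), label) for trace, label in TRAINING]

unknown = ["RegOpenKey", "RegSetValue", "CreateRemoteThread", "VirtualAlloc"]
print(knn_classify(extract_features(unknown), LABELED))  # malware
```

Because the two stages are separate functions, either one can be swapped independently (for example, byte n-grams instead of API calls, or a clustering step instead of a classifier), which mirrors the abstract's point that performance depends on both the features and the learning method.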

Funding Information

Scientific and Technological Support Project (Society) of Jiangsu (BE2016776)
U.S. National Science Foundation (IIS-1213026, CNS-1461926 and CNS-1618629)
Chinese NSF (91646116)
