A Survey on Malware Detection Using Data Mining Techniques
- 29 June 2017
- journal article
- survey
- Published by Association for Computing Machinery (ACM) in ACM Computing Surveys
- Vol. 50 (3), 1-40
- https://doi.org/10.1145/3073559
Abstract
In the Internet age, malware (such as viruses, trojans, ransomware, and bots) has posed serious and evolving security threats to Internet users. To protect legitimate users from these threats, anti-malware software products from different companies, including Comodo, Kaspersky, Kingsoft, and Symantec, provide the major defense against malware. Unfortunately, driven by the economic benefits, the number of new malware samples has explosively increased: anti-malware vendors are now confronted with millions of potential malware samples per year. In order to keep on combating the increase in malware samples, there is an urgent need to develop intelligent methods for effective and efficient malware detection from the real and large daily sample collection. In this article, we first provide a brief overview on malware as well as the anti-malware industry, and present the industrial needs on malware detection. We then survey intelligent malware detection methods. In these methods, the process of detection is usually divided into two stages: feature extraction and classification/clustering. The performance of such intelligent malware detection approaches critically depends on the extracted features and the methods for classification/clustering. We provide a comprehensive investigation on both the feature extraction and the classification/clustering techniques. We also discuss the additional issues and the challenges of malware detection using data mining techniques and finally forecast the trends of malware development.
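The two-stage process named in the abstract (feature extraction, then classification/clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: byte n-gram counts stand in for the extracted features, a 1-nearest-neighbor rule with cosine similarity stands in for the classifier, and the function names (`extract_ngrams`, `cosine`, `classify`) and the toy byte strings in `TRAIN` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple
import math


def extract_ngrams(data: bytes, n: int = 2) -> Counter:
    """Stage 1 (feature extraction): count overlapping byte n-grams."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(sample: bytes, labeled: List[Tuple[bytes, str]], n: int = 2) -> str:
    """Stage 2 (classification): return the label of the most similar labeled sample."""
    features = extract_ngrams(sample, n)
    best = max(labeled, key=lambda item: cosine(features, extract_ngrams(item[0], n)))
    return best[1]


# Hypothetical toy data: short byte strings, not real malware or real benign files.
TRAIN = [
    (b"\x90\x90\x90\x90\xeb\xfe\x90\x90", "malicious"),
    (b"hello world, hello user", "benign"),
]

if __name__ == "__main__":
    print(classify(b"\x90\x90\xeb\xfe\x90", TRAIN))
    print(classify(b"hello there", TRAIN))
```

The sketch keeps the two stages in separate functions so that either one can be swapped out independently, which mirrors the abstract's point that detection performance depends on both the extracted features and the classification/clustering method.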
Funding Information
- Scientific and Technological Support Project (Society) of Jiangsu (BE2016776)
- U.S. National Science Foundation (IIS-1213026, CNS-1461926 and CNS-1618629)
- Chinese NSF (91646116)