Bin2vec: learning representations of binary executable programs for security tasks
Open Access
- 1 July 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Cybersecurity
- Vol. 4 (1), 1-14
- https://doi.org/10.1186/s42400-021-00088-4
Abstract
Tackling binary program analysis problems has traditionally implied manually defining rules and heuristics, a tedious and time-consuming task for human analysts. In order to improve automation and scalability, we propose an alternative direction based on distributed representations of binary programs with applicability to a number of downstream tasks. We introduce Bin2vec, a new approach leveraging Graph Convolutional Networks (GCN) along with computational program graphs in order to learn a high-dimensional representation of binary executable programs. We demonstrate the versatility of this approach by using our representations to solve two semantically different binary analysis tasks: functional algorithm classification and vulnerability discovery. We compare the proposed approach to our own strong baseline as well as published results, and demonstrate improvement over state-of-the-art methods for both tasks. We evaluated Bin2vec on 49191 binaries for the functional algorithm classification task, and on 30 different CWE-IDs including at least 100 CVE entries each for the vulnerability discovery task. We set a new state-of-the-art result by reducing the classification error by 40% compared to the source-code-based inst2vec approach, while working on binary code. For almost every vulnerability class in our dataset, our prediction accuracy is over 80% (and over 90% in multiple classes).
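To make the idea of learning a graph-level representation with a GCN more concrete, the following is a minimal sketch of the widely used graph convolution rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by mean pooling over nodes. It is not the Bin2vec architecture, which the abstract does not specify. The toy 4-node graph, the feature sizes, the random weights, and the names `normalize_adjacency`, `gcn_layer`, and `embed_graph` are all assumptions made for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2: add self-loops, then normalize symmetrically."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)              # at least 1 per node because of the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


def embed_graph(adj, features, weight_list):
    """Stack GCN layers, then mean-pool node vectors into one graph vector."""
    a_norm = normalize_adjacency(adj)
    hidden = features
    for weights in weight_list:
        hidden = gcn_layer(a_norm, hidden, weights)
    return hidden.mean(axis=0)


# Hypothetical toy program graph: 4 nodes connected in a chain.
rng = np.random.default_rng(0)
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
features = rng.normal(size=(4, 3))          # 3 made-up features per node
weight_list = [rng.normal(size=(3, 8)),     # untrained, random weights
               rng.normal(size=(8, 5))]

embedding = embed_graph(adj, features, weight_list)
print(embedding.shape)
```

In a trained model the weights would be learned from labeled graphs, and the pooled vector would feed a classifier for a downstream task such as the two named in the abstract.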
Funding Information
- USC Information Sciences Institute