BinDNN: Resilient Function Matching Using Deep Learning

Open Access
Lageman, Nathaniel John
Graduate Program:
Computer Science and Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
November 07, 2016
Committee Members:
  • Patrick Drew McDaniel, Thesis Advisor/Co-Advisor
Keywords:
  • reverse engineering
  • malware
  • deep learning
Abstract:
Determining whether two functions taken from different compiled binaries originate from the same function in the source code has many applications in malware reverse engineering. In particular, it allows an analyst to filter large swaths of code, removing functions that have been previously observed or that originate in shared or trusted libraries. However, the task is challenging because the translation from source code to assembly instructions is far from fixed: the instruction stream a compiler emits depends heavily on optimizations, target platforms, and runtime constraints. In this paper, we seek to advance methods for reliably testing the equivalence of functions found in different executables. By leveraging advances in deep learning and natural language processing, we design and evaluate a novel algorithm, BinDNN, that is resilient to variations in compiler, compiler optimization level, and architecture. We show that BinDNN is effective both in isolation and in conjunction with existing approaches. In the latter case, we boost performance by 109% when combining BinDNN with BinDiff to compare functions across architectures. This result, an improvement of 32% for BinDNN and 185% for BinDiff, demonstrates the utility of employing multiple orthogonal approaches to function matching.