Neural Program Synthesis for Compiler Fuzzing
Open Access
- Author:
- Liu, Xiao
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- November 15, 2019
- Committee Members:
- Dinghao Wu, Dissertation Advisor/Co-Advisor
Dinghao Wu, Committee Chair/Co-Chair
Mary Beth Rosson, Committee Member
Clyde Lee Giles, Committee Member
Danfeng Zhang, Outside Member
Mary Beth Rosson, Program Head/Chair
- Keywords:
- compiler testing
fuzz testing
neural program synthesis
- Abstract:
- Compilers are among the most fundamental programming tools for building software. However, production compilers remain buggy. The GNU Compiler Collection (GCC), a long-lived piece of software first released in 1987 and provided as the standard compiler for most Unix-like operating systems, has had over 3,410 bugs reported since its creation. Fuzzing is often leveraged for stress testing with newly generated or mutated inputs to find new security vulnerabilities. In our study, we propose a grammar-based compiler fuzzing framework called DeepFuzz that continuously synthesizes well-formed C programs to trigger internal compiler errors, or "bugs" as they are commonly called. In this framework, we are interested in how to apply generative deep neural networks (DNNs), such as the sequence-to-sequence model, to synthesize well-formed C programs by training on syntax-correct programs. We are also interested in how to synthesize programs using a novel form of reinforcement learning, where the model becomes its own teacher: it starts as a randomly initialized neural network with no training data and trains itself through self-play. We use the synthesized C programs to fuzz off-the-shelf C compilers, e.g., GCC and Clang/LLVM. This thesis describes our analysis of neural program synthesis for compiler fuzzing in three steps.

First, we conduct a first-step study by implementing DeepFuzz, which deploys a sequence-to-sequence model to synthesize C programs. We performed a detailed case study analyzing the pass rate of generating well-formed programs and achieving the goal of fuzz testing, which requires a certain degree of variation. Overall, DeepFuzz generated 82.63% syntactically valid programs and improved testing efficacy with regard to line, function, and branch coverage. It identified previously unknown bugs, 8 of which were confirmed by the GCC developers.

Second, for the cases where we could not obtain any or enough data to train a model representing the grammar, we build a reinforcement learning framework for program synthesis and apply it to the BF programming language. With no training data set required, the model is initialized with random weights and evolves with environment rewards provided by the target compiler under test. Over the learning iterations, the neural network model gradually learns how to construct valid and diverse programs that improve testing efficacy under the four reward functions we defined. We implemented the proposed method in a prototype tool called AlphaProg. We performed an in-depth diversity analysis of the generated programs that explains the improved testing coverage of the target compiler. We reported two important bugs for this production compiler, and they were confirmed and addressed by the project owner.

Third, we extend the framework to synthesize C programs, which is more challenging in terms of state space. We propose an automatic code mutation framework called FuzzBoost that is based on deep reinforcement learning. By adopting testing coverage information collected at runtime as the reward, the fuzzing agent learns to fuzz a seed program toward the overall goal of improving testing coverage. We implemented this new approach, and preliminary evidence showed that reinforcement fuzzing can outperform baseline random fuzzing on production compilers.
We also showed that a pre-trained model can boost the fuzzing process for seed programs with similar patterns.

This thesis addresses the problem of using DNNs to synthesize new programs for compiler fuzz testing. Specifically, the proposed framework is able to handle compilers of different programming languages. Accordingly, DeepFuzz and FuzzBoost are designed for C compiler testing, and AlphaProg is designed for BF compiler testing. Additionally, the generative neural networks for program synthesis can be trained with or without training data: the model in DeepFuzz is trained on a corpus of programs, whereas AlphaProg and FuzzBoost rely on reinforcement learning, which requires no training samples. We built prototype tools for each study and applied them for practical use. Their effectiveness was evaluated, and they caught real bugs in off-the-shelf compilers.
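To make the coverage-as-reward idea behind AlphaProg and FuzzBoost concrete, the sketch below shows a deliberately simplified, hypothetical fuzzing loop; it is not the dissertation's implementation. It uses an epsilon-greedy bandit over token-level mutation actions, and measure_coverage is a stub standing in for compiling and running the candidate program (e.g., with gcc --coverage) and counting newly covered lines or branches.

```python
# Hypothetical sketch of a coverage-guided reinforcement fuzzing loop.
# Names (mutate, measure_coverage, ACTIONS) are illustrative only.
import random

SEED_PROGRAM = "int main() { int x = 0; return x; }"

# Action space: simple token-level mutations applied to the seed program.
ACTIONS = ["insert", "delete", "swap"]

def mutate(program: str, action: str) -> str:
    """Apply one token-level mutation to the program text."""
    tokens = program.split()
    if not tokens:
        return program
    i = random.randrange(len(tokens))
    if action == "insert":
        tokens.insert(i, random.choice(tokens))
    elif action == "delete" and len(tokens) > 1:
        del tokens[i]
    elif action == "swap":
        j = random.randrange(len(tokens))
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)

def measure_coverage(program: str) -> int:
    """Placeholder for compiling the candidate (e.g., gcc --coverage),
    running it, and counting covered lines/branches; stubbed here so the
    sketch runs stand-alone."""
    return len(set(program.split()))  # stand-in for real coverage counts

def fuzz(episodes: int = 100, epsilon: float = 0.1) -> str:
    """Epsilon-greedy loop: the agent picks a mutation action, and the
    coverage gain over the current best serves as the reward signal."""
    q = {a: 0.0 for a in ACTIONS}      # action-value estimates
    counts = {a: 0 for a in ACTIONS}
    best_program, best_cov = SEED_PROGRAM, measure_coverage(SEED_PROGRAM)
    for _ in range(episodes):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)          # explore
        else:
            action = max(q, key=q.get)               # exploit
        candidate = mutate(best_program, action)
        reward = measure_coverage(candidate) - best_cov  # coverage gain
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]
        if reward > 0:
            best_program, best_cov = candidate, best_cov + reward
    return best_program

if __name__ == "__main__":
    print(fuzz())
```

The actual systems described in the thesis replace the bandit with a neural policy and the stubbed coverage function with real instrumentation of the target compiler, but the reward structure (coverage gain per generated or mutated program) is the same in spirit.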