Using AI Technologies To Solve Software Security Challenges

Open Access
- Author:
- Wang, Haizhou
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- September 23, 2024
- Committee Members:
- Dongwon Lee, Professor in Charge/Director of Graduate Studies
Peng Liu, Chair & Dissertation Advisor
Minghui Zhu, Outside Unit & Field Member
Sharon Huang, Major Field Member
Suhang Wang, Major Field Member
- Keywords:
- Large Language Model
Artificial Intelligence
Software Security
Reverse Engineering
Deep Learning
Information Security
- Abstract:
- Artificial intelligence (AI) has been a trending topic in recent years owing to technical breakthroughs in computer vision (CV) and natural language processing (NLP), where many tasks are nearly impossible to solve with traditional engineering and algorithmic efforts because their patterns are difficult to generalize. In the field of software security, such tasks are not uncommon. For example, many tasks, such as code clone detection and logic bug detection, require high-level semantic program comprehension. These tasks resemble NLP tasks to a certain extent, and existing methods such as static or dynamic program analysis are largely ineffective at understanding programs in terms of their business logic. As another example, the information and data available in many software security tasks, such as reverse engineering, are not human-friendly, which makes generating explicit rules or heuristics extremely challenging. Therefore, it is crucial to explore which tasks in software security could be solved, or solved better, using modern AI technologies. In this dissertation, we aim to adopt AI technologies in three software security sub-fields: exploitation defense, vulnerability analysis, and reverse engineering. In particular, we have researched three problems: 1) defending against return-oriented programming (ROP) attacks using deep learning, 2) finding user-privilege-related (UPR) variables using a large language model (LLM) workflow, and 3) pinpointing the implementation of anti-dynamic-analysis techniques in binary programs using an LLM. The common characteristics of all three problems are the fuzziness of the input and the difficulty of generalizing common patterns, which are our key motivations for adopting AI technologies and data-driven methods.