A Representation-based Approach to Connect Regular Grammar and Deep Learning
Open Access
- Author:
- Zhang, Kaixuan
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- June 01, 2021
- Committee Members:
- C Lee Giles, Chair & Dissertation Advisor
Shomir Wilson, Major Field Member
Daniel Kifer, Outside Unit & Field Member
Kenneth Huang, Major Field Member
Mary Beth Rosson, Program Head/Chair
- Keywords:
- Deep Learning
Regular Grammar
Adversarial Machine Learning
Model Extraction
Topological Analysis
- Abstract:
- Formal language theory has produced major breakthroughs in many traditional areas, including control systems, compiler design, and model verification, and it continues to advance these research directions. Because deep learning research has in recent years brought the long-buried power of neural networks to the surface and produced breakthroughs of its own, it is crucial to revisit formal language theory from a new perspective. Specifically, the theoretical foundation of the connection between the two fields, rather than its practical applications, warrants attention. On the other hand, as deep neural networks (DNNs) spread across many branches of research, these powerful models have proved to be as opaque as they are capable in learning tasks. Recent work has demonstrated the vulnerability of DNN classifiers built for many different learning tasks, which has opened the discussion of adversarial machine learning and explainable artificial intelligence. It is therefore imperative to apply formal language theory to these issues in deep learning research. This dissertation focuses on the connections and interactions between formal language theory and deep learning research. First, we investigate fine-grained characteristics of regular grammars and deterministic finite automata (DFA) from a deep learning perspective. Then we aim to understand some of the mysteries behind the vulnerability and explainability of DNNs and to design generic frameworks and deployable algorithms for verification. Accordingly, the dissertation comprises three threads: (1) regular grammar classification and learning with recurrent neural networks; (2) topological analysis of sample influence and category-based analysis of grammar transfer; and (3) adversarial models for deterministic finite automata and verification of recurrent neural networks.
In the first thread, we focus on how to differentiate regular grammars in terms of learning tasks. We introduce an entropy metric based on the concentric ring representation and categorize regular grammars into three disjoint subclasses. In addition, we provide classification theorems for different representations of regular grammar. The second thread concentrates on the internal structure of regular grammar and applies a topological perspective to investigate model-free sample influence. We develop a Shapley homology framework and propose two algorithms based on different Betti numbers. Furthermore, we establish a category-based framework to probe the mechanism of grammar transfer learning. In the third thread, we focus on the adversarial robustness of recurrent neural networks (RNNs). We generalize the adversarial sample framework to an adversarial model in order to study fine-grained characteristics of DFA, including transition importance and critical patterns. We also propose a generic framework for verification, develop an algorithm under this framework, and conduct a case study that evaluates the adversarial robustness of different RNNs on a set of regular grammars. In summary, this research serves as a bridge between regular grammar and machine learning, opening a discussion of these topics and offering some guidance for practice; we believe it is an auspicious beginning.