Analyzing the Stability, Learnability, and Precision of Symbolic Structures by Neural Networks

Restricted (Penn State Only)
- Author:
- Dave, Neisarg
- Graduate Program:
- Informatics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- August 19, 2024
- Committee Members:
- Daniel Kifer, Outside Unit & Field Member
- Kenneth Huang, Major Field Member
- Dongwon Lee, Professor in Charge/Director of Graduate Studies
- Sarah Rajtmajer, Co-Chair & Dissertation Advisor
- C Lee Giles, Co-Chair & Dissertation Advisor
- Ankur Arjun Mali, Special Member
- Keywords:
- Symbolic Tasks
- Recurrent Neural Networks
- Large Language Models
- Stability
- Learnability
- Precision
- Abstract:
- This dissertation explores the capability of neural networks, specifically Recurrent Neural Networks (RNNs), Transformers, and Large Language Models (LLMs), to learn and generalize over symbolic structures. Our research focuses on two types of symbolic structures: mathematical expressions and strings of characters. Initial observations reveal that while RNNs can achieve structural similarity for mathematical expressions, they struggle with evaluation, which motivates a more fundamental analysis of symbolic learning in neural networks. To address this, we adopt the Chomsky hierarchy to establish the complexity of symbolic tasks and the automata required to solve them. This framework provides reference machines that neural networks need to emulate, guiding our investigation into the capabilities and limitations of these models in handling symbolic reasoning. We conducted large-scale experiments using Tomita and Dyck grammars to empirically analyze the stability of states learned by first- and second-order RNNs. Our results show that stable states can be extracted even from partially trained networks, a finding significant for both the stability and verifiability of AI systems. Additionally, we compared rule extraction methods and found that quantization-based methods outperform the equivalence query method. Our study further indicates that the perceived ability of RNNs and Transformers to handle non-regular grammars is mainly due to poor data sampling strategies. We demonstrate that RNNs are designed to reach stable states, which renders their hidden states incapable of emulating any memory component; Transformers likewise fail to learn non-regular grammars. Furthermore, we show that current LLMs fail to generalize effectively over non-regular grammars and perform poorly on symbolic tasks such as addition, multiplication, and counting. By comparing the number of parameters LLMs would need to store the information required for solving symbolic problems against their actual performance, we highlight the inefficiencies of current models. Our findings establish RNNs, particularly second-order RNNs, as effective stable state estimators with robust quantization-based rule extraction methodologies. Additionally, our research highlights the limitations of both RNNs and LLMs in learning non-regular grammars and symbolic reasoning tasks, underscoring the necessity for improved architectures that incorporate stable memory components and better learning algorithms.
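
The abstract refers to quantization-based rule extraction from RNN hidden states. The sketch below illustrates one common form of that idea, not the dissertation's specific procedure: hidden states visited while reading sample strings are clustered, each cluster is treated as a discrete automaton state, and transitions are read off by majority vote. The toy RNN is untrained, and names such as `hidden_trajectory` and the choice of `n_states = 4` are invented for the example.

```python
# Minimal sketch of quantization-based automaton extraction from an RNN.
# Illustrative only: grammars, training, and hyperparameters are assumptions.
from collections import Counter

import numpy as np
import torch
from sklearn.cluster import KMeans

torch.manual_seed(0)

# Toy RNN over the binary alphabet {0, 1}; in practice this would be a
# network trained (possibly only partially) on a Tomita or Dyck grammar.
rnn = torch.nn.RNN(input_size=2, hidden_size=8, batch_first=True)

def one_hot(string):
    """Encode a binary string as a (1, len, 2) one-hot tensor."""
    x = torch.zeros(len(string), 2)
    for i, ch in enumerate(string):
        x[i, int(ch)] = 1.0
    return x.unsqueeze(0)

def hidden_trajectory(string):
    """Return the hidden states visited while reading `string`."""
    with torch.no_grad():
        outputs, _ = rnn(one_hot(string))
    return outputs.squeeze(0).numpy()  # shape (len, hidden_size)

# 1. Collect hidden states over a sample of strings.
sample = ["0", "1", "01", "10", "0011", "0101", "1100", "111", "000"]
states = np.vstack([hidden_trajectory(s) for s in sample])

# 2. Quantize the continuous hidden space into a few clusters; each cluster
#    plays the role of one discrete automaton state.
n_states = 4  # assumed number of extracted states
kmeans = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit(states)

# 3. Re-read the sample and record which cluster follows which on each input
#    symbol, taking the most frequent successor as the extracted transition.
transitions = {}
for s in sample:
    labels = kmeans.predict(hidden_trajectory(s))
    prev = None  # implicit start state before the first symbol
    for sym, cur in zip(s, labels):
        transitions.setdefault((prev, sym), Counter())[cur] += 1
        prev = cur

dfa = {key: counts.most_common(1)[0][0] for key, counts in transitions.items()}
print(dfa)  # e.g. {(None, '0'): 2, (2, '1'): 0, ...}
```

In this framing, the quality of the extracted machine depends on how cleanly the hidden states separate into clusters, which is why stability of the learned states matters for extraction from partially trained networks.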