Performance and Parsimony in Training Deep Neural Networks

Open Access
- Author:
- Aguasvivas Manzano, Sarah
- Graduate Program:
- Aerospace Engineering
- Degree:
- Master of Science
- Document Type:
- Master's Thesis
- Date of Defense:
- July 17, 2017
- Committee Members:
- Lyle N. Long, Thesis Advisor/Co-Advisor
- Keywords:
- neural networks
- model parsimony
- MLP
- Simulated Annealing
- Abstract:
- It is known that machine learning owes many of its achievements to deep learning. However, do we know for certain when we need deeper models? Did deep learning improve artificial intelligence through techniques beyond what the multilayer perceptron (MLP) requires? This work is a careful effort to explore the definition of deep learning by asking: "When does deep learning help, and when does it hurt?" In the search for an Occam's Razor-inspired analysis, this work performs multiple experiments on the MLP to illustrate when a shallow network is sufficient and when it is not, and which metrics can be extracted from a raw data set to estimate the needed model complexity before training. A secondary purpose of this thesis is to attempt to overcome the limitations of the backpropagation (BP) algorithm by using a derivative-free technique, Simulated Annealing (SA), in order to test claims that this heuristic method can outperform BP in performance and plasticity. Regarding the model-parsimony question, the results show that, among the metrics tested, the percentage of variance explained by the first principal component (PCA-1%) influenced the classification performance of the MLP as follows: the higher PCA-1% was, the shallower the MLP needed to be to avoid over-complicating the learning model, for non-image data sets. Simulated Annealing, meanwhile, achieved performance similar to an average backpropagation run at an MSE tolerance roughly four times that needed for backpropagation. However, the number of iterations required for such an exploratory search was found to be unpredictable and very large at lower tolerances, and the performance of SA was not as consistent as that of BP.
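The PCA-1% metric mentioned above is the share of total variance captured by a data set's first principal component. A minimal sketch of how such a number can be computed, assuming a NumPy workflow; the function name and example data are illustrative, not from the thesis code:

```python
import numpy as np

def pca1_percent(X: np.ndarray) -> float:
    """Percentage of variance explained by the first principal component.

    X is an (n_samples, n_features) matrix of raw inputs.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    # Singular values of the centered data give the PC variances:
    # variance of component i is s_i**2 / (n_samples - 1); the divisor
    # cancels in the ratio below.
    s = np.linalg.svd(Xc, compute_uv=False)
    variances = s**2
    return 100.0 * variances[0] / variances.sum()

# Example: a strongly one-dimensional data set yields a high PCA-1%,
# which, per the results above, suggests a shallower MLP may suffice.
rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
X = np.hstack([t, 2 * t + 0.1 * rng.normal(size=(500, 1))])
print(f"PCA-1% = {pca1_percent(X):.1f}")
```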
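For the derivative-free comparison, a Simulated Annealing weight search over an MLP might look like the following sketch. The network size, perturbation scale, cooling schedule, and MSE tolerance here are illustrative assumptions, not the configuration used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 2-4-1 MLP on XOR; all weights flattened into one parameter vector.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def unpack(w):
    W1 = w[:8].reshape(2, 4); b1 = w[8:12]
    W2 = w[12:16].reshape(4, 1); b2 = w[16:17]
    return W1, b1, W2, b2

def mse(w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(X @ W1 + b1)                        # hidden layer
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    return float(np.mean((out - y) ** 2))

w = rng.normal(scale=0.5, size=17)
cost = mse(w)
T = 1.0                                             # initial temperature
for step in range(20000):
    cand = w + rng.normal(scale=0.2, size=w.size)   # random perturbation
    c = mse(cand)
    # Accept downhill moves always; uphill moves with Boltzmann probability,
    # which is what lets SA escape local minima that trap gradient descent.
    if c < cost or rng.random() < np.exp((cost - c) / T):
        w, cost = cand, c
    T *= 0.9995                                     # geometric cooling
    if cost < 0.01:                                 # illustrative MSE tolerance
        break

print(f"final MSE = {cost:.4f} after {step + 1} iterations")
```

As the abstract notes, the iteration count for such an exploratory search is unpredictable: tightening the tolerance in the loop above can inflate the run time sharply, while BP's gradient steps behave far more consistently.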