SAVING COMPUTATIONS BY EARLY INFERENCE TERMINATION

Open Access
- Author:
- Parija, Tulika
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Master of Science
- Document Type:
- Master's Thesis
- Date of Defense:
- November 17, 2017
- Committee Members:
- Chitaranjan Das, Thesis Advisor/Co-Advisor
- Vijaykrishnan Narayanan, Committee Member
- John Morgan Sampson, Committee Member
- Keywords:
- Deep Neural Network
- Convolutional Neural Network
- Abstract:
- Machine learning algorithms have seen a revival and rapid growth in popularity due to recent increases in available training data and in the processing capability of computers. They are used in a variety of tasks, such as image classification, object detection, and speech recognition. Deep neural networks (DNNs) can be trained to achieve high inference accuracy, and deeper networks generally yield better overall accuracy, but they also incur greater total computation costs. Moreover, most networks are flat n-way classifiers, which expend equal effort on every class in a dataset. This thesis proposes a framework that identifies subsets of classes which can be classified with high accuracy using only features extracted from earlier network layers, thereby reducing the average computational cost of inference. We apply our framework to the MNIST and CIFAR-10 datasets and demonstrate how our approach makes these networks more amenable to deployment on compute-limited endpoint devices. We show up to 52% computation savings (a 42% latency reduction) on CIFAR-10 with an accuracy loss of no more than 1.8%.
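The core idea in the abstract can be illustrated with a minimal sketch of early inference termination: an auxiliary classifier attached to early-layer features predicts a class, and if the prediction is confident and falls within a designated "easy" subset, the deeper layers are skipped. All weights, shapes, the class subset, and the threshold below are hypothetical placeholders, not the thesis's actual networks or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical weights for a two-stage classifier: an early-exit head
# on first-layer features, and a deeper path with a final classifier.
W1 = rng.normal(size=(8, 16))        # first layer: 16-dim input -> 8 features
W_early = rng.normal(size=(8, 10))   # early-exit classifier over 10 classes
W2 = rng.normal(size=(8, 32))        # deeper layer
W_full = rng.normal(size=(32, 10))   # final classifier

EASY_CLASSES = {0, 1}  # assumed subset separable from early features
THRESHOLD = 0.9        # confidence required to terminate early

def infer(x):
    """Return (predicted_class, exited_early)."""
    h1 = np.maximum(W1 @ x, 0)          # early features (ReLU)
    p_early = softmax(W_early.T @ h1)
    c = int(np.argmax(p_early))
    # Terminate early only when the early head is confident AND the
    # prediction lies in the easy subset; otherwise run the full network.
    if c in EASY_CLASSES and p_early[c] >= THRESHOLD:
        return c, True
    h2 = np.maximum(W2.T @ h1, 0)       # deeper features
    p_full = softmax(W_full.T @ h2)
    return int(np.argmax(p_full)), False
```

Inputs predicted into the easy subset with high confidence skip the second layer entirely, which is the source of the average-case computation savings the thesis measures; hard inputs still pay the full cost.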