SAVING COMPUTATIONS BY EARLY INFERENCE TERMINATION

Restricted (Penn State Only)
Author:
Parija, Tulika
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master Thesis
Date of Defense:
November 17, 2017
Committee Members:
  • Chitaranjan Das, Thesis Advisor
  • Vijaykrishnan Narayanan, Committee Member
  • John Morgan Sampson, Committee Member
Keywords:
  • Deep Neural Network
  • Convolutional Neural Network
Abstract:
Machine learning algorithms have seen a revival and rapid growth in popularity due to recent increases in available training data and in the processing capability of computers. They are used in a wide range of tasks, such as image classification, object detection, and speech recognition. Deep neural networks (DNNs) can be trained to achieve high inference accuracy; deeper networks generally yield better accuracy but also incur higher total computation costs. However, most networks are flat n-way classifiers that expend equal effort on every class in a dataset. This thesis proposes a framework that identifies subsets of classes which can be classified with high accuracy using only features extracted from earlier network layers, thereby reducing the average computational cost of inference. We apply our framework to the MNIST and CIFAR-10 datasets and demonstrate how our approach makes these networks more amenable to deployment on compute-limited endpoint devices. We show up to 52% computation savings (42% latency reduction) for CIFAR-10 with an accuracy loss of no more than 1.8%.
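
To illustrate the idea of terminating inference early, the sketch below attaches a small branch classifier to the early convolutional features of a toy CNN: if the branch's prediction falls in a designated subset of classes that early features separate well, and its confidence clears a threshold, the deeper layers are skipped. This is a minimal sketch under assumed parameters; the layer sizes, the exit_classes subset, and the 0.9 threshold are illustrative and not the exact design used in the thesis.

    # Minimal early-termination sketch (illustrative, not the thesis's architecture).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EarlyExitCNN(nn.Module):
        def __init__(self, num_classes=10, exit_classes=(0, 1, 8), threshold=0.9):
            super().__init__()
            # Early (cheap) feature extractor.
            self.early = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32x32 -> 16x16
            )
            # Branch classifier operating on early-layer features only.
            self.branch = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes)
            )
            # Deeper (expensive) layers and the final classifier.
            self.late = nn.Sequential(
                nn.Conv2d(16, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.final = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes)
            )
            self.exit_classes = set(exit_classes)  # assumed "easy" subset
            self.threshold = threshold             # assumed confidence cutoff

        @torch.no_grad()
        def infer(self, x):
            """Single-sample inference with optional early termination."""
            feats = self.early(x)
            branch_probs = F.softmax(self.branch(feats), dim=1)
            conf, pred = branch_probs.max(dim=1)
            # Terminate early only when the branch is confident about a class
            # known to be separable from early-layer features alone.
            if conf.item() >= self.threshold and pred.item() in self.exit_classes:
                return pred.item(), True   # early exit taken
            logits = self.final(self.late(feats))
            return logits.argmax(dim=1).item(), False  # full network used

    # Usage: one CIFAR-10-sized input (batch of 1).
    model = EarlyExitCNN().eval()
    label, exited_early = model.infer(torch.randn(1, 3, 32, 32))
    print(f"predicted class {label}, early exit: {exited_early}")

In practice, the exit subset and the confidence threshold would be chosen on validation data so that the fraction of early exits maximizes computation savings while keeping the accuracy loss within an acceptable bound.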