A Reconfigurable Accelerator For Neuromorphic Object Recognition

Open Access
Sabarad, Jagdish Shivaji
Graduate Program:
Computer Science and Engineering
Master of Science
Document Type:
Master Thesis
Date of Defense:
April 15, 2016
Committee Members:
  • Vijaykrishnan Narayanan, Thesis Advisor
Keywords:
  • Hardware Accelerator
  • HMAX
  • FPGA
  • Machine Vision
  • Computer Vision
  • Hardware Architecture
  • High Performance Computing
  • Bio-Vision
A significant challenge in creating machines with artificial vision is designing systems that can process visual information as efficiently as the human brain. Recent advances in neuroscience have enabled researchers to develop computational models of auditory perception, visual perception and learning in the human brain. Among these models, two widely accepted algorithms that model attention and recognition in the mammalian visual pathway are the saliency-based model for visual attention and the HMAX model for object recognition. A major drawback of these biologically plausible models is their massive computational demand. Real-time implementation of these biologically inspired vision algorithms, while challenging, can have a diverse and profound impact on applications such as autonomous vehicle navigation, surveillance, robotics, and face, text and gesture recognition. To mimic true biological systems, implementations of these algorithms must meet not only real-time performance goals but also stringent power budgets and small form factors. Previous attempts to parallelize the HMAX model on multi-core processors have been unable to provide real-time performance because of limited parallelism and high computational complexity. Researchers have leveraged graphics processors for their ease of programmability and high parallelism. However, their excessive power consumption hinders deployment in embedded or low-power systems. The focus of this work is the design and architecture of a reconfigurable hardware accelerator for the time-consuming S2-C2 stage of the HMAX model. The accelerator leverages spatial parallelism and dedicated wide data buses with on-chip memories to provide an energy-efficient solution suitable for adoption in embedded systems. This work presents a systolic array-based architecture that includes a run-time reconfigurable convolution engine capable of performing multiple variable-sized convolutions in parallel.
An automation flow for this accelerator is also described; it generates optimal hardware configurations for a given algorithmic specification and handles run-time configuration and execution seamlessly. Experimental results on Virtex-6 FPGA platforms show 5X to 11X speedups and 14X to 33X higher performance-per-Watt over a CNS-based implementation on a Tesla GPU.
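For readers unfamiliar with the S2-C2 stage named in the abstract, the following is a minimal software sketch of the computation as it is commonly described for HMAX: each stored prototype patch is compared against every window of a C1 feature map with a Gaussian radial basis function (S2), and the maximum response over all positions is kept (C2). It is an illustration under stated assumptions, not the thesis's hardware design: the function name `s2_c2`, the single-orientation, single-scale C1 map, and the `sigma` parameter are all hypothetical simplifications. The prototypes may differ in size, which echoes the variable-sized convolutions mentioned above.

```python
import math


def s2_c2(c1, prototypes, sigma=1.0):
    """Return one C2 value per prototype (hypothetical, simplified model).

    c1         -- 2D list of floats, a single C1 feature map (assumption:
                  real HMAX uses several orientations and scales)
    prototypes -- list of square 2D lists, possibly of different sizes
    sigma      -- width of the Gaussian tuning function (assumed value)
    """
    height, width = len(c1), len(c1[0])
    c2 = []
    for proto in prototypes:
        n = len(proto)
        best = 0.0  # RBF responses lie in (0, 1], so 0.0 is a safe floor
        # S2: slide the prototype over every valid window of the C1 map
        for y in range(height - n + 1):
            for x in range(width - n + 1):
                dist2 = sum(
                    (c1[y + i][x + j] - proto[i][j]) ** 2
                    for i in range(n)
                    for j in range(n)
                )
                response = math.exp(-dist2 / (2.0 * sigma ** 2))
                # C2: keep the global maximum over all positions
                best = max(best, response)
        c2.append(best)
    return c2


# Usage: a 3x3 C1 map and two prototypes of different sizes
c1_map = [[0, 0, 0], [0, 1, 2], [0, 3, 4]]
protos = [[[1, 2], [3, 4]], [[9]]]
result = s2_c2(c1_map, protos)
```

The nested window loops are independent of one another, which is the kind of spatial parallelism a hardware accelerator can exploit; in this sketch they simply run sequentially.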