Using Attention to Enhance Efficiency in Video-Based Computer Systems

Open Access
Author:
Xiao, Yang
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Dissertation
Date of Defense:
August 27, 2014
Committee Members:
  • Vijaykrishnan Narayanan, Dissertation Advisor
  • Vijaykrishnan Narayanan, Committee Chair
  • Yuan Xie, Committee Member
  • Mahmut Taylan Kandemir, Committee Member
  • Bradley Paul Wyble, Committee Member
  • Jack Sampson, Committee Member
  • Lee David Coraor, Committee Member
  • Kevin M Irick, Special Member
Keywords:
  • FPGA; neuromorphic algorithm; video-based; embedded system
Abstract:
Embedded vision systems that analyze complex scenes can bring many benefits to people’s daily lives, ranging from security surveillance, to medical aids for visually impaired people, to traffic management systems. The key components of these systems are the processing algorithms that analyze image sequences and videos to extract useful information from a noisy background. Traditional image/video processing techniques typically apply a sliding window to every frame, even though the potentially useful information occupies only a very small portion of the input image. To achieve real-time performance, the processing units are massively duplicated so that every possible window location of the high-resolution input image can be filtered in parallel. Such an approach not only consumes a great deal of power, but also limits algorithm implementation on embedded systems due to constrained computational resources and a limited power supply. As a result, many of these algorithms are deployed on high-performance servers/desktops with multiple graphics processing units (GPUs) rather than on power-efficient portable devices.

In contrast to high-performance machine vision platforms, the human brain can process vision tasks at much lower power consumption (approximately 20 W) with limited computing resources (the neurons in the brain) when exposed to a complex scene. In fact, the literature has shown that people do not perceive every object in a scene equally; instead, they prioritize. Objects with ‘outstanding’ features are chosen from their surroundings and passed on to further processing stages such as feature extraction or recognition. This biological pre-processing has been identified as the attention stage and has been well studied in visual neuroscience over the past few decades. Several computational models have been proposed to show how attention in the brain works in a hierarchical way. With the assistance of the attention stage, only the attractive regions of the input image require further processing. This results in a much lower demand for power and computing resources, which makes it possible to build an embedded machine vision system that understands complex scenes in real time.

In this dissertation, a state-of-the-art bio-inspired attention algorithm, ‘Saliency’ [17], is studied, and a field-programmable gate array (FPGA) prototype of it has been implemented to meet the real-time processing requirement. In addition, two extensions of the original attention system are proposed and evaluated. The first extension is a video-based attention system that integrates two additional computing channels, ‘flicker’ and ‘motion,’ into the final attention map computation. Experimental results show that the extended system can achieve 60% power savings on image test cases and 50% savings on video test cases when used for LCD power management. The second extension incorporates task influence into the vision mechanism by interpreting task-specific features as bias weights in the attention computation. The accuracy of locating a task item in a noisy background has been evaluated; on average, a 12.7% improvement in accuracy is achieved compared to the original system.
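To make the attention-map idea concrete, the following is a minimal software sketch of how static and temporal channels (intensity, flicker, motion) might be normalized, weighted by task-specific bias values, and combined into a single saliency map from which only the top regions are forwarded for further processing. The channel set, center-surround approximation, and weights are illustrative assumptions and do not reproduce the dissertation's FPGA design.

```python
# Illustrative sketch of an Itti/Koch-style saliency computation with flicker
# and motion channels plus task-specific bias weights. This is a software
# approximation, not the FPGA implementation described in the dissertation;
# the channel definitions and weights are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(img, sigma_c=1.0, sigma_s=4.0):
    """Approximate a center-surround response as a difference of Gaussians."""
    return np.abs(gaussian_filter(img, sigma_c) - gaussian_filter(img, sigma_s))

def normalize(m, eps=1e-8):
    """Scale a feature map to [0, 1] so channels can be combined fairly."""
    m = m - m.min()
    return m / (m.max() + eps)

def saliency_map(curr, prev, bias=None):
    """Combine static and temporal channels into one attention (saliency) map.

    curr, prev: consecutive grayscale frames as 2-D float arrays.
    bias: optional per-channel weights modeling top-down task influence.
    """
    channels = {
        "intensity": center_surround(curr),
        # Flicker: absolute frame difference highlights temporal change.
        "flicker": center_surround(np.abs(curr - prev)),
        # Motion (crude proxy): gradient magnitude of the frame difference.
        "motion": center_surround(np.hypot(*np.gradient(curr - prev))),
    }
    bias = bias or {name: 1.0 for name in channels}
    smap = sum(bias[name] * normalize(fmap) for name, fmap in channels.items())
    return normalize(smap)

def top_regions(smap, k=3, size=32):
    """Return the k most salient (row, col) locations, suppressing each winner."""
    winners, s = [], smap.copy()
    for _ in range(k):
        r, c = np.unravel_index(np.argmax(s), s.shape)
        winners.append((r, c))
        # Inhibition of return: zero out the chosen neighborhood.
        s[max(r - size, 0):r + size, max(c - size, 0):c + size] = 0
    return winners

# Example: pick the three most salient regions of a synthetic frame pair,
# biasing the temporal channels as a task might require.
rng = np.random.default_rng(0)
prev, curr = rng.random((240, 320)), rng.random((240, 320))
print(top_regions(saliency_map(curr, prev,
                               bias={"intensity": 1.0, "flicker": 1.5, "motion": 1.5})))
```

In this sketch only the regions returned by top_regions would be handed to the downstream feature-extraction and recognition stages, which is the source of the power and resource savings described above.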
Furthermore, a comprehensive vision system composed of Saliency [17] and HMAX [54] is proposed and implemented, and its off-chip bandwidth characteristics are analyzed for operation under given bandwidth caps. Based on this analysis, a memory-bandwidth-aware feedback system is developed to dynamically partition the available bandwidth among a set of accelerators at a small cost in recognition accuracy. Besides the power and performance evaluation, comprehensive user experience tests are conducted to ensure that no obvious image or video quality distortion is introduced by the proposed systems.
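The bandwidth-partitioning idea can be illustrated with a simple software sketch: a controller observes each accelerator's off-chip bandwidth demand and redistributes a fixed cap, first guaranteeing a minimum share and then dividing the remainder in proportion to demand. The accelerator names, demand figures, and control policy below are hypothetical; the feedback system described in the dissertation operates around the Saliency and HMAX accelerators in hardware.

```python
# Hypothetical sketch of bandwidth-aware partitioning among accelerators.
from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str
    demand_mbps: float      # measured off-chip bandwidth demand this interval
    min_share_mbps: float   # floor below which the stage stalls

def partition_bandwidth(accels, cap_mbps):
    """Split a fixed bandwidth cap among accelerators.

    Each accelerator first receives its minimum share; the remaining budget is
    divided in proportion to measured demand above that minimum.
    """
    floors = sum(a.min_share_mbps for a in accels)
    if floors > cap_mbps:
        # The cap cannot satisfy even the floors: scale uniformly, which in a
        # real system would surface as reduced accuracy or frame rate.
        scale = cap_mbps / floors
        return {a.name: a.min_share_mbps * scale for a in accels}
    spare = cap_mbps - floors
    excess = {a.name: max(a.demand_mbps - a.min_share_mbps, 0.0) for a in accels}
    total_excess = sum(excess.values()) or 1.0
    return {a.name: a.min_share_mbps + spare * excess[a.name] / total_excess
            for a in accels}

# Example: a 1000 MB/s cap shared by the attention and recognition engines.
accels = [Accelerator("saliency", demand_mbps=600, min_share_mbps=200),
          Accelerator("hmax", demand_mbps=900, min_share_mbps=300)]
print(partition_bandwidth(accels, cap_mbps=1000))
```

Rerunning this allocation periodically as demands change is one simple way a feedback controller could keep a set of accelerators within a fixed memory-bandwidth budget.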