Large-Scale Object Recognition for Embedded Wearable Platforms

Open Access
Advani, Siddharth Kishin
Graduate Program:
Computer Science and Engineering
Degree:
Doctor of Philosophy
Document Type:
Date of Defense:
June 29, 2016
Committee Members:
  • Vijaykrishnan Narayanan, Dissertation Advisor
  • Jack Sampson, Committee Chair
  • Lee Giles, Committee Member
  • Kevin Irick, Committee Member
  • Mary Beth Rosson, Outside Member
Keywords:
  • Embedded vision systems
  • Real-time systems
Visual object recognition has long been an active area of research in both computer vision and neuroscience, two fields that together are narrowing the gap between human and machine. Humans rely on a visual system shaped by evolution, with tightly integrated top-down and bottom-up processing, to recognize a large set of objects with a very high degree of precision. When poor lighting, occlusion or varying pose create confusion, or when new, unidentifiable objects come into the picture, we use context to make an educated guess. For example, consider a foreign tourist looking for a restaurant in a busy city. Even without understanding the local language, the tourist can recognize known objects such as 'plate', 'cup', 'chair' and 'bread' and quickly conclude that the place is a restaurant.
As machines have evolved in their learning capabilities, technology has brought them closer to us, allowing finer levels of interaction: first hand-operated, then hand-held and now worn. However, the systems currently deployed for vision-specific tasks are still too power-hungry, too storage-intensive or too slow to be useful in real-world scenarios. Thus, as computing takes a new leap into the world of 'Wearables', the need for smart cameras capable of supporting, engaging and enhancing human capabilities has never been felt more acutely.
This thesis tackles one of the critical tasks in this space: large-scale visual object recognition for embedded wearable platforms. First, a scalable architecture for visual object detection that is fast, accurate, lightweight and power-efficient is proposed. Then CASPER, a Context-Aware Scalable Pipeline for Efficient Recognition that uses context in conjunction with a hierarchical visual recognition pipeline targeted at retail, is discussed. We achieve high recognition rates across 62 object classes that are diverse in shape, size and color. The utility of context in this visual pipeline is showcased: computing effort is reduced significantly without impacting accuracy. This allows for higher system throughput with limited area resources - an important factor in enhancing the capabilities of the next generation of wearable devices.