Context-N-Grab: Sentence-Based Fine-Grained Affordance Grounding
Open Access
Author:
Tehri, Samarth
Graduate Program:
Computer Science and Engineering
Degree:
Master of Science
Document Type:
Master's Thesis
Date of Defense:
June 20, 2024
Committee Members:
Vijaykrishnan Narayanan, Thesis Advisor/Co-Advisor
John Morgan Sampson, Committee Member
Chitaranjan Das, Program Head/Chair
Keywords:
computer vision, affordance grounding, large language model, CV, LLM, visual impairment, robotics, computer science engineering, computer science, computer engineering, application, system, context, Context-N-Grab
Abstract:
This thesis presents the development of a sentence-based, fine-grained affordance grounding system that combines large language model (LLM) object-action retrieval with object detection to aid visually impaired users in navigating their environment. The proposed system integrates LLMs to interpret natural-language phrases, YOLO for real-time object detection, and the LOCATE framework for affordance grounding, linking detected objects with possible interactions. This combined approach aims to provide actionable guidance, enhancing the safety and autonomy of visually impaired individuals, and also shows potential for lightweight robotic navigation. Experimental results demonstrate that the system accurately retrieves relevant objects and actions, detects objects in varied environments, and produces precise affordance grounding. GitHub: https://github.com/samarthtehri/Context-N-Grab
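The abstract describes a three-stage pipeline: LLM-based object-action retrieval from a sentence, YOLO detection of the target object, and LOCATE-style affordance grounding on the matched detection. The Python sketch below illustrates one way such a pipeline could be orchestrated; it is not the thesis implementation. The parse_object_action prompt, the "gpt-4o-mini" and "yolov8n.pt" model names, and the locate_ground stub are illustrative assumptions (LOCATE has no packaged API, so it is left as a placeholder).

import json
from openai import OpenAI
from ultralytics import YOLO

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def parse_object_action(sentence: str) -> dict:
    """Ask an LLM to extract the target object and intended action
    from a natural-language request (hypothetical prompt format)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{
            "role": "user",
            "content": ('Return only JSON {"object": ..., "action": ...} '
                        'for this request: ' + sentence),
        }],
    )
    return json.loads(resp.choices[0].message.content)

def locate_ground(image_path: str, box, action: str):
    """Placeholder for the LOCATE affordance-grounding step, which would
    highlight the action-relevant region inside the detected box."""
    raise NotImplementedError("swap in the LOCATE model here")

def ground(sentence: str, image_path: str):
    query = parse_object_action(sentence)           # 1. LLM retrieval
    result = YOLO("yolov8n.pt")(image_path)[0]      # 2. YOLO detection
    for box in result.boxes:
        label = result.names[int(box.cls)]
        if label == query["object"]:                # matched target object
            return locate_ground(image_path, box, query["action"])  # 3. grounding
    return None  # target object not found in the scene

if __name__ == "__main__":
    ground("I want to grab the cup by its handle", "scene.jpg")

In this sketch the LLM reduces a free-form sentence to a structured (object, action) query, so the detection and grounding stages stay model-agnostic; any detector or grounding network with the same interface could be swapped in.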