TOWARDS UNDERSTANDING PEOPLE IN VIDEOS
Open Access
- Author:
- Raja, Anand
- Graduate Program:
- Electrical Engineering
- Degree:
- Master of Science
- Document Type:
- Master's Thesis
- Date of Defense:
- February 25, 2011
- Committee Members:
- Robert T. Collins, Thesis Advisor/Co-Advisor
Kenneth Jenkins, Thesis Advisor/Co-Advisor
Yanxi Liu, Thesis Advisor/Co-Advisor
- Keywords:
- visual storyboard
action recognition
tracking
people detection
computer vision
character matching
- Abstract:
- The last few years have seen an explosion of online video content. A vast majority of these videos contain people, and understanding where people are and what they are doing from video is therefore important. This thesis presents an effort in this direction. A framework is developed for summarizing the story of a video containing people solely using visual information. Our notion of the 'story' is defined as the identification of people and the recognition of their actions through the video. Videos are divided into segments, each containing a separate shot. People are tracked through each shot based on appearance similarity and temporal continuity. As people in a shot are being tracked, appearance models are built for each of them as person-specific classifiers, trained on-the-fly. Recurring people are then matched across shots by data association on confidence scores of the person-specific classifiers. This process is completely automated and requires only a single pass of the video. Next, the visual recognition of natural human actions performed by people in the sequence is undertaken. Actions such as sitting on a chair, shaking hands or answering the phone are recognized by support vector machine classifiers that aggregate sparsely computed low level visual cues obtained from the tracked people. Features are organized in a bag-of-features representation and the classifiers are trained on annotated action clips from the Hollywood Human Actions Dataset. The efficacy of both the person tracking and action recognition is evaluated on challenging videos containing multiple shots taken at different camera angles with appreciable variation in lighting conditions, length of action as well as scale, appearance and pose of the persons involved. A total of 48 out of 53 characters are identified over 81 shots with 41 correct and 8 incorrect character matches with 83% tracking coverage. Actions are recognized with an average precision of 29.98%.
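The bag-of-features representation mentioned in the abstract can be illustrated with a minimal sketch: local visual descriptors extracted from a tracked person are quantized against a learned visual vocabulary, and each clip is summarized as a normalized histogram of codeword counts. All names, the toy vocabulary, and the 2-D descriptors below are illustrative assumptions, not values from the thesis; the SVM classifiers described in the abstract would be trained on histograms of this form.

```python
import math

def nearest_codeword(desc, vocabulary):
    """Index of the vocabulary entry closest to `desc` (Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, word in enumerate(vocabulary):
        d = math.dist(desc, word)
        if d < best_d:
            best, best_d = i, d
    return best

def bag_of_features(descriptors, vocabulary):
    """L1-normalized histogram of codeword assignments for one clip."""
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_codeword(desc, vocabulary)] += 1.0
    total = sum(hist) or 1.0  # guard against an empty clip
    return [h / total for h in hist]

# Toy example (hypothetical): a vocabulary of 3 codewords in a 2-D
# descriptor space, and one clip containing 4 sparse descriptors.
vocab = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
clip = [(0.1, 0.1), (0.9, 0.1), (0.05, 0.95), (0.0, 0.9)]
print(bag_of_features(clip, vocab))  # -> [0.25, 0.25, 0.5]
```

The fixed-length histogram discards the temporal ordering of the descriptors, which is what makes it suitable as input to a standard SVM regardless of clip length.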