Aiding the User Input to Virtual Training Environments: Virtual Role Players with Speech and Gesture Recognition

Open Access
- Author:
- Stark, Robert Floyd
- Graduate Program:
- Information Sciences and Technology
- Degree:
- Master of Science
- Document Type:
- Master's Thesis
- Date of Defense:
- April 05, 2010
- Committee Members:
- John Yen, Thesis Advisor/Co-Advisor
Frank Edward Ritter, Thesis Advisor/Co-Advisor
- Keywords:
- human factors
computer vision
gesture recognition
Java
- Abstract:
- The purpose of this thesis is to address the fact that users’ input to training systems in virtual environments is not suited to their natural skills and abilities, such as speaking and gesturing with their bodies. This mismatch may have negative effects on their use of the virtual environment. One assumption guiding this thesis is that allowing users to interact with the system the same way they interact with real people would increase immersion. The second assumption is that multimodal input can increase users’ performance in the training scenario, especially for habitual and physical skills. Although people routinely use mouse and keyboard input, the third assumption is that natural speech and gestures would make military virtual training systems easier to learn and use. The fourth assumption is that more natural systems may increase the amount of training that trainees can transfer to the real world. To show the potential of multimodal input, two prototype systems were created. The design and evaluation of the first prototype are described; it was intended to show the potential of gesture recognition and multimodal fusion under both ideal theoretical circumstances and controlled, but more realistic, ones. The primary problem with the first prototype was the limitations of its hand recognition and tracking system. The design of the second prototype is then described. This prototype, a fully operational virtual checkpoint training system with multimodal input, was created based on the hand tracking and other insights from the first prototype. The results of a demonstration at a conference are then explained, including the effects of environmental factors on its usage. The thesis ends with a discussion of the insights from the last prototype and some future work, including implementation ideas, empirical studies, and general guidelines for multimodal system design.
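The abstract describes fusing speech and gesture input into combined commands. As a rough illustration of one common approach to this, time-windowed late fusion, the Java sketch below pairs a recognized speech command with a recently observed gesture. Java is listed among the thesis keywords, but every class, method, and the 1.5-second window here are hypothetical assumptions for illustration, not the thesis's actual design.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/**
 * Minimal sketch of late multimodal fusion: a speech command and a body
 * gesture recognized within a short time window are merged into one
 * interpreted action. All names and parameters are hypothetical.
 */
public class MultimodalFusion {

    /** A recognized input event from either modality. */
    record Event(String modality, String label, long timestampMs) {}

    /** A fused interpretation combining both modalities. */
    record FusedAction(String speech, String gesture) {}

    private static final long WINDOW_MS = 1500; // assumed fusion window

    private final Deque<Event> pendingGestures = new ArrayDeque<>();

    /** Buffer gestures so a later speech command can be paired with one. */
    public void onGesture(Event gesture) {
        pendingGestures.addLast(gesture);
    }

    /** Pair a speech command with the most recent gesture in the window. */
    public Optional<FusedAction> onSpeech(Event speech) {
        // Drop gestures too old to co-refer with this utterance.
        while (!pendingGestures.isEmpty()
                && speech.timestampMs() - pendingGestures.peekFirst().timestampMs() > WINDOW_MS) {
            pendingGestures.removeFirst();
        }
        if (pendingGestures.isEmpty()) {
            return Optional.empty(); // speech-only input; nothing to fuse
        }
        Event gesture = pendingGestures.removeLast();
        return Optional.of(new FusedAction(speech.label(), gesture.label()));
    }

    public static void main(String[] args) {
        MultimodalFusion fusion = new MultimodalFusion();
        fusion.onGesture(new Event("gesture", "point-left", 1000));
        fusion.onSpeech(new Event("speech", "move over there", 1800))
              .ifPresent(a -> System.out.println(
                  "Fused: \"" + a.speech() + "\" + " + a.gesture()));
    }
}
```

A production system of the kind the abstract describes would also weigh recognizer confidence scores and apply semantic constraints on which speech and gesture pairs may legitimately co-refer; this sketch only shows the temporal-pairing skeleton.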