Internet of Multimodality: Problems in Security, Healthcare and AR/VR

Open Access
- Author:
- Zhang, Shijia
- Graduate Program:
- Computer Science and Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- September 20, 2024
- Committee Members:
- Chitaranjan Das, Program Head/Chair
- Mahmut Kandemir, Major Field Member
- Rui Zhang, Major Field Member
- Saeed Abdullah, Outside Unit & Field Member
- Mahanth Gowda, Chair & Dissertation Advisor
- Keywords:
- IoT Security
- Wearable Sensing
- Mobile Computing
- AR/VR
- Healthcare
- Abstract:
- This dissertation examines advanced sensing technologies and their applications across three distinct domains: privacy risks, health monitoring, and augmented/virtual reality (AR/VR). The first section introduces iSpyU, a system that exploits zero-permission motion sensors such as accelerometers and gyroscopes to recognize speech from phone-based conference calls (e.g., Skype, Zoom). Despite technical challenges, including the low sampling rate of motion sensors relative to microphones and the lack of extensive training datasets, iSpyU demonstrates that speech content can be recovered with word-level accuracies of 53.3% to 59.9% and character-level accuracies of 70.0% to 74.8%. This poses significant privacy concerns, as such capabilities could allow malicious applications to eavesdrop on sensitive information without user consent. The second part explores the use of earphone sensors for fluid intake estimation, a crucial metric in managing hydration-related health issues such as dehydration and kidney stones. By using earphone microphones that capture body vibrations through skin contact, the approach extracts a strong signal that is immune to environmental noise. The system, built on robust machine learning models that incorporate data augmentation and semi-supervised learning, estimates per-swallow fluid volume with approximately 19.17% error, findings that could provide vital diagnostic information to healthcare providers. The third section of this dissertation introduces EARFace, a system that uses sensor-embedded smart earphones to reconstruct 3D facial motion. This capability enables facial expression recognition and emotional well-being monitoring, among other applications, and particularly benefits affective computing, AR/VR, and animation rendering. The final section discusses the use of AR/VR headsets equipped with multiple sensors and cameras to enhance user experience and interaction in real-world settings. By fusing multimodal data, including motion sensor readings and camera feeds, a cross-modality transformer model predicts the user's full-body skeleton and reconstructs a 3D body mesh with state-of-the-art accuracy, significantly enriching user interaction and immersion in virtual environments.
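
To make the final section's cross-modality fusion concrete, the sketch below shows one plausible way to combine motion sensor and camera-feature tokens in a transformer and regress full-body joint positions. It is not the dissertation's actual model: the class name CrossModalPoseRegressor, all dimensions, the joint count, and the pooling strategy are illustrative assumptions.

```python
# Minimal sketch of cross-modality transformer fusion (illustrative only).
# IMU and camera-feature tokens are projected into a shared embedding space,
# tagged with learned modality embeddings, fused by a TransformerEncoder,
# and pooled to regress 3D positions for a fixed set of body joints.
import torch
import torch.nn as nn


class CrossModalPoseRegressor(nn.Module):
    def __init__(self, imu_dim=6, cam_dim=256, d_model=128, n_joints=22):
        super().__init__()
        self.n_joints = n_joints
        self.imu_proj = nn.Linear(imu_dim, d_model)   # accel + gyro per frame
        self.cam_proj = nn.Linear(cam_dim, d_model)   # per-frame image feature
        self.modality_emb = nn.Embedding(2, d_model)  # 0 = IMU, 1 = camera
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_joints * 3)  # x, y, z per joint

    def forward(self, imu_seq, cam_seq):
        # imu_seq: (B, T_imu, imu_dim); cam_seq: (B, T_cam, cam_dim)
        imu_tok = self.imu_proj(imu_seq) + self.modality_emb.weight[0]
        cam_tok = self.cam_proj(cam_seq) + self.modality_emb.weight[1]
        tokens = torch.cat([imu_tok, cam_tok], dim=1)  # joint token sequence
        fused = self.encoder(tokens)                   # cross-modal attention
        pooled = fused.mean(dim=1)                     # simple mean pooling
        return self.head(pooled).view(-1, self.n_joints, 3)


# Example: 2 clips, 200 IMU frames and 30 camera-feature frames each.
model = CrossModalPoseRegressor()
joints = model(torch.randn(2, 200, 6), torch.randn(2, 30, 256))  # (2, 22, 3)
```

In the actual system, such a model would be trained on synchronized headset IMU streams and camera features against ground-truth motion capture; the mean pooling and output parameterization here are deliberately simplified.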