SELECTIVE LEARNING USING RANDOMLY REDUCED DATASETS

Restricted
- Author:
- Verma, Ankur
- Graduate Program:
- Industrial Engineering
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- April 26, 2024
- Committee Members:
- Steven Landry, Program Head/Chair
- Soundar Kumara, Chair & Dissertation Advisor
- Saurabh Basu, Major Field Member
- Sy-Miin Chow, Outside Unit & Field Member
- Necdet Aybat, Major Field Member
- Keywords:
- sensors
- machine learning
- neural networks
- edge computing
- Abstract:
- As we increasingly bring intelligence to the physical world across space, aerial, terrestrial, and underwater applications, we are generating copious amounts of multi-modal sensor data, which will surpass 73 trillion GB by 2025. Embedded intelligence will be a key feature of next-generation computing platforms. These platforms should be able to deploy embedded AI at scale, from humanoid robots to Mars rovers. Bringing intelligence to the physical world, however, means that we will often need to operate in energy-, latency-, bandwidth-, and compute-constrained environments. This demands novel scientific computing techniques that are data and compute efficient and usable for near real-time inference on SWaP-C (low size, weight, power, and cost) devices. In this dissertation, we solve the problem of efficiently analyzing terabytes of sensor data generated by applications ranging from satellites to unmanned underwater vehicles. The amount of data collected for sensing tasks in scientific computing is based on the Shannon-Nyquist sampling theorem, proposed in the 1940s. The skyrocketing cost of data infrastructure, and the time needed to maintain and compute on all this data, are increasingly common burdens. To address this, we investigate and develop several undersampled learning approaches, leveraging the fact that real-world sensor data contain a great deal of redundancy. We start by exploring edge-cloud hybrid computing and signal processing-based transforms to develop a foundational understanding of sparsity and representation learning. We then combine sparse approximation techniques with machine learning and introduce a selective learning approach, in which the amount of data collected is problem dependent. We develop novel shift-invariant and spectrally stable neural networks to solve real-time sensing problems formulated as classification or regression.
The developed methods are evaluated on different types of data, including periodic and non-periodic high-frequency time series data pertinent to spectral analysis, non-periodic time series data, Human Activity Recognition (HAR) data, and unlabeled industrial energy data. We demonstrate that (i) less data can be collected while preserving information, and (ii) unlike with information-theoretic approaches, test accuracy improves with data augmentation (the size of the training data) rather than by collecting more than a certain fraction of the raw data. Even while sampling at Nyquist rates, not every data point has to be resolved at the Nyquist rate, and the network learns how much data to collect. Our techniques and results reduce, by orders of magnitude, the amount of data collected and the computation, power, time, bandwidth, and latency required for several digital transformation applications, ranging from the low Earth orbit economy to unmanned underwater vehicles.
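The abstract's central premise, that a randomly reduced sensor dataset can preserve the information a sensing task needs, can be illustrated with a minimal sketch. This is not the dissertation's method: the signal, sampling rate, 10% retention ratio, and least-squares frequency scan below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, f0, n = 1000, 50, 1000          # sample rate (Hz), true tone (Hz), samples
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(n)

# Randomly retain only 10% of the samples: a "randomly reduced dataset".
keep = rng.choice(n, size=n // 10, replace=False)
tk, xk = t[keep], x[keep]

# Least-squares fit of a sin/cos pair at each candidate frequency;
# the dominant tone still stands out despite discarding 90% of the data.
freqs = np.arange(1, 200)
power = []
for f in freqs:
    A = np.column_stack([np.sin(2 * np.pi * f * tk),
                         np.cos(2 * np.pi * f * tk)])
    coef, *_ = np.linalg.lstsq(A, xk, rcond=None)
    power.append(float(np.sum(coef ** 2)))
best = int(freqs[np.argmax(power)])
print(best)  # the 50 Hz tone is recovered from 10% of the samples
```

The random (rather than uniform) subsampling is what makes this work: it spreads the information loss incoherently across frequencies instead of aliasing the tone onto a false one, which is the same intuition behind the undersampled learning approaches the abstract describes.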
No files available due to restrictions.