Computational Advantages in High Dimensions
Open Access
- Author:
- White, Stephen
- Graduate Program:
- Mathematics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 06, 2023
- Committee Members:
- Pierre-Emmanuel Jabin, Program Head/Chair
- Anna Mazzucato, Major Field Member
- Pierre-Emmanuel Jabin, Major Field Member
- Alexei Novikov, Chair & Dissertation Advisor
- Bharath Kumar Sriperumbudur, Outside Unit, Field & Minor Member
- Keywords:
- Probability
- Sparse Recovery
- Restricted Isometry Property
- Phase Retrieval
- Dictionary Learning
- Metagenomics
- Inverse Problems
- High-dimensional Statistics
- Abstract:
- Although computational tasks in high dimensions are often said to suffer from a "curse of dimensionality," many random objects in high dimensions also exhibit significant concentration of measure, a phenomenon characterized as a countervailing "blessing of dimensionality." With high-dimensional data more prevalent than ever, understanding both how to avoid the pitfalls and how to leverage the advantages of high-dimensional randomness is now an essential skill for any practitioner of data science, statistics, or applied probability. In this dissertation, we present three examples in which the dimensional structure of a problem can be exploited to improve upon existing results or to solve otherwise intractable problems. In Chapter 2, we study the sparse phase retrieval problem of recovering a sparse signal from only the magnitude of its Fourier transform. By exploiting the additional freedom available in multiple dimensions, we show that in dimension greater than one the sparse phase retrieval problem admits algorithms that are several orders of magnitude faster than comparable one-dimensional approaches. In Chapter 3, we consider sparse dictionary learning, a problem in machine learning in which we seek to recover a sparsely used dictionary matrix $\mathbf{D}$ from many samples of the form $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$. We present a polynomial-time algorithm that recovers overcomplete matrices satisfying the restricted isometry property even when the sparsity is nearly linear in the dimension of the samples $\mathbf{y}_i$. We prove recovery guarantees that are significantly stronger than previously known alternatives for the overcomplete, linear-sparsity regime. Finally, in Chapter 4 we present a solution to the problem of organism detection in metagenomics.
By leveraging the high-dimensional structure of genomic data and its interaction with existing dimension-reduction strategies, we develop a surprisingly simple algorithm that significantly outperforms existing approaches. Numerical simulations confirm all theoretical results.
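To make the dictionary learning setup from the abstract concrete, the following is a minimal sketch (not the dissertation's algorithm) of the data model $\mathbf{y}_i = \mathbf{D}\mathbf{x}_i$: an overcomplete random dictionary and sparse coefficient vectors generate the observed samples. All dimensions and the random Gaussian dictionary are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters (assumed, not from the dissertation):
# n = signal dimension, m = number of dictionary atoms (m > n, i.e. overcomplete),
# s = sparsity of each coefficient vector, N = number of samples.
rng = np.random.default_rng(0)
n, m, s, N = 32, 64, 4, 100

# Random Gaussian dictionary with normalized columns; such matrices
# satisfy the restricted isometry property with high probability.
D = rng.standard_normal((n, m)) / np.sqrt(n)

# Each x_i is s-sparse: a random support of size s with Gaussian entries.
X = np.zeros((m, N))
for i in range(N):
    support = rng.choice(m, size=s, replace=False)
    X[support, i] = rng.standard_normal(s)

# Observed samples y_i = D x_i, stacked as the columns of Y.
# The dictionary learning problem: recover D (up to permutation and
# sign of its columns) given only Y.
Y = D @ X
```

In this regime the sketch uses s much smaller than n; the linear-sparsity setting studied in Chapter 3 corresponds to letting s grow proportionally with n.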