Regularization Methods in Functional Data Analysis

Open Access
- Author:
- Mirshani, Ardalan
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- August 05, 2019
- Committee Members:
- Matthew Logan Reimherr, Dissertation Advisor/Co-Advisor
- Matthew Logan Reimherr, Committee Chair/Co-Chair
- Runze Li, Committee Member
- Francesca Chiaromonte, Committee Member
- Kateryna Dmytrivna Makova, Outside Member
- Ephraim Mont Hanks, Program Head/Chair
- Keywords:
- Functional Data
- Differential Privacy
- Hilbert Space
- Variable Selection
- Reproducing Kernel Hilbert Space
- Oracle Property
- Elastic Net
- Smooth Estimate
- Density Estimation
- Abstract:
- New studies, surveys, and technologies are producing ever richer and more informative data sets, and functional data analysis (FDA) has become one of the most active areas of statistics. FDA, like other branches of statistics and machine learning, often deals with function-valued parameters. Functional data and functional parameters may contain unexpectedly large amounts of personally identifying information, so developing a privacy framework for these areas is critical in the era of big data. The first problem we consider is statistical privacy (statistical disclosure control): our goal is to minimize the potential for identification of individual records or sensitive characteristics while ensuring that the released information still supports accurate and valid statistical inference. In the second part, we study the problem of extracting information from large and sophisticated functional data sets: we aim to select significant predictors and produce smooth estimates in a high-dimensional function-on-scalar linear model with sub-Gaussian errors.

Chapter 1 covers the main concepts and background in FDA used in the subsequent chapters. Chapter 2 focuses on differential privacy (DP), which has emerged as a mathematically rigorous definition of disclosure risk and, more broadly, as a framework for releasing privacy-enhanced versions of a statistical summary. In this chapter, we develop an extensive theory for achieving DP with functional data, or function-valued parameters more generally. Our theoretical framework is based on densities over function spaces, which is of independent interest to FDA researchers, as densities have proven challenging to define and utilize in FDA models. For statistical disclosure control, we demonstrate how even small amounts of oversmoothing or regularization can produce releases with substantially improved utility.
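The smooth-then-perturb idea can be illustrated with a toy pointwise Gaussian mechanism applied to a presmoothed mean curve. This is a deliberate simplification of the function-space construction the dissertation develops (which calibrates a Gaussian *process* to the ambient space); the clipping bound, the moving-average smoother, and the (epsilon, delta) calibration below are illustrative assumptions, not the dissertation's mechanism.

```python
import math
import random

def smooth(curve, bandwidth=2):
    """Crude moving-average smoother standing in for a proper
    penalized/RKHS smoother (illustrative only)."""
    n = len(curve)
    out = []
    for i in range(n):
        lo, hi = max(0, i - bandwidth), min(n, i + bandwidth + 1)
        out.append(sum(curve[lo:hi]) / (hi - lo))
    return out

def dp_release(curves, epsilon, delta, bound=1.0, bandwidth=2):
    """Release a smoothed mean curve with pointwise Gaussian noise.

    Assumes each curve is clipped to [-bound, bound], so replacing one
    of n individuals moves the (smoothed) mean by at most 2*bound/n in
    sup norm; the noise scale is the classical (epsilon, delta)
    Gaussian-mechanism calibration applied pointwise.
    """
    n = len(curves)
    grid = len(curves[0])
    mean = [sum(c[t] for c in curves) / n for t in range(grid)]
    mean = smooth(mean, bandwidth)
    sensitivity = 2.0 * bound / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return [m + random.gauss(0.0, sigma) for m in mean]
```

Smoothing before perturbing is the point of the quoted utility result: a mildly oversmoothed summary has less fine-scale structure for the noise to drown out, so the privatized release tracks the truth more closely.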
We carry out extensive simulations to examine the utility of privacy-enhanced releases and consider applications to diffusion tensor imaging and high-resolution 3D facial imaging.

Chapter 3 presents a new methodology, called AFSSEN, to simultaneously select significant predictors and produce smooth estimates in a high-dimensional function-on-scalar linear model with sub-Gaussian errors. Outcomes are assumed to lie in a general real separable Hilbert space, H, while parameters lie in a subspace known as a Cameron-Martin space, K, which is closely related to reproducing kernel Hilbert spaces, so that parameter estimates inherit particular properties, such as smoothness or periodicity, without enforcing those properties on the data. We propose a regularization method in the style of an adaptive elastic net penalty that mixes two types of functional norms, providing fine-tuned control of both smoothing and variable selection in the estimated model. The asymptotic theory is given in the form of a functional oracle property, and the chapter concludes with a simulation study demonstrating the advantage of AFSSEN over existing methods in terms of prediction error and variable selection.
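The flavor of a penalty mixing two functional norms can be sketched on discretized coefficient functions. Here the H norm is approximated by a trapezoid-rule L2 norm and the K norm by a crude H1-type (function plus first-difference derivative) surrogate; the adaptive weights, grid discretization, and the assignment of norms to the selection and smoothness terms are illustrative assumptions, not AFSSEN's exact formulation.

```python
import math

def l2_norm(beta, dt):
    """Trapezoid-rule approximation to the L2 (Hilbert-space) norm of a
    coefficient function sampled on an even grid with spacing dt."""
    sq = [b * b for b in beta]
    integral = dt * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return math.sqrt(integral)

def sobolev_norm(beta, dt):
    """Crude stand-in for a Cameron-Martin norm: an H1-type norm built
    from the function and its finite-difference derivative.  (The true
    K norm depends on the kernel defining the space.)"""
    deriv = [(beta[i + 1] - beta[i]) / dt for i in range(len(beta) - 1)]
    return math.sqrt(l2_norm(beta, dt) ** 2 + l2_norm(deriv, dt) ** 2)

def mixed_norm_penalty(betas, lam1, lam2, weights, dt):
    """Adaptive elastic-net style penalty mixing the two norms:
    an unsquared, weighted L2 term driving whole coefficient functions
    to zero (selection) plus a squared smoothness term (shrinkage).
    The role assigned to each norm here is an illustrative choice."""
    selection = lam1 * sum(w * l2_norm(b, dt) for w, b in zip(weights, betas))
    smoothness = lam2 * sum(sobolev_norm(b, dt) ** 2 for b in betas)
    return selection + smoothness
```

As with the scalar elastic net, the unsquared term is what produces exact zeros (dropping a predictor's entire coefficient function), while the squared term controls roughness of the functions that survive.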