Data Registration and Feature Selection in Functional Regression: some methodological and computational developments

Open Access
- Author:
- Boschi, Tobia
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- February 17, 2022
- Committee Members:
- Marzia A Cremona, Special Member
Murali Haran, Major Field Member
Paul Medvedev, Outside Unit & Field Member
Francesca Chiaromonte, Chair & Dissertation Advisor
Ephraim Mont Hanks, Professor in Charge/Director of Graduate Studies
Matthew Reimherr, Dissertation Advisor
- Keywords:
- Functional Linear Models
Feature Selection
Functional Registration
- Abstract:
- Functional linear models leverage high-dimensional and complex data and present many open and fascinating theoretical and computational challenges. Advanced optimization techniques that involve smoothing and projection onto lower-dimensional subspaces are essential to obtain valid estimates and reduce computational costs. This thesis focuses on two main aspects of functional regression, functional registration and feature selection, and contains three main research projects. In the first project, we propose a new low-dimensional registration procedure that exploits the relationship between response and predictor in a function-on-function regression. In this context, Functional Covariance Components (FCC) provide a flexible and powerful tool to represent the data in a low-dimensional space, capturing the most meaningful modes of dependency between the two sets of curves. In the second project, we first develop a new, highly efficient algorithm to solve Group Elastic Net, which exploits the sparsity structure of the Augmented Lagrangian to reduce the computational burden. Next, taking advantage of the properties of Functional Principal Components, we extend our algorithm to the function-on-scalar feature selection framework, where a functional response is modeled against a huge number of potential scalar predictors. Finally, in the third project, we employ advanced functional data techniques along with other statistical tools to study the evolution of the COVID-19 epidemic in Italy, using massive amounts of data that we collect, pre-process, and curate from different public sources on the epidemic, as well as on socio-demographic, infrastructural, and environmental factors that may affect its unfolding.
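
For readers unfamiliar with the Group Elastic Net penalty mentioned in the abstract, the following is a minimal sketch of its proximal operator, the group-wise shrinkage step that produces sparsity across whole groups of coefficients. It is a generic textbook illustration, not the Augmented Lagrangian algorithm developed in the thesis. The function name, the unit group weights, and the parameter names `lam1`, `lam2`, and `step` are assumptions made for this example.

```python
import math

def group_elastic_net_prox(v, groups, lam1, lam2, step=1.0):
    """Proximal operator of step * (lam1 * sum_g ||b_g||_2 + (lam2 / 2) * ||b||_2^2).

    Hypothetical helper for illustration; all groups are given unit weight.
    `groups` is a list of index lists that partition the coefficients of `v`.
    """
    out = [0.0] * len(v)
    for idx in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in idx))
        # Group soft-thresholding: the whole group is set to zero
        # when its Euclidean norm does not exceed step * lam1.
        shrink = max(0.0, 1.0 - step * lam1 / norm) if norm > 0 else 0.0
        # The ridge term contributes a uniform rescaling by 1 / (1 + step * lam2).
        scale = shrink / (1.0 + step * lam2)
        for i in idx:
            out[i] = scale * v[i]
    return out

# Two groups of two coefficients each: the first has a large norm, the second a small one.
v = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
result = group_elastic_net_prox(v, groups, lam1=1.0, lam2=1.0)
```

In this example the first group (norm 5) is shrunk but kept, while the second group (norm below the threshold) is removed entirely, which is the group-level selection behavior that feature selection methods of this kind rely on.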