A probabilistic explanation of a natural phenomenon
Open Access
- Author:
- Artemiou, Andreas A
- Graduate Program:
- Statistics
- Degree:
- Master of Science
- Document Type:
- Master Thesis
- Date of Defense:
- None
- Committee Members:
- Bing Li, Thesis Advisor/Co-Advisor
- Keywords:
- Random Covariance Matrix
Principal Components
Regression
Correlation
Eigenpairs
Orientationally Uniform Distribution
Dimension Reduction
- Abstract:
- Regression is the procedure that attempts to relate a $p$-dimensional vector of predictors $\mathbf{X}$ to a response variable $Y$. Frequently, we deal with regression problems that have a large number of predictors. In those cases, we try to reduce the dimension of the predictor vector, because we need to identify the predictors that most affect the response. One of the most widely used methods is Principal Component Analysis (PCA). With this analysis, one seeks the first few $d$ ($d \ll p$) principal components, which are generally believed to best describe the relationship between the predictors $\mathbf{X}$ and the response $Y$. This practice, however, has not been adequately justified. In practice, the first few principal components are often more highly correlated with the response variable, and better describe the relationship between the predictors and the response, than the remaining principal components. However, there seems to be no logical reason for this tendency, and there are cases, albeit less frequent, where the first few principal components have a weaker correlation with the response. There is a long-standing debate on this issue among statisticians, and, to date, it has not been adequately resolved. In this thesis I ask, and attempt to answer, the following questions: Is there a tendency for the first few principal components of the predictor to be more strongly related to the response? If so, what is the reason behind this tendency? And how strong is this tendency?
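To make the question concrete, here is a minimal Python sketch. It is not the method developed in the thesis. It computes the sample principal components of a predictor matrix and the absolute sample correlation of each component score with the response. The function name `pc_response_correlations`, the simulated predictor variances, and the choice to draw the coefficient direction uniformly on the unit sphere are all illustrative assumptions.

```python
import numpy as np

def pc_response_correlations(X, y):
    """Hypothetical helper: return the eigenvalues of the sample covariance of X
    (largest first) and the absolute sample correlation between each principal
    component score and y."""
    Xc = X - X.mean(axis=0)                  # center the predictors
    yc = y - y.mean()                        # center the response
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder so PC 1 has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # n x p matrix of (centered) PC scores
    # Both scores and yc are centered, so this is the Pearson correlation.
    corrs = np.abs(scores.T @ yc) / (np.linalg.norm(scores, axis=0) * np.linalg.norm(yc))
    return eigvals, corrs

# Illustrative simulation (assumed setup, not data from the thesis).
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p)) * np.array([3.0, 2.0, 1.0, 0.5, 0.25])  # decreasing variances
beta = rng.normal(size=p)
beta /= np.linalg.norm(beta)   # a normalized Gaussian vector is uniform on the unit sphere
y = X @ beta + rng.normal(size=n)

eigvals, corrs = pc_response_correlations(X, y)
print(np.round(eigvals, 2))
print(np.round(corrs, 2))
```

Repeating the simulation over many random draws of `beta` and recording which component has the largest correlation gives a rough empirical feel for how often the leading components dominate under these particular assumptions.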