A probabilistic explanation of a natural phenomenon

Open Access
Artemiou, Andreas A
Graduate Program:
Master of Science
Document Type:
Master Thesis
Date of Defense:
Committee Members:
  • Bing Li, Thesis Advisor
Keywords:
  • Random Covariance Matrix
  • Principal Components
  • Regression
  • Correlation
  • Eigenpairs
  • Orientationally Uniform Distribution
  • Dimension Reduction
Regression is the procedure that attempts to relate a $p$-dimensional vector of predictors $\mathbf{X}$ to a response variable $Y$. Frequently, we deal with regression problems that have a large number of predictors. In those cases, we try to reduce the dimension of the predictor vector, because we need to find the predictors that affect the response the most. One of the most widely used methods is Principal Components Analysis. With this analysis, we try to find the first $d$ ($d \ll p$) principal components, which are generally believed to describe the relationship between the predictors $\mathbf{X}$ and the response $Y$ better than the remaining ones. This procedure, however, has not been adequately justified. In practice, the first few principal components are often more highly correlated with the response variable, and describe the relationship between the predictors and the response better, than the other principal components. However, there seems to be no logical reason for this tendency, and there are cases, albeit less frequent, where the first few principal components have a weaker correlation with the response. There is a long-standing debate on this issue among statisticians, and to date it has not been adequately resolved. In this thesis I ask, and attempt to answer, the following questions: Is there a tendency for the first few principal components of the predictors to be more strongly related to the response? If so, what is the reason behind this tendency? And how strong is this tendency?
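To make the question concrete, here is a small simulation sketch. It assumes a linear model $Y = \beta^{\top}\mathbf{X} + \varepsilon$ with $\varepsilon$ independent of $\mathbf{X}$, $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\operatorname{Cov}(\mathbf{X}) = \Sigma = \sum_j \lambda_j v_j v_j^{\top}$. This is an illustrative assumption, not necessarily the exact model used in the thesis. Under that model, $\operatorname{Cov}(v_j^{\top}\mathbf{X}, Y) = v_j^{\top}\Sigma\beta = \lambda_j v_j^{\top}\beta$ and $\operatorname{Var}(v_j^{\top}\mathbf{X}) = \lambda_j$. The squared correlation between the $j$-th principal component and $Y$ is therefore $\lambda_j (v_j^{\top}\beta)^2 / (\beta^{\top}\Sigma\beta + \sigma^2)$, which depends on both the eigenvalue and how well $v_j$ aligns with $\beta$. The sketch fixes hypothetical eigenvalues and a hypothetical $\beta$, draws the eigenvectors uniformly at random (a Haar-distributed orthogonal matrix), and records how often the first component comes out ahead. All function names and parameter values are made up for illustration.

```python
import numpy as np


def random_orthogonal(p, rng):
    """Draw a p x p orthogonal matrix from the Haar (uniform) distribution.

    QR-decompose a Gaussian matrix, then flip column signs so that the
    diagonal of R is positive; without the flip Q is not exactly uniform.
    """
    Q, R = np.linalg.qr(rng.standard_normal((p, p)))
    return Q * np.sign(np.diag(R))


def pc_squared_correlations(eigvals, eigvecs, beta, noise_var):
    """Population squared correlation between each PC v_j'X and Y = beta'X + eps.

    corr^2_j = lambda_j * (v_j' beta)^2 / (beta' Sigma beta + noise_var)
    """
    sigma = eigvecs @ np.diag(eigvals) @ eigvecs.T
    var_y = beta @ sigma @ beta + noise_var
    return eigvals * (eigvecs.T @ beta) ** 2 / var_y


def simulate(eigvals, beta, noise_var=1.0, n_rep=4000, seed=0):
    """Fraction of random orientations in which PC1 has (a) the largest squared
    correlation with Y, and (b) a larger one than the last PC."""
    rng = np.random.default_rng(seed)
    p = len(eigvals)
    first_is_max = 0
    first_beats_last = 0
    for _ in range(n_rep):
        V = random_orthogonal(p, rng)  # columns are eigenvectors v_1, ..., v_p
        r2 = pc_squared_correlations(eigvals, V, beta, noise_var)
        first_is_max += int(np.argmax(r2) == 0)
        first_beats_last += int(r2[0] > r2[-1])
    return first_is_max / n_rep, first_beats_last / n_rep


# Hypothetical setup: p = 5, eigenvalues in decreasing order, fixed unit-length beta.
eigvals = np.array([5.0, 3.0, 2.0, 1.0, 0.5])
beta = np.ones(5) / np.sqrt(5)
p_max, p_first_last = simulate(eigvals, beta)
print(f"P(PC1 has the largest corr^2) ~ {p_max:.3f}  (1/p = {1 / 5:.3f})")
print(f"P(corr^2 of PC1 > corr^2 of PC5) ~ {p_first_last:.3f}")
```

As a sanity check for this particular setup: with Haar-distributed eigenvectors and a fixed unit $\beta$, the ratio $(v_5^{\top}\beta)^2/(v_1^{\top}\beta)^2$ follows an $F(1,1)$ distribution. The second frequency should therefore be close to $\tfrac{2}{\pi}\arctan\sqrt{10} \approx 0.805$, and the first should exceed the no-tendency baseline $1/p = 0.2$. Spreading the eigenvalues further apart strengthens the effect, and making them equal removes it.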