Statistical Methods Development for Analyzing Biomarkers, Treatments and Survival Endpoints

Restricted (Penn State Only)
- Author:
- Wu, Xue
- Graduate Program:
- Biostatistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- August 28, 2024
- Committee Members:
- Dave Mauger, Major Field Member
Ming Wang, Special Member
Shouhao Zhou, Major Field Member
Vernon Chinchilli, Chair & Dissertation Advisor
Andrew Foy, Outside Unit & Field Member
Chixiang Chen, Special Member
Arthur Berg, Professor in Charge/Director of Graduate Studies
- Keywords:
- Time-to-event Endpoints
Causal Inference
High Dimensional Data
Propensity Score Matching
Time-dependent Confounders
Observational Studies
- Abstract:
- In clinical research, two major goals are identifying biomarkers associated with survival outcomes and determining how treatments affect those outcomes; both support more personalized and effective healthcare. Identifying biomarkers, particularly predictive biomarkers, helps tailor treatments to individual patients based on their biological characteristics. Quantifying the causal effect of treatments helps determine which treatments work best for specific patient subgroups. Quantifying the relationships of biomarkers and treatments with the time-to-event (TTE) outcome of clinical interest is therefore crucial for developing personalized treatment plans and improving the accuracy of prognostic models. However, real-world datasets introduce statistical complications. A common issue in biomarker identification is high dimensionality and the associated type I error inflation due to multiple testing. Under the survival framework, controlling the family-wise error rate (FWER) still requires further methodological work. Further, when estimating causal treatment effects in observational studies, imbalances in treatment assignment due to (time-varying) confounders introduce bias. To address these gaps, we propose (1) a three-stage approach that identifies prognostic and predictive biomarkers in high-dimensional cancer genetic data with survival endpoints while effectively controlling the FWER; and (2) a dynamic propensity score matching algorithm that achieves balance by using the entire historical trajectory of pre-treatment confounders in treatment-control study designs, with extensions to multi-arm study designs. First, to identify prognostic and predictive biomarkers while controlling the FWER in high-dimensional data, we build upon the concept of multi-splitting for p-value adjustment. Our focus is on the survival framework, specifically the Cox proportional hazards model.
We adopt an adaptive group LASSO for variable screening and selection and then derive adjusted p-values through multi-splitting and bootstrapping to correct the invalid p-values that arise from the constraints of the penalized approach. We conduct extensive simulations to empirically evaluate FWER control and model selection accuracy, demonstrating that our proposed three-stage approach outperforms existing alternative methods. Furthermore, we provide a user-friendly R software implementation and comprehensive theoretical properties to support our algorithm. We apply the proposed method to two breast cancer datasets and a chronic lymphocytic leukemia dataset, with promising findings that are supported by the existing literature. Second, motivated by the longitudinal observational Chronic Renal Insufficiency Cohort (CRIC) study, we aim to investigate the effect of antihypertensive medication initiation on reducing the risk of cardiovascular disease (CVD). However, these observational data have a unique structure involving time-varying treatment variables, time-varying confounding variables, and TTE outcome variables, to which the conventional propensity score matching algorithm cannot be applied to estimate the causal effect. We introduce the dynamic propensity trajectory (DPT) framework and DPT-based matching techniques, which address time-varying treatment receipt and balance confounders across the entire study period, encompassing both time-invariant and time-varying covariates leading up to treatment initiation. After matching, we quantify the causal treatment effects on survival outcomes after treatment initiation. We implement the proposed methods in the CRIC study to assess the effects of Angiotensin-Converting Enzyme Inhibitor/Angiotensin II Receptor Blocker (ACE/ARB) medications versus control and Calcium Channel Blocker (CCB) medications versus control in reducing the risk of CVD among patients with chronic kidney disease (CKD).
In addition, we conduct extensive simulation studies to evaluate the properties of our approach. Lastly, we propose extending DPT-based matching for causal inference to accommodate multiple treatments (more than two arms), allowing a comparison of causal treatment effects on CVD outcomes among various antihypertensive medications. We generalize the dynamic propensity score to a multivariate setting and propose a multivariate DPT-based vector matching (DPTVM) algorithm. This approach aims to reduce the imbalance of time-varying and time-fixed covariates in studies with multiple levels of time-varying treatment receipt, providing unbiased pairwise comparisons among multiple medications. We conduct simulation studies to evaluate the performance of our method in comparison with existing methods. Future work will involve generalizing the DPTVM algorithm by incorporating kernel weights to account for the similarity of dynamic propensity scores, making the implementation simpler and more efficient.
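For readers unfamiliar with the multi-splitting idea mentioned in the abstract, the following is a minimal sketch of the generic multi-split p-value aggregation at a fixed quantile level. It is not the dissertation's three-stage procedure, whose details are not given here; the function names, the parameter `gamma`, and the input layout are illustrative assumptions. Within each random split, raw p-values of the selected variables are Bonferroni-adjusted by the size of the selected set and unselected variables receive 1; across splits, each variable's adjusted p-values are divided by `gamma` and summarized by their empirical `gamma`-quantile, capped at 1.

```python
import math


def adjust_within_split(raw_pvals, selected, n_vars):
    """Bonferroni-adjust p-values within one split (hypothetical helper).

    raw_pvals: dict mapping a selected variable index to its raw p-value
    selected:  list of variable indices kept by the screening step
    n_vars:    total number of candidate variables
    """
    m = len(selected)
    adjusted = [1.0] * n_vars  # unselected variables get p = 1
    for j in selected:
        adjusted[j] = min(1.0, raw_pvals[j] * m)
    return adjusted


def aggregate_multisplit(pvals_per_split, gamma=0.5):
    """Aggregate per-split adjusted p-values at a fixed quantile level.

    pvals_per_split: list of B lists, each holding one adjusted p-value
                     per variable (as returned by adjust_within_split)
    gamma:           quantile level in (0, 1]; 0.5 gives twice the median
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    n_splits = len(pvals_per_split)
    n_vars = len(pvals_per_split[0])
    # Index of the empirical gamma-quantile among n_splits sorted values.
    k = math.ceil(gamma * n_splits) - 1
    aggregated = []
    for j in range(n_vars):
        scaled = sorted(row[j] / gamma for row in pvals_per_split)
        aggregated.append(min(1.0, scaled[k]))
    return aggregated


# Usage: four splits, three candidate variables (made-up numbers).
splits = [
    [0.001, 1.0, 1.0],
    [0.002, 0.4, 1.0],
    [0.004, 1.0, 1.0],
    [1.0, 0.3, 1.0],
]
final_pvals = aggregate_multisplit(splits, gamma=0.5)
rejected = [j for j, p in enumerate(final_pvals) if p <= 0.05]
```

Variables whose aggregated p-value falls below the chosen level are declared significant. Using a quantile rather than a single split makes the result less sensitive to any one random partition of the data.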