Statistical Methods in Single Cell and Spatial Transcriptomics Data
Open Access
- Author:
- Singh, Roopali
- Graduate Program:
- Statistics
- Degree:
- Doctor of Philosophy
- Document Type:
- Dissertation
- Date of Defense:
- October 08, 2021
- Committee Members:
- Dajiang Liu, Outside Unit & Field Member
Xiang Zhu, Major Field Member
Qunhua Li, Major Field Member & Dissertation Advisor
Ephraim Hanks, Chair of Committee
Ephraim Mont Hanks, Professor in Charge/Director of Graduate Studies
- Keywords:
- Single cell
Spatial Transcriptomics
Missing Data
Reproducibility
Bayesian NMF
Variational Inference
EM Algorithm
- Abstract:
- Single-cell RNA sequencing (scRNA-seq) allows one to study the transcriptomics of different cell types in heterogeneous samples (e.g., tissues) at the single-cell level. Most scRNA-seq protocols experience high levels of dropout due to the small amount of starting material, so that a majority of the reported expression levels are zero. Although missing data contain information about reproducibility, they are often excluded from reproducibility assessments, which can make those assessments misleading. In the first part of my dissertation, we develop a copula-based regression model to assess how the reproducibility of high-throughput experiments is affected by the choice of operational factors (e.g., platform or sequencing depth) when a large number of measurements are missing. Simulations show that our method is more accurate in detecting differences in reproducibility than existing measures of reproducibility. We illustrate the usefulness of our method by comparing the reproducibility of different library preparation platforms and by studying the effect of sequencing depth on reproducibility, thereby determining the cost-effective sequencing depth required to achieve sufficient reproducibility.
The spatial locations of the single cells are lost in scRNA-seq data. A recently emerging technology, Spatial Transcriptomics (ST), measures gene expression in a tissue slice in situ, preserving the cells' spatial information in the tissue. However, ST does not have single-cell resolution: each spot captures a group of potentially heterogeneous cells, which must be deconvolved to learn the cell composition at that spot. In the second part of my dissertation, we develop a reference-free deconvolution method, based on Bayesian non-negative matrix factorization, to infer the cell-type composition of each spot. Unlike the existing deconvolution methods, which all take reference-based approaches, our approach does not rely on scRNA-seq references. Simulations show that our method is more accurate in detecting cell-type compositions than existing deconvolution techniques when spot size and heterogeneity vary and when the single-cell reference is imperfect. We illustrate the usefulness of our method using Mouse Brain Cerebellum data and Human Intestine Developmental data.
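To give a rough sense of what reference-free deconvolution by matrix factorization involves, the sketch below factors a non-negative spots-by-genes matrix into per-spot cell-type weights and per-type expression profiles. It is not the dissertation's method: it swaps in plain (non-Bayesian) NMF with standard multiplicative updates for the Frobenius loss, and the names `nmf` and `spot_proportions`, the number of cell types, and the row-normalization used to read weights as proportions are all assumptions made for illustration.

```python
import numpy as np


def nmf(X, n_types, n_iter=500, seed=0, eps=1e-10):
    """Plain NMF via multiplicative updates (Frobenius loss).

    X is a non-negative (spots x genes) matrix. Returns W (spots x types)
    and H (types x genes) such that X is approximately W @ H.
    Illustrative stand-in only; the dissertation uses a Bayesian NMF.
    """
    rng = np.random.default_rng(seed)
    n_spots, n_genes = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_spots, n_types)) + eps
    H = rng.random((n_types, n_genes)) + eps
    for _ in range(n_iter):
        # Update expression profiles, then spot weights; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def spot_proportions(W, eps=1e-10):
    """Row-normalize W so each spot's weights sum to one (a heuristic reading)."""
    return W / (W.sum(axis=1, keepdims=True) + eps)


if __name__ == "__main__":
    # Hypothetical simulated data: 50 spots, 20 genes, 3 latent cell types.
    rng = np.random.default_rng(1)
    X = rng.random((50, 3)) @ rng.random((3, 20))
    W, H = nmf(X, n_types=3)
    print(spot_proportions(W)[:5])
```

No scRNA-seq reference enters the factorization, which is the sense in which such an approach is reference-free; the factors are identified only up to scaling and ordering, so mapping them to named cell types would require additional information such as marker genes.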