Computational principles and challenges in single-cell data integration
- 3 May 2021
- journal article
- review article
- Published by Springer Science and Business Media LLC in Nature Biotechnology
- Vol. 39 (10), 1202-1215
- https://doi.org/10.1038/s41587-021-00895-7
Abstract
The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term ‘data integration’ has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.
Funding Information
- EMBL International PhD Programme
- European Commission (810296)
- Core funding from EMBL and DKFZ
- Cancer Research UK (C9545/A29580)
- Core funding from CRUK (grant number listed above) and EMBL
This publication has 146 references indexed in Scilit.