Computational principles and challenges in single-cell data integration

Abstract
The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term ‘data integration’ has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.

Funding Information
  • EMBL International PhD Programme
  • European Commission (810296)
  • Core funding from EMBL and DKFZ
  • Cancer Research UK (C9545/A29580)
  • Core funding from CRUK (grant number listed above) and EMBL