Information‐incorporated Gaussian graphical model for gene expression data
- 2 February 2021
- journal article
- research article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 78 (2), 512-523
- https://doi.org/10.1111/biom.13428
Abstract
In the analysis of gene expression data, network approaches take a system perspective and have played an irreplaceably important role. Gaussian graphical models (GGMs) have been popular in the network analysis of gene expression data. They investigate the conditional dependence between genes and “transform” the problem of estimating network structures into a sparse estimation of precision matrices. When there is a moderate to large number of genes, the number of parameters to be estimated may overwhelm the limited sample size, leading to unreliable estimation and selection. In this article, we propose incorporating information from previous studies (for example, those deposited at PubMed) to assist estimating the network structure in the present data. It is recognized that such information can be partial, biased, or even wrong. A penalization-based estimation approach is developed, shown to have consistency properties, and realized using an effective computational algorithm. Simulation demonstrates its competitive performance under various information accuracy scenarios. The analysis of TCGA lung cancer prognostic genes leads to network structures different from the alternatives.
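The abstract's statement that GGMs turn network estimation into estimation of a precision matrix can be illustrated with a small sketch. It is not the article's penalized estimator: it assumes a known population covariance for three hypothetical genes arranged in a chain, and the helper names `invert` and `edges_from_precision` are invented for illustration. The point is only that a zero off-diagonal entry of the precision (inverse covariance) matrix corresponds to a missing edge, even when the two genes are marginally correlated.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions.

    Assumes the matrix is nonsingular; a singular input raises StopIteration.
    """
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot equals 1.
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


def edges_from_precision(precision):
    """Return index pairs (i, j), i < j, whose precision entry is nonzero."""
    n = len(precision)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if precision[i][j] != 0
    ]


# Hypothetical population covariance for three genes in a chain 0 - 1 - 2.
covariance = [
    [Fraction(3, 4), Fraction(2, 4), Fraction(1, 4)],
    [Fraction(2, 4), Fraction(4, 4), Fraction(2, 4)],
    [Fraction(1, 4), Fraction(2, 4), Fraction(3, 4)],
]

precision = invert(covariance)
edges = edges_from_precision(precision)
```

Here genes 0 and 2 have nonzero covariance, yet their precision entry is zero, so the recovered network has no edge between them. With real expression data the covariance must be estimated from a limited sample, which is where sparse, penalized estimation of the kind described in the abstract comes in.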
Funding Information
- National Natural Science Foundation of China (11701561, 11971404, Basic Scientific Project 71988101)
- National Science Foundation (CA216017, CA204120, CA196530)
This publication has 27 references indexed in Scilit:
- Joint conditional Gaussian graphical models with multiple sources of genomic data. Frontiers in Genetics, 2013
- Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining. Journal of Biomedical Semantics, 2012
- High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 2012
- Joint estimation of multiple graphical models. Biometrika, 2011
- Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data. BMC Systems Biology, 2011
- Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 2009
- Neuroendocrine small cell carcinoma of the breast: report of a case. Medical Molecular Morphology, 2009
- Biomedical Literature Mining. Published by Springer Science and Business Media LLC, 2008
- A Multigene Assay Is Prognostic of Survival in Patients with Early-Stage Lung Adenocarcinoma. Clinical Cancer Research, 2008
- Model selection and estimation in the Gaussian graphical model. Biometrika, 2007