Synthesizing external aggregated information in the presence of population heterogeneity: A penalized empirical likelihood approach
- 2 February 2021
- journal article
- research article
- Published by Oxford University Press (OUP) in Biometrics
- Vol. 78 (2), 679-690
- https://doi.org/10.1111/biom.13429
Abstract
With the increasing availability of data in the public domain, there has been a growing interest in exploiting information from external sources to improve the analysis of smaller scale studies. An emerging challenge in the era of big data is that the subject‐level data are high dimensional, but the external information is at an aggregate level and of a lower dimension. Moreover, heterogeneity and uncertainty in the auxiliary information are often not accounted for in information synthesis. In this paper, we propose a unified framework to summarize various forms of aggregated information via estimating equations and develop a penalized empirical likelihood approach to incorporate such information in logistic regression. When the homogeneity assumption is violated, we extend the method to account for population heterogeneity among different sources of information. When the uncertainty in the external information is not negligible, we propose a variance estimator adjusting for the uncertainty. The proposed estimators are asymptotically more efficient than the conventional penalized maximum likelihood estimator and enjoy the oracle property even with a diverging number of predictors. Simulation studies show that the proposed approaches yield higher accuracy in variable selection compared with competitors. We illustrate the proposed methodologies with a pediatric kidney transplant study.
Funding Information
- NIH Clinical Center (R01CA193888)
This publication has 38 references indexed in Scilit:
- Personalized estimates of breast cancer risk in clinical practice and public health. Statistics in Medicine, 2011
- Penalized high-dimensional empirical likelihood. Biometrika, 2010
- Effects of data dimension on empirical likelihood. Biometrika, 2009
- On the adaptive elastic-net with a diverging number of parameters. The Annals of Statistics, 2009
- Extending the scope of empirical likelihood. The Annals of Statistics, 2009
- Covariate heterogeneity in meta‐analysis: Criteria for deciding between meta‐regression and individual patient data. Statistics in Medicine, 2006
- Regularization and Variable Selection Via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2005
- Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators. Econometrica, 2004
- Miscellanea. Combining parametric and empirical likelihoods. Biometrika, 2000
- Projecting Individualized Probabilities of Developing Breast Cancer for White Females Who Are Being Examined Annually. JNCI Journal of the National Cancer Institute, 1989