High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence
Top Cited Papers
Open Access
- 1 January 2011
- journal article
- Published by Institute of Mathematical Statistics in Electronic Journal of Statistics
- Vol. 5, 935–980
- https://doi.org/10.1214/11-ejs631
Abstract
Given i.i.d. observations of a random vector X∈ℝp, we study the problem of estimating both its covariance matrix Σ*, and its inverse covariance or concentration matrix Θ*=(Σ*)−1. When X is multivariate Gaussian, the non-zero structure of Θ* is specified by the graph of an associated Gaussian Markov random field; and a popular estimator for such sparse Θ* is the ℓ1-regularized Gaussian MLE. This estimator is sensible even for non-Gaussian X, since it corresponds to minimizing an ℓ1-penalized log-determinant Bregman divergence. We analyze its performance under high-dimensional scaling, in which the number of nodes in the graph p, the number of edges s, and the maximum node degree d are allowed to grow as a function of the sample size n. In addition to the parameters (p,s,d), our analysis identifies other key quantities that control rates: (a) the ℓ∞-operator norm of the true covariance matrix Σ*; (b) the ℓ∞-operator norm of the sub-matrix Γ*SS, where S indexes the graph edges and Γ*=(Θ*)−1⊗(Θ*)−1; (c) a mutual incoherence or irrepresentability measure on the matrix Γ*; and (d) the rate of decay 1/f(n,δ) on the probabilities {|Σ̂nij−Σ*ij|>δ}, where Σ̂n is the sample covariance based on n samples. Our first result establishes consistency of our estimate Θ̂ in the elementwise maximum-norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with suitably bounded maximum node degree. In our second result, we show that with probability converging to one, the estimate Θ̂ correctly specifies the zero pattern of the concentration matrix Θ*. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondences between the theoretical predictions and behavior in simulations.
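The estimator studied in the abstract, the ℓ1-regularized Gaussian MLE (widely known as the graphical lasso), is available in standard software. The following sketch is not the authors' own code: it uses scikit-learn's `GraphicalLasso` on a made-up chain-graph precision matrix, with the dimensions, the edge weight 0.4, and the penalty `alpha=0.1` chosen purely for illustration.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Hypothetical sparse concentration matrix Theta* for a chain graph on p nodes:
# ones on the diagonal, 0.4 on the first off-diagonals (positive definite).
p, n = 10, 500
theta = np.eye(p)
for i in range(p - 1):
    theta[i, i + 1] = theta[i + 1, i] = 0.4
sigma = np.linalg.inv(theta)  # true covariance Sigma* = (Theta*)^{-1}

# Draw n i.i.d. Gaussian samples and fit the l1-penalized Gaussian MLE,
# i.e. minimize the l1-penalized log-determinant divergence.
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
model = GraphicalLasso(alpha=0.1).fit(X)

theta_hat = model.precision_  # estimate of Theta*
# The off-diagonal zero pattern of theta_hat estimates the graph's edge set.
support = np.abs(theta_hat) > 1e-8
```

The paper's second result concerns exactly the quantity `support` here: under the stated incoherence conditions, the zero pattern of the estimate matches that of Θ* with probability tending to one.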
This publication has 24 references indexed in Scilit:
- Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics, 2009
- Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso). IEEE Transactions on Information Theory, 2009
- Covariance regularization by thresholding. The Annals of Statistics, 2008
- Regularized estimation of large covariance matrices. The Annals of Statistics, 2008
- Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2008
- Estimation of Gaussian graphs by model selection. Electronic Journal of Statistics, 2008
- Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2007
- High-dimensional graphs and variable selection with the Lasso. The Annals of Statistics, 2006
- Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 2006
- On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 2001