Kullback-Leibler distance as a measure of the information filtered from multivariate data
- 19 September 2007
- journal article
- research article
- Published by American Physical Society (APS) in Physical Review E
- Vol. 76 (3), 031123
- https://doi.org/10.1103/physreve.76.031123
Abstract
We show that the Kullback-Leibler distance is a good measure of the statistical uncertainty of correlation matrices estimated from a finite set of data. For correlation matrices of multivariate Gaussian variables, we analytically determine the expected values of the Kullback-Leibler distance of a sample correlation matrix from a reference model, and we show that these expected values are known even when the specific model is unknown. We propose using the Kullback-Leibler distance to estimate the information that correlation filtering procedures extract from a correlation matrix. We also show how to use this distance to measure the stability of filtering procedures with respect to statistical uncertainty. We illustrate the effectiveness of our method by comparing four filtering procedures, two based on spectral analysis and two based on hierarchical clustering. We compare these techniques on both simulations of factor models and empirical data. We investigate how well these filtering procedures recover the model correlation matrix from simulated data, and we discuss this ability in terms of both the heterogeneity of the model parameters and the length of the data series. We also show that the two spectral techniques are typically more informative about the sample correlation matrix than the techniques based on hierarchical clustering, whereas the latter are more stable with respect to statistical uncertainty.
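For readers who want to experiment, the following is a minimal sketch, not the authors' code. It assumes zero-mean multivariate Gaussian variables and uses the standard closed form of the Kullback-Leibler distance between N(0, Σ1) and N(0, Σ2), namely K = ½ [ln(det Σ2 / det Σ1) + tr(Σ2⁻¹ Σ1) − n]. The argument order is our choice. The filter `clip_eigenvalues` is a hypothetical helper illustrating one common spectral technique: eigenvalues below the Marchenko-Pastur upper edge (1 + √(n/T))² are replaced by their mean. It is not necessarily either of the spectral procedures compared in the paper. The one-factor model in the usage section is likewise an illustrative assumption.

```python
import numpy as np


def kl_gaussian(sigma1: np.ndarray, sigma2: np.ndarray) -> float:
    """KL distance K(P1 || P2) between zero-mean Gaussians N(0, sigma1) and N(0, sigma2).

    K = 0.5 * [ln(det sigma2 / det sigma1) + tr(sigma2^{-1} sigma1) - n]
    """
    n = sigma1.shape[0]
    # slogdet is numerically safer than log(det(...)) for larger matrices
    _, logdet1 = np.linalg.slogdet(sigma1)
    _, logdet2 = np.linalg.slogdet(sigma2)
    # tr(sigma2^{-1} sigma1) without forming the inverse explicitly
    trace_term = np.trace(np.linalg.solve(sigma2, sigma1))
    return 0.5 * (logdet2 - logdet1 + trace_term - n)


def sample_correlation(x: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix of x, shape (T, n): T observations of n variables."""
    return np.corrcoef(x, rowvar=False)


def clip_eigenvalues(c: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical spectral filter: eigenvalues below the Marchenko-Pastur upper
    edge (1 + sqrt(n/t))^2 are treated as noise and replaced by their mean."""
    n = c.shape[0]
    lam_max = (1.0 + np.sqrt(n / t)) ** 2
    vals, vecs = np.linalg.eigh(c)
    vals = vals.copy()
    noise = vals < lam_max
    if noise.any():
        vals[noise] = vals[noise].mean()  # keeps the trace unchanged
    # Rebuild V diag(vals) V^T (broadcasting scales each eigenvector column)
    filtered = (vecs * vals) @ vecs.T
    # Rescale so the result is again a correlation matrix (unit diagonal)
    d = np.sqrt(np.diag(filtered))
    return filtered / np.outer(d, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, t = 20, 200
    # Illustrative one-factor model: every pair of variables has correlation rho
    rho = 0.3
    true_c = np.full((n, n), rho)
    np.fill_diagonal(true_c, 1.0)
    x = rng.multivariate_normal(np.zeros(n), true_c, size=t)
    c_sample = sample_correlation(x)
    c_filtered = clip_eigenvalues(c_sample, t)
    print("K(model || sample)   =", kl_gaussian(true_c, c_sample))
    print("K(model || filtered) =", kl_gaussian(true_c, c_filtered))
```

Comparing the two printed values gives a rough sense of how much of the model structure a filter recovers from a finite sample. Repeating the experiment over many independent samples approximates the kind of expected-value analysis described in the abstract.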