Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes
- 6 October 2020
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in Journal of the ACM
- Vol. 67 (6), 1-42
- https://doi.org/10.1145/3417994
Abstract
We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. As an application of this technique, we prove that Θ̃(kd²/ε²) samples are necessary and sufficient for learning a mixture of k Gaussians in R^d, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε²) samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning (or robust estimation) setting, in which the target distribution is only approximately a mixture of Gaussians. Our main upper bound is proven by showing that the class of Gaussians in R^d admits a small compression scheme.
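To make the scaling in the abstract concrete, the following is a minimal sketch that evaluates only the leading-order terms of the two bounds. The function name `sample_scaling` and its parameters are hypothetical and not taken from the article. The constants and polylogarithmic factors hidden by the tilde notation are omitted, so the values are growth rates rather than actual sample counts.

```python
def sample_scaling(k: int, d: int, eps: float, axis_aligned: bool = False) -> float:
    """Leading-order term of the sample complexity stated in the abstract.

    General Gaussians:      k * d^2 / eps^2
    Axis-aligned Gaussians: k * d   / eps^2
    Constants and polylog factors are deliberately left out (assumption:
    we only want to compare growth rates, not predict real sample sizes).
    """
    if k < 1 or d < 1 or not 0 < eps < 1:
        raise ValueError("need k >= 1, d >= 1 and 0 < eps < 1")
    per_component = d if axis_aligned else d * d
    return k * per_component / eps**2


# Compare the two regimes for the same k, d, and eps.
general = sample_scaling(k=5, d=10, eps=0.1)
axis = sample_scaling(k=5, d=10, eps=0.1, axis_aligned=True)
print(general / axis)  # the ratio is approximately d
```

Under these simplifications, halving ε multiplies either term by four, and the general case exceeds the axis-aligned case by a factor of about d.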
Funding Information
- NSERC (22R23068)
- CRM-ISM postdoctoral fellowship and an IVADO-Apogée-CFREF postdoctoral fellowship
- NSERC Discovery