A supervised clustering MCMC methodology for large categorical feature spaces
Statistical Methods in Medical Research ; doi:10.1177/09622802211009258
Abstract: There is a well-established tradition within the statistics literature that explores different techniques for reducing the dimensionality of large feature spaces. The problem is central to machine learning and it has been largely explored under the unsupervised learning paradigm. We introduce a supervised clustering methodology that capitalizes on a Metropolis Hastings algorithm to optimize the partition structure of a large categorical feature space tailored towards minimizing the test error of a learning algorithm. This is a general methodology that can be applied to any supervised learning problem with a large categorical feature space. We show the benefits of the algorithm by applying this methodology to the problem of risk adjustment in competitive health insurance markets. We use a large claims data set that records ICD-10 codes, a large categorical feature space. We aim at improving risk adjustment by clustering diagnostic codes into risk groups suitable for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from a representative sample of twenty three million citizens in Colombian Healthcare System. Our results outperform common alternatives and suggest that it has potential to improve risk adjustment.
Keywords: Supervised clustering / MCMC / risk adjustment / metropolis hastings / ICD-10 / large feature space
Scifeed alert for new publicationsNever miss any articles matching your research from any publisher
- Get alerts for new papers matching your research
- Find out the new papers from selected authors
- Updated daily for 49'000+ journals and 6000+ publishers
- Define your Scifeed now
Click here to see the statistics on "Statistical Methods in Medical Research" .