Characterisation, identification, clustering, and classification of disease

Open Access

8 March 2021

journal article
research article
Published by Springer Science and Business Media LLC in Scientific Reports

Vol. 11 (1), 1-13
https://doi.org/10.1038/s41598-021-84860-z

Abstract

The importance of quantifying the distribution and determinants of multimorbidity has prompted novel data-driven classifications of disease. Applications have included improved statistical power and refined prognoses for a range of respiratory, infectious, autoimmune, and neurological diseases, with studies using molecular information, age of disease incidence, and sequences of disease onset (“disease trajectories”) to classify disease clusters. Here we consider whether easily measured risk factors such as height and BMI can effectively characterise diseases in UK Biobank data, combining established statistical methods in new but rigorous ways to provide clinically relevant comparisons and clusters of disease. Over 400 common diseases were selected for analysis using clinical and epidemiological criteria, and conventional proportional hazards models were used to estimate associations with 12 established risk factors. Several diseases had strongly sex-dependent associations between disease risk and BMI. Importantly, a large proportion of diseases affecting both sexes could be identified by their risk factors, and equivalent diseases tended to cluster adjacently. These included 10 diseases presently classified as “Symptoms, signs, and abnormal clinical and laboratory findings, not elsewhere classified”. Many clusters are associated with a shared, known pathogenesis; others suggest likely but presently unconfirmed causes. The specificity of associations and shared pathogenesis of many clustered diseases provide a new perspective on the interactions between biological pathways, risk factors, and patterns of disease such as multimorbidity.
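
The idea of grouping diseases by their risk-factor associations can be illustrated with a small sketch. The abstract does not specify the clustering procedure, so the example below assumes average-linkage agglomerative clustering on Euclidean distances between vectors of log hazard ratios. The disease names, the three unnamed risk factors, and all numerical values are hypothetical and chosen only for illustration; they are not results from the study.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerative_clusters(vectors, n_clusters):
    """Average-linkage agglomerative clustering (an assumed method).

    vectors: dict mapping a disease name to its list of log hazard ratios.
    Returns a sorted list of clusters, each a sorted list of disease names.
    """
    clusters = [[name] for name in vectors]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        dists = [euclidean(vectors[a], vectors[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Hypothetical log hazard ratios for three risk factors (illustrative only).
toy_log_hr = {
    "disease_A": [0.40, -0.10, 0.30],
    "disease_B": [0.38, -0.12, 0.28],
    "disease_C": [-0.20, 0.25, 0.00],
    "disease_D": [-0.22, 0.27, 0.02],
}

clusters = agglomerative_clusters(toy_log_hr, n_clusters=2)
print(clusters)
```

In this toy input, diseases with similar association profiles end up in the same cluster. In practice the coefficient vectors would come from fitted proportional hazards models, and the choice of distance and linkage would need to account for the uncertainty in each estimate.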

