Supervised machine learning techniques for the classification of metabolic disorders in newborns

Abstract
Motivation: During the Bavarian newborn screening programme all newborns have been tested for about 20 inherited metabolic disorders. Owing to the amount and complexity of the generated experimental data, machine learning techniques provide a promising approach for investigating novel patterns in high-dimensional metabolic data, which form the basis for constructing classification rules with high discriminatory power.
Results: Six machine learning techniques have been investigated for their classification accuracy, focusing on two metabolic disorders: phenylketonuria (PKU) and medium-chain acyl-CoA dehydrogenase deficiency (MCADD). Logistic regression analysis led to superior classification rules (sensitivity >96.8%, specificity >99.98%) compared with all other investigated algorithms. Including novel constellations of metabolites in the models strongly increased the positive predictive value (PKU 71.9% versus 16.2%, MCADD 88.4% versus 54.6%, compared with the established diagnostic markers). Our results show that the mined data confirm known metabolic patterns and indicate some novel ones, which may contribute to a better understanding of newborn metabolism.
Availability: WEKA machine learning package: www.cs.waikato.ac.nz/~ml/weka and statistical software package ADE-4: http://pbil.univ-lyon1.fr/ADE-4
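To make the reported performance measures concrete, the following minimal Python sketch computes sensitivity, specificity and positive predictive value from confusion-matrix counts. The function name `screening_metrics` and all counts are hypothetical illustrations, not data from the study; they are chosen only to show how, for a rare disorder, the positive predictive value can sit well below the specificity.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, ppv) from confusion-matrix counts.

    tp: affected newborns flagged as positive
    fp: unaffected newborns flagged as positive
    tn: unaffected newborns flagged as negative
    fn: affected newborns flagged as negative
    """
    sensitivity = tp / (tp + fn)  # fraction of affected cases detected
    specificity = tn / (tn + fp)  # fraction of unaffected cases cleared
    ppv = tp / (tp + fp)          # fraction of positive calls that are true
    return sensitivity, specificity, ppv


# Hypothetical counts (assumed for illustration, NOT taken from the study):
# 41 affected newborns among 100,041 screened.
sens, spec, ppv = screening_metrics(tp=40, fp=15, tn=99985, fn=1)
print(f"sensitivity={sens:.4f} specificity={spec:.5f} ppv={ppv:.3f}")
```

Because affected newborns are so few relative to unaffected ones, even a handful of false positives lowers the positive predictive value noticeably, which is why it is reported alongside sensitivity and specificity.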