A systematic review of the applications of artificial intelligence and machine learning in autoimmune diseases

Open Access

9 March 2020

journal article
review article
Published by Springer Science and Business Media LLC in npj Digital Medicine

Vol. 3 (1), 1-11
https://doi.org/10.1038/s41746-020-0229-3

Abstract

Autoimmune diseases are chronic, multifactorial conditions. Through machine learning (ML), a branch of the wider field of artificial intelligence, it is possible to extract patterns within patient data and exploit these patterns to predict patient outcomes for improved clinical management. Here, we surveyed the use of ML methods to address clinical problems in autoimmune disease. A systematic review was conducted using the MEDLINE, Embase and Computers & Applied Sciences Complete databases. Relevant papers included "machine learning" or "artificial intelligence" and the autoimmune disease search term(s) in their title, abstract or keywords. Studies were excluded if they were not written in English, included no real human patient data, were published prior to 2001, were not peer reviewed, concerned non-autoimmune disease comorbidity research or were review papers. Of 702 studies, 169 met the criteria for inclusion. Support vector machines and random forests were the most popular ML methods used. ML models using data on multiple sclerosis, rheumatoid arthritis and inflammatory bowel disease were most common. A small proportion of studies (7.7% or 13/169) combined different data types in the modelling process. Cross-validation combined with a separate testing set, for more robust model evaluation, occurred in 8.3% of papers (14/169). The field may benefit from adopting a best practice of validation, cross-validation and independent testing of ML models. Many models achieved good predictive results in simple scenarios (e.g. classification of cases and controls). Progression to more complex predictive models may be achievable in future through integration of multiple data types.
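The evaluation practice the abstract recommends, cross-validation combined with a separate testing set, can be illustrated with a minimal sketch. This is not the procedure of any reviewed study: the function name `holdout_then_kfold`, the 20% test fraction, the choice of five folds and the sample count of 100 are all assumptions made for illustration. The sketch only produces index splits; a model would be fitted on each training fold, tuned on the matching validation fold, and scored once on the held-out test indices.

```python
import random


def holdout_then_kfold(n_samples, test_fraction=0.2, k=5, seed=0):
    """Split sample indices into a held-out test set plus k CV folds.

    Hypothetical helper for illustration. Returns (test_idx, folds), where
    folds is a list of (train_idx, val_idx) pairs drawn only from the
    samples that are NOT in the test set.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

    # Reserve the independent test set first; it is never used for tuning.
    n_test = max(1, round(n_samples * test_fraction))
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:]
    if k < 2 or k > len(dev_idx):
        raise ValueError("k must be between 2 and the number of non-test samples")

    # Deal the remaining samples into k roughly equal validation folds.
    val_folds = [dev_idx[i::k] for i in range(k)]
    folds = []
    for i, val_idx in enumerate(val_folds):
        train_idx = [j for f, fold in enumerate(val_folds) if f != i for j in fold]
        folds.append((train_idx, val_idx))
    return test_idx, folds


# Example with an assumed cohort of 100 patients.
test_idx, folds = holdout_then_kfold(100, test_fraction=0.2, k=5, seed=42)
for train_idx, val_idx in folds:
    # No sample may appear in both the training and validation part of a fold,
    # and no test sample may leak into either.
    assert not set(train_idx) & set(val_idx)
    assert not set(test_idx) & (set(train_idx) | set(val_idx))
```

Reserving the test indices before any folds are formed is the design choice that matters here: the test set stays independent of every decision made during cross-validation.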

Cited by 150 articles