Mining 100 million notes to find homelessness and adverse childhood experiences: 2 case studies of rare and severe social determinants of health in electronic health records
- 8 July 2017
- journal article
- research article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 25 (1), 61-71
- https://doi.org/10.1093/jamia/ocx059
Abstract
Objective: Understanding how to identify the social determinants of health from electronic health records (EHRs) could provide important insights to understand health or disease outcomes. We developed a methodology to capture 2 rare and severe social determinants of health, homelessness and adverse childhood experiences (ACEs), from a large EHR repository.

Materials and Methods: We first constructed lexicons to capture homelessness and ACE phenotypic profiles. We employed word2vec and lexical associations to mine homelessness-related words. Next, using relevance feedback, we refined the 2 profiles with iterative searches over 100 million notes from the Vanderbilt EHR. Seven assessors manually reviewed the top-ranked results of 2544 patient visits relevant for homelessness and 1000 patients relevant for ACE.

Results: word2vec yielded better performance (area under the precision-recall curve [AUPRC] of 0.94) than lexical associations (AUPRC = 0.83) for extracting homelessness-related words. A comparative study of searches for the 2 phenotypes revealed a higher performance achieved for homelessness (AUPRC = 0.95) than ACE (AUPRC = 0.79). A temporal analysis of the homeless population showed that the majority experienced chronic homelessness. Most ACE patients suffered sexual (70%) and/or physical (50.6%) abuse, with the top-ranked abuser keywords being “father” (21.8%) and “mother” (15.4%). Top prevalent associated conditions for homeless patients were lack of housing (62.8%) and tobacco use disorder (61.5%), while for ACE patients it was mental disorders (36.6%–47.6%).

Conclusion: We provide an efficient solution for mining homelessness and ACE information from EHRs, which can facilitate large clinical and genetic studies of these social determinants of health.
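The abstract names two technical steps: ranking candidate words by embedding similarity to a seed term, and scoring the ranked list with AUPRC. The sketch below illustrates both on toy data. It is not the authors' implementation: the 2-dimensional vectors, the candidate words, and the relevance labels are invented for illustration, and average precision is used here as one common estimator of AUPRC, which may differ from the calculation used in the article.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(seed_vec, candidates):
    """Return candidate words sorted by descending similarity to the seed."""
    return sorted(
        candidates,
        key=lambda word: cosine(seed_vec, candidates[word]),
        reverse=True,
    )


def average_precision(relevance):
    """Average precision of a ranked list of 0/1 relevance judgments."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at each relevant position
    return total / hits if hits else 0.0


# Hypothetical embeddings; a real system would load trained word2vec vectors.
seed = [1.0, 0.0]  # stand-in vector for the seed term "homeless"
candidates = {
    "shelter": [0.9, 0.1],
    "unhoused": [0.8, 0.3],
    "clinic": [0.7, 0.6],
    "evicted": [0.5, 0.5],
    "aspirin": [0.0, 1.0],
}
# Hypothetical assessor judgments: 1 = related to homelessness, 0 = not.
judgments = {"shelter": 1, "unhoused": 1, "clinic": 0, "evicted": 1, "aspirin": 0}

ranked = rank_candidates(seed, candidates)
score = average_precision([judgments[word] for word in ranked])
print(ranked, round(score, 3))
```

In this toy ranking the one non-relevant word placed above a relevant one ("clinic" before "evicted") is what pulls the score below 1.0, which is the kind of ordering error the AUPRC comparison in the Results is sensitive to.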
Funding Information
- National Institute of General Medical Sciences (R01 GM103859)
- National Center for Advancing Translational Sciences (UL1 TR000445)
- Patient-Centered Outcomes Research Institute (CDRN-1306-04869)
- National Institutes of Health
- Patient-Centered Outcomes Research Institute