Pneumonia identification using statistical feature selection

Open Access

1 September 2012

journal article
Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association

Vol. 19 (5), 817-823
https://doi.org/10.1136/amiajnl-2011-000752

Abstract

Objective: This paper describes a natural language processing system for the task of pneumonia identification. Based on the information extracted from the narrative reports associated with a patient, the task is to identify whether or not the patient is positive for pneumonia.

Design: A binary classifier was employed to identify pneumonia from a dataset of multiple types of clinical notes created for 426 patients during their stay in the intensive care unit. For this purpose, three types of features were considered: (1) word n-grams, (2) Unified Medical Language System (UMLS) concepts, and (3) assertion values associated with pneumonia expressions. System performance was greatly increased by a feature selection approach which uses statistical significance testing to rank features based on their association with the two categories of pneumonia identification.

Results: Besides testing our system on the entire cohort of 426 patients (unrestricted dataset), we also used a smaller subset of 236 patients (restricted dataset). The performance of the system was compared with the results of a baseline previously proposed for these two datasets. The best results achieved by the system (85.71 and 81.67 F1-measure) are significantly better than the baseline results (50.70 and 49.10 F1-measure) on the restricted and unrestricted datasets, respectively.

Conclusion: Using a statistical feature selection approach that allows the feature extractor to consider only the most informative features from the feature space significantly improves the performance over a baseline that uses all the features from the same feature space. Extracting the assertion value for pneumonia expressions further improves the system performance.
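
The feature-ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which significance test the authors used, so the Pearson chi-square statistic on a 2x2 contingency table is an assumption chosen purely for illustration. The function names (`word_ngrams`, `chi_square_2x2`, `rank_features`) and the toy notes are hypothetical and do not come from the paper.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: positive patients with the feature    b: negative patients with it
    c: positive patients without it          d: negative patients without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # the feature (or a class) never varies, so it carries no signal
    return n * (a * d - b * c) ** 2 / denom


def rank_features(feature_sets, labels):
    """Rank features by chi-square association with the binary label.

    feature_sets: one set of features per patient
    labels: one bool per patient (True = positive for pneumonia)
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for feats, is_pos in zip(feature_sets, labels):
        (pos_counts if is_pos else neg_counts).update(set(feats))
    scores = {}
    for feat in set(pos_counts) | set(neg_counts):
        a, b = pos_counts[feat], neg_counts[feat]
        scores[feat] = chi_square_2x2(a, b, n_pos - a, n_neg - b)
    # Highest statistic first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): unigram features from four one-line "notes".
notes = ["pneumonia fever", "pneumonia cough", "fever", "cough"]
labels = [True, True, False, False]
ranked = rank_features([word_ngrams(t, 1) for t in notes], labels)
top_k = [feat for feat, _ in ranked[:1]]  # keep only the k most informative features
```

In this toy example only `pneumonia` separates the two classes, so it ranks first. A classifier would then be trained on the top-ranked features alone rather than on the full feature space, which is the contrast with the baseline that the abstract draws.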
