Identifying Patient Smoking Status from Medical Discharge Records

Abstract
Clinical narrative records contain a wealth of useful information. However, most clinical narratives are written as fragmented English free text that shows the characteristics of a clinical sublanguage, which makes their linguistic processing, search, and retrieval challenging.1 Traditional natural language processing (NLP) tools are not designed for the fragmented free text found in narrative clinical records and therefore do not perform well on this type of data.2 Limited access to clinical records has been a barrier to the widespread development of medical language processing (MLP) technologies. In the absence of a standardized, publicly available ground truth that would encourage the development of MLP systems and allow their head-to-head comparison, successful MLP efforts have been few; notable examples include MedLEE3 and Symtxt.4 A few MLP systems have been developed,5 and these efforts have demonstrated the usefulness of MLP in clinical settings.6–8
