Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to support Healthcare Quality Improvement.

Title: Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to support Healthcare Quality Improvement.
Publication Type: Journal Article
Year of Publication: 2016
Authors: Grundmeier RW, Masino AJ, Casper TC, Dean JM, Bell J, Enriquez R, Deakyne S, Chamberlain JM, Alpern ER
Corporate Authors: Pediatric Emergency Care Applied Research Network
Journal: Appl Clin Inform
Volume: 7
Issue: 4
Pagination: 1051-1068
Date Published: 2016 Nov 09
ISSN: 1869-0327
Abstract

BACKGROUND: Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed.

OBJECTIVE: To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement.

METHODS: Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate the model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented and word stem and bigram features were constructed. Common English "stop words" and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures.
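The feature construction described in METHODS can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the software, stop-word list, stemmer, or rarity threshold, so `STOP_WORDS`, `crude_stem`, `min_reports`, and the three sample reports are all hypothetical stand-ins. Forming bigrams after stop-word removal is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the abstract does not say which list was used).
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "in", "and", "to", "with", "there"}


def crude_stem(word):
    """Stand-in for a real stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def report_features(text):
    """Return word-stem features followed by bigram features for one report."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    stems = [crude_stem(t) for t in tokens]
    bigrams = [f"{a}_{b}" for a, b in zip(stems, stems[1:])]
    return stems + bigrams


def build_vocabulary(reports, min_reports=2):
    """Keep only features that occur in at least `min_reports` reports (drops rare features)."""
    doc_freq = Counter()
    for text in reports:
        doc_freq.update(set(report_features(text)))
    return sorted(f for f, n in doc_freq.items() if n >= min_reports)


def vectorize(text, vocabulary):
    """Count how often each vocabulary feature appears in one report."""
    counts = Counter(report_features(text))
    return [counts[f] for f in vocabulary]


# Invented example sentences, not taken from the study data.
reports = [
    "There is an acute fracture of the distal radius.",
    "No acute fracture is seen.",
    "Displaced fracture of the femur.",
]
vocabulary = build_vocabulary(reports)
vectors = [vectorize(r, vocabulary) for r in reports]
```

Note that the word "No" is kept here, so the bigram "no_acut" is generated for the second report; it is then dropped only because it is rare in this toy corpus. In a corpus of realistic size, bigrams like this are one way negated findings could remain visible to a bag-of-words classifier.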

RESULTS: There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927).
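As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the values above. The sketch below also writes out the four metrics and a Gaussian (RBF) kernel. The kernel form exp(-gamma * ||x - y||^2) is an assumption based on the convention common SVM libraries use; the abstract does not state the parametrization. The confusion-matrix counts are toy numbers, not study data.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def gaussian_kernel(x, y, gamma=0.005):
    """Assumed kernel form: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Recompute F1 from the precision and recall reported in the abstract.
svm_f1 = f1_from(precision=0.940, recall=0.969)        # rounds to 0.954
diagnosis_f1 = f1_from(precision=0.896, recall=0.960)  # rounds to 0.927

# Toy counts for illustration only (not from the study).
toy = classification_metrics(tp=8, fp=2, fn=1, tn=9)

# Identical feature vectors give a kernel value of exactly 1.0.
same = gaussian_kernel([1, 0, 1], [1, 0, 1])
```

Both recomputed F1 values agree with the reported 0.954 and 0.927 to three decimal places.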

CONCLUSIONS: NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.

DOI: 10.4338/ACI-2016-08-RA-0129
Alternate Journal: Appl Clin Inform
PubMed ID: 27826610