Development of a phenotype ontology for autism spectrum disorder by natural language processing on electronic health records.

Title: Development of a phenotype ontology for autism spectrum disorder by natural language processing on electronic health records.
Publication Type: Journal Article
Year of Publication: 2022
Authors: Zhao M, Havrilla J, Peng J, Drye M, Fecher M, Guthrie W, Tunc B, Schultz R, Wang K, Zhou Y
Journal: J Neurodev Disord
Volume: 14
Issue: 1
Pagination: 32
Date Published: 2022 May 23
ISSN: 1866-1955
Keywords: Autism Spectrum Disorder, Electronic Health Records, Humans, Natural Language Processing, Phenotype, Vocabulary
Abstract

BACKGROUND: Autism spectrum disorder (ASD) is a complex neurodevelopmental condition characterized by restricted, repetitive behavior and impaired social communication and interactions. However, significant challenges remain in diagnosing and subtyping ASD, due in part to the lack of a validated, standardized vocabulary to characterize the clinical phenotypic presentation of ASD. Although the human phenotype ontology (HPO) plays an important role in delineating nuanced phenotypes for rare genetic diseases, it is inadequate to capture the characteristics of behavioral and psychiatric phenotypes for individuals with ASD. There is a clear need, therefore, for a well-established phenotype terminology set that can assist in characterization of ASD phenotypes from patients' clinical narratives.

METHODS: To address this challenge, we used natural language processing (NLP) techniques to identify and curate ASD phenotypic terms from high-quality unstructured clinical notes in the electronic health record (EHR) on 8499 individuals with ASD, 8177 individuals with non-ASD psychiatric disorders, and 8482 individuals without a documented psychiatric disorder. We further performed a dimensionality-reduction clustering analysis to subgroup individuals with ASD, using nonnegative matrix factorization.
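As an illustration of the clustering step named above, the following is a minimal sketch of nonnegative matrix factorization applied to a toy individual-by-term count matrix. The abstract does not describe the implementation, so everything here is an assumption made for illustration: the toy matrix, the number of components, the multiplicative-update rule (a standard NMF algorithm, not necessarily the one the authors used), and the choice to assign each individual to the component with the largest loading.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative v (n x m) as w (n x k) times h (k x m).

    Uses multiplicative updates for the squared-error objective.
    Hypothetical helper for this sketch, not the authors' code.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # w <- w * (v h^T) / (w h h^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def assign_clusters(w):
    """Label each row by the component with the largest loading."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

def squared_error(v, w, h):
    r = matmul(w, h)
    return sum((v[i][j] - r[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

# Toy data (assumed): rows are individuals, columns are phenotype-term counts.
# The first three individuals mention terms 0-1, the last three terms 2-3.
v = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 3, 4],
]

w, h = nmf(v, k=2)
labels = assign_clusters(w)
error = squared_error(v, w, h)
print(labels, round(error, 3))
```

In this sketch the rows of `h` can be read as term profiles for each subgroup and the rows of `w` as each individual's loading on those profiles. Which component receives which index depends on the random initialization, so only the grouping is meaningful, not the label values.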

RESULTS: Through a note-processing pipeline that includes several steps of state-of-the-art NLP approaches, we identified 3336 ASD terms linking to 1943 unique medical concepts, which represents one of the largest ASD terminology sets to date. The extracted ASD terms were further organized in a formal ontology structure similar to the HPO. Clustering analysis showed that these terms could be used in a diagnostic pipeline to differentiate individuals with ASD from individuals with other psychiatric disorders.

CONCLUSION: Our ASD phenotype ontology can assist clinicians and researchers in characterizing individuals with ASD, facilitating automated diagnosis, and subtyping individuals with ASD to support personalized therapeutic decision-making.

DOI: 10.1186/s11689-022-09442-0
Alternate Journal: J Neurodev Disord
PubMed ID: 35606697