Background: Presentation, treatment, and outcomes of pulmonary embolism (PE) in adults vary by age and sex. Recently, rule-based natural language processing (NLP) tools have been developed to detect PE from radiology reports. It is unknown whether age and sex influence the accuracy of these NLP models in detecting PE from radiology reports. Methods: We used patient data from the Mass General Brigham (MGB) Health System between 2016 and 2021. The NLP models developed by Verma et al. and Johnson et al. were applied to radiology reports to identify PE among a randomly selected sample of 1,712 patients (52.3% female and 45.3% ≥65 years old). Two independent physicians conducted a manual chart review of patient records using predefined criteria, which served as the reference standard. Accuracy metrics were ascertained across age (≥65 vs. <65 years) and sex (female vs. male) subgroups. Weighted estimates were established based on the total number of hospitalizations at MGB (n=381,642; 54.0% female and 47.2% ≥65 years old). Results: In the weighted sample, the prevalence of PE was 2.0% (7,708/381,642). In weighted estimates, the NLP models showed high sensitivity (range 71.0% to 89.3%) and specificity (range 96.8% to 98.6%) for patients ≥65 and <65 years of age (Table, Part A). Similarly, both models showed high sensitivity (range 76.9% to 88.2%) and specificity (range 96.6% to 98.7%) for females and males (Table, Part B). However, positive predictive values were low for both models in the two age categories (range 36.8% to 51.0%) and the two sex groups (range 35.1% to 54.7%) (Table, Parts A and B). Conclusions: Despite some variations in the accuracy of the rule-based NLP models, positive predictive values were low across age and sex groups. Further improvements are necessary before rule-based NLP tools can be reliably used to identify PE from imaging reports.
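The combination of high specificity and low positive predictive value (PPV) reported above follows from the low prevalence of PE. The minimal sketch below applies Bayes' rule to illustrate this. It is not the weighting procedure used in the study: the function name `ppv_from_rates` is hypothetical, the overall weighted prevalence is used instead of subgroup-specific prevalences, and the sensitivity/specificity pairs simply combine the ends of the reported ranges, which may not correspond to any single model or subgroup.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence                    # P(test+, disease+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+, disease-)
    return true_pos / (true_pos + false_pos)


# Overall weighted PE prevalence reported in the abstract (about 2.0%).
PREVALENCE = 7708 / 381642

# Assumption: illustrative pairings of the range endpoints, not actual per-model results.
ILLUSTRATIVE_PAIRS = [(0.710, 0.968), (0.893, 0.986)]

if __name__ == "__main__":
    for sens, spec in ILLUSTRATIVE_PAIRS:
        ppv = ppv_from_rates(sens, spec, PREVALENCE)
        print(f"sensitivity={sens:.1%} specificity={spec:.1%} -> PPV={ppv:.1%}")
```

At a prevalence near 2%, even a false-positive rate of a few percent among the roughly 98% of hospitalizations without PE produces about as many false positives as true positives, which is why PPV stays low despite specificity above 96%.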