Background:
Systemic Lupus Erythematosus (SLE) is a complex, relapsing-remitting disease that poses challenges in diagnosis and management. Traditional disease activity indices often fail to capture its dynamic nature, which limits their usefulness in guiding therapy. Leveraging Electronic Health Records (EHR) through data mining and machine learning offers a promising approach to understanding disease complexity and prognostic trajectories, specifically disease flares.
Objectives:
To validate a machine-learning methodology for identifying SLE phenotypes and flare trajectories in an outpatient setting.
Methods:
An observational, retrospective, single-center study was performed using the EHR of our Tertiary Care University Hospital. First, we developed an SLE Data Mart combining all EHR sources. Then, a machine learning algorithm based on Natural Language Processing (NLP) was built to characterize disease complexity and flares in a primary cohort of adult SLE pts with at least one hospitalization. We then validated this algorithm in a second cohort of SLE pts followed only in the outpatient setting (internal validation cohort). The inclusion criteria for the validation cohort were: 1) SLE diagnosis (according to the 2019 EULAR/ACR classification criteria); 2) age > 18 years; 3) no hospitalizations for SLE; 4) at least 1 year of follow-up; 5) at least 1.5 contacts/year in the period between January 2012 and December 2020; 6) at least one laboratory value available during follow-up. For each patient, clinical reports (including demographics, medical history, clinical symptoms, laboratory values, medication orders and therapy) were extracted from the Data Mart, and the NLP pipeline was used to derive: 1) the presence of 8 SLE clinical domains (hematological, mucocutaneous, articular, renal, systemic, neurological and vascular involvement, and serositis); 2) disease complexity (low, medium, high), based on the involvement of single or multiple organ domains and on therapy escalation; 3) disease flares. Baseline and longitudinal descriptive analyses were performed using medians and interquartile ranges for numerical variables and percentages for categorical ones. A p-value < 0.05 was considered significant.
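A minimal Python sketch of how the validation-cohort inclusion criteria and the median (IQR) summaries could be computed. The `Patient` record and its field names are hypothetical, as is the rule for contacts/year (contacts divided by follow-up clipped to the January 2012 to December 2020 window); the abstract describes neither the Data Mart schema nor the exact computation. The quartile method (`statistics.quantiles` with `method="inclusive"`) is also an assumption.

```python
import statistics
from dataclasses import dataclass
from datetime import date

# Hypothetical patient record; field names are illustrative, not the Data Mart schema.
@dataclass
class Patient:
    meets_2019_criteria: bool  # 1) SLE per 2019 EULAR/ACR criteria
    age: int                   # 2) age in years
    sle_hospitalizations: int  # 3) hospitalizations for SLE
    first_contact: date
    last_contact: date
    n_contacts: int            # 5) contacts within the study window
    n_lab_values: int          # 6) laboratory values during follow-up

STUDY_START = date(2012, 1, 1)
STUDY_END = date(2020, 12, 31)

def follow_up_years(p: Patient) -> float:
    """Follow-up in years, clipped to the study window (assumed rule)."""
    start = max(p.first_contact, STUDY_START)
    end = min(p.last_contact, STUDY_END)
    return max((end - start).days, 0) / 365.25

def in_validation_cohort(p: Patient) -> bool:
    years = follow_up_years(p)
    return (
        p.meets_2019_criteria
        and p.age > 18
        and p.sle_hospitalizations == 0
        and years >= 1.0                 # 4) at least 1 year of follow-up
        and p.n_contacts / years >= 1.5  # 5) safe: years >= 1 was checked first
        and p.n_lab_values >= 1
    )

def median_iqr(values):
    """Return (median, Q1, Q3); expects two or more values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3

ok = Patient(True, 40, 0, date(2015, 3, 1), date(2019, 3, 1), 8, 3)
print(in_validation_cohort(ok))     # True: 8 contacts over 4 years = 2/year
print(median_iqr([1, 2, 3, 4, 5]))  # (3, 2.0, 4.0)
```

Because Python's `and` short-circuits, the contacts/year ratio is only evaluated once at least one year of follow-up is confirmed, so it never divides by zero.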
Results:
A total of 255 SLE pts with at least one hospitalization were identified in our EHR and formed the primary cohort, while 91 SLE pts were included in the internal validation cohort. The two cohorts were comparable for age, sex and disease duration. The median number of clinical domains involved at baseline was higher in the primary cohort [4 (2.5-5)] than in the validation cohort [2 (1-2.5)] (p<0.01). Differences in clinical phenotype were confirmed in the longitudinal analysis, in which the median number of clinical domains involved was higher in the primary cohort [5 (4-6)] than in the validation cohort [4 (3-4)] (p<0.01). At baseline, SLE complexity was categorized as low, medium and high in 13.7%, 34.5% and 51.8% of the primary cohort and in 47.3%, 35.2% and 17.6% of the validation cohort, respectively (p_low < 0.01, p_medium > 0.01, p_high < 0.01). The more complex SLE phenotype (i.e. higher number of domains involved) observed in the primary cohort was also reflected in a higher number of flares [5.0 (2.0-9.0) vs 3 (1-5)] and, consequently, of clinical contacts [17.0 (11.0-25.5) vs 12 (6-19.5)] (p<0.01 for both comparisons). In the primary cohort, the median number of flares increased significantly with disease complexity [3.5 (2.0-6.0), 4.0 (2.0-8.0) and 6 (3.0-9.2) for low, medium and high complexity, respectively; p<0.05], whereas it was comparable across complexity levels in the validation cohort [3 (1.0-5.0), 3 (1.0-5.0) and 3 (1.0-6.0)]. In addition, steroid use was higher in the primary cohort than in the validation cohort (78.6% vs 52.7%), as were conventional immunosuppressive treatment (73.2% vs 45%) and biologic treatment (29.0% vs 9.8%) (p<0.00001 for all comparisons). The percentage of pts treated with antimalarials was comparable (79.8% vs 87.5%, p=ns).
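The abstract does not state which test produced the between-cohort p-values for the complexity categories. Assuming Pearson's chi-square test on a 2x2 table (cohort by "in category / not in category"), the comparison could be sketched as below. The counts are back-calculated from the reported percentages and rounded (e.g. 13.7% of 255 is about 35, 47.3% of 91 is about 43), so they only approximate the study data; the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Rows are cohorts; columns are "in category" / "not in category".
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Approximate counts back-calculated from reported percentages (not study data).
primary_n, validation_n = 255, 91
for label, k_primary, k_validation in [("low", 35, 43), ("medium", 88, 32), ("high", 132, 16)]:
    stat, p = chi_square_2x2(k_primary, primary_n - k_primary,
                             k_validation, validation_n - k_validation)
    print(f"{label}: chi2={stat:.2f}, p={p:.2g}")
```

Pearson's test without Yates' correction is used here only for simplicity; with small expected counts, Fisher's exact test would be the safer choice.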
Conclusion:
The machine learning algorithm effectively describes SLE heterogeneity, enabling the characterization of clinical phenotypes and longitudinal trajectories based on clinical complexity.
Acknowledgements:
This project received financial support from AstraZeneca.
Disclosure of Interests:
Silvia Laura Bosello: None declared, Livia Lilli: None declared, Carlotta Masciocchi: None declared, Laura Antenucci: None declared, Jacopo Lenkowicz: None declared, Augusta Ortolan: None declared, Pier Giacomo Cerasuolo: None declared, Lucia Lanzo: None declared, Silvia Piunno: None declared, Gabriella Castellino: AstraZeneca, Marco Gorini: AstraZeneca, Stefano Patarnello: None declared, Maria Antonietta D'Agostino: None declared.