Abstract
Following radiation therapy, a significant challenge in the management of brain metastases (BM) is differentiating radiation-induced treatment effect (TrE) from tumor recurrence (TuR). On conventional MRI, TrE can be indistinguishable from TuR. Advanced imaging techniques (e.g., perfusion MRI, PET/MRI) are not used consistently, and the standardized Response Assessment in Neuro-Oncology criteria for brain metastases (RANO-BM) are subject to inter-reader variability. In a multi-institutional setting, we compared an artificial intelligence (AI)-driven risk-of-progression (AiRiP) classifier, previously shown to capture pathophysiologic differences between TrE and TuR on routine MRI, with clinical assessments and advanced imaging methods. A total of 261 lesions with pathologically confirmed diagnoses in 189 patients were analyzed. Of these, 115 lesions (73 TuR, 42 TrE) from site 1, 86 lesions (38 TuR, 48 TrE) from site 2, and 60 lesions (33 TuR, 27 TrE) from site 3 were used to train and test the AiRiP model. Gadolinium-enhanced T1-weighted (Gd-T1w), T2-weighted (T2w), and FLAIR MRI scans were preprocessed, and lesions were segmented by experts. A total of 856 texture features were extracted from each lesion, and a random-forest classifier was evaluated with 3-fold cross-validation. The top-performing AiRiP features, RANO-BM criteria, perfusion MRI, and PET/MRI were then compared in a subgroup analysis. Of the 51 lesions in the test set (site 3), RANO-BM classified 14 as stable disease and 37 as TuR (accuracy = 54.1%). On the same test set, the AiRiP model achieved an accuracy of 76.5% and correctly classified 78.6% of the lesions deemed stable by RANO-BM as either TrE or TuR. For another subset of lesions (n=27) from the same test set, perfusion MRI and the AiRiP model achieved accuracies of 59.3% and 70.4%, respectively. Lastly, for a subset of lesions (n=35) from the site 2 test set, multimodal (perfusion, PET) imaging and the AiRiP model correctly classified 60% and 74.3% of lesions, respectively. Fifteen lesions were considered indeterminate on multimodal imaging; the AiRiP model correctly classified 73.3% of these as TrE or TuR. Our results suggest that AI-driven models applied to conventional MRI may reliably distinguish TuR from TrE.
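As a rough illustration of the classification step (a random-forest classifier trained on per-lesion feature vectors and evaluated with 3-fold cross-validation), the following dependency-free Python sketch may help. It is not the AiRiP implementation. The function names, the forest settings (`n_trees=15`, `max_depth=4`, roughly sqrt(p) features considered per split), the stratified fold assignment, and the `make_synthetic` data generator are all hypothetical choices made for illustration. Ten synthetic features stand in for the 856 extracted texture features.

```python
import random
from collections import Counter

# Illustrative sketch (not the AiRiP model): random forest + stratified k-fold CV.


def gini(labels):
    """Gini impurity of a non-empty list of class labels ("TrE"/"TuR")."""
    return 1.0 - sum((c / len(labels)) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        # Thresholds at every distinct value except the largest keep both sides non-empty.
        for t in sorted(set(row[f] for row in X))[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a tree: leaves are labels, internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # As in a standard random forest, each split considers a random subset of features.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_feats))
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    kids = [build_tree([X[i] for i in idx], [y[i] for i in idx],
                       depth + 1, max_depth, n_feats, rng) for idx in (left, right)]
    return (f, t, kids[0], kids[1])


def predict_tree(node, row):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=15, max_depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample, with sqrt(p) features per split."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def stratified_folds(y, k=3, seed=0):
    """Assign indices to k folds so each fold has roughly the same class mix."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(X, y, k=3, seed=0):
    """Train on k-1 folds, test on the held-out fold; return per-fold accuracies."""
    accuracies = []
    for held_out in stratified_folds(y, k, seed):
        test = set(held_out)
        train = [i for i in range(len(y)) if i not in test]
        forest = fit_forest([X[i] for i in train], [y[i] for i in train], seed=seed)
        correct = sum(predict_forest(forest, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def make_synthetic(n=60, n_features=10, shift=3.0, seed=42):
    """Hypothetical stand-in data: only the first two features differ between classes."""
    rng = random.Random(seed)
    y = ["TuR" if i % 2 == 0 else "TrE" for i in range(n)]
    X = [[rng.gauss(shift if lab == "TuR" and f < 2 else 0.0, 1.0)
          for f in range(n_features)] for lab in y]
    return X, y
```

For example, `X, y = make_synthetic()` followed by `cross_validate(X, y, k=3)` returns three per-fold accuracies. In practice, a library implementation such as scikit-learn's `RandomForestClassifier` combined with `StratifiedKFold` would usually be preferred. The hand-rolled version only makes the bootstrap sampling, random feature subsets, and fold bookkeeping explicit.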