Abstract

Brain age predicted from T1-weighted MRI is a promising marker for understanding brain aging and its associated conditions. While deep learning models have succeeded in reducing the mean absolute error (MAE) of predicted brain age, concerns about robust and accurate generalization to new data limit their clinical applicability. The large number of trainable parameters, combined with limited medical imaging training data, contributes to this challenge and often results in a generalization gap: a significant discrepancy between model performance on training data and on unseen data. In this study, we assess a deep model, SFCN-reg, based on the VGG-16 architecture, and address the generalization gap through comprehensive preprocessing, extensive data augmentation, and model regularization. Using training data from the UK Biobank, we demonstrate substantial improvements in model performance. Specifically, our approach reduces the generalization MAE by 44% (from 5.25 to 2.96 years) in the Alzheimer’s Disease Neuroimaging Initiative dataset and by 22% (from 4.35 to 3.40 years) in the Australian Imaging, Biomarker and Lifestyle dataset. Furthermore, we achieve a 29% reduction in scan-rescan error (from 0.86 to 0.61 years) while enhancing the model’s robustness to registration errors. Feature importance maps identify the anatomical regions the model uses to predict age. These results underscore the critical role of high-quality preprocessing and robust training techniques in improving accuracy and narrowing the generalization gap, both necessary steps towards the clinical use of brain age prediction models. Our study contributes to neuroimaging research by offering a potential pathway to improve the clinical applicability of deep learning models.
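
For readers who want to check the reported figures, the following is a minimal sketch of how MAE and the relative reductions quoted above can be computed. The function names `mean_absolute_error` and `relative_reduction_percent` are hypothetical helpers introduced here for illustration, not part of the study's code; the dataset labels ADNI and AIBL abbreviate the two evaluation datasets named in the abstract, and the before/after values are taken directly from it.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference (in years) between true and predicted ages."""
    if len(true_ages) != len(predicted_ages) or len(true_ages) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def relative_reduction_percent(before, after):
    """Percentage by which an error metric drops from `before` to `after`."""
    return 100.0 * (before - after) / before


# (before, after) error values in years, as reported in the abstract.
reported = {
    "ADNI generalization MAE": (5.25, 2.96),
    "AIBL generalization MAE": (4.35, 3.40),
    "scan-rescan error": (0.86, 0.61),
}

for name, (before, after) in reported.items():
    reduction = relative_reduction_percent(before, after)
    print(f"{name}: {reduction:.0f}% reduction")
```

Rounded to whole percentages, these computations agree with the 44%, 22%, and 29% reductions stated in the abstract.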