Abstract

Machine learning algorithms are increasingly used to identify brain connectivity biomarkers linked to behavior and clinical outcomes. However, non-standard methodological choices in neuroimaging datasets, especially those that include families or twins, have hindered robust machine learning applications. In addition, prioritizing prediction accuracy over biological interpretability has made it difficult to understand the biological processes underlying psychopathology. In this study, we used a linear support vector regression model to examine the relationship between resting-state functional connectivity networks and chronological age in data from the Human Connectome Project. We examined the effect of shared variance among twins and siblings by comparing two cross-validation schemes: one that assigned family members to folds at random and one that kept family members together. We also compared models with and without a Pearson feature filter and used a network enrichment approach to identify predictive brain networks. Results indicated that failing to account for shared family variance inflated prediction performance, and that the Pearson filter reduced accuracy and reliability. Inverting the machine learning model and applying network-level enrichment to the connectome improved biological interpretability, whereas using regression coefficients directly as feature weights led to misleading interpretations. Our findings offer important insights for applying machine learning to neuroimaging data and highlight the value of network enrichment for comprehensible biological interpretation.
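To make the cross-validation comparison in the abstract concrete, the following is a minimal sketch of a split that keeps family members together, so that no family contributes subjects to both the training and the test set. It is an illustration only and not the authors' implementation: the function name `family_kfold`, its parameters, and the example family IDs are all hypothetical, and the greedy fold balancing is an assumed design choice. In practice, scikit-learn's `GroupKFold` offers comparable group-aware splitting.

```python
import random
from collections import defaultdict


def family_kfold(family_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no family spans both sets.

    family_ids: one family label per subject (hypothetical input format).
    """
    # Collect the subject indices that belong to each family.
    members = defaultdict(list)
    for idx, fam in enumerate(family_ids):
        members[fam].append(idx)

    families = list(members)
    if n_splits < 2 or len(families) < n_splits:
        raise ValueError("need at least 2 folds and one family per fold")

    # Shuffle for randomness, then place large families first so that
    # fold sizes stay roughly balanced (the sort is stable, so ties keep
    # their shuffled order).
    random.Random(seed).shuffle(families)
    families.sort(key=lambda fam: -len(members[fam]))

    folds = [[] for _ in range(n_splits)]
    for fam in families:
        # Assign the whole family to the currently smallest fold.
        min(folds, key=len).extend(members[fam])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


# Hypothetical example: ten subjects from six families.
example_families = ["A", "A", "B", "C", "C", "C", "D", "E", "E", "F"]
example_splits = list(family_kfold(example_families, n_splits=3))
```

Assigning subjects to folds at random instead would allow a twin or sibling of a test subject to appear in the training set, which is the leakage the abstract reports as inflating prediction performance.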