Abstract

Background: Childhood maltreatment has been associated with gray matter alterations, particularly within limbic and prefrontal regions. However, findings are heterogeneous, potentially due to differing methodologies and sample characteristics. Here, we investigate the cross-cohort replicability of gray matter correlates of childhood maltreatment across large clinical and non-clinical adult samples using harmonized assessment, preprocessing and analysis pipelines.

Methods: Three independent adult cohorts comprising a total of N=3225 individuals (healthy controls [HC]: n=1898; participants with major depressive disorder [MDD]: n=1327) underwent structural MRI and maltreatment assessment via the Childhood Trauma Questionnaire (CTQ). Associations between childhood maltreatment and voxel-based gray matter volume (GMV) were tested at the whole-brain level in two steps: 1) pooling all three cohorts to maximize statistical power (applying a voxel-wise FWE-corrected threshold of p_FWE < .05) and 2) investigating the replicability of effects by assessing the cross-cohort spatial overlap of significant voxels at two liberal uncorrected thresholds (p_unc < .001 and p_unc < .01). Twelve statistical models were tested, varying in maltreatment operationalization, subsamples and covariates.

Results: Pooling cohorts yielded no significant maltreatment-GMV associations when controlling for lifetime MDD diagnosis. Dropping MDD diagnosis as a covariate yielded significant negative effects of maltreatment within widespread clusters across temporal regions, a fusiform-lingual-parahippocampal complex, the thalamus and the orbitofrontal cortex (k=4970, p_FWE < .05). When only HC subsamples were included, small clusters emerged either when using the CTQ sum score (k=99, p_FWE < .05, orbitofrontal) or when investigating severe forms of maltreatment (k=132, p_FWE < .05, cerebellum). The largest effect size when pooling all three cohorts was partial R² = .022.
Replicability analyses using a liberal uncorrected threshold of p_unc < .001 yielded maltreatment-GMV associations within every single cohort and across all statistical models. However, these associations were effectively non-replicable across cohorts, a pattern that was largely consistent across statistical models. Even relaxing the threshold further to p_unc < .01 yielded only marginal replicability across cohorts.

Conclusions: Gray matter correlates of childhood maltreatment, measured with the CTQ, are non-replicable across large cohorts when adequately controlling for depression diagnosis, even when employing harmonized study protocols, lenient statistical thresholds and exploring various maltreatment operationalizations and subgroups. Previous findings may have been inflated by inadequate control for confounding diagnosis effects or by publication bias. Our findings underscore the importance of a paradigm shift towards investigating the replicability of neuroimaging findings.
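
The replicability step described in the Methods (cross-cohort spatial overlap of significant voxels at uncorrected thresholds) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each cohort's result is available as a flat list of voxel-wise uncorrected p-values in a common space. The cohort names, the toy p-values, the helper functions `significant_voxels` and `cross_cohort_overlap`, and the overlap fraction (shared voxels divided by voxels significant in any cohort) are all hypothetical choices made for illustration; the abstract does not specify an overlap metric.

```python
from typing import Dict, List, Set


def significant_voxels(p_values: List[float], threshold: float) -> Set[int]:
    """Return indices of voxels whose uncorrected p-value is below the threshold."""
    return {i for i, p in enumerate(p_values) if p < threshold}


def cross_cohort_overlap(
    p_maps: Dict[str, List[float]], threshold: float
) -> Dict[str, object]:
    """Summarize which voxels are significant in every cohort (needs >= 1 cohort).

    The overlap fraction is an illustrative metric, not one taken from the paper.
    """
    per_cohort = {
        name: significant_voxels(p_values, threshold)
        for name, p_values in p_maps.items()
    }
    shared = set.intersection(*per_cohort.values())  # significant in all cohorts
    any_sig = set.union(*per_cohort.values())        # significant in at least one
    return {
        "n_significant": {name: len(v) for name, v in per_cohort.items()},
        "shared": sorted(shared),
        "overlap_fraction": len(shared) / len(any_sig) if any_sig else 0.0,
    }


# Toy p-value maps for three hypothetical cohorts (six voxels each).
p_maps = {
    "cohort_a": [0.0004, 0.2, 0.0009, 0.5, 0.008, 0.03],
    "cohort_b": [0.3, 0.0002, 0.004, 0.6, 0.002, 0.0005],
    "cohort_c": [0.02, 0.4, 0.006, 0.0001, 0.009, 0.7],
}

for threshold in (0.001, 0.01):
    result = cross_cohort_overlap(p_maps, threshold)
    print(threshold, result["n_significant"], result["shared"])
```

In this toy example every cohort has significant voxels at p_unc < .001, yet none coincide across cohorts, and only two voxels overlap at p_unc < .01. This mirrors the kind of pattern the abstract reports, using made-up numbers only.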