Abstract

The mechanisms by which humans perceptually organise individual regions of a visual scene to generate a coherent scene representation remain largely unknown. Our perception of statistical regularities has been relatively well studied in simple stimuli, and explicit computational mechanisms that use low-level image features (e.g., luminance, contrast energy) to explain these perceptions have been described. Here, we investigate the extent to which observers can use such low-level information, present in isolated naturalistic scene regions, to associate those regions with one another. Across two experiments, participants were shown an isolated standard patch and were then required to select which of two subsequently presented patches came from the same scene as the standard (a two-alternative forced-choice task, 2AFC). In Experiment 1, participants consistently performed above chance when making these association judgements. Additionally, participants' responses were well predicted by a generalised linear multilevel model (GLMM) employing predictors based on low-level feature similarity metrics (specifically, pixel-wise luminance correlations and phase-invariant structure correlations). In Experiment 2, participants were presented with thresholded image regions, or with regions reduced to only their edge content. Their performance was significantly poorer than when they viewed unaltered image regions. Nonetheless, the model still correlated well with participants' judgements. Our findings suggest that image region associations can be reduced to low-level feature correlations, providing evidence for the contribution of such basic features to judgements made on complex visual stimuli.
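As an illustration of the kind of low-level similarity metric referred to above, the following is a minimal sketch of a pixel-wise luminance correlation between two patches, together with a simple 2AFC decision rule based on it. It is not the GLMM used in the study: the function names `luminance_correlation` and `choose_2afc` are hypothetical, patches are assumed to be equal-sized greyscale arrays, the deterministic "pick the more correlated patch" rule is a stand-in for the fitted model, and the phase-invariant structure metric is not implemented.

```python
import numpy as np


def luminance_correlation(patch_a, patch_b):
    """Pearson correlation between the pixel luminances of two equal-sized patches."""
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("patches must contain the same number of pixels")
    # Remove the mean luminance so only the spatial pattern is compared.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    # A uniform patch has zero variance; treat its correlation as 0.
    return float(a @ b / denom) if denom > 0 else 0.0


def choose_2afc(standard, option_1, option_2):
    """Return 0 or 1: the index of the option more correlated with the standard.

    Assumption: a deterministic rule used for illustration only.
    """
    r1 = luminance_correlation(standard, option_1)
    r2 = luminance_correlation(standard, option_2)
    return 0 if r1 >= r2 else 1


if __name__ == "__main__":
    # Synthetic patches (not real scene data): one noisy copy, one unrelated.
    rng = np.random.default_rng(0)
    standard = rng.random((16, 16))
    same_scene = standard + 0.1 * rng.standard_normal((16, 16))
    other_scene = rng.random((16, 16))
    print(choose_2afc(standard, same_scene, other_scene))
```

In a model closer to the one described, such correlations would presumably enter as trial-level predictors of the participant's choice rather than determining it outright.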