Abstract The hippocampus and ventromedial prefrontal cortex (vmPFC) play key roles in numerous cognitive domains including mind-wandering, episodic memory and imagining the future. Perspectives differ on precisely how they support these diverse functions, but there is general agreement that it involves constructing representations comprised of numerous elements. Visual scenes have been deployed extensively in cognitive neuroscience because they are paradigmatic multi-element stimuli. However, it remains unclear whether scenes, rather than other types of multi-feature stimuli, preferentially engage hippocampus and vmPFC. Here we leveraged the high temporal resolution of magnetoencephalography to test participants as they gradually built scene imagery from three successive auditorily-presented object descriptions and an imagined 3D space. This was contrasted with constructing mental images of non-scene arrays that were composed of three objects and an imagined 2D space. The scene and array stimuli were, therefore, highly matched, and this paradigm permitted a closer examination of step-by-step mental construction than has been undertaken previously. We observed modulation of theta power in our two regions of interest – anterior hippocampus during the initial stage, and in vmPFC during the first two stages, of scene relative to array construction. Moreover, the scene-specific anterior hippocampal activity during the first construction stage was driven by the vmPFC, with mutual entrainment between the two brain regions thereafter. These findings suggest that hippocampal and vmPFC neural activity is especially tuned to scene representations during the earliest stage of their formation, with implications for theories of how these brain areas enable cognitive functions such as episodic memory.