The classic view that neural populations in sensory cortices preferentially encode responses to incoming stimuli has been strongly challenged by recent experimental studies. Although a large fraction of the variance of visual responses in rodents can be attributed to behavioral state and movements, trial history, and salience, the effects of contextual modulations and expectations on sensory-evoked responses in visual and association areas remain elusive. Here, we present a comprehensive experimental and theoretical study showing that hierarchically connected visual and association areas differentially encode the temporal context and expectation of naturalistic visual stimuli, consistent with the theory of hierarchical predictive coding. We measured neural responses to expected and unexpected sequences of natural scenes in the primary visual cortex (V1), the posterior medial higher-order visual area (PM), and the retrosplenial cortex (RSP) using two-photon imaging in behaving mice, with data collected through the Allen Institute Mindscope's OpenScope program. We found that information about image identity in neural population activity depended on the temporal context of transitions preceding each scene, and decreased along the hierarchy. Furthermore, our analyses revealed that the conjunctive encoding of temporal context and image identity was modulated by expectations of sequential events. In V1 and PM, we found enhanced and specific responses to unexpected oddball images, signaling stimulus-specific expectation violation. In contrast, in RSP the population response to oddball presentation recapitulated the missing expected image rather than the oddball image. These differential responses along the hierarchy are consistent with classic theories of hierarchical predictive coding, whereby higher areas encode predictions and lower areas encode deviations from expectation. We further found evidence for drift in visual responses on the timescale of minutes.
Although activity drift was present in all areas, population responses in V1 and PM, but not in RSP, maintained stable encoding of visual information and representational geometry. Instead, we found that RSP drift was independent of stimulus information, suggesting a role in generating an internal model of the environment in the temporal domain. Overall, our results establish temporal context and expectation as substantial encoding dimensions in the visual cortex that are subject to fast representational drift, and suggest that hierarchically connected areas instantiate a predictive coding mechanism.