Studying factors that contribute to scene memorability is important for understanding human vision and memory. Here we demonstrated in two different eye-tracking datasets that the higher the fixation map consistency (also called inter-observer congruency of fixation maps) of a scene, the higher its memorability. Fixation map consistency and, more importantly, its correlation with scene memorability were highest in the first 2 s of viewing, suggesting that scene features (other than center bias) that produce more consistent fixation maps early in viewing may also be important for scene encoding. We also found that although fixation count was positively correlated with scene memorability, it was not significantly correlated with fixation map consistency, suggesting that these two eye-tracking measures reflect different attentional mechanisms. Using proxies of scene semantics and mediation analyses, we found that the relationship between scene semantics and scene memorability was partially (but not fully) mediated by attentional mechanisms. Finally, we found that fixation map consistency, fixation count, and scene semantics each made a significant and distinct contribution to scene memorability. Together, these results suggest that 2 s of eye-tracking data can complement computer vision-based algorithms to better predict scene memorability.
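To make the central measure concrete, the following is a minimal sketch of one way fixation map consistency could be computed: observers are repeatedly split into two random halves, a smoothed fixation map is built for each half, and the two maps are correlated. This split-half scheme is an assumption for illustration and not necessarily the procedure used in the study; the function names, the smoothing width `sigma`, the number of splits, and the synthetic fixations are all hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def fixation_map(fixations, shape, sigma=2.0):
    """Accumulate (row, col) fixations on a grid and blur with a Gaussian.

    Assumes each image dimension is larger than the kernel length.
    """
    m = np.zeros(shape, dtype=float)
    for r, c in fixations:
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            m[int(r), int(c)] += 1.0
    k = gaussian_kernel(sigma)
    # Separable blur: convolve along rows, then along columns.
    m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, m)
    m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, m)
    return m

def fixation_map_consistency(per_observer, shape, n_splits=20, sigma=2.0, seed=0):
    """Mean Pearson correlation between fixation maps of random observer halves."""
    rng = np.random.default_rng(seed)
    n = len(per_observer)
    scores = []
    for _ in range(n_splits):
        order = rng.permutation(n)
        half_a, half_b = order[: n // 2], order[n // 2:]
        map_a = fixation_map([f for i in half_a for f in per_observer[i]], shape, sigma)
        map_b = fixation_map([f for i in half_b for f in per_observer[i]], shape, sigma)
        scores.append(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])
    return float(np.mean(scores))

# Synthetic data for illustration only: 10 observers, 5 fixations each.
rng = np.random.default_rng(1)
shape = (32, 32)
clustered = [rng.normal(16, 1, size=(5, 2)) for _ in range(10)]   # everyone looks at one spot
scattered = [rng.uniform(0, 32, size=(5, 2)) for _ in range(10)]  # everyone looks elsewhere

c_clustered = fixation_map_consistency(clustered, shape)
c_scattered = fixation_map_consistency(scattered, shape)
print(c_clustered, c_scattered)
```

In this toy setup the scene where observers converge on the same region yields a higher consistency score than the scene where fixations are scattered. Restricting the input to fixations from the first 2 s of viewing would give the early-viewing variant of the measure discussed above.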