Abstract

Visual perception plays a critical role in navigating 3D space and extracting semantic information essential to survival. Even though visual stimulation on the retina is fundamentally 2D, we effortlessly perceive the world around us in vivid 3D. This reconstructed 3D space is allocentric and faithfully represents the external 3D world. How can we recreate a stable 3D visual space so promptly and reliably? To solve this mystery, we have developed the new concepts of MePMoS (Memory-Prediction-Motion-Sensing) and NHT (Neural Holography Tomography). These models state that visual signal processing must be primarily top-down, starting from memory and prediction. Our brains predict and construct the expected 3D space holographically using traveling alpha brainwaves. Thus, 3D space is represented by three time signals in three directions. To test this hypothesis, we designed reaction time (RT) experiments to observe the predicted space-to-time conversion, especially as a function of distance. We placed LED strips on a horizontal plane to cover distances from close range up to 2.5 m or 5 m, arranged as either a 1D or a 2D lattice. Participants were instructed to report the observed LED patterns at various distances as promptly as possible. As expected, stimulation at the fixation cue location always gave the fastest RT, and additional RT delays were proportional to the distance from the cue. Furthermore, covert attention (without eye movements) and overt attention (with eye movements) produced the same RT delays, and binocular and monocular viewing yielded the same RTs. These findings strongly support our predictions: the observed RT-depth dependence reflects the spatiotemporal conversion required to construct allocentric 3D space. After all, we perceive and measure 3D space by time, as Einstein postulated a century ago.
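
The proportionality between RT delay and cue distance reported above can be summarized by a simple linear model; the following is a minimal sketch in our own notation (the symbols RT_0, d, and v are not defined in the abstract and are introduced here only for illustration):

\[
  \mathrm{RT}(d) \;=\; \mathrm{RT}_0 + \frac{d}{v},
  \qquad
  \Delta\mathrm{RT}(d) \;=\; \mathrm{RT}(d) - \mathrm{RT}_0 \;\propto\; d,
\]

where RT_0 is the reaction time for stimulation at the fixation cue, d is the distance of the stimulus from the cue, and v is an effective conversion speed relating distance to the added time, consistent with the hypothesized space-to-time conversion.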