Abstract

Bees possess remarkable cognitive abilities for on-the-fly visual learning, making them an ideal model for studying active information acquisition and representation. In this study, we investigated the minimal circuitry required for active vision in bees by considering their flight behaviours during visual pattern scanning. Using a neural network model inspired by the insect visual system, we examined how scanning behaviour shapes optic lobe connectivity and neural activity. By incorporating non-associative learning and exposing the model to diverse natural images, we obtained results consistent with neurobiological observations. Our findings reveal that active scanning and non-associative learning dynamically shape the connectivity within the visual lobe, yielding an efficient representation of the visual input. Notably, orientation-selective neurons self-organized in the lobula region, characterized by sparse responses to orthogonal bar movements. These dynamic orientation-selective cells cover a range of orientations, with tuning biased by the speed and contrast of the sampled input. To assess the effectiveness of this spatiotemporal coding for pattern recognition, we integrated our model with the mushroom body circuitry that underlies associative learning. The model performed strongly across several pattern recognition tasks, suggesting that a similar coding scheme operates within the bee visual system. Overall, this study integrates behavioural experiments, neurobiological findings, and computational modelling to reveal how complex visual features can be condensed through spatiotemporal encoding in the lobula neurons, facilitating efficient sampling of visual cues for identifying rewarding foraging resources. Our findings have broader implications for understanding active vision in diverse animals, including humans, and offer valuable insights for applying bio-inspired principles to the design of autonomous robots.
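As a loose illustration of the non-associative learning described above, the sketch below trains a single model unit with Oja's Hebbian rule on simulated scanning input (a small patch swept across oriented gratings) and then probes its orientation tuning. The patch size, frame count, learning rate, stimulus statistics, and the choice of Oja's rule are all illustrative assumptions, not the paper's actual architecture or plasticity rule.

```python
import numpy as np

# Minimal sketch, not the authors' published model: one unit trained
# with Oja's rule (Hebbian growth with implicit weight normalisation)
# on simulated "scanning" input. All parameters here are assumptions.

rng = np.random.default_rng(0)
P = 8        # patch edge length in pixels (assumed)
T = 3        # time frames stacked per sample -> spatiotemporal input
ETA = 0.005  # learning rate (assumed)

def scanning_sample(theta, speed=1.0):
    """T frames of a patch swept across an oriented grating, flattened."""
    y, x = np.mgrid[0:P, 0:P]
    phase0 = rng.uniform(0.0, 2.0 * np.pi)
    frames = []
    for t in range(T):
        phase = phase0 + speed * t  # lateral scan advances the phase
        frames.append(np.sin(0.8 * (x * np.cos(theta)
                                    + y * np.sin(theta)) + phase).ravel())
    v = np.concatenate(frames)
    v -= v.mean()
    return v / (np.linalg.norm(v) + 1e-9)

# Non-associative learning: dw = eta * r * (v - r * w), with r = w . v
w = rng.normal(scale=0.1, size=P * P * T)
for _ in range(20000):
    v = scanning_sample(rng.uniform(0.0, np.pi))  # random scan orientation
    r = w @ v
    w += ETA * r * (v - r * w)

# Probe tuning: the trained unit responds unevenly across orientations,
# i.e. it has self-organised into an orientation-selective filter.
for deg in (0, 45, 90, 135):
    resp = np.mean([abs(w @ scanning_sample(np.deg2rad(deg)))
                    for _ in range(200)])
    print(f"{deg:3d} deg: mean |response| = {resp:.3f}")
```

Because each sample stacks several time frames, the learned weight vector is a spatiotemporal filter, so its response depends on scan speed as well as orientation, echoing the speed and contrast bias noted in the abstract.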