Imaging through scattering media is a long-standing challenge across many fields. Recently, the optronic fully convolutional neural network (OP-FCNN), an opto-electronic deep learning (DL) method built from cascaded optical convolutional layers, has demonstrated feasible speckle reconstruction and thus its relevance to imaging through scattering media. However, achieving higher performance with OP-FCNN requires cascading more optical layers, which makes compact system design difficult. Here, we propose the optronic Transformer (OPT), a compact opto-electronic DL structure that we show to be effective for imaging through scattering media. The core of OPT uses spatial light modulators (SLMs) and optical micro-lens array shift systems to implement the attention mechanism optically, so that global information is captured without cascading many optical layers. In contrast to the complex optronic designs of existing OP-FCNN, OPT achieves its imaging performance with a single-head attention module, significantly reducing both the computational complexity and the scale of the optical system. Averaged over handwritten digits and letters, OPT achieves a Jaccard index (JI) of 0.80, a Pearson correlation coefficient (PCC) of 0.88, a structural similarity index (SSIM) of 0.82, and a peak signal-to-noise ratio (PSNR) of 15.29 dB. These results indicate that OPT combines strong reconstruction quality with low system complexity, making it a promising opto-electronic approach to compact, high-performance DL imaging systems.
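For readers unfamiliar with the reported metrics, the following is a minimal NumPy sketch of how JI, PCC, and PSNR are commonly computed between a reconstructed image and its ground truth. It is illustrative only and is not taken from the OPT evaluation code: the function names, the binarization threshold of 0.5 used for JI, and the assumption that images are normalized to [0, 1] (peak value 1.0) are all assumptions made here.

```python
import numpy as np


def jaccard_index(pred, target, threshold=0.5):
    """Jaccard index (intersection over union) of the binarized images.

    The threshold is an assumed value, not one stated in the source.
    """
    p = pred >= threshold
    t = target >= threshold
    union = np.logical_or(p, t).sum()
    if union == 0:
        # Both masks are empty: treat them as identical.
        return 1.0
    return float(np.logical_and(p, t).sum() / union)


def pearson_cc(pred, target):
    """Pearson correlation coefficient between the two pixel arrays."""
    p = pred.ravel() - pred.mean()
    t = target.ravel() - target.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    # A constant image has zero variance; return 0.0 by convention.
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def psnr_db(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming pixel values in [0, peak]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


if __name__ == "__main__":
    # Toy 2x2 example: the reconstruction misses one foreground pixel.
    target = np.array([[0.0, 1.0], [1.0, 0.0]])
    pred = np.array([[0.0, 1.0], [0.0, 0.0]])
    print("JI  :", jaccard_index(pred, target))
    print("PCC :", pearson_cc(pred, target))
    print("PSNR:", psnr_db(pred, target), "dB")
```

SSIM is omitted from the sketch because it involves local windowed statistics; in practice it is usually taken from an existing library, for example `skimage.metrics.structural_similarity` in scikit-image.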