Tropical and subtropical Asia is the world's major rice-producing region, but complex cropping systems and diverse topography make accurate monitoring of rice cultivation challenging. To address this difficulty, this study proposes a new deep learning model, ESF-Seg, that extracts the annual tropical rice distribution from monthly averaged Sentinel-1 VH time series. ESF-Seg adopts the Efficient Adaptive Sparse Transformer (EAT) to remove redundant information from the input features, and introduces the Channel Attention Bridge Block (CAB) and Spatial Attention Bridge Block (SAB) modules to refine the information. In addition, the FreqFusion-KAN (FreqK) module reduces information loss through a multi-scale feature fusion strategy. The proposed method is evaluated in Hainan Province, China, an important tropical arable zone with diverse crop resources and complicated croplands. First, ablation experiments are conducted: compared to the classical SegFormer model, ESF-Seg improves the mean intersection over union (mIoU) by 4.99% and the mean pixel accuracy (mPA) by 2.65%. Subsequently, compared to the random forest (RF), U-Net, and original SegFormer models, ESF-Seg increases the overall accuracy (OA) on the validation samples by 11.02%, 2.01%, and 1.33%, and the F1 score by 0.0756, 0.0624, and 0.0490, respectively, reaching an OA of 98.31% and an F1 score of 0.9506. Furthermore, annual rice distribution products for Hainan from 2019 to 2023 are generated; they agree well with the statistical data, with an RMSE of 5.4004 kha, outperforming other existing products. As the rice mapping products indicate, the proposed method preserves the integrity of rice parcels in fragmented croplands, thus providing a new opportunity for continuous, high-accuracy monitoring of tropical rice distribution.
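The accuracy measures reported above (OA, F1 score, mIoU, mPA, and RMSE) can be illustrated with a minimal sketch. It assumes the commonly used confusion-matrix definitions, with mPA taken as the mean of the per-class pixel accuracies; the exact definitions used in the study are not stated here, and the function names `segmentation_metrics` and `rmse` as well as the toy labels are hypothetical, chosen only for illustration.

```python
import math


def segmentation_metrics(y_true, y_pred, num_classes=2, positive=1):
    """Return OA, F1 (positive class), mIoU and mPA for flat label sequences.

    Assumed definitions (not taken from the study):
      OA   = correctly labelled pixels / all pixels
      IoU  = TP / (TP + FP + FN) per class, averaged to mIoU
      PA   = TP / (TP + FN) per class, averaged to mPA
      F1   = 2*TP / (2*TP + FP + FN) for the positive (rice) class
    """
    # Confusion matrix: rows are reference labels, columns are predictions.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1

    total = sum(sum(row) for row in cm)
    oa = sum(cm[c][c] for c in range(num_classes)) / total

    def counts(c):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(num_classes)) - tp
        return tp, fp, fn

    ious, accs = [], []
    for c in range(num_classes):
        tp, fp, fn = counts(c)
        if tp + fp + fn:          # skip classes absent from both maps
            ious.append(tp / (tp + fp + fn))
        if tp + fn:               # skip classes absent from the reference
            accs.append(tp / (tp + fn))

    tp, fp, fn = counts(positive)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    return {
        "OA": oa,
        "F1": f1,
        "mIoU": sum(ious) / len(ious),
        "mPA": sum(accs) / len(accs),
    }


def rmse(estimates, references):
    """Root-mean-square error, e.g. mapped vs. statistical rice area in kha."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy labels only: 1 = rice, 0 = non-rice.
    reference = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 1]
    print(segmentation_metrics(reference, predicted))
    print(rmse([2.0, 2.0], [0.0, 0.0]))
```

Because the class averages in mIoU and mPA weight rice and non-rice equally, they are more sensitive than OA to errors in the minority rice class, which is one reason such measures are often reported together.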