In this letter, a novel multi-level feature interaction transformer network (MFITN) is proposed for pansharpening, i.e., the fusion of multispectral (MS) and panchromatic (PAN) images, in order to better exploit the complementary advantages of features at different levels and to improve the feature extraction ability of the network. In MFITN, a multi-level feature interaction transformer encoding module is designed to extract and correct global multi-level features while accounting for the modality difference between the source images. These features are then fused by the proposed multi-level feature mixing (MFM) operation, which lets the features interact during fusion to obtain richer information. The global features are then fed into a CNN-based local decoding module to better reconstruct high-spatial-resolution multispectral (HRMS) images. Additionally, based on the spatial consistency between MS and PAN images, a band compression loss is defined to improve the fidelity of the fused images. Extensive simulated and real-data experiments demonstrate that the proposed method outperforms state-of-the-art methods. Specifically, it improves the SAM metric by 7.89% and 6.41% over the second-best method on Pléiades and WorldView-3 data, respectively.
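For readers unfamiliar with the SAM metric quoted above, the following is a minimal NumPy sketch of the spectral angle mapper as it is commonly defined: the mean angle, in degrees, between corresponding per-pixel spectral vectors of a reference image and a fused image, where lower is better. It is not taken from the paper. The function names `spectral_angle_mapper` and `relative_improvement`, the `(H, W, B)` array layout, and the reading of the reported percentages as a relative reduction of a lower-is-better score are all assumptions made for illustration.

```python
import numpy as np


def spectral_angle_mapper(reference, fused, eps=1e-12):
    """Mean spectral angle in degrees between two (H, W, B) images.

    Hypothetical helper: the (H, W, B) layout and averaging over all
    pixels are assumptions, not the paper's stated protocol.
    """
    ref = np.asarray(reference, dtype=np.float64)
    fus = np.asarray(fused, dtype=np.float64)
    if ref.shape != fus.shape:
        raise ValueError("reference and fused images must have the same shape")

    # Per-pixel dot product and product of norms along the band axis.
    dot = np.sum(ref * fus, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(fus, axis=-1)

    # Guard against division by zero, then clip rounding noise so that
    # arccos always receives a value in [-1, 1].
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.degrees(np.mean(np.arccos(cos))))


def relative_improvement(baseline, proposed):
    """Percentage reduction of a lower-is-better metric (assumed reading)."""
    return 100.0 * (baseline - proposed) / baseline
```

Under these assumptions, a fused image identical to the reference gives an angle of approximately zero, and a drop in SAM from 2.0 to 1.5 would count as a 25% improvement.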