Infrared and visible image fusion (IVIF) aims to fuse images of these two modalities into a single image with rich textures and clear targets. Most existing deep learning-based fusion methods directly fuse the features of the two modalities without fully considering their modality-specific attributes, which biases the fused image toward the features of one modality. In this paper, a two-stage feature transfer and supplement fusion network (FTSFN) is proposed for IVIF. In the first stage, a feature transfer network (FTN) is proposed to reduce the domain gap between the two modalities by transferring modal features from one to the other. Based on the constructed FTN and the input images, two networks, FTN_ir and FTN_vis, are pre-trained to obtain the optimized infrared and visible features. In the second stage, a feature supplement fusion network (FSFN) is built from two network branches with shared weights to fuse the optimized features. In the FSFN, two feature supplement modules, the intensity-based feature supplement module (IFSM) and the gradient-based feature supplement module (GFSM), are designed to complement the intensity and texture information of the two optimized features. In addition, to better train the FTNs and the FTSFN, different loss functions are defined by exploiting the domain features of the source images. Extensive experiments on widely used fusion datasets verify the effectiveness and superiority of the proposed FTSFN in terms of both subjective perception and objective evaluation. Specifically, the proposed method obtains fused images with better contrast and saliency information than other methods, and it improves the mutual information (MI) metric by 33.3%, 10.0%, and 11.6% over the second-best comparison approach on the TNO, INO, and RoadScene datasets, respectively.
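As a rough illustration of the two-stage pipeline summarized above, the sketch below shows how pre-trained per-modality feature transfer networks could feed a shared-weight fusion branch with intensity- and gradient-based feature supplements. It is a minimal sketch assuming a PyTorch implementation; all module definitions, layer widths, and the fusion rule are hypothetical placeholders rather than the paper's actual architecture.

```python
import torch
import torch.nn as nn

class FTN(nn.Module):
    """Hypothetical feature transfer network: encodes one modality while
    narrowing the domain gap to the other (details are placeholders)."""
    def __init__(self, channels=32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.encode(x)

class SupplementBranch(nn.Module):
    """Hypothetical shared-weight branch with intensity-based (IFSM) and
    gradient-based (GFSM) feature supplements, here reduced to 1x1 convs."""
    def __init__(self, channels=32):
        super().__init__()
        self.ifsm = nn.Conv2d(channels, channels, 1)  # stands in for IFSM
        self.gfsm = nn.Conv2d(channels, channels, 1)  # stands in for GFSM

    def forward(self, own_feat, other_feat):
        # Supplement one modality's features with intensity/texture cues
        # drawn from the other modality's features.
        return own_feat + self.ifsm(other_feat) + self.gfsm(other_feat)

class FTSFN(nn.Module):
    """Sketch of the two-stage FTSFN: pre-trained FTN_ir/FTN_vis (stage 1)
    followed by a shared-weight feature supplement fusion network and a
    simple reconstruction head (stage 2)."""
    def __init__(self, channels=32):
        super().__init__()
        self.ftn_ir, self.ftn_vis = FTN(channels), FTN(channels)  # stage 1
        self.branch = SupplementBranch(channels)  # stage 2, weights shared across modalities
        self.decode = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, ir, vis):
        f_ir, f_vis = self.ftn_ir(ir), self.ftn_vis(vis)
        fused = self.branch(f_ir, f_vis) + self.branch(f_vis, f_ir)
        return torch.sigmoid(self.decode(fused))

# Usage on a dummy grayscale infrared/visible pair.
ir = torch.rand(1, 1, 128, 128)
vis = torch.rand(1, 1, 128, 128)
print(FTSFN()(ir, vis).shape)  # torch.Size([1, 1, 128, 128])
```

In this sketch the shared-weight branch is applied symmetrically to both modalities and the results are summed; the paper's actual fusion rule, module internals, and loss functions are defined in the main text.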