Genome-wide selection (GS) is a contemporary breeding methodology that exploits molecular markers distributed across the entire genome. However, challenges such as a lack of informative molecular markers and the difficulty of selecting appropriate and efficient GS model(s) have confined most GS-based breeding efforts to laboratory simulations (Wang et al., 2023). Compared with conventional prediction models, machine learning (ML) algorithms offer new ways to address challenges such as big data analysis and high-performance parallel computing. At the current stage, however, GS using ML still has limitations, notably in model selection.

Here, we present MFMGP, a fusion model built on a variety of ML training methods. The normalization fusion method with exponential decay weights assigns a weight to the prediction results of each model and applies exponential decay to these weights, so that more recent and/or more relevant model predictions receive higher weights. The weights are then normalized, and a weighted average of the models' prediction results is calculated to obtain the final fusion prediction (Figure 1a). The MFMGP software for interactive GS analyses is available at http://www.biohuaxing.com/#/MFMGP.

To verify the prediction accuracy of the MFMGP model, we compared MFMGP with seven commonly used GS models: the classical GS model (GBLUP), four ML-based models (LightGBM, SVR, XGBoost and HGBoost) and two deep learning (DL)-based models (DNNGP and DeepCCR). In rice, we used a natural population of 3024 Asian cultivated rice accessions (3KRG) to construct the training population (Table S1). The GS accuracy of MFMGP was evaluated using the phenotype datasets of 2110 rice accessions for 13 yield-related and morphological traits with over 1.0 M SNPs (Figure 1b,c; Table S2).
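The comparisons in this study are scored by prediction accuracy and RMSE under 10-fold cross-validation. The following is a minimal sketch of such an evaluation loop, not the MFMGP implementation. It assumes that prediction accuracy is the Pearson correlation between observed and predicted phenotypes (a common convention in GS that is not stated explicitly here), uses ridge regression as a stand-in predictor, and runs on simulated genotypes; all function names are hypothetical.

```python
import numpy as np


def pearson_accuracy(y_true, y_pred):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Stand-in predictor (ridge regression), not one of the paper's models."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # Solve (Xc'Xc + alpha*I) beta = Xc'(y - mean(y)) for the marker effects.
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ beta + y_mean


def kfold_evaluate(X, y, k=10, seed=0):
    """Mean accuracy and mean RMSE over k cross-validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs, errs = [], []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        y_hat = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx])
        accs.append(pearson_accuracy(y[test_idx], y_hat))
        errs.append(rmse(y[test_idx], y_hat))
    return float(np.mean(accs)), float(np.mean(errs))


# Simulated data: 200 individuals, 50 markers coded 0/1/2 (made-up values).
rng = np.random.default_rng(42)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X @ rng.normal(size=50) + rng.normal(scale=2.0, size=200)
mean_acc, mean_rmse = kfold_evaluate(X, y, k=10)
```

Replacing `ridge_fit_predict` with any other model that follows the same fit-then-predict signature would allow several models to be scored on identical folds, which is what makes the per-trait comparisons meaningful.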
The results of 10-fold cross-validation (CV) indicated that MFMGP exhibited the highest prediction accuracy for all 13 tested traits, with an average accuracy of 0.53, significantly (P < 0.01) higher than that of the GBLUP model (average value = 0.36). The prediction accuracy of MFMGP was also significantly higher than the averages of the four ML models (average value = 0.45) and the two DL methods (average value = 0.34) (Tables S2 and S3). On average, the prediction accuracy of MFMGP was 52.9% higher than that of GBLUP, 18.4% higher than that of all the other ML models, 4.2% higher than that of the best of the four integrated ML methods and 73.3% higher than that of the DL models. Additionally, MFMGP had the smallest root mean square error (RMSE) for all 13 traits, with an average RMSE reduction of 11.1% relative to GBLUP, 5.8% relative to the ML models and 24.3% relative to the DL models (Tables S2 and S4). With a sample size of 2110, the computation time of MFMGP on CPU (server configuration: Intel Xeon(R) CPU E7-8860 v3 @ 2.20 GHz) was slightly longer than that of the four tested ML models, but significantly shorter than that of the GBLUP method and the DL methods (run on GPU) (Table S5).

We then used six traits from the 2000 Iranian bread wheat dataset to compare the prediction accuracy of the eight models using 33 709 SNPs (Figure 1d; Table S2). The average prediction accuracy of MFMGP across the six traits was 0.65, compared with 0.32 for GBLUP, 0.59 for DeepCCR, 0.57 for DNNGP, 0.63 for HGBoost, 0.63 for LightGBM, 0.28 for SVR and 0.62 for XGBoost. On average, the prediction accuracy of MFMGP was 2.9% higher than that of the best of the four integrated ML methods.

Using 1 122 352 SNPs and four traits from 1245 cotton accessions, MFMGP showed the highest prediction accuracy and the lowest RMSE values among all methods (Figure 1e; Table S2).
On average, MFMGP improved prediction accuracy by 12.1% and reduced RMSE by 21.9% for the four traits compared with the other seven methods, and improved prediction accuracy by 3.5% compared with the four integrated ML methods.

Using 32 599 markers and four traits of 6210 maize samples, MFMGP showed an average prediction accuracy of 0.85, again the highest among the eight methods, except for DTT, for which its prediction accuracy was similar to that of SVR (Figure 1f; Table S2).

To explore the predictive ability of MFMGP in animals, we used the IMF content phenotype and 39 614 markers of 1490 pig samples to compare the predictions of the eight methods (Figure 1g; Table S2). MFMGP performed best among all the methods, with an average improvement in prediction accuracy of 24.5% over GBLUP, 57.6% over the ML models, 16.2% over the best of the four integrated ML methods and 18.5% over the DL models.

To investigate the impact of trait heritability, we compared the low-heritability trait RBSSD (H² = 0.38) with the high-heritability traits GL (H² = 0.94) and GW (H² = 0.94) using MFMGP. We used the RBSSD phenotypic data from 2017 as the training population (n = 1277) to predict phenotypes in two independent environments, yielding prediction accuracies of 0.36 in 2016 (n = 606) and 0.34 in 2019 (n = 676). In contrast, when we used GL and GW from 2017 to predict their phenotypic values in 2015 and 2016 (n = 760), the prediction accuracies of GL and GW reached very high average values of 0.91 and 0.92, respectively (Figure 1h). All four density plots showed that the angles between the line y = x and the fitted regression line were very small in the repeated experiments across different environments (Figure S1).

To verify the influence of subspecific differences on GS accuracy, we randomly selected two subgroups with the same number of accessions (n = 500) from Xian and Geng.
We used MFMGP to analyse two representative traits (GW and HD) and found that the prediction accuracy of Geng was higher than that of Xian for GW, whereas the opposite was true for HD. Additionally, we used the Xian subgroup as the training population to predict the phenotypes of the Geng subgroup, and the Geng subgroup as the training population to predict those of the Xian subgroup. The results showed that the prediction accuracy of one subgroup for the other was extremely low (Figure S2A). The same caution should be taken when GS is applied to breeding for disease resistance. As Figure S2B clearly demonstrates, the highly virulent race (V) had a much higher prediction accuracy than the weakly virulent races C4 and C5.

To verify the impact of population size on GS, we randomly sampled accessions at nine different population sizes for GS. The results showed that the prediction accuracy of the trait improved gradually as the population size increased (Figure 1i).

In summary, we developed an ML fusion model for predicting the phenotypes of breeding populations for complex traits using GS. Compared with other methods, MFMGP was shown to have the following advantages. (1) Improved prediction accuracy: MFMGP was able to integrate the strengths of many classical models and reduce the biases associated with single classical models. (2) Reduced overfitting: MFMGP was able to mitigate the overfitting of training data commonly encountered with single models. (3) Enhanced generalization ability: MFMGP could better capture the complex patterns and diversity in the data. (4) Robustness to errors: by synthesizing the predictions of multiple models, MFMGP could effectively reduce the prediction errors that single models make because of anomalies or specific circumstances. (5) Exploitation of model complementarity.
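The fusion step behind these advantages, the normalization fusion method with exponential decay weights, can be illustrated with a minimal sketch. The exact weighting formula is not given in the text, so the version below rests on assumptions: models are ranked by a validation score, the raw weight of a model decays exponentially with its rank, and the decay rate `decay` is a hypothetical tuning parameter. The function names and the toy numbers are illustrative only.

```python
import numpy as np


def exp_decay_weights(scores, decay=0.5):
    """Return normalized weights that decay exponentially with model rank.

    scores: validation score of each model (higher is better).
    decay:  hypothetical decay rate; 0 gives equal weights.
    """
    scores = np.asarray(scores, dtype=float)
    # Rank 0 = best model, rank 1 = second best, and so on.
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(len(scores))
    raw = np.exp(-decay * ranks)
    # Normalize so that the weights sum to 1.
    return raw / raw.sum()


def fuse_predictions(predictions, weights):
    """Weighted average of per-model predictions.

    predictions: array of shape (n_models, n_samples).
    weights:     array of shape (n_models,), summing to 1.
    """
    predictions = np.asarray(predictions, dtype=float)
    return np.asarray(weights, dtype=float) @ predictions


# Toy example: three base models, four individuals (made-up numbers).
val_scores = [0.45, 0.50, 0.40]
preds = [[1.0, 2.0, 3.0, 4.0],
         [1.2, 1.8, 3.1, 3.9],
         [0.8, 2.4, 2.7, 4.3]]
weights = exp_decay_weights(val_scores, decay=0.5)
fused = fuse_predictions(preds, weights)
```

Because the weights are non-negative and sum to 1, each fused value lies between the lowest and highest base-model prediction for that individual, so a single outlying model cannot dominate the result.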
Currently, most GS experiments focus on predicting the performance of single traits in specific populations and specific environments, neglecting the fact that most plant and animal breeding programmes aim to improve multiple target traits across target environments (particularly in plants). The most significant factors affecting predictive accuracy are heritability and sample size. Heritability is the key parameter of the genotype–phenotype association: the higher a trait's heritability, the more accurate a GS model will be. Conversely, low heritability leads to lower prediction accuracy. An insufficient sample size reduces the representativeness of the training population because of increased sampling error, resulting in biased estimates of genetic parameters and reduced prediction accuracy. Thus, it is necessary to collect more phenotypes from training populations of appropriate sizes across multiple target environments, so that trait genetic effects and their interactions with environments can be adequately estimated and integrated into the MFMGP model. As functional and population genomic research in plants and animals progresses rapidly, the greatest challenge is how to integrate accurate functional information on many genes and their allelic effects on target traits into the MFMGP model for GS applications in plant and animal breeding, eventually realizing breeding by design.

This work was supported by the National Natural Science Foundation of China (U21A20214), the Natural Science Foundation of Anhui Province (2308085QC91), the National Natural Science Foundation of China (32301783 and 32101768), the Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-CSIAF-202303) and the Nanfan special project, CAAS (YYLH2309, YBXM2322, YYLH2401).

The authors declare no conflicts of interest.

Z.L. and S.J. designed the experiments. J.H., S.J., W.W., F.Z., E.L. and Y.S. provided the phenotype data and performed the statistical analysis. C.Z., Q.L., Y.Y., F.L., Z.X. and F.L.
performed the bioinformatic analyses. C.Z., M.L. and Z.L. wrote the manuscript.

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Tables S1–S5 Supplementary Tables. Figures S1–S3 Supplementary Figures.