ABSTRACT Recent literature suggests machine learning methods can capture interactions between loci and therefore could outperform linear models when predicting traits with relevant epistatic effects. However, investigating this empirically requires data with high mapping resolution and phenotypes for traits with known non-additive gene action. The objective of the present study was to compare the performance of linear methods (GBLUP, BayesB and elastic net [ENET]) with that of a non-parametric tree-based ensemble method (gradient boosting machine, GBM) for genomic prediction of complex traits in mice. The dataset contained phenotypic and genotypic information for 835 animals from 6 non-overlapping generations. Traits analyzed were bone mineral density (BMD), body weight at 10, 15 and 20 weeks (BW10, BW15 and BW20), fat percentage (FAT%), circulating cholesterol (CHOL), glucose (GLUC), insulin (INS) and triglycerides (TGL), and urine creatinine (UCRT). After quality control, the genotype dataset contained 50,112 SNP markers. Animals from older generations were considered as the reference subset, while animals in the latest generation were considered as candidates for the validation subset. We also evaluated the impact of different levels of connectedness between reference and validation sets. Model performance was measured as Pearson’s correlation coefficient and mean squared error (MSE) between adjusted phenotypes and the model’s predictions for animals in the validation subset. Outcomes were also compared across models by checking the overlap among top-ranked markers and animals. Linear models outperformed GBM for seven out of ten traits. For these models, accuracy was proportional to the trait’s heritability. For traits BMD, CHOL and GLUC, the GBM model showed better prediction accuracy and lower MSE. Interestingly, for these three traits there is evidence in the literature of a relevant portion of phenotypic variance being explained by epistatic effects.
We noticed that for lower connectedness, i.e., imposing a gap of one to two generations between reference and validation populations, the superior performance of GBM was only maintained for GLUC. Using a subset of top markers selected from a GBM model improved prediction accuracy for some of the traits when these markers were fitted in linear and GBM models. The GBM model consistently had fewer top-ranked markers and animals in common with the other models than the linear models did. Our results indicate that GBM is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Nevertheless, they also suggest that GBM is a competitive method to predict complex traits in an outbred mouse population, especially for traits with assumed epistatic effects.
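The evaluation scheme summarized above (forward validation by generation, Pearson's correlation and MSE on the validation subset, and overlap among top-ranked markers or animals) can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names, the integer coding of generations, the interpretation of the gap as dropping the generations immediately preceding the validation generation, and the ranking of markers by absolute score are all assumptions made for illustration.

```python
import numpy as np


def forward_split(generation, gap=0):
    """Return boolean masks (reference, validation).

    Assumption: generations are coded as integers. The latest generation
    is the validation subset; the reference subset is every older
    generation except the `gap` generations immediately before it.
    """
    generation = np.asarray(generation)
    latest = generation.max()
    validation = generation == latest
    reference = generation < latest - gap
    return reference, validation


def pearson_and_mse(adjusted_phenotype, prediction):
    """Pearson correlation and mean squared error between two vectors."""
    y = np.asarray(adjusted_phenotype, dtype=float)
    p = np.asarray(prediction, dtype=float)
    r = np.corrcoef(y, p)[0, 1]
    mse = np.mean((y - p) ** 2)
    return r, mse


def top_k_overlap(scores_a, scores_b, k):
    """Count items shared by the top-k of two score vectors.

    Assumption: items are ranked by absolute score (e.g. absolute marker
    effect for a linear model, variable importance for GBM).
    """
    top_a = set(np.argsort(-np.abs(np.asarray(scores_a)))[:k].tolist())
    top_b = set(np.argsort(-np.abs(np.asarray(scores_b)))[:k].tolist())
    return len(top_a & top_b)
```

With `gap=0` every generation older than the latest enters the reference subset; with `gap=1` or `gap=2` the most recent one or two reference generations are dropped, which lowers connectedness between the two sets.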