Abstract

Poor generalization across patient cohorts and the variety of experimental conditions used in the proteomic search for disease biomarkers are among the main reasons for the bumpy road of quantitative proteomics from the discovery stage to clinical validation. Only a small fraction of the biomarkers discovered so far by proteomic analysis reaches clinical trials. Here, we present a machine learning-based workflow for proteomics data analysis that partially addresses these issues. In particular, we used a customized decision tree model regularized by a newly introduced parameter, min_cohorts_leaf, which resulted in better generalization of the trained models. Further, we analyzed the trend of the feature importance curve as a function of the min_cohorts_leaf parameter and found that it can be used for accurate feature selection, yielding a list of proteins with significantly improved generalization. Finally, we demonstrated that the recently introduced DirectMS1 search algorithm for protein identification and quantitation provides a simple yet highly efficient solution to the problem of combining multiple data sets obtained using different experimental settings. The developed workflow was tested on five published LC-MS/MS data sets obtained in large consortium studies of Alzheimer’s disease brain samples. The selected data sets comprise a total of 535 files acquired using label-free single-shot data-dependent or data-independent acquisition. Using the proposed modified ExtraTrees model, we found that the expression levels of two proteins involved in ferroptosis, serotransferrin (TRFE) and the DNA repair nuclease/redox regulator APEX1, are important for explaining the absence of dementia in patients with neuritic plaques and neurofibrillary tangles.
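As an illustration of how a cohort-aware leaf constraint might regularize a tree, the minimal sketch below assumes that min_cohorts_leaf sets the minimum number of distinct cohorts (data sets) that must be represented among the samples in each leaf. This reading is an assumption made for illustration, by analogy with the usual minimum-samples-per-leaf constraint. The function names `count_cohorts` and `split_is_allowed` and the toy data are hypothetical and do not reproduce the implementation used in this work.

```python
from typing import Sequence


def count_cohorts(cohorts: Sequence[str], indices: Sequence[int]) -> int:
    """Number of distinct cohort labels among the selected samples."""
    return len({cohorts[i] for i in indices})


def split_is_allowed(
    values: Sequence[float],
    cohorts: Sequence[str],
    threshold: float,
    min_cohorts_leaf: int,
) -> bool:
    """Check a candidate split on one feature (hypothetical helper).

    Assumption: a split is admissible only if BOTH child nodes contain
    samples from at least `min_cohorts_leaf` distinct cohorts.
    """
    left = [i for i, v in enumerate(values) if v <= threshold]
    right = [i for i, v in enumerate(values) if v > threshold]
    return (
        count_cohorts(cohorts, left) >= min_cohorts_leaf
        and count_cohorts(cohorts, right) >= min_cohorts_leaf
    )


# Toy data: abundance of one protein in six samples from two cohorts.
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
cohorts = ["A", "A", "B", "A", "B", "B"]

# Threshold 0.5 leaves both cohorts on each side of the split.
print(split_is_allowed(values, cohorts, threshold=0.5, min_cohorts_leaf=2))
# Threshold 0.25 isolates samples from cohort A only in the left child.
print(split_is_allowed(values, cohorts, threshold=0.25, min_cohorts_leaf=2))
```

Under this assumption, a split that separates samples from a single cohort is rejected, so the tree cannot build rules on effects specific to one data set. Setting `min_cohorts_leaf=1` removes the constraint in this sketch.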