In the monitoring and analysis of athletes’ physiological and biochemical indicators, traditional data mining (DM) techniques cannot extract meaningful features and patterns from high-dimensional, complex multivariate data, so the accuracy of the analysis results is low. The lack of real-time monitoring of the dynamically changing physiological state also makes it impossible to detect overtraining or fatigue in time, which affects both the training effect and the health of athletes. This paper constructs an improved XGBoost (eXtreme Gradient Boosting) model to address these problems. First, the collected physiological and biochemical data are cleaned and normalized, outliers are removed, missing values are filled in, and a variable set representing the characteristics of different training periods is constructed to provide high-quality input data for subsequent model analysis. Second, the SHAP (SHapley Additive exPlanations) method is used to quantify the importance of each feature, and the variables that contribute most to the recognition of the training state are selected to optimize the model input, reduce model complexity, and improve computational efficiency. Third, building on the original XGBoost model, the loss function is adjusted and an adaptive learning rate mechanism is added so that the model better captures the dynamic changes of physiological and biochemical indicators and achieves higher prediction accuracy. Finally, based on the prediction results of the improved model, a real-time monitoring system is designed to track changes in the physiological state of athletes during different training periods and to issue an alarm when abnormal trends are detected, helping coaches adjust training plans. The experimental results show that the feature evaluation extracts three key physiological indicators, namely blood oxygen saturation, blood lactate concentration, and heart rate, which reduces the computational complexity of the subsequent model.
In the four training stages (basic period, load period, high-intensity period, and recovery period), the loss values of the XGBoost model were approximately 0.5, 0.42, 0.4, and 0.35, respectively. On monitoring data from four batches of football players, with 100 players in each batch, the accuracy remained above 0.83 and the response time stayed below 2 s. The experiments demonstrate the effectiveness of the proposed model in the monitoring and analysis of physiological and biochemical indicators.
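To make the data-cleaning step concrete, the following minimal Python sketch shows one way a single indicator series (for example, heart rate) could be cleaned and normalized before modeling. The abstract does not state which outlier rule, imputation method, or scaling is used. The 1.5×IQR rule, median imputation, and min-max scaling below are assumptions chosen for illustration, and the function name `clean_and_normalize` is hypothetical.

```python
from statistics import median, quantiles


def clean_and_normalize(values):
    """Fill missing readings, replace outliers, and scale to [0, 1].

    Assumptions (not taken from the paper): missing readings are given
    as None, outliers are detected with the 1.5 * IQR rule, both are
    replaced by the median of the remaining readings, and the result
    is min-max scaled.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed readings")

    # Outlier bounds from the interquartile range of observed readings.
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Median of the readings that fall inside the bounds.
    inliers = [v for v in observed if low <= v <= high]
    fill = median(inliers)

    # Replace missing readings and outliers with the inlier median.
    cleaned = [
        v if v is not None and low <= v <= high else fill
        for v in values
    ]

    # Min-max scaling; a constant series maps to all zeros.
    lo, hi = min(cleaned), max(cleaned)
    if hi == lo:
        return [0.0] * len(cleaned)
    return [(v - lo) / (hi - lo) for v in cleaned]


# Hypothetical heart-rate readings with one gap and one sensor spike.
heart_rate = [60, 62, None, 64, 61, 200, 63]
scaled = clean_and_normalize(heart_rate)
```

Replacing outliers rather than dropping them keeps the series length unchanged, which matters if the readings are later aligned by timestamp across several indicators. Dropping the affected rows would be an equally valid choice under different assumptions.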