Forest canopy height (FCH) is a critical parameter for forest management and ecosystem modeling, but accurate FCH maps covering large areas are still lacking. To address this gap, this study uses Wuyishan National Park, China, as a case study to develop a calibration method for mapping FCH in a complex subtropical mountainous region from ZiYuan-3 (ZY3) stereo imagery and limited Unmanned Aerial Vehicle (UAV) LiDAR data. Pearson’s correlation analysis, Categorical Boosting (CatBoost) feature importance analysis, and causal effect analysis were used to identify the major factors behind errors in the digital surface model (DSM) extracted from ZY3 stereo imagery. Several machine learning algorithms were compared for calibrating the DSM and FCH results. The results indicate that DSM extraction accuracy from ZY3 stereo imagery is influenced primarily by slope aspect, elevation, and vegetation characteristics, and that these influences are most pronounced in areas with complex topography and dense vegetation cover. A Bayesian-optimized CatBoost model that directly calibrates the original FCH (the difference between the ZY3 DSM and a high-precision digital elevation model (DEM)) achieved the best prediction performance. This model produced an FCH map at 4 m spatial resolution; calibration reduced the root mean square error (RMSE) from 6.47 m for the initial stereo-derived FCH to 3.99 m, and the relative RMSE (rRMSE) from 36.52% to 22.53%. The study demonstrates the feasibility of using ZY3 imagery for regional forest canopy height mapping and confirms the superior performance of the CatBoost algorithm in improving FCH calibration accuracy. These findings provide insights into the multidimensional effects of key environmental factors on FCH extraction, supporting precise forest monitoring and carbon stock assessment in complex subtropical terrain.
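The two quantities the abstract relies on, the original FCH and the RMSE/rRMSE accuracy metrics, can be illustrated with a minimal sketch. It assumes the DSM, DEM, and UAV LiDAR reference heights have already been co-registered and sampled to the same cells and flattened into plain lists; the function names are hypothetical. It also assumes rRMSE is RMSE divided by the mean reference height, a common convention that is consistent with both reported pairs (6.47 m / 36.52% and 3.99 m / 22.53% each imply a mean reference height of about 17.7 m). The CatBoost calibration step itself is not reproduced here.

```python
import math
from statistics import fmean


def original_fch(dsm, dem):
    """Hypothetical helper: uncalibrated canopy height per cell, DSM minus DEM (m)."""
    if len(dsm) != len(dem):
        raise ValueError("DSM and DEM must cover the same cells")
    return [s - e for s, e in zip(dsm, dem)]


def rmse(predicted, reference):
    """Root mean square error between predicted and reference heights (m)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need equal-length, non-empty inputs")
    return math.sqrt(fmean((p - r) ** 2 for p, r in zip(predicted, reference)))


def rrmse(predicted, reference):
    """RMSE as a percentage of the mean reference height (assumed convention)."""
    return 100.0 * rmse(predicted, reference) / fmean(reference)


if __name__ == "__main__":
    # Toy values for three cells, for illustration only (not study data).
    dsm = [320.0, 335.5, 310.2]        # stereo-derived surface elevation (m)
    dem = [300.0, 318.0, 295.0]        # high-precision terrain elevation (m)
    lidar_fch = [18.0, 19.5, 14.2]     # UAV LiDAR reference canopy height (m)

    fch = original_fch(dsm, dem)
    print(f"RMSE = {rmse(fch, lidar_fch):.2f} m")
    print(f"rRMSE = {rrmse(fch, lidar_fch):.2f} %")
```

In this sketch a calibration model would be trained to map the original FCH (plus terrain and vegetation covariates) to the LiDAR reference, and the same `rmse` and `rrmse` functions would then be applied before and after calibration to quantify the improvement.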