Fetal examination is an important and challenging field of healthcare. Cardiotocography is the most commonly used method for monitoring fetal heart rate and uterine contractions. Fetal phonocardiography is emerging as a promising alternative: it is entirely non-invasive, passive, and low-cost. In most cases, however, the ideal form of the fetal sound signal is difficult to estimate because of disturbances. These disturbances originate from movements or rotations of the fetal body and make fetal heart sound processing difficult. This study presents an automatic method for segmenting fetal heart sounds in a phonocardiographic signal contaminated by different types of disturbances, and analyzes which of these disturbances most affect segmentation accuracy. We propose a hybrid classifier based on Transformer and eXtreme Gradient Boosting (XGBoost) that improves segmentation performance by integrating the decisions of the two models. The experiment used 2000 segments of data from the Research Resource for Complex Physiologic Signals (PhysioNet) repository and created synthetic data (873 recordings). For the S1 label, the proposed method ranks first among all compared algorithms in precision, recall, F1, and accuracy, tying with Transformer in recall. Its accuracy is 5% and 1.3% higher than that of XGBoost and Transformer, respectively. Similarly, for the S2 label, its precision is 5.8% and 3.7% higher than that of XGBoost and Transformer, respectively. Overall, the proposed method shows effective and promising performance.
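As an illustration of what decision-level integration of two classifiers can look like, the following minimal sketch averages per-frame class probabilities from two models and takes the most probable class. It is not the method of this study: the fusion rule (a weighted average), the function names `fuse_probabilities` and `predict_labels`, the `weight_a` parameter, and the label set other than S1 and S2 are all assumptions made for the example, and the probabilities are placeholders rather than outputs of a real Transformer or XGBoost model.

```python
from typing import List, Sequence

# Hypothetical label set: only S1 and S2 are named in the text above;
# "systole" and "diastole" are assumed here for illustration.
LABELS = ["S1", "systole", "S2", "diastole"]


def fuse_probabilities(
    p_a: Sequence[Sequence[float]],
    p_b: Sequence[Sequence[float]],
    weight_a: float = 0.5,
) -> List[List[float]]:
    """Weighted average of two per-frame class-probability tables.

    p_a and p_b hold one row per frame and one column per class.
    weight_a is the (assumed) weight given to the first model.
    """
    if len(p_a) != len(p_b):
        raise ValueError("both models must score the same number of frames")
    fused = []
    for row_a, row_b in zip(p_a, p_b):
        if len(row_a) != len(row_b):
            raise ValueError("both models must score the same classes")
        fused.append(
            [weight_a * a + (1.0 - weight_a) * b for a, b in zip(row_a, row_b)]
        )
    return fused


def predict_labels(
    fused: Sequence[Sequence[float]], labels: Sequence[str]
) -> List[str]:
    """Pick the label with the highest fused probability for each frame."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in fused]


if __name__ == "__main__":
    # Placeholder probabilities standing in for two trained models.
    p_model_a = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]]
    p_model_b = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]]
    print(predict_labels(fuse_probabilities(p_model_a, p_model_b), LABELS))
```

Averaging probabilities is only one of several possible integration rules; majority voting or a learned meta-classifier would fit the same interface.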