Deep learning-based methods have recently advanced multi-exposure image fusion (MEF) and markedly improved fusion quality. However, most reference images in MEF are artificially generated, so a portion of them are inevitably of low quality. Existing methods either use these mixed-quality reference images for supervised learning or rely heavily on the source images for unsupervised learning, which makes it difficult for the fusion results to accurately reflect real-world illumination conditions. To mitigate the impact of unreliable references, we propose a self-adaptive mean teacher-based semi-supervised learning framework tailored for MEF, termed SAMT-MEF. Its self-adaptiveness is reflected in two ways. First, we establish a self-adaptive set that retains the best-ever outputs of the teacher as pseudo labels and is updated according to a hybrid metric. Second, we employ contrastive learning to help the self-adaptive set further alleviate overfitting to inferior pseudo labels. Extensive experiments show that the proposed method outperforms state-of-the-art methods quantitatively and qualitatively on both reference and non-reference datasets. Furthermore, in some scenarios the fusion results surpass the reference images, indicating strong performance in practical applications. Source code is publicly available at https://github.com/hqj9994ever/SAMT-MEF.
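The two mechanisms named above, a mean teacher and a set of best-ever pseudo labels, can be illustrated with a minimal sketch. This is not the released SAMT-MEF implementation: the names `ema_update`, `SelfAdaptiveSet`, and `toy_score` are hypothetical, weights and images are stood in for by flat lists of floats, and `toy_score` is only a placeholder for the hybrid metric, whose definition is not given here.

```python
from typing import Callable, Dict, List, Tuple


def ema_update(teacher: List[float], student: List[float],
               decay: float = 0.99) -> List[float]:
    """Mean-teacher step: teacher weights follow an exponential
    moving average of the student weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def toy_score(image: List[float]) -> float:
    """Placeholder quality score (assumption): higher when the mean
    intensity is closer to mid-gray. Stands in for the hybrid metric."""
    mean = sum(image) / len(image)
    return -abs(mean - 0.5)


class SelfAdaptiveSet:
    """Keeps, per sample, the best-scoring teacher output seen so far
    and serves it as the pseudo label."""

    def __init__(self, score_fn: Callable[[List[float]], float]) -> None:
        self.score_fn = score_fn
        self.best: Dict[str, Tuple[float, List[float]]] = {}

    def update(self, sample_id: str, teacher_output: List[float]) -> bool:
        """Store the output only if it beats the stored one; return
        True when the pseudo label was replaced."""
        score = self.score_fn(teacher_output)
        previous = self.best.get(sample_id)
        if previous is None or score > previous[0]:
            self.best[sample_id] = (score, teacher_output)
            return True
        return False

    def pseudo_label(self, sample_id: str) -> List[float]:
        return self.best[sample_id][1]
```

In a training loop under these assumptions, the student would be optimized against `pseudo_label(...)` for unlabeled samples, the teacher refreshed with `ema_update` after each student step, and `update` called on each new teacher output so that a pseudo label is only ever replaced by a better-scoring one.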