Introduction
Arrhythmia is an important indicator of underlying cardiovascular disease (CVD) and is prevalent worldwide. Accurate diagnosis is crucial for timely and effective treatment, and the electrocardiogram (ECG) plays a key role in that diagnosis. As deep learning and machine learning methods have matured in the clinical field, ECG processing algorithms have made the diagnosis of arrhythmia faster and more accurate.

Methods
In this study, we combined wavelet time-frequency maps with the Swin Transformer deep learning model for the automatic detection of cardiac arrhythmias. We used the MIT-BIH arrhythmia dataset. To improve signal quality, we applied wavelet thresholding to remove high-frequency noise, artifacts, electromyographic noise, and respiratory motion effects from the ECG signals. We then used the complex Morlet wavelet for feature extraction and plotted wavelet time-frequency maps to visualise the time-frequency information of the ECG. Finally, we introduced the Swin Transformer model for classification. Its hierarchical construction and self-attention mechanism combine windowed multi-head self-attention (W-MSA) and shifted window-based multi-head self-attention (SW-MSA) to make comprehensive use of local and global information, which yielded high classification accuracy on ECG signals.

Results
To increase confidence in the experimental results, we evaluated performance under both the intra-patient and inter-patient paradigms. The model reached classification accuracies of 99.34% and 98.37%, respectively, which are better than those of currently available detection methods.

Discussion
The results indicate that our proposed method is superior to currently available methods for detecting arrhythmia from ECG signals. This provides a new approach to ECG-based arrhythmia diagnosis.
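
To make two of the steps named in the Methods more concrete, the following is a minimal NumPy sketch, not the implementation used in this study. It shows (a) how a complex Morlet time-frequency map can be computed from a one-dimensional signal and (b) how a feature map is split into regular and cyclically shifted windows, the partitioning that underlies W-MSA and SW-MSA. The function names (`complex_morlet`, `morlet_scalogram`, `window_partition`, `shifted_window_partition`), the wavelet bandwidth and centre frequency, the window size, and the 10 Hz test sinusoid are all illustrative assumptions. The 360 Hz sampling rate is that of the MIT-BIH recordings. The attention computation itself and the masking that SW-MSA applies after the shift are omitted.

```python
import numpy as np


def complex_morlet(t, bandwidth=1.5, center_freq=1.0):
    """Complex Morlet wavelet: complex exponential under a Gaussian envelope.

    bandwidth and center_freq are illustrative defaults, not values from the study.
    """
    norm = 1.0 / np.sqrt(np.pi * bandwidth)
    return norm * np.exp(2j * np.pi * center_freq * t) * np.exp(-(t ** 2) / bandwidth)


def morlet_scalogram(signal, fs, freqs, bandwidth=1.5, center_freq=1.0):
    """Return |CWT| of a 1-D signal with shape (len(freqs), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    out = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        scale = center_freq / f  # scale (seconds) whose wavelet oscillates at f Hz
        half = int(np.ceil(4.0 * scale * np.sqrt(bandwidth) * fs))  # kernel half-width
        t = np.arange(-half, half + 1) / fs
        kernel = complex_morlet(t / scale, bandwidth, center_freq) / np.sqrt(scale)
        # For this wavelet, correlating with its conjugate equals convolving with it.
        full = np.convolve(signal, kernel, mode="full") / fs
        out[i] = np.abs(full[half:half + n])  # keep the part aligned with the input
    return out


def window_partition(x, m):
    """Split an (H, W, C) map into non-overlapping m x m windows of m*m tokens."""
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, m * m, c)


def shifted_window_partition(x, m):
    """Cyclically shift the map by m // 2 along both axes, then partition it."""
    shifted = np.roll(x, shift=(-(m // 2), -(m // 2)), axis=(0, 1))
    return window_partition(shifted, m)


fs = 360  # MIT-BIH sampling rate in Hz
time = np.arange(0, 3.0, 1.0 / fs)
test_signal = np.sin(2 * np.pi * 10.0 * time)  # synthetic stand-in, not ECG data
freqs = np.arange(2.0, 41.0)  # analysis frequencies in Hz (assumed range)
tf_map = morlet_scalogram(test_signal, fs, freqs)

feature_map = np.arange(8 * 8, dtype=float).reshape(8, 8, 1)  # toy 8 x 8 map
regular = window_partition(feature_map, 4)
shifted = shifted_window_partition(feature_map, 4)
```

In this sketch, `tf_map` has one row per analysis frequency and one column per sample, so it can be rendered as an image and passed to an image classifier. Shifting by half a window before partitioning means that tokens separated by a window boundary in one layer share a window in the next, which is how alternating the two partitions lets information travel beyond a single local window.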