Cross-scene hyperspectral image classification (HSIC) has recently attracted increasing attention as a way to alleviate the lack of labeled samples in the target domain. Although collaborative training on source and target data dominates this field, training effective feature extractors and bridging large domain gaps remain challenging. To address these challenges, we propose a multi-level unsupervised domain adaptation (MLUDA) framework that aligns the domains at the image, feature, and logic levels to fully exploit spectral-spatial information. At the image level, we propose a domain adaptation method named GuidedPGC, based on classic image matching techniques and the guided filter; its adaptation results are physically explainable and can be inspected visually. At the feature level, we design a multi-branch cross attention (MBCA) structure specifically for HSIC, which enhances the interaction between source- and target-domain features through dot-product attention. Finally, at the logic level, we adopt a supervised contrastive learning (SCL) approach that incorporates a pseudo-label strategy and a local maximum mean discrepancy loss, increasing inter-class distance across domains and further improving classification performance. Experimental results on three benchmark cross-scene datasets demonstrate that the proposed method consistently outperforms the compared approaches. The source code is available at https://github.com/cfcys/MLUDA.
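
To illustrate the dot-product attention mentioned for the feature level, the following is a minimal sketch of generic scaled dot-product cross attention, in which source-domain features act as queries over target-domain features. It is not the MBCA structure itself: the function names, the single-branch layout, the scaling by the square root of the feature dimension, and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries attend over keys/values.

    queries: list of n_q vectors of dimension d (here: source-domain features)
    keys, values: lists of n_k vectors (here: target-domain features)
    Returns n_q vectors, each a convex combination of the value vectors.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output coordinate at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Toy features (hypothetical values, not taken from any dataset).
source_feats = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = cross_attention(source_feats, target_feats, target_feats)
```

Each row of `attended` is a source feature re-expressed as a weighted mixture of target features, with larger weights on the target features it is most similar to. Swapping the roles of the two inputs would give the target-to-source direction.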