In recent decades, numerous perceptual authentication hashing schemes have been proposed for image content authentication. However, most of these schemes operate in a single domain, either spatial or transform, and fail to provide satisfactory robustness and discrimination capability when facing the complex image manipulations found in real-world scenarios. In this work, we present a perceptual authentication hashing scheme based on a convolutional neural network (CNN) that leverages both the spatial and frequency domains. Specifically, we construct two separate streams, one for each domain, and introduce a feature fusion module that merges the features of the two streams to generate a hash sequence. In addition, we design a frequency-domain channel filter and a frequency attention module for the frequency stream, and introduce a frequency-domain loss function to optimize model training. On large-scale test datasets, our scheme outperforms state-of-the-art schemes, as evidenced by receiver operating characteristic (ROC) curves.
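To make the two-stream idea concrete, the following is a minimal sketch and not the proposed CNN: it replaces the learned spatial and frequency streams with hand-crafted stand-ins (block means and low-frequency DCT coefficients), and replaces the feature fusion module with plain concatenation. All function names, the 4 x 4 grid, the median binarization, and the synthetic test images are assumptions made for illustration only.

```python
import math
import random


def block_means(img, grid=4):
    """Spatial-stream stand-in: mean intensity of each cell of a grid x grid partition."""
    h, w = len(img), len(img[0])
    bh, bw = h // grid, w // grid
    feats = []
    for by in range(grid):
        for bx in range(grid):
            total = 0.0
            for y in range(by * bh, (by + 1) * bh):
                for x in range(bx * bw, (bx + 1) * bw):
                    total += img[y][x]
            feats.append(total / (bh * bw))
    return feats


def dct_low_freq(img, k=4):
    """Frequency-stream stand-in: k x k low-frequency 2-D DCT-II terms, DC term dropped."""
    h, w = len(img), len(img[0])
    feats = []
    for u in range(k):
        for v in range(k):
            if u == 0 and v == 0:
                continue  # the DC term only carries overall brightness
            s = 0.0
            for y in range(h):
                cy = math.cos(math.pi * (y + 0.5) * u / h)
                for x in range(w):
                    s += img[y][x] * cy * math.cos(math.pi * (x + 0.5) * v / w)
            feats.append(s)
    return feats


def binarize(feats):
    """Turn real-valued features into bits by thresholding at their median."""
    med = sorted(feats)[len(feats) // 2]
    return [1 if f > med else 0 for f in feats]


def perceptual_hash(img):
    """Fusion stand-in: concatenate the bits of the two streams (assumed, not learned)."""
    return binarize(block_means(img)) + binarize(dct_low_freq(img))


def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def make_image(seed, size=16):
    """Hypothetical synthetic grayscale image with integer pixels in [0, 199]."""
    rng = random.Random(seed)
    return [[int(rng.random() * 200) for _ in range(size)] for _ in range(size)]


original = make_image(1)
brightened = [[p + 10 for p in row] for row in original]  # content-preserving edit
different = make_image(2)                                 # unrelated content

d_same = hamming(perceptual_hash(original), perceptual_hash(brightened))
d_diff = hamming(perceptual_hash(original), perceptual_hash(different))
print("distance to brightened copy:", d_same)
print("distance to different image:", d_diff)
```

In an authentication setting, a pair of images would be accepted as perceptually identical when the hash distance falls below a threshold; sweeping that threshold over many similar and dissimilar pairs is what produces an ROC curve. A uniform brightness shift is used here because both stand-in features are insensitive to it by construction, which is a far easier case than the complex manipulations the scheme targets.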