In recent years, deep neural networks have been applied ever more widely to image classification. However, existing research has exposed the inherent vulnerability and opacity of these models, leaving them susceptible to carefully crafted adversarial attacks and posing significant security risks to their practical deployment. Although most models in real-world applications operate in a black-box manner, current research on black-box attacks against classification models remains inadequate, facing challenges such as incomplete robustness evaluation, low black-box attack success rates, and excessive resource consumption during the attack. To address these challenges, this paper proposes a black-box attack algorithm based on a cycle-consistent generative network. The algorithm directly generates adversarial perturbations with a generative network that integrates an attention mechanism, learning the complex mapping between clean samples and adversarial samples, and it supports both targeted and untargeted attack modes. Experiments on the SAT dataset show that the method achieves an average white-box attack success rate exceeding 96% on classification tasks and an average success rate of up to 69.4% in black-box transfer attacks, demonstrating strong transferability across different models. This work improves the effectiveness of black-box attacks, reduces attack costs, and offers new ideas and methods for strengthening the security of deep learning models.
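The abstract does not specify the generator architecture, loss weights, or surrogate classifier, so the sketch below is only an illustrative reading of the described approach: a perturbation generator with a simple attention gate, trained against a surrogate classifier with a cycle-consistency term. All names (AttnGenerator, F_inv, surrogate, lam_cyc, eps) are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch only; architecture and losses are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F_

class AttnGenerator(nn.Module):
    """Toy perturbation generator with a spatial-attention gate (assumed design)."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.attn = nn.Sequential(nn.Conv2d(32, 1, 1), nn.Sigmoid())  # attention map in [0, 1]
        self.head = nn.Conv2d(32, channels, 3, padding=1)

    def forward(self, x):
        feats = self.body(x)
        # Bounded perturbation, modulated by the attention map
        return torch.tanh(self.head(feats * self.attn(feats)))

def train_step(G, F_inv, surrogate, x, y, opt, eps=8 / 255, lam_cyc=10.0):
    """One untargeted step: fool the surrogate while keeping the adversarial image
    cycle-consistent with the clean input. `opt` must hold the parameters of G and F_inv."""
    opt.zero_grad()
    delta = eps * G(x)                                       # adversarial perturbation
    x_adv = torch.clamp(x + delta, 0, 1)                     # adversarial example
    x_rec = torch.clamp(x_adv + eps * F_inv(x_adv), 0, 1)    # map back toward the clean image

    adv_loss = -F_.cross_entropy(surrogate(x_adv), y)        # push surrogate away from the true label
    cyc_loss = F_.l1_loss(x_rec, x)                          # cycle-consistency term
    loss = adv_loss + lam_cyc * cyc_loss
    loss.backward()
    opt.step()
    return loss.item()
```

A targeted variant would replace the negated cross-entropy with a cross-entropy toward the attacker-chosen label; once trained, the generator produces adversarial examples in a single forward pass, which is what keeps the per-attack cost low compared with iterative query-based methods.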