Few-shot anomaly classification (FSAC) is a vital task in manufacturing industry. Recent methods focus on utilizing CLIP in zero/few normal shot anomaly detection instead of custom models. However, there is a lack of specific text prompts in anomaly classification and most of them ignore the modality gap between image and text. Meanwhile, there is distribution discrepancy between the pre-trained and the target data. To provide a remedy, in this paper, we propose a method to boost CLIP for few-normal-shot anomaly classification, dubbed CLIP-FSAC, which contains two-stage of training and alternating fine-tuning with two modality-specific adapters. Specifically, in the first stage, we train image adapter with text representation output from text encoder and introduce an image-to-text tuning to enhance multi-modal interaction and facilitate a better language-compatible visual representation. In the second stage, we freeze the image adapter to train the text adapter. Both of them are constrained by fusion-text contrastive loss. Comprehensive experiment results are provided for evaluating our method in few-normal-shot anomaly classification, which outperforms the state-of-the-art method by 12.2%, 10.9%, 10.4% AUROC on VisA for 1, 2, and 4-shot settings.