Deep learning faces two challenges in segmenting surface defects on strip steel. First, insufficient processing of feature maps leads to the loss of task-specific feature information. Second, defects that follow a long-tailed distribution are not segmented accurately enough. To address these issues, a pixel-level deep segmentation method called the task-specific encoder–decoder network (TSEDNet) is proposed as an end-to-end defect segmentation model. TSEDNet adopts an encoder-multi-decoder structure, configured from domain knowledge and tailored to specific tasks, which achieves effective feature representation and significantly reduces the impact of imbalanced defect quantities. In addition, a novel metric learning method is introduced to optimize decoder selection. Furthermore, a feature fusion module based on metric learning is proposed that uses general features to restore task-specific details, thereby enhancing pixel-level segmentation accuracy. In experiments and industrial validation, TSEDNet outperforms other advanced segmentation methods and proves applicable in practical scenarios.
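As a rough illustration of how a learned metric could drive decoder selection, the sketch below routes an encoder embedding to the decoder whose prototype is nearest in embedding space. It is a minimal sketch under stated assumptions, not TSEDNet's actual selection rule: the Euclidean metric, the prototype vectors, the decoder names, and the function `select_decoder` are all hypothetical and do not come from the method described above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_decoder(embedding, prototypes):
    """Return the name of the decoder whose prototype is nearest to `embedding`.

    Assumption: each task-specific decoder is represented by one prototype
    vector in the embedding space produced by a shared encoder.
    """
    return min(prototypes, key=lambda name: euclidean(embedding, prototypes[name]))


# Hypothetical prototypes, one per task-specific decoder (illustrative values).
prototypes = {
    "scratch_decoder": [1.0, 0.0],
    "inclusion_decoder": [0.0, 1.0],
    "patch_decoder": [-1.0, -1.0],
}

# Hypothetical embedding of one input image from the shared encoder.
embedding = [0.9, 0.2]
chosen = select_decoder(embedding, prototypes)
print(chosen)  # prints "scratch_decoder"
```

In a trained system the prototypes and the embedding would both come from a network optimized with a metric learning loss, so that images of rare defect types still land close to the prototype of the decoder responsible for them.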