An unbiased assessment of sperm morphology and motility is crucial for assessing fertility potential and for providing visual feedback to guide microrobotic manipulation. Automated analysis and selection of optimal sperm are essential for in vitro fertilization treatments, such as robotic intracytoplasmic sperm injection. However, conventional image processing methods face limitations in analyzing small sperm objects under microscopic imaging. While convolutional neural networks (CNNs) have brought promising advancements in microscopic image analysis, previous CNN methods have struggled to accurately differentiate tiny objects. These methods often require staining or fluorescence techniques to enhance the visual contrast between sperm and culture medium, which makes them clinically impractical. To address these limitations, we introduce a novel sperm recognition network, the sperm feature-correlated network (SFCNet), for accurate and efficient segmentation and tracking of minute sperm objects. The SFCNet employs several modules, including collateral multi-scale convolution, cross-scale feature map guide, atrous spatial pyramid convolution with pooling, lateral attention, and multi-scale tracking proposal, to preserve essential sperm details despite their small size. Experimental results indicate that the SFCNet surpassed state-of-the-art models designed for segmenting or tracking small objects, achieving up to a 28.39% higher Sørensen-Dice coefficient in segmentation and a 10.33% higher average precision in tracking. Additionally, the SFCNet excelled in sperm morphometric analysis, achieving errors below 15%. The SFCNet also secured top-tier performance in sperm motility analysis, with errors below 13% in seven sperm motility parameters.

Note to Practitioners: This study is motivated by the need to analyze the quality of motile sperm and select the optimal one for in vitro fertilization.
Existing methods for detecting sperm fall short because they require either a relatively high-magnification microscopic image or the use of staining or fluorescence to improve sperm visibility, which limits the selection process or even renders the sperm clinically unusable. To overcome these limitations, the present work proposes a new deep learning framework designed to extract multi-scale sperm features. Experimental results suggest that the proposed method can perform better than existing methods in real-time analysis of the morphology and motility of multiple motile sperm under a $20\times$ objective. In the future, there is high potential for fertility specialists and healthcare workers to apply the presented framework in fertility treatment with higher accuracy and efficiency.
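
For readers less familiar with the segmentation metric reported above, the following is a minimal sketch of how a Sørensen-Dice coefficient can be computed between a predicted and a reference binary mask. It is illustrative only and is not taken from the SFCNet implementation: the function name `dice_coefficient`, the nested-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def dice_coefficient(pred, target):
    """Sørensen-Dice coefficient between two binary masks.

    Both masks are nested lists of 0/1 values with the same shape
    (a hypothetical representation chosen to keep the sketch dependency-free).
    """
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: three foreground pixels each, two of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)
```

Because small objects such as sperm occupy only a few pixels, a single mislabelled pixel shifts this ratio noticeably, which is one reason the metric is sensitive to the loss of fine detail.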