Background Surgical robots are increasingly popular because they can improve the precision of pedicle screw placement. However, current surgical robots rely on unimodal computed tomography (CT) images as baseline images, which limits visualization to vertebral bone structures and excludes soft tissue structures such as intervertebral discs and nerves. This inherent limitation significantly restricts the applicability of surgical robots. To address this issue and further enhance the safety and accuracy of robot-assisted pedicle screw placement, this study will develop a software system for surgical robots based on multimodal image fusion. Such a system can extend the application range of surgical robots to procedures such as surgical channel establishment and nerve decompression.
Methods Initially, imaging data of the patients included in the study are collected. Professional workstations are employed to establish, train, validate, and optimize algorithms for vertebral bone segmentation in CT and magnetic resonance (MR) images, intervertebral disc segmentation in MR images, nerve segmentation in MR images, and registration and fusion of CT and MR images. Subsequently, a spine application model containing independent modules for vertebrae, intervertebral discs, and nerves is constructed, and a software system for surgical robots based on multimodal image fusion is designed. Finally, the software system is clinically validated.
Discussion We will develop a software system for surgical robots based on multimodal image fusion that can be applied not only to robot-assisted pedicle screw placement but also to surgical channel establishment, nerve decompression, and other operations. This software system is important for three reasons. First, it can improve the accuracy of pedicle screw placement, percutaneous vertebroplasty, percutaneous kyphoplasty, and other surgeries.
Second, it can reduce the number of fluoroscopies, shorten the operation time, and reduce surgical complications. Third, it can expand the application range of surgical robots by providing the key imaging data they need for surgical channel establishment, nerve decompression, and other operations.
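The Methods describe validating the segmentation algorithms but do not name a metric. One common choice in medical image segmentation is the Dice similarity coefficient, which measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch under that assumption, not the study's actual validation code. The function name `dice_coefficient` and the two 4x4 masks are hypothetical and used only for illustration.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap between two binary masks given as nested lists of 0/1.

    Assumption: the protocol does not state its validation metric; Dice is
    shown here only as a commonly used example.
    """
    pred_flat = [v for row in predicted for v in row]
    ref_flat = [v for row in reference for v in row]
    if len(pred_flat) != len(ref_flat):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labeled as foreground in both masks.
    intersection = sum(1 for p, r in zip(pred_flat, ref_flat) if p and r)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred_flat if p) + sum(1 for r in ref_flat if r)

    # Two empty masks agree perfectly by convention.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical masks: 1 marks a pixel labeled as vertebral bone.
predicted_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
reference_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

score = dice_coefficient(predicted_mask, reference_mask)
print(f"Dice = {score:.2f}")  # prints "Dice = 0.80"
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels. The same function could in principle be applied per structure (vertebrae, intervertebral discs, nerves) to compare an algorithm's output against a manual annotation.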