This study presents the development and implementation of a neural network-based security monitoring system for smart cities, tailored to the urban environment of Beijing. The system integrates multimodal data, processing over 1,200 hours of video footage, 800 hours of audio recordings, and 400 hours of thermal data to provide comprehensive surveillance and real-time anomaly detection. It achieved accuracy of 96% for overcrowding detection, 93% for unauthorized access, and 90% for unattended objects, with corresponding precision of 96%, 95%, and 93%. Recall was lower, at 89%, 87%, and 85%, respectively. Edge computing kept response times short: 1.5 seconds for subway stations, 2.0 seconds for Tiananmen Square, and 1.2 seconds for public transport hubs. These results indicate that the system can deliver real-time monitoring and timely alerts, which are crucial for managing high-density areas and critical infrastructure in Beijing. Transfer learning and Generative Adversarial Networks (GANs) further improved the system's adaptability and robustness in detecting rare and unlabeled events. The study highlights the system's potential to significantly improve urban security infrastructure as a scalable and efficient solution for smart city applications.
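
Because precision and recall differ by several points for each event type, a single combined figure can make the trade-off easier to read. The sketch below derives an F1 score (the harmonic mean of precision and recall) from the percentages quoted above. F1 is not reported in this study; the function name `f1_score` and the dictionary layout are illustrative assumptions, and the derived values are simple arithmetic on the reported figures, not additional experimental results.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the text above.
# The F1 values computed from them are derived here, not reported by the study.
reported = {
    "overcrowding": (0.96, 0.89),
    "unauthorized access": (0.95, 0.87),
    "unattended objects": (0.93, 0.85),
}

f1_by_event = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_event.items():
    print(f"{name}: F1 = {f1:.3f}")
```

The harmonic mean is pulled toward the lower of the two inputs, so each derived F1 value sits closer to the recall figure than a plain average would.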