Object detection is a crucial task in remote sensing, and frame-based algorithms currently achieve impressive performance. However, event cameras have not yet been applied to remote sensing object detection. Moreover, three issues remain to be addressed: 1) Remote sensing targets are often disrupted by complex backgrounds, which degrades detection performance, especially in extremely challenging environments (e.g., low-light, motion blur, and occlusion scenarios). 2) Mainstream deep neural networks are primarily trained with discrete random sampling strategies, which prevents them from leveraging continuous temporal information. 3) The distribution shift caused by uneven data in streaming training poses challenges for temporal object detection. In this work, we present the Remote Sensing Event-based Object Detection Dataset (RSEOD), the first remote sensing dataset captured with event cameras, which covers a variety of intractable scenarios and offers a novel perspective on object detection in challenging conditions. We further propose an event-based streaming training strategy that exploits asynchronous event streams to address detection challenges caused by prolonged occlusion and out-of-focus blur. In addition, we introduce a reversible normalization criterion (RevNorm) to eliminate non-stationary information in temporal data, and a Streaming Bidirectional Feature Pyramid Network (SBFPN) to enable recursive information transmission along the temporal dimension. Extensive experiments on the RSEOD dataset show that our method achieves 38.1% mAP@0.5:0.95 and 55.8% mAP@0.5, outperforming other state-of-the-art object detection approaches (e.g., YOLOv8, YOLOv10, YOLOv11, DINO, RTDETR, RTDETRv2, SODFormer). The dataset and code are released at https://github.com/Jushl/ESVT.