Predicting urban traffic is of great importance to intelligent transportation systems and public safety, yet is very challenging because of two aspects: 1) complex spatio-temporal correlations of urban traffic, including spatial correlations between locations along with temporal correlations among timestamps; 2) diversity of such spatio-temporal correlations, which vary from location to location and depend on the surrounding geographical information, e.g., points of interests and road networks. To tackle these challenges, we proposed a deep-meta-learning based model, entitled ST-MetaNet, to collectively predict traffic in all location at once. ST-MetaNet employs a sequence-to-sequence architecture, consisting of an encoder to learn historical information and a decoder to make predictions step by step. In specific, the encoder and decoder have the same network structure, consisting of a recurrent neural network to encode the traffic, a meta graph attention network to capture diverse spatial correlations, and a meta recurrent neural network to consider diverse temporal correlations. Extensive experiments were conducted based on two real-world datasets to illustrate the effectiveness of ST-MetaNet beyond several state-of-the-art methods.