Abstract Recent achievements in large-scale pre-trained models like GPT-3 and PanGu-α showed amazing performances in many downstream tasks, which makes AI friendlier toward industrial users. Deep learning has been recognized as the most promising technology for pharmaceuticals, a powerful molecule pre-trained model could save researchers tons of time. In chemistry classes, the students learn two molecule representations, the molecular formula and the structure formula, and learn to translate them from one way to the other. Inspired by this, we developed a novel deep learning architecture using a graph-to-sequence asymmetric conditional variational autoencoders, called PanGu Drug Model, which can appropriately characterize the molecule from both representations and improve the performance of downstream drug discovery tasks. After pretrained with 1.7 billion small molecules, our proposed model achieved the state-of-the-art results in 20 drug discovery tasks, such as molecule property prediction (predict ADMET properties, compound-target interactions, drug-drug interactions and chemical reaction productivity), molecule generation and molecule optimization. A new drug screening library of 100 million drug-like small molecules with 99.68% novelty was generated by PanGu molecule generator, which could efficiently produce novel compounds with similar physiochemical properties to given distribution, this library could be used to supplement existing compound databases. In addition, PanGu molecule optimizer could optimize the chemical structures of starting molecule with improved molecular property of interest. An automatic multi-objective optimization web application implemented by PanGu Drug Model is available at http://www.pangu-drug.com/ .