ResearchHub | Open Science Community


Tiejun Cheng

Author with expertise in Computational Methods in Drug Discovery

Achievements

Cited Author

Open Access Advocate

Key Stats

Upvotes received: 0
Publications: 11 (73% Open Access)
Cited by: 9,921
h-index: 27
i10-index: 35

Reputation

Biology: < 1%
Chemistry: < 1%
Economics: < 1%


Publications

PubChem in 2021: new data content and improved web interfaces

Sunghwan Kim et al., Oct 12, 2020

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Property Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Molecular Biology

Physical And Theoretical Chemistry


PubChem 2019 update: improved access to chemical data

Sunghwan Kim et al., Oct 26, 2018

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.

Molecular Biology


PubChem 2023 update

Sunghwan Kim et al., Oct 13, 2022

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the ‘standardize’ option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
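For readers unfamiliar with PUG-REST, the sketch below shows how a basic request URL for compound properties is commonly assembled. It is an illustration only: the helper name compound_property_url is hypothetical, no request is sent, and the sketch does not cover the new 'standardize' option or the target-centric downloads, whose URL forms are not given in the abstract.

```python
from urllib.parse import quote

# Base path of the PUG-REST service on the PubChem site named in the abstract.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(cid, properties, output="JSON"):
    """Build (but do not fetch) a PUG-REST URL for compound properties.

    cid        -- PubChem compound identifier, e.g. 2244
    properties -- property names, e.g. ["MolecularFormula", "XLogP"]
    output     -- response format segment, e.g. "JSON" or "CSV"
    """
    # Property names are joined with commas; the comma is kept unescaped.
    props = quote(",".join(properties), safe=",")
    return f"{PUG_REST_BASE}/compound/cid/{int(cid)}/property/{props}/{output}"


example_url = compound_property_url(2244, ["MolecularFormula", "XLogP"])
print(example_url)
```

The URL can be passed to any HTTP client; keeping construction separate from fetching makes the path layout (input, operation, output format) easy to inspect.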

Molecular Biology

Computational Theory And Mathematics


Computation of Octanol−Water Partition Coefficients by Guiding an Additive Model with Knowledge

Tiejun Cheng et al., Nov 1, 2007

We have developed a new method, i.e., XLOGP3, for logP computation. XLOGP3 predicts the logP value of a query compound by using the known logP value of a reference compound as a starting point. The difference in the logP values of the query compound and the reference compound is then estimated by an additive model. The additive model implemented in XLOGP3 uses a total of 87 atom/group types and two correction factors as descriptors. It is calibrated on a training set of 8199 organic compounds with reliable logP data through a multivariate linear regression analysis. For a given query compound, the compound showing the highest structural similarity in the training set will be selected as the reference compound. Structural similarity is quantified based on topological torsion descriptors. XLOGP3 has been tested along with its predecessor, i.e., XLOGP2, as well as several popular logP methods on two independent test sets: one contains 406 small-molecule drugs approved by the FDA and the other contains 219 oligopeptides. On both test sets, XLOGP3 produces more accurate predictions than most of the other methods with average unsigned errors of 0.24−0.51 units. Compared to conventional additive methods, XLOGP3 does not rely on an extensive classification of fragments and correction factors in order to improve accuracy. It is also able to utilize the ever-increasing experimentally measured logP data more effectively.
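The reference-plus-correction idea described in this abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the Compound class, the set-based Tanimoto similarity standing in for topological torsion descriptors, and the toy weights. It does not reproduce the published 87 atom/group types, the two correction factors, or the calibrated coefficients.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class Compound:
    name: str
    features: FrozenSet[str]      # stand-in for topological torsion descriptors
    counts: Dict[str, int]        # atom/group type -> number of occurrences
    logp: Optional[float] = None  # known experimental logP (training set only)


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set-based Tanimoto similarity (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def additive_score(counts: Dict[str, int], weights: Dict[str, float]) -> float:
    """Additive model: sum of per-type contributions."""
    return sum(weights.get(kind, 0.0) * n for kind, n in counts.items())


def predict_logp(query: Compound, training_set: List[Compound],
                 weights: Dict[str, float]) -> Tuple[float, Compound]:
    """Start from the most similar reference and add the modelled difference."""
    reference = max(training_set,
                    key=lambda c: tanimoto(query.features, c.features))
    delta = (additive_score(query.counts, weights)
             - additive_score(reference.counts, weights))
    return reference.logp + delta, reference


# Toy data: weights and compounds are hypothetical, not fitted values.
weights = {"C": 0.5, "O": -1.0}
training = [
    Compound("ref_a", frozenset({"t1", "t2"}), {"C": 2, "O": 1}, logp=0.10),
    Compound("ref_b", frozenset({"t3"}), {"C": 6}, logp=2.00),
]
query = Compound("query", frozenset({"t1", "t2", "t4"}), {"C": 3, "O": 1})
predicted, used_reference = predict_logp(query, training, weights)
print(used_reference.name, round(predicted, 2))
```

Because the prediction is anchored to a measured value, any error in the additive model affects only the difference between query and reference, which is the motivation the abstract gives for the approach.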

Artificial Intelligence


PubChem BioAssay: 2017 update

Yanli Wang et al., Nov 9, 2016

PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004, providing open access to its data content for the community. PubChem accepts data submissions from researchers worldwide in academia, industry and government agencies. PubChem also exchanges data with other chemical biology database stakeholders. After more than a decade of development, it has become an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update on the PubChem BioAssay database, describing several recent developments, including added sources of research data, a redesigned BioAssay record page, a new BioAssay classification browser and new features in the Upload system that facilitate data sharing.

Molecular Biology


Comparative Assessment of Scoring Functions on a Diverse Test Set

Tiejun Cheng et al., Apr 9, 2009

Scoring functions are widely applied to the evaluation of protein−ligand binding in structure-based drug design. We have conducted a comparative assessment of 16 popular scoring functions implemented in main-stream commercial software or released by academic research groups. A set of 195 diverse protein−ligand complexes with high-resolution crystal structures and reliable binding constants were selected through a systematic nonredundant sampling of the PDBbind database and used as the primary test set in our study. All scoring functions were evaluated in three aspects, that is, "docking power", "ranking power", and "scoring power", and all evaluations were independent from the context of molecular docking or virtual screening. As for "docking power", six scoring functions, including GOLD::ASP, DS::PLP1, DrugScorePDB, GlideScore-SP, DS::LigScore, and GOLD::ChemScore, achieved success rates over 70% when the acceptance cutoff was root-mean-square deviation < 2.0 Å. Combining these scoring functions into consensus scoring schemes improved the success rates to 80% or even higher. As for "ranking power" and "scoring power", the top four scoring functions on the primary test set were X-Score, DrugScoreCSD, DS::PLP, and SYBYL::ChemScore. They were able to correctly rank the protein−ligand complexes containing the same type of protein with success rates around 50%. Correlation coefficients between the experimental binding constants and the binding scores computed by these scoring functions ranged from 0.545 to 0.644. Besides the primary test set, each scoring function was also tested on four additional test sets, each consisting of a certain number of protein−ligand complexes containing one particular type of protein. Our study serves as an updated benchmark for evaluating the general performance of today's scoring functions. Our results indicate that no single scoring function consistently outperforms others in all three aspects. Thus, it is important in practice to choose the appropriate scoring functions for different purposes.
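The three evaluation aspects named in this abstract can be made concrete with a small Python sketch. The function names and toy numbers are hypothetical, and the exact protocols are assumptions: docking power is taken as the fraction of complexes whose best-scored pose has RMSD below the 2.0 Å cutoff, ranking power as the fraction of protein families whose complexes are ordered correctly, and scoring power as the Pearson correlation, with higher scores assumed to mean stronger binding.

```python
from math import sqrt


def docking_success_rate(best_pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose best-scored pose lies within the cutoff (in Å)."""
    hits = sum(1 for rmsd in best_pose_rmsds if rmsd < cutoff)
    return hits / len(best_pose_rmsds)


def ranking_success_rate(families):
    """Fraction of families ranked in the same order as experiment.

    Each family is a list of (experimental_affinity, score) pairs.
    """
    def ranked_correctly(pairs):
        order = range(len(pairs))
        by_experiment = sorted(order, key=lambda i: pairs[i][0])
        by_score = sorted(order, key=lambda i: pairs[i][1])
        return by_experiment == by_score

    return sum(ranked_correctly(f) for f in families) / len(families)


def pearson_r(xs, ys):
    """Pearson correlation between experimental values and computed scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy inputs, not data from the paper.
rmsds = [0.8, 1.9, 2.0, 3.5]                       # 2.0 is not < 2.0
families = [
    [(5.0, 1.0), (7.0, 2.0), (9.0, 3.0)],          # ordered correctly
    [(5.0, 2.0), (7.0, 1.0), (9.0, 3.0)],          # two complexes swapped
]
print(docking_success_rate(rmsds))
print(ranking_success_rate(families))
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Keeping the three metrics as separate functions mirrors the abstract's conclusion that a scoring function can do well on one aspect and poorly on another.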

Artificial Intelligence

Molecular Biology


Evaluation of the performance of four molecular docking programs on a diverse set of protein‐ligand complexes

Xun Li et al., Feb 1, 2010

Many molecular docking programs are available nowadays, and thus it is of great practical value to evaluate and compare their performance. We have conducted an extensive evaluation of four popular commercial molecular docking programs: Glide, GOLD, LigandFit, and Surflex. Our test set consists of 195 protein‐ligand complexes with high‐resolution crystal structures (resolution ≤2.5 Å) and reliable binding data [dissociation constant (Kd) or inhibition constant (Ki)], which are selected from the PDBbind database with an emphasis on diversity. The top‐ranked solutions produced by these programs are compared to the native ligand binding poses observed in crystal structures. Glide and GOLD demonstrate better accuracy than the other two on the entire test set. Their results are also less sensitive to the starting structures for docking. Comparison of the results produced by these programs at three different computation levels reveals that their accuracy is not always proportional to CPU cost as one may expect. The binding scores of the top‐ranked solutions produced by these programs are in low to moderate correlations with experimentally measured binding data. Further analyses on the outcomes of these programs on three suites of subsets of protein‐ligand complexes indicate that these programs are less capable of handling highly flexible ligands and relatively flat binding sites, and that they have different preferences for hydrophilic/hydrophobic binding sites. Our evaluation can help other researchers make reasonable choices among available molecular docking programs. It is also valuable for program developers seeking to improve their methods further. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
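Comparing a top-ranked docking solution with the native pose is usually done with a root-mean-square deviation over matched atoms. The sketch below is one minimal way to compute it in Python, under stated assumptions: both poses list the same atoms in the same order, coordinates are in Å in the same reference frame (no superposition is performed), and ligand symmetry is ignored. The function name pose_rmsd is hypothetical and is not taken from the paper.

```python
from math import sqrt


def pose_rmsd(pose, native):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples."""
    if not pose or len(pose) != len(native):
        raise ValueError("poses must be non-empty and have the same atom count")
    # Sum squared coordinate differences over every matched atom pair.
    squared = sum((a - b) ** 2
                  for atom_p, atom_n in zip(pose, native)
                  for a, b in zip(atom_p, atom_n))
    return sqrt(squared / len(pose))


# Toy coordinates: the docked pose is the native pose shifted by (3, 4, 0).
native_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
print(pose_rmsd(docked_pose, native_pose))
```

A rigid shift of every atom by the same vector gives an RMSD equal to the length of that vector, which makes a convenient sanity check.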

Artificial Intelligence


PubChem BioAssay: 2014 update

Yanli Wang et al., Nov 5, 2013

PubChem's BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological tests of small molecules generated through high-throughput screening experiments, medicinal chemistry studies, chemical biology research and drug discovery programs. In addition, the BioAssay database contains data from high-throughput RNA interference screening aimed at identifying critical genes responsible for a biological process or disease condition. The mission of PubChem is to serve the community by providing free and easy access to all deposited data. To this end, PubChem BioAssay is integrated into the National Center for Biotechnology Information retrieval system, making its records searchable by Entrez queries and cross-linked to other biomedical information archived at the National Center for Biotechnology Information. Moreover, PubChem BioAssay provides web-based and programmatic tools allowing users to search, access and analyze bioassay test results and metadata. In this work, we provide an update on the PubChem BioAssay resource, covering information content growth, new developments supporting data integration and search, and the recently deployed PubChem Upload, which streamlines chemical structure and bioassay submissions.


iCn3D: From Web-based 3D Viewer to Structural Analysis Tool in Batch Mode

Jiyao Wang et al., Sep 11, 2021

iCn3D was initially developed as a web-based 3D molecular viewer. It then evolved from visualization into full-featured interactive structural analysis software. It became a collaborative research instrument through the sharing of permanent, shortened URLs that encapsulate not only annotated visual molecular scenes, but also all underlying data and analysis scripts in a FAIR manner. More recently, with the growth of structural databases, the need to analyze large structural datasets systematically led us to use Python scripts and convert the code to be used in Node.js scripts. We showed a few examples of Python scripts at https://github.com/ncbi/icn3d/tree/master/icn3dpython to export secondary structures or PNG images from iCn3D. Users just need to replace the URL in the Python scripts to export other annotations from iCn3D. Furthermore, any interactive iCn3D feature can be converted into a Node.js script to be run in batch mode, enabling an interactive analysis performed on one or a handful of protein complexes to be scaled up to analysis features of large ensembles of structures. Example Node.js analysis scripts are currently available at https://github.com/ncbi/icn3d/tree/master/icn3dnode. This development will enable ensemble analyses on growing structural databases such as AlphaFold or RoseTTAFold on one hand and Electron Microscopy on the other. In this paper, we also review new features such as DelPhi electrostatic potential, 3D view of mutations, alignment of multiple chains, assembly of multiple structures by realignment, dynamic symmetry calculation, 2D cartoons at different levels, interactive contact maps, and use of iCn3D in Jupyter Notebook as described at https://pypi.org/project/icn3dpy.

Molecular Biology


PubChem 2025 update

Sunghwan Kim et al., Nov 18, 2024

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a large and highly integrated public chemical database resource at NIH. In the past two years, significant updates were made to PubChem. With additions from over 130 new sources, PubChem contains >1000 data sources, 119 million compounds, 322 million substances and 295 million bioactivities. New interfaces, such as the consolidated literature panel and the patent knowledge panel, were developed. The consolidated literature panel combines all references about a compound into a single list, allowing users to easily find, sort, and export all relevant articles for a chemical in one place. The patent knowledge panels for a given query chemical or gene display chemicals, genes, and diseases co-mentioned with the query in patent documents, helping users to explore relationships between co-occurring entities within patent documents. PubChemRDF was expanded to include the co-occurrence data underlying the literature knowledge panel, enabling users to exploit semantic web technologies to explore entity relationships based on the co-occurrences in the scientific literature. The usability and accessibility of information on chemicals with non-discrete structures (e.g. biologics, minerals, polymers, UVCBs and glycans) were greatly improved with dedicated web pages that provide a comprehensive view of all available information in PubChem for these chemicals.

Molecular Biology

Computational Theory And Mathematics
