ResearchHub | Open Science Community

Beware of q²!

Alexander Golbraikh et al. Jan 1, 2002

The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models

Alexander Tropsha et al. Apr 1, 2003

Abstract This paper emphasizes the importance of rigorous validation as a crucial, integral component of Quantitative Structure Property Relationship (QSPR) model development. We consider some examples of published QSPR models, which in spite of their high fitted accuracy for the training sets and apparent mechanistic appeal, fail rigorous validation tests, and, thus, may lack practical utility as reliable screening tools. We present a set of simple guidelines for developing validated and predictive QSPR models. To this end, we discuss several validation strategies including (1) randomization of the modelled property, also called Y‐scrambling, (2) multiple leave‐many‐out cross‐validations, and (3) external validation using rational division of a dataset into training and test sets. We also highlight the need to establish the domain of model applicability in the chemical space to flag molecules for which predictions may be unreliable, and discuss some algorithms that can be used for this purpose. We advocate the broad use of these guidelines in the development of predictive QSPR models.
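
To make the first of these strategies concrete, here is a minimal Python sketch of Y-scrambling. It assumes a one-descriptor ordinary least-squares model and hypothetical toy data, and the names `fit_ols`, `r_squared` and `y_scrambling` are illustrative rather than taken from the paper. The responses are randomly permuted many times and the model is refit each time. A genuine structure-property relationship should give a fitted R² clearly above anything the scrambled runs reach. A full Y-randomization test would normally rerun the whole model-building procedure, including descriptor selection, on each scrambled response vector; this sketch only refits a fixed model.

```python
import random

def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x (single descriptor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """Fitted R^2 of the OLS model evaluated on its own training data."""
    a, b = fit_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def y_scrambling(x, y, n_rounds=100, seed=0):
    """Refit the model on randomly permuted responses; return each round's R^2."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = y[:]          # copy so the original responses stay intact
        rng.shuffle(y_perm)    # break the descriptor-response pairing
        scores.append(r_squared(x, y_perm))
    return scores

# Hypothetical toy data: one descriptor with a strong linear relationship.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.8, 9.1, 10.0]

real = r_squared(x, y)
scrambled = y_scrambling(x, y)
print(f"real R^2 = {real:.3f}, max scrambled R^2 = {max(scrambled):.3f}")
```

The script prints the fitted R² next to the best score obtained from scrambled responses. If the two were close, the original fit would likely reflect a chance correlation rather than a real relationship.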

QSAR Modeling: Where Have You Been? Where Are You Going To?

Artem Cherkasov et al. Dec 18, 2013

Quantitative structure–activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.

Deep reinforcement learning for de novo drug design

Mariya Popova et al. Jul 6, 2018

We propose a novel computational strategy for de novo design of molecules with desired properties termed ReLeaSE (Reinforcement Learning for Structural Evolution). Based on deep and reinforcement learning approaches, ReLeaSE integrates two deep neural networks - generative and predictive - that are trained separately but employed jointly to generate novel targeted chemical libraries. ReLeaSE employs simple representation of molecules by their SMILES strings only. Generative models are trained with stack-augmented memory network to produce chemically feasible SMILES strings, and predictive models are derived to forecast the desired properties of the de novo generated compounds. In the first phase of the method, generative and predictive models are trained separately with a supervised learning algorithm. In the second phase, both models are trained jointly with the reinforcement learning approach to bias the generation of new chemical structures towards those with the desired physical and/or biological properties. In the proof-of-concept study, we have employed the ReLeaSE method to design chemical libraries with a bias toward structural complexity or biased toward compounds with either maximal, minimal, or specific range of physical properties such as melting point or hydrophobicity, as well as to develop novel putative inhibitors of JAK2. The approach proposed herein can find a general use for generating targeted chemical libraries of novel compounds optimized for either a single desired property or multiple properties.

Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research

Denis Fourches et al. Jun 24, 2010

Perspective. Denis Fourches, Eugene Muratov, and Alexander Tropsha. Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599, and Laboratory of Theoretical Chemistry, Department of Molecular Structure, A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, 65080, Ukraine. J. Chem. Inf. Model. 2010, 50 (7), 1189–1204. Received 5 May 2010; published online 24 June 2010; published in issue 26 July 2010. https://doi.org/10.1021/ci100176x

Subjects: Bioinformatics and computational biology, Chemical structure, Molecular structure, Software, Structure activity relationship.

Rational selection of training and test sets for the development of validated QSAR models.

Alexander Golbraikh et al. Jan 1, 2003

Beware of R²: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models

David Alexander et al. Jun 22, 2015

The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R², as a measure of model fit and predictive power in QSAR and QSPR modeling. R² (or r²) has been used in various contexts in the literature in conjunction with training and test data for both ordinary linear regression and regression through the origin as well as with linear and nonlinear regression models. We analyze the widely adopted model fit criteria suggested by Golbraikh and Tropsha (J. Mol. Graphics Modell. 2002, 20, 269–276) in a strict statistical manner. Shortcomings in these criteria are identified, and a clearer and simpler alternative method to characterize model predictivity is provided. The intent is not to repeat the well-documented arguments for model validation using test data but rather to guide the application of R² as a model fit statistic. Examples are used to illustrate both correct and incorrect uses of R². Reporting the root-mean-square error or equivalent measures of dispersion, which are typically of more practical importance than R², is also encouraged, and important challenges in addressing the needs of different categories of users such as computational chemists, experimental scientists, and regulatory decision support specialists are outlined.
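
To illustrate the distinction this abstract draws, the following Python sketch uses hypothetical test-set values and illustrative function names. It computes R² directly from observed and predicted values as 1 − SS_res/SS_tot, alongside the squared Pearson correlation and the RMSE. The direct 1 − SS_res/SS_tot form is an assumption made here for illustration; the paper itself should be consulted for its exact recommended definitions. In this example the predictions are perfectly correlated with the observations but systematically offset.

```python
import math

def r2_prediction(y_obs, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed directly from observed vs. predicted values."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot

def squared_pearson(y_obs, y_pred):
    """Squared Pearson correlation; blind to constant offsets and rescaling."""
    n = len(y_obs)
    mo, mp = sum(y_obs) / n, sum(y_pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mo) ** 2 for o in y_obs)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov ** 2 / (var_o * var_p)

def rmse(y_obs, y_pred):
    """Root-mean-square error, in the units of the modelled property."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))

# Hypothetical test set: every prediction is too high by exactly 1.0.
y_obs = [2.0, 3.0, 4.0, 5.0, 6.0]
y_pred = [o + 1.0 for o in y_obs]

print(squared_pearson(y_obs, y_pred))  # 1.0: the correlation ignores the bias
print(r2_prediction(y_obs, y_pred))    # 0.5: the offset is penalised
print(rmse(y_obs, y_pred))             # 1.0
```

The squared correlation is unchanged by any constant offset or rescaling of the predictions. It can therefore look perfect even when every prediction is wrong by the same amount, whereas RMSE reports that error in the property's own units.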

Universal fragment descriptors for predicting properties of inorganic crystals

Olexandr Isayev et al. Jun 5, 2017

Abstract Although historically materials discovery has been driven by a laborious trial-and-error process, knowledge-driven materials design can now be enabled by the rational combination of Machine Learning methods and materials databases. Here, data from the AFLOW repository for ab initio calculations is combined with Quantitative Materials Structure-Property Relationship models to predict important properties: metal/insulator classification, band gap energy, bulk/shear moduli, Debye temperature and heat capacities. The prediction’s accuracy compares well with the quality of the training data for virtually any stoichiometric inorganic crystalline material, reciprocating the available thermomechanical experimental data. The universality of the approach is attributed to the construction of the descriptors: Property-Labelled Materials Fragments. The representations require only minimal structural input allowing straightforward implementations of simple heuristic design rules.

Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection.

Alexander Golbraikh et al. Jan 1, 2002

Novel Variable Selection Quantitative Structure−Property Relationship Approach Based on the k-Nearest-Neighbor Principle

Weifan Zheng et al. Nov 19, 1999

A novel automated variable selection quantitative structure-activity relationship (QSAR) method, based on the k-nearest neighbor principle (kNN-QSAR), has been developed. The kNN-QSAR method formally explores the active analogue approach, which implies that similar compounds display similar profiles of pharmacological activities. The activity of each compound is predicted as the average activity of the k most chemically similar compounds from the data set. The robustness of a QSAR model is characterized by the value of the cross-validated R² (q²) obtained with the leave-one-out cross-validation method. The chemical structures are characterized by multiple topological descriptors such as molecular connectivity indices or atom pairs. The chemical similarity is evaluated by Euclidean distances between compounds in multidimensional descriptor space, and the optimal subset of descriptors is selected using simulated annealing as a stochastic optimization algorithm. The application of the kNN-QSAR method to 58 estrogen receptor ligands as well as to several other groups of pharmacologically active compounds yielded QSAR models with q² values of 0.6 or higher. Due to its relative simplicity, high degree of automation, nonlinear nature, and computational efficiency, this method could be applied routinely to a large variety of experimental data sets.
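
The prediction and validation steps described in this abstract can be sketched in Python as follows. The sketch is minimal and rests on several assumptions: the descriptor subset is fixed (the simulated-annealing variable selection is omitted), similarity is Euclidean distance, each prediction is an unweighted average over the k nearest neighbours, and the data are hypothetical. Function names are illustrative. q² is computed from leave-one-out predictions as 1 − PRESS/SS_tot, where PRESS is the sum of squared leave-one-out prediction errors.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_loo_q2(X, y, k=2):
    """Leave-one-out q^2 for a kNN model predicting the mean activity of the k nearest neighbours."""
    n = len(y)
    preds = []
    for i in range(n):
        # Rank every other compound by its distance to compound i in descriptor space;
        # compound i is excluded, so its own activity never enters its prediction.
        ranked = sorted((euclidean(X[i], X[j]), j) for j in range(n) if j != i)
        nearest = [j for _, j in ranked[:k]]
        preds.append(sum(y[j] for j in nearest) / k)
    mean_y = sum(y) / n
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot, preds

# Hypothetical data: three clusters of compounds described by two descriptors.
X = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
     [1.0, 1.0], [1.1, 1.0], [1.2, 1.1],
     [2.0, 2.0], [2.1, 2.0], [2.2, 2.1]]
y = [1.0, 1.1, 1.2, 3.0, 3.1, 3.2, 5.0, 5.1, 5.2]

q2, preds = knn_loo_q2(X, y, k=2)
print(f"LOO q^2 = {q2:.3f}")
```

In the approach the abstract describes, a score of this kind would guide the simulated-annealing search over descriptor subsets. Here it is simply evaluated once for a fixed set of descriptors and a fixed k.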
