ResearchHub | Open Science Community

Functional genomic hypothesis generation and experimentation by a robot scientist

Ross King et al., Jan 1, 2004

The question of whether it is possible to automate the scientific process is of both great theoretical interest [1,2] and increasing practical importance because, in many scientific areas, data are being generated much faster than they can be effectively analysed. We describe a physically implemented robotic system that applies techniques from artificial intelligence [3,4,5,6,7,8] to carry out cycles of scientific experimentation. The system automatically originates hypotheses to explain observations, devises experiments to test these hypotheses, physically runs the experiments using a laboratory robot, interprets the results to falsify hypotheses inconsistent with the data, and then repeats the cycle. Here we apply the system to the determination of gene function using deletion mutants of yeast (Saccharomyces cerevisiae) and auxotrophic growth experiments [9]. We built and tested a detailed logical model (involving genes, proteins and metabolites) of the aromatic amino acid synthesis pathway. In biological experiments that automatically reconstruct parts of this model, we show that an intelligent experiment selection strategy is competitive with human performance and significantly outperforms, with a cost decrease of 3-fold and 100-fold (respectively), both cheapest and random-experiment selection.
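The hypothesise, test, falsify, repeat cycle described in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the linear three-step pathway, the `predict` and `observe` functions, and the selection score (worst-case number of hypotheses eliminated per unit cost) are hypothetical stand-ins, not the logical model or the cost function used in the paper.

```python
from typing import Callable, List, Tuple

Hypothesis = int                 # hypothetical: "the deleted gene encodes pathway step k"
Experiment = Tuple[str, float]   # hypothetical: (supplemented metabolite, cost)


def pick_experiment(hyps: List[Hypothesis],
                    experiments: List[Experiment],
                    predict: Callable[[Hypothesis, str], bool]) -> Experiment:
    """Choose the experiment that eliminates the most hypotheses, in the worst case, per unit cost."""
    def score(exp: Experiment) -> float:
        grows = sum(1 for h in hyps if predict(h, exp[0]))
        worst_case_eliminated = min(grows, len(hyps) - grows)
        return worst_case_eliminated / exp[1]
    return max(experiments, key=score)


def run_cycle(hyps: List[Hypothesis],
              experiments: List[Experiment],
              predict: Callable[[Hypothesis, str], bool],
              observe: Callable[[str], bool]) -> Tuple[List[Hypothesis], List[str]]:
    """Repeat: select an experiment, observe it, discard hypotheses inconsistent with the result."""
    remaining, pool, history = list(hyps), list(experiments), []
    while len(remaining) > 1 and pool:
        exp = pick_experiment(remaining, pool, predict)
        pool.remove(exp)
        outcome = observe(exp[0])  # stands in for the laboratory robot
        remaining = [h for h in remaining if predict(h, exp[0]) == outcome]
        history.append(exp[0])
    return remaining, history


# Toy pathway: step 1 -> M1 -> step 2 -> M2 -> step 3 -> M3.
# A mutant lacking step k is predicted to grow only if fed a metabolite made at or after step k.
def predict(h: Hypothesis, metabolite: str) -> bool:
    return int(metabolite[1:]) >= h


TRUE_STEP = 2  # hidden ground truth used only by the simulated observation


def observe(metabolite: str) -> bool:
    return int(metabolite[1:]) >= TRUE_STEP


remaining, history = run_cycle([1, 2, 3],
                               [("M1", 1.0), ("M2", 1.0), ("M3", 1.0)],
                               predict, observe)
print(remaining, history)
```

In this toy setting the loop never selects the uninformative supplement M3, because every remaining hypothesis predicts growth on it, so it cannot falsify anything.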

Genetics

Artificial Intelligence

The Automation of Science

Ross King et al., Apr 2, 2009

The basis of science is the hypothetico-deductive method and the recording of experiments in sufficient detail to enable reproducibility. We report the development of Robot Scientist “Adam,” which advances the automation of both. Adam has autonomously generated functional genomics hypotheses about the yeast Saccharomyces cerevisiae and experimentally tested these hypotheses by using laboratory automation. We have confirmed Adam's conclusions through manual experiments. To describe Adam's research, we have developed an ontology and logical language. The resulting formalization involves over 10,000 different research units in a nested treelike structure, 10 levels deep, that relates the 6.6 million biomass measurements to their logical description. This formalization describes how a machine contributed to scientific knowledge.

Philosophy

Artificial Intelligence

Identification and application of the concepts important for accurate and reliable protein secondary structure prediction

Ross King et al., Nov 1, 1996

A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequence, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation is new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of > 80%. Existing high-accuracy prediction methods are "black-box" predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥ 90 residues and < 170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with the PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.
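For readers unfamiliar with the metric, per-residue three-state accuracy is conventionally computed as the percentage of residues whose predicted state (helix, strand, or coil) matches the observed state. The sketch below assumes the common H/E/C letter encoding and uses made-up example strings; the function name `q3_accuracy` is hypothetical, and the paper's own evaluation pipeline is not shown in the abstract.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label matches the observed one.

    Assumes one letter per residue: H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    if set(predicted) - set("HEC") or set(observed) - set("HEC"):
        raise ValueError("only the states H, E and C are allowed")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up 10-residue example: two of ten residues are mispredicted.
observed = "HHHHEEECCC"
predicted = "HHHCEEECCH"
print(q3_accuracy(predicted, observed))  # 80.0
```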

Philosophy

Biochemistry

Hierarchical metabolomics demonstrates substantial compositional similarity between genetically modified and conventional potato crops

Gareth Catchpole et al., Sep 26, 2005

There is current debate whether genetically modified (GM) plants might contain unexpected, potentially undesirable changes in overall metabolite composition. However, appropriate analytical technology and acceptable metrics of compositional similarity require development. We describe a comprehensive comparison of total metabolites in field-grown GM and conventional potato tubers using a hierarchical approach initiating with rapid metabolome “fingerprinting” to guide more detailed profiling of metabolites where significant differences are suspected. Central to this strategy are data analysis procedures able to generate validated, reproducible metrics of comparison from complex metabolome data. We show that, apart from targeted changes, these GM potatoes in this study appear substantially equivalent to traditional cultivars.

Artificial Intelligence

Biochemistry

NERO: A Biomedical Named-entity (Recognition) Ontology with a Large, Annotated Corpus Reveals Meaningful Associations Through Text Embedding

Kanix Wang et al., Nov 6, 2020

Machine reading is essential for unlocking valuable knowledge contained in the millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in machine reading have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in machine reading methodology and automated knowledge extraction systems in the same way that ImageNet [4] was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named-entity analysis tool for biomedicine: (a) a new, Named-Entity Recognition Ontology (NERO) developed specifically for describing entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named-entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, automated named-entity recognition and extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

Philosophy

Artificial Intelligence

A genetic trap in yeast for inhibitors of SARS-CoV-2 main protease

Hanna Alalam et al., Sep 16, 2021

ABSTRACT The ongoing COVID-19 pandemic urges searches for antiviral agents that can block infection or ameliorate its symptoms. Using dissimilar search strategies for new antivirals will improve our overall chances of finding effective treatments. Here, we have established an experimental platform for screening of small molecule inhibitors of SARS-CoV-2 main protease in Saccharomyces cerevisiae cells, genetically engineered to enhance cellular uptake of small molecules in the environment. The system consists of a fusion of the E. coli toxin MazF and its antitoxin MazE, with insertion of a protease cleavage site in the linker peptide connecting the MazE and MazF moieties. Expression of the viral protease confers cleavage of the MazEF fusion, releasing the MazF toxin from its antitoxin, resulting in growth inhibition. In the presence of a small molecule inhibiting the protease, cleavage is blocked and the MazF toxin remains inhibited, promoting growth. The system thus allows positive selection for inhibitors. The engineered yeast strain is tagged with a fluorescent marker protein, allowing precise monitoring of its growth in the presence or absence of inhibitor. We detect an established main protease inhibitor down to 10 μM by a robust growth increase. The system is suitable for robotized large-scale screens. It allows in vivo evaluation of drug candidates, and is rapidly adaptable for new variants of the protease with deviant site specificities.

IMPORTANCE The COVID-19 pandemic may continue for several years before vaccination campaigns can put an end to it globally. Thus, the need for discovery of new antiviral drug candidates will remain. We have engineered a system in yeast cells for detection of small molecule inhibitors of one attractive drug target of SARS-CoV-2, its main protease, which is required for viral replication. Detecting inhibitors in live cells brings the advantage that only compounds capable of entering the cell and remaining stable there will score in the system. Moreover, by its design in yeast, the system is rapidly adaptable for tuning of detection level, eventual modification of protease cleavage site in case of future mutant variants of the SARS-CoV-2 main protease, or even for other proteases.
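The positive-selection logic of the MazEF trap can be written out as a small truth table. This is only a qualitative Boolean sketch of the mechanism as stated in the abstract; the function name `yeast_grows` is hypothetical, and real growth is a graded, dose-dependent readout rather than a yes/no value.

```python
def yeast_grows(protease_expressed: bool, inhibitor_present: bool) -> bool:
    """Qualitative model of the MazEF trap: growth requires that the MazF toxin stays bound to MazE."""
    # The protease is active only when it is expressed and not blocked by an inhibitor.
    protease_active = protease_expressed and not inhibitor_present
    # An active protease cuts the linker between MazE and MazF, freeing the toxin.
    toxin_released = protease_active
    # Free MazF inhibits growth; otherwise the cells grow.
    return not toxin_released


for protease in (False, True):
    for inhibitor in (False, True):
        print(f"protease={protease!s:5} inhibitor={inhibitor!s:5} "
              f"grows={yeast_grows(protease, inhibitor)}")
```

The only non-growing case is an expressed, uninhibited protease, which is why adding an effective inhibitor shows up as a growth increase.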

Biochemistry

Paleontology

Predicting agronomic traits and associated genomic regions in diverse rice landraces using marker stability

Oghenejokpeme Orhobor et al., Oct 15, 2019

To secure the world's food supply it is essential that we improve our knowledge of the genetic underpinnings of complex agronomic traits. In this paper, we report our findings from performing trait prediction and association mapping using marker stability in diverse rice landraces. We used the least absolute shrinkage and selection operator as our marker selection algorithm, and considered twelve real agronomic traits and a hundred simulated traits using a population with approximately a hundred thousand markers. For trait prediction, we considered several statistical/machine learning methods. We found that some of the methods considered performed best when preselected markers using marker stability were used. However, our results also show that one might need to make a trade-off between model size and performance for some learning methods. For association mapping, we compared marker stability to the genome-wide efficient mixed-model analysis (GEMMA), and for the simulated traits, we found that marker stability significantly outperforms GEMMA. For the real traits, marker stability successfully identifies multiple associated markers, which often entail those selected by GEMMA. Further analysis of the markers selected for the real traits using marker stability showed that they are located in known quantitative trait loci (QTL) using the QTL Annotation Rice Online database. Furthermore, co-functional network prediction of the selected markers using RiceNet v2 also showed association to known controlling genes. We argue that a wide adoption of the marker stability approach for the prediction of agronomic traits and association mapping could improve global rice breeding efforts.

Genetics

Biotechnology

An experimental target-based platform in yeast for screening Plasmodium vivax deoxyhypusine synthase inhibitors

Suélen Silva et al., Jan 1, 2023

The enzyme deoxyhypusine synthase (DHS) catalyzes the first step in the post-translational modification of the eukaryotic translation factor 5A (eIF5A). This is the only protein known to contain the amino acid hypusine, which results from this modification. Both eIF5A and DHS are essential for cell viability in eukaryotes, and inhibiting DHS can be a promising strategy for the development of new therapeutic alternatives. The human and parasitic orthologous proteins are different enough to render selective targeting against infectious diseases; however, no DHS inhibitor selective for the parasite ortholog has previously been reported. Here, we established a yeast surrogate genetics platform to identify inhibitors of DHS from Plasmodium vivax, one of the major causative agents of malaria. We constructed genetically modified Saccharomyces cerevisiae strains expressing DHS genes from Homo sapiens (HsDHS) or P. vivax (PvDHS) in place of the endogenous DHS gene from S. cerevisiae. This new strain background was ~60-fold more sensitive to an inhibitor of human DHS than the one previously used. Initially, a virtual screen using datasets from the ChEMBL-NTD database was performed. Candidate ligands were tested in growth assays using the newly generated yeast strains expressing heterologous DHS genes. Among these, two showed promise by preferentially reducing the growth of the PvDHS-expressing strain. Further, in a robotized assay, we screened 400 compounds from the Pathogen Box library using the same S. cerevisiae strains, and one compound preferentially reduced the growth of the PvDHS-expressing yeast strain. Western blot revealed that these compounds significantly reduced eIF5A hypusination in yeast. Our study demonstrates that this yeast-based platform is suitable for identifying and verifying candidate small molecule DHS inhibitors, selective for the parasite over the human ortholog.

Genetics

Ecology

An Evaluation of Machine-learning for Predicting Phenotype: Studies in Yeast, Rice and Wheat

Nastasiya Grinberg et al., Feb 3, 2017

In phenotype prediction, the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies are of the highest societal importance, as they are central to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)), with two state-of-the-art classical statistical genetics methods (including genomic BLUP). Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that, for almost all phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression, and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. When applied to the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure, which suggests one way to improve standard machine learning methods when population structure is present. We conclude that the application of machine learning methods to phenotype prediction problems holds great promise.

Genetics

Artificial Intelligence
