ResearchHub | Open Science Community

Frederick Roth

Author with expertise in Standards and Guidelines for Genetic Variant Interpretation

Dana-Farber Cancer Institute, University of Toronto, Donnelly College

Achievements

Open Access Advocate

Cited Author

Key Stats

Upvotes received:

Publications:

(69% Open Access)

Cited by:

h-index:

i10-index:

104

Reputation

Biology

< 1%

Chemistry

< 1%

Economics

< 1%

Publications

Why clinical trials are terminated

Theodore Pak et al., May 6, 2020

Background: Evidence-based clinical practice relies on unbiased reporting of negative results. Meta-analysis of drug safety and efficacy across many clinical trials is difficult given the unconstrained nature of reasons that are provided to ClinicalTrials.gov to explain clinical trial terminations. Methods and Findings: We scanned all trials in ClinicalTrials.gov marked with the “terminated” status (N=3122), meaning the trial had been stopped before the scheduled end date. Under the current reporting framework, any number of reasons may be given for termination, and these need not conform to a controlled vocabulary. Here we develop a controlled vocabulary for trial termination, and map each terminated trial to as many as three vocabulary terms. Mapping to this “ontology of termination” allows further analysis and conclusions. First, we identify the subset of terminated trials that ended citing safety concerns (6.2%) or failure to establish efficacy (10.8%), and we were further able to stratify these rates across trials of different phases. Second, we examine termination reasons where a stricter data model could have preserved more evidentiary value, either because the data model was misused (7.6%) or because the given reason left unclear whether the decision to terminate was based on analysis of the data (74.9%, with 20.4% mentioning a decision-maker that may have had access to the data). Third, we show that imposing a controlled vocabulary of reasons for termination would avoid ambiguity and improve the evidentiary value of clinical trials. Conclusions: We encourage wider use of an “ontology of termination” and propose four questions that should be posed on trial termination. These simple steps would promote transparency and enable ready access to negative trial results for meta-analysis.
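The mapping step described in this abstract (free-text termination reasons assigned to at most three controlled-vocabulary terms, then summarized as rates) can be illustrated with a minimal sketch. The vocabulary terms, keyword patterns, and function names below are hypothetical placeholders chosen for illustration; they are not the ontology or the method used in the paper.

```python
import re

# Hypothetical vocabulary (term -> keyword patterns). This is an assumption
# for illustration only, not the paper's "ontology of termination".
VOCAB = {
    "safety_concern": [r"\bsafety\b", r"\badverse\b", r"\btoxicity\b"],
    "lack_of_efficacy": [r"\befficacy\b", r"\bfutility\b"],
    "slow_accrual": [r"\baccrual\b", r"\benrol?lment\b", r"\brecruit"],
    "funding": [r"\bfunding\b"],
}
MAX_TERMS = 3  # the abstract maps each trial to as many as three terms


def map_reason(text):
    """Return up to MAX_TERMS vocabulary terms whose patterns match the text."""
    lowered = text.lower()
    hits = [
        term
        for term, patterns in VOCAB.items()
        if any(re.search(p, lowered) for p in patterns)
    ]
    return hits[:MAX_TERMS] or ["unclassified"]


def term_rates(reasons):
    """Fraction of trials mapped to each term (a trial may count for several)."""
    counts = {}
    for reason in reasons:
        for term in map_reason(reason):
            counts[term] = counts.get(term, 0) + 1
    return {term: n / len(reasons) for term, n in counts.items()}
```

Keyword matching is only a stand-in here; a manually curated mapping, as the abstract implies, would handle ambiguous reasons far more reliably.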

Paper

Binary interactome models of inner- versus outer-complexome organisation

L. Lambourne et al., Oct 24, 2023

Summary: Hundreds of different protein complexes that perform important functions across all cellular processes, collectively comprising the “complexome” of an organism, have been identified [1]. However, less is known about the fraction of the interactome that exists outside the complexome, in the “outer-complexome”. To investigate features of “inner”- versus outer-complexome organisation in yeast, we generated a high-quality atlas of binary protein-protein interactions (PPIs), combining three previous maps [2–4] and a new reference all-by-all binary interactome map. A greater proportion of interactions in our map are in the outer-complexome, in comparison to those found by affinity purification followed by mass spectrometry [5–7] or in literature-curated datasets [8–11]. In addition, recent advances in deep learning predictions of PPI structures [12] mirror the existing experimentally resolved structures in being largely focused on the inner-complexome and missing most interactions in the outer-complexome. Our new PPI network suggests that the outer-complexome contains considerably more PPIs than the inner-complexome, and integration with functional similarity networks [13–15] reveals that interactions in the inner-complexome are highly detectable and correspond to pairs of proteins with high functional similarity, while proteins connected by more transient, harder-to-detect interactions in the outer-complexome exhibit higher functional heterogeneity.

Interactome

Computational Biology

Biology

Paper

MaveRegistry: a collaboration platform for multiplexed assays of variant effect

Da Kuang et al., Oct 24, 2023

Summary: Multiplexed assays of variant effect (MAVEs) are capable of experimentally testing all possible single nucleotide or amino acid variants in selected genomic regions, generating ‘variant effect maps’, which provide biochemical insight and functional evidence to enable more rapid and accurate clinical interpretation of human variation. Because the international community applying MAVE approaches is growing rapidly, we developed the online MaveRegistry platform to catalyze collaboration, reduce redundant efforts, allow stakeholders to nominate targets, and enable tracking and sharing of progress on ongoing MAVE projects. Availability and implementation: https://registry.varianteffect.org. Contact: fritz.roth@utoronto.ca

Multiplexing

Computational Biology

Computer Science

Paper

Next-generation large-scale binary protein interaction network for Drosophila

Hong-Wen Tang et al., Oct 24, 2023

Generating reference maps of the interactome networks underlying most cellular functions can greatly illuminate genetic studies by providing a protein-centric approach to finding new components of existing pathways, complexes, and processes. Here, we applied state-of-the-art experimental and bioinformatics methods to identify high-confidence binary protein-protein interactions (PPIs) for Drosophila melanogaster. We performed four all-by-all yeast two-hybrid (Y2H) screens of >10,000 Drosophila proteins, resulting in the ‘FlyBi’ dataset of 8,723 PPIs among 2,939 proteins. As part of this effort, we tested subsets of our data and data from previous PPI datasets using an orthogonal assay, which allowed us to normalize data quality across datasets. Next, we integrated our FlyBi data with previous PPI data, resulting in an expanded, high-confidence binary Drosophila reference interaction network, DroRI, comprising 17,232 interactions among 6,511 proteins. These data are accessible through the Molecular Interaction Search Tool (MIST) and other databases. To assess the utility of the PPI resource, we used novel interactions from the FlyBi dataset to generate an autophagy interaction network that we validated in vivo using two different autophagy-related assays. We found that deformed wings (dwg) encodes a protein that is both a regulator and a target of autophagy. Altogether, the resources generated in this project provide a strong foundation for building high-confidence new hypotheses regarding protein networks and function.

Interactome

Drosophila Melanogaster

Computational Biology

Paper

Empowering rare variant burden-based gene-trait association studies via optimized computational predictor choice

Da Kuang et al., Oct 24, 2023

Background: Causal gene/trait relationships can be identified via observation of an excess (or reduced) burden of rare variation in a given gene within humans who have that trait. Although computational predictors can improve the power of such ‘burden’ tests, it is unclear which are optimal for this task. Method: Using 140 gene-trait combinations with a reported rare-variant burden association, we evaluated the ability of 20 computational predictors to predict human traits. We used the best-performing predictors to increase the power of genome-wide rare variant burden scans based on ∼450K UK Biobank participants. Results: Two predictors, VARITY and REVEL, outperformed all others in predicting human traits in the UK Biobank from missense variation. Genome-scale burden scans using the two best-performing predictors identified 1,038 gene-trait associations (FDR < 5%), including 567 (55%) that had not been previously reported. We explore 54 cardiovascular gene-trait associations (including 15 not reported in other burden scans) in greater depth. Conclusions: Rigorous selection of computational missense variant effect predictors can improve the power of rare-variant burden scans for human gene-trait associations, yielding many new associations with potential value in informing mechanistic understanding and therapeutic development. The strategy we describe here is generalizable to future computational variant effect predictors, traits and organisms.

Paper

An open-source platform to distribute and interpret data from multiplexed assays of variant effect

Daniel Esposito et al., May 7, 2020

Multiplex Assays of Variant Effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here we present MaveDB, a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first of these applications, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.

Paper

Pacybara: Accurate long-read sequencing for barcoded mutagenized allelic libraries

Jochen Weile et al., Oct 24, 2023

Long read sequencing technologies, an attractive solution for many applications, often suffer from higher error rates. Alignment of multiple reads can improve base-calling accuracy, but some applications, e.g. sequencing mutagenized libraries where multiple distinct clones differ by one or few variants, require the use of barcodes or unique molecular identifiers. Unfortunately, sequencing errors can interfere with correct barcode identification, and a given barcode sequence may be linked to multiple independent clones within a given library. Here we focus on the target application of sequencing mutagenized libraries in the context of multiplexed assays of variant effects (MAVEs). MAVEs are increasingly used to create comprehensive genotype-phenotype maps that can aid clinical variant interpretation. Many MAVE methods use long-read sequencing of barcoded mutant libraries for accurate association of barcode with genotype. Existing long-read sequencing pipelines do not account for inaccurate sequencing or non-unique barcodes. Here, we describe Pacybara, which handles these issues by clustering long reads based on the similarities of (error-prone) barcodes while also detecting barcodes that have been associated with multiple genotypes. Pacybara also detects recombinant (chimeric) clones and reduces false positive indel calls. In three example applications, we show that Pacybara identifies and correctly resolves these issues.
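The two ideas named in this abstract, grouping reads whose error-prone barcodes are similar and flagging barcodes linked to more than one genotype, can be illustrated with a toy sketch. This is not Pacybara's algorithm: the greedy seeding, the Hamming-distance threshold, the `min_minor_frac` cutoff, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Mismatch count for equal-length barcodes; unequal lengths never match."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist=1, min_minor_frac=0.25):
    """Greedily cluster (barcode, genotype) reads around abundant barcodes.

    Toy stand-in, not Pacybara: barcodes are visited from most to least
    abundant, and each joins the first seed within max_dist mismatches.
    """
    barcode_counts = Counter(bc for bc, _ in reads)
    seeds = []
    assignment = {}
    for bc, _ in barcode_counts.most_common():
        seed = next((s for s in seeds if hamming(bc, s) <= max_dist), None)
        if seed is None:
            seeds.append(bc)
            seed = bc
        assignment[bc] = seed

    # Tally genotypes per cluster, then call the majority genotype.
    genotypes = defaultdict(Counter)
    for bc, gt in reads:
        genotypes[assignment[bc]][gt] += 1

    result = {}
    for seed, gts in genotypes.items():
        ranked = gts.most_common()
        total = sum(gts.values())
        # Flag clusters where a second genotype has substantial read support,
        # which may indicate one barcode shared by independent clones.
        ambiguous = len(ranked) > 1 and ranked[1][1] / total >= min_minor_frac
        result[seed] = {
            "genotype": ranked[0][0],
            "reads": total,
            "ambiguous": ambiguous,
        }
    return result
```

A real pipeline would also need to handle indels in barcodes and chimeric reads, which Hamming distance and majority voting do not address.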

Paper

A Common Class of Transcripts with 5'-Intron Depletion, Distinct Early Coding Sequence Features, and N1-Methyladenosine Modification

Can Cenik et al., May 7, 2020

Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the Exon Junction Complex (EJC) at non-canonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ~20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for non-canonical binding by the Exon Junction Complex.

Paper

Quantifying Immune-Based Counterselection of Somatic Mutations

Fan Yang et al., May 7, 2020

It is now well established that somatic mutations in protein-coding regions can generate neoantigens, and that these can be recognized by the immune system and contribute to clearance of developing cancers. However, there is currently no model that can quantitatively predict the neoantigenic effect of any given somatic mutation. Here, we examined signatures of immune selection pressure on the distribution of somatic mutations. We quantified the extent to which somatic mutations are significantly depleted in peptides that are predicted to be displayed by major histocompatibility complex (MHC) class I proteins. We characterized the dependence of this depletion on expression level. We then examined whether immune selection pressure on somatic mutations changes depending on whether the patient had either one or two MHC-encoding alleles that can display the peptide. Our results indicate that MHC-encoding alleles are, in general, incompletely dominant, i.e., that having two copies of the display-enabling allele is more effective in displaying that peptide than having just one copy. More generally, a quantitative understanding of counter-selection of identifiable subclasses of neoantigenic somatic variation could guide immunotherapy or aid in developing personalized cancer vaccines.

Somatic Cell

Biology

Major Histocompatibility Complex

Paper

CNTN5-/+ or EHMT2-/+ iPSC-Derived Neurons from Individuals with Autism Develop Hyperactive Neuronal Networks

Éric Deneault et al., May 7, 2020

Induced pluripotent stem cell (iPSC)-derived cortical neurons are increasingly used as a model to study developmental aspects of Autism Spectrum Disorder (ASD), which is clinically and genetically heterogeneous. To study the complex relationship of rare (penetrant) variant(s) and common (weaker) polygenic risk variant(s) to ASD, it is critical to model with isogenic iPSC-derived neurons from probands and family-based controls. We developed a standardized set of procedures, designed to control for heterogeneity in reprogramming and differentiation, and generated 53 different iPSC-derived glutamatergic neuronal lines from 25 participants from 12 unrelated families with ASD (14 ASD-affected individuals, 3 unaffected siblings, 8 unaffected parents). Heterozygous de novo (7 families; 16p11.2, NRXN1, DLGAP2, CAPRIN1, VIP, ANOS1, THRA) and rare-inherited (2 families; CNTN5, AGBL4) presumed-damaging variants were characterized in ASD risk genes/loci. In three additional families, functional candidates for ASD (SET), and combinations of putative etiologic variants (GLI3/KIF21A and EHMT2/UBE2I combinations in separate families), were modeled. We used a large-scale multi-electrode array (MEA) as our primary high-throughput phenotyping assay, followed by patch clamp recordings. Our most compelling new results revealed a consistent spontaneous network hyperactivity in neurons deficient for CNTN5 or EHMT2. Our biobank of iPSC-derived neurons and accompanying genomic data are available to accelerate ASD research.

Induced Pluripotent Stem Cell

Epigenetic Reprogramming

Autism

Paper
