ResearchHub | Open Science Community

mtDNA Variation and Analysis Using Mitomap and Mitomaster

Marie Lott et al., Dec 1, 2013

Abstract
The Mitomap database of human mitochondrial DNA (mtDNA) information has been an important compilation of mtDNA variation for researchers, clinicians, and genetic counselors for the past 25 years. The Mitomap protocol shows how users may look up human mitochondrial gene loci, search for public mitochondrial sequences, and browse or search for nucleotide variants reported in the general population as well as those reported in clinical disease. Mitomap also includes Mitomaster, a powerful sequence analysis tool for human mtDNA. The Mitomaster protocol gives step-by-step instructions showing how to submit sequences to identify nucleotide variants relative to the revised Cambridge Reference Sequence (rCRS), determine the haplogroup, and view species conservation. User-supplied sequences, GenBank identifiers, and single-nucleotide variants may be analyzed. Curr. Protoc. Bioinform. 44:1.23.1-1.23.26. © 2013 by John Wiley & Sons, Inc.
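
As a rough illustration of what identifying nucleotide variants relative to the rCRS involves, the Python sketch below walks two pre-aligned sequences position by position and reports substitutions in the REF-position-ALT style (for example, A263G) commonly used for mtDNA variants. It is not Mitomaster's algorithm: it assumes the query is already aligned to the reference with no gaps, ignores insertions and deletions, and uses a short hypothetical fragment rather than the full 16,569 bp rCRS. The function name `list_substitutions` is illustrative only.

```python
def list_substitutions(reference: str, query: str) -> list[str]:
    """Report substitutions of `query` relative to `reference` as REF+position+ALT.

    Assumes both sequences are already aligned to the same length without gaps;
    indels, which real mtDNA analysis must handle, are out of scope here.
    Positions are 1-based, following mtDNA numbering conventions.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    variants = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference.upper(), query.upper()), start=1):
        # Skip ambiguous 'N' calls on either side instead of reporting them as variants
        if "N" in (ref_base, alt_base):
            continue
        if ref_base != alt_base:
            variants.append(f"{ref_base}{pos}{alt_base}")
    return variants


# Hypothetical 9-base fragment for illustration only
print(list_substitutions("GATCACAGG", "GATTACAGG"))  # ['C4T']
```

Skipping ambiguous bases keeps low-quality positions from being mistaken for real variants; a full pipeline would instead report them as missing data.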

Genetics

Molecular Biology

Integrated Proteogenomic Characterization across Major Histological Types of Pediatric Brain Cancer

Francesca Petralia et al., Nov 25, 2020

We report a comprehensive proteogenomics analysis, including whole-genome sequencing, RNA sequencing, and proteomics and phosphoproteomics profiling, of 218 tumors across 7 histological types of childhood brain cancer: low-grade glioma (n = 93), ependymoma (32), high-grade glioma (25), medulloblastoma (22), ganglioglioma (18), craniopharyngioma (16), and atypical teratoid rhabdoid tumor (12). Proteomics data identify common biological themes that span histological boundaries, suggesting that treatments used for one histological type may be applied effectively to other tumors sharing similar proteomics features. Immune landscape characterization reveals diverse tumor microenvironments across and within diagnoses. Proteomics data further reveal functional effects of somatic mutations and copy number variations (CNVs) not evident in transcriptomics data. Kinase-substrate association and co-expression network analysis identify important biological mechanisms of tumorigenesis. This is the first large-scale proteogenomics analysis across traditional histological boundaries to uncover foundational pediatric brain tumor biology and inform rational treatment selection.

Genetics

Oncology

Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing

Chao Wu et al., Jun 13, 2019

Abstract
Background: Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make the identification of real variants challenging. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. Here we present a machine learning-based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing (NGS) of tumor specimens.

Methods: A cohort of 11,278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of the clinical laboratory workflow. A three-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned using the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of each classification were derived during this process and used to label "uncertain" variants.

Results: The optimized classifier demonstrated 100% specificity and 97% sensitivity over the 5,587 SNVs of the test set. Of the 1,341 true positive variants, 1,252 were identified as real, and 4,143 of the 4,246 false positive calls were deemed artifacts, while only 192 (3.4%) SNVs were labeled "uncertain", with zero misclassification between true positives and artifacts in the test set.

Conclusions: We present a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received a definitive label and were thus exempt from manual review. This framework could improve the quality and efficiency of the variant review process in clinical labs.
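
The abstract does not specify how prediction intervals are turned into the three labels. One plausible rule, sketched below in Python as an assumption rather than the authors' method, is to call a variant real when the whole interval for its predicted probability of being real lies above a cutoff, an artifact when it lies entirely below, and uncertain when it straddles the cutoff. The `Prediction` class, the 0.5 cutoff, and the interval values are hypothetical; the last lines only re-check the reported summary arithmetic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical classifier output: an interval for P(variant is real)."""
    variant_id: str
    lower: float
    upper: float


def three_class_label(pred: Prediction, cutoff: float = 0.5) -> str:
    # Interval entirely above the cutoff: confident call as a real variant
    if pred.lower > cutoff:
        return "real"
    # Interval entirely below the cutoff: confident call as an artifact
    if pred.upper < cutoff:
        return "artifact"
    # Interval straddles the cutoff: defer to manual review
    return "uncertain"


def definitive_fraction(labels: list[str]) -> float:
    """Fraction of variants that received a real/artifact label."""
    counts = Counter(labels)
    return (counts["real"] + counts["artifact"]) / len(labels)


preds = [Prediction("v1", 0.82, 0.97), Prediction("v2", 0.03, 0.21), Prediction("v3", 0.41, 0.66)]
labels = [three_class_label(p) for p in preds]
print(labels)  # ['real', 'artifact', 'uncertain']

# Re-checking the reported test-set numbers: (1,252 + 4,143) / 5,587
print(round((1252 + 4143) / 5587 * 100, 1))  # 96.6
```

Widening the intervals (or moving the cutoff) trades a larger "uncertain" bucket for fewer confident mistakes, which is the balance the reported 3.4% uncertain rate reflects.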

Artificial Intelligence

Cancer Research

Phen2Gene: Rapid Phenotype-Driven Gene Prioritization for Rare Diseases

Mengge Zhao et al., Dec 10, 2019

Human Phenotype Ontology (HPO) terms are increasingly used in diagnostic settings to aid in the characterization of patient phenotypes. The HPO annotation database is updated frequently and provides detailed phenotype knowledge on many human diseases, and many HPO terms are now mapped to candidate causal genes through binary relationships. To further improve the genetic diagnosis of rare diseases, we incorporated these HPO annotations, gene-disease databases, and gene-gene databases into a probabilistic model to build a novel HPO-driven gene prioritization tool, Phen2Gene. Phen2Gene accesses a database built on this information, the HPO2Gene Knowledgebase (H2GKB), which provides weighted and ranked gene lists for every HPO term. Phen2Gene then uses the H2GKB to process patient-specific lists of HPO terms, or PhenoPackets descriptions supported by GA4GH (http://phenopackets.org/), calculate a prioritized gene list based on a probabilistic model, and output gene-disease relationships with high accuracy. Phen2Gene outperforms existing gene prioritization tools in speed and acts as a real-time, phenotype-driven gene prioritization tool to aid the clinical diagnosis of rare undiagnosed diseases. In addition to a command-line tool released under the MIT license (https://github.com/WGLab/Phen2Gene), we also developed a web server and web service (https://phen2gene.wglab.org/) for running the tool via a web interface or RESTful API queries. Finally, we have curated a large amount of benchmarking data for phenotype-to-gene tools, involving 197 patients across 76 scientific articles and de-identified HPO term data from 85 patients at the Children's Hospital of Philadelphia (CHOP).
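
To make the idea of combining per-term gene lists concrete, the Python sketch below sums per-gene weights across a patient's HPO terms and ranks genes by the total. This is a deliberately simplified stand-in, not Phen2Gene's probabilistic model or the real H2GKB format: the `KNOWLEDGEBASE` dictionary and its weights are invented for illustration, although the HPO IDs are real terms (HP:0001250 is Seizure, HP:0001263 is Global developmental delay).

```python
from collections import defaultdict

# Hypothetical stand-in for a term -> {gene: weight} knowledgebase; weights are invented
KNOWLEDGEBASE = {
    "HP:0001250": {"SCN1A": 0.9, "STXBP1": 0.6},  # Seizure
    "HP:0001263": {"STXBP1": 0.5, "MECP2": 0.4},  # Global developmental delay
}


def prioritize(patient_terms: list[str],
               knowledgebase: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank genes by the sum of their weights over the patient's HPO terms."""
    scores: dict[str, float] = defaultdict(float)
    for term in patient_terms:
        # Terms missing from the knowledgebase contribute nothing
        for gene, weight in knowledgebase.get(term, {}).items():
            scores[gene] += weight
    # Highest total first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = prioritize(["HP:0001250", "HP:0001263"], KNOWLEDGEBASE)
print([gene for gene, _ in ranking])  # ['STXBP1', 'SCN1A', 'MECP2']
```

Summing rewards genes supported by several of the patient's terms, which is why STXBP1 outranks SCN1A here despite a lower weight for any single term.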

Genetics

Artificial Intelligence

Rapid and Accurate Interpretation of Clinical Exomes Using Phenoxome: a Computational Phenotype-driven Approach

Chao Wu et al., Mar 2, 2018

Clinical exome sequencing (CES) has become the preferred diagnostic platform for complex pediatric disorders with suspected monogenic etiologies, solving 20% to 50% of cases depending on the indication. Despite rapid advancements in CES analysis, the major challenge still lies in identifying the causal variants among the thousands of variants detected during CES testing, and thus in establishing a molecular diagnosis. To improve clinical exome diagnostic efficiency, we developed Phenoxome, a robust phenotype-driven model that adopts a network-based approach to facilitate automated variant prioritization and subsequent classification. Phenoxome dissects the phenotypic manifestation of a patient in conjunction with their genomic profile to filter and then prioritize putative pathogenic variants. To validate our method, we compiled a clinical cohort of 105 positive patient samples (i.e., samples with at least one reported pathogenic variant) from The Children's Hospital of Philadelphia that represent a wide range of genetic heterogeneity. Our approach identifies the causative variants within the top 5, 10, or 25 candidates in more than 50%, 71%, or 88% of these patient samples, respectively. Furthermore, we show that our method is optimized for clinical testing, yielding superior ranking of the pathogenic variants compared with current state-of-the-art methods. The web application of Phenoxome is available to the public at http://phenoxome.chop.edu/.
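
The top-5/10/25 figures describe a top-k hit rate: the fraction of patients whose causative variant appears among the first k ranked candidates. The Python sketch below computes that metric from hypothetical rankings; it says nothing about how Phenoxome itself ranks variants, and the function name, patient IDs, and variant IDs are invented for illustration.

```python
def top_k_hit_rate(rankings: dict[str, list[str]],
                   causal: dict[str, set[str]],
                   ks: tuple[int, ...] = (5, 10, 25)) -> dict[int, float]:
    """For each k, the fraction of patients with a causative variant in their top k candidates."""
    patients = list(rankings)
    rates = {}
    for k in ks:
        # A patient counts as a hit if any causative variant is within the first k entries
        hits = sum(1 for p in patients if causal.get(p, set()) & set(rankings[p][:k]))
        rates[k] = hits / len(patients)
    return rates


# Hypothetical toy cohort: causative variant ranked 1st, 7th, and not ranked at all
rankings = {
    "patient1": ["v1", "v2", "v3"],
    "patient2": [f"x{i}" for i in range(10)],  # x6 sits at rank 7
    "patient3": ["a", "b"],
}
causal = {"patient1": {"v1"}, "patient2": {"x6"}, "patient3": {"z"}}
print(top_k_hit_rate(rankings, causal))
```

Because a larger k can only add hits, the rate is non-decreasing in k, matching the increasing 50%, 71%, and 88% figures.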

Genetics

Molecular Biology

Evaluating the impact of in silico predictors on clinical variant classification

Emma Wilcox et al., Aug 10, 2021

Abstract
Background: In silico evidence is important to consider when interpreting genetic variants. According to the ACMG/AMP guidelines, in silico evidence is applied at the supporting strength level using the PP3 and BP4 criteria for pathogenic and benign evidence, respectively. While PP3 has been determined to be one of the most commonly applied criteria, less is known about the effect of these two criteria on variant classification outcomes.

Methods: In this study, a total of 727 missense variants curated by Clinical Genome Resource (ClinGen) Variant Curation Expert Panels (VCEPs) were analyzed to determine how often PP3 and BP4 were applied and how often they influenced final variant classifications. The current categorical system of variant classification was compared with a point-based system being developed by the ClinGen Sequence Variant Interpretation Working Group. In addition, the performance of four in silico tools (REVEL, VEST, FATHMM, and MPC) was assessed by using a gold set of 237 variants (classified as benign or pathogenic independent of PP3 or BP4) to calculate pathogenicity likelihood ratios.

Results: Collectively, the PP3 and BP4 criteria were applied by ClinGen VCEPs to 55% of missense variants in this data set. Removing in silico criteria from variants where they were originally applied caused variants to change classification from pathogenic to likely pathogenic (14%), from likely pathogenic to variant of uncertain significance (VUS) (24%), or from likely benign to VUS (64%). The proportion of downgrades with the categorical classification system was similar to that of the point-based system, though the latter resolved borderline classifications. REVEL and VEST performed at a level consistent with moderate strength towards either benign or pathogenic evidence, while FATHMM performed at the supporting level.

Conclusions: Overall, this study demonstrates that the in silico criteria PP3 and BP4 are commonly applied in variant classification and often affect the final classification. Our results suggest that when sufficient thresholds for in silico predictors are established, PP3 and BP4 may be appropriate to use at a moderate strength. However, further calibration with larger datasets is needed to optimize the performance of current in silico tools, given the impact they have on clinical variant classification.
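
A pathogenicity likelihood ratio for a score cutoff compares how often pathogenic and benign gold-set variants score at or above that cutoff. The Python sketch below shows that calculation on invented REVEL-like scores between 0 and 1; the function name, cutoff, and data are hypothetical, and the study's actual calibration procedure (for example, score intervals and their mapping to evidence strengths) is not reproduced here.

```python
def likelihood_ratio(scores: list[float], labels: list[str], cutoff: float) -> float:
    """LR = P(score >= cutoff | pathogenic) / P(score >= cutoff | benign)."""
    pathogenic = [s for s, lab in zip(scores, labels) if lab == "pathogenic"]
    benign = [s for s, lab in zip(scores, labels) if lab == "benign"]
    if not pathogenic or not benign:
        raise ValueError("need both pathogenic and benign gold-set variants")
    # Sensitivity: pathogenic variants called "damaging" at this cutoff
    tpr = sum(s >= cutoff for s in pathogenic) / len(pathogenic)
    # False positive rate: benign variants also called "damaging"
    fpr = sum(s >= cutoff for s in benign) / len(benign)
    return float("inf") if fpr == 0 else tpr / fpr


# Hypothetical gold set: four pathogenic and four benign missense variants
scores = [0.91, 0.84, 0.72, 0.30, 0.64, 0.22, 0.15, 0.08]
labels = ["pathogenic"] * 4 + ["benign"] * 4
print(likelihood_ratio(scores, labels, cutoff=0.5))  # 3.0
```

An LR above 1 favors pathogenicity, and larger values support assigning stronger evidence levels, which is the sense in which the abstract reports moderate-strength performance for REVEL and VEST.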

Genetics

Microbiology

Improved detection of evolutionary selection highlights potential bias from different sequencing strategies in complex genomic-regions

Tristan Hayeck et al., Sep 30, 2021

Abstract
Balancing selection occurs when multiple alleles are kept at elevated frequencies in equilibrium due to opposing evolutionary pressures. A new statistical method was developed to test for selection using efficient Bayesian techniques. Selection signals were compared in three data sets generated with different sequencing technologies: clinical trios, HLA NGS-typed samples, and whole-genome long-read samples. Genome-wide, selection was observed across multiple gene families whose biological functions favor diversification, revealing established targets as well as 45 novel genes under selection. Characterization of the MHC using high-resolution HLA typing and long-read sequencing data revealed strong selection in the expected peptide-binding domains as well as in previously understudied intronic and intergenic regions of the MHC. Surprisingly, SIRPA demonstrated a dramatic selection signal, second only to the MHC in most settings. In conclusion, employing novel statistical approaches and improved sequencing technologies is critical to properly analyze complex genomic regions.

Genetics

Artificial Intelligence

AnthOligo: Automating the design of oligonucleotides for capture/enrichment technologies

Padmini Jayaraman et al., Dec 12, 2019

Summary: A number of methods have been devised to address the need for targeted genomic resequencing. One of these methods, region-specific extraction (RSE) of DNA, is characterized by the capture of long DNA fragments (15-20 kb) by magnetic beads after enzymatic extension of oligonucleotides hybridized to selected genomic regions. It is therefore necessary to facilitate the selection of optimal capture oligos that target a region of interest, satisfy constraints on melting temperature (Tm) and free energy (ΔG), and minimize the formation of primer dimers in a pooled experiment. Manual design and selection of oligos is an extremely arduous task, complicated by factors such as the length of the target region and the number of targeted regions. Here we describe AnthOligo, a web-based application developed to automate the generation of optimal oligo sequences for targeting and capturing the continuum of large and complex genomic regions. Apart from generating oligos for RSE, this program may have wider applications in the design of customizable internal oligos to be used as baits for gene panel analysis, or even probes for large-scale comparative genomic hybridization (CGH) array processes.

Implementation and Availability: The application, written in Java 8 and run on Tomcat 9, is built on the lightweight Java Spring MVC framework and provides the user with a simple interface to upload an input file in BED format and customize parameters for each task. A Redis-like MapReduce framework is implemented to run sub-tasks in parallel to optimize time and system resources, alongside a task-queuing system that runs submitted jobs as a server-side background daemon. The task of probe design in AnthOligo begins when a user uploads an input file and concludes with the generation of a result set containing an optimal set of region-specific oligos. AnthOligo is currently available as a public web application at https://antholigo.chop.edu.
Correspondence to: Dimitrios Monos (monosd@chop.edu) or Mahdi Sarmady (sarmadym@chop.edu)
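
AnthOligo's exact scoring is not described in the summary. As a simplified illustration of the kind of checks involved, the Python sketch below estimates Tm with the Wallace rule (2 °C per A/T plus 4 °C per G/C, a crude approximation suited only to short oligos), flags a possible primer dimer when the 3' end of one oligo is complementary to a stretch of another, and greedily keeps a compatible pool. All function names, the Tm window, the 4-base overlap, and the greedy strategy are assumptions; a realistic tool would use nearest-neighbor thermodynamics to estimate Tm and ΔG.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) in degrees C; crude, short oligos only."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def may_dimerize(a: str, b: str, k: int = 4) -> bool:
    """True if the last k bases of `a` (its 3' end) could pair antiparallel with `b`."""
    return reverse_complement(a[-k:]) in b.upper()


def select_pool(candidates: list[str], tm_min: int = 52, tm_max: int = 64, k: int = 4) -> list[str]:
    """Greedily keep candidates within the Tm window that do not dimerize with kept oligos."""
    kept: list[str] = []
    for oligo in candidates:
        if not tm_min <= wallace_tm(oligo) <= tm_max:
            continue
        # Check both orientations: either oligo's 3' end could pair with the other
        if any(may_dimerize(oligo, other, k) or may_dimerize(other, oligo, k) for other in kept):
            continue
        kept.append(oligo)
    return kept


print(wallace_tm("ACGTACGTACGTACGTACGT"))  # 60
```

A greedy pass depends on input order; an exhaustive or scored search could keep a larger compatible pool at higher computational cost.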

Genetics

Molecular Biology

Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing datasets

Perry Evans et al., May 30, 2018

Recent advances in DNA sequencing technologies have expanded our understanding of the molecular underpinnings of several genetic disorders and increased the utilization of genomic tests by clinicians. Given the paucity of evidence to assess each variant, and the difficulty of experimentally evaluating a variant's clinical significance, many of the thousands of variants that can be generated by clinical tests are reported as variants of unknown clinical significance. However, the creation of population-scale variant databases can significantly improve clinical variant interpretation. Specifically, pathogenicity prediction for novel missense variants can now use features describing regional variant constraint. Constrained genomic regions are those that have an unusually low variant count in the general population. Several computational methods have been introduced to capture these regions and incorporate them into pathogenicity classifiers, but these methods have yet to be compared on an independent clinical variant dataset. Here we introduce a variant dataset derived from clinical sequencing panels and use it to compare the ability of different genomic constraint metrics to determine missense variant pathogenicity. This dataset is compiled from 17,071 patients surveyed with clinical genomic sequencing for cardiomyopathy, epilepsy, or RASopathies. We further use this dataset to demonstrate the necessity of disease-specific classifiers and to train PathoPredictor, a disease-specific ensemble classifier of pathogenicity based on regional constraint and variant-level features. PathoPredictor achieves an average precision greater than 90% for variants from all 99 tested disease genes, while approaching 100% accuracy for some genes. Accumulating larger clinical variant datasets and using them to train existing pathogenicity metrics can significantly enhance their performance in a disease- and gene-specific manner.
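
Regional constraint can be expressed as an observed/expected (O/E) ratio: the number of missense variants seen in a population database for a region divided by the number expected under a neutral mutation model. The Python sketch below flags regions whose ratio falls below a cutoff. The `Region` class, region names, counts, expected values, and the 0.6 cutoff are hypothetical, computing the expected counts (the hard part) is assumed to be done elsewhere, and this is not the specific metric used by PathoPredictor or the methods it compares.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    observed: int    # missense variants observed in a population database
    expected: float  # count expected under a neutral mutation model (assumed precomputed)

    @property
    def oe_ratio(self) -> float:
        return self.observed / self.expected


def constrained_regions(regions: list[Region], max_oe: float = 0.6) -> list[str]:
    """Names of regions with markedly fewer variants than expected (O/E below max_oe)."""
    # The expected > 0 check short-circuits before oe_ratio can divide by zero
    return [r.name for r in regions if r.expected > 0 and r.oe_ratio < max_oe]


regions = [
    Region("gene1_exon3", observed=5, expected=20.0),   # O/E = 0.25
    Region("gene1_exon4", observed=18, expected=20.0),  # O/E = 0.90
]
print(constrained_regions(regions))  # ['gene1_exon3']
```

A low O/E ratio suggests that variation in the region is poorly tolerated, so a novel missense variant falling there is a stronger candidate for pathogenicity.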

Genetics

Microbiology

ExomeSlicer: a resource for the development and validation of exome-based clinical panels

Rojeen Niazi et al., Jan 16, 2018

Exome-based panels (exome slices) are becoming the preferred diagnostic strategy, especially for genetically heterogeneous disorders. The advantages of this approach include enabling frequent updates to gene content without the need for redesign, reflexing to exome analysis bioinformatically without requiring additional sequencing, and streamlining laboratory operation by using established exome kits and protocols. Despite their increasing use, there are currently no guidelines or appropriate resources to support their clinical implementation. Here, we highlight principles and important considerations for the clinical development and validation of exome-based panels, guided by clinical data from a diagnostic epilepsy panel using this approach. We also present a novel, publicly accessible web-based resource, ExomeSlicer, and demonstrate its clinical utility in predicting gene-specific and exome-wide technically challenging regions that are not amenable to next-generation sequencing (NGS) and that may substantially increase the post hoc Sanger fill-in burden. Using this tool, we also characterize more than 2,000 low-complexity, GC-rich, and/or high-homology regions across the exome that can be a source of false positive or false negative variant calls, potentially leading to misdiagnoses in tested patients. NOTE: RN and MAG are co-first authors on this manuscript.
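
As a simplified illustration of screening sequence for technically challenging regions, the Python sketch below slides a window along a sequence to flag GC-rich stretches and measures the longest homopolymer run as a crude low-complexity signal. The function names, window size, step, and 0.7 GC cutoff are hypothetical choices, coordinates are 0-based and half-open as in BED files, and ExomeSlicer's actual criteria (including how it assesses high homology) are not described in the abstract.

```python
def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_rich_windows(seq: str, window: int = 50, step: int = 25,
                    cutoff: float = 0.7) -> list[tuple[int, int]]:
    """Return (start, end) of windows whose GC fraction is at least `cutoff`."""
    flagged = []
    # Starts run up to len(seq) - window so every window is full length;
    # a sequence shorter than `window` is scanned as a single window
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        if gc_fraction(chunk) >= cutoff:
            flagged.append((start, start + len(chunk)))
    return flagged


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 0
    previous = ""
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        previous = base
        best = max(best, run)
    return best


toy = "A" * 50 + "G" * 50  # hypothetical 100-bp sequence
print(gc_rich_windows(toy))            # [(50, 100)]
print(longest_homopolymer("ACGGGGT"))  # 4
```

Overlapping windows (step smaller than window) reduce the chance that a GC-rich stretch straddling two windows is missed.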

Genetics

Cancer Research
