ResearchHub | Open Science Community

pycoMeth: A toolbox for differential methylation testing from Nanopore methylation calls

René Snajder et al. Oct 24, 2023

Abstract Advances in base and methylation calling of Oxford Nanopore Technologies (ONT) sequencing data have opened up the possibility for joint profiling of genomic and epigenetic variation on the same long reads. Existing data storage and analysis frameworks that were developed for CpG-methylation arrays or short-read bisulfite sequencing data have severe shortcomings when handling ONT data and fail to fully exploit methylation profiles obtained from long-read technologies. To address these issues, we present pycoMeth, a toolbox to store, manage and analyse DNA methylation data obtained from long-read ONT sequencing. Our toolbox centers around a new storage format called MetH5, which allows for both efficient storage of and rapid access to read-level and reference-anchored methylation call data. Building on this storage format, we propose efficient algorithms for the segmentation and differential methylation testing of methylation calls from ONT data. Our methods draw from read-group and read-level information, as well as methylation call uncertainties, and allow for de novo discovery of methylation patterns and differentially methylated regions in a haplotyped multi-sample setting. We show that MetH5 is more efficient than existing solutions for storing ONT methylation calls, and carry out benchmarking for segmentation and differential methylation analysis, demonstrating increased performance and sensitivity of pycoMeth compared to existing solutions.

Methylation

DNA Methylation

Computer Science

Genomic variations and epigenomic landscape of the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel

Adrien Léger et al. Oct 24, 2023

Abstract The teleost medaka (Oryzias latipes) is a well-established vertebrate model system, with a long history of genetic research and multiple high-quality reference genomes available for several inbred strains (HdrR, HNI and HSOK). Medaka has a high tolerance to inbreeding from the wild, which allows inbred lines to be established from wild founder individuals. We have exploited this feature to create an inbred panel resource: the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel. This panel of 80 near-isogenic inbred lines contains a large amount of genetic variation inherited from the original wild population. We used Oxford Nanopore Technologies (ONT) long-read data to further investigate the genomic and epigenomic landscapes of a subset of the MIKK panel. Nanopore sequencing allowed us to identify a much greater variety of high-quality structural variants compared with Illumina sequencing. We also present results and methods using a pan-genome graph representation of 12 individual medaka lines from the MIKK panel. This graph-based reference MIKK panel genome revealed novel differences between the MIKK panel lines compared to standard linear reference genomes. We found additional MIKK panel-specific genomic content that would be missed by linear reference alignment approaches. We were also able to identify and quantify the presence of repeat elements in each of the lines. Finally, we investigated line-specific CpG methylation and performed differential DNA methylation analysis across the 12 lines. We thus present a detailed analysis of the MIKK panel genomes using long- and short-read sequencing technologies, creating a MIKK panel-specific pan-genome reference dataset that allows for the investigation of novel variation types that would be elusive using standard approaches.

Biology

Genetics

Inbred Strain

The Medaka Inbred Kiyosu-Karlsruhe (MIKK) Panel

Tomas Fitzgerald et al. Oct 24, 2023

Abstract Unraveling the relationship between genetic variation and phenotypic traits remains a fundamental challenge in biology. Mapping variants underlying complex traits while controlling for confounding environmental factors is often problematic. To address this, we have established a vertebrate genetic resource specifically to allow for robust genotype-to-phenotype investigations. The teleost medaka (Oryzias latipes) is an established genetic model system with a long history of genetic research and a high tolerance to inbreeding from the wild. Here we present the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel: the first near-isogenic panel of 80 inbred lines in a vertebrate model derived from a wild founder population. Inbred lines provide fixed genomes, which are a prerequisite for replicating studies, for studies that vary both genetics and environment in a controlled manner, and for functional testing. The MIKK panel will therefore enable phenotype-to-genotype association studies of complex genetic traits while allowing for careful control of interacting factors, with numerous applications in genetic research, human health, drug development and fundamental biology. Here we present a detailed characterisation of the genetic variation across the MIKK panel, which provides a rich and unique genetic resource to the community by enabling large-scale experiments for mapping complex traits.

Biology

Inbreeding

Genetics

Natural genetic variation quantitatively regulates heart rate and dimension

Jakob Gierten et al. Oct 24, 2023

The polygenic contribution to heart development and function along the health-disease continuum remains unresolved. To gain insight into the genetic basis of quantitative cardiac phenotypes, we utilize highly inbred Japanese rice fish models, Oryzias latipes and Oryzias sakaizumii. Employing automated quantification of embryonic heart rates as the core metric, we profiled phenotype variability across five inbred strains. We observed maximal phenotypic contrast between individuals of the HO5 and the HdrR strains. HO5 showed elevated heart rates associated with embryonic ventricular hypoplasia and impaired adult cardiac function. This contrast served as the basis for genome-wide mapping. In a segregation population of 1192 HO5 x HdrR F2 embryos, we mapped 59 loci (173 genes) associated with heart rate. Experimental validation of the top 12 candidate genes in loss-of-function models revealed their causal and distinct impact on heart rate, development, ventricle size, and arrhythmia. Our study uncovers new diagnostic and therapeutic targets for developmental and electrophysiological cardiac diseases and provides a novel scalable approach to investigate the intricate genetic architecture of the vertebrate heart.

Variation (Astronomy)

Biology

Dimension (Graph Theory)

Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures

Tobias Rausch et al. Oct 24, 2023

Summary Cancer genomes harbor a broad spectrum of structural variants (SV) driving tumorigenesis, a relevant subset of which are likely to escape discovery in short reads. We employed Oxford Nanopore Technologies (ONT) sequencing in a paired diagnostic and post-therapy medulloblastoma to unravel the haplotype-resolved somatic genetic and epigenetic landscape. We assemble complex rearrangements, including rearrangements associated with telomeric sequences and a 1.55 megabase-pair chromothripsis event. We uncover a complex SV pattern termed ‘templated insertion thread’, characterized by short (mostly <1 kb) insertions showing prevalent self-concatenation into highly amplified structures of up to 50 kbp in size. Templated insertion threads occur in 3% of cancers, with a prevalence ranging up to 74% in liposarcoma, and frequently colocalize with chromothripsis. We also perform long-read based methylome profiling and discover allele-specific methylation (ASM) effects, complex rearrangements exhibiting differential methylation, and differential promoter methylation in seven cancer-driver genes. Our study shows the potential of long-read sequencing in cancer.

Graphical abstract I) We investigate a single patient with chromothriptic sonic hedgehog medulloblastoma (Li-Fraumeni syndrome), with tissue samples taken from blood, the primary tumor at diagnosis, and a post-treatment (relapse) tumor. II) Data on the three samples have been collected from four sources: 1) Illumina whole-genome sequencing, 2) Illumina transcriptome sequencing, 3) Illumina Infinium HumanMethylation450k, and 4) long-read whole-genome sequencing using Oxford Nanopore Technologies (ONT). III) An integrative analysis combines genomic, epigenomic and transcriptomic data to provide a comprehensive analysis of this heavily rearranged tumor sample. Long- and short-read sequencing data are used to inform the analysis of complex structural genomic variants. Methylation called from haplotyped ONT reads and validated against the methylation array data allows for a haplotype-resolved study of genomic and epigenomic variation, which can then be examined for transcriptional effect. IV) This integrative analysis allows us to identify a large number of inter- and intra-chromosomal genomic rearrangements (A), including a complex rearrangement pattern we term templated insertion threads (B), as well as sample-specific and haplotype-specific methylation patterns of known cancer genes (C).

Biology

Epigenetics

Chromothripsis

RNA modifications detection by comparative Nanopore direct RNA sequencing

Adrien Léger et al. May 6, 2020

RNA molecules undergo a vast array of chemical post-transcriptional modifications (PTMs) that can affect their structure and interaction properties. To date, over 150 naturally occurring PTMs have been identified; however, the overwhelming majority of their functions remain elusive. In recent years, a small number of PTMs have been successfully mapped to the transcriptome using experimental approaches relying on high-throughput sequencing. Oxford Nanopore direct-RNA sequencing (DRS) technology has been shown to be sensitive to RNA modifications. We developed and validated Nanocompore, a robust analytical framework to evaluate the presence of modifications in DRS data. To do so, we compare an RNA sample of interest against a non-modified control sample. Our strategy does not require a training set and allows the use of replicates to model biological variability. Here, we demonstrate the ability of Nanocompore to detect RNA modifications at single-molecule resolution in human polyA+ RNAs, as well as in targeted non-coding RNAs. Our results correlate well with orthogonal methods, confirm previous observations on the distribution of N6-methyladenosine sites and provide novel insights into the distribution of RNA modifications in the coding and non-coding transcriptomes. The latest version of Nanocompore can be obtained at https://github.com/tleonardi/nanocompore/

RNA

Computational Biology

Nanopore Sequencing

Nanopore ReCappable Sequencing maps SARS-CoV-2 5′ capping sites and provides new insights into the structure of sgRNAs

Camilla Ugolini et al. Oct 24, 2023

Abstract The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs (sgRNAs) used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5′ cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously unannotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.

Nanopore Sequencing

Computational Biology

Biology
