ResearchHub | Open Science Community

JAFFAL: Detecting fusion genes with long read transcriptome sequencing

N. Davidson et al.Apr 26, 2021

Abstract Massively parallel short read transcriptome sequencing has greatly expanded our knowledge of fusion genes which are drivers of tumor initiation and progression. In cancer, many fusions are also important diagnostic markers and targets for therapy. Long read transcriptome sequencing allows the full length of fusion transcripts to be discovered, however, this data has a high rate of errors and fusion finding algorithms designed for short reads do not work. While numerous fusion finding algorithms now exist for short read RNA sequencing data, there are few methods to detect fusions using third generation or long read sequencing data. Fusion finding in long read sequencing will allow the discovery of the full isoform structure of fusion genes. Here we present JAFFAL, a method to identify fusions from long-read transcriptome sequencing. We validated JAFFAL using simulation, cell line and patient data from Nanopore and PacBio. We show that fusions can be accurately detected in long read data with JAFFAL, providing better accuracy than other long read fusion finders and with similar performance as state-of-the-art methods applied to short read data. By comparing Nanopore transcriptome sequencing protocols we find that numerous chimeric molecules are generated during cDNA library preparation that are absent when RNA is sequenced directly. We demonstrate that JAFFAL enables fusions to be detected at the level of individual cells, when applied to long read single cell sequencing. Moreover, we demonstrate JAFFAL can identify fusions spanning three genes, highlighting the utility of long reads to characterise the transcriptional products of complex structural rearrangements with unprecedented resolution. JAFFAL is open source and available as part of the JAFFA package at https://github.com/Oshlack/JAFFA/wiki .

ALLSorts: a RNA-Seq classifier for B-Cell Acute Lymphoblastic Leukemia

Breon Schmidt et al.Aug 1, 2021

Abstract B-cell acute lymphoblastic leukemia (B-ALL) is the most common childhood cancer. Subtypes within B-ALL are distinguished by characteristic structural variants and mutations, which in some instances strongly correlate with responses to treatment. The World Health Organisation (WHO) recognises seven distinct classifications, or subtypes , as of 2016. However, recent studies have demonstrated that B-ALL can be segmented into 23 subtypes based on a combination of genomic features and gene expression profiles. A method to identify a patient’s subtype would have clear clinical utility. Despite this, no publically available classification methods using RNA-Seq exist for this purpose. Here we present ALLSorts: a publicly available method that uses RNA-Seq data to classify B-ALL samples to 18 known subtypes and five meta-subtypes. ALLSorts is the result of a hierarchical supervised machine learning algorithm applied to a training set of 1223 B-ALL samples aggregated from multiple cohorts. Validation revealed that ALLSorts can accurately attribute samples to subtypes and can attribute multiple subtypes to a sample. Furthermore, when applied to both paediatric and adult cohorts, ALLSorts was able to classify previously undefined samples into subtypes. ALLSorts is available and documented on GitHub ( https://github.com/Oshlack/AllSorts/ ). Key Points ALLSorts is a gene expression classifier for B-cell acute lymphoblastic leukemia, which predicts 18 distinct genomic subtypes - including those designated by the World Health Organisation (WHO) and provisional entities. Trained and validated on over 2300 B-ALL samples, representing each subtype and a variety of clinical features. Correctly identified subtypes in 91% of cases in a held-out dataset and between 82-93% across a newly combined cohort of paediatric and adult samples. ALLSorts assigned subtypes to samples with previously unknown driver events. ALLsorts is an accurate, comprehensive and freely available classification tool that distinguishes subtypes of B-cell acute lymphoblastic leukemia from RNA-sequencing.