Abstract

Spliced fusion transcripts are typically identified by RNA-seq without elucidating the causal genomic breakpoints. However, non-poly(A)-enriched RNA-seq contains a large proportion of intronic reads that can also span genomic breakpoints. Using 1,274 RNA-seq samples, we investigated what additional information is embedded in non-poly(A)-enriched RNA-seq data. Here, we present Dr. Disco, a novel graph-based algorithm that uses both intronic and exonic RNA-seq reads to identify not only fusion transcripts but also genomic breakpoints, in genic as well as intergenic regions. Dr. Disco identified TMPRSS2-ERG fusions together with their genomic breakpoints, as well as other transcribed rearrangements, in multiple RNA-sequencing cohorts. In breast cancer and glioma samples, Dr. Disco identified rearrangement hotspots near CCND1 and MDM2 and could directly associate these with increased expression. A comparison with matched DNA-sequencing data revealed that most genomic breakpoints are not, or only minimally, transcribed, while also revealing highly expressed translocations missed by DNA-seq. By using the full potential of non-poly(A)-enriched RNA-seq data, Dr. Disco can reliably identify expressed genomic breakpoints and their transcriptional effects.