ResearchHub | Open Science Community

Long-read whole genome analysis of human single cells

Joanna Hård et al., Apr 13, 2021

Abstract: With long-read sequencing, we have entered an era where individual genomes are routinely assembled to near completion and where complex genetic variation can be resolved efficiently. Here, we demonstrate that long reads can be applied to study the genomic architecture of individual human cells. Clonally expanded CD8+ T-cells from a human donor were used as starting material for droplet-based multiple displacement amplification (dMDA) to generate long molecules with minimal amplification bias. PacBio HiFi sequencing generated up to 20 Gb of data and 40% genome coverage per single cell. The data allowed for accurate detection and haplotype phasing of single nucleotide variants (SNVs), structural variants (SVs), and tandem repeats, including in genomic regions inaccessible to short reads. Somatic SNVs were detected in the nuclear genome and mitochondrial DNA. An average of 1278 high-confidence SVs per cell were discovered in the PacBio data, nearly four times as many as were found in Illumina dMDA data from clonally related cells. Single-cell de novo assembly resulted in a genome size of up to 598 Mb and 1762 (12.8%) complete gene models. In summary, the work presented here demonstrates the utility of whole genome amplification combined with long-read sequencing toward the characterization of the full spectrum of genetic variation at the single-cell level.

Genetics

Molecular Biology

188

Paper

pyCancerSig: subclassifying human cancer with comprehensive single nucleotide, structural and microsatellite mutational signature deconstruction from whole genome sequencing

Jessada Thutkawkorapin et al., Sep 30, 2019

Abstract. Background: DNA damage accumulates over the course of cancer development. The often-substantial number of somatic mutations in cancer poses a challenge to traditional methods that characterize tumors based on driver mutations. However, advances in machine learning technology can take advantage of this substantial amount of data. Results: We developed a command-line Python package, pyCancerSig, to perform sample profiling by integrating single nucleotide variation (SNV), structural variation (SV) and microsatellite instability (MSI) profiles into a unified profile. It also provides a command to decipher underlying cancer processes, employing an unsupervised learning technique, Non-negative Matrix Factorization, and a command to visualize the results. The package accepts common standard file formats (vcf, bam). The program was evaluated using a cohort of breast and colorectal cancers from The Cancer Genome Atlas project (TCGA). The results showed that, by integrating multiple mutation modes, the tool can correctly identify cases with known clear mutational signatures and can strengthen signatures in cases with an unclear signal from an SNV-only profile. Conclusions: pyCancerSig has demonstrated its capability in identifying known and unknown cancer processes and, at the same time, illuminates the associations within and between the mutation modes.
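The Non-negative Matrix Factorization step named in this abstract can be illustrated with a minimal sketch. This is not pyCancerSig's code or interface: the function names (`nmf`, `matmul`, `frobenius_error`), the toy count matrix, and the choice of the classic multiplicative-update rule are all assumptions made for illustration. Rows of `V` stand for samples and columns for mutation-feature counts; the factorization `V ≈ W × H` yields per-sample exposures (`W`) and candidate signatures (`H`).

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W*H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr)) ** 0.5


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)
    using multiplicative updates; all entries stay non-negative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update H: H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(Wt, matmul(W, H))
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(matmul(W, H), Ht)
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


# Hypothetical toy data: 4 samples x 4 mutation features, built by mixing
# two made-up "signatures" [8, 1, 0, 1] and [0, 2, 7, 1].
V = [
    [8, 1, 0, 1],
    [0, 2, 7, 1],
    [8, 3, 7, 2],
    [16, 4, 7, 3],
]

W, H = nmf(V, rank=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

In a real analysis the feature columns would come from SNV, SV and MSI profiles, and the number of signatures (`rank`) would have to be chosen by the analyst; both are outside the scope of this sketch.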

Genetics

Cancer Research

0

Paper

A combination of long and short read genomics reveals frequent p-arm breakpoints within chromosome 21 complex genomic rearrangements

Jakob Schuy et al., Jun 1, 2024

Genetics

Molecular Biology

0

Paper

Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants

Maxime Garcia et al., May 9, 2018

Summary: Whole-genome sequencing (WGS) is a cornerstone of precision medicine, but portable and reproducible open-source workflows for WGS analyses of germline and somatic variants are lacking. We present Sarek, a modular, comprehensive, and easy-to-install workflow, combining a range of software for the identification and annotation of single-nucleotide variants (SNVs), insertion and deletion variants (indels), structural variants, tumor sample heterogeneity, and karyotyping from germline or paired tumor/normal samples. Sarek is implemented in a bioinformatics workflow language (Nextflow) with Docker- and Singularity-compatible containers, ensuring easy deployment and full reproducibility on any Linux-based compute cluster or cloud computing environment. Sarek supports the human reference genomes GRCh37 and GRCh38, and can readily be used both as a core production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. Availability: Source code and instructions for local installation are available at GitHub (https://github.com/SciLifeLab/Sarek) under the MIT open-source license, and we invite the research community to contribute additional functionality as a collaborative open-source development project.

Genetics

Molecular Biology

0

Paper

Rare variants in dynein heavy chain genes in two individuals with situs inversus and developmental dyslexia

Andrea Bieder et al., Mar 31, 2020

Background: Developmental dyslexia (DD) is a neurodevelopmental learning disorder with high heritability. A number of candidate susceptibility genes have been identified, some of which are linked to the function of the cilium, an organelle regulating left-right asymmetry development in the embryo. Furthermore, it has been suggested that disrupted left-right asymmetry of the brain may play a role in neurodevelopmental disorders such as DD. Methods: Here, we studied two individuals with co-occurring situs inversus (SI) and DD using whole genome sequencing to identify single nucleotide variants or copy number variations of importance for DD and SI. Results: Individual 1 had primary ciliary dyskinesia (PCD), a rare, autosomal recessive disorder with oto-sino-pulmonary phenotype and SI. We identified two rare nonsynonymous variants in the dynein axonemal heavy chain 5 gene (DNAH5): c.7502G>C;p.(R2501P), a previously reported variant predicted to be damaging and c.12043T>G;p.(Y4015D), a novel variant predicted to be damaging. Ultrastructural analysis of the cilia revealed a lack of outer dynein arms and normal inner dynein arms. MRI of the brain revealed no significant abnormalities. Individual 2 had non-syndromic SI and DD. In individual 2, one rare variant (c.9110A>G;p.(H3037R)) in the dynein axonemal heavy chain 11 gene (DNAH11), coding for another component of the outer dynein arm, was identified. Conclusions: We identified the likely genetic cause of SI and PCD in one individual, and a possibly significant heterozygosity in the other, both involving dynein genes. Given the present evidence, it is unclear if the identified variants also predispose to DD, but further studies into the association are warranted.

Genetics

Molecular Biology

0

Paper

Single-cell multimodal omics and directly reprogrammed neurons to probe reduced penetrance in Frontotemporal Dementia

Karthick Natarajan et al., Sep 22, 2020

Disclaimer “This manuscript has been withdrawn by the authors as it was submitted and made public without the full consent of all the authors. Therefore, the authors do not wish this work to be cited as a reference for the project. If you have any questions, please contact the corresponding author.”

Genetics

Law

0

Paper

Linked-read whole-genome sequencing resolves common and private structural variants in multiple myeloma

Lucía Peña-Pérez et al., Dec 9, 2021

Abstract: Multiple myeloma (MM) is an incurable and aggressive plasma cell malignancy characterized by a complex karyotype with multiple structural variants (SVs) and copy number variations (CNVs). Linked-read whole-genome sequencing (lrWGS) allows for refined detection and reconstruction of SVs by providing long-range genetic information from standard short-read sequencing. This makes lrWGS an attractive solution for capturing the full genomic complexity of MM. Here we show that high-quality lrWGS data can be generated from low numbers of FACS-sorted cells without DNA purification. Using this protocol, we analyzed FACS-sorted MM cells from 37 MM patients with lrWGS. We found high concordance between lrWGS and FISH for the detection of recurrent translocations and CNVs. Outside of the regions investigated by FISH, we identified >150 additional SVs and CNVs across the cohort. Analysis of the lrWGS data allowed for resolving the structure of diverse SVs affecting the MYC and t(11;14) loci, causing the duplication of genes and gene regulatory elements. In addition, we identified private SVs causing the dysregulation of genes recurrently involved in translocations with the IGH locus and show that these can alter the molecular classification of the MM. Overall, we conclude that lrWGS allows for the detection of aberrations critical for MM prognostics and provides a feasible route for providing comprehensive genetics. Implementing lrWGS could provide more accurate clinical prognostics, facilitate genomic medicine initiatives, and greatly improve the stratification of patients included in clinical trials.

Key points:
- Linked-read WGS can be performed without DNA purification and allows for resolving the diverse structural variants found in multiple myeloma.
- Linked-read WGS can, as a stand-alone assay, provide comprehensive genetics in myeloma and other diseases with complex genomes.

Genetics

Plant Science

1

Paper

Multi-omics analysis detail a submicroscopic inv(15)(q14q15) generating fusion transcripts and MEIS2 and NUSAP1 haploinsufficiency

Marlene Ek et al., Dec 5, 2024

Abstract: Inversions are balanced structural variants that often remain undetected in genetic diagnostics. We present a female proband with a de novo Chromosome 15 paracentric inversion, disrupting MEIS2 and NUSAP1. The inversion was detected by short-read genome sequencing and confirmed with adaptive long-read sequencing. The breakpoint junction analysis revealed a 96 base pair (bp) deletion and an 18 bp insertion in the two junctions, suggesting that the rearrangement arose through a replicative error. Transcriptome sequencing of cultured fibroblasts revealed normal MEIS2 levels and 0.61-fold decreased expression of NUSAP1. Furthermore, three fusion transcripts were detected and confirmed by Sanger sequencing. Heterozygous loss of MEIS2 (MIM# 600987) is associated with a cleft palate, heart malformations, and intellectual impairment, which overlap with the clinical symptoms observed in the proband. The observed fusion transcripts are likely non-functional, and MEIS2 haploinsufficiency is the likely disease-causing mechanism. Altogether, this study's findings illustrate the importance of including inversions in rare disease diagnostic testing and highlight the value of long-read sequencing for the validation and characterization of such variants.

Genetics

Molecular Biology

0

Paper

Detecting transposable elements in long read genomes using sTELLeR

Kristine Sæther et al., Nov 18, 2024

Abstract. Motivation: Repeat elements such as transposable elements (TEs) are highly repetitive DNA sequences that compose around 50% of the genome. TEs such as Alu, SVA, HERV and L1 elements can cause disease by disrupting genes, causing frameshift mutations or altering splicing patterns. These elements are challenging to characterize using short-read genome sequencing (srGS), due to its read length and the repetitive nature of TEs. Long-read genome sequencing (lrGS) enables bridging of TEs, allowing increased resolution across repetitive DNA sequences. lrGS therefore presents an opportunity for improved TE detection and analysis, not only from a research perspective, but also for future clinical detection. When choosing an lrGS TE caller, parameters such as runtime, CPU hours, sensitivity, precision and compatibility with inclusion into pipelines are crucial for efficient detection. Results: We therefore developed sTELLeR, (s) Transposable ELement in Long (e) Read, for accurate, fast and effective TE detection. In particular, sTELLeR exhibits higher precision and sensitivity for calling of Alu elements than similar tools. The caller is 5-48x as fast and uses <2% of the CPU hours compared to competing callers. The caller is haplotype aware and outputs results in a VCF file, enabling compatibility with other variant callers and downstream analysis. Availability: sTELLeR is a Python-based tool and is available at https://github.com/kristinebilgrav/sTELLeR. Altogether, we show that sTELLeR is a fast, sensitive and precise caller for detection of TEs, and can easily be implemented into variant calling workflows. Supplementary information: Supplementary data are available at Bioinformatics online.

Genetics

Molecular Biology

0

Paper

Rare coding variants in NOX4 link high superoxide levels to psoriatic arthritis mutilans

Sailan Wang et al., Jun 7, 2023

Summary: Psoriatic arthritis mutilans (PAM) is the rarest and most severe form of psoriatic arthritis. PAM is characterized by erosions of the small joints of hands and feet and osteolysis leading to joint disruption. Despite its severity, the underlying mechanisms are unknown, and no candidate susceptibility genes have hitherto been identified. We aimed to investigate the genetic basis of PAM. We performed massively parallel sequencing of sixty-one patients' genomes from the PAM Nordic cohort. We validated the rare variants found by Sanger sequencing and genotyped additional psoriasis, psoriatic arthritis, and control cohorts. We then tested the role of the variants using in vivo and in vitro models. We found rare variants with a minor allele frequency (MAF) below 0.0001 in the NADPH oxidase 4 (NOX4) gene in four patients. In silico predictions show that the identified variants are potentially damaging. NOXs are the only enzymes producing reactive oxygen species (ROS). ROS are highly reactive molecules with an important role in the regulation of signal transduction. NOX4 is specifically involved in the differentiation of osteoclasts, the cells implicated in bone resorption. Functional follow-up studies using cell culture, zebrafish models, and measurement of ROS in patients uncovered that the NOX4 variants found in this study increase the levels of ROS both in vitro and in vivo. We propose NOX4 as the first candidate susceptibility gene for PAM. Our study links high levels of ROS caused by NOX4 variants to the development of PAM, opening the possibility for a potential therapeutic target.

Genetics

Immunology

0

Paper