ResearchHub | Open Science Community

Within-patient genetic diversity of SARS-CoV-2

Jack Kuipers et al., Oct 12, 2020

Abstract: SARS-CoV-2, the virus responsible for the current COVID-19 pandemic, is evolving into different genetic variants by accumulating mutations as it spreads globally. In addition to this diversity of consensus genomes across patients, RNA viruses can also display genetic diversity within individual hosts, and co-existing viral variants may affect disease progression and the success of medical interventions. To systematically examine the intra-patient genetic diversity of SARS-CoV-2, we processed a large cohort of 3939 publicly-available deeply sequenced genomes with specialised bioinformatics software, along with 749 recently sequenced samples from Switzerland. We found that the distribution of diversity across patients and across genomic loci is very unbalanced with a minority of hosts and positions accounting for much of the diversity. For example, the D614G variant in the Spike gene, which is present in the consensus sequences of 67.4% of patients, is also highly diverse within hosts, with 29.7% of the public cohort being affected by this coexistence and exhibiting different variants. We also investigated the impact of several technical and epidemiological parameters on genetic heterogeneity and found that age, which is known to be correlated with poor disease outcomes, is a significant predictor of viral genetic diversity.

Author Summary: Since it arose in late 2019, the new coronavirus (SARS-CoV-2) behind the COVID-19 pandemic has mutated and evolved during its global spread. Individual patients may host different versions, or variants, of the virus, hallmarked by different mutations. We examine the diversity of genetic variants coexisting within patients across a cohort of 3939 publicly accessible samples and 749 recently sequenced samples from Switzerland. We find that a small number of patients carry most of the diversity, and that patients with more diversity tend to be older. We also find that most of the diversity is concentrated in certain regions and positions of the virus genome. In particular, we find that a variant reported to increase infectivity is among the most diverse positions. Our study provides a large-scale survey of within-patient diversity of the SARS-CoV-2 genome.

Genetics

Law

Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer

Arthur Dondi et al., Dec 14, 2022

Abstract: Understanding the complex background of cancer requires genotype-phenotype information at single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 are novel. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine.

Genetics

Molecular Biology

SIEVE: joint inference of single-nucleotide variants and cell phylogeny from single-cell DNA sequencing data

Senbai Kang et al., Mar 27, 2022

Abstract: Single-cell DNA sequencing (scDNA-seq) has enabled the identification of single nucleotide somatic variants and the reconstruction of cell phylogenies. However, statistical phylogenetic models for cell phylogeny reconstruction from raw sequencing data are still in their infancy. Here we present SIEVE (SIngle-cell EVolution Explorer), a statistical method for the joint inference of somatic variants and cell phylogeny under the finite-sites assumption from scDNA-seq reads. SIEVE leverages raw read counts for all nucleotides at candidate variant sites, and corrects the acquisition bias of branch lengths. In our simulations, SIEVE outperforms other methods both in phylogenetic accuracy and variant calling accuracy. We apply SIEVE to three scDNA-seq datasets, for colorectal (CRC) and triple-negative breast cancer (TNBC), one of them generated by us. On simulated data, SIEVE reliably infers homo- and heterozygous somatic variants. The analysis of real data uncovers that double mutant genotypes are rare in CRC but unexpectedly frequent in TNBC samples.

Genetics

Molecular Biology

WiPP: Workflow for improved Peak Picking for Gas Chromatography-Mass Spectrometry (GC-MS) data

Nico Borgsmüller et al., Jul 24, 2019

Lack of reliable peak detection impedes automated analysis of large-scale GC-MS metabolomics datasets. Performance and outcome of individual peak-picking algorithms can differ widely depending on both algorithmic approach and parameters as well as data acquisition method. Comparing and contrasting between algorithms is thus difficult. Here we present a workflow for improved peak picking (WiPP), a parameter-optimising, multi-algorithm peak detection workflow for GC-MS metabolomics. WiPP evaluates the quality of detected peaks using a machine learning-based classification scheme based on seven peak classes. The quality information returned by the classifier for each individual peak is merged with results from different peak detection algorithms to create one final high-quality peak set for immediate downstream analysis. Medium- and low-quality peaks are kept for further inspection. By applying WiPP to standard compound mixes and a complex biological dataset we demonstrate that peak detection is improved through the novel way to assign peak quality, an automated parameter optimisation, and results integration across different embedded peak picking algorithms. Furthermore, our approach can provide an impartial performance comparison of different peak picking algorithms. WiPP is freely available on GitHub under MIT licence.

Artificial Intelligence

Molecular Biology

V-pipe 3.0: a sustainable pipeline for within-sample viral genetic diversity estimation

Lara Fuhrmann et al., Jan 1, 2023

The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies pose a set of challenges for computational data analysis workflows, including rigorous quality control, adaptation to higher sample coverage, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of genetic diversity of viral samples. By presenting two large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.

Ecology

Artificial Intelligence

De novo detection of somatic variants in long-read single-cell RNA sequencing data

Arthur Dondi et al., Mar 8, 2024

In cancer, genetic and transcriptomic variations generate clonal heterogeneity, possibly leading to treatment resistance. Long-read single-cell RNA sequencing (LR scRNA-seq) has the potential to detect genetic and transcriptomic variations simultaneously. Here, we present LongSom, a computational workflow leveraging LR scRNA-seq data to call de novo somatic single-nucleotide variants (SNVs), copy-number alterations (CNAs), and gene fusions to reconstruct the tumor clonal heterogeneity. For SNV calling, LongSom distinguishes somatic SNVs from germline polymorphisms by reannotating marker gene expression-based cell types using called variants and applying strict filters. Applying LongSom to ovarian cancer samples, we detected clinically relevant somatic SNVs that were validated against single-cell and bulk panel DNA-seq data and could not be detected with short-read (SR) scRNA-seq. Leveraging somatic SNVs and fusions, LongSom found subclones with different predicted treatment outcomes. In summary, LongSom enables de novo detection of SNVs, CNAs, and fusions, and thereby the study of cancer evolution, clonal heterogeneity, and treatment resistance.

Genetics

Molecular Biology

Single-cell phylogenies reveal deviations from clock-like, neutral evolution in cancer and healthy tissues

Nico Borgsmüller et al., Aug 11, 2022

Abstract: How tumors evolve affects cancer progression, therapy response, and relapse. However, whether tumor evolution is driven primarily by selectively advantageous or neutral mutations remains under debate. Resolving this controversy has so far been limited by the use of bulk sequencing data. Here, we leverage the high resolution of single-cell DNA sequencing (scDNA-seq) to test for clock-like, neutral evolution. Under neutrality, different cell lineages evolve at a similar rate, accumulating mutations according to a molecular clock. We developed and benchmarked a test of the somatic clock based on single-cell phylogenies and applied it to 22 scDNA-seq datasets. We rejected the clock in 10/13 cancer and 5/9 healthy datasets. The clock rejection in seven cancer datasets could be related to known driver mutations. Our findings demonstrate the power of scDNA-seq for studying somatic evolution and suggest that some cancer and healthy cell populations are driven by selection while others seem to evolve under neutrality.

Genetics

Philosophy

DelSIEVE: joint inference of single-nucleotide variants, somatic deletions, and cell phylogeny from single-cell DNA sequencing data

Senbai Kang et al., Jan 1, 2023

The swift advancements in single-cell DNA sequencing (scDNA-seq) have enabled quantitative assessment of genetic content in individual cells, allowing downstream analyses at single-cell resolution. This technology considerably facilitates cancer research, yet its underlying power has not been fully exploited. Specifically, computational methods for variant calling and phylogenetic tree reconstruction struggle due to high coverage variance and allelic dropout. To address these issues, here we present DelSIEVE, a statistical method that directly models the inherent noise in scDNA-seq data for the inference of single-nucleotide variants (SNVs), somatic deletions, and cell phylogeny. In a simulation study DelSIEVE exhibits outstanding performance with respect to the identification of somatic deletions and SNVs. We apply DelSIEVE to three real datasets, where rare double mutant and somatic deletion genotypes are found in colorectal cancer samples. As expected with the more expressive model, for the triple negative breast cancer sample we identify several somatic deletions, with fewer single and double mutant genotypes as compared to those reported by our previous method SIEVE.

Genetics

Molecular Biology

Bayesian non-parametric clustering of single-cell mutation profiles

Nico Borgsmüller et al., Jan 15, 2020

The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intra-tumor heterogeneity by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq data sets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Here we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. BnpC employs a Dirichlet process mixture model coupled with a Markov chain Monte Carlo sampling scheme, including a modified split-merge move and a novel posterior estimator to predict clones and genotypes. We benchmarked our method comprehensively against state-of-the-art methods on simulated data using various data sizes, and applied it to three cancer scDNA-seq data sets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime, and scalability. Its inferred genotypes were the most accurate, and it was the only method able to run and produce results on data sets with 10,000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. With ever-growing scDNA-seq data sets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve intra-tumor heterogeneity but also as a pre-processing step to reduce data size. BnpC is freely available under MIT license.

Artificial Intelligence

Molecular Biology
