ResearchHub | Open Science Community

Initial sequencing and comparative analysis of the mouse genome

R Waterston et al., Dec 16, 2023

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

Genome

Synteny

Biology

4

Paper

Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform

Gilad Almogy et al., Oct 13, 2023

We introduce a novel, massively parallel sequencing platform that combines an open flow cell design on a circular wafer with a large surface area and mostly natural nucleotides that allow optical end-point detection without reversible terminators. This platform enables sequencing billions of reads with longer read length (~300 bp) and fast run times (<20 hrs) with high base accuracy (Q30 > 85%), at a low cost of $1/Gb. We establish system performance by whole-genome sequencing of the Genome-In-A-Bottle reference samples HG001-7, demonstrating high accuracy for SNPs (99.6%) and indels in homopolymers up to length 10 (96.4%) across the vast majority (>98%) of the defined high-confidence regions of these samples. We demonstrate scalability of the whole-genome sequencing workflow by sequencing an additional 224 selected samples from the 1000 Genomes Project, achieving high concordance with reference data.

Genome

Whole Genome Sequencing

Reference Genome

376

Paper

Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations

Alicia Martin et al., Oct 24, 2023

Background: Genetic studies of biomedical phenotypes in underrepresented populations identify disproportionate numbers of novel associations. However, current genomics infrastructure, including most genotyping arrays and sequenced reference panels, best serves populations of European descent. A critical step for facilitating genetic studies in underrepresented populations is to ensure that genetic technologies accurately capture variation in all populations. Here, we quantify the accuracy of low-coverage sequencing in diverse African populations. Results: We sequenced the whole genomes of 91 individuals to high coverage (≥20X) from the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study, in which participants were recruited from Ethiopia, Kenya, South Africa, and Uganda. We empirically tested two data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4X captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated, and at a comparable cost. Lower depths of sequencing (0.5-1X) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation, with 4X sequencing detecting 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Conclusion: These results indicate that low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, including those that capture variation most common in Europeans and Africans. Low-coverage sequencing effectively identifies novel variation (particularly in underrepresented populations) and presents opportunities to enhance variant discovery at a similar cost to traditional approaches.
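To make the concordance comparison in the results more concrete, here is a minimal Python sketch of how agreement between imputed genotypes and high-coverage "truth" genotypes could be scored. The function name `genotype_concordance`, the dictionary-based genotype encoding (alternate-allele counts 0, 1, 2), and the toy data are all assumptions for illustration; the abstract does not state which concordance metric or tooling the authors used.

```python
import math
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical site key


def genotype_concordance(
    truth: Dict[Site, int],
    imputed: Dict[Site, Optional[int]],
) -> Tuple[float, float]:
    """Return (overall concordance, non-reference concordance).

    Genotypes are alternate-allele counts (0, 1 or 2). Sites that are
    missing or uncalled (None) in the imputed set are skipped.
    """
    compared = matches = nonref_compared = nonref_matches = 0
    for site, true_gt in truth.items():
        imp_gt = imputed.get(site)
        if imp_gt is None:
            continue  # no imputed call at this site
        compared += 1
        same = imp_gt == true_gt
        matches += same
        # Non-reference concordance ignores sites where both calls are
        # homozygous reference, which otherwise dominate the score.
        if true_gt > 0 or imp_gt > 0:
            nonref_compared += 1
            nonref_matches += same
    if compared == 0:
        raise ValueError("no overlapping called sites to compare")
    overall = matches / compared
    nonref = nonref_matches / nonref_compared if nonref_compared else math.nan
    return overall, nonref


# Toy data (invented): four sites, one uncalled in the imputed set.
truth = {("chr1", 100): 0, ("chr1", 200): 1, ("chr1", 300): 2, ("chr1", 400): 0}
imputed = {("chr1", 100): 0, ("chr1", 200): 1, ("chr1", 300): 1, ("chr1", 400): None}
overall, nonref = genotype_concordance(truth, imputed)
print(f"overall={overall:.3f} non-reference={nonref:.3f}")
```

Reporting a non-reference score alongside the overall one is a design choice in this sketch: rare variants are homozygous reference in most samples, so overall agreement alone can look high even when rare alleles are imputed poorly.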

Genotyping

Deep Sequencing

Concordance

66

Paper

Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms

Maura Costello et al., May 6, 2020

Here, we present an in-depth characterization of the index swapping mechanism on Illumina instruments that employ the ExAmp chemistry for cluster generation (HiSeqX, HiSeq4000, and NovaSeq). We discuss best practices for eliminating the effects of index swapping on data integrity by utilizing unique dual indexing for complete filtering of index-swapped reads. We calculate mean swap rates across multiple sample preparation methods and sequencer models, demonstrating that different methods can have vastly different swap rates, and show that even non-ExAmp chemistry instruments display trace levels of index swapping. Finally, using computational methods, we provide greater insight into the mechanism of index swapping.
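The filtering idea behind unique dual indexing can be sketched in a few lines of Python. With unique dual indexes, each i7 and each i5 sequence belongs to exactly one sample, so a read whose two indexes are both valid but do not form a pair on the sample sheet can be flagged as swapped and discarded. The function name `demultiplex_udi`, the in-memory read representation, and the index sequences below are hypothetical; this is an illustration of the principle, not the authors' pipeline or any vendor tool.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

IndexPair = Tuple[str, str]  # (i7 index, i5 index) observed on a read


def demultiplex_udi(
    reads: Iterable[IndexPair],
    sample_sheet: Dict[IndexPair, str],
) -> Tuple[Counter, int, int]:
    """Assign reads to samples, counting swapped and unknown index pairs.

    Assumes the sample sheet uses unique dual indexes: no i7 or i5
    sequence is shared between samples.
    """
    valid_i7 = {i7 for i7, _ in sample_sheet}
    valid_i5 = {i5 for _, i5 in sample_sheet}
    assigned: Counter = Counter()
    swapped = unknown = 0
    for pair in reads:
        i7, i5 = pair
        if pair in sample_sheet:
            assigned[sample_sheet[pair]] += 1  # expected pairing, keep
        elif i7 in valid_i7 and i5 in valid_i5:
            swapped += 1  # both indexes known but mispaired, discard
        else:
            unknown += 1  # sequencing error or foreign index, discard
    return assigned, swapped, unknown


# Toy sample sheet and reads (invented sequences).
sheet = {("AAAA", "CCCC"): "sampleA", ("GGGG", "TTTT"): "sampleB"}
reads = (
    [("AAAA", "CCCC")] * 3
    + [("GGGG", "TTTT")] * 2
    + [("AAAA", "TTTT")]  # sampleA i7 with sampleB i5: a swap
    + [("NNNN", "CCCC")]  # unrecognized i7
)
assigned, swapped, unknown = demultiplex_udi(reads, sheet)
swap_fraction = swapped / (sum(assigned.values()) + swapped)
print(dict(assigned), swapped, unknown, round(swap_fraction, 3))
```

With combinatorial (non-unique) dual indexing, the mispaired read in this toy example could coincide with another sample's legitimate pair and would be silently misassigned, which is why the check above depends on the uniqueness assumption.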

Swap (Finance)

Search Engine Indexing

Index (Typography)

0

Paper
