ResearchHub | Open Science Community

Mapping copy number variation by population-scale genome sequencing

Ryan Mills et al.Feb 1, 2011

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies. Copy number variations (or CNVs) are large-scale deletions, duplications and insertions that contribute significantly to genetic variation in the human genome, and many CNVs are linked to susceptibility to disease. A high-resolution map of CNVs has now been produced by harnessing information from whole-genome sequencing in 185 individuals. Nucleotide resolution of the map facilitates analysis of structural variant distribution and identification of the mechanisms of their origin. The study provides a resource for sequence-based association studies. Harnessing information from whole genome sequencing in 185 individuals, this study generates a high-resolution map of copy number variants. Nucleotide resolution of the map facilitates analysis of structural variant distribution and identification of the mechanisms of their origin. The study provides a resource for sequence-based association studies.

Genetics

Molecular Biology

0

Paper

Save

Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications

Andy Rimmer et al.Jul 13, 2014

Gerton Lunter and colleagues report Platypus software, which combines a haplotype-based multi-sample variant caller with local sequence assembly in a Bayesian statistical framework. They demonstrate applications to exome and whole-genome data sets, to the identification de novo mutations in parent-offspring trios and to the genotyping of HLA loci. High-throughput DNA sequencing technology has transformed genetic research and is starting to make an impact on clinical practice. However, analyzing high-throughput sequencing data remains challenging, particularly in clinical settings where accuracy and turnaround times are critical. We present a new approach to this problem, implemented in a software package called Platypus. Platypus achieves high sensitivity and specificity for SNPs, indels and complex polymorphisms by using local de novo assembly to generate candidate variants, followed by local realignment and probabilistic haplotype estimation. It is an order of magnitude faster than existing tools and generates calls from raw aligned read data without preprocessing. We demonstrate the performance of Platypus in clinically relevant experimental designs by comparing with SAMtools and GATK on whole-genome and exome-capture data, by identifying de novo variation in 15 parent-offspring trios with high sensitivity and specificity, and by estimating human leukocyte antigen genotypes directly from variant calls.

Genetics

Artificial Intelligence

0

Paper

Save

De novo assembly and genotyping of variants using colored de Bruijn graphs

Zamin Iqbal et al.Jan 8, 2012

Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.

Genetics

Molecular Biology

0

Paper

Save

Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study

Timothy Walker et al.Jun 24, 2015

Diagnosing drug-resistance remains an obstacle to the elimination of tuberculosis. Phenotypic drug-susceptibility testing is slow and expensive, and commercial genotypic assays screen only common resistance-determining mutations. We used whole-genome sequencing to characterise common and rare mutations predicting drug resistance, or consistency with susceptibility, for all first-line and second-line drugs for tuberculosis.Between Sept 1, 2010, and Dec 1, 2013, we sequenced a training set of 2099 Mycobacterium tuberculosis genomes. For 23 candidate genes identified from the drug-resistance scientific literature, we algorithmically characterised genetic mutations as not conferring resistance (benign), resistance determinants, or uncharacterised. We then assessed the ability of these characterisations to predict phenotypic drug-susceptibility testing for an independent validation set of 1552 genomes. We sought mutations under similar selection pressure to those characterised as resistance determinants outside candidate genes to account for residual phenotypic resistance.We characterised 120 training-set mutations as resistance determining, and 772 as benign. With these mutations, we could predict 89·2% of the validation-set phenotypes with a mean 92·3% sensitivity (95% CI 90·7-93·7) and 98·4% specificity (98·1-98·7). 10·8% of validation-set phenotypes could not be predicted because uncharacterised mutations were present. With an in-silico comparison, characterised resistance determinants had higher sensitivity than the mutations from three line-probe assays (85·1% vs 81·6%). No additional resistance determinants were identified among mutations under selection pressure in non-candidate genes.A broad catalogue of genetic mutations enable data from whole-genome sequencing to be used clinically to predict drug resistance, drug susceptibility, or to identify drug phenotypes that cannot yet be genetically predicted. This approach could be integrated into routine diagnostic workflows, phasing out phenotypic drug-susceptibility testing while reporting drug resistance early.Wellcome Trust, National Institute of Health Research, Medical Research Council, and the European Union.

Genetics

Epidemiology

0

Paper

Save

The global distribution and spread of the mobilized colistin resistance gene mcr-1

Ruobing Wang et al.Mar 21, 2018

Abstract Colistin represents one of the few available drugs for treating infections caused by carbapenem-resistant Enterobacteriaceae . As such, the recent plasmid-mediated spread of the colistin resistance gene mcr-1 poses a significant public health threat, requiring global monitoring and surveillance. Here, we characterize the global distribution of mcr-1 using a data set of 457 mcr-1- positive sequenced isolates. We find mcr-1 in various plasmid types but identify an immediate background common to all mcr-1 sequences. Our analyses establish that all mcr-1 elements in circulation descend from the same initial mobilization of mcr-1 by an IS A pl1 transposon in the mid 2000s (2002–2008; 95% highest posterior density), followed by a marked demographic expansion, which led to its current global distribution. Our results provide the first systematic phylogenetic analysis of the origin and spread of mcr-1 , and emphasize the importance of understanding the movement of antibiotic resistance genes across multiple levels of genomic organization.

Genetics

Oncology

1

Paper

Save

Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis

Phelim Bradley et al.Dec 21, 2015

The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.

Genetics

Epidemiology

0

Paper

Save

Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

Olivier Delaneau et al.Jun 13, 2014

A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.

Genetics

Machine Learning

0

Paper

Save

Correction: Corrigendum: Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis

Phelim Bradley et al.Apr 20, 2016

Nature Communications 6: Article number: 10063 (2015); Published: 21 December 2015; Updated: 20 April 2016 In Supplementary Data 3 of this Article, one of the Staphylococcus aureus accession codes is incorrect, as follows: SRR2101499 should be ERR1197981.

Genetics

Epidemiology

0

Paper

Save

A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree

Michael Eberle et al.Nov 30, 2016

Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.

Genetics

Biology

1

Paper

Save

Same-Day Diagnostic and Surveillance Data for Tuberculosis via Whole-Genome Sequencing of Direct Respiratory Samples

Antonina Votintseva et al.Mar 9, 2017

ABSTRACT Routine full characterization of Mycobacterium tuberculosis is culture based, taking many weeks. Whole-genome sequencing (WGS) can generate antibiotic susceptibility profiles to inform treatment, augmented with strain information for global surveillance; such data could be transformative if provided at or near the point of care. We demonstrate a low-cost method of DNA extraction directly from patient samples for M. tuberculosis WGS. We initially evaluated the method by using the Illumina MiSeq sequencer (40 smear-positive respiratory samples obtained after routine clinical testing and 27 matched liquid cultures). M. tuberculosis was identified in all 39 samples from which DNA was successfully extracted. Sufficient data for antibiotic susceptibility prediction were obtained from 24 (62%) samples; all results were concordant with reference laboratory phenotypes. Phylogenetic placement was concordant between direct and cultured samples. With Illumina MiSeq/MiniSeq, the workflow from patient sample to results can be completed in 44/16 h at a reagent cost of £96/£198 per sample. We then employed a nonspecific PCR-based library preparation method for sequencing on an Oxford Nanopore Technologies MinION sequencer. We applied this to cultured Mycobacterium bovis strain BCG DNA and to combined culture-negative sputum DNA and BCG DNA. For flow cell version R9.4, the estimated turnaround time from patient to identification of BCG, detection of pyrazinamide resistance, and phylogenetic placement was 7.5 h, with full susceptibility results 5 h later. Antibiotic susceptibility predictions were fully concordant. A critical advantage of MinION is the ability to continue sequencing until sufficient coverage is obtained, providing a potential solution to the problem of variable amounts of M. tuberculosis DNA in direct samples.

Genetics

Epidemiology

1

Paper

Genetics

322

0

Save