ResearchHub | Open Science Community

Biopython: freely available Python tools for computational molecular biology and bioinformatics

Peter Cock et al., Mar 20, 2009

Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/_Mailing_lists; peter.cock@scri.ac.uk.

Molecular Biology

Software

Using Tablet for visual exploration of second-generation sequencing data

Iain Milne et al., Mar 24, 2012

The advent of second-generation sequencing (2GS) has provided a range of significant new challenges for the visualization of sequence assemblies. These include the large volume of data being generated, short-read lengths and different data types and data formats associated with the diversity of new sequencing technologies. This article illustrates how Tablet, a high-performance graphical viewer for visualization of 2GS assemblies and read mappings, plays an important role in the analysis of these data. We present Tablet, and through a selection of use cases, demonstrate its value in quality assurance and scientific discovery, through features such as whole-reference coverage overviews, variant highlighting, paired-end read mark-up, GFF3-based feature tracks and protein translations. We discuss the computing and visualization techniques utilized to provide a rich and responsive graphical environment that enables users to view a range of file formats with ease. Tablet installers can be freely downloaded from http://bioinf.hutton.ac.uk/tablet in 32 or 64-bit versions for Windows, OS X, Linux or Solaris. For further details on Tablet, contact tablet@hutton.ac.uk.

Molecular Biology

Computer Science

Genomic Insights into the Origin of Parasitism in the Emerging Plant Pathogen Bursaphelenchus xylophilus

Taisei Kikuchi et al., Sep 1, 2011

Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite), stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.

Genetics

Ecology

Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB‐LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations

Florian Jupe et al., Aug 13, 2013

Summary: RenSeq is an NB‐LRR (nucleotide binding‐site leucine‐rich repeat) gene‐targeted, Resistance gene enrichment and sequencing method that enables discovery and annotation of pathogen resistance gene family members in plant genome sequences. We successfully applied RenSeq to the sequenced potato Solanum tuberosum clone DM, and increased the number of identified NB‐LRRs from 438 to 755. The majority of these identified R gene loci reside in poorly or previously unannotated regions of the genome. Sequence and positional details on the 12 chromosomes have been established for 704 NB‐LRRs and can be accessed through a genome browser that we provide. We compared these NB‐LRR genes and the corresponding oligonucleotide baits with the highest sequence similarity and demonstrated that ~80% sequence identity is sufficient for enrichment. Analysis of the sequenced tomato S. lycopersicum ‘Heinz 1706’ extended the NB‐LRR complement to 394 loci. We further describe a methodology that applies RenSeq to rapidly identify molecular markers that co‐segregate with a pathogen resistance trait of interest. In two independent segregating populations involving the wild Solanum species S. berthaultii (Rpi‐ber2) and S. ruiz‐ceballosii (Rpi‐rzc1), we were able to apply RenSeq successfully to identify markers that co‐segregate with resistance towards the late blight pathogen Phytophthora infestans. These SNP identification workflows were designed as easy‐to‐adapt Galaxy pipelines.

Genetics

Cell Biology

SAM/BAM format v1.5 extensions for de novo assemblies

Peter Cock et al., May 29, 2015

Summary: The plain text Sequence Alignment/Map (SAM) file format and its companion binary form (BAM) are a generic alignment format for storing read alignments against reference sequences (and unmapped reads) together with structured meta-data (Li et al., 2009). Driven by the needs of the 1000 Genomes Project, which sequenced many individual human genomes, early SAM/BAM usage focused on pairwise alignments of reads to a reference. However, through the CIGAR P operator, multiple sequence alignments can also be preserved. Herein we describe clarifications and additions in version 1.5 of the specification to facilitate storing de novo sequence alignments: padded reference sequences (with gap characters), annotation of reads or regions of the reference, and the option of embedding the reference sequence within the file. Availability: The latest public release of the specification is at http://samtools.sourceforge.net/SAM1.pdf, with in-development drafts at https://github.com/samtools/hts-specs/ under version control.
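
To illustrate the role of the CIGAR P (padding) operator mentioned above, here is a minimal Python sketch. It is not code from the paper or from samtools; the function names are hypothetical. It relies on the SAM convention that M, I, S, = and X consume read bases while M, D, N, = and X consume reference bases, so P consumes neither. The "padded" count is an assumption of this sketch: it treats I and P as occupying columns of a gapped (padded) reference.

```python
import re

# One CIGAR element: a length followed by a single operator letter.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

# Operators that advance along the read (query) and along the reference.
CONSUMES_QUERY = frozenset("MIS=X")
CONSUMES_REFERENCE = frozenset("MDN=X")
# Assumption of this sketch: columns a read spans on a padded (gapped)
# reference, where insertions (I) and pads (P) each occupy a column.
PADDED_COLUMNS = frozenset("MDN=XIP")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) tuples."""
    ops = [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]
    # Rebuilding the string catches stray characters the regex skipped.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"Malformed CIGAR string: {cigar!r}")
    return ops


def cigar_lengths(cigar):
    """Return read, unpadded-reference and padded-reference spans."""
    ops = parse_cigar(cigar)

    def total(op_set):
        return sum(length for length, op in ops if op in op_set)

    return {
        "query": total(CONSUMES_QUERY),
        "reference": total(CONSUMES_REFERENCE),
        "padded": total(PADDED_COLUMNS),
    }
```

For the CIGAR string 3S6M1P1I4M, this gives a read length of 14, an unpadded reference span of 10 and a padded span of 12: the pad and the insertion add columns without consuming reference bases.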

Genetics

Artificial Intelligence

Shared transcriptional control and disparate gain and loss of aphid parasitism genes and loci acquired via horizontal gene transfer

Peter Thorpe et al., Jan 11, 2018

Background: Aphids are a diverse group of taxa that contain hundreds of agronomically important species, which vary in their host range and pathogenicity. However, the genome evolution underlying agriculturally important aphid traits is not well understood. Results: We generated highly-contiguous draft genome assemblies for two aphid species: the narrow host range Myzus cerasi, and the cereal specialist Rhopalosiphum padi. Using a de novo gene prediction pipeline on both these genome assemblies, and those of three related species (Acyrthosiphon pisum, D. noxia and M. persicae), we show that aphid genomes consistently encode similar gene numbers, and in the case of A. pisum, fewer and larger genes than previously reported. We compare gene content, gene duplication, synteny, horizontal gene transfer events, and putative effector repertoires between these five species to understand the genome evolution of globally important plant parasites. Aphid genomes show signs of relatively distant gene duplication, and substantial, relatively recent, gene birth, and are characterized by disparate gain and loss of genes acquired by horizontal gene transfer (HGT). Such HGT events account for approximately 1% of loci, and contribute to the protein-coding content of aphid species analysed. Putative effector repertoires, originating from duplicated loci, putative HGT events and other loci, have an unusual genomic organisation and evolutionary history. We identify a highly conserved effector-pair that is tightly genetically-linked in all aphid species. In R. padi, this effector pair is tightly transcriptionally-linked, and shares a transcriptional control mechanism with a subset of approximately 50 other putative effectors distributed across the genome. Conclusions: This study extends our current knowledge on the evolution of aphid genomes and reveals evidence for a shared control mechanism, which underlies effector expression, and ultimately plant parasitism.

Genetics

Cell Biology

Planemo: a command-line toolkit for developing, deploying, and executing scientific data analyses

Simon Bray et al., Mar 14, 2022

Abstract: There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For over a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. In order to streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo’s implementation and describe its broad range of functionality for designing, testing and executing Galaxy tools, workflows and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers. Planemo is a mature project widely used within the Galaxy community which has been downloaded over 80,000 times.

Software

Information Systems

THAPBI PICT - a fast, cautious, and accurate metabarcoding analysis pipeline

Peter Cock et al., Apr 6, 2023

Abstract: THAPBI PICT is an open source software pipeline for metabarcoding analysis with multiplexed Illumina paired-end reads, including where different amplicons are sequenced together. We demonstrate using worked examples with our own and public data sets how, with appropriate primer settings and a custom database, THAPBI PICT can be applied to other amplicons and organisms, and used for reanalysis of existing datasets. The core dataflow of the implementation is (i) data reduction to unique marker sequences, often called amplicon sequence variants (ASVs), (ii) dynamic thresholds for discarding low abundance sequences to remove noise and artifacts (rather than error correction by default), before (iii) classification using a curated reference database. The default classifier assigns a label to each query sequence based on a database match that is either perfect, or a single base pair edit away (substitution, deletion or insertion). Abundance thresholds for inclusion can be set by the user or automatically using per-batch negative or synthetic control samples. Output is designed for practical interpretation by nonspecialists and includes a read report (ASVs with classification and counts per sample), sample report (samples with counts per species classification), and a topological graph of ASVs as nodes with short edit distances as edges. Source code available from https://github.com/peterjc/thapbi-pict/ with documentation including installation instructions.
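
The three-step dataflow described above can be sketched in a few lines of Python. This is an illustrative toy, not THAPBI PICT's code or API: the function names, the dictionary-based database and the single fixed abundance threshold are all assumptions made for the example (the pipeline itself is described as using dynamic, control-driven thresholds).

```python
from collections import Counter


def within_one_edit(a, b):
    """True if a equals b or differs by one substitution, insertion or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    # Walk to the first position where the sequences disagree.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra base in b at position i


def classify(sequence, database):
    """Label a sequence by a perfect match, else by matches one edit away.

    database is a hypothetical dict mapping reference sequence to label.
    Returns an empty string when nothing matches.
    """
    if sequence in database:
        return database[sequence]
    labels = {label for ref, label in database.items()
              if within_one_edit(sequence, ref)}
    return ";".join(sorted(labels))


def run_pipeline(reads, database, min_abundance):
    """(i) collapse reads to unique sequences, (ii) drop rare ones, (iii) classify."""
    counts = Counter(reads)
    return {seq: (count, classify(seq, database))
            for seq, count in counts.items()
            if count >= min_abundance}
```

With a database of {"ACGTACGT": "Species A"}, a read "ACGTACGA" seen three times and a threshold of 2, run_pipeline keeps the read and labels it "Species A" because it is a single substitution away from the reference.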

Ecology

Artificial Intelligence

Hidden Phytophthora diversity unveiled in tree nurseries of the Czech Republic with traditional and metabarcoding techniques

Aneta Bačová et al., Jun 17, 2024

Abstract: Phytophthora diversity was examined in eight forest and ornamental nurseries in the Czech Republic. A leaf baiting isolation technique and, in two nurseries, also Illumina DNA metabarcoding were used to reveal the diversity of Phytophthora in soil and irrigation water and compare the efficacy of both approaches. In total, baiting revealed the occurrence of 12 Phytophthora taxa in 59.4% of soil samples from seven (87.5%) nurseries. Additional baiting of compost was carried out in two nurseries and two Phytophthora species were recovered. Irrigation water was examined in three nurseries by baiting or by direct isolation from partially decomposed floating leaves collected from the water source, and two Phytophthora species were obtained. Illumina sequencing of soil and water samples was done in two and one nurseries, respectively. Phytophthora reads were identified as 45 Phytophthora taxa, 15 of them previously unknown taxa from Clades 6, 7, 8 and 9. Another 11 taxa belonged to known or undescribed species of the oomycete genera Globisporangium, Hyaloperonospora, Nothophytophthora, Peronospora and Plasmopara. Overall, with both techniques 50 Phytophthora taxa were detected with five taxa (P. taxon organica, P. plurivora, P. rosacearum, P. syringae and P. transitoria) being exclusively detected by baiting and 38 only by DNA metabarcoding. Particularly common records in DNA barcoding were P. cinnamomi and P. lateralis which were not isolated by baiting. Only seven species were detected by both techniques. It is recommended to use the combination of both techniques to determine true diversity of Phytophthora in managed or natural ecosystems and reveal the presence of rare or unknown Phytophthora taxa.

Ecology

Philosophy

FALDO: A semantic standard for describing the location of nucleotide and protein feature annotation.

Jerven Bolleman et al., Jan 31, 2014

Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned omics areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe and potentially merge sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.

Genetics

Philosophy
