ResearchHub | Open Science Community

Evolution of genes and genomes on the Drosophila phylogeny

Andrew Clark et al., Nov 1, 2007

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

This issue includes a landmark collection of papers on the stalwart of the genetics lab, the Drosophila fruit fly. The centrepiece is the publication by the Drosophila 12 Genomes Consortium of the genomic sequence for ten Drosophila species. The paper compares the newly sequenced genomes (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi species), with the two previously known sequences for D. melanogaster and D. pseudoobscura. The resulting database of genetic variation will be invaluable for the study of the forces of evolutionary change. A second major collaboration has mined the dozen Drosophila genome sequences for conserved elements, and reports the relationship between conservation and function for many specific sequence motifs. A detailed regulatory network emerges, identifying protein-coding genes and exons, RNA genes, microRNAs and their targets. These papers are discussed in News and Views. Two further research papers use the new genomic data to study gene expression, first for genes with male-biased expression and those unique to each species and second, to track the evolution of gene dosage compensation on Drosophila sex chromosomes. Four new reviews focus on how the latest work on Drosophila is taking this genetically pliant lab model into exciting new fields. Pierre Leopold and Norbert Perrimon review advances in the study of endocrinology and homeostasis that are establishing Drosophila as a model for mammalian physiology. Drosophila has proved a powerful system in which to study the pathways controlling cell shape in growing tissue, as reported by Thomas Lecuit and Loïc Le Goff. Leslie Vosshall reviews the remarkable work linking neural circuits and behaviour and John Lis reviews work on Drosophila that has rewritten the textbook view of gene transcription. The cover shows anaesthetized individuals of all twelve Drosophila species.

An international consortium reports the genomic sequence for ten Drosophila species, and compares them to two other previously published Drosophila species. These data are invaluable for drawing evolutionary conclusions across an entire phylogeny of species at once.

Genetics

Immunology

ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia

Stephen Landt et al., Sep 1, 2012

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.

Genetics

Molecular Biology

Predicting Splicing from Primary Sequence with Deep Learning

Kishore Jaganathan et al., Jan 1, 2019

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.

Genetics

Molecular Biology

Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++

Eugene Davydov et al., Dec 2, 2010

Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
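
The bottom-up idea in this abstract (score each alignment position, then group contiguous high-scoring positions into elements) can be illustrated with a small sketch. This is not the GERP++ dynamic program, whose details are not given here; it is a plain threshold-based grouping. The function name, the threshold, the minimum length and the example scores are all hypothetical.

```python
from typing import List, Tuple


def constrained_segments(
    scores: List[float],
    threshold: float = 2.0,   # hypothetical cutoff for a "high" score
    min_length: int = 3,      # hypothetical minimum element length
) -> List[Tuple[int, int, float]]:
    """Group contiguous positions scoring >= threshold into segments.

    Returns (start, end_exclusive, total_score) tuples, ranked by
    total score from highest to lowest.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new candidate segment
        elif start is not None:
            # a low score closes the current segment
            if i - start >= min_length:
                segments.append((start, i, sum(scores[start:i])))
            start = None
    # close a segment that runs to the end of the sequence
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores), sum(scores[start:])))
    return sorted(segments, key=lambda seg: seg[2], reverse=True)


# Made-up per-position constraint scores for an 11-column alignment.
example_scores = [0.1, 2.5, 3.0, 2.2, -1.0, 0.3, 4.0, 4.1, 3.9, 3.5, 0.0]
segments = constrained_segments(example_scores)
for start, end, total in segments:
    print(f"positions {start}..{end - 1}: total score {total:.1f}")
```

A fixed threshold like this is the kind of heuristic the abstract says GERP++ avoids; the sketch only shows what "segments of contiguous, highly scoring positions" means in practice.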

Genetics

Artificial Intelligence

Architecture of the human regulatory network derived from ENCODE data

Mark Gerstein et al., Sep 1, 2012

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.

A description is given of the ENCODE consortium’s efforts to examine the principles of human transcriptional regulatory networks; the results are integrated with other genomic information to form a hierarchical meta-network where different levels have distinct properties.

This manuscript describes the effort of the ENCODE (Encyclopedia of DNA Elements) Consortium to examine the principles of human transcriptional regulatory networks, using a subset of 119 transcription factors. The results are integrated with other genomic information to form a multi-level meta-network in which different levels have distinct properties. The findings will aid future interpretations of human genomics and help us to understand the basic principles of human biology and disease.

Genetics

Philosophy

Distribution and intensity of constraint in mammalian genomic sequence

Gregory Cooper et al., Jun 17, 2005

Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the results of such an analysis on an alignment of sequences from 29 mammalian species. The alignment captures ∼3.9 neutral substitutions per site and spans ∼1.9 Mbp of the human genome. We identify constrained elements from 3 bp to over 1 kbp in length, covering ∼5.5% of the human locus. Our estimate for the total amount of nonexonic constraint experienced by this locus is roughly twice that for exonic constraint. Constrained elements tend to cluster, and we identify large constrained regions that correspond well with known functional elements. While constraint density inversely correlates with mobile element density, we also show the presence of unambiguously constrained elements overlapping mammalian ancestral repeats. In addition, we describe a number of elements in this region that have undergone intense purifying selection throughout mammalian evolution, and we show that these important elements are more numerous than previously thought. These results were obtained with Genomic Evolutionary Rate Profiling (GERP), a statistically rigorous and biologically transparent framework for constrained element identification. GERP identifies regions at high resolution that exhibit nucleotide substitution deficits, and measures these deficits as “rejected substitutions.” Rejected substitutions reflect the intensity of past purifying selection and are used to rank and characterize constrained elements. We anticipate that GERP and the types of analyses it facilitates will provide further insights and improved annotation for the human genome as mammalian genome sequence data become richer.
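
The "rejected substitutions" measure in this abstract is a per-position substitution deficit. A minimal sketch, assuming the deficit is taken as the number of substitutions expected under neutrality minus the number observed: the expected value of 3.9 reuses the neutral rate quoted above, while the observed values and the function name are invented for illustration. In GERP itself both quantities come from phylogenetic estimation, which is not shown here.

```python
from typing import List


def rejected_substitutions(expected: List[float], observed: List[float]) -> List[float]:
    """Per-position deficit: expected (neutral) minus observed substitutions.

    Positive values suggest constraint; values near zero or below
    suggest neutral or faster-than-neutral evolution.
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return [e - o for e, o in zip(expected, observed)]


# ~3.9 neutral substitutions per site, as quoted in the abstract.
expected = [3.9, 3.9, 3.9, 3.9]
# Hypothetical observed substitution estimates for four alignment columns.
observed = [0.2, 3.5, 4.4, 0.0]

rs = rejected_substitutions(expected, observed)
for position, value in enumerate(rs):
    print(f"column {position}: RS = {value:.1f}")

# Summing the per-position deficits gives a simple score for a whole element.
element_score = sum(rs)
```

Summing the deficits over an element is one simple way such scores could be used to rank constrained elements, in line with the ranking the abstract describes.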

Genetics

Molecular Biology

Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae

James Galagan et al., Dec 1, 2005

The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.

More than 300 labs worldwide are using the fungus Aspergillus nidulans as a model system for molecular genetics, and other species of this fungus are important in everyday life. A package of three genomics papers in this issue covers the Aspergillus field comprehensively. Galagan et al. report the genome sequence of the laboratory classic A. nidulans, and Nierman et al. have sequenced A. fumigatus, known chiefly as a human pathogen and allergen. And finally Machida et al. present genome sequencing and analysis of A. oryzae, focusing in particular on the expansion of genes in its genome, which is almost 25% bigger than the other two genomes. A. oryzae is used in traditional Chinese and Japanese food fermentation (think soy sauce) and also in enzyme production by biotechnologists.

Genetics

Microbiology

ProbCons: Probabilistic consistency-based multiple sequence alignment

Michèle Hu et al., Feb 1, 2005

To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present ProbCons, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark data sets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, ProbCons achieves statistically significant improvement over other leading methods while maintaining practical speed. ProbCons is publicly available as a Web resource.

Genetics

Artificial Intelligence

LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA

Michael Brudno et al., Mar 12, 2003

To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. We present LAGAN, a system for rapid global alignment of two homologous genomic sequences, and Multi-LAGAN, a system for multiple global alignment of genomic sequences. We tested our systems on a data set consisting of greater than 12 Mb of high-quality sequence from 12 vertebrate species. All the sequence was derived from the genomic region orthologous to an ∼1.5-Mb region on human chromosome 7q31.3. We found that both LAGAN and Multi-LAGAN compare favorably with other leading alignment methods in correctly aligning protein-coding exons, especially between distant homologs such as human and chicken, or human and fugu. Multi-LAGAN produced the most accurate alignments, while requiring just 75 minutes on a personal computer to obtain the multiple alignment of all 12 sequences. Multi-LAGAN is a practical method for generating multiple alignments of long genomic sequences at any evolutionary distance. Our systems are publicly available at http://lagan.stanford.edu.

Genetics

Artificial Intelligence

A Single P450 Allele Associated with Insecticide Resistance in Drosophila

Phillip Daborn et al., Sep 27, 2002

Insecticide resistance is one of the most widespread genetic changes caused by human activity, but we still understand little about the origins and spread of resistant alleles in global populations of insects. Here, via microarray analysis of all P450s in Drosophila melanogaster, we show that DDT-R, a gene conferring resistance to DDT, is associated with overtranscription of a single cytochrome P450 gene, Cyp6g1. Transgenic analysis of Cyp6g1 shows that overtranscription of this gene alone is both necessary and sufficient for resistance. Resistance and up-regulation in Drosophila populations are associated with a single Cyp6g1 allele that has spread globally. This allele is characterized by the insertion of an Accord transposable element into the 5′ end of the Cyp6g1 gene.

Genetics

Ecology
