ResearchHub | Open Science Community

A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica)

Jun Yu et al., Apr 5, 2002

We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.
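The "exact 20-nucleotide oligomer repeats" figure can be illustrated with a minimal sketch. It assumes one plausible definition, namely the fraction of bases covered by at least one k-mer that occurs more than once on the same strand. The abstract does not give the exact procedure, so the function name `repeat_fraction` and the single-strand simplification (no reverse complements) are assumptions made only for illustration.

```python
from collections import Counter


def repeat_fraction(seq: str, k: int = 20) -> float:
    """Fraction of bases covered by at least one k-mer seen more than once.

    Assumption: repeats are counted on the given strand only; reverse
    complements are ignored to keep the sketch short.
    """
    n = len(seq)
    if n < k:
        return 0.0

    # Count every overlapping k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))

    # Mark each base that lies inside a k-mer occurring at least twice.
    covered = [False] * n
    for i in range(n - k + 1):
        if counts[seq[i:i + k]] > 1:
            for j in range(i, i + k):
                covered[j] = True

    return sum(covered) / n
```

On a real genome the k-mer table would not fit comfortably in a Python `Counter`; dedicated k-mer counters exist for that scale, and this sketch is only meant to make the definition concrete.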

Genetics

Molecular Biology

WEGO: a web tool for plotting GO annotations

Jia Ye et al., Jul 1, 2006

Unified, structured vocabularies and classifications freely provided by the Gene Ontology (GO) Consortium are widely adopted in most large-scale gene annotation projects. Consequently, many tools have been created for use with the GO ontologies. WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results. Unlike commercial charting software, WEGO is designed to handle the directed acyclic graph structure of GO and so to facilitate the creation of histograms from GO annotation results. WEGO has been used widely in many important biological research projects, such as the rice genome project and the silkworm genome project. It has become one of the everyday tools for downstream gene annotation analysis, especially for comparative genomics tasks. WEGO, along with two other tools, External to GO Query and GO Archive Query, is freely available to all users at http://wego.genomics.org.cn. There are two mirror sites at http://wego2.genomics.org.cn and http://wego.genomics.com.cn. Suggestions are welcome at wego@genomics.org.cn.
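Handling the directed acyclic graph matters because a gene annotated to a specific term also belongs to every ancestor of that term, and a term can have more than one parent. The sketch below shows one way to count genes per term under that rule. The term IDs, the `PARENTS` table and the function names are hypothetical, and this is not WEGO's own code.

```python
from collections import defaultdict

# Hypothetical toy DAG: child term -> list of parent terms.
# "GO:C" has two parents, which a plain tree could not represent.
PARENTS = {
    "GO:C": ["GO:A", "GO:B"],
    "GO:A": ["GO:ROOT"],
    "GO:B": ["GO:ROOT"],
    "GO:ROOT": [],
}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def genes_per_term(annotations, parents):
    """Count distinct genes per term after propagating annotations upward."""
    genes = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in ancestors(term, parents):
                genes[t].add(gene)  # a set, so each gene counts once per term
    return {term: len(members) for term, members in genes.items()}
```

The per-term counts returned by `genes_per_term` are the kind of numbers a histogram of annotation results would plot. Using sets avoids double counting a gene that reaches the same ancestor through two different paths.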

Genetics

Philosophy

SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data

Yuxin Chen et al., Dec 4, 2017

Quality control (QC) and preprocessing are essential steps in sequencing data analysis to ensure the accuracy of results. However, existing tools do not offer a satisfactory solution that combines comprehensive integrated functions, a suitable architecture, and highly scalable acceleration. In this article, we present SOAPnuke, a tool with abundant functions for a “QC-Preprocess-QC” workflow and a MapReduce acceleration framework. Four modules with different preprocessing functions are designed for processing datasets from genomic, small RNA, Digital Gene Expression, and metagenomic experiments, respectively. As a workflow-like tool, SOAPnuke centralizes processing functions into 1 executable and predefines their order, which avoids reformatting files when switching between tools. Furthermore, the MapReduce framework provides the scalability to distribute all the processing work across an entire compute cluster. We ran a benchmark in which SOAPnuke and other tools were used to preprocess a ∼30× NA12878 dataset published by GIAB. In standalone operation, SOAPnuke struck a balance between resource occupancy and performance. When accelerated on 16 working nodes with MapReduce, SOAPnuke ran at ∼5.7 times the speed of the fastest of the other tools.
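The “QC-Preprocess-QC” idea can be sketched in a few lines: summarize the reads, filter them, then summarize again so the effect of filtering is visible. The thresholds, function names and the in-memory `(sequence, quality)` read format below are assumptions for illustration and do not reflect SOAPnuke's actual filters or options.

```python
def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def qc_summary(reads):
    """Basic QC statistics: read count and mean of the per-read mean quality."""
    if not reads:
        return {"reads": 0, "mean_quality": 0.0}
    total = sum(mean_quality(qual) for _, qual in reads)
    return {"reads": len(reads), "mean_quality": total / len(reads)}


def preprocess(reads, min_mean_q: float = 20.0, max_n_fraction: float = 0.1):
    """Drop reads with low mean quality or too many ambiguous 'N' bases."""
    kept = []
    for seq, qual in reads:
        if not seq or len(seq) != len(qual):
            continue  # skip empty or malformed records
        n_fraction = seq.upper().count("N") / len(seq)
        if mean_quality(qual) >= min_mean_q and n_fraction <= max_n_fraction:
            kept.append((seq, qual))
    return kept


def qc_preprocess_qc(reads, **filter_options):
    """Run QC, filtering, and QC again in a fixed order."""
    before = qc_summary(reads)
    kept = preprocess(reads, **filter_options)
    after = qc_summary(kept)
    return before, kept, after
```

Fixing the order inside one function mirrors the point made in the abstract: the caller never has to convert intermediate files between separate QC and filtering tools.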

Ecology

Artificial Intelligence

The Genomes of Oryza sativa: A History of Duplications

Jun Yu et al., Jan 21, 2005

We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.

Genetics

Molecular Biology

The diploid genome sequence of an Asian individual

Jun Wang et al., Nov 1, 2008

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual’s genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.

The power of the latest massively parallel synthetic DNA sequencing technologies is demonstrated in two major collaborations that shed light on the nature of genomic variation with ethnicity. The first describes the genomic characterization of an individual from the Yoruba ethnic group of west Africa. The second reports a personal genome of a Han Chinese, the group comprising 30% of the world's population. These new resources can now be used in conjunction with the Venter, Watson and NIH reference sequences. A separate study looked at genetic ethnicity on the continental scale, based on data from 1,387 individuals from more than 30 European countries. Overall there was little genetic variation between countries, but the differences that do exist correspond closely to the geographic map. Statistical analysis of the genome data places 50% of the individuals within 310 km of their reported origin. As well as its relevance for testing genetic ancestry, this work has implications for evaluating genome-wide association studies that link genes with diseases.

Genetics

Molecular Biology

WEGO 2.0: a web tool for analyzing and plotting GO annotations, 2018 update

Jia Ye et al., May 10, 2018

WEGO (Web Gene Ontology Annotation Plot), created in 2006, is a simple but useful tool for visualizing, comparing and plotting GO (Gene Ontology) annotation results. Owing largely to the rapid development of high-throughput sequencing and the increasing acceptance of GO, WEGO has attracted a large number of users and citations in recent years, which motivated us to update it to version 2.0. WEGO takes GO annotation results as input. Based on GO’s standardized DAG (Directed Acyclic Graph) structured vocabulary system, the number of genes corresponding to each GO ID is calculated and shown in a graphical format. The WEGO 2.0 updates target four aspects, aiming to provide a more efficient and up-to-date approach for comparative genomic analyses. First, the number of input files, previously limited to three, is now unlimited, allowing WEGO to analyze multiple datasets. Second, this version adds reference datasets for nine model species that can be used as baselines in comparative genomic analyses. Third, during analysis each Chi-square test is now carried out across multiple datasets rather than for each pair of samples. Finally, WEGO 2.0 provides an additional output graph alongside the traditional WEGO histogram, displaying the sorted P-values of GO terms and indicating which terms differ significantly. WEGO 2.0 also features an entirely new user interface. WEGO is available for free at http://wego.genomics.org.cn.
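A Chi-square test over several datasets at once can be sketched as a Pearson test on a 2 × k table for one GO term: genes annotated to the term versus genes not annotated, across k datasets. The function name and inputs are hypothetical, no continuity correction is applied, and the abstract does not say how WEGO 2.0 computes the test, so this is only an illustration of the general technique.

```python
def chi_square_statistic(annotated, totals):
    """Pearson chi-square statistic for a 2 x k contingency table.

    annotated[i]: genes in dataset i annotated to the GO term
    totals[i]:    all genes in dataset i
    Returns (statistic, degrees_of_freedom).
    """
    if len(annotated) != len(totals) or len(totals) < 2:
        raise ValueError("need matching counts for at least two datasets")

    grand_total = sum(totals)
    annotated_total = sum(annotated)
    other_total = grand_total - annotated_total

    statistic = 0.0
    for a, n in zip(annotated, totals):
        # Two cells per dataset: annotated and not annotated.
        for observed, row_total in ((a, annotated_total), (n - a, other_total)):
            expected = row_total * n / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected

    # (rows - 1) * (columns - 1) with two rows is simply k - 1.
    return statistic, len(totals) - 1
```

Turning the statistic into a P-value requires the chi-square distribution with the returned degrees of freedom, which is left out here to keep the sketch free of external dependencies. Sorting the per-term P-values would then give the kind of ranked output the abstract describes.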

Genetics

Philosophy

A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms

Gane Wong et al., Dec 1, 2004

We describe a genetic variation map for the chicken genome containing 2.8 million single-nucleotide polymorphisms (SNPs). This map is based on a comparison of the sequences of three domestic chicken breeds (a broiler, a layer and a Chinese silkie) with that of their wild ancestor, red jungle fowl. Subsequent experiments indicate that at least 90% of the variant sites are true SNPs, and at least 70% are common SNPs that segregate in many domestic breeds. Mean nucleotide diversity is about five SNPs per kilobase for almost every possible comparison between red jungle fowl and domestic lines, between two different domestic lines, and within domestic lines—in contrast to the notion that domestic animals are highly inbred relative to their wild ancestors. In fact, most of the SNPs originated before domestication, and there is little evidence of selective sweeps for adaptive alleles on length scales greater than 100 kilobases.

Genetics

Demography

SOAPTyping: an open-source and cross-platform tool for Sanger sequence-based typing for HLA class I and II alleles

Yong Zhang et al., Jun 20, 2019

The human leukocyte antigen (HLA) gene family plays a key role in the immune response and thus is crucial in many biomedical and clinical settings. Sanger sequencing, the gold standard technology for HLA typing, enables accurate identification of HLA alleles at high resolution. However, HLA typing based on Sanger sequencing can currently be performed only with commercial software such as UType, SBT-Assign and SBTEngine, because no open-source tools are available. To fill this gap, we developed SOAPTyping, a stand-alone, open-source and cross-platform software for Sanger-based typing of HLA class I and II alleles. Availability and implementation: SOAPTyping is implemented in C++ with the Qt framework and is supported on Windows, Mac and Linux. Source code and detailed documentation are accessible via the project GitHub page: https://github.com/BGI-flexlab/SOAPTyping.

Genetics

Artificial Intelligence

Two-Stage Beamspace MUSIC-Based Near-Field Channel Estimation for Hybrid XL-MIMO

Kaiqian Qu et al., Jun 4, 2024

In extremely large-scale multiple-input multiple-output (XL-MIMO) systems, channel estimation poses a key challenge due to the introduction of the unknown distance parameter in near-field scenarios. We propose a beamforming codebook that includes pre-compensated distances, which allows the application of the traditional beamspace multiple signal classification (BMUSIC) to near-field channel estimation. To determine the optimal pre-compensation distance, we introduce three strategies: maximizing the correlation integral (MCI), maximizing the minimum correlation (MMC), and exceeding the minimum correlation threshold (EMCT). In addition, we develop a two-stage BMUSIC algorithm and a switch transformation design to further reduce the time-intensive 2-dimensional (2D) search processes and avoid the overlaps of multiple coherent paths. Simulation results confirm that the proposed method not only reduces computational complexity but also notably outperforms existing methods in terms of estimation accuracy.

Geology

Paleontology
