ResearchHub | Open Science Community

Generalized Reporter Score-based Enrichment Analysis for Diverse Omics Data

Peng Chen et al., Jan 1, 2023

Enrichment analysis contextualizes biological features in pathways to facilitate a systematic understanding of high-dimensional data and is widely used in biomedical research. The emerging method known as reporter score-based analysis (RSA) shows more promising sensitivity, as it relies on p-values instead of raw values of features. However, RSA can only be applied to two-group comparisons and is often misused due to the lack of a convenient tool. We propose the Generalized Reporter Score-based Enrichment Analysis (GRSA) method for enrichment analysis of multi-group and longitudinal omics data. GRSA is implemented in an R package, ReporterScore, which integrates a powerful visualization module and updatable pathway databases. A comparison with other common pathway enrichment analysis methods, such as Fisher's exact test and GSEA, reveals that GRSA exhibits increased sensitivity across multiple benchmark datasets. We applied GRSA to microbiome, transcriptome, and metabolome data to show its versatility in discovering new biological insights in omics studies. Finally, we showcased the applicability of the GRSA method beyond functional enrichment using a custom taxonomy database. We believe the ReporterScore package will be an invaluable tool for broad biomedical research fields. The ReporterScore package and a complete description of its usage are publicly available on GitHub (https://github.com/Asa12138/ReporterScore).
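To illustrate what "relying on p-values" can look like in practice, here is a minimal Python sketch of the classic two-group reporter score as it is commonly described: each feature's p-value is converted to a z-score, the z-scores in a pathway are aggregated, and the result is corrected against randomly sampled feature sets of the same size. This is an illustration under stated assumptions, not the ReporterScore package's API or the GRSA extension; the function names, the toy p-values, and the feature and pathway labels are all hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def p_to_z(p, eps=1e-12):
    """Convert a p-value to a z-score with the inverse standard normal CDF."""
    p = min(max(p, eps), 1 - eps)  # keep inv_cdf away from exactly 0 and 1
    return NormalDist().inv_cdf(1 - p)


def reporter_score(pathway, z_all, n_perm=1000, seed=0):
    """Aggregate z-scores over a pathway and correct by a random background.

    pathway: list of feature IDs in the pathway (hypothetical labels here)
    z_all:   dict mapping every measured feature ID to its z-score
    """
    k = len(pathway)
    z_pathway = sum(z_all[f] for f in pathway) / math.sqrt(k)

    # Background: the same aggregate for random feature sets of size k.
    rng = random.Random(seed)
    values = list(z_all.values())
    background = [
        sum(rng.sample(values, k)) / math.sqrt(k) for _ in range(n_perm)
    ]
    return (z_pathway - mean(background)) / pstdev(background)


# Toy p-values from a hypothetical two-group test (illustrative only).
p_values = {
    "K1": 0.001, "K2": 0.01, "K3": 0.02, "K4": 0.4,
    "K5": 0.6, "K6": 0.8, "K7": 0.5, "K8": 0.9,
}
z_all = {feature: p_to_z(p) for feature, p in p_values.items()}

score_a = reporter_score(["K1", "K2", "K3"], z_all)  # mostly small p-values
score_b = reporter_score(["K5", "K6", "K8"], z_all)  # mostly large p-values
print(f"pathway A: {score_a:.2f}, pathway B: {score_b:.2f}")
```

In this sketch a pathway whose features have small p-values scores above the random background, while one dominated by large p-values scores below it. Because only p-values enter the calculation, the score alone carries no direction of change.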

Access COI barcode efficiently using high throughput Single End 400 bp sequencing

Chentao Yang et al., Dec 17, 2018

Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, constraints on barcoding costs have led to unbalanced efforts, which have prevented accurate taxonomic identification in biodiversity studies. We present a high-throughput sequencing approach based on the HIFI-SE pipeline, which takes advantage of Single-End 400 bp (SE400) sequencing data generated by BGISEQ-500 to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons. HIFI-SE is written in Python and comprises four function modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a test plate containing 96 samples (30 corals, 64 insects and 2 blank controls) and obtained a total of 86 fully assembled HIFI COI barcodes. Comparison with the corresponding Sanger sequences (72 sequences available) showed that most of the samples (98.61%, 71/72) were correctly and accurately assembled, including 46 samples with a similarity of 100% and 25 with ca. 99%. Our approach can produce standard full-length barcodes cost-efficiently, enabling DNA barcoding for global biomes, which will advance DNA-based species identification for various ecosystems and improve quarantine biosecurity efforts.
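The reported accuracy can be checked with a few lines of Python, shown here next to a minimal percent-identity function of the kind one might use to compare an assembled barcode with its Sanger reference. The abstract does not say how similarity was computed, so the function is an assumption: it expects two already aligned sequences of equal length, and the example sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Counts taken from the abstract: 46 barcodes at 100% similarity and
# 25 at ca. 99%, out of 72 samples with Sanger sequences available.
correct = 46 + 25
with_sanger = 72
accuracy = round(100.0 * correct / with_sanger, 2)
print(correct, accuracy)  # 71 98.61

# Hypothetical 10 bp fragments differing at the last position.
toy_identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(toy_identity)  # 90.0
```

For real barcodes the sequences would first need to be aligned, for example with a pairwise aligner, before position-wise identity is meaningful.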