ResearchHub | Open Science Community

NIAGADS Alzheimer’s GenomicsDB: A resource for exploring Alzheimer’s Disease genetic and genomic knowledge

Emily Greenfest‐Allen et al. Sep 25, 2020

Abstract. Introduction: The NIAGADS Alzheimer’s Genomics Database (GenomicsDB) is a public knowledgebase of Alzheimer’s disease (AD) genetic datasets and genomic annotations. Methods: It uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant GWAS summary statistics datasets with functional annotations, including a database of >230 million annotated variants from the AD Sequencing Project’s joint-calling efforts. Results: The knowledgebase generates genome browser tracks and interactive reports compiled from harmonized datasets and annotations in the underlying database. These facilitate data sharing and discovery by placing AD-risk associations in a broader functional genomic context or by summarizing them in the context of functionally annotated genes and variants. Discussion: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB shares annotated AD-relevant summary statistics datasets via a web interface designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data.

Genetics

Paleontology

FILER: large-scale, harmonized FunctIonaL gEnomics Repository

Pavel Kuksa et al. Jan 25, 2021

Abstract. Motivation: Querying massive collections of functional genomic and annotation data, and linking and summarizing the query results across data sources and data types, are important steps in high-throughput genomic and genetic analytical workflows. However, accomplishing these steps is difficult because of the heterogeneity and breadth of data sources, experimental assays, biological conditions (e.g., tissues, cell types), data types, and file formats. Results: The FunctIonaL gEnomics Repository (FILER) is a large-scale, harmonized functional genomics data catalog uniquely providing: 1) streamlined access to >50,000 harmonized, annotated functional genomic and annotation datasets across >20 integrated data sources, >1,100 biological conditions/tissues/cell types, and >20 experimental assays; 2) a scalable, indexing-based genomic querying interface; 3) the ability for users to analyze and annotate their own experimental data against reference datasets. This resource spans >17 billion genomic records for both the GRCh37/hg19 and GRCh38/hg38 genome builds. FILER scales well with the experimental (query) data size and with the number of reference datasets and data sources. When evaluated on large-scale analysis tasks, the observed running time for querying 1000x more genomic intervals (10^6 vs. 10^3) against all 7×10^9 hg19 FILER records increased sub-linearly, by only a factor of 15. Together, these features facilitate reproducible research and streamline querying, integrating, and utilizing large-scale functional genomics and annotation data. Availability and implementation: FILER can be 1) freely accessed at https://lisanwanglab.org/FILER, 2) deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER), and 3) integrated with other pipelines using provided scripts. Contact: lswang@pennmedicine.upenn.edu
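The scaling figures quoted in the FILER abstract (1000x more query intervals, 15x longer running time) can be turned into a rough scaling exponent. The sketch below assumes a simple power-law model, time ∝ size^b, which is an illustrative assumption and not something the abstract states; the function name `scaling_exponent` is hypothetical.

```python
import math


def scaling_exponent(size_ratio: float, time_ratio: float) -> float:
    """Estimate b in time ∝ size**b from two measurements.

    Assumes a pure power law (an illustrative model, not taken from the paper).
    """
    return math.log(time_ratio) / math.log(size_ratio)


# Figures from the abstract: 10^6 vs. 10^3 query intervals, 15x running time.
b = scaling_exponent(size_ratio=1e6 / 1e3, time_ratio=15.0)
print(f"estimated exponent: {b:.2f}")  # b < 1 means sub-linear scaling
```

Under this model the exponent comes out at about 0.39, well below the value of 1 that linear scaling would give.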

Genetics

Artificial Intelligence

HIPPIE2: a method for fine-scale identification of physically interacting chromatin regions

Pavel Kuksa et al. May 10, 2019

Most regulatory chromatin interactions are mediated by various transcription factors (TFs) and involve physically interacting elements such as enhancers, insulators, or promoters. To map these elements and interactions, we developed HIPPIE2, which analyzes raw reads from high-throughput chromosome conformation (Hi-C) experiments to identify fine-scale physically interacting regions (PIRs). Unlike standard genome binning approaches (e.g., 10 Kbp to 1 Mbp bins), HIPPIE2 dynamically calls physical locations of PIRs with better precision and higher resolution based on the pattern of restriction events and relative locations of interacting sites inferred from the sequencing readout. We applied HIPPIE2 to in situ Hi-C datasets across 6 human cell lines (GM12878, IMR90, K562, HMEC, HUVEC, NHEK) with matched ENCODE and Roadmap functional genomic data. HIPPIE2 detected 1,042,738 distinct PIRs across cell lines, with high resolution (average PIR length of 1,006 bp) and high reproducibility (92.3% in GM12878 replicates). Of these PIRs, 32.8% were shared among cell lines. PIRs are enriched for epigenetic marks (H3K27ac, H3K4me1) and open chromatin, suggesting active regulatory roles. HIPPIE2 identified 2.8M significant intrachromosomal PIR–PIR interactions, 27.2% of which were enriched for TF binding sites. Of these, 50,608 were enhancer–promoter interactions and were enriched for 33 TFs (31 in enhancers, 29 in promoters), several of which are known to mediate DNA looping and long-distance regulation. Twenty-nine TFs were enriched in more than one cell line and 4 were cell line-specific. These findings demonstrate that the dynamic approach used in HIPPIE2 characterizes PIR–PIR interactions with high resolution and reproducibility.

Genetics

Molecular Biology

SparkINFERNO: A scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants

Pavel Kuksa et al. Jan 8, 2020

Summary: We report SparkINFERNO (Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants), a scalable bioinformatics pipeline characterizing noncoding GWAS association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts, and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci, and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWAS studies and show that it is more than 60 times as efficient and scales with data size and the amount of computational resources. Availability: SparkINFERNO runs on clusters or on a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.

Genetics

Artificial Intelligence

INFERNO - INFERring the molecular mechanisms of NOncoding genetic variants

Alexandre Amlie‐Wolf et al. Oct 30, 2017

The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, where they affect regulatory elements including transcriptional enhancers. We propose INFERNO (INFERring the molecular mechanisms of NOncoding genetic variants), a novel method which integrates hundreds of diverse functional genomics data sources with GWAS summary statistics to identify putatively causal noncoding variants underlying association signals. INFERNO comprehensively infers the relevant tissue contexts, target genes, and downstream biological processes affected by causal variants. We apply INFERNO to schizophrenia GWAS data, recapitulating known schizophrenia-associated genes including CACNA1C and discovering novel signals related to transmembrane cellular processes.

Genetics

Molecular Biology

hipFG: High-throughput harmonization and integration pipeline for functional genomics data

Jeffrey Cifello et al. Apr 25, 2023

Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG, an automatically customized pipeline for efficient and scalable normalization of heterogeneous FG data collections into standardized, indexed, rapidly searchable, analysis-ready datasets while accounting for FG data types (e.g., chromatin interactions, genomic intervals, quantitative trait loci).

Genetics

Molecular Biology

Author Correction: Genetic, transcriptomic, histological, and biochemical analysis of progressive supranuclear palsy implicates glial activation and novel risk genes

Kurt Farrell et al. Nov 13, 2024

Genetics

Cell Biology

Fast Parallel Algorithm for Large Fractal Kinetic Models with Diffusion

А. Попов et al. Mar 8, 2018

Chemical kinetic simulations are usually based on the law of mass action, which applies to the behavior of particles in solution. Molecular interactions in a crowded medium such as a cell, however, are not easily described by such conventional mathematical treatment. Fractal kinetics is emerging as a novel method for simulating kinetic reactions in such an environment. To date, there has not been a fast, efficient, and, more importantly, parallel algorithm for such computations. Here, we present an algorithm with several novel features for simulating large (with respect to size and time scale) fractal kinetic models. We applied the fractal kinetic technique and our algorithm to a canonical substrate-enzyme model with explicit phase separation in the product, and achieved a speed-up of up to 8 times over previous results with reasonably tight bounds on the accuracy of the simulation. We anticipate that this technique and algorithm will have important applications to the simulation of intracellular biochemical reactions with complex dynamic behavior.
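To illustrate the contrast the abstract draws between mass-action and fractal kinetics, the sketch below uses a common textbook formulation of fractal-like kinetics, in which the rate coefficient decays with time as k(t) = k0 · t^(-h) with 0 ≤ h < 1. This formulation is an assumption made here for illustration; the abstract does not say which model or algorithm the paper uses, and the function name `remaining_fraction` is hypothetical. For a single decaying species, dA/dt = -k(t)·A has the closed form A(t)/A(0) = exp(-k0 · t^(1-h) / (1-h)), and h = 0 recovers ordinary first-order mass-action decay.

```python
import math


def remaining_fraction(t: float, k0: float, h: float) -> float:
    """Return A(t)/A(0) for dA/dt = -k0 * t**(-h) * A.

    h = 0 is classical first-order mass-action decay; 0 < h < 1 is the
    assumed fractal-like case with a time-dependent rate coefficient.
    """
    if not 0.0 <= h < 1.0:
        raise ValueError("h must satisfy 0 <= h < 1")
    if t < 0.0:
        raise ValueError("t must be non-negative")
    return math.exp(-k0 * t ** (1.0 - h) / (1.0 - h))


# Compare classical (h = 0) and fractal-like (h = 0.5) decay at a few times.
for t in (0.1, 1.0, 10.0):
    classical = remaining_fraction(t, k0=1.0, h=0.0)
    fractal = remaining_fraction(t, k0=1.0, h=0.5)
    print(f"t={t:>5}: classical={classical:.4g}, fractal-like={fractal:.4g}")
```

With these illustrative parameters the fractal-like case reacts faster at early times and slower at late times than the classical case, which is the qualitative signature of a decaying rate coefficient.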

Genetics

Biophysics
