ResearchHub | Open Science Community

JASPAR 2020: update of the open-access database of transcription factor binding profiles

Oriol Fornés et al., Oct 16, 2019

Abstract: JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
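As a rough illustration of what a position frequency matrix holds and how it is commonly used, the sketch below converts a PFM into log-odds weights and scores candidate sites. The counts are made up for a toy 4-bp motif (not a real JASPAR profile), and the pseudocount of 0.8 and uniform background of 0.25 are assumptions chosen for the example, not values taken from the abstract.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert a PFM (base -> counts per position) into log2-odds weights.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(pfm["A"])
    pwm = {base: [] for base in BASES}
    for i in range(length):
        total = sum(pfm[base][i] for base in BASES)
        for base in BASES:
            # Smoothed frequency avoids log(0) for unobserved bases.
            freq = (pfm[base][i] + pseudocount * background) / (total + pseudocount)
            pwm[base].append(math.log2(freq / background))
    return pwm

def score(pwm, site):
    """Sum the per-position weights of a DNA site of the motif's length."""
    return sum(pwm[base][i] for i, base in enumerate(site))

# Hypothetical counts for a 4-bp motif; each column sums to 10 sequences.
toy_pfm = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}
toy_pwm = pfm_to_pwm(toy_pfm)

# Consensus: the highest-weight base at each position.
best = "".join(max(BASES, key=lambda b: toy_pwm[b][i]) for i in range(4))
print(best, round(score(toy_pwm, best), 2), round(score(toy_pwm, "TTTT"), 2))
```

Storing raw counts rather than weights leaves the choice of pseudocount and background model to the user, which is one reason profiles are typically distributed as PFMs.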

Genetics

Plant Science


JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework

Aziz Khan et al., Oct 27, 2017

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.

Genetics

Molecular Biology


The T cell epitope landscape of SARS-CoV-2 variants of concern

Simen Tennøe et al., Jun 6, 2022

Abstract: During the COVID-19 pandemic, several SARS-CoV-2 variants of concern (VOC) emerged, bringing with them varying degrees of health and socioeconomic burdens. In particular, the Omicron VOC displayed distinct features of increased transmissibility accompanied by antigenic drift in the spike protein that partially circumvented the ability of pre-existing antibody responses in the global population to neutralize the virus. However, T cell immunity has remained robust throughout all the different VOC transmission waves and has emerged as a critically important correlate of protection against SARS-CoV-2 and its VOCs, in both vaccinated and infected individuals. Therefore, as SARS-CoV-2 VOCs continue to evolve, it is crucial that we characterize the correlates of protection and the potential for immune escape for both B cell and T cell human immunity in the population. Generating the insights necessary to understand T cell immunity experimentally for the global human population is at present critical, but it is a time-consuming, expensive, and laborious process. Further, it is not feasible to generate global or universal insights into T cell immunity in an actionable time frame for potential future emerging VOCs. However, using computational means we can expedite and provide early insights into the correlates of T cell protection. In this study, we generated and revealed insights on the T cell epitope landscape for the five main SARS-CoV-2 VOCs observed to date. Using a unique AI prediction platform, we demonstrated a strong concordance in global T cell protection across all mutated peptides for each VOC. This was modeled using the most frequent HLA alleles in the human population and covers the most common HLA haplotypes in the human population.
The AI resource generated through this computational study and associated insights may guide the development of T cell vaccines and diagnostics that are even more robust against current and future VOCs, and their emerging subvariants.

Genetics

Immunology


Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2: toward universal blueprints for vaccine designs

Brandon Malone et al., Apr 21, 2020

Abstract: The global population is at present suffering from a pandemic of Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The goals of this study were to use artificial intelligence (AI) to predict blueprints for designing universal vaccines against SARS-CoV-2 that contain a sufficiently broad repertoire of T-cell epitopes capable of providing coverage and protection across the global population. To help achieve these aims, we profiled the entire SARS-CoV-2 proteome across the most frequent 100 HLA-A, HLA-B and HLA-DR alleles in the human population, using host-infected cell surface antigen presentation and immunogenicity predictors from the NEC Immune Profiler suite of tools, and generated comprehensive epitope maps. We then used these epitope maps as input for a Monte Carlo simulation designed to identify statistically significant “epitope hotspot” regions in the virus that are most likely to be immunogenic across a broad spectrum of HLA types. We then removed epitope hotspots that shared significant homology with proteins in the human proteome to reduce the chance of inducing off-target autoimmune responses. We also analyzed the antigen presentation and immunogenic landscape of all the nonsynonymous mutations across 3400 different sequences of the virus, to identify a trend whereby SARS-CoV-2 mutations are predicted to have reduced potential to be presented by host-infected cells, and consequently detected by the host immune system. A sequence conservation analysis then removed epitope hotspots that occurred in less-conserved regions of the viral proteome.
Finally, we used a database of the HLA genotypes of approximately 22 000 individuals to develop a “digital twin” type simulation to model how effectively different combinations of hotspots would work in a diverse human population, and used the approach to identify an optimal constellation of epitope hotspots that could provide maximum coverage in the global population. By combining the antigen presentation to the infected-host cell surface and immunogenicity predictions of the NEC Immune Profiler with a robust Monte Carlo and digital twin simulation, we have managed to profile the entire SARS-CoV-2 proteome and identify a subset of epitope hotspots that could be harnessed in a vaccine formulation to provide a broad coverage across the global population.
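The abstract does not describe how its Monte Carlo simulation is set up, so the following is only a generic sketch of one way such a test could look: a shuffle-based null distribution for sliding-window scores along a sequence. The per-position scores, window width, number of shuffles, and function names are all hypothetical and are not taken from the NEC Immune Profiler or the paper.

```python
import random

def window_sums(scores, width):
    """Sum of scores in every contiguous window of the given width."""
    return [sum(scores[i:i + width]) for i in range(len(scores) - width + 1)]

def hotspot_p_values(scores, width, n_shuffles=2000, seed=0):
    """Empirical p-value per window against a shuffled-position null.

    Each shuffle keeps the same score values but randomizes their positions;
    the null statistic is the maximum window sum, which gives a conservative
    control for having tested many windows.
    """
    rng = random.Random(seed)
    observed = window_sums(scores, width)
    shuffled = list(scores)
    null_max = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_max.append(max(window_sums(shuffled, width)))
    # Add-one correction keeps p-values strictly above zero.
    return [
        (1 + sum(1 for m in null_max if m >= obs)) / (n_shuffles + 1)
        for obs in observed
    ]

# Hypothetical per-position immunogenicity scores: a low baseline with one
# cluster of high-scoring positions at indices 10..14.
scores = [0.1] * 40
for i in range(10, 15):
    scores[i] = 1.0

p_values = hotspot_p_values(scores, width=5)
hotspots = [i for i, p in enumerate(p_values) if p < 0.05]
print(hotspots)
```

Windows flagged this way would then be candidates for the later filtering steps the abstract mentions, such as homology and conservation checks.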

Genetics

Immunology


A map of direct TF-DNA interactions in the human genome

Marius Gheorghe et al., Aug 17, 2018

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF-DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF-DNA interactions within the ChIP-seq peaks remains challenging. We developed dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF-DNA interactions. Our work culminated in predicted interactions covering >4% of the human genome, obtained by uniformly processing 1,983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were assessed a posteriori using protein binding microarray and ChIP-exo data, and were predominantly found in high-quality ChIP-seq peaks. The set of predicted direct TF-DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. Finally, we provide this collection of direct TF-DNA interactions and cis-regulatory modules in the human genome through the UniBind web-interface (http://unibind.uio.no).

Genetics

Molecular Biology


Beware the Jaccard: the choice of metric is important and non-trivial in genomic colocalisation analysis.

Stefania Salvatore et al., Nov 27, 2018

Background: The generation and systematic collection of genome-wide data is ever-increasing. This vast amount of data has enabled researchers to study relations between a variety of genomic and epigenomic features, including genetic variation, gene regulation, and phenotypic traits. Such relations are typically investigated by comparatively assessing genomic co-occurrence. Technically, this corresponds to assessing the similarity of pairs of genome-wide binary vectors. A variety of metrics have been proposed for this problem in other fields like ecology. However, while several of these metrics have been employed for assessing genomic co-occurrence, their appropriateness for the genomic setting has never been investigated. Results: We show that the choice of metric may strongly influence results and propose two alternative modelling assumptions that can be used to guide this choice. On both simulated and real genomic data, the Jaccard index is strongly affected by dataset size and should be used with caution. The Forbes coefficient (fold change) and tetrachoric correlation are less affected by dataset size, but one should be aware of increased variance for small datasets. Availability: All results on simulated and real data can be inspected and reproduced at https://hyperbrowser.uio.no/sim-measure
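To make the dataset-size effect concrete, here is a minimal sketch of the Jaccard index and the Forbes coefficient computed on pairs of binary vectors. The toy tracks are constructed so that overlap is exactly what independence would predict; the vectors and function names are illustrative assumptions and do not come from the paper's own analysis.

```python
def contingency(x, y):
    """Counts for two equal-length binary vectors: both, x only, y only, neither."""
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if yi and not xi)
    d = len(x) - a - b - c
    return a, b, c, d

def jaccard(x, y):
    """Intersection over union of the covered positions."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0

def forbes(x, y):
    """Observed overlap divided by the overlap expected under independence."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return a / expected if expected else 0.0

# Toy "genome" of 100 positions. Track x covers the first half.
x = [1 if i < 50 else 0 for i in range(100)]
# Two tracks that each overlap x exactly as often as chance would predict,
# but with different coverage: 10 positions versus 50 positions.
y_sparse = [1 if i % 10 == 0 else 0 for i in range(100)]
y_dense = [1 if i % 2 == 0 else 0 for i in range(100)]

for name, y in (("sparse", y_sparse), ("dense", y_dense)):
    print(name, round(jaccard(x, y), 3), round(forbes(x, y), 3))
```

In this toy case the Forbes coefficient is 1.0 for both tracks, while the Jaccard index rises from about 0.091 to about 0.333 purely because the second track is larger.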

Genetics

Artificial Intelligence
