ResearchHub | Open Science Community

Predicting splicing patterns from the transcription factor binding sites in the promoter with deep learning

Tzu‐Chieh Lin et al., Apr 9, 2023

Abstract. Background: Alternative splicing is a crucial mechanism of post-transcriptional modification responsible for the transcriptome plasticity and proteome diversity of a metazoan cell. Although many splicing regulatory mechanisms around exon/intron regions have been discovered, the relationship between promoter-bound transcription factors and downstream alternative splicing remains largely unexplored. Results: In this study, we present computational approaches to decipher the regulatory relationship connecting promoter-bound transcription factor binding sites (TFBSs) and splicing patterns. We curated a data set of DNase I hypersensitive site sequencing and transcriptome data from fifteen human tissues in ENCODE. Specifically, we proposed different representations of TF binding context and splicing patterns to capture the associations between the promoter and downstream splicing events. Our results demonstrated that convolutional neural network (CNN) models learned from TF binding changes in the promoter to predict splicing pattern changes. Furthermore, through an in silico perturbation-based analysis of the CNN models, we identified several TFs whose perturbation considerably reduced the performance of splicing prediction. Conclusion: Our findings highlight the potential role of promoter-bound TFBSs in influencing the regulation of downstream splicing patterns and provide insights for discovering alternative splicing regulations.

Genetics

Molecular Biology


The borders of cis-regulatory DNA sequences harbor the divergent transcription factor binding motifs in the human genome

Jia-Hsin Huang et al., Aug 2, 2018

Changes in cis-regulatory DNA sequences and transcription factor (TF) repertoires are major sources shaping gene regulatory evolution in eukaryotes. However, it is currently unclear how dynamic changes in DNA sequences introduce varying levels of divergence in TF binding motifs across the genome over evolutionary time. Here, we estimated the evolutionary divergence level of TF binding motifs and quantified their occurrences in DNase I hypersensitive sites. Results from our in silico motif scan and empirical TF-ChIP (chromatin immunoprecipitation) data demonstrate that divergent motifs tend to be introduced at the borders of cis-regulatory regions, likely accompanying the expansion of these regions through evolutionary time. Accordingly, we propose that expansion through the incorporation of divergent motifs within cis-regulatory regions provides a rationale for the evolutionary divergence of regulatory circuits.

Genetics

Philosophy


scDrug+: predicting drug-responses using single-cell transcriptomics and molecular structure

Yih-Yun Sun et al., Aug 1, 2024

Predicting drug responses based on individual transcriptomic profiles holds promise for refining prognosis and advancing precision medicine. Although many studies have endeavored to predict the responses of known drugs to novel transcriptomic profiles, research into predicting responses for newly discovered drugs remains sparse. In this study, we introduce scDrug+, a comprehensive pipeline that seamlessly integrates single-cell analysis with drug-response prediction. Importantly, scDrug+ is equipped to predict the response of new drugs by analyzing their molecular structures. The open-source tool is available as a Docker container, ensuring ease of deployment and reproducibility. It can be accessed at https://github.com/ailabstw/scDrugplus.

Genetics

Pharmacology


Somatic mutation detection workflow validity distinctly influences clinical decision

Pei-Miao Chien et al., Oct 30, 2023

Abstract. Identifying somatic mutations from tumor tissues has substantial consequences for making informed medical decisions. Evaluating the accuracy and robustness of somatic mutation analysis workflows has become essential when employing whole exome sequencing (WES) analysis in clinical settings. In this study, we used a set of tumor WES data from the Sequencing and Quality Control Phase 2 (SEQC2) project to systematically benchmark the analytical validity of workflows built from various combinations of read aligners and mutation callers. The read aligners were BWA, Bowtie2, the built-in DRAGEN-Aligner, DRAGMAP, and HISAT2; the callers were Mutect2, TNscope, the built-in DRAGEN-Caller, and DeepVariant. Among all combinations, DRAGEN showed the best performance, with a mean F1-score of 0.9659 in SNV detection, while the combination of BWA and Mutect2 showed the second-highest mean F1-score of 0.9485. Notably, our results suggested that the mutation callers had a significantly higher impact on overall sensitivity than the aligners. For drug-related biomarkers, Sentieon TNscope tended to underestimate tumor mutation burden and missed many drug-resistance mutations, such as FLT3(c.G1879A:p.A627T) and MAP2K1(c.G199A:p.D67N). Our investigation provides a valuable guide for cancer genomics researchers on tumor mutation identification through an in-depth performance comparison among diverse tool combinations.

Genetics

Molecular Biology


PGSbuilder: An end-to-end platform for human genome association analysis and polygenic risk score predictions

Ko-Han Lee et al., Apr 13, 2023

Abstract. Understanding the genetic basis of complex human diseases is increasingly important in the development of precision medicine. Over the last decade, genome-wide association studies (GWAS) have become a key technique for detecting associations between common diseases and single nucleotide polymorphisms (SNPs) present in a cohort of individuals. The polygenic risk score (PRS), which often uses GWAS summary statistics, instead estimates the genetic propensity to a trait at the individual level. Although many GWAS and PRS tools are available for analyzing large volumes of genotype data, most clinicians and medical researchers are not familiar with these bioinformatics tools and lack access to high-performance computing clusters. To fill this gap, we provide a publicly available web server, PGSbuilder, for the GWAS and PRS analysis of human genomes with variant annotations. The user-friendly and intuitive PGSbuilder web server is developed to help medical professionals with limited computational skills discover the genetic variants associated with complex traits and diseases. For GWAS analysis, PGSbuilder provides the widely used PLINK 2.0 package. For PRS, PGSbuilder provides six different PRS methods: Clumping and Thresholding, Lassosum, LDPred2, GenEpi, PRS-CS, and PRSice2. Furthermore, PGSbuilder provides an intuitive user interface to examine the annotated functional effects of variants from known biomedical databases and relevant literature using advanced natural language processing approaches. In conclusion, PGSbuilder offers a reliable platform to aid researchers in advancing the public perception of genomic risk and precision medicine for human disease genetics. PGSbuilder is freely accessible at http://pgsb.tw23.org.
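As a rough illustration of the individual-level score mentioned in the abstract above, a PRS is commonly computed as a weighted sum of a person's effect-allele dosages, with weights taken from GWAS summary statistics. The sketch below shows only that final scoring step. The function name `polygenic_risk_score` and the toy numbers are hypothetical, and the sketch does not reproduce how PGSbuilder or any of the six listed methods derive their weights.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of effect-allele dosages (0, 1 or 2 per SNP).

    dosages      -- one individual's effect-allele counts, one per SNP
    effect_sizes -- per-SNP weights (e.g. GWAS betas), in the same SNP order
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * dosage for beta, dosage in zip(effect_sizes, dosages))


# Toy data (hypothetical): three SNPs, one individual.
effect_sizes = [0.5, -0.25, 1.0]
individual = [2, 1, 0]

score = polygenic_risk_score(individual, effect_sizes)
print(score)
```

The methods named in the abstract differ mainly in how the per-SNP weights are selected or shrunk before this sum is taken.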

Genetics

Molecular Biology


Identification and comparative analysis of long non-coding RNAs in the brain of fire ant queens in two different reproductive states

Cheng‐Hung Tsai et al., Aug 18, 2021

Abstract. Background: Many long non-coding RNAs (lncRNAs) have been identified in higher eukaryotic species. lncRNAs have been reported to play important roles in diverse biological processes, including developmental regulation and behavioral plasticity. However, there are no reports of systematic characterization of lncRNAs in the fire ant Solenopsis invicta. Results: In this study, we performed a genome-wide analysis of lncRNAs in the brains of S. invicta using RNA-seq. In total, 1,393 novel lncRNA transcripts were identified in the fire ant. In contrast to the annotated lncRNA transcripts, which have at least two exons, the novel lncRNAs are shorter, monoexonic transcripts. In addition, the transcriptomes of virgin alate and dealate mated queens were analyzed and compared. The results showed 295 differentially expressed mRNA genes (DEGs) and 65 differentially expressed lncRNA genes (DELs) between virgin and mated queens, of which 17 lncRNAs were highly expressed in the virgin alates and 47 lncRNAs were highly expressed in the mated dealates. By identifying the DEL:DEG pairs with highly associated expression (Spearman’s |rho| > 0.8 and p-value < 0.01), we found that many DELs were co-regulated with DEGs after mating. Furthermore, several notable lncRNAs (MSTRG.6523, MSTRG.588, and nc909) that were associated with particular coding genes may play important roles in regulating brain gene expression during the reproductive transition in fire ants. Conclusion: This study provides the first genome-wide identification of S. invicta lncRNAs in the brain in different reproductive states and will contribute to a fuller understanding of the transcriptional regulation underpinning reproductive changes.
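The DEL:DEG pairing criterion in the abstract above rests on Spearman's rank correlation. The sketch below shows, under simplifying assumptions, how the |rho| > 0.8 part of that filter could be evaluated for one lncRNA/gene pair in plain Python. The helper names and expression values are hypothetical, and the p-value < 0.01 condition is deliberately left out because it needs a statistical test that a dedicated statistics library would normally supply.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant sample")
    return cov / (sx * sy)


# Hypothetical expression values across five samples.
lncrna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_expr = [10.0, 8.0, 7.5, 3.0, 1.0]

rho = spearman_rho(lncrna_expr, gene_expr)
passes_rho_filter = abs(rho) > 0.8  # p-value check omitted in this sketch
print(rho, passes_rho_filter)
```

Because the threshold is on the absolute value, both strongly positive and strongly negative associations pass the filter.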

Genetics

Endocrinology


Evaluating the analytical validity of mutation calling pipeline for tumor whole exome sequencing

Chin-Yi Cheng et al., Nov 18, 2022

Abstract. Detecting somatic mutations in patients' tumor tissues has a clinical impact on medical decision-making. Library preparation methods, sequencing platforms, read alignment tools, and variant calling algorithms are the major factors that influence data analysis results. Understanding the performance of tool combinations in somatic variant calling pipelines has become an important issue in the clinical use of whole exome sequencing (WES) analysis. In this study, we selected five state-of-the-art sequence aligners: BWA, Bowtie2, DRAGMAP, DRAGEN aligner (DragenA), and HISAT2. For the variant callers, we chose GATK Mutect2, Sentieon TNscope, DRAGEN caller (DragenC), and DeepVariant. The benchmarking tumor WES data released by the FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium provided the true positive variants used to evaluate overall performance. Multiple combinations of aligners and variant callers were used to assess variant detection capability. We measured the recall, precision, and F1-score of each combination for both single nucleotide variants (SNVs) and short insertions and deletions (InDels). We also evaluated performance across different variant allele frequencies (VAFs) and base-pair lengths. For SNV detection, the top recall, precision, and F1-score were achieved by BWA+DragenC (0.9629), Bowtie2+TNscope (0.9957), and DRAGMAP+DragenC (0.9646), respectively. For InDel detection, BWA+DragenC (0.9546), HISAT2+TNscope (0.7519), and DragenA+DragenC (0.8081) outperformed the other combinations in recall, precision, and F1-score, respectively. In addition, we found that the variant callers could bias the variant calling results. Finally, although some combinations detected variants with high accuracy, some variants still could not be detected even by these top-performing combinations. The results of this study show that no single combination achieved superior results in detecting all the variants of the benchmarking dataset. In conclusion, applying both merge-based and ensemble-based variant detection approaches is encouraged to detect variants more comprehensively.
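The recall, precision, and F1-score reported in the abstract above follow the standard definitions based on true positives, false positives, and false negatives. The sketch below applies those definitions to a toy truth set, assuming each variant is represented as a (chromosome, position, ref, alt) tuple and that matching is exact. The function name `benchmark_calls` and the variants are hypothetical, and real benchmarking tools handle variant normalization and representation differences that this sketch ignores.

```python
def benchmark_calls(called, truth):
    """Return (precision, recall, f1) for a set of calls against a truth set."""
    tp = len(called & truth)   # called and in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical variants: (chromosome, position, ref, alt).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 900, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr5", 42, "A", "T"),   # false positive
}

precision, recall, f1 = benchmark_calls(called, truth)
print(precision, recall, f1)
```

Since the F1-score is the harmonic mean of precision and recall, a pipeline with very high precision but modest recall, or the reverse, will not rank first on F1, which is consistent with different combinations topping each metric in the abstract.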

Genetics

Artificial Intelligence
