ResearchHub | Open Science Community

A new long-read dog assembly uncovers thousands of exons and functional elements missing in the previous reference

Chao Wang et al., Jul 2, 2020

Abstract: Here we present a new high-quality canine reference genome in which the number of gaps is reduced 41-fold, from 23,836 to 585. Analysis of existing and novel data (RNA-seq, miRNA-seq and ATAC-seq) revealed that a large proportion of the closed gaps harboured previously hidden elements, including genes, promoters and miRNAs. Short-read dark regions were detected, and genomic regions were completed, including the DLA and TCR regions and 366 cancer genes. 10x sequencing of 27 dogs uncovered a total of 22.1 million SNPs, indels and larger structural variants (SVs). Of these, 1.4% overlap protein-coding genes and could provide a source of normal or aberrant phenotypic modifications.

Genetics

Microbiology
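
The 41-fold figure in the canine assembly abstract is the ratio of the gap count in the previous reference to the gap count in the new assembly. A minimal Python check follows. The helper name `fold_reduction` is hypothetical and used only for illustration.

```python
def fold_reduction(before: int, after: int) -> float:
    """Return how many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Gap counts quoted in the abstract: previous reference vs. new assembly.
ratio = fold_reduction(23_836, 585)
print(f"{ratio:.1f}-fold")  # 40.7-fold, which rounds to the quoted 41-fold
```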

Using evolutionary constraint to define novel candidate driver genes in medulloblastoma

Ananya Roy et al., Nov 3, 2022

Abstract: Current knowledge of cancer genomics is biased against non-coding mutations. Here, we use whole-genome sequencing data from pediatric brain tumors, combined with evolutionary constraint inferred from 240 mammals, to identify genes enriched in non-coding constraint mutations (NCCMs). We compare medulloblastoma (MB, malignant) to pilocytic astrocytoma (PA, benign) and find drastically different NCCM frequencies between the two. In PA, only the BRAF locus shows a high NCCM frequency, while in MB, >500 genes have high levels of NCCMs. Intriguingly, many genes are associated with different ages of onset, such as HOXB1 in young patients and NUAK1 in adult patients. Our analysis points to different molecular pathways in different patient groups. These novel candidate driver genes may assist patient stratification in MB and may inform treatment options. One-sentence summary: Non-coding constraint mutations implicate novel candidate genes to stratify medulloblastoma by age and subgroup.

Genetics

Molecular Biology
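
In the medulloblastoma abstract, an NCCM is read here as a non-coding mutation at an evolutionarily constrained base. Under that reading, the per-gene NCCM frequency can be sketched as a simple tally. This is a minimal illustration under stated assumptions and not the authors' pipeline. The `Mutation` record, the gene assignment, the phyloP cutoff `CONSTRAINT_CUTOFF` and the toy cohort (genes `GENE_A`, `GENE_B`) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cutoff: positions whose phyloP score is at or above this value
# are treated as evolutionarily constrained. The study's actual criterion is
# not stated in the abstract.
CONSTRAINT_CUTOFF = 2.0


@dataclass(frozen=True)
class Mutation:
    sample: str    # tumor sample identifier
    gene: str      # gene the mutation is assigned to (assignment rule assumed)
    coding: bool   # True if the mutation lies in a coding exon
    phylop: float  # per-base constraint score at the mutated position


def is_nccm(m: Mutation, cutoff: float = CONSTRAINT_CUTOFF) -> bool:
    """Non-coding constraint mutation: non-coding and at a constrained base."""
    return (not m.coding) and m.phylop >= cutoff


def nccm_frequency(mutations, n_samples: int) -> dict:
    """Per gene, the fraction of samples carrying at least one NCCM."""
    carriers = defaultdict(set)
    for m in mutations:
        if is_nccm(m):
            carriers[m.gene].add(m.sample)
    return {gene: len(samples) / n_samples for gene, samples in carriers.items()}


# Invented toy cohort of three samples, for illustration only.
cohort = [
    Mutation("S1", "GENE_A", coding=False, phylop=4.1),
    Mutation("S2", "GENE_A", coding=False, phylop=3.0),
    Mutation("S2", "GENE_B", coding=True, phylop=5.0),   # coding: not an NCCM
    Mutation("S3", "GENE_B", coding=False, phylop=0.5),  # below cutoff: not an NCCM
]
freqs = nccm_frequency(cohort, n_samples=3)
print(freqs)  # only GENE_A appears, carried by 2 of 3 samples
```

Comparing two cohorts, such as MB and PA, would then amount to computing this table for each cohort and contrasting the per-gene frequencies.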

SweHLA: the high confidence HLA typing bio-resource drawn from 1 000 Swedish genomes

Jessika Nordin et al., Jun 4, 2019

There is a need to accurately call human leukocyte antigen (HLA) genes from existing short-read sequencing data; however, no single solution matches the gold standard of lab typing. Here we aimed to combine results from available software, minimising the biases of any individual algorithm and HLA reference. The result is a robust HLA population resource for the published 1 000 Swedish genomes, and a framework for future HLA interrogation. HLA 2-field alleles were called using four imputation and inference methods for the eight classical genes (class I: HLA-A, -B, -C; class II: HLA-DPA1, -DPB1, -DQA1, -DQB1, -DRB1). A high-confidence population set (SweHLA) was determined using an n-1 concordance rule for class I (four programs) and class II (three programs) alleles. Results were compared across populations, and individual programs were benchmarked against SweHLA. Per allele, 875 to 988 of the 1 000 samples were genotyped in SweHLA; 920 samples had at least seven loci. Although only a small fraction of reference alleles were common to all software (class I = 1.9% and class II = 4.1%), this did not affect the overall call rate. Gene-level concordance was high compared to European populations (>0.83%), with COX and PGF as the dominant SweHLA haplotypes. We noted that 15/18 discordant alleles (delta allele frequency > 2) were previously reported as disease-associated. These differences could in part explain failures to replicate genetic findings across studies, reinforcing the need to use multiple programs. SweHLA demonstrates a way to use existing NGS data to generate a population resource agnostic to individual HLA software biases.

Genetics

Immunology
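
The n-1 concordance rule in the SweHLA abstract can be read as follows: with n programs, a call is accepted only when at least n-1 of them agree. That means 3 of 4 for class I and 2 of 3 for class II. The minimal sketch below makes three assumptions. Agreement is judged on unordered genotype pairs per sample and locus. A missing call counts as disagreement. The function name and allele examples are hypothetical. The published pipeline may instead apply the rule per allele.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Genotype = Tuple[str, str]  # two 2-field alleles, e.g. ("A*01:01", "A*02:01")


def n_minus_1_consensus(calls: Sequence[Optional[Genotype]]) -> Optional[Genotype]:
    """Return the genotype reported by at least n-1 of n programs, else None.

    `calls` holds one entry per program; None means that program made no call
    (assumed here to count as disagreement). Genotypes are compared as
    unordered allele pairs.
    """
    n = len(calls)
    if n < 3:
        raise ValueError("the n-1 rule is only meaningful for three or more programs")
    # Sort each pair so ("B", "A") and ("A", "B") count as the same genotype.
    normalised = [tuple(sorted(g)) for g in calls if g is not None]
    if not normalised:
        return None
    genotype, votes = Counter(normalised).most_common(1)[0]
    return genotype if votes >= n - 1 else None


# Class I, four programs: three agree, so the genotype is accepted.
class_i = [("A*02:01", "A*01:01"), ("A*01:01", "A*02:01"),
           ("A*01:01", "A*02:01"), ("A*01:01", "A*03:01")]
print(n_minus_1_consensus(class_i))   # ('A*01:01', 'A*02:01')

# Class II, three programs, one missing call and two disagreeing calls.
class_ii = [("DQB1*06:02", "DQB1*03:01"), None, ("DQB1*06:02", "DQB1*02:01")]
print(n_minus_1_consensus(class_ii))  # None: no genotype reaches 2 of 3
```

Requiring n-1 rather than unanimous agreement lets one program fail or disagree without discarding the sample, which keeps the call rate high while still guarding against single-program bias.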

Leveraging Base Pair Mammalian Constraint to Understand Genetic Variation and Human Disease

Patrick Sullivan et al., Mar 10, 2023

Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single-base phyloP scores from the whole-genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants that explain common disease heritability, more so than for any other functional annotation. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be explored further and linked to disease.

Genetics

Molecular Biology
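
The constraint abstract does not say how its 3.5% figure was thresholded. One common way to turn per-base scores into a constrained fraction is sketched below; this procedure is assumed here and not confirmed by the source. It treats positive phyloP scores as -log10 p-values and applies Benjamini-Hochberg false discovery rate (FDR) control. The function name `constrained_fraction`, the 5% FDR level and the toy scores are all assumptions.

```python
import numpy as np


def constrained_fraction(phylop, fdr: float = 0.05) -> float:
    """Fraction of positions called constrained under Benjamini-Hochberg FDR.

    Assumes phyloP scores are signed -log10 p-values in which positive values
    indicate conservation; non-positive scores are given p = 1.
    """
    scores = np.asarray(phylop, dtype=float)
    m = scores.size
    if m == 0:
        return 0.0
    p = np.where(scores > 0, 10.0 ** (-scores), 1.0)
    ranked = np.sort(p)
    # BH step-up: find the largest k with p_(k) <= (k / m) * fdr.
    thresholds = fdr * np.arange(1, m + 1) / m
    passing = np.nonzero(ranked <= thresholds)[0]
    if passing.size == 0:
        return 0.0
    k = passing[-1] + 1  # number of positions declared constrained
    return float(k / m)


# Invented toy scores for ten positions, for illustration only.
toy = np.array([5.0, 4.0, 0.5, -1.0, 0.1, 3.0, 0.0, 0.2, 1.0, 2.0])
print(constrained_fraction(toy))  # 0.4: four of ten positions pass
```

On a real genome the input would be billions of scores, so chunked or streamed processing would be preferable to a single in-memory array.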
