ResearchHub | Open Science Community

Mapping the Genetic Architecture of Gene Expression in Human Liver

Eric Schadt et al., Apr 29, 2008

Genetic variants that are associated with common human diseases do not lead directly to disease, but instead act on intermediate, molecular phenotypes that in turn induce changes in higher-order disease traits. Therefore, identifying the molecular phenotypes that vary in response to changes in DNA and that also associate with changes in disease traits has the potential to provide the functional information required to not only identify and validate the susceptibility genes that are directly affected by changes in DNA, but also to understand the molecular networks in which such genes operate and how changes in these networks lead to changes in disease traits. Toward that end, we profiled more than 39,000 transcripts and genotyped 782,476 unique single nucleotide polymorphisms (SNPs) in more than 400 human liver samples to characterize the genetic architecture of gene expression in the human liver, a metabolically active tissue that is important in a number of common human diseases, including obesity, diabetes, and atherosclerosis. This genome-wide association study of gene expression detected more than 6,000 associations between SNP genotypes and liver gene expression traits, and many of the corresponding genes have already been implicated in human diseases. The utility of these data for elucidating the causes of common human diseases is demonstrated by integrating them with genotypic and expression data from other human and mouse populations. This provides much-needed functional support for the candidate susceptibility genes at the growing number of genetic loci that genome-wide association studies have identified as key drivers of disease.
Using an integrative genomics approach, we highlight how our data support the gene RPS26, and not ERBB3, as the most likely susceptibility gene for a novel type 1 diabetes locus recently identified in a large-scale, genome-wide association study. In the process, we also identify SORT1 and CELSR2 as candidate susceptibility genes for a locus recently associated with coronary artery disease and plasma low-density lipoprotein cholesterol levels.
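
The abstract above describes testing SNP genotypes for association with liver expression traits. As an illustration only, a single SNP-transcript test can be sketched as an ordinary least-squares regression of expression on allele dosage. The abstract does not state which statistical model the authors used, so the additive model, the function name `eqtl_association`, and the toy data are all assumptions.

```python
import math

def eqtl_association(dosages, expression):
    """Regress expression on genotype dosage (0, 1 or 2 copies of an allele).

    Hypothetical helper for illustration; returns (slope, intercept, t_statistic)
    from a simple least-squares fit under an assumed additive model.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype does not vary in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual sum of squares around the fitted line.
    rss = sum((e - (intercept + slope * g)) ** 2 for g, e in zip(dosages, expression))
    # Standard error of the slope, using n - 2 residual degrees of freedom.
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, intercept, t_stat

# Toy data (invented): six samples, expression rising with allele dosage.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, intercept, t_stat = eqtl_association(dosages, expression)
print(f"slope={slope:.3f}, t={t_stat:.2f}")
```

In a genome-wide setting this test would be repeated for every SNP-transcript pair of interest, with the significance threshold adjusted for the number of tests.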

Genetics

Epidemiology

Limits and potential of combined folding and docking using PconsDock

Gabriele Pozzati et al., Jun 7, 2021

In the last decade, de novo protein structure prediction accuracy for individual proteins has improved significantly by utilising deep learning (DL) methods for harvesting the co-evolution information from large multiple sequence alignments (MSA). In CASP14, the best groups predicted the structure of most proteins with impressive accuracy. The same approach can, in principle, also be used to extract information about evolutionary-based contacts across protein-protein interfaces. However, most of the earlier studies have not used the latest DL methods for inter-chain contact distance prediction. This paper introduces a fold-and-dock method, PconsDock, based on predicted residue-residue distances with trRosetta. PconsDock can simultaneously predict the tertiary and quaternary structure of a protein pair, even when the structures of the monomers are not known. The straightforward application of this method to a standard dataset for protein-protein docking yielded limited success. However, using alternative methods for MSA generation allowed us to accurately dock significantly more proteins. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins. The average performance of the method is comparable to the use of traditional, template-based or ab initio shape-complementarity-only docking methods. However, no a priori structural information for the individual proteins is needed. Moreover, the results of conventional and fold-and-dock approaches are complementary, and thus a combined docking pipeline could increase overall docking success significantly. PconsDock contributed to the best model for one of the CASP14 oligomeric targets, H1065.

Philosophy

Artificial Intelligence

Coding and regulatory variants affect serum protein levels and common disease

Valur Emilsson et al., May 8, 2020

Circulating proteins are prognostic for human outcomes including cancer, heart failure, brain trauma and brain amyloid plaque burden. A deep serum proteome survey recently revealed close associations of serum protein networks and common diseases. The present study reveals an unprecedented number of individual serum proteins that overlap genetic signatures of diseases emanating from different tissues of the body. Here, 54,469 low-frequency and common exome-array variants were compared with 4782 protein measurements in the serum of 5343 individuals of the deeply annotated AGES Reykjavik cohort. Using a study-wide significance threshold, 2019 independent exome-array variants affecting levels of 2135 serum proteins were identified. These variants overlapped genetic loci for hundreds of complex disease traits, emphasizing the emerging role for serum proteins as biomarkers of and potential causative agents of multiple diseases.
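
The abstract above mentions a study-wide threshold without saying how it was defined. One common choice, assumed here purely for illustration, is a Bonferroni correction over every variant-protein test. The function name and the 0.05 family-wise error rate are assumptions; only the two counts come from the abstract.

```python
def bonferroni_threshold(n_variants, n_proteins, alpha=0.05):
    """Per-test p-value cutoff if every variant is tested against every protein.

    Assumes a Bonferroni correction; the abstract does not confirm this choice.
    """
    n_tests = n_variants * n_proteins
    return alpha / n_tests

# Counts taken from the abstract: 54,469 variants and 4782 protein measurements.
threshold = bonferroni_threshold(54_469, 4_782)
print(f"tests: {54_469 * 4_782:,}  threshold: {threshold:.2e}")
```

With roughly 260 million tests, the cutoff under this assumption is on the order of 2e-10.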

Genetics

Molecular Biology

pyconsFold: A fast and easy tool for modelling and docking using distance predictions

John Lamb et al., Feb 9, 2021

Motivation: Contact predictions within a protein have recently become a viable method for accurate prediction of protein structure. Using predicted distance distributions has been shown in many cases to be superior to only using a binary contact annotation. Using predicted inter-protein distances has also been shown to be able to dock some protein dimers. Results: Here we present pyconsFold. Using CNS as its underlying folding mechanism and predicted contact distances, it outperforms regular contact-prediction-based modelling on our dataset of 210 proteins. It performs marginally worse than the state-of-the-art pyRosetta folding pipeline but is on average about 20 times faster per model. More importantly, pyconsFold can also be used as a fold-and-dock protocol by using predicted inter-protein contacts to simultaneously fold and dock two protein chains. Availability and implementation: pyconsFold is implemented in Python 3 with a strong focus on using as few dependencies as possible for longevity. It is available both as a pip package in Python 3 and as source code on GitHub and is published under the GPLv3 license. Contact: arne@bioinfo.se. Supplemental material: Install instructions, examples and parameters can be found in the supplemental notes. Availability of data: The data underlying this article together with source code are available on GitHub, at https://github.com/johnlamb/pyconsfold.

Biochemistry

Molecular Biology

The evolutionary history of topological variations in the CPA/AT superfamily

G. Sudha et al., Dec 15, 2020

CPA/AT transporters consist of two structurally and evolutionarily related inverted repeat units, each of them with one core and one scaffold subdomain. During evolution, these families have undergone substantial changes in structure, topology and function. Central to the function of the transporters is the existence of two non-canonical helices that are involved in the transport process. In different families, two different types of these helices have been identified, reentrant and broken. Here, we use an integrated topology annotation method to identify novel topologies in the families. It combines topology prediction, similarity to families with known structure, and the difference in positively charged residues present in inside and outside loops in alternative topological models. We identified families with diverse topologies containing a broken or reentrant helix. We classified all families into 3 distinct evolutionary groups, newly termed Fold-types, that each share a structurally similar C-terminal repeat unit. Using the evolutionary relationship between families, we propose topological transitions including a transition between broken and reentrant helices, a complete change of orientation, changes in the number of scaffold helices and even, in some rare cases, losses of core helices. The evolutionary history of the repeat units shows that gene duplication and repeat shuffling events resulted in these extensive topology variations. The novel structure-based classification, together with supporting structural models and other information, is presented in a searchable database, CPAfold (cpafold.bioinfo.se). Our comprehensive study of topology variations within the CPA superfamily provides better insight into their structure and evolution.
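
One ingredient of the annotation method described above is the difference in positively charged residues between inside and outside loops for alternative topology models. A minimal sketch of that single comparison follows. It is not the authors' implementation: the function names, the choice to count only K and R, and the toy sequence are assumptions.

```python
POSITIVE = set("KR")  # assumption: only Lys and Arg are counted as positive

def positive_bias(sequence, tm_segments, first_loop_inside=True):
    """Return (positives in inside loops) - (positives in outside loops).

    tm_segments: sorted, non-overlapping (start, end) pairs, 0-based, end exclusive.
    Loops are assumed to alternate sides at every membrane-spanning segment.
    """
    inside = outside = 0
    is_inside = first_loop_inside
    pos = 0
    # A zero-length sentinel segment at the end makes the loop visit the final tail.
    for start, end in list(tm_segments) + [(len(sequence), len(sequence))]:
        count = sum(1 for aa in sequence[pos:start] if aa in POSITIVE)
        if is_inside:
            inside += count
        else:
            outside += count
        is_inside = not is_inside
        pos = end
    return inside - outside

def preferred_orientation(sequence, tm_segments):
    """Pick the N-terminal orientation that puts more positive residues inside."""
    bias = positive_bias(sequence, tm_segments, first_loop_inside=True)
    if bias > 0:
        return "N-in"
    if bias < 0:
        return "N-out"
    return "undetermined"

# Toy two-helix protein (invented): positive residues cluster at both termini.
seq = "MKRK" + "L" * 20 + "DE" + "L" * 20 + "RK"
segments = [(4, 24), (26, 46)]
print(positive_bias(seq, segments), preferred_orientation(seq, segments))
```

Reentrant helices, which enter and leave the membrane on the same side, would not flip the side of the following loop, so a fuller model would need to mark them separately.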

Genetics

Artificial Intelligence

A genome-wide association study of serum proteins reveals shared loci with common diseases

Alexander Guðjónsson et al., Jul 4, 2021

With the growing number of genetic association studies, the genotype-phenotype atlas has become increasingly complex, yet the functional consequences of most disease-associated alleles are not understood. The measurement of protein level variation in solid tissues and biofluids integrated with genetic variants offers a path to deeper functional insights. Here we present a large-scale proteogenomic study in 5,368 individuals, revealing 4,113 independent associations between genetic variants and 2,099 serum proteins, of which 37% are previously unreported. The majority of both cis- and trans-acting genetic signals are unique for a single protein, although our results also highlight numerous highly pleiotropic genetic effects on protein levels and demonstrate that a protein’s genetic association profile reflects certain characteristics of the protein, including its location in protein networks, tissue specificity and intolerance to loss-of-function mutations. Integrating protein measurements with deep phenotyping of the cohort, we observe substantial enrichment of phenotype associations for serum proteins regulated by established GWAS loci, and offer new insights into the interplay between genetics, serum protein levels and complex disease.

Genetics

Molecular Biology

Intra-helical salt bridge contribution to membrane protein insertion

Gerard Duart et al., Feb 25, 2021

Salt bridges between negatively (D, E) and positively charged (K, R, H) amino acids play an important role in protein stabilization. This has a more prevalent effect in membrane proteins, where polar amino acids are exposed to a very hydrophobic environment. In transmembrane (TM) helices the presence of charged residues can hinder the insertion of the helices into the membrane. This can sometimes be avoided by TM region rearrangements after insertion, but it is also possible that the formation of salt bridges could decrease the cost of membrane integration. However, the presence of intra-helical salt bridges in TM domains and their effect on insertion have not been properly studied yet. In this work, we use an analytical pipeline to study the prevalence of charged pairs of amino acid residues in TM α-helices, which shows that potentially salt-bridge-forming pairs are statistically over-represented. We then selected some candidates to experimentally determine the contribution of these electrostatic interactions to the translocon-assisted membrane insertion process. Using both in vitro and in vivo systems, we confirm the presence of intra-helical salt bridges in TM segments during biogenesis and determine that they contribute between 0.5 and 0.7 kcal/mol to the apparent free energy of membrane insertion (ΔG_app). Our observations suggest that salt bridge interactions can be stabilized during translocon-mediated insertion and thus could be relevant to consider for the future development of membrane protein prediction software.
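
The pipeline described above surveys charged residue pairs in TM helices. A rough sketch of the counting step follows, assuming that candidate intra-helical salt bridges are oppositely charged residues at i, i+3 or i, i+4 spacing (positions that lie on roughly the same face of an α-helix). The abstract does not give the spacings or any implementation detail, so the spacings, the function name and the example helix are hypothetical.

```python
NEGATIVE = set("DE")   # charge classes as listed in the abstract
POSITIVE = set("KRH")

def charged_pairs(helix, spacings=(3, 4)):
    """List (i, j, residue_i, residue_j) for oppositely charged residues.

    spacings: sequence separations to check; (3, 4) is an assumption, not
    a value taken from the paper.
    """
    pairs = []
    for i, a in enumerate(helix):
        for step in spacings:
            j = i + step
            if j >= len(helix):
                continue
            b = helix[j]
            opposite = (a in NEGATIVE and b in POSITIVE) or (
                a in POSITIVE and b in NEGATIVE
            )
            if opposite:
                pairs.append((i, j, a, b))
    return pairs

# Invented helix: D2-K6 are four residues apart, E11-R14 are three apart.
helix = "LLDLLLKLLLLELLRLL"
for i, j, a, b in charged_pairs(helix):
    print(f"{a}{i} - {b}{j} (i, i+{j - i})")
```

Judging over-representation would additionally require comparing such counts against an expectation, for example from shuffled sequences, which this sketch leaves out.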

Ecology

Biochemistry

Deep serum proteomics reveal biomarkers and causal candidates for type 2 diabetes

Valborg Guðmundsdóttir et al., May 10, 2019

The prevalence of type 2 diabetes mellitus (T2DM) is expected to increase rapidly in the next decades, posing a major challenge to societies worldwide. The emerging era of precision medicine calls for the discovery of biomarkers of clinical value for prediction of disease onset, where causal biomarkers can furthermore provide actionable targets. Blood-based factors like serum proteins are in contact with every organ in the body to mediate global homeostasis and may thus directly regulate complex processes such as aging and the development of common chronic diseases. We applied a data-driven proteomics approach measuring serum levels of 4,137 proteins in 5,438 Icelanders to discover novel biomarkers for incident T2DM and describe the serum protein profile of prevalent T2DM. We identified 536 proteins associated with incident or prevalent T2DM. Through LASSO penalized logistic regression analysis combined with bootstrap resampling, a panel of 20 protein biomarkers that accurately predicted incident T2DM was identified with a significant incremental improvement over traditional risk factors. Finally, a Mendelian randomization analysis provided support for a causal role of 48 proteins in the development of T2DM, which could be of particular interest as novel therapeutic targets.

Genetics

Molecular Biology

Induction of muscle stem cell quiescence by the secreted niche factor Oncostatin M

Srinath Sampath et al., Apr 18, 2018

The balance between stem cell quiescence and proliferation in skeletal muscle is tightly controlled, but perturbed in a variety of disease states. Despite progress in identifying activators of stem cell proliferation, the niche factor(s) responsible for quiescence induction remain unclear. Here we report an in vivo imaging-based screen which identifies Oncostatin M (OSM), a member of the interleukin-6 family of cytokines, as a potent inducer of muscle stem cell (MuSC, satellite cell) quiescence. OSM is produced by muscle fibers, induces reversible MuSC cell cycle exit, and maintains stem cell regenerative capacity as judged by serial transplantation. Conditional OSM receptor deletion in satellite cells leads to stem cell depletion and impaired regeneration following injury. These results identify Oncostatin M as a secreted niche factor responsible for quiescence induction, and for the first time establish a direct connection between induction of quiescence, stemness, and transplantation potential in solid organ stem cells.

Genetics

Immunology

Identification of a novel fold type in CPA/AT transporters by ab-initio structure prediction

Claudio Bassot et al., Nov 5, 2020

Members of the CPA/AT transporter superfamily show significant structural variability. All previously known members consist of an inverted duplicated repeat unit that folds into two separate domains, the core and the scaffold domain. Crucial for its transporting function, the central helix in the core domain is a noncanonical transmembrane helix, which can either be in the form of a broken helix or a reentrant helix. Here, we expand the structural knowledge of the CPA/AT family by using contact-prediction-based protein modelling. We show that the N-terminal domains of the Pfam families PSE (Cons_hypoth698, PF03601), Lysine exporter (PF03956) and LrgB (PF04172) have a previously unseen reentrant-helix-reentrant fold. The close homology between PSE and the Sodium-citrate symporter (2HCT) suggests that the new fold originates from the truncation of an ancestral reentrant protein, caused by the loss of the C-terminal reentrant helix. To compensate for the lost reentrant helix, one external loop moves into the membrane to form the second reentrant helix, highlighting the adaptability of the CPA/AT transporters. This study also demonstrates that the most recent deep-learning-based modelling methods have become a useful tool to gain biologically relevant structural, evolutionary and functional insights about protein families.

Genetics

Ecology
