
Cross-Disease Modeling Approach to Identify Biological Pathways Shared Between Alzheimer’s Disease and Type 2 Diabetes

Published
Apr 1, 2024
Fundraise completed: 29,501 RSC raised of a 16,444 RSC goal, with 14 supporters.

Authors & Affiliations

Brendan K. Ball [1]; Douglas K. Brubaker [2]

[1] Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN
[2] School of Medicine, Case Western Reserve University, Cleveland, OH


Research Updates

Currently, I have made some progress on the project. The progress is briefly shared below:

  • Pre-processed the RNA-seq datasets for both AD and T2D human cohorts
  • Applied TransComp-R and selected PCs using the LASSO model
  • Acquired Gene Set Enrichment Results for the selected PCs

The next step is to estimate the potential cell types contributing to the RNA-seq data from blood samples using in silico modeling. Further updates will be shared throughout the progress of this study. 

 

Abstract

Alzheimer’s disease (AD), a progressive neurodegenerative disorder characterized by memory loss and behavioral changes, is the seventh leading cause of death in the United States (1). Understanding the development of AD is difficult because there is no single cause of AD, but rather a combination of risk factors. These risk factors include age, genetics, family history, and pre-existing health conditions. One comorbidity of interest is type 2 diabetes (T2D), a metabolic disorder associated with dysregulated blood sugar levels. Studies have reported that individuals with T2D face a higher prevalence of AD than the general population (2,3). In fact, the proportion of individuals living with AD who also have T2D or impaired glucose levels may be as high as 81% (4). The connection between AD and T2D has also been demonstrated in both rodent and human studies, with chronic inflammation and decreased brain glucose metabolism hypothesized as potentially affected pathways (5,6). While several studies report a connection between AD progression and T2D, the biological mechanism by which T2D exacerbates AD pathology is not well understood. To address this gap, we will use publicly available RNA-sequenced transcriptomic data from human T2D and AD studies. In this study, we aim to identify protein interaction networks associated with different thresholds of up- or down-regulated gene expression in human data. We will also pair dimensionality reduction tools with gene set enrichment analyses to identify biological pathways shared between AD and T2D. Using novel computational tools to link publicly available T2D and AD human studies is a promising way to quickly identify disrupted biological pathways. Understanding the shared biological pathways between AD and T2D will inform us of the pathways and genes that are dysregulated during disease development.

Introduction

Motivation

Alzheimer’s disease (AD) is a neurodegenerative disease characterized by progressive memory loss and cognitive impairment, which affects more than 6.5 million people in the United States. The pathogenesis of AD is multifactorial, with contributing factors that include age, genetics, family history, and pre-existing health conditions. Evidence suggests that type 2 diabetes (T2D), a chronic metabolic disease that affects the body’s ability to regulate and process glucose, increases the risk for AD development (4). In fact, one study reported that more than 80% of people who have AD are predicted to have T2D or impaired glucose levels (4). While there are proposed connections between glucose metabolism and brain function, the exact biological mechanisms by which AD pathology is exacerbated by T2D are not well understood (Figure 1).

Figure 1. Alzheimer's disease and type 2 diabetes share similar biological pathways. In both AD and T2D, there is a higher risk for memory loss, metabolic dysregulation, and even impaired supply of blood to the brain.

Despite the link between AD and T2D, there are limited studies that examine both diseases at the same time. This is often because studying both diseases together is difficult (e.g., recruiting participants who have both AD and T2D, securing facilities, etc.). To overcome this, we propose to use a computational tool for cross-disease translation that allows us to identify biological signatures in T2D that are predictive of AD-associated biomarkers in humans. Our main objective is to identify and understand the biological pathways that may be involved when T2D exacerbates AD pathology.

Hypotheses/Aims

We hypothesize that biological pathways and metabolic processes regulated by T2D modulate AD pathology. To test this hypothesis, we propose the following specific aims:

Aim 1. Determine how biological networks in AD-associated pathology are regulated by T2D.

Aim 2. Characterize pathways by which AD disease status is regulated by T2D factors via computational modeling approaches. 

Impact/Significance

Our research proposal is innovative because we integrate information from two separate diseases to test our hypothesis. Here, we use computational approaches to interpret publicly available transcriptomics data from independent AD and T2D human cohorts to understand the role of T2D in exacerbating AD pathology. Computational modeling across human data from separate disease studies will provide insight into how T2D modulates AD development.

Materials/Methods

Study Type

Secondary Analysis: Data from publicly available repositories are used to conduct further analysis, in the form of modeling, that may not have been performed in the original experiment.

Existing Data

Registration prior to analysis of the data: As of the date of submission, the data exist and you have accessed them, though no analysis has been conducted related to the research plan (including calculation of summary statistics). A common situation for this scenario is when a large dataset exists that is used for many different studies over time, or when a dataset is randomly split into a sample for exploratory analyses while the other section of data is reserved for later confirmatory analysis.

Explanation of Existing Data

All publicly available data will be acquired from Gene Expression Omnibus (GEO), a data repository accessible to anyone interested in data from previously conducted experiments. Since the availability of gene expression results from the brain was limited, data derived from the same tissue (blood) were prioritized to minimize inter-tissue variability in humans. Gene expression from blood samples would still provide valuable insight into the relationship between AD and T2D.

Table 1. Proposed datasets for secondary analysis from Gene Expression Omnibus

Accession | Disease | Size (n) | Description of Study
GSE184050 | T2D | 25 | Longitudinal blood-derived gene expression of participants who are diagnosed with T2D vs. healthy control groups.
GSE63060 | AD | 249 | Cohort (Batch 1) study with gene expression data on blood samples from people with AD, mild cognitive impairment, or healthy control. Data is from the EU-funded AddNeuroMed Cohort.
GSE63061 | AD | 273 | Cohort (Batch 2) study with gene expression data on blood samples from people with AD, mild cognitive impairment, or healthy control. Data is from the EU-funded AddNeuroMed Cohort.

Data collection procedures

No data will be collected for this study. All data will be accessed from the GEO data repository. 

Sample Size

Three separate datasets will be analyzed. Sample sizes of 25, 249, and 273 will be included for the T2D human, AD human cohort 1, and AD human cohort 2 studies, respectively. Each study is separated into disease and control groups. We note that a limitation of using publicly available datasets is that sample sizes are restricted to what is available.

Variables

Each dataset contains quantified gene expression from blood, as well as demographic information such as sex and age. This information will be incorporated into our model as interaction variables.

Processing and Analysis Plan

We will first construct physical protein-protein interaction networks and perform enrichment analysis of AD regulated by T2D using the STRING database. Different thresholds of normalized z-scores of publicly available gene expression from the AD and T2D human datasets will be used for analysis. The protein-protein network will identify functional associations among the genes that code for the downstream proteins. Within the network, edges (the connecting lines) represent the confidence of the connections, and nodes (each circle) represent the different proteins coded by the expressed genes. This network will provide insight into the types of proteins that may interact with each other as a result of gene expression changes, and whether they may upregulate or downregulate each other (Figure 2).

Figure 2. Process for establishing protein-protein interaction networks. (A) Protein-protein networks are the downstream connections from gene expression, with nodes representing specific proteins and edges signifying the confidence of each connection. (B) An example of an unfiltered protein interaction network that is then filtered down to subnetworks based on z-score values. The smaller subnetworks represent genes whose expression deviated more strongly from the mean.
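
The z-score thresholding step described for Figure 2 can be sketched as follows. This is a minimal illustration and not the study's pipeline: the gene names, z-score values, and thresholds are hypothetical placeholders, and each resulting gene list would still need to be submitted to STRING separately to build the corresponding subnetwork.

```python
# Hypothetical per-gene z-scores; names and values are made up for illustration.
gene_z = {"GENE_A": 2.4, "GENE_B": -1.7, "GENE_C": 0.3, "GENE_D": -2.9, "GENE_E": 1.1}

def select_genes(z_scores, threshold):
    """Return genes whose |z| meets the threshold, split by direction."""
    up = sorted(g for g, z in z_scores.items() if z >= threshold)
    down = sorted(g for g, z in z_scores.items() if z <= -threshold)
    return {"up": up, "down": down}

# Illustrative thresholds; stricter cutoffs yield smaller gene lists (subnetworks).
subnetworks = {t: select_genes(gene_z, t) for t in (1.0, 1.5, 2.0)}
for t, genes in subnetworks.items():
    print(t, genes)
```

Raising the threshold keeps only the genes furthest from the mean, which is why the filtered subnetworks shrink as the cutoff becomes stricter.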

After creating physical protein-protein networks, we will leverage blood-derived transcriptomics data from human T2D studies to predict disease status in human AD. We will use computational modeling to understand how T2D can explain the known status of human AD conditions. With the cross-disease model, we will identify biological pathways and genetic signatures predictive of AD status. The computational methodology, termed Translatable Components Regression (TransComp-R), was developed by Dr. Douglas Brubaker. TransComp-R allows two datasets to be combined to predict AD outcomes in humans (7) (Figure 3).

Figure 3. Process for cross-disease translation using TransComp-R. (A) Genes shared between the human AD and T2D expression datasets are selected for analysis. Human AD samples are then projected into human T2D PCA space. (B) The translatability of each human T2D principal component (PC) is determined by how well the PCs predict AD condition. The PCs are selected by LASSO and logistic regression against human AD outcomes. Gene loadings from the PCs that are significant in the regression analysis are processed through Gene Set Enrichment Analysis (GSEA) to identify biological pathways that may be involved in the shared disease.
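
The projection step in Figure 3A can be illustrated with a short sketch. It assumes that both expression matrices are restricted to shared genes and z-scored per gene within each dataset, and that projection is a matrix product with the T2D loadings. The matrices below are random stand-ins, the choice of five PCs is arbitrary, and the variable names are hypothetical; this is not the TransComp-R implementation itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (samples x shared genes); real inputs would be expression data.
t2d = rng.normal(size=(25, 40))
ad = rng.normal(size=(60, 40))

def zscore(x):
    """Z-score each gene (column) within a single dataset."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

t2d_z, ad_z = zscore(t2d), zscore(ad)

# PCA of the T2D data via SVD; rows of vt are the principal axes in gene space.
_, _, vt = np.linalg.svd(t2d_z, full_matrices=False)
n_pcs = 5  # arbitrary choice for this sketch
loadings = vt[:n_pcs].T  # genes x PCs

# Project AD samples into the T2D principal component space.
ad_scores = ad_z @ loadings
print(ad_scores.shape)
```

The resulting AD scores on the T2D PCs are the kind of predictors that would then enter the LASSO and logistic regression step against AD outcomes.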

Statistical Models

In the TransComp-R model, we will regress the selected PCs against human AD outcomes using logistic regression to model the effects of T2D. Each individual PC and the overall model will be considered significant at a p-value below 0.05. For biological interpretation, pathways identified by GSEA will be deemed significant at a false discovery rate (FDR) below 0.20. FDR values will be determined using a Kolmogorov-Smirnov test with 1000 permutations. GSEA will use the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database as the gene set database to maintain consistency in the interpretation of the data.
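
To make the permutation-based enrichment test more concrete, the sketch below computes an unweighted Kolmogorov-Smirnov-like running-sum enrichment score for one gene set and a nominal p-value from 1000 gene-label permutations. It is a simplified stand-in and not the GSEA software: the gene names and gene set are hypothetical, the statistic is unweighted, and it returns a nominal p-value for a single gene set rather than an FDR, which would require testing and normalizing across many gene sets.

```python
import random

def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-like running sum; returns the maximum deviation from zero."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Nominal p-value from shuffling the gene order (gene-permutation null)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical genes ranked by PC loading (highest first) and a toy gene set.
ranked = [f"G{i}" for i in range(50)]
toy_set = {"G0", "G1", "G2", "G3", "G4"}
es, p = permutation_p_value(ranked, toy_set)
print(es, p)
```

Because the toy gene set sits entirely at the top of the ranking, the running sum peaks early and random orderings rarely match it, which is the pattern an enriched pathway would show.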

Transformations

Before modeling, human datasets will be internally normalized via z-scoring to account for variability across the different populations and sequencing platforms. This approach allows us to evaluate the relative separation of human samples in our computational model and reduces bias due to differences in magnitude across datasets. Without this normalization, results would be skewed toward datasets with larger numerical values. To support reproducibility, our analysis will also incorporate potential confounding variables such as sex and age into the computational model.
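
As a small worked example of the within-dataset z-scoring described above, the sketch below normalizes two toy datasets that measure the same pattern on very different numeric scales. The values are invented, and the use of the sample standard deviation (ddof=1) is an assumption rather than a stated choice of the study.

```python
import numpy as np

def zscore_genes(expr):
    """Z-score each gene (column) within one dataset: (x - mean) / std."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    std = expr.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant genes map to 0 instead of dividing by zero
    return (expr - mean) / std

# Two toy datasets on very different numeric scales (samples x genes).
dataset_a = np.array([[5.0, 100.0], [7.0, 120.0], [9.0, 140.0]])
dataset_b = np.array([[5000.0, 1.0], [7000.0, 3.0], [9000.0, 5.0]])

za, zb = zscore_genes(dataset_a), zscore_genes(dataset_b)
print(za[:, 0], zb[:, 0])
```

After normalization, the first gene has the same relative profile in both datasets even though the raw values differ by three orders of magnitude.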

Data Exclusion

Due to the heterogeneity of human data, some datasets include subjects from younger populations. To reduce the additional variable of age, which strongly correlates with AD risk, all gene expression data from participants younger than 50 will be filtered out. We acknowledge that late-onset AD is often defined as beginning at age 65; however, filtering at 65 significantly reduces the sample size, which would limit computational and statistical power. Additionally, only samples labeled as control, T2D, or AD will be selected from the datasets.
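
The exclusion rules above amount to a simple filter on sample metadata. In the sketch below, the field names, label strings, and sample records are hypothetical placeholders, since the actual GEO metadata fields may be named differently.

```python
# Hypothetical sample metadata; field names and label strings are placeholders.
samples = [
    {"id": "S1", "age": 72, "status": "AD"},
    {"id": "S2", "age": 48, "status": "control"},
    {"id": "S3", "age": 66, "status": "MCI"},
    {"id": "S4", "age": 50, "status": "control"},
    {"id": "S5", "age": 59, "status": "T2D"},
]

MIN_AGE = 50
KEPT_STATUSES = {"control", "T2D", "AD"}

def keep_sample(sample):
    """Keep participants aged 50 or older with a control, T2D, or AD label."""
    return sample["age"] >= MIN_AGE and sample["status"] in KEPT_STATUSES

filtered = [s for s in samples if keep_sample(s)]
print([s["id"] for s in filtered])
```

Here the participant under 50 and the participant with a mild cognitive impairment label are dropped, while a participant aged exactly 50 is retained.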

Budget

Costs

While there are always unexpected expenses in a research project, the anticipated costs to complete this project are listed in Table 2.

Table 2. Estimated costs for the duration of the research project.

Item & Description | Estimated Cost (USD)
Materials (e.g., External Hard Drive, Data Storage, etc.) | $500
Researcher Stipend | $1,000
Publication in Open Access Journal | $1,500
Research Travel & Conference Fees | $2,000
Total Support Requested | $5,000

These expenses include essential materials for data computing, a one-time researcher stipend for the entirety of the project, funds to support submission of the results to an open-access journal, and expenses to share the results at an academic research conference. The pre-print of the manuscript will be shared on ResearchHub.

Data & Code Availability

Upon completion of the project, a link to the source code on GitHub will be provided for open access to encourage further studies in other disease areas. All data analyzed in this study are publicly available on Gene Expression Omnibus through accession numbers GSE184050, GSE63060, and GSE63061.

References

  1. Jiaquan Xu, Sherry Murphy, Kenneth Kochanek, Elizabeth Arias. Mortality in the United States, 202. NCHS. 2022;(456).
  2. Arvanitakis Z, Wilson RS, Bienias JL, Evans DA, Bennett DA. Diabetes Mellitus and Risk of Alzheimer Disease and Decline in Cognitive Function. Archives of Neurology. 2004 May 1;61(5):661–6.
  3. Akomolafe A, Beiser A, Meigs JB, Au R, Green RC, Farrer LA, et al. Diabetes Mellitus and Risk of Developing Alzheimer Disease: Results From the Framingham Study. Archives of Neurology. 2006 Nov 1;63(11):1551–5.
  4. Janson J, Laedtke T, Parisi JE, O’Brien P, Petersen RC, Butler PC. Increased Risk of Type 2 Diabetes in Alzheimer Disease. Diabetes. 2004 Feb 1;53(2):474–81.
  5. Okereke OI, Kang JH, Cook NR, Gaziano JM, Manson JE, Buring JE, et al. Type 2 Diabetes Mellitus and Cognitive Decline in Two Large Cohorts of Community-Dwelling Older Adults. Journal of the American Geriatrics Society. 2008;56(6):1028–36.
  6. Hirvonen J, Virtanen KA, Nummenmaa L, Hannukainen JC, Honka MJ, Bucci M, et al. Effects of Insulin on Brain Glucose Metabolism in Impaired Glucose Tolerance. Diabetes. 2011 Jan 21;60(2):443–7.
  7. Brubaker DK, Kumar MP, Chiswick EL, Gregg C, Starchenko A, Vega PN, Southard-Smith AN, Simmons AJ, Scoville EA, Coburn LA, Wilson KT, Lau KS, Lauffenburger DA. An interspecies translation model implicates integrin signaling in infliximab-resistant inflammatory bowel disease. Sci Signal. 2020 Aug 4;13(643):eaay3258. doi: 10.1126/scisignal.aay3258. PMID: 32753478; PMCID: PMC7459361.