ResearchHub | Open Science Community

Parkinson's disease age at onset genome‐wide association study: Defining heritability, genetic loci, and α‐synuclein mechanisms

Cornelis Blauwendraat et al.Apr 7, 2019

Abstract. Background: Increasing evidence supports an extensive and complex genetic contribution to PD. Previous genome‐wide association studies (GWAS) have shed light on the genetic basis of risk for this disease. However, the genetic determinants of PD age at onset are largely unknown. Objectives: To identify the genetic determinants of PD age at onset. Methods: Using genetic data of 28,568 PD cases, we performed a genome‐wide association study based on PD age at onset. Results: We estimated that the heritability of PD age at onset attributed to common genetic variation was ∼0.11, lower than the overall heritability of risk for PD (∼0.27), likely, in part, because of the subjective nature of this measure. We found two genome‐wide significant association signals, one at SNCA and the other a protein‐coding variant in TMEM175, both of which are known PD risk loci, and a Bonferroni‐corrected significant effect at other known PD risk loci: GBA, INPP5F/BAG3, FAM47E/SCARB2, and MCCC1. Notably, SNCA, TMEM175, SCARB2, BAG3, and GBA have all been implicated in α‐synuclein aggregation pathways. Remarkably, other well‐established PD risk loci, such as GCH1 and MAPT, did not show a significant effect on age at onset of PD. Conclusions: Overall, we have performed the largest genome‐wide association study of PD age at onset to date, and our results show that not all PD risk loci influence age at onset, with significant differences between risk alleles for age at onset. This provides a compelling picture, both within the context of functional characterization of disease‐linked genetic variability and in defining differences between risk alleles for age at onset, or frank risk for disease. © 2019 International Parkinson and Movement Disorder Society
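The abstract above mentions a Bonferroni-corrected significance level for testing known risk loci. As a minimal sketch of that general idea (not the authors' code), the snippet below divides a nominal alpha by the number of tests and keeps the loci whose p-values fall below the result. The locus names and p-values are hypothetical placeholders, not values from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_loci(p_values: dict, alpha: float = 0.05) -> list:
    """Return the names of loci whose p-value is below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical p-values for illustration only (not taken from the paper).
example_p_values = {
    "locus_A": 1e-5,
    "locus_B": 0.004,
    "locus_C": 0.2,
    "locus_D": 0.03,
}

# Four tests at alpha = 0.05 give a per-test threshold of 0.0125.
print(significant_loci(example_p_values))
```

With four tests, only loci with p below 0.0125 are retained, so a nominally significant p of 0.03 no longer passes.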

Genetics

Paleontology

Virus exposure and neurodegenerative disease risk across national biobanks

Kristin Levine et al.Apr 1, 2023

With recent findings connecting the Epstein-Barr virus to an increased risk of multiple sclerosis and growing concerns regarding the neurological impact of the coronavirus pandemic, we examined potential links between viral exposures and neurodegenerative disease risk. Using time series data from FinnGen for discovery and cross-sectional data from the UK Biobank for replication, we identified 45 viral exposures significantly associated with increased risk of neurodegenerative disease and replicated 22 of these associations. The largest effect association was between viral encephalitis exposure and Alzheimer's disease. Influenza with pneumonia was significantly associated with five of the six neurodegenerative diseases studied. We also replicated the Epstein-Barr/multiple sclerosis association. Some of these exposures were associated with an increased risk of neurodegeneration up to 15 years after infection. As vaccines are currently available for some of the associated viruses, vaccination may be a way to reduce some risk of neurodegenerative disease.

Immunology

Molecular Biology

A Saturated Map of Common Genetic Variants Associated with Human Height from 5.4 Million Individuals of Diverse Ancestries

Loïc Yengo et al.Jan 10, 2022

ABSTRACT Common SNPs are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes. Here we show, using GWAS data from 5.4 million individuals of diverse ancestries, that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a median size of ~90 kb, covering ~21% of the genome. The density of independent associations varies across the genome and the regions of elevated density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs account for 40% of phenotypic variance in European ancestry populations but only ~10%-20% in other ancestries. Effect sizes, associated regions, and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely explained by linkage disequilibrium and allele frequency differences within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than needed to implicate causal genes and variants. Overall, this study, the largest GWAS to date, provides an unprecedented saturated map of specific genomic regions containing the vast majority of common height-associated variants.

Genetics

Biology

Multi-Modality Machine Learning Predicting Parkinson’s Disease

Mary Makarious et al.Mar 7, 2021

SUMMARY. Background: Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multi-modal data is key moving forward. We build upon previous work to deliver multi-modal predictions of Parkinson’s Disease (PD). Methods: We performed automated ML on multi-modal data from the Parkinson’s Progression Marker Initiative (PPMI). After selecting the best performing algorithm, all PPMI data were used to tune the selected model. The model was validated in the Parkinson’s Disease Biomarker Program (PDBP) dataset. Finally, networks were built to identify gene communities specific to PD. Findings: Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then tested for validation on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy (balanced accuracy) and other metrics. Combining data modalities outperforms the single biomarker paradigm. UPSIT was the largest contributing predictor for the classification of PD. The transcriptomic data was used to construct a network of disease-relevant transcripts. Interpretation: We have built a model using an automated ML pipeline to make improved multi-omic predictions of PD. The model developed improves disease risk prediction, a critical step for better assessment of PD risk. We constructed gene expression networks for the next generation of genomics-derived interventions. Our automated ML approach allows complex predictive models to be reproducible and accessible to the community. Funding: National Institute on Aging, National Institute of Neurological Disorders and Stroke, the Michael J. Fox Foundation, and the Global Parkinson’s Genetics Program.
RESEARCH IN CONTEXT. Evidence before this study: Prior research into predictors of Parkinson’s disease (PD) has either used basic statistical methods to make predictions across data modalities or focused on a single data type or biomarker model. We have done this using an open-source automated machine learning (ML) framework on extensive multi-modal data, which we believe yields robust and reproducible results. We consider this the first true multi-modality ML study of PD risk classification. Added value of this study: We used a variety of linear, non-linear, kernel, neural network, and ensemble ML algorithms to generate an accurate classification of both cases and controls in independent datasets using data that is not involved in PD diagnosis itself at study recruitment. The model built in this paper significantly improves upon our previous models that used the entire training dataset in previous work [1]. Building on this earlier work, we showed that the PD diagnosis can be refined using improved algorithmic classification tools that may yield potential biological insights. We took care to develop and validate this model using public controlled-access datasets and an open-source ML framework to allow for reproducible and transparent results. Implications of all available evidence: Training, validating, and tuning a diagnostic algorithm for PD will allow us to augment clinical diagnoses or risk assessments with less need for complex and expensive exams. Going forward, these models can be built on remote or asynchronously collected data, which may be important in a growing telemedicine paradigm. More refined diagnostics will also increase clinical trial efficiency by potentially refining phenotyping and predicting onset, allowing providers to identify potential cases earlier. Early detection could lead to improved treatment response and higher efficacy. Finally, as part of our workflow, we built new networks representing communities of genes correlated in PD cases in a hypothesis-free manner, showing how new and existing genes may be connected and highlighting therapeutic opportunities.
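The summary above reports that optimizing the classification threshold increased balanced accuracy. The sketch below illustrates that general technique under simple assumptions: balanced accuracy is taken as the mean of sensitivity and specificity, and the threshold is picked from a small candidate list. The labels, predicted probabilities, and candidate thresholds are made up for illustration and are unrelated to the PPMI or PDBP data; this is not the authors' pipeline.

```python
def balanced_accuracy(y_true, y_prob, threshold):
    """Mean of sensitivity and specificity when predicting 1 if prob >= threshold."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest balanced accuracy."""
    return max(candidates, key=lambda t: balanced_accuracy(y_true, y_prob, t))


# Hypothetical labels (1 = case, 0 = control) and model probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.6, 0.35, 0.4, 0.3, 0.2, 0.1, 0.05]
candidate_thresholds = [0.25, 0.33, 0.5, 0.75]

chosen = best_threshold(labels, probabilities, candidate_thresholds)
print(chosen, balanced_accuracy(labels, probabilities, chosen))
```

In this toy data the default cut-off of 0.5 misses one case, while a lower cut-off recovers it at the cost of one false positive. In practice the threshold would be chosen on training or validation data rather than on the data used for the final evaluation.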

Artificial Intelligence

Biochemistry

Genome-wide association study of Parkinson’s disease progression biomarkers in 12 longitudinal patients’ cohorts

Hirotaka Iwaki et al.Mar 25, 2019

Abstract. Background: Several reports have identified different patterns of Parkinson’s disease progression in individuals carrying missense variants in the GBA or LRRK2 genes. The overall contribution of genetic factors to the severity and progression of Parkinson’s disease, however, has not been well studied. Objectives: To test the association between genetic variants and the clinical features and progression of Parkinson’s disease on a genome-wide scale. Methods: We accumulated individual data from 12 longitudinal cohorts in a total of 4,093 patients with 25,254 observations over a median of 3.81 years. Genome-wide associations were evaluated for 25 cross-sectional and longitudinal phenotypes. Specific variants of interest, including 90 recently identified disease risk variants, were also investigated for associations with these phenotypes. Results: Two variants were genome-wide significant. The variant rs382940 (T>A), within the intron of SLC44A1, was associated with reaching Hoehn and Yahr stage 3 or higher faster (HR 2.04 [1.58, 2.62], P-value = 3.46E-8). The variant rs61863020 (G>A), an intergenic variant and eQTL for ADRA2A, was associated with a lower prevalence of insomnia at baseline (OR 0.63 [0.52, 0.75], P-value = 4.74E-8). In the targeted analysis, we found nine associations between known Parkinson’s risk variants and more severe motor/cognitive symptoms. Also, we replicated previous reports of GBA coding variants (rs2230288: p.E365K, rs75548401: p.T408M) being associated with greater motor and cognitive decline over time, and of the APOE E4 tagging variant (rs429358) being associated with greater cognitive deficits in patients. Conclusions: We identified novel genetic factors associated with heterogeneity of progression in Parkinson’s disease. The results provide new insights into the pathogenesis of Parkinson’s disease as well as patient stratification for clinical trials.
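The results above report ratio estimates with confidence intervals and P-values. As a rough sanity check on such a triplet, one can back out an approximate standard error from the interval and recompute a two-sided P-value. The sketch below assumes a Wald-type 95% interval that is symmetric on the log scale, which the abstract does not state, and because published estimates are rounded the result is only approximate. It is an illustration, not the authors' analysis.

```python
import math


def wald_from_ratio_ci(estimate, ci_low, ci_high, z_crit=1.96):
    """Approximate z-score and two-sided p-value from a ratio and its 95% CI.

    Assumes the interval is estimate * exp(+/- z_crit * SE) on the log scale.
    """
    log_estimate = math.log(estimate)
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z_score = log_estimate / standard_error
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return z_score, p_value


# Hazard ratio and interval as quoted in the abstract for rs382940.
z, p = wald_from_ratio_ci(2.04, 1.58, 2.62)
print(f"z = {z:.2f}, p = {p:.2e}")

# Conventional genome-wide significance threshold.
print(p < 5e-8)
```

For the quoted hazard ratio, the recomputed value falls below the conventional genome-wide threshold of 5e-8, in line with the reported P-value. Larger discrepancies can arise when the interval has been rounded to two decimals.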

Genetics

Oncology

GenoTools: An Open-Source Python Package for Efficient Genotype Data Quality Control and Analysis

Dan Vitale et al.Mar 29, 2024

GenoTools, a Python package, streamlines population genetics research by integrating ancestry estimation, quality control (QC), and genome-wide association studies (GWAS) capabilities into efficient pipelines. By tracking samples, variants, and quality-specific measures throughout fully customizable pipelines, users can easily manage genetics data for large and small studies. GenoTools' "Ancestry" module renders highly accurate predictions, allowing for high-quality ancestry-specific studies, and enables custom ancestry model training and serialization, specified to the user's genotyping or sequencing platform. As the genotype processing engine that powers several large initiatives, including the NIH's Center for Alzheimer's and Related Dementias (CARD) and the Global Parkinson's Genetics Program (GP2), GenoTools was used to process and analyze the UK Biobank and major Alzheimer's Disease (AD) and Parkinson's Disease (PD) datasets with over 400,000 genotypes from arrays and 5000 sequences and has led to novel discoveries in diverse populations. It has provided replicable ancestry predictions, implemented rigorous QC, and conducted genetic ancestry-specific GWAS to identify systematic errors or biases through a single command. GenoTools is a customizable tool that enables users to efficiently analyze and scale genotype data with reproducible and scalable ancestry, QC, and GWAS pipelines.

Genetics

Molecular Biology

Identification and prediction of Parkinson’s disease subtypes and progression using machine learning in two cohorts

Anant Dadu et al.Aug 6, 2022

Abstract. Background: The clinical manifestations of Parkinson’s disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease course. The emergence of machine learning to detect hidden patterns in complex, multi-dimensional datasets provides unparalleled opportunities to address this critical need. Methods and Findings: We used unsupervised and supervised machine learning methods on comprehensive, longitudinal clinical data from the Parkinson’s Disease Progression Marker Initiative (PPMI) (n = 294 cases) to identify patient subtypes and to predict disease progression. The resulting models were validated in an independent, clinically well-characterized cohort from the Parkinson’s Disease Biomarker Program (PDBP) (n = 263 cases). Our analysis distinguished three distinct disease subtypes with highly predictable progression rates, corresponding to slow, moderate, and fast disease progression. We achieved highly accurate projections of disease progression five years after initial diagnosis with an average area under the curve (AUC) of 0.92 (95% CI: 0.95 ± 0.01 for the slower progressing group (PDvec1), 0.87 ± 0.03 for moderate progressors, and 0.95 ± 0.02 for the fast progressing group (PDvec3)). We identified serum neurofilament light (NfL) as a significant indicator of fast disease progression among other key biomarkers of interest. We replicated these findings in an independent validation cohort, released the analytical code, and developed models in an open science manner. Conclusions: Our data-driven study provides insights to deconstruct PD heterogeneity. This approach could have immediate implications for clinical trials by improving the detection of significant clinical outcomes that might have been masked by cohort heterogeneity. We anticipate that machine learning models will improve patient counseling, clinical trial design, allocation of healthcare resources, and ultimately individualized patient care.

Biochemistry

Oncology

Large-scale pathway-specific polygenic risk, transcriptomic community networks and functional inferences in Parkinson disease

Sara Bandrés‐Ciga et al.May 6, 2020

ABSTRACT Polygenic inheritance plays a central role in Parkinson disease (PD). A priority in elucidating PD etiology lies in defining the biological basis of genetic risk. Unraveling how risk leads to disruption will yield disease-modifying therapeutic targets that may be effective. Here, we utilized a high-throughput and hypothesis-free approach to determine biological pathways underlying PD using the largest currently available cohorts of genetic data and gene expression data from International Parkinson’s Disease Genetics Consortium (IPDGC) and the Accelerating Medicines Partnership - Parkinson’s disease initiative (AMP-PD), among other sources. We placed these insights into a cellular context. We applied large-scale pathway-specific polygenic risk score (PRS) analyses to assess the role of common variation on PD risk in a cohort of 457,110 individuals by focusing on a compilation of 2,199 publicly annotated gene sets representative of curated pathways, of which we nominate 46 pathways associated with PD risk. We assessed the impact of rare variation on PD risk in an independent cohort of whole-genome sequencing data, including 4,331 individuals. We explored enrichment linked to expression cell specificity patterns using single-cell gene expression data and demonstrated a significant risk pattern for adult dopaminergic neurons, serotonergic neurons, and radial glia. Subsequently, we created a novel way of building de novo pathways by constructing a network expression community map using transcriptomic data derived from the blood of 1,612 PD patients, which revealed 54 connecting networks associated with PD. Our analyses highlight several promising pathways and genes for functional prioritization and provide a cellular context in which such work should be done.
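The abstract above describes pathway-specific polygenic risk score (PRS) analyses. In its simplest textbook form, a PRS is a weighted sum of risk-allele dosages, and a pathway-specific score restricts that sum to variants mapped to one gene set. The sketch below shows that basic calculation only. The variant identifiers, weights, dosages, and the pathway set are hypothetical, and the study's actual scoring procedure may differ.

```python
def polygenic_risk_score(dosages, weights, variant_subset=None):
    """Sum dosage * weight over variants present in both inputs.

    dosages: variant id -> risk-allele dosage (0 to 2) for one individual.
    weights: variant id -> per-allele effect size (for example a log odds ratio).
    variant_subset: optional set of variant ids (for example one pathway's variants).
    """
    variants = set(dosages) & set(weights)
    if variant_subset is not None:
        variants &= set(variant_subset)
    # Sort for a deterministic summation order.
    return sum(dosages[v] * weights[v] for v in sorted(variants))


# Hypothetical effect sizes and one individual's dosages (illustration only).
effect_sizes = {"var1": 0.2, "var2": -0.1, "var3": 0.05}
individual = {"var1": 2, "var2": 1, "var3": 0, "var4": 1}

# Hypothetical set of variants assigned to a single pathway.
pathway_variants = {"var1", "var3"}

genome_wide_score = polygenic_risk_score(individual, effect_sizes)
pathway_score = polygenic_risk_score(individual, effect_sizes, pathway_variants)
print(genome_wide_score, pathway_score)
```

Variants without a weight (here `var4`) are ignored, so the score depends only on variants shared between the genotype data and the effect-size table.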

Genetics

Paleontology

Genetic modifiers of risk and age at onset in GBA associated Parkinson’s disease and Lewy body dementia

Cornelis Blauwendraat et al.Aug 18, 2019

Abstract. Parkinson’s disease (PD) is a genetically complex disorder. Multiple genes have been shown to contribute to the risk of PD, and currently 90 independent risk variants have been identified by genome-wide association studies. Thus far, a number of genes (including SNCA, LRRK2, and GBA) have been shown to contain variability across a spectrum of frequency and effect, from rare, highly penetrant variants to common risk alleles with small effect sizes. Variants in GBA, encoding the enzyme glucocerebrosidase, are associated with Lewy body diseases such as PD and Lewy body dementia (LBD). These variants, which reduce or abolish enzymatic activity, confer a spectrum of disease risk, from 1.4- to >10-fold. An outstanding question in the field is which other genetic factors influence GBA-associated risk for disease, and whether these overlap with known PD risk variants. Using multiple, large case-control datasets, totalling 217,165 individuals (22,757 PD cases, 13,431 PD proxy cases, 622 LBD cases and 180,355 controls), we identified 1,772 PD cases, 711 proxy cases and 7,624 controls with a GBA variant (p.E326K, p.T369M or p.N370S). We performed a genome-wide association study and analysed the most recent PD-associated genetic risk score to detect genetic influences on GBA risk and age at onset. We attempted to replicate our findings in two independent datasets, including the personal genetics company 23andMe, Inc. and whole-genome sequencing data. Our analysis showed that the overall PD genetic risk score modifies risk for disease and decreases age at onset in carriers of GBA variants. Notably, this effect was consistent across all tested GBA risk variants. Dissecting this signal demonstrated that variants in close proximity to SNCA and CTSB (encoding cathepsin B) are the most significant contributors. Risk variants in the CTSB locus were identified to decrease mRNA expression of CTSB. Additional analyses suggest a possible genetic interaction between GBA and CTSB, and GBA p.N370S neurons were shown to have decreased cathepsin B expression compared to controls. These data provide a genetic basis for modification of GBA-associated PD risk and age at onset and demonstrate that variability at genes implicated in lysosomal function exerts the largest effect on GBA-associated risk for disease. Further, these results have important implications for selection of GBA carriers for therapeutic interventions.

Genetics

Internal Medicine

A multi-layer functional genomic analysis to understand noncoding genetic variation in lipids

Shweta Ramdas et al.Dec 8, 2021

Abstract. A major challenge of genome-wide association studies (GWAS) is to translate phenotypic associations into biological insights. Here, we integrate a large GWAS on blood lipids involving 1.6 million individuals from five ancestries with a wide array of functional genomic datasets to discover regulatory mechanisms underlying lipid associations. We first prioritize lipid-associated genes with expression quantitative trait locus (eQTL) colocalizations, and then add chromatin interaction data to narrow the search for functional genes. Polygenic enrichment analysis across 697 annotations from a host of tissues and cell types confirms the central role of the liver in lipid levels, and highlights the selective enrichment of adipose-specific chromatin marks in high-density lipoprotein cholesterol and triglycerides. Overlapping transcription factor (TF) binding sites with lipid-associated loci identifies TFs relevant in lipid biology. In addition, we present an integrative framework to prioritize causal variants at GWAS loci, producing a comprehensive list of candidate causal genes and variants with multiple layers of functional evidence. Two prioritized genes, CREBRF and RRBP1, show convergent evidence across functional datasets supporting their roles in lipid biology.

Genetics

Molecular Biology
