ResearchHub | Open Science Community

The Foundational data initiative for Parkinson’s disease (FOUNDIN-PD): enabling efficient translation from genetic maps to mechanism

Elisângela Bressan et al. Oct 24, 2023

Abstract The FOUNdational Data INitiative for Parkinson’s Disease (FOUNDIN-PD) is an international collaboration producing fundamental resources for Parkinson’s disease (PD). FOUNDIN-PD generated a multi-layered molecular dataset in a cohort of induced pluripotent stem cell (iPSC) lines differentiated to dopaminergic (DA) neurons, a major affected cell type in PD. The lines were derived from the Parkinson’s Progression Markers Initiative study, including participants with PD carrying monogenic PD (SNCA) variants, variants with intermediate effects, and variants identified by genome-wide association studies, as well as unaffected individuals. We generated genetic, epigenetic, regulatory, transcriptomic, and longitudinal cellular imaging data from iPSC-derived DA neurons to understand molecular relationships between disease-associated genetic variation and proximate molecular events. These data reveal that iPSC-derived DA neurons provide a valuable cellular context and foundational atlas for modelling PD genetic risk. We have integrated these data into a FOUNDIN-PD data browser (https://www.foundinpd.org) as a resource for understanding the molecular pathogenesis of PD.

Parkinson's Disease

Induced Pluripotent Stem Cell

Context (Archaeology)

REViewer: Haplotype-resolved visualization of read alignments in and around tandem repeats

Egor Dolzhenko et al. Oct 24, 2023

Abstract
Background: Expansions of short tandem repeats (STRs) cause many neurogenetic disorders, including familial amyotrophic lateral sclerosis and Huntington disease. Multiple methods have recently been developed that can identify repeat expansions in whole-genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools lack the ability to produce such visualizations for repeat expansions. Expanded repeats are difficult to visualize because they correspond to large insertions relative to the reference genome and involve many misaligning and ambiguously aligning reads.
Results: We implemented REViewer, a computational method for visualization of sequencing data in genomic regions containing long repeat expansions. To generate a read pileup, REViewer reconstructs local haplotype sequences and distributes reads to these haplotypes in a way that is most consistent with the fragment lengths and evenness of read coverage. To create appropriate training materials for onboarding new users, we performed a concordance study with 12 scientists working in STR research. We used the results of this study to create a user guide that describes the basic principles of using REViewer as well as a guide to the typical features of read pileups that correspond to low-confidence repeat genotype calls. Additionally, we demonstrated that REViewer can be used to annotate clinically relevant repeat interruptions by comparing visual assessment results of 44 FMR1 repeat alleles with the results of triplet repeat primed PCR. For 38 of these alleles, the results of visual assessment were consistent with triplet repeat primed PCR.
Conclusions: Read pileup plots generated by REViewer offer an intuitive way to visualize sequencing data in regions containing long repeat expansions. Laboratories can use REViewer to assess the quality of repeat genotype calls as well as to visually detect interruptions or other imperfections in the repeat sequence and the surrounding flanking regions.

Haplotype

Tandem Repeat

Genetics

Genome-wide analysis of Structural Variants in Parkinson’s Disease using Short-Read Sequencing data

Kimberley Billingsley et al. Oct 24, 2023

Abstract Parkinson’s disease is a complex neurodegenerative disorder, affecting approximately one million individuals in the USA alone. A significant proportion of risk for Parkinson’s disease is driven by genetics. Despite this, the majority of the common genetic variation that contributes to disease risk is unknown, in part because previous genetic studies have focussed solely on the contribution of single nucleotide variants. Structural variants represent a significant source of genetic variation in the human genome. However, because assay of this variability is challenging, structural variants have not been cataloged on a genome-wide scale, and their contribution to the risk of Parkinson’s disease remains unknown. In this study, we 1) leveraged the GATK-SV pipeline to detect and genotype structural variants in 7,772 short-read sequencing datasets and 2) generated a subset of matched whole-genome Oxford Nanopore Technologies long-read sequencing data from the PPMI cohort to allow for comprehensive structural variant confirmation. We detected, genotyped, and tested 3,154 “high-confidence” common structural variant loci, representing over 412 million nucleotides of non-reference genetic variation. Using the long-read sequencing data, we validated three structural variants that may drive the association signals at known Parkinson’s disease risk loci, including a 2 kb intronic deletion within the gene LRRN4. Further, we confirm that the majority of structural variants in the human genome cannot be detected using short-read sequencing alone, encompassing on average around 4 million nucleotides of inaccessible sequence per genome. Therefore, although these data provide the most comprehensive survey to date of the contribution of structural variants to the genetic risk of Parkinson’s disease, this study highlights the need for large-scale long-read datasets to fully elucidate the role of structural variants in Parkinson’s disease.

Structural Variation

Genetics

Biology

Identification and prediction of Parkinson’s disease subtypes and progression using machine learning in two cohorts

Anant Dadu et al. Oct 24, 2023

Abstract
Background: The clinical manifestations of Parkinson’s disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease course. The emergence of machine learning to detect hidden patterns in complex, multi-dimensional datasets provides unparalleled opportunities to address this critical need.
Methods and Findings: We used unsupervised and supervised machine learning methods on comprehensive, longitudinal clinical data from the Parkinson’s Progression Markers Initiative (PPMI) (n = 294 cases) to identify patient subtypes and to predict disease progression. The resulting models were validated in an independent, clinically well-characterized cohort from the Parkinson’s Disease Biomarkers Program (PDBP) (n = 263 cases). Our analysis distinguished three distinct disease subtypes with highly predictable progression rates, corresponding to slow, moderate, and fast disease progression. We achieved highly accurate projections of disease progression five years after initial diagnosis with an average area under the curve (AUC) of 0.92 (95% CI: 0.95 ± 0.01 for the slower progressing group (PDvec1), 0.87 ± 0.03 for moderate progressors, and 0.95 ± 0.02 for the fast progressing group (PDvec3)). We identified serum neurofilament light (NfL) as a significant indicator of fast disease progression among other key biomarkers of interest. We replicated these findings in an independent validation cohort, released the analytical code, and developed models in an open science manner.
Conclusions: Our data-driven study provides insights to deconstruct PD heterogeneity. This approach could have immediate implications for clinical trials by improving the detection of significant clinical outcomes that might have been masked by cohort heterogeneity. We anticipate that machine learning models will improve patient counseling, clinical trial design, allocation of healthcare resources, and ultimately individualized patient care.
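
As a quick arithmetic check of the figures above, the following minimal sketch assumes the reported average AUC is the simple unweighted mean of the three per-subtype values; the abstract does not state how the average was computed, and the dictionary keys are illustrative labels only.

```python
# Per-subtype AUC values as quoted in the abstract.
auc_by_subtype = {
    "PDvec1 (slow)": 0.95,
    "moderate": 0.87,
    "PDvec3 (fast)": 0.95,
}

# Assumption: "average AUC" means the unweighted mean across the three subtypes.
average_auc = sum(auc_by_subtype.values()) / len(auc_by_subtype)

print(f"average AUC = {average_auc:.2f}")
```

Under that assumption the mean rounds to the 0.92 quoted in the abstract.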

Disease

Cohort

Parkinson's Disease

Genetic variation within genes associated with mitochondrial function is significantly associated with later age at onset of Parkinson disease and contributes to disease risk

Kimberley Billingsley et al. May 7, 2020

ABSTRACT Mitochondrial dysfunction has been implicated in the aetiology of monogenic Parkinson’s disease (PD). Yet the role that mitochondrial processes play in the most common form of the disease, sporadic PD, is yet to be fully established. Here we comprehensively assessed the role of mitochondrial function-associated genes in sporadic PD by leveraging improvements in the scale and analysis of PD GWAS data with recent advances in our understanding of the genetics of mitochondrial disease. First, we identified that a proportion of the “missing heritability” of PD can be explained by common variation within genes implicated in mitochondrial disease (primary gene list) and mitochondrial function (secondary gene list). Next, we calculated a mitochondrial-specific polygenic risk score (PRS) and showed that cumulative small-effect variants within both our primary and secondary gene lists are significantly associated with increased PD risk. Most significantly, we further report that the PRS of the secondary mitochondrial gene list was significantly associated with later age at onset. Finally, to identify possible functional genomic associations, we implemented Mendelian randomisation, which showed that 14 of these mitochondrial function-associated genes showed functional consequence associated with PD risk. Further analysis suggested that the 14 identified genes are not only involved in mitophagy but also implicate new mitochondrial processes. Our data suggest that therapeutics targeting mitochondrial bioenergetics and proteostasis pathways distinct from mitophagy could be beneficial for treating the early stage of PD.
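
To illustrate the general idea of a gene-set-restricted polygenic risk score, here is a minimal sketch that sums effect-allele dosage times effect size over variants mapped to a chosen gene list. The function name `gene_set_prs`, the variant and gene identifiers, and all numbers are hypothetical; this is not the authors' pipeline, which the abstract does not describe in detail.

```python
def gene_set_prs(dosages, weights, variant_gene, gene_set):
    """Sum dosage * effect size over variants whose gene is in gene_set.

    dosages:      variant id -> effect-allele count (0, 1, or 2) for one person
    weights:      variant id -> per-allele effect size (e.g. a GWAS beta)
    variant_gene: variant id -> gene the variant is assigned to
    gene_set:     set of gene names defining the restricted score
    """
    total = 0.0
    for variant, beta in weights.items():
        if variant_gene.get(variant) in gene_set and variant in dosages:
            total += dosages[variant] * beta
    return total


# Hypothetical inputs for illustration only.
weights = {"var_A": 0.12, "var_B": -0.05, "var_C": 0.30}
variant_gene = {"var_A": "GENE1", "var_B": "GENE2", "var_C": "GENE3"}
dosages = {"var_A": 2, "var_B": 1, "var_C": 1}
mito_genes = {"GENE1", "GENE2"}  # stands in for a primary or secondary gene list

score = gene_set_prs(dosages, weights, variant_gene, mito_genes)
print(score)  # var_C is excluded because GENE3 is not in the gene set
```

Restricting the sum to a gene list is what makes the score specific to one biological process; the same person can be scored separately against each list.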

Biology

Genetics

Mitophagy

Assessing methylation detection for primary human tissue using Nanopore sequencing

Rylee Genner et al. May 27, 2024

DNA methylation most commonly occurs as 5-methylcytosine (5mC) in the human genome and has been associated with human diseases. Recent developments in single-molecule sequencing technologies (Oxford Nanopore Technologies (ONT) and Pacific Biosciences) have enabled readouts of long, native DNA molecules, including cytosine methylation. ONT recently upgraded its Nanopore sequencing chemistry and kits from R9 to R10, which yielded increased accuracy and sequencing throughput. However, the effects on methylation detection have not yet been documented. Here we performed a series of computational analyses to characterize differences in Nanopore-based 5mC detection between the ONT R9 and R10 chemistries. We compared 5mC calls in R9 and R10 for three human genome datasets: a cell line, a frontal cortex brain sample, and a blood sample. We performed an in-depth analysis on CpG islands and homopolymer regions, and documented high concordance for methylation detection among sequencing technologies. The strongest correlation was observed between Nanopore R10 and Illumina bisulfite technologies for cell line-derived datasets. Subtle differences in methylation datasets between technologies can impact analysis tools such as differential methylation calling software. Our findings show that comparisons can be drawn between methylation data from different Nanopore chemistries using guided hypotheses. This work will facilitate comparison among Nanopore data cohorts derived using different chemistries from large-scale sequencing efforts, such as the NIH CARD Long Read Initiative.

DNA Methylation

Nanopore Sequencing

Methylation

The Parkinson's Disease GWAS Locus Browser

Francis Grenn et al. May 7, 2020

Parkinson's disease (PD) is a neurodegenerative disease with an often complex genetic component identifiable by genome-wide association studies (GWAS). The most recent large-scale PD GWAS have identified more than 90 independent variants associated with PD risk and progression across 80 loci. One major challenge in current genomics is identifying the causal gene(s) and variant(s) from each GWAS locus. Here we present a GWAS locus browser application that combines data from multiple databases to aid in the prioritization of genes associated with PD GWAS loci. We included 92 independent genome-wide significant signals from multiple recent PD GWAS, including the PD risk GWAS, age-at-onset GWAS and progression GWAS. We gathered data for all 2336 genes within 1 Mb upstream and downstream of each variant to allow users to assess which gene(s) are most associated with the variant of interest based on a set of self-ranked criteria. Our aim is that the information contained in this browser (https://pdgenetics.shinyapps.io/GWASBrowser/) will assist the PD research community with the prioritization of genes for follow-up functional studies and as potential therapeutic targets.
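
The 1 Mb window described above can be sketched as a simple interval filter. This is an illustrative example only: the gene records, coordinates, and the function name `genes_near_variant` are hypothetical, and treating any overlap with the window as "within" it is an assumption, since the abstract does not say whether a gene must overlap or be fully contained.

```python
WINDOW_BP = 1_000_000  # 1 Mb on each side of the variant


def genes_near_variant(chrom, pos, genes, window=WINDOW_BP):
    """Return names of genes on the same chromosome overlapping pos +/- window."""
    lo, hi = pos - window, pos + window
    return [
        g["name"]
        for g in genes
        if g["chrom"] == chrom and g["start"] <= hi and g["end"] >= lo
    ]


# Hypothetical gene annotations (name, chromosome, start, end).
genes = [
    {"name": "GENE_A", "chrom": "chr1", "start": 500_000, "end": 600_000},
    {"name": "GENE_B", "chrom": "chr1", "start": 3_000_001, "end": 3_100_000},
    {"name": "GENE_C", "chrom": "chr2", "start": 1_400_000, "end": 1_450_000},
    {"name": "GENE_D", "chrom": "chr1", "start": 2_400_000, "end": 2_600_000},
]

nearby = genes_near_variant("chr1", 1_500_000, genes)
print(nearby)
```

Here the window spans chr1:500,000 to 2,500,000, so GENE_A and GENE_D are kept, GENE_B lies beyond the window, and GENE_C is on a different chromosome.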

Genome-wide Association Study

Locus (Genetics)

Genetic Association

Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation

Mikhail Kolmogorov et al. Oct 24, 2023

Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale because they are too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer’s and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with an F1-score better than that of Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with an F1-score comparable to state-of-the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT-based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.
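
For readers unfamiliar with the F1-score used above to compare variant callers, this minimal sketch computes it from true-positive, false-positive, and false-negative counts. The counts are hypothetical and are not taken from the paper; the function name `f1_score` is illustrative.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from variant-call counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts against a truth set: 90 correct calls,
# 10 spurious calls, and 30 true variants that were missed.
f1 = f1_score(tp=90, fp=10, fn=30)
print(f"F1 = {f1:.3f}")
```

Because it is a harmonic mean, the F1-score penalizes a caller that trades many missed variants for high precision, or the reverse, which is why it is a common single-number summary for benchmarking.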

Nanopore Sequencing

Indel

Scalability
