ResearchHub | Open Science Community

A gene-based association method for mapping traits using reference transcriptome data

Eric Gamazon et al., Aug 10, 2015

Hae Kyung Im and colleagues report a method for predicting gene expression perturbations from genotype data after training on reference transcriptome data sets. Association of predicted gene expression with disease traits identifies known and new candidate disease genes. Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual's genetic profile and correlates 'imputed' gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys the benefits of gene-based approaches such as reduced multiple-testing burden and a principled approach to the design of follow-up experiments. Our results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.
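The two-stage idea in this abstract (train an expression predictor on a reference panel, impute expression in a genotyped cohort, then test the imputed values against the phenotype) can be sketched as follows. This is a minimal illustration on simulated data, not the PrediXcan software: ridge regression is a stand-in, since the abstract does not name the prediction model, and the function names, sample sizes, and effect sizes are all assumptions.

```python
import numpy as np

def fit_ridge(genotypes, expression, penalty=1.0):
    """Fit ridge weights mapping SNP dosages to one gene's expression."""
    X = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    n_snps = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_snps), X.T @ y)

def association_test(predicted_expression, phenotype):
    """Correlate imputed expression with the phenotype; return (r, t-statistic)."""
    r = np.corrcoef(predicted_expression, phenotype)[0, 1]
    n = len(phenotype)
    t_stat = r * np.sqrt((n - 2) / (1 - r**2))
    return r, t_stat

# Simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n_ref, n_gwas, n_snps = 300, 2000, 20
true_weights = rng.normal(0, 0.3, n_snps)

# Stage 1: reference panel with both genotypes and measured expression.
ref_geno = rng.binomial(2, 0.3, (n_ref, n_snps)).astype(float)
ref_expr = ref_geno @ true_weights + rng.normal(0, 1, n_ref)
weights = fit_ridge(ref_geno, ref_expr)

# Stage 2: cohort with genotypes and phenotype only; expression is imputed.
gwas_geno = rng.binomial(2, 0.3, (n_gwas, n_snps)).astype(float)
genetic_expr = gwas_geno @ true_weights  # unobserved in a real study
phenotype = 0.5 * genetic_expr + rng.normal(0, 1, n_gwas)

imputed = (gwas_geno - gwas_geno.mean(axis=0)) @ weights
r, t_stat = association_test(imputed, phenotype)
print(f"r = {r:.3f}, t = {t_stat:.1f}")
```

Because one test is run per gene rather than per variant, the number of tests is far smaller than in a variant-level scan, which is the reduced multiple-testing burden the abstract mentions.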

Genetics

Molecular Biology


The “All of Us” Research Program

Joshua Denny et al., Aug 14, 2019

Knowledge gained from observational cohort studies has dramatically advanced the prevention and treatment of diseases. Many of these cohorts, however, are small, lack diversity, or do not provide comprehensive phenotype data. The All of Us Research Program plans to enroll a diverse group of at least 1 million persons in the United States in order to accelerate biomedical research and improve health. The program aims to make the research results accessible to participants, and it is developing new approaches to generate, access, and make data broadly available to approved researchers. All of Us opened for enrollment in May 2018 and currently enrolls participants 18 years of age or older from a network of more than 340 recruitment sites. Elements of the program protocol include health questionnaires, electronic health records (EHRs), physical measurements, the use of digital health technology, and the collection and analysis of biospecimens. As of July 2019, more than 175,000 participants had contributed biospecimens. More than 80% of these participants are from groups that have been historically underrepresented in biomedical research. EHR data on more than 112,000 participants from 34 sites have been collected. The All of Us data repository should permit researchers to take into account individual differences in lifestyle, socioeconomic factors, environment, and biologic characteristics in order to advance precision diagnosis, prevention, and treatment.

Philosophy

Artificial Intelligence


PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations

Joshua Denny et al., Mar 24, 2010

Motivation: Emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease–gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Diseases (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European–Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated case and control populations across all ICD9 code groups for each of these five SNPs, and disease–SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP–disease associations for these SNPs. Results: Four of seven known SNP–disease associations were replicated using the PheWAS algorithm, with P-values between 2.8 × 10⁻⁶ and 0.011. The PheWAS algorithm also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP–disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance. Availability: The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research. Contact: josh.denny@vanderbilt.edu
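The code translation step described in this abstract can be pictured as a lookup from billing codes to phenotype groups, followed by a case/control split per group. The sketch below is illustrative only: the three-entry table is a hypothetical fragment, not the published 776-group table, and treating every patient without a matching code as a control is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical fragment of a code translation table: ICD9 prefix -> phenotype group.
TRANSLATION = {
    "427.31": "atrial fibrillation",
    "555": "Crohn's disease",
    "714.0": "rheumatoid arthritis",
}

def phenotype_for(icd9_code):
    """Return the phenotype group of the longest matching prefix, or None."""
    best_prefix = None
    for prefix in TRANSLATION:
        if icd9_code.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return TRANSLATION[best_prefix] if best_prefix else None

def build_cases_controls(patient_codes):
    """Map each phenotype group to a (cases, controls) pair of patient-ID sets."""
    cases = defaultdict(set)
    for patient, codes in patient_codes.items():
        for code in codes:
            phenotype = phenotype_for(code)
            if phenotype is not None:
                cases[phenotype].add(patient)
    everyone = set(patient_codes)
    # Simplification: anyone who is not a case counts as a control.
    return {
        phenotype: (cases[phenotype], everyone - cases[phenotype])
        for phenotype in set(TRANSLATION.values())
    }

# Toy billing records (hypothetical patients).
patients = {
    "p1": ["427.31"],
    "p2": ["555.9", "401.9"],
    "p3": ["401.9"],
}
groups = build_cases_controls(patients)
```

Each (cases, controls) pair can then be tested against a SNP genotype, once per phenotype group, which is what makes the scan phenome-wide.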

Genetics

Molecular Biology


Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies

Wei Zhou et al., Aug 8, 2018

In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes by large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for >1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness. SAIGE is a generalized mixed model association test that can efficiently analyze large data sets while controlling for unbalanced case-control ratios and sample relatedness, as shown by applying SAIGE to the UK Biobank data for >1,400 binary phenotypes.
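To see why a saddlepoint approximation helps when cases are rare, the sketch below compares a normal approximation with a Lugannani-Rice saddlepoint approximation for the upper tail of a score statistic S = Σ Gᵢ(Yᵢ − μᵢ) with independent Bernoulli outcomes. It is a toy under stated assumptions, not SAIGE's implementation: there is no mixed model, no covariate adjustment, and no relatedness, genotypes are assumed non-negative, and the function names are hypothetical.

```python
import math
import numpy as np

def normal_sf(x):
    """Upper-tail probability of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def score_test_pvalues(G, mu, s):
    """Upper-tail p-values for S = sum G_i (Y_i - mu_i), Y_i ~ Bernoulli(mu_i).

    Returns (normal approximation, saddlepoint approximation).
    Assumes G >= 0 and 0 < s < sum(G * (1 - mu)).
    """
    G = np.asarray(G, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if not 0 < s < np.sum(G * (1 - mu)):
        raise ValueError("s must lie strictly between 0 and its maximum")

    variance = np.sum(G**2 * mu * (1 - mu))
    p_normal = normal_sf(s / math.sqrt(variance))

    def tilted(t):
        # Success probabilities under exponential tilting by t.
        e = mu * np.exp(t * G)
        return e / (1 - mu + e)

    def K(t):  # cumulant generating function of S
        return np.sum(np.log(1 - mu + mu * np.exp(t * G))) - t * np.sum(G * mu)

    def K1(t):  # first derivative of K
        return np.sum(G * tilted(t)) - np.sum(G * mu)

    def K2(t):  # second derivative of K
        q = tilted(t)
        return np.sum(G**2 * q * (1 - q))

    # Solve K'(t) = s by bisection; K' is increasing and K'(0) = 0 < s.
    lo, hi = 0.0, 1.0
    while K1(hi) < s:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)

    # Lugannani-Rice tail formula.
    w = math.sqrt(2.0 * (t * s - K(t)))
    u = t * math.sqrt(K2(t))
    density = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    p_spa = normal_sf(w) + density * (1.0 / u - 1.0 / w)
    return p_normal, p_spa

# Rare outcome: 1,000 carriers, each with a 1% chance of being a case,
# and 25 observed cases against 10 expected (hypothetical numbers).
p_normal, p_spa = score_test_pvalues(np.ones(1000), np.full(1000, 0.01), 15.0)
print(f"normal: {p_normal:.2e}, saddlepoint: {p_spa:.2e}")
```

In this skewed setting the normal approximation understates the tail probability by more than an order of magnitude, which is the kind of type I error inflation the abstract attributes to unbalanced case-control ratios.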

Genetics

Rheumatology


Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data

Joshua Denny et al., Nov 24, 2013

Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
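The abstract pairs a P-value threshold with a false discovery rate below 0.1 but does not say which procedure produced it. As one common way such a threshold can arise, here is a minimal sketch of the Benjamini-Hochberg step-up procedure; treating it as the method used in the paper is an assumption, and the P-values in the example are made up.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return the sorted indices of hypotheses rejected at the given FDR level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P-value falls under its step-up threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Made-up P-values for five tests.
rejected = benjamini_hochberg([0.001, 0.5, 0.03, 0.04, 0.9], fdr=0.1)

# Replication rate quoted in the abstract: 51 of 77 associations.
replication_rate = 51 / 77
print(rejected, f"{replication_rate:.1%}")
```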

Genetics

Molecular Biology


Artificial intelligence, bias and clinical safety

Robert Challen et al., Jan 12, 2019

In medicine, artificial intelligence (AI) research is becoming increasingly focused on applying machine learning (ML) techniques to complex problems, and so allowing computers to make predictions from large amounts of patient data, by learning their own associations [1]. Estimates of the impact of AI on the wider economy globally vary wildly, with a recent report suggesting a 14% effect on global gross domestic product by 2030, half of which would come from productivity improvements [2]. These predictions create political appetite for the rapid development of the AI industry [3], and healthcare is a priority area where this technology has yet to be exploited [2, 3]. The digital health revolution described by Duggal et al. [4] is already in full swing with the potential to ‘disrupt’ healthcare. Health AI research has demonstrated some impressive results [5–10], but its clinical value has not yet been realised, hindered partly by a lack of a clear understanding of how to quantify benefit or ensure patient safety, and increasing concerns about the ethical and medico-legal impact [11]. This analysis is written with the dual aim of helping clinical safety professionals to critically appraise current medical AI research from a quality and safety perspective, and supporting research and development in AI by highlighting some of the clinical safety questions that must be considered if medical application of these exciting technologies is to be successful. Clinical decision support systems (DSS) are in widespread use in medicine and have had most impact providing guidance on the safe prescription of medicines [12], guideline adherence, simple risk screening [13] or prognostic scoring [14]. These systems use predefined rules, which have predictable behaviour and are usually shown to reduce clinical error [12], although they sometimes inadvertently introduce safety issues themselves [15, 16]. Rules-based systems have also been developed to address diagnostic uncertainty [17–19] …

Law

Health Informatics


MedEx: a medication information extraction system for clinical narratives

Hanzhang Xu et al., Jan 1, 2010

Medication information is one of the most important types of clinical data in electronic medical records. It is critical for healthcare safety and quality, as well as for clinical research that uses electronic medical record data. However, medication data are often recorded in clinical notes as free-text. As such, they are not accessible to other computerized applications that rely on coded data. We describe a new natural language processing system (MedEx), which extracts medication information from clinical notes. MedEx was initially developed using discharge summaries. An evaluation using a data set of 50 discharge summaries showed it performed well on identifying not only drug names (F-measure 93.2%), but also signature information, such as strength, route, and frequency, with F-measures of 94.5%, 93.9%, and 96.0% respectively. We then applied MedEx unchanged to outpatient clinic visit notes. It performed similarly with F-measures over 90% on a set of 25 clinic visit notes.
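For readers unfamiliar with the metric, the F-measures quoted in this abstract combine precision and recall into one number. The sketch below assumes the balanced form (F1, the harmonic mean of the two), which the abstract does not state explicitly, and the counts in the example are hypothetical, not MedEx results.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical extraction run: 90 drug names found correctly,
# 5 spurious extractions, 10 drug names missed.
score = f_measure(true_positives=90, false_positives=5, false_negatives=10)
print(f"F-measure: {score:.1%}")
```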

Philosophy

Artificial Intelligence


Exome-wide association study of plasma lipids in >300,000 individuals

Dajiang Liu et al., Oct 30, 2017

We screened variants on an exome-focused genotyping array in >300,000 participants (replication in >280,000 participants) and identified 444 independent variants in 250 loci significantly associated with total cholesterol (TC), high-density-lipoprotein cholesterol (HDL-C), low-density-lipoprotein cholesterol (LDL-C), and/or triglycerides (TG). At two loci (JAK2 and A1CF), experimental analysis in mice showed lipid changes consistent with the human data. We also found that: (i) beta-thalassemia trait carriers displayed lower TC and were protected from coronary artery disease (CAD); (ii) excluding the CETP locus, there was not a predictable relationship between plasma HDL-C and risk for age-related macular degeneration; (iii) only some mechanisms of lowering LDL-C appeared to increase risk for type 2 diabetes (T2D); and (iv) TG-lowering alleles involved in hepatic production of TG-rich lipoproteins (TM6SF2 and PNPLA3) tracked with higher liver fat, higher risk for T2D, and lower risk for CAD, whereas TG-lowering alleles involved in peripheral lipolysis (LPL and ANGPTL4) had no effect on liver fat but decreased risks for both T2D and CAD.

Genetics

Molecular Biology


Inactivating Mutations in NPC1L1 and Protection from Coronary Heart Disease

Nathan Stitziel et al., Nov 12, 2014

Ezetimibe lowers plasma levels of low-density lipoprotein (LDL) cholesterol by inhibiting the activity of the Niemann-Pick C1-like 1 (NPC1L1) protein. However, whether such inhibition reduces the risk of coronary heart disease is not known. Human mutations that inactivate a gene encoding a drug target can mimic the action of an inhibitory drug and thus can be used to infer potential effects of that drug. We sequenced the exons of NPC1L1 in 7364 patients with coronary heart disease and in 14,728 controls without such disease who were of European, African, or South Asian ancestry. We identified carriers of inactivating mutations (nonsense, splice-site, or frameshift mutations). In addition, we genotyped a specific inactivating mutation (p.Arg406X) in 22,590 patients with coronary heart disease and in 68,412 controls. We tested the association between the presence of an inactivating mutation and both plasma lipid levels and the risk of coronary heart disease. With sequencing, we identified 15 distinct NPC1L1 inactivating mutations; approximately 1 in every 650 persons was a heterozygous carrier for 1 of these mutations. Heterozygous carriers of NPC1L1 inactivating mutations had a mean LDL cholesterol level that was 12 mg per deciliter (0.31 mmol per liter) lower than that in noncarriers (P=0.04). Carrier status was associated with a relative reduction of 53% in the risk of coronary heart disease (odds ratio for carriers, 0.47; 95% confidence interval, 0.25 to 0.87; P=0.008). In total, only 11 of 29,954 patients with coronary heart disease had an inactivating mutation (carrier frequency, 0.04%) in contrast to 71 of 83,140 controls (carrier frequency, 0.09%). Naturally occurring mutations that disrupt NPC1L1 function were found to be associated with reduced plasma LDL cholesterol levels and a reduced risk of coronary heart disease. (Funded by the National Institutes of Health and others.)
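The carrier counts in this abstract are enough to recompute the quoted frequencies and a crude odds ratio by hand. The short check below uses only numbers from the abstract. The crude, unstratified odds ratio comes out near 0.43 rather than the reported 0.47; the abstract does not describe how the reported estimate was computed, so that gap presumably reflects the study's own analysis and should not be read as a discrepancy.

```python
# Counts taken directly from the abstract.
case_carriers, case_total = 11, 29954
control_carriers, control_total = 71, 83140

# The totals are the sums of the sequenced and genotyped groups.
assert 7364 + 22590 == case_total
assert 14728 + 68412 == control_total

case_frequency = case_carriers / case_total
control_frequency = control_carriers / control_total

# Crude odds ratio from the pooled 2x2 table (no stratification or adjustment).
crude_odds_ratio = (case_carriers / (case_total - case_carriers)) / (
    control_carriers / (control_total - control_carriers)
)

print(f"carrier frequency, cases:    {case_frequency:.2%}")
print(f"carrier frequency, controls: {control_frequency:.2%}")
print(f"crude odds ratio:            {crude_odds_ratio:.2f}")
```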

Genetics

Physiology


Operational Implementation of Prospective Genotyping for Personalized Medicine: The Design of the Vanderbilt PREDICT Project

Jill Pulley et al., May 16, 2012

The promise of "personalized medicine" guided by an understanding of each individual's genome has been fostered by increasingly powerful and economical methods to acquire clinically relevant information. We describe the operational implementation of prospective genotyping linked to an advanced clinical decision-support system to guide individualized health care in a large academic health center. This approach to personalized medicine entails engagement between patient and health-care provider, identification of relevant genetic variations for implementation, assay reliability, point-of-care decision support, and necessary institutional investments. In one year, approximately 3,000 patients, most of whom were scheduled for cardiac catheterization, were genotyped on a multiplexed platform that included genotyping for CYP2C19 variants that modulate response to the widely used antiplatelet drug clopidogrel. These data are deposited into the electronic medical record (EMR), and point-of-care decision support is deployed when clopidogrel is prescribed for those with variant genotypes. The establishment of programs such as this is a first step toward implementing and evaluating strategies for personalized medicine. Clinical Pharmacology & Therapeutics (2012); 92(1), 87–95. doi:10.1038/clpt.2011.371

Biochemistry

Pharmacology
