Statistics in Medicine, Volume 35, Issue 11, p. 1880-1906
Research Article, Open Access

Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods

Stephen Burgess (corresponding author; orcid.org/0000-0001-5365-8760), Department of Public Health and Primary Care, University of Cambridge, Cambridge, U.K.
Frank Dudbridge, Department of Non-communicable Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, U.K.
Simon G. Thompson (orcid.org/0000-0002-5274-7814), Department of Public Health and Primary Care, University of Cambridge, Cambridge, U.K.

Correspondence to: Stephen Burgess, Strangeways Research Laboratory, Department of Public Health and Primary Care, University of Cambridge, 2 Worts Causeway, Cambridge, CB1 8RN, U.K. E-mail: sb452@medschl.cam.ac.uk

First published: 13 December 2015. https://doi.org/10.1002/sim.6835

Abstract

Mendelian randomization is the use of genetic instrumental variables to obtain causal inferences from observational data. Two recent developments for combining information on multiple uncorrelated instrumental variables (IVs) into a single causal estimate are as follows: (i) allele scores, in which individual-level data on the IVs are aggregated into a univariate score, which is used as a single IV, and (ii) a summary statistic method, in which causal estimates calculated from each IV using summarized data are combined in an inverse-variance weighted meta-analysis. To avoid bias from weak instruments, unweighted and externally weighted allele scores have been recommended. Here, we propose equivalent approaches using summarized data and also provide extensions of the methods for use with correlated IVs. We investigate the impact of different choices of weights on the bias and precision of estimates in simulation studies. We show that allele score estimates can be reproduced using summarized data on genetic associations with the risk factor and the outcome.
Estimates from the summary statistic method using external weights are biased towards the null when the weights are imprecisely estimated; in contrast, allele score estimates are unbiased. With equal or external weights, both methods provide appropriate tests of the null hypothesis of no causal effect even with large numbers of potentially weak instruments. We illustrate these methods using summarized data on the causal effect of low-density lipoprotein cholesterol on coronary heart disease risk. It is shown that a more precise causal estimate can be obtained using multiple genetic variants from a single gene region, even if the variants are correlated. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. 1 Introduction An instrumental variable (IV) can be used to estimate the causal effect of a risk factor on an outcome from observational data 1, 2. A valid IV must be associated with the risk factor of interest but not associated with other factors on alternative causal pathways. This implies that it is not associated with any confounder of the risk factor–outcome association and that any causal pathway from the IV to the outcome passes through the risk factor 3. Much recent attention has been devoted to IV analysis in the context of Mendelian randomization, defined as the use of genetic variants as IVs 4, 5. The causal effect of the risk factor on the outcome with a single IV can be estimated by dividing the coefficient from the regression of the outcome on the IV by the coefficient from the regression of the risk factor on the IV 6. This is known as the ratio of coefficients method. Alternatively, the same estimate can be obtained by first regressing the risk factor on the IV and then regressing the outcome on the fitted values of the risk factor from the first-stage regression 7. This is known as the two-stage least squares (2SLS) method. The 2SLS method can be extended for use with multiple IVs 8. 
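To illustrate the equivalence of the ratio and 2SLS estimates with a single IV, the following minimal Python sketch simulates a small dataset and computes both. The sample size, allele frequency, effect sizes and helper name `ols` are hypothetical choices made for this example only and are not taken from the paper.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: one biallelic variant g, a confounder u,
# a risk factor x and an outcome y with a true causal effect of 0.2.
rng = random.Random(1)
n = 2000
g = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [0.5 * gi + ui + rng.gauss(0.0, 1.0) for gi, ui in zip(g, u)]
y = [0.2 * xi + ui + rng.gauss(0.0, 1.0) for xi, ui in zip(x, u)]

# Ratio of coefficients: (outcome on IV) divided by (risk factor on IV).
_, beta_y_on_g = ols(g, y)
first_stage_intercept, beta_x_on_g = ols(g, x)
ratio_estimate = beta_y_on_g / beta_x_on_g

# 2SLS: regress the outcome on the first-stage fitted values of x.
x_fitted = [first_stage_intercept + beta_x_on_g * gi for gi in g]
_, tsls_estimate = ols(x_fitted, y)
```

With a single IV the two point estimates agree up to floating-point error, because the fitted values are a linear function of the IV. The sketch does not compute standard errors.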
As the number of IVs increases, overfitting in the first-stage regression model leads to systematic finite-sample bias in the causal estimate 9. This bias, known as weak instrument bias, acts in the direction of the confounded observational association between the risk factor and outcome 10. When there is a single IV, the median bias of the ratio (or 2SLS) method estimator is negligible for all but the weakest of IVs 11. A recent methodological development to exploit this fact is to aggregate multiple IVs into a single univariate score, and to use this score as a single IV rather than to use multiple IVs 12. In Mendelian randomization, this is known as an allele score, genetic risk score or gene score. An alternative approach to combine information on multiple IVs is to use summarized data on the associations of genetic variants with risk factors and disease outcomes. These data are increasingly becoming available from large consortia, such as the Global Lipids Genetics Consortium (GLGC) for lipid fractions 13 and DIAGRAM for type 2 diabetes 14. Causal estimates can be obtained from these associations for a single genetic variant using the ratio method without the need for individual-level data. Two methods for obtaining causal estimates from summarized data for multiple IVs have been proposed: a summary statistic method, in which the ratio estimates from each IV are combined in an inverse-variance weighted meta-analysis 15, 16, and a likelihood-based method, in which the summarized data are modelled directly using a likelihood function 17, 18. The summary statistic method requires that the IVs are uncorrelated in their distributions (for genetic IVs, the variants are in linkage equilibrium). In this paper, we review and extend the literature on IV estimation methods with summarized data, currently described in disparate sources. In Section 2, we lay out the assumptions made in this paper for the identification of causal effects. 
In Section 3, we demonstrate how an allele score estimate with a pre-specified choice of weights can be reproduced using summarized data. We rederive the known result for uncorrelated IVs that the allele score and summary statistic methods using an internally derived choice of weights give the same estimates as a (multivariable) 2SLS method; the estimates differ for other choices of weights. We investigate the bias and coverage properties of the allele score and summary statistic methods in simulation studies for different choices of weights, in particular with weak instruments. In Section 4, we derive extensions to the previously described methods that can be used when the IVs are correlated and similarly investigate their statistical properties. In Section 5, the methods are illustrated using summarized data on the causal effect of low-density lipoprotein cholesterol (LDL-c) on coronary heart disease (CHD) risk, comparing causal estimates obtained using a single genetic variant with those obtained using multiple genetic variants from the same gene region. Finally, we discuss the relevance of these methodological developments to applied practice (Section 6). For reference, a summary of methods for IV estimation discussed in this paper is given in Table 1. Sample code for implementing the methods is given in Appendix A.1. We clarify that the individual-level data methods require individual participant data on the genetic variants used as IVs, risk factor and outcome. The summarized data methods only require data on the associations of the IVs with the risk factor and with the outcome. If limited individual-level data are available (for example, on the IV–risk factor relationship but not the IV–outcome relationship), then summarized associations can be obtained from the individual-level data, and the analysis can proceed using summarized data only. Table 1. Summary of instrumental variable (IV) estimation methods discussed in this paper. 
| Method | Equation(s) | Comments |
|---|---|---|
| Individual-level data methods | | |
| Two-stage least squares | | Commonly used method in IV analysis (Section 1). |
| Allele score | | Combine IVs into a single score, and use the score as a single IV in a two-stage least squares (or equivalently, ratio) method (Section 3.1). |
| Summarized data methods (uncorrelated IVs) | | |
| Allele score | 2 and 3 | The allele score estimate obtained using individual-level data can be approximated using summarized data (Section 3.2). |
| Summary statistic (inverse-variance weighted) | 4 and 5 | The summary statistic estimate combines the estimates from each IV in an inverse-variance weighted formula (Section 3.3). This estimate can also be motivated by weighted linear regression through the origin using the precisions of the IV associations with the outcome as weights. |
| Likelihood-based method | 6 | The likelihood-based method fits a model for the summarized data using either maximum likelihood or Bayesian methods for inference (Section 3.4). |
| Summarized data methods (correlated IVs) | | |
| Allele score | 2 and 8 | The allele score estimate with summarized data is not affected by correlation between the IVs, although the estimate's precision is altered (Section 4.1). |
| Summary statistic (inverse-variance weighted) | 4 and 9 | With correlated variants, the summary statistic formula can be used to test for a causal effect (although the standard error of the expression must be modified, Section 4.2), but it does not provide an estimate of the causal effect. |
| Weighted generalized linear regression | 10 and 11 | With correlated variants, a weighting matrix can be obtained using the standard errors of the IV associations with the outcome and the correlations between the variants. The coefficient from weighted generalized linear regression using this weighting matrix provides an estimate of the causal effect (Section 4.2). |
| Likelihood-based method | A1 | Correlation between summarized estimates can be incorporated into the likelihood model for the summarized data (Appendix A.3). |
2 Modelling assumptions In this paper, the situation of a continuous risk factor and a continuous outcome will be assumed, although the binary outcome case can be handled in a similar way. We assume that the causal effect of the risk factor on the outcome is linear and homogeneous in the population without effect modification. We also assume that the associations of the IVs with the risk factor are linear and homogeneous in the population without effect modification. As shown previously, these assumptions lead to the identification of the causal effect 6. These strong assumptions are not necessary for the estimation of a causal effect; alternative assumptions, such as monotonicity of the IV–risk factor association or no additive effect modification of the causal effect across levels of the instrument at different values of the risk factor, are able to identify a causal parameter 19. However, there is no guarantee that these weaker assumptions will ensure that the same causal effect is estimated by all IVs, particularly for the monotonicity assumption, which identifies a local average treatment effect 20. Hence, weaker assumptions may be tenable in some cases, but the homogeneity assumption is made in this paper. If the IV–risk factor and IV–outcome associations are estimated in different datasets (known as two-sample Mendelian randomization 21), we assume that these datasets are sampled from the same underlying population, such that the true association and causal parameters are equal in both datasets. We assume that association estimates used in Mendelian randomization analyses are not conditional on any covariates. 
If the outcome is continuous, then adjustment for covariates should not affect estimates asymptotically, provided that adjustment is performed uniformly across genetic variants, the covariates are not on the causal pathway from the IV to the outcome, and the IVs remain valid after conditioning on the covariates (so, for example, each IV is independent of confounders conditional on the covariates). If the outcome is binary and association estimates are obtained via logistic regression, then adjustment for covariates will affect estimates asymptotically as coefficients from logistic regression are non-collapsible 22. However, this should not affect the validity of causal findings, provided that the IVs are valid both marginally and conditionally on the covariates. In particular, adjustment for baseline covariates (such as age and sex) should not be an issue. A full discussion on adjustment of covariates in IV analysis is beyond the scope of this manuscript; further information is available elsewhere 23. Although these assumptions are restrictive, we note that even if these parametric assumptions are not satisfied, a Mendelian randomization investigation can still be interpreted as a test of the causal null hypothesis, even if the magnitude of the causal effect estimate does not have an interpretation 24, 25. Hence, while these assumptions are necessary to ensure the same causal effect parameter is identified by all IVs, and so that the methods provide consistent estimates of a causal parameter (even in a two-sample setting), causal inferences from the methods (that is, rejection or otherwise of the null hypothesis of no causal effect) are valid under much weaker assumptions. A causal estimate is nevertheless necessary to combine evidence on the causal effect across multiple IVs. However, the causal estimate could be regarded as a test statistic rather than an estimate. 
Causal estimates from Mendelian randomization in practice should not be regarded too literally, for example, because different mechanisms for intervention on the same risk factor are likely to lead to different magnitudes of causal effect 5. Practical issues with respect to the choice of datasets for two-sample Mendelian randomization are discussed elsewhere 18. In brief, participants in the two datasets should be as similar as possible, for example, with regard to ethnic origin, as otherwise it is more likely that the IV assumptions are invalid in one or other of the datasets. The reason for the particular emphasis on ethnic origin is that genetic variants used in Mendelian randomization are often not the ‘causal’ variant but rather are correlated with the true functional variant through linkage disequilibrium. As linkage disequilibrium patterns often differ between ethnic groups, it would seem prudent to ensure that associations were measured in ethnically homogeneous populations as far as possible. Additionally, if the minor allele frequencies of variants differ between ethnic groups (or other distinct populations or subpopulations), population stratification may bias results 26. In publicly available data from genome-wide association studies, it is common to adjust for genome-wide principal components to reduce the influence of population stratification 27. This adjustment generally has a large cumulative effect on association estimates across the genome but a small effect on the association estimates of individual variants. It therefore should not affect association estimates substantially. Hence, although the inclusion of participants of different ethnicities does not necessarily violate the IV assumptions, in such a case, special care should be taken to ensure that the IV assumptions are satisfied in participants of all ethnicities and that the magnitudes of associations and the frequencies of alleles are similar in all subpopulations. 
3 Uncorrelated instrumental variables

Initially, we consider the scenario where the IVs are uncorrelated.

3.1 Individual-level data allele score method

Most genetic variants used as IVs in Mendelian randomization are biallelic single nucleotide polymorphisms (SNPs) that can be represented as random variables taking the values 0, 1 or 2, denoting the number of risk factor-increasing alleles in the genotype of an individual. An unweighted allele score is constructed as the total number of risk factor-increasing alleles for an individual across multiple genetic variants. If an individual $i$ has $g_{ik}$ copies of the risk factor-increasing allele for each genetic variant $k = 1, \ldots, K$, then their unweighted score is $z_i = \sum_{k=1}^{K} g_{ik}$. This score takes integer values between 0 and $2K$. A weighted score can also be considered, in which each variant contributes a weight reflecting the effect of the corresponding genetic variant on the risk factor. If the weight for variant $k$ is $w_k$, then individual $i$ has a weighted score $z_i = \sum_{k=1}^{K} w_k g_{ik}$. Provided that the genetic variants that comprise the score are valid IVs, either score can then be used in an IV analysis.

Weights are typically taken as estimates of the associations of each IV in turn with the risk factor, obtained from univariate linear regression analyses. These associations may be estimated in the data under analysis, or in an independent dataset. If the weights in an allele score are derived from the data under analysis, then they will be the same asymptotically as the coefficients from a multivariable regression of the risk factor on the IVs (under the assumption that the IVs are uncorrelated). Values of the weighted score for each individual would therefore equal the fitted values of the risk factor from that regression (up to an additive constant), meaning that the allele score and (multivariable) 2SLS estimates would coincide 12.
In this case, the allele score estimate would suffer from the same weak instrument bias as the 2SLS estimate, and there is no benefit in using the allele score method. Better approaches are to estimate the weights using a cross-validation or jackknife approach 28, to pre-specify the weights using an external data source, or else (particularly if the variants have approximately equal effects on the risk factor) to use an unweighted score 12.

Under weak instrument asymptotics (the strength of instruments as measured by the concentration parameter – the expected value of the F statistic from regression of the risk factor on the IVs – remains fixed as the sample size increases), confidence intervals (CIs) from the 2SLS method using standard asymptotic approximations are overly narrow and coverage rates are below nominal levels 29. Under conventional asymptotics (the strength of instruments increases as the sample size increases), the 2SLS estimator is the most efficient combination of the ratio estimates based on the individual IVs [8, page 553], and coverage rates should tend towards nominal levels. If the weights in an allele score method tend towards the true associations of the IVs with the risk factor, then the allele score estimate will be as efficient asymptotically as the 2SLS estimate. If the weights do not tend towards the true associations, and in particular for an unweighted score, the allele score estimate will be asymptotically inefficient. However, if the true weights of all the IVs are similar, then an unweighted analysis may be more efficient than a weighted analysis in finite samples, as previously demonstrated in a simulation study 12.

3.2 Summarized data allele score method

We assume the context of a one-sample IV analysis in a single dataset with data on the risk factor ($X$), outcome ($Y$) and IVs ($G_1, \ldots, G_K$) in all participants.
We assume that the estimate of association for IV $k = 1, \ldots, K$ with the risk factor is $\hat{\beta}_{Xk}$ with standard error $\sigma_{Xk}$, and the estimate of association with the outcome is $\hat{\beta}_{Yk}$ with standard error $\sigma_{Yk}$. These estimates are typically obtained from linear regression (or logistic regression for associations with a binary outcome). Although the standard errors are estimated, we assume that they are known without error. This may lead to slightly overprecise estimates, but coverage levels have been shown to be close to nominal levels in realistic simulations 17.

With the weighted allele score ($z = \sum_k w_k G_k$) used as a single IV, and writing cov for the sample covariance and var for the sample variance, the IV estimate is

\[
\hat{\beta}_{AS} = \frac{\mathrm{cov}\left(\sum_k w_k G_k, Y\right)}{\mathrm{cov}\left(\sum_k w_k G_k, X\right)} = \frac{\sum_k w_k \hat{\beta}_{Yk} \, \mathrm{var}(G_k)}{\sum_k w_k \hat{\beta}_{Xk} \, \mathrm{var}(G_k)} \tag{1}
\]

as the association estimates are calculated as $\hat{\beta}_{Xk} = \mathrm{cov}(G_k, X) / \mathrm{var}(G_k)$ for each $k = 1, \ldots, K$ (similarly for each $\hat{\beta}_{Yk}$). The weights $w_k$ are assumed to be pre-specified and are typically taken as the association estimates of each IV with the risk factor in an independent dataset. If the IVs explain a small proportion of variance in the outcome, then $\mathrm{var}(G_k)$ is approximately proportional to $\sigma_{Yk}^{-2}$, and so the allele score estimate based on summarized data ($\hat{\beta}_{SSw}$) is

\[
\hat{\beta}_{SSw} = \frac{\sum_k w_k \hat{\beta}_{Yk} \sigma_{Yk}^{-2}}{\sum_k w_k \hat{\beta}_{Xk} \sigma_{Yk}^{-2}} \tag{2}
\]

We note that at no point in this calculation have we made use of the fact that the genetic variants are uncorrelated. With equal weights, this is equivalent to performing separate inverse-variance weighted meta-analyses of the genetic associations with the outcome and of the genetic associations with the risk factor (as the parameters $\sigma_{Xk}^{-2}$ are approximately proportional to $\sigma_{Yk}^{-2}$) and then taking the ratio of the pooled estimates. Even in this unweighted case, the directions of the IV associations with the risk factor are required to be specified, even if the magnitudes of the associations are unknown.
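The calculation of the allele score estimate from summarized data (equation 2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' sample code: the function name `allele_score_estimate` and all numerical values are hypothetical.

```python
def allele_score_estimate(beta_x, beta_y, se_y, weights):
    """Allele score estimate from summarized data.

    Each variant contributes w_k * beta_Yk / se_Yk^2 to the numerator and
    w_k * beta_Xk / se_Yk^2 to the denominator; the weights are taken as
    pre-specified (for example, equal or externally derived).
    """
    numerator = sum(w * by / sy ** 2
                    for w, by, sy in zip(weights, beta_y, se_y))
    denominator = sum(w * bx / sy ** 2
                      for w, bx, sy in zip(weights, beta_x, se_y))
    return numerator / denominator

# Hypothetical summarized associations for three variants.
beta_x = [0.10, 0.20, 0.15]    # IV-risk factor associations
beta_y = [0.025, 0.036, 0.033]  # IV-outcome associations
se_y = [0.01, 0.01, 0.02]       # standard errors of the IV-outcome associations

# Unweighted score: every variant has weight 1 (directions already aligned).
unweighted = allele_score_estimate(beta_x, beta_y, se_y, [1.0, 1.0, 1.0])

# Externally weighted score: hypothetical weights from an independent dataset.
external = allele_score_estimate(beta_x, beta_y, se_y, [0.11, 0.19, 0.16])
```

The two calls differ only in the weights, which is the sense in which the choice of weights can be varied without access to individual-level data.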
There have been reports of genetic variants having different directions of association with a risk factor in different datasets 30; however, the majority of these instances were in populations of different ethnic origins, emphasizing the need to use ethnically homogeneous populations in Mendelian randomization and in two-sample analysis in particular.

The asymptotic standard error of the allele score estimate with uncorrelated variants (equation 2) can be approximated from summarized data using a delta method 31:

\[
\mathrm{se}(\hat{\beta}_{SSw}) \approx \sqrt{ \frac{\sum_k w_k^2 \sigma_{Yk}^{-2}}{\left(\sum_k w_k \hat{\beta}_{Xk} \sigma_{Yk}^{-2}\right)^2} + \frac{\left(\sum_k w_k \hat{\beta}_{Yk} \sigma_{Yk}^{-2}\right)^2 \sum_k w_k^2 \sigma_{Xk}^2 \sigma_{Yk}^{-4}}{\left(\sum_k w_k \hat{\beta}_{Xk} \sigma_{Yk}^{-2}\right)^4} - \frac{2 \theta_S \left(\sum_k w_k \hat{\beta}_{Yk} \sigma_{Yk}^{-2}\right) \sqrt{\sum_k w_k^2 \sigma_{Yk}^{-2}} \sqrt{\sum_k w_k^2 \sigma_{Xk}^2 \sigma_{Yk}^{-4}}}{\left(\sum_k w_k \hat{\beta}_{Xk} \sigma_{Yk}^{-2}\right)^3} } \tag{3}
\]

where $\theta_S$ is the correlation between the numerator and denominator in equation 2. This correlation can be estimated by bootstrapping with individual-level data, or else specified as the observed correlation between the risk factor and the outcome (a sensitivity analysis for the value is advised). In two-sample Mendelian randomization, this correlation is zero. If the genetic associations with the risk factor are precisely estimated, then the first term will dominate this expression.

3.3 Summary statistic (inverse-variance weighted) method

The summary statistic estimate is calculated using summarized data on the associations of each IV with the risk factor and with the outcome. If the estimates are taken from the same individuals, this is a one-sample IV analysis; if the estimates are from non-overlapping groups, this is a two-sample analysis 21. The ratio method estimate of the causal effect of the risk factor on the outcome using IV $k$ is $\hat{\beta}_{Yk} / \hat{\beta}_{Xk}$. The asymptotic standard error of this estimate, derived from the first term of the delta method expansion for the ratio of two random variables 31, is $\sigma_{Yk} / \hat{\beta}_{Xk}$.
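The two standard error calculations described above can be sketched as follows. The code applies the standard first-order delta method for the variance of a ratio under the assumption of uncorrelated variants; the function names, the argument `theta` (standing for the correlation between numerator and denominator) and the example values are hypothetical.

```python
import math

def ratio_estimates(beta_x, beta_y, se_y):
    """Per-variant ratio estimates with first-order standard errors."""
    return [(by / bx, sy / abs(bx))
            for bx, by, sy in zip(beta_x, beta_y, se_y)]

def allele_score_se(beta_x, beta_y, se_x, se_y, weights, theta=0.0):
    """Delta-method standard error of the summarized-data allele score estimate.

    theta is the assumed correlation between the numerator and denominator;
    theta = 0 corresponds to a two-sample analysis.
    """
    num = sum(w * by / sy ** 2 for w, by, sy in zip(weights, beta_y, se_y))
    den = sum(w * bx / sy ** 2 for w, bx, sy in zip(weights, beta_x, se_y))
    # Variances of the numerator and denominator for uncorrelated variants.
    var_num = sum(w ** 2 / sy ** 2 for w, sy in zip(weights, se_y))
    var_den = sum(w ** 2 * sx ** 2 / sy ** 4
                  for w, sx, sy in zip(weights, se_x, se_y))
    variance = (var_num / den ** 2
                + num ** 2 * var_den / den ** 4
                - 2 * theta * num * math.sqrt(var_num * var_den) / den ** 3)
    return math.sqrt(variance)

# Hypothetical summarized data for three variants, two-sample setting.
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.025, 0.036, 0.033]
se_x = [0.01, 0.01, 0.01]
se_y = [0.01, 0.01, 0.02]
per_variant = ratio_estimates(beta_x, beta_y, se_y)
score_se = allele_score_se(beta_x, beta_y, se_x, se_y, [1.0, 1.0, 1.0])
```

Setting `se_x` to zero leaves only the first term of the variance, which matches the remark that this term dominates when the associations with the risk factor are precisely estimated.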
Using the formula for combining estimates in a fixed-effect inverse-variance weighted meta-analysis 32, the summary statistic estimate can be calculated as

\[
\hat{\beta}_{SS} = \frac{\sum_k \hat{\beta}_{Xk} \hat{\beta}_{Yk} \sigma_{Yk}^{-2}}{\sum_k \hat{\beta}_{Xk}^2 \sigma_{Yk}^{-2}} \tag{4}
\]

The approximate asymptotic standard error of the summary statistic estimate is

\[
\mathrm{se}(\hat{\beta}_{SS}) = \sqrt{\frac{1}{\sum_k \hat{\beta}_{Xk}^2 \sigma_{Yk}^{-2}}} \tag{5}
\]

This method was previously referred to as an ‘inverse-variance weighted’ method 17; this refers to the weights in the meta-analysis formula rather than the weights in the allele score. This estimate can also be motivated as the coefficient from a weighted linear regression of the $\hat{\beta}_{Yk}$ on the $\hat{\beta}_{Xk}$ without an intercept term, using the $\sigma_{Yk}^{-2}$ as weights. The standard error from an inverse-variance weighted linear regression in conventional statistical software is often incorrect and has to be modified by forcing the residual standard error to be unity; this can be achieved by dividing the reported standard error by the residual standard error in the regression analysis 33.

If the weights $w_k$ in equation 2 are set to $\hat{\beta}_{Xk}$, then the summary statistic estimate $\hat{\beta}_{SS}$ equals the allele score estimate using summarized data $\hat{\beta}_{SSw}$, as previously noted 16. In this case, the standard error in equation 5 equals the first term in equation 3. For other choices of weights, the estimates and standard errors will differ.

3.4 Likelihood-based method

A likelihood-based method has also been proposed, in which the IV associations with the risk factor and with the outcome for each IV are modelled directly by a bivariate normal distribution, with correlation $\theta_L$ assumed to be the same for each IV:

\[
\begin{pmatrix} \hat{\beta}_{Xk} \\ \hat{\beta}_{Yk} \end{pmatrix} \sim \mathcal{N}_2 \left( \begin{pmatrix} \xi_k \\ \beta_L \xi_k \end{pmatrix}, \begin{pmatrix} \sigma_{Xk}^2 & \theta_L \sigma_{Xk} \sigma_{Yk} \\ \theta_L \sigma_{Xk} \sigma_{Yk} & \sigma_{Yk}^2 \end{pmatrix} \right) \tag{6}
\]

where $\beta_L$ is the causal parameter. The IV–risk factor association estimates are the implicit ‘weights’ in this method. They could be obtained from the dataset under analysis or from an independent dataset (a two-sample analysis). The standard errors of the association estimates $\sigma_{Xk}$ and $\sigma_{Yk}$ are used to specify the variance–covariance matrix for the normal distribution and as before are assumed to be known.
Model parameters ($\beta_L$ and $\xi_k$, $k = 1, \ldots, K$) can be estimated either by numerical maximization of the log-likelihood function or in a Bayesian framework 34. Standard errors for maximum-likelihood estimates can be obtained using the inverse-Hessian matrix. The correlation $\theta_L$ is due to the IV associations with the risk factor and with the outcome being estimated in the same data. There is likely to be little information on this parameter in the data 35, and so it may be best specified in the analysis as the observational correlation between the risk factor and outcome; a sensitivity analysis can be performed to assess the effect of varying the parameter value on the causal estimate. In a two-sample IV analysis, the correlation $\theta_L$ will be zero.

3.5 Simulation study

We investigate the properties of estimates from these methods in a simulation study with two specific goals. The first is to demonstrate that estimates from the allele score method are similar whether they are calculated using individual-level or summarized data. The second is to compare the behaviour of the summarized data methods (allele score, summary statistic and likelihood-based) in a two-sample setting with external weights. We have previously shown that the summary statistic and likelihood-based methods give similar estimates and standard errors in a one-sample setting 17. If individual-level data were available in a one-sample setting, several other IV methods that are robust to weak instruments could be used, such as limited information maximum likelihood and the continuous updating estimator 36. However, we are unaware of extensions of these methods to a two-sample setting.
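As a rough sketch of how the summary statistic estimate (equations 4 and 5) and the likelihood-based estimate (equation 6) might be computed in a two-sample setting, the Python code below implements both. Several simplifications are assumptions of this sketch rather than details taken from the paper: the correlation is fixed at zero, each nuisance parameter is maximized out analytically to give a profile likelihood in the causal parameter alone, and the maximum is located by a golden-section search within ten standard errors of the summary statistic estimate. All function names and data values are hypothetical.

```python
import math

def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted summary statistic estimate and standard error."""
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)

def profile_neg_loglik(beta, beta_x, beta_y, se_x, se_y):
    """Negative profile log-likelihood (up to a constant) with zero correlation.

    Maximizing the bivariate normal likelihood over each variant-specific
    mean leaves (beta_Yk - beta * beta_Xk)^2 / (se_Yk^2 + beta^2 * se_Xk^2).
    """
    return 0.5 * sum((by - beta * bx) ** 2 / (sy ** 2 + beta ** 2 * sx ** 2)
                     for bx, by, sx, sy in zip(beta_x, beta_y, se_x, se_y))

def golden_section_minimize(f, lower, upper, tol=1e-10):
    """Minimize a function assumed unimodal on [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def likelihood_estimate(beta_x, beta_y, se_x, se_y):
    """Profile-likelihood estimate, searched around the IVW estimate."""
    centre, se = ivw_estimate(beta_x, beta_y, se_y)
    return golden_section_minimize(
        lambda b: profile_neg_loglik(b, beta_x, beta_y, se_x, se_y),
        centre - 10.0 * se, centre + 10.0 * se)

# Hypothetical two-sample summarized data for four variants.
beta_x = [0.10, 0.20, 0.15, 0.08]
beta_y = [0.025, 0.036, 0.033, 0.012]
se_x = [0.01, 0.01, 0.01, 0.01]
se_y = [0.01, 0.01, 0.02, 0.015]
ivw, ivw_se = ivw_estimate(beta_x, beta_y, se_y)
lik = likelihood_estimate(beta_x, beta_y, se_x, se_y)
```

The sketch returns only a point estimate for the likelihood-based method; a standard error would require the curvature of the log-likelihood at its maximum, which is omitted here.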
Data on 5000 individuals were generated from the following model in which, for subject $i$, $x_i$ is the risk factor of interest, $u_i$ a confounder, $y_i$ the outcome, and $g_{ik} = 0, 1, 2$ is the $k$th IV ($k = 1, \ldots, K$), representing the number of risk factor-increasing alleles of a genetic variant: (7) The causal effect of the risk factor on the outcome is taken as $\beta_X = 0.2$ throughout. The risk factor-increasing allele frequency $\pi_k$ and strength of association of the $k$th IV with the risk factor $\alpha_k$ are allowed to vary between the IVs. We set $\alpha = 0.05, 0.1, 0.2$, and consider scenarios for $K = 15$ IVs with positive ($\beta_U = +1$) and negative ($\beta_U = -1$) confounding. As genetic variants are defined arbitrarily with respect to either the risk factor-increasing or risk factor-decreasing allele, the restriction to consider only positive values of $\alpha_k$ does not result in any loss of generality. The mean proportion of variance in the risk factor explained by the IVs varies from 1.0% to 10.2%, corresponding to mean F statistics from 3.3 to 37.9. 10000 simulations were undertaken for each set of parameter values.

In addition to crude weights (weights estimated naively from the data under analysis using univariate regression of the risk factor on each IV in turn, $w_k = \hat{\beta}_{Xk}$) and equal weights ($w_k = 1$), we also consider external weights, corresponding to a two-sample IV analysis. The external weight for the $k$th IV is generated by sampling from a normal distribution with mean $\alpha_k$ and variance . This is equivalent to estimating genetic associations with the risk factor in a separate dataset of size $N$ generated under the same model (equation 7). Although in practice, the sample size used for obtaining external weights is often larger than that for the main analysis, the weights will be obtained in a slightly different study population and so may not be entirely appropriate for the data under analysis.
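The three choices of weights can be illustrated with a small simulation sketch in Python. The data-generating details below are assumptions made for this example and may differ from the model used in the paper: the confounder and both error terms are standard normal, the confounder enters the risk factor with coefficient 1, allele frequencies are drawn uniformly between 0.1 and 0.9, and every variant has the same effect on the risk factor. External weights are obtained by the route the text describes as equivalent, namely estimation in a separate dataset generated under the same model. All function names are hypothetical.

```python
import random

def slope(g, trait):
    """Slope from a univariate linear regression of a trait on one variant."""
    n = len(g)
    mean_g = sum(g) / n
    mean_t = sum(trait) / n
    cov = sum((a - mean_g) * (b - mean_t) for a, b in zip(g, trait))
    var = sum((a - mean_g) ** 2 for a in g)
    return cov / var

def simulate_dataset(rng, n, alphas, freqs, beta_x=0.2, beta_u=1.0):
    """Simulate genotypes, risk factor and outcome (assumed model, see lead-in)."""
    genotypes, risk_factor, outcome = [], [], []
    for _ in range(n):
        g = [(rng.random() < p) + (rng.random() < p) for p in freqs]
        u = rng.gauss(0.0, 1.0)  # confounder
        x = sum(a * gk for a, gk in zip(alphas, g)) + u + rng.gauss(0.0, 1.0)
        y = beta_x * x + beta_u * u + rng.gauss(0.0, 1.0)
        genotypes.append(g)
        risk_factor.append(x)
        outcome.append(y)
    return genotypes, risk_factor, outcome

def univariate_associations(genotypes, trait):
    """Regress the trait on each variant in turn."""
    n_variants = len(genotypes[0])
    return [slope([row[k] for row in genotypes], trait)
            for k in range(n_variants)]

rng = random.Random(2015)
K = 15
alphas = [0.1] * K                                  # assumed equal effects
freqs = [rng.uniform(0.1, 0.9) for _ in range(K)]   # assumed allele frequencies

# Main dataset: crude weights are estimated in the data under analysis.
main_g, main_x, main_y = simulate_dataset(rng, 5000, alphas, freqs)
crude_weights = univariate_associations(main_g, main_x)

# Independent dataset under the same model: source of the external weights.
ext_g, ext_x, _ = simulate_dataset(rng, 5000, alphas, freqs)
external_weights = univariate_associations(ext_g, ext_x)

equal_weights = [1.0] * K
```

Reducing the size of the independent dataset makes the external weights noisier, which is one way of representing weights that are less well suited to the data under analysis.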
Less appropriate weights can be modelled by simulating additional random error in the weights (or equivalently by using a smaller sample size), although in practice there may be systematic as well as random