There are many things in which Operations Management (OM) researchers can take pride. Since the inception of empirical OM, we have rigorously incorporated measurement reliability and validity into our analyses. In many respects, the OM literature is a few steps ahead of its sister disciplines; incorporating measurement error into analyses is perhaps the best example. We have also made considerable progress in theory development, whether by way of case research or purely conceptual and theoretical analysis. Finally, recent developments in problem solving and design science demonstrate that OM scholars are genuinely interested in solving actual managerial problems and remaining practically relevant. These are all reasons to celebrate the progress in empirical OM. But there are also a number of blind spots, many of which continue to be reasons for rejection in the manuscript review process. The purpose of this editorial is to describe some of these issues. Specifically, there are a number of misunderstandings about some of the key methods used in manuscripts submitted to us. There are also some outdated practices that we want to discourage authors from using. These issues are discussed below in roughly descending order of importance.

We all know correlation does not establish causality. It is high time we did something about this. We constantly receive manuscripts, based on cross-sectional surveys in particular, in which the authors make causal claims. We no longer send out for review manuscripts that uncritically interpret a cross-sectional correlation between X and Y as support for a causal claim or, more mildly, as evidence that the variance of X drives the variance of Y. This applies to both econometric and structural equation models. The problem with assuming that the variance of X drives the variance of Y is well documented.
Ignoring the problem often results in over-permissive tests of substantive hypotheses: we see evidence for our hypotheses even when there is none. We now require all authors to take steps, theoretical or empirical and preferably both, to address the problem of endogeneity. This is now standard practice in most top-tier management journals, and it is time for JOM, as a premier operations management journal, to follow suit. The literature on endogeneity is massive, going back almost a hundred years. Roberts and Whited (2013) offer a comprehensive summary of the key issues in the context of corporate finance research, and all the issues they discuss apply directly to OM research as well.

In a nutshell, the problem of endogeneity is this: when a researcher uses non-experimental data to test the hypothesis that X has an effect on Y, it is possible that the variance of X is not exogenous but endogenous to the model, that is, X is correlated with the error term of Y. The end result is a misspecified model. This applies not just to cross-sectional but also to longitudinal research. Even if X is measured at t-1 and Y at t, there could be an unobserved variable Z that affects both X at t-1 and Y at t. In a recent manuscript submitted to us, the authors hypothesized that organizational integration drives employee commitment. Integration was assumed exogenous to commitment. This is a very problematic assumption, because there are many reasons to believe commitment could just as easily drive integration, which would make the variance of organizational integration endogenous to the model. The consequence of endogeneity is asymptotic bias in parameter estimates. We must come to terms with the fact that plausible claims about the direction and magnitude of an effect cannot rest on an analysis that ignores endogeneity completely. If our inferences are to be biased, they need to be biased toward being conservative. Endogeneity often has just the opposite effect: it inflates our results.
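A small simulation makes the omitted-variable version of the problem concrete. The sketch below is a minimal illustration under assumed conditions (standard normal variables, linear effects, hypothetical parameter values, Python standard library only): X has no effect whatsoever on Y, yet because an unobserved Z drives both, the least-squares slope of Y on X comes out near 0.5. The same reasoning applies when X is measured at t-1 and Y at t, as long as Z persists across the two periods.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (single predictor with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_confounded(n=50_000, beta=0.0, seed=1):
    """Hypothetical data: X's true effect on Y is beta; an unobserved Z drives both."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                      # unobserved common cause
        xi = z + rng.gauss(0, 1)                 # Z pushes X up...
        yi = beta * xi + z + rng.gauss(0, 1)     # ...and Y up, independently of X
        x.append(xi)
        y.append(yi)
    return x, y

x, y = simulate_confounded()
slope = ols_slope(x, y)
print(f"estimated effect of X on Y: {slope:.2f} (true effect: 0.00)")  # about 0.50
```

Collecting more data only makes the wrong answer more precise: 0.5 is the value the estimator converges to under this setup, not sampling noise.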
We are not aware of any scientific principle that warrants over-permissive inference. Examination of endogeneity starts with a simple question: What is the source of the variance in the exogenous variables in my model? So far, JOM authors have been allowed simply to declare that these sources are exogenous to the model. From now on, authors must take steps toward either demonstrating exogeneity or correcting for endogeneity. Both approaches share a common denominator: they call for addressing assumptions that have thus far gone untested. Endogeneity can probably never be completely eliminated from empirical analysis, and it is well known that many “solutions” create more problems than they solve (Murray, 2006). But there is no good reason to avoid tackling the issue, at least theoretically. If endogeneity cannot be addressed empirically, by testing for it or by mitigating it with instrumental variables or an experimental research design (Roberts and Whited, 2013), we expect at least a theoretical treatment of the topic in every JOM submission that makes the general claim that one variable induces variance in another. When arguing that the variance of X gives rise to the variance of Y (causally or otherwise), we expect to see a plausible argument that the direction is indeed from X to Y, not vice versa, and that the association is not produced by an omitted variable. Measurement error can also cause an endogeneity problem: if X and Y share a common source of measurement error, X will unavoidably correlate with the error term of Y. Finally, sample selection bias can lead to problems very similar to those of endogeneity (Heckman, 1979).

While there is definitely a time and a place for cross-sectional research, we strongly encourage cross-sectional researchers to rethink their research designs.
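Where a credible instrument exists, the instrumental-variable logic mentioned above can be sketched in a few lines. This is a hypothetical, just-identified setup assumed purely for illustration: an instrument W shifts X but is assumed to reach Y only through X and to be unrelated to the unobserved confounder U. Under those assumptions, which in real data must be argued rather than taken for granted, the ratio cov(W, Y) / cov(W, X) recovers the true effect (0.3 here), while OLS does not.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def simulate(n=50_000, beta=0.3, seed=2):
    """Hypothetical data with a valid instrument W and an unobserved confounder U."""
    rng = random.Random(seed)
    w, x, y = [], [], []
    for _ in range(n):
        wi = rng.gauss(0, 1)                     # instrument: affects X, no direct path to Y
        ui = rng.gauss(0, 1)                     # unobserved confounder of X and Y
        xi = wi + ui + rng.gauss(0, 1)
        yi = beta * xi + ui + rng.gauss(0, 1)
        w.append(wi)
        x.append(xi)
        y.append(yi)
    return w, x, y

w, x, y = simulate()
ols = cov(x, y) / cov(x, x)   # absorbs the confounder's effect (about 0.63 here)
iv = cov(w, y) / cov(w, x)    # simple IV (Wald) estimator for one instrument, one regressor
print(f"OLS {ols:.2f}  IV {iv:.2f}  (true effect 0.30)")
```

If W had its own path to Y, or were correlated with U, the IV estimate would be biased as well; this is one way in which a “solution” can create new problems (Murray, 2006).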
We all know how difficult it is to obtain longitudinal data, but prospective JOM authors must push themselves on this issue and try to fix at least some of the problems of past research by getting out of their comfort zone. If we want to know the magnitude of the effect X has on Y, cross-sectional data are almost guaranteed not to give us a valid estimate. Not only the principles of scientific rigor but also those of practical relevance demand that we get the magnitude right.

Many authors continue to build their arguments on the premise that applying statistical inference boils down to rule following. One of the most common “rules of thumb” in manuscripts submitted to us is the claim that a measure is internally consistent if Cronbach's alpha exceeds .70. Nunnally's book Psychometric Theory (Nunnally and Bernstein, 1994) is typically cited as the source. But attributing the rule to Nunnally is a tell-tale sign that one has not actually read Nunnally, because if anything, he claimed just the opposite: the criteria for adequate reliability always depend on the context. Lance et al. (2006) unambiguously debunk the “.70 rule.” The technical details of the argument can be found in the works cited in this editorial; there is no need to reproduce them here. If a manuscript submitted to us makes extensive use of unsubstantiated, non-inferential “rules of thumb,” we will desk reject it.

We say non-inferential because it is crucial to distinguish between rules that link directly to an inferential test and those that do not. Model fit in structural equation modeling is a good example. Consider two common tools for assessing model fit: the omnibus chi-square test and the Comparative Fit Index (CFI). The chi-square test is an inferential procedure: if the statistic is statistically significant, the model does not fit the data, in the sense that the observed and model-implied covariance matrices do not match. This is solid inference and methodologically acceptable reasoning.
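For reference, the statistic at the center of the “.70 rule” is easy to compute. The sketch below implements the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of the sum score), on a small made-up data set (five respondents, three items; all numbers are hypothetical). Note that the function returns a number, not a decision: nothing in the computation says whether the value is adequate for a given use.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(rows[0])                                # number of items
    items = list(zip(*rows))                        # transpose rows into item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])    # variance of the sum score
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical 1-5 ratings: five respondents (rows) by three items (columns)
scores = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 2, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.3f}")  # alpha = 0.949
```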
But the claim that a CFI above 0.95 means the model fits the data is not. This is because the CFI is a descriptive index, not a test statistic with a commensurate inferential test. A high CFI value simply means that the focal model fits the data better than the baseline model. What is the baseline model? It is typically the model in which all measured variables are assumed uncorrelated. Using such a baseline is dubious, because we already know it fits the data horribly. All the CFI thus tells you is that your model fits the data better than a model that does not fit the data at all. It is difficult to see the insight in this conclusion. Lance et al. (2006) discuss the issue in detail, and Tanaka (1993) provides structural equation modelers with a great overview of SEM model fit.

The two procedures can be contrasted by writing out the hypotheses each implies. For internal consistency:

H0: the measurement instrument is not internally consistent
H1: the measurement instrument is internally consistent

Authors use the “alpha > .70” rule to reject the null, but this is not an inferential test; it is merely a social convention with no methodological basis. What is more, applying the rule misconstrues what methodological texts have actually said. For model fit:

H0: the model fits the data
H1: the model does not fit the data

The chi-square omnibus test fares much better than the “alpha > .70” rule. The chi-square test is a valid inferential procedure (it produces a p-value). A rigorous modeler would consider the fact that the null states that the model fits the data, which means that low statistical power works to the advantage of the model, not against it. This makes the test over-permissive, and sometimes this can present a problem. A skillful researcher is able to examine whether or not this is cause for concern. Of course, it is also possible to reformulate the null and alternative hypotheses so that over-permissiveness is not a problem.
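The contrast between the two tools can be illustrated numerically. The sketch assumes hypothetical values for a model with eight observed variables: the focal model has chi-square = 60 on 20 degrees of freedom, and the usual independence baseline has chi-square = 2000 on 28 degrees of freedom. The CFI follows its standard formula; the p-value uses the exact closed-form chi-square tail probability, which holds only for even degrees of freedom (a restriction chosen here to keep the example within the Python standard library).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x); this closed form is exact only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index of a focal model M relative to a baseline model B."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)   # d_m >= 0, so d_b >= 0 as well
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b

# Hypothetical values: 8 observed variables, so the independence baseline has 28 df
cfi_value = cfi(chi2_m=60.0, df_m=20, chi2_b=2000.0, df_b=28)
p_value = chi2_sf_even_df(60.0, 20)
print(f"CFI = {cfi_value:.3f}")   # CFI = 0.980
print(f"p   = {p_value:.1e}")     # p   = 7.1e-06
```

The same model that clears the “CFI > 0.95” convention is decisively rejected by the chi-square test; the CFI looks good mainly because the baseline fits so badly.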
As far as author requirements are concerned, authors of JOM submissions must exhibit an understanding of which rules have a basis in formal statistical inference and which do not. Here, a simple litmus test works very well: Does the procedure I am using produce a test statistic (with a p-value) or not? At the very minimum, we expect authors to know which rules are simply “urban legends.” This is crucial, because many of the cutoff criteria cited by OM researchers have been thoroughly discredited in the methods literature (Cortina, 2002; Lance, 2011; Lance et al., 2006; Lance and Vandenberg, 2008; Spector and Brannick, 2011). Prospective JOM authors must make themselves aware of this important literature. Citing an urban legend will likely lead to desk rejection of the manuscript.

Instead of relying on “rules of thumb,” we encourage authors to contextualize their measurement. Indeed, this is what methodological authorities such as Nunnally actually recommend (e.g., Nunnally and Bernstein, 1994, p. 249). Suppose you are estimating a regression model with two explanatory variables (x1 and x2) and a dependent variable (y), and you are assessing measurement reliability. By contextualization we mean asking the question: How does measurement error in my variables affect estimation? It is well known that measurement error in an independent variable is more problematic than measurement error in the dependent variable. Measurement error in the dependent variable can be thought of as one component of the regression error term, so it is implicitly already modeled. Its statistical consequence is a loss of efficiency, which typically does not create problems, particularly if the sample is of reasonable size. Measurement error in the independent variables, in contrast, is likely to cause asymptotic bias in the estimates.
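The asymmetry between error in the dependent and the independent variable can be checked by simulation. This is a minimal sketch under assumed conditions: one predictor, a true slope of 1, standard normal true scores, and classical (random, uncorrelated) measurement error calibrated so that the observed predictor's reliability is .70, the value the “.70 rule” would accept. Adding the same error variance to Y leaves the slope essentially unbiased, whereas the same error in X pulls the slope toward zero by a factor equal to the reliability.

```python
import math
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (single predictor with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_slopes(reliability=0.70, n=100_000, beta=1.0, seed=3):
    """Return (slope with error in X only, slope with the same error variance in Y only)."""
    rng = random.Random(seed)
    # With var(true X) = 1, reliability = 1 / (1 + error variance)
    err_sd = math.sqrt(1.0 / reliability - 1.0)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    y_true = [beta * xt + rng.gauss(0, 1) for xt in x_true]
    x_noisy = [xt + rng.gauss(0, err_sd) for xt in x_true]
    y_noisy = [yt + rng.gauss(0, err_sd) for yt in y_true]
    return ols_slope(x_noisy, y_true), ols_slope(x_true, y_noisy)

slope_err_x, slope_err_y = simulate_slopes()
print(f"error in X: {slope_err_x:.2f}")  # close to 0.70: attenuated by the reliability
print(f"error in Y: {slope_err_y:.2f}")  # close to 1.00: unbiased, just noisier
```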
Although there are no hard and fast rules on its consequences, the resulting bias is roughly proportional to the amount of measurement error (Kennedy, 2008; Maddala, 1988). Increasing the sample size does not fix the problem, because the bias is asymptotic. How many authors citing the “alpha > .70 rule” for an independent variable realize that they are implicitly admitting that an asymptotic bias of up to 30 percent in a parameter estimate is acceptable? How much sense does it make to report parameter estimates to three significant digits when even the first digit is likely wrong?

The Variance Inflation Factor (VIF), used to diagnose multicollinearity, is a perfect example of a lack of contextualization. There is nothing wrong with the VIF itself, but every generic cutoff recommended for it should simply be ignored. In short, the VIF tells the researcher how much the variance of a parameter estimate has been inflated by collinearity among the predictors. But the VIF value has no meaning until one has looked at the magnitude of the variances of the parameter estimates. In large samples, the variances of the estimates are very small; therefore, even a tenfold increase (VIF = 10) may not present any significant problem. In a small sample, a doubling of the variance (VIF = 2) may already be cause for concern. All generally recommended cutoffs that ignore sample size are nonsense. In general, statistics experts (at least sensible ones) never give recommendations without incorporating the context. If you ask an expert on estimation theory which estimator you should use in your model, you will not get an answer until you have described in detail your model, your data, the distributions of your variables, and the extent to which you believe your model is correctly specified. Should SEM researchers start using Bayesian estimators (BSEM), for instance?
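The dependence of the VIF on context can be shown with the textbook expression for a slope's sampling variance, Var(b_j) = sigma^2 / ((n - 1) s_j^2) × VIF_j, where VIF_j = 1 / (1 - R_j^2). The sketch assumes two standardized predictors (so R_j^2 is simply their squared correlation) and a unit error variance; both are illustrative choices. Under these assumptions, a VIF of 10 in a sample of 10,000 leaves the standard error several times smaller than a VIF of 2 in a sample of 50.

```python
import math

def vif_two_predictors(r12):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r12**2)."""
    return 1.0 / (1.0 - r12 ** 2)

def slope_se(vif, n, sigma2=1.0, var_x=1.0):
    """Standard error of a slope: sqrt(sigma2 / ((n - 1) * var_x) * vif)."""
    return math.sqrt(sigma2 / ((n - 1) * var_x) * vif)

vif_large_n = vif_two_predictors(math.sqrt(0.9))   # r^2 = 0.9 -> VIF = 10
vif_small_n = vif_two_predictors(math.sqrt(0.5))   # r^2 = 0.5 -> VIF = 2
se_large_n = slope_se(vif_large_n, n=10_000)
se_small_n = slope_se(vif_small_n, n=50)
print(f"VIF {vif_large_n:.0f}, n = 10000: SE = {se_large_n:.3f}")  # SE = 0.032
print(f"VIF {vif_small_n:.0f}, n = 50:    SE = {se_small_n:.3f}")  # SE = 0.202
```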
Here is what the architect of one of the most commonly used estimators had to say only a few years ago: “Much more experience is needed… More needs to be learned about the performance of BSEM parameter posterior estimation using different informative priors for different types of models, sample sizes, and variable distributions” (Muthén and Asparouhov, 2012, p. 333). The fundamental problem with using cutoffs for reliability and validity is that they turn measurement questions into yes/no issues, when it should be obvious that most issues involving numbers are matters of degree. It is time to embrace this premise in empirical analysis.

We understand that OM scholars are not statisticians, but one needs to understand the tools one uses to an appreciable depth. The most alarming example is the continuing use of Partial Least Squares (PLS) modeling. We are desk rejecting practically all PLS-based manuscripts, because we have concluded that PLS has, without exception, been the wrong modeling approach for the kinds of models OM researchers use. Most of the time, the use of PLS is (incorrectly) justified by claiming that PLS is suitable for small samples, that it should be used when a measurement model has formative indicators, or that it is suitable when the Maximum Likelihood estimator fails to converge to a solution. All of these are poor excuses for using PLS. Claiming that PLS fixes problems or overcomes shortcomings associated with other estimators is an indirect admission that one does not understand PLS. Consequently, we will automatically desk reject a manuscript that makes incorrect claims about the applicability of an estimator (any estimator, not just PLS). After all, choosing the right estimator is one of the most important steps in statistical inference. The primary prescription is simple: never use an estimator or a modeling method you do not understand.
As far as PLS is concerned, there has been a great deal of discussion in the methods literature in recent years; there is no need to reproduce the technical details here. If you think PLS is the appropriate estimator, take a look at Marcoulides and Chin (2013), Rönkkö and Evermann (2013), and McIntosh et al. (2014). If, after reading these articles, you are still convinced PLS is a suitable estimator for your model, we welcome your PLS-based analysis to the journal. In your submission, clearly justify the use of PLS in light of the three articles cited above. This obviously applies to all estimators and modeling techniques. Many statistical programs (such as Stata and Mplus) offer at least a dozen estimators from which an author can choose. This choice must be made transparently, and it is transparent when the author clearly discusses both the strengths and the weaknesses of the chosen estimator. Not a single PLS manuscript submitted to JOM has discussed the weaknesses of the estimator. Authors should always avoid rhetoric such as “expert X has suggested that estimator Y be used.” Such rhetorical appeals must be replaced with methodological justification.

Ketokivi and Schroeder (2004) showed that common method bias cannot be adequately addressed in survey research unless one uses multiple informants per observational unit. Yet many authors of single-informant studies make strong claims that common method bias is not a concern in their study. Such claims are dubious, because the effect of one of the well-known sources of potential bias, the informant (e.g., Campbell and Fiske, 1959; Phillips, 1981), simply cannot be tested. Authors often use Harman's (1967) single-factor test as the inferential tool for detecting common method bias. Using the word test to describe the technique is, however, misleading: Harman's test is a more or less arbitrary procedure with no commensurate inferential test. Indeed, Podsakoff et al. (2003, p. 889) note that “despite the fact this procedure is widely used, we do not believe it is a useful remedy to deal with the problem.” It should go without saying that authors must not use procedures that are not useful.

You should always be aware of what can and cannot be tested. The conclusion that there is no common method bias must be drawn with great caution (if at all) when the statistical test used to examine it is weak; Harman's test is perhaps the best example. For survey researchers, we have a very simple recommendation: either give up single-informant surveys or stop making strong claims about common method bias. If the research design relies on only one informant per observational unit, there is no way to determine what proportion of item variance is trait variance. Further, as Podsakoff et al. (2003) aptly note, all techniques, including the most sophisticated ones, have problems associated with them. Prospective JOM authors must understand what these weaknesses are and, accordingly, not assume there are unambiguous technical fixes for common method bias. There are no straightforward remedies, because the sources of common method bias are diverse and complex; this is the key message of Podsakoff et al. (2003). All we can recommend is that JOM authors use the most rigorous test available with their data. If the most rigorous test is Harman's single-factor test, authors should be extremely cautious with their conclusions. Addressing common method bias must start at the research design phase: the most effective remedy is to be smart about the issues ex ante. Many ex post analyses can only diagnose whether or not there is a problem; if there is one, there is usually not much the researcher can do at that point.

If you cited Baron and Kenny (1986), Podsakoff and Organ (1986), Harman (1967), and Fornell and Larcker (1981) in your work back in the 1980s, you were probably fine. In 2015, you need to be careful.
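Harman's (1967) single-factor procedure, one of the older references just listed, shows why such care is needed. A common operationalization, assumed here, is to extract the first unrotated component from the item correlation matrix and conclude there is no common method bias unless that component accounts for the majority of the variance. The sketch builds a hypothetical population correlation matrix for three mutually uncorrelated traits, each measured with four items, where every item also loads 0.4 on a shared method factor (all loadings are illustrative). By construction, every cross-trait correlation is pure method artifact, yet the first component explains only about 35 percent of the variance, so the procedure would report no problem. Principal components obtained by power iteration stand in for an unrotated factor analysis; this is a simplification.

```python
def build_corr(n_traits=3, items_per_trait=4, trait_loading=0.7, method_loading=0.4):
    """Hypothetical population correlation matrix of standardized items. Each item loads
    on its own trait and on one shared method factor; the traits are mutually uncorrelated."""
    p = n_traits * items_per_trait
    same_trait = trait_loading ** 2 + method_loading ** 2
    cross_trait = method_loading ** 2      # produced entirely by the method factor
    R = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                R[i][j] = 1.0
            elif i // items_per_trait == j // items_per_trait:
                R[i][j] = same_trait
            else:
                R[i][j] = cross_trait
    return R

def largest_eigenvalue(R, iters=500):
    """Dominant eigenvalue of a symmetric positive definite matrix by power iteration."""
    p = len(R)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(R[i][k] * v[k] for k in range(p)) for i in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    Rv = [sum(R[i][k] * v[k] for k in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, Rv))   # Rayleigh quotient (v has unit length)

R = build_corr()
# The trace of a correlation matrix equals the number of items (total variance)
share = largest_eigenvalue(R) / len(R)
print(f"cross-trait correlation: {R[0][4]:.2f}")            # 0.16, all method variance
print(f"first component's share of variance: {share:.0%}")  # 35%, below the 50% threshold
```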
While statistical theory itself has not progressed all that much, the software applications we all have on our desktops have improved massively. We now have many solutions available that we did not have in the 1980s: many of the shortcuts we took back in the day no longer need to be taken, and many of the assumptions we were forced to make can now be relaxed. It behooves us to stay on top of current methodological developments; accordingly, what was accepted in the journal ten or twenty years ago is not necessarily acceptable anymore. As a general rule, prospective JOM authors should be cautious about using methodological research published more than twenty years ago as a benchmark. Authors must of course give credit where credit is due, but it should go without saying that using Fornell and Larcker (1981), an article published more than three decades ago in a marketing journal, for methodological guidance in an operations management article submitted to a top journal in 2015 should be done with great caution. Older texts contain a lot of useful material, but they are not to be used as research manuals. It is no exaggeration to say that new methodological developments come out every month. Most of these developments are minor, but many are noteworthy. Therefore, instead of using Baron and Kenny (1986) as a resource on how to deal with mediation hypotheses, one might look at Hayes (2013) and James et al. (2006) for updated approaches. One can even find published SPSS and SAS routines for testing mediation (Preacher and Hayes, 2004). There is simply no excuse for not using up-to-date tools. Why not check Google Scholar for new methodological developments relevant to your work before you submit your manuscript? The journal Organizational Research Methods (ORM) is also a wonderful resource for all management scholars; for instance, much of the relevant discussion on PLS has been published in ORM.
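As an illustration of the kind of routine now widely available, the sketch below tests an indirect effect directly, estimating a·b and a percentile bootstrap confidence interval, rather than following the causal-steps logic of Baron and Kenny (1986). It is a minimal sketch in the spirit of the bootstrap approach discussed by Preacher and Hayes (2004), not a reproduction of their SPSS or SAS macros; the data, effect sizes (a = b = 0.5), sample size, and number of resamples are all hypothetical. It also does nothing about endogeneity: an indirect effect estimated from observational data remains subject to the concerns raised at the start of this editorial.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def indirect_effect(x, m, y):
    """a*b, with a from the OLS regression M ~ X and b the M coefficient in Y ~ X + M."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b

def percentile_bootstrap_ci(x, m, y, reps=1000, level=0.95, seed=5):
    """Percentile bootstrap interval for the indirect effect, resampling cases."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    draws.sort()
    tail = (1 - level) / 2
    return draws[round(reps * tail)], draws[round(reps * (1 - tail)) - 1]

# Hypothetical data: X -> M -> Y with a = b = 0.5 (true indirect effect 0.25),
# plus a direct effect of X on Y of 0.2
rng = random.Random(4)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

est = indirect_effect(x, m, y)
lo, hi = percentile_bootstrap_ci(x, m, y)
print(f"indirect effect = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```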
The essence of science lies in the collective quest for continuous progress. In terms of methodology, this means we need to strive for stronger inference, and all the criteria described above share this primary objective. Most of us are, in one way or another, interested in the question: How important is X as far as the outcome Y is concerned? Moving toward stronger inference will lead us toward an unbiased estimate of this effect. We want to encourage both authors and reviewers not just to become aware of the key problems, but also to do something about them. All the problems described in this editorial are remediable, and we are already seeing authors engage with these problems in more recent submissions to the journal. This is very encouraging.