The optimal diagnostic workup of pulmonary embolism (PE) in the emergency department (ED) remains debated and traditionally follows a sequential approach: evaluation of the clinical probability, followed by D-dimer testing, chest imaging, or both. Scores such as the Wells score were developed to limit the number of patients who undergo diagnostic imaging (computed tomography or ventilation–perfusion scanning) by ruling out PE with a normal blood D-dimer test. Although the original motivation to reduce PE imaging was to limit costs and radiation, more recent clinical decision rules have focused on reducing overdiagnosis of inconsequential PE, false-positive diagnoses of PE, and ED overcrowding, which can be worsened by waits for imaging. The first study to demonstrate the safety of excluding PE without diagnostic imaging combined a low Wells score with a negative D-dimer and was published 23 years ago.1 That study used an uneventful 3-month follow-up period to determine that PE had been safely ruled out. Over the ensuing decades, researchers have studied sequential changes to this initial approach, motivated to reduce the need for diagnostic imaging further or to simplify the PE testing process. These researchers adopted 3 months of follow-up without a diagnosis of lower limb deep vein thrombosis (DVT) or PE as the definitive marker that the rule-out strategy was safe. The rate at which lower limb DVT or PE is diagnosed in patients testing negative for PE has become known as the "failure rate" and is interpreted as the true safety measure of the PE testing strategy. The Emergency Advisory and Research Board on Thrombosis and Hemostasis (EARTH) is an international group of emergency physicians with special expertise in thrombosis and PE research.
In this paper, we argue that the term "failure rate" as a primary outcome is not well defined, with studies reporting overly optimistic or overly pessimistic failure rates depending on the approach used. This variation in the definition of the failure rate is problematic because it could drive the adoption of specific PE testing strategies into clinical practice based on a misinterpretation of their safety. The crux of the issue is the choice of the population for whom the failure rate is calculated (i.e., the denominator of the calculation), which has varied from study to study. There are three main candidates: (a) all patients who avoided imaging with the new PE testing decision rule but who would have been imaged if a "traditional" decision rule had been used; (b) all patients who did not require imaging as per the new PE testing decision rule; or (c) all patients who tested negative for PE, with or without diagnostic imaging. The PEGeD study and the YEARS study illustrate these three possibilities (Table 1).2, 3 In the PEGeD study,2 a total of 2017 patients were evaluated, among whom 1325 had no chest imaging based on the D-dimer results, including 315 patients in whom chest imaging was avoided because the PEGeD rule was negative (i.e., low clinical probability and D-dimer between 500 and 1000 ng/mL). In this example, method (a) would use 355 patients as the failure rate denominator, method (b) 1325, and method (c) 1863. Method (a), the most conservative, was used to confirm the safety of the tested rule, with a 95% confidence interval (CI) of the failure rate of 0% to 1.03%. In the YEARS study,3 method (a) would use 437 patients, method (b) 1629, and method (c) 2946. The last denominator, method (c), was used, with a reported failure rate of 0.61%. Had the most conservative denominator, method (a), been used, the failure rate would have been 6 of 437, with an upper bound of the 95% CI of 3.0%.
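The sensitivity of the reported failure rate to the chosen denominator can be checked directly from the counts above. The following is a minimal sketch, not taken from either study's analysis: it computes the exact (Clopper-Pearson) 95% CI for a binomial proportion by bisection on the binomial cumulative distribution, using only the Python standard library; the published analyses may have used other interval methods.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k: int, n: int, alpha: float = 0.05, tol: float = 1e-10):
    """Exact two-sided (1 - alpha) CI for the proportion k/n."""
    # Upper bound: smallest p with P(X <= k; n, p) <= alpha / 2.
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    upper = (lo + hi) / 2
    # Lower bound: largest p with P(X >= k; n, p) <= alpha / 2 (0 when k == 0).
    lower = 0.0
    if k > 0:
        lo, hi = 0.0, k / n
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if 1 - binom_cdf(k - 1, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    return lower, upper

# Counts taken from the text: method (a) denominators in PEGeD and YEARS.
for label, k, n in [("PEGeD, method (a)", 0, 355),
                    ("YEARS, method (a)", 6, 437)]:
    lower, upper = clopper_pearson(k, n)
    print(f"{label}: failure rate {k}/{n} = {k / n:.2%}, "
          f"95% CI {lower:.2%} to {upper:.2%}")
```

Run as-is, this reproduces the bounds quoted above, roughly 1.03% for 0 of 355 and about 3% for 6 of 437, illustrating how a small, conservative denominator widens the CI even when the point estimate remains low.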
The International Society on Thrombosis and Haemostasis (ISTH) recommended using option (b): all patients in whom PE was ruled out without an imaging test.4, 5 However, the ISTH also recommended that only the point estimate, and not the 95% CI, be compared with the safety threshold. Reporting a point estimate below a certain safety threshold is not a guarantee of safety. For example, a study reporting a failure rate of 1% with a 95% CI of 0.5% to 3% includes a sizable probability that the tested strategy is not safe.

Yonathan Freund and Kerstin de Wit drafted the paper. All authors contributed substantially to the final version.

In the past 3 years, FG's institution (McMaster University) received research funding from NovoNordisk, Roche, Takeda, Bayer, Pfizer, BioMarin, and CSL. The other authors declare no conflicts of interest.

No data are associated with this text.