Abstract The use of Monte-Carlo (MC) p-values when testing the significance of a large number of hypotheses is now commonplace. In large-scale hypothesis testing, we typically encounter at least some p-values near the threshold of significance, and these require more MC replicates than p-values that are far from the threshold. As a result, the list of detections can vary when different MC replicates are used, resulting in a lack of reproducibility. The method of Gandy and Hahn (GH) (2014; 2016; 2017) is the only method that has directly addressed this problem. It defines the Monte-Carlo error rate (MCER) as the probability that any decision to accept or reject a hypothesis based on MC p-values differs from the decision based on ideal p-values, and it makes decisions that control the MCER. Unfortunately, GH is frequently very conservative, often making no rejections at all and leaving a large number of hypotheses “undecided”. In this article, we propose MERIT, a method for large-scale MC hypothesis testing that also controls the MCER but is more statistically efficient than the GH method. Through extensive simulation studies, we demonstrate that MERIT controls the MCER and substantially improves the sensitivity and specificity of detections compared to GH. We also illustrate our method with an analysis of gene expression data from a prostate cancer study.
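The reproducibility problem described above can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the MERIT or GH procedure: it assumes a one-sided test with a standard normal null, a fixed per-test threshold with no multiplicity adjustment, and the common estimator (1 + number of exceedances) / (1 + B) for the MC p-value. The function names `mc_p_value` and `disagreement_rate` are hypothetical and introduced only for illustration.

```python
import random


def mc_p_value(observed_stat, null_sampler, n_replicates, rng):
    """Monte-Carlo p-value (1 + #{T_b >= t_obs}) / (1 + B)."""
    exceed = sum(
        1 for _ in range(n_replicates) if null_sampler(rng) >= observed_stat
    )
    return (1 + exceed) / (1 + n_replicates)


def disagreement_rate(observed_stat, ideal_p, threshold, n_replicates,
                      n_runs=200, seed=0):
    """Fraction of independent MC runs whose reject/accept decision
    differs from the decision based on the ideal p-value."""
    rng = random.Random(seed)
    null_sampler = lambda r: r.gauss(0.0, 1.0)  # assumed N(0, 1) null
    ideal_reject = ideal_p <= threshold
    flips = 0
    for _ in range(n_runs):
        p_mc = mc_p_value(observed_stat, null_sampler, n_replicates, rng)
        if (p_mc <= threshold) != ideal_reject:
            flips += 1
    return flips / n_runs


if __name__ == "__main__":
    # One-sided z-test: z = 1.70 has ideal p of about 0.0446 (near 0.05),
    # z = 4.00 has ideal p of about 3.2e-5 (far below 0.05).
    for z, p in [(1.70, 0.0446), (4.00, 3.2e-5)]:
        rate = disagreement_rate(z, p, threshold=0.05, n_replicates=999)
        print(f"z={z:.2f}: decisions disagree with ideal in {rate:.1%} of runs")
```

For a single hypothesis, the quantity estimated by `disagreement_rate` is the probability that the MC decision differs from the ideal one; the MCER in the abstract is the analogous probability that at least one decision differs across all hypotheses tested. In this sketch, a statistic whose ideal p-value is near the threshold should show frequent disagreements across runs, while one far from the threshold should show essentially none.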