Abstract
Trait stability of measures is an essential requirement for individual differences research. Functional MRI (fMRI) has been increasingly used in studies that rely on the assumption of trait stability, such as attempts to relate task-related brain activation to individual differences in behavior and psychopathology. However, recent research using adult samples has questioned the trait stability of task-fMRI measures, as assessed by test-retest correlations. To date, little is known about the trait stability of task-fMRI in children. Here, we examined within-session reliability and longitudinal stability of task-fMRI using data from the Adolescent Brain Cognitive Development (ABCD) Study, drawing on its tasks focused on reward processing, response inhibition, and working memory. We also evaluated the effects of factors potentially affecting reliability and stability. Reliability and stability [quantified via an intraclass correlation (ICC) that focuses on rank consistency] were poor in virtually all brain regions, with average ICCs of .078 for short-term (within-session) and .054 for long-term (between-session) measurements in regions of interest (ROIs) historically recruited by the tasks. ICC values in ROIs did not exceed the ‘poor’ cut-off of .4, and in fact rarely exceeded .2 (only 5.9% of values). Motion had a pronounced effect on estimated ICCs: the lowest motion quartile of participants had a mean reliability/stability three times higher (albeit still ‘poor’) than the highest motion quartile. Regions with stronger activation tended to show higher ICCs, with the absolute value of activity and reliability/stability correlating at .53. Across regions, the magnitude of age-related longitudinal (between-session) changes positively correlated with the longitudinal stability of individual differences, which suggests that developmental change was not necessarily responsible for poor stability.
Poor reliability and stability of task-fMRI, particularly in children, diminish the potential utility of fMRI data because they drastically reduce effect sizes and, consequently, the statistical power to detect brain-behavior associations. This essential issue needs to be addressed through optimization of preprocessing pipelines and data denoising methods.
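
As an illustration of the two quantities discussed above, the following is a minimal sketch, not the study's actual pipeline. It assumes the consistency form of the ICC for single measurements, often written ICC(3,1); the abstract states only that the ICC "focuses on rank consistency", so the exact variant is an assumption. It also applies the classical attenuation formula, under which an observed correlation shrinks by the square root of the product of the two measures' reliabilities. The function names `icc_consistency` and `attenuated_r`, and all numeric inputs other than the reported mean ICC of .078, are hypothetical.

```python
import numpy as np


def icc_consistency(scores):
    """Consistency ICC for single measurements, ICC(3,1).

    `scores` is an (n_subjects, k_sessions) array, e.g. one ROI's
    activation estimate per participant per run or per visit.
    Assumed variant; the abstract does not name the exact ICC form.
    """
    y = np.asarray(scores, dtype=float)
    n, k = y.shape
    grand = y.mean()

    # Two-way ANOVA decomposition without replication.
    ss_total = ((y - grand) ** 2).sum()
    ss_subjects = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_sessions = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_sessions

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Session mean shifts are excluded, so only rank consistency matters.
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


def attenuated_r(true_r, rel_x, rel_y):
    """Classical attenuation: expected observed r given two reliabilities."""
    return true_r * np.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Hypothetical data: 4 participants, 2 sessions.
    demo = [[0.2, 0.5], [0.1, -0.3], [0.4, 0.1], [-0.2, 0.3]]
    print("ICC(3,1):", icc_consistency(demo))

    # Hypothetical true brain-behavior correlation of .30 and behavioral
    # reliability of .80, combined with the reported mean ICC of .078.
    print("Expected observed r:", attenuated_r(0.30, 0.078, 0.80))
```

Because the consistency ICC removes session mean differences, a uniform shift in activation between sessions leaves it unchanged; only changes in the rank ordering of participants lower it.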