Trial-averaged metrics, e.g. tuning curves or population response vectors, are a ubiquitous way of characterizing neuronal activity. But how relevant are such trial-averaged responses to neuronal computation itself? Here we present a simple test to estimate whether average responses reflect aspects of neuronal activity that contribute to neuronal processing. The test probes two assumptions implicitly made whenever average metrics are treated as meaningful representations of neuronal activity: Reliability: Neuronal responses repeat consistently enough across trials that they convey a recognizable reflection of the average response to downstream regions. Behavioural relevance: If a single-trial response is more similar to the average template, it is more likely to evoke correct behavioural responses. We apply this test to two data sets: (1) Two-photon recordings in primary somatosensory cortices (S1 and S2) of mice trained to detect optogenetic stimulation in S1; and (2) Electrophysiological recordings from 71 brain areas in mice performing a contrast discrimination task. Under the highly controlled settings of data set 1, both assumptions were largely fulfilled. Moreover, better-matched single-trial responses predicted correct behaviour. In contrast, the less restrictive paradigm of data set 2 met neither assumption, with the match between single-trial and average responses being neither reliable nor predictive of behaviour. Simulations confirmed these results. We conclude that when behaviour is less tightly restricted, average responses do not seem particularly relevant to neuronal computation, potentially because information is encoded more dynamically. Most importantly, we encourage researchers to apply this simple test of computational relevance whenever using trial-averaged neuronal metrics, in order to gauge how representative cross-trial averages are in a given context.