Abstract Research on attentional control has largely focused on single senses and the importance of behavioural goals in controlling attention. However, everyday situations are multisensory and contain regularities, both likely influencing attention. We investigated how visual attentional capture is simultaneously impacted by top-down goals, the multisensory nature of stimuli, and the contextual factors of stimuli’s semantic relationship and temporal predictability. Participants performed a multisensory version of the Folk et al. (1992) spatial cueing paradigm, searching for a target of a predefined colour (e.g. a red bar) within an array preceded by a distractor. We manipulated: 1) stimuli’s goal-relevance via distractor’s colour (matching vs. mismatching the target), 2) stimuli’s multisensory nature (colour distractors appearing alone vs. with tones), 3) the relationship between the distractor sound and colour (arbitrary vs. semantically congruent) and 4) the temporal predictability of distractor onset. Reaction-time spatial cueing served as a behavioural measure of attentional selection. We also recorded 129-channel event-related potentials (ERPs), analysing the distractor-elicited N2pc component both canonically and using a multivariate electrical neuroimaging framework. Behaviourally, arbitrary target-matching distractors captured attention more strongly than semantically congruent ones, with no evidence for context modulating multisensory enhancements of capture. Notably, electrical neuroimaging of surface-level EEG analyses revealed context-based influences on attention to both visual and multisensory distractors, in how strongly they activated the brain and type of activated brain networks. For both processes, the context-driven brain response modulations occurred long before the N2pc time-window, with topographic (network-based) modulations at ~30ms, followed by strength-based modulations at ~100ms post-distractor onset. Our results reveal that both stimulus meaning and predictability modulate attentional selection, and they interact while doing so. Meaning, in addition to temporal predictability, is thus a second source of contextual information facilitating goal-directed behaviour. More broadly, in everyday situations, attention is controlled by an interplay between one’s goals, stimuli’s perceptual salience, meaning and predictability. Our study calls for a revision of attentional control theories to account for the role of contextual and multisensory control.