My principal goal in this final section of the paper is to explore the degree to which the findings reported in the previous sections have been confirmed and, where they have not been confirmed, to explore the possible cause(s) of the discrepancy. The field of multi-sensory integration has exploded since my earliest work on this topic (conducted during my graduate career at the University of Oregon, and presented in these publications: Klein, 1977 [15]; Klein and Posner, 1974 [28]; Posner, Nissen and Klein, 1976 [23]). Whereas in the late 1980s I was still quite well informed about the multi-sensory literature, it would be imprudent to suggest that this is true now. Therefore, the task of determining the conceptual implications of the “old” work presented here for contemporary ideas about multi-sensory processing should be left to today’s experts.
When considering the degree to which the findings about cross-modal orienting reported here have been replicated, it is important to remain focused on the two adjectives in the title of this paper: “Covert exogenous”. With that focus established, let’s examine each of the findings reported above.
4.1. Localizable Auditory Stimuli Generate an Exogenous Shift of Covert Visual Attention
The finding, from Experiments 1–4, that localizable auditory stimuli generate an exogenous shift of covert visual attention was confirmed in Experiment 1 of a very thorough paper on this topic by Spence and Driver (1997) [29]. As we did in Experiment 4, Spence and Driver used a target task that required a 2-AFC to ensure that their evidence for a covert shift of attention was not simply due to a criterion shift. In their comments on the work presented in Part I, of which they were aware, Spence and Driver noted that our evidence from Experiment 4 might have been compromised by a speed-accuracy tradeoff because accuracy on valid trials was slightly worse than in the other conditions (see Table 4). Whereas it was probably appropriate for them to point this out (to enhance the empirical value of their findings), there are two reasons why our RT evidence for covert cross-modality orienting in Experiment 4 is unlikely to be compromised by a criterion shift. First, the RT effect was highly significant and the accuracy effect was not significant (F < 1). Second, Spence and Driver essentially replicated our finding. Hence, while a criterion shift explanation cannot be ruled out for the findings from Experiments 1–3, it is likely (and more parsimonious to assume) that the speed of processing (detecting and discriminating the properties of) visual targets was affected by their spatial position relative to the auditory cue. Indeed, Lee and Spence (2017) [30] began a recent exploration of the spatial precision of such cuing effects by describing this finding as: “One of the most oft-replicated findings in the field of exogenous crossmodal spatial attention research …”.
In all of the studies of which I am aware that have explored visual orienting toward the spatial positions of uninformative auditory cues, the sources of the auditory cues were visible and the visual targets were presented at the same locations where the auditory stimuli could be presented. The question that was posed in most of these experiments was “will visuo-spatial attention be automatically captured by a localized auditory stimulus?” It is reasonable to ask whether such auditory cues would be equally effective in capturing visuo-spatial attention if their sources were not visible and if there were a sufficient number and distribution of them to disrupt the participant’s ability to confidently link them with the spatial positions of the visual targets.
4.2. Such Cross-Modality Cuing Effects Are Relatively Automatic
The term “relatively” is intended, here, to anticipate several reactions. First, whereas, owing to my training, I have been strongly influenced by the criteria for automaticity that were put forward by Posner and Snyder (1975) [31], I recognize that there is considerable disagreement on this topic. Second, the research reported here only explores one criterion: “that the effect takes place regardless of our intentions”. Finally, “strong” automaticity implies not only that effects take place despite our intentions but also that these effects will not be modified by our intentions, an immunity that seems highly unlikely in studies of exogenous covert orienting (e.g., see Folk, Remington and Johnston, 1992) [32].
The use of uninformative spatial cues in the Posner cuing paradigm is intended to give the observer no incentive to attend in the direction of the cue. It is generally assumed that when cuing effects are observed they are due to involuntary capture of attention by the cue. In Experiment 3 we gave our participants an incentive to use the auditory cues to direct their visual attention toward the uncued (opposite) location. As illustrated in Figure 2, when participants were so incentivized to attend away from the auditory cues, the cuing effect at the shortest SOA we tested (0 ms) was at least as large as, if not larger than, when the cues were simply uninformative. This automatic capture of attention by the cue could be reversed by endogenous control, but this required over half a second [33].
Finally, it should be noted that inhibition of return, which has been demonstrated to operate cross-modally (Spence et al., 2000) [34], might be contributing to the negative cuing effect at the 1 s CTOA. It is unlikely to be contributing much at 500 ms because at this interval cuing was still positive when the auditory stimulus was uninformative.
4.4. Neither the Frequency Nor the Direction of a Frequency Glide Generates an Exogenous Shift of Covert Visual Attention
The inability of uninformative frequency glides to generate exogenous shifts of visual attention was demonstrated in the vertical condition of Experiment 1 and also in Experiment 6. In addition, as reported in [13], a similar absence of visual cuing effects was observed when the auditory cues were the relative frequency or pitch of unvarying tones. As discussed in a recent review (Spence and Deroy, 2013) [35], these null results are anomalous. Focusing on those studies designed to determine whether pitch or pitch glides might direct exogenous attention along the vertical axis, there are three papers (see Table 1 in Spence and Deroy) that reported significant cuing effects from pitch and/or pitch glides: Chiou and Rich (2012) [36]; Fernández-Prieto, Vera-Constán, García-Morera, and Navarra (2012) [37] (later published online: Fernández-Prieto and Navarra, 2017 [38]); and Mossbridge, Grabowecky, and Suzuki (2011) [39]. These are all fine papers that employed interesting and revealing manipulations. Here we will examine pitch and pitch glides separately.
The only finding reported here that bears on the possibility of visual orienting in the vertical dimension in response to static pitch (high versus low frequency tones) was briefly described in [13]. It is presented in Figure 6 along with the results from 4 experiments in Chiou and Rich (2012) [36] that used uninformative cues. Several points are worth noting. The average cuing effect, which was significant in each of the 4 experiments from Chiou and Rich, is rather small, at 5.6 ms. Even though our finding of a 3 ms cuing effect at an SOA of 500 ms is smaller, it does not look anomalous when plotted against the somewhat noisy data points from Chiou and Rich. Finally, the number of participants we tested was 6 whereas in Chiou and Rich’s experiments the number of participants ranged from 12 to 18. Based on these points, I think it is reasonable to conclude that uninformative pitch weakly generates covert visual orienting exogenously. Two qualifications based on the findings from Chiou and Rich’s study are worth noting: in their Experiment 2 they did not obtain any cuing effect when the pitch difference was small (300 vs. 400 Hz), and in Experiment 3 they demonstrated that it is relative pitch in the context of the experiment that matters, not the absolute pitch of the auditory cues.
In Experiments 1 and 6 uninformative frequency glides did not significantly affect visual detection latencies along the vertical axis. Statistically speaking, this finding conflicts with what was reported by Mossbridge et al. (2011) [39] and Fernández-Prieto and Navarra (2017) [38] (see Figure 7). Because it seems likely that a 0 ms SOA is too short to permit orienting in response to a frequency glide that is beginning at time 0, and because these other studies only used longer SOAs, I believe it is reasonable, when exploring the source of the empirical conflict, to concentrate on the longer SOAs. Each of these studies has a feature that causes me to be cautious about putting too much weight on their findings.
Mossbridge et al. [39] used an interesting go/nogo matching-to-sample task. At the start of a trial, and during the frequency glide, a colored disk was presented at fixation for 500 ms. Then, as indicated in the methods: “Upon offset of the reference circle and sound, the probe circle appeared in one of the four squares for 750 ms”. Given the relatively low salience of the remaining fixation dot, the lack of eye monitoring, the peripheral color discrimination required for correct responding, and the fact that all RTs in this task came from trials for which the colors of the disk at fixation and the immediately following probe disk in the periphery matched, this method was almost certain to elicit saccadic eye movements. Consequently, the orienting explored in this otherwise excellent study was unlikely to have been covert. Fernández-Prieto and Navarra (2017) [38] used a simple detection task, as we did (in all but Experiment 4). The use of catch trials in simple detection tasks is considered de rigueur among reaction time experts. However, unlike our methods, which included a relatively high percentage of catch trials (33%), no catch trials were used by Fernández-Prieto and Navarra. Consequently, I am weakly inclined to believe that auditory frequency glides do not orient covert visual attention exogenously. Putting aside these concerns and ignoring the rationale for focusing on the longer SOAs, it must be noted that the average cuing effect in our two studies (all filled symbols in Figure 7) is about 5 ms. Because this is very similar to the small but significant effect of static pitch (Figure 6), a cautious approach would be to recommend further data collection [14].