1. Introduction
Sub-second timing prediction (TP) enables humans to accurately predict the occurrence of upcoming events. It can speed up behavior, facilitate perception, and optimize the allocation of cognitive resources [
1,
2]. The cerebellum and basal ganglia are major neural structures responsible for timing prediction, playing key roles in single-interval and rhythmic timing, respectively [
3,
4]. Supplementary motor areas and the medial entorhinal cortex also contribute to the TP process [
5,
6,
7,
8]. Neural responses of precise TP within a single sensory modality have been widely studied [
1,
2]. However, less is known about how TP works when the brain is concurrently faced with multimodal sensory inputs. Modality attention (MA) is the brain’s ability to prioritize information from a specific sensory modality, which can mitigate the computational burden induced by multimodal sensory inputs [
9]. Therefore, it is important to investigate whether and how MA influences the neural effects of precise TP.
Precise TP modulates both the pre-stimulus and evoked neural responses. Event-related potential (ERP) studies reported that the contingent negative variation (CNV) could index subjects’ time estimation ability [
10,
11,
12]. Neural oscillation studies highlighted that low frequency activities (<15 Hz) can represent TP. Specifically, the delta (1–3 Hz) phase reset had stronger inter-trial coherence (ITC) at the predicted moment during both rhythmic and non-rhythmic tasks [
13,
14]. Frontal theta (4–7 Hz) ITC was modulated by the prediction error magnitude when subjects undertook a visual temporal learning task, suggesting a close association with updating temporal information [
15]. The alpha (8–15 Hz) phase immediately before the visual stimulation was guided by top-down timing prediction [
16,
17], and alpha power changes were also found in some timing prediction studies [
18,
19]. Inspired by the predictive coding theory, which regards the brain as a prediction machine that can actively infer the external world and attempt to match incoming sensory inputs with top-down predictions [
20,
21,
22], there is growing agreement that the TP is a neural implementation of the predictive coding in the time domain [
23,
24]. Thus, comparing the responses evoked by stimuli that occur exactly at the predicted moment (matched timing prediction, MTP) with those evoked by stimuli that do not (violated timing prediction, VTP) is a promising way to reveal the neural effect of precise TP. This hypothesis was supported by the observation that N1-P2 amplitudes indexed subjective time more accurately than the CNV [
25]. In our previous study, TP was manipulated into different conditions using a visual task. In the early sensory processing stage (less than 400 ms after target onset), the MTP condition produced ERP profiles similar to those of the no timing prediction (NTP) condition, whereas the VTP condition suppressed N1 and enhanced N2 in the occipital area [
26]. However, this TP neural effect was observed when only visual stimuli were present. It remains unclear whether such an opposing effect still occurs when the brain is concurrently faced with audio-visual stimuli, and whether neural responses in the auditory modality are similar to those in the visual modality.
MA is a crucial cognitive function for dealing with the overwhelming information induced by multimodal sensory inputs. When auditory and visual stimuli are presented together, MA may optimize sensory processing in a specific modality. However, previous studies concentrated more on how MA influenced the multisensory integration of time information [
27,
28]. Neural evidence is still lacking regarding whether and how MA influences the processing of precise TP, especially in two respects. First, it remains controversial whether the precise TP neural effect is independent of MA or differs between attended and unattended conditions. One EEG study concurrently manipulated visual–tactile attention and rhythm-based timing prediction within a single experiment. TP began to operate before MA, and the two processes had opposing effects in modulating early evoked responses [
29]. However, to the best of our knowledge, no study has investigated how auditory–visual attention modulates the neural responses of single-interval precise TP. Second, previous studies have suggested that precise TP leads to changes in either early evoked ERPs or low-frequency neural oscillations. However, it remains unclear which features better reflect the neural effects of TP, and how these features change when a modality is attended or unattended.
This study investigated how audio–visual modality attention influences the neural effects of single-interval precise TP. EEG signals from 27 subjects were recorded and analyzed; ERPs and time–frequency measures, namely ITC and event-related spectral perturbation (ERSP) [
30], were calculated and compared. The experiment included three TP conditions: NTP, MTP, and VTP. The MA conditions were visual-attended (Va), visual-unattended (Vua), auditory-attended (Aa), and auditory-unattended (Aua). We found that (i) in the visual modality, TP led to the opposing N1-N2 pattern only when the modality was attended; (ii) in the auditory modality, when the modality was attended, MTP had the largest neural responses in the P2 temporal window among the TP conditions, and these distinctions disappeared when the modality was unattended; and (iii) low-frequency ITC could better reflect the modulations of both TP and MA. These results suggest that MA can promote the neural effects of precise single-interval TP in early sensory processing.
2. Materials and Methods
2.1. Participants
Twenty-seven right-handed students from Tianjin University, aged between 18 and 26 years, were recruited for the experiment. Participants were required to have normal or corrected-to-normal vision and to be free from psychological or neurological diseases. Experimental procedures were approved by the Institutional Review Board at Tianjin University. All possible consequences were explained, and written informed consent was obtained from all participants.
2.2. Stimuli
This study designed a visual–auditory temporal discrimination task. The stimuli were a flashing LED and a buzzer, driven by a chronometric FPGA platform (Cyclone II: EP2C8T144C8) with a timing precision of 20 ns. Visual stimuli were presented by a 15 × 15 mm² LED placed at eye level, 80 cm away from participants. Auditory stimuli were generated by a buzzer located at the same position as the LED. As
Figure 1a shows, there were three time intervals (TI) between the first and second stimuli (400 ms, 600 ms, and 900 ms). The first visual and auditory stimuli were presented concurrently, whereas the second were not. The visual TI differed from the auditory TI, which formed six flash–buzzer combinations. An example of a single trial is shown in
Figure 1b. In this example, the trial started with a visual–auditory cue lasting for 1000 ms, followed by a blank period with a random duration selected from 1000 ms, 1500 ms, or 2000 ms; the random blank made the onset of the first stimulus unpredictable. Next, the first visual–auditory stimuli emerged concurrently, and then the second visual and auditory stimuli appeared, but with different TIs. Finally, there was a pseudo-random period between 1600 and 3000 ms before the next trial. A block consisted of 30 trials. Each visual–auditory combination appeared in five trials, and all 30 trials were presented in random order.
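The block structure described above can be illustrated with a short Python sketch. It is not the FPGA implementation used in the experiment; the function name `make_block`, the dictionary keys, and the uniform sampling of the inter-trial period are assumptions made only for illustration.

```python
import itertools
import random

TIS_MS = (400, 600, 900)        # time intervals named in the text
BLANK_MS = (1000, 1500, 2000)   # candidate blank durations after the cue
TRIALS_PER_COMBO = 5

def make_block(rng):
    """Return one shuffled 30-trial block as a list of dicts (hypothetical layout)."""
    # Ordered pairs with different visual and auditory TIs: 3 * 2 = 6 combinations.
    combos = list(itertools.permutations(TIS_MS, 2))
    trials = []
    for visual_ti, auditory_ti in combos:
        for _ in range(TRIALS_PER_COMBO):
            trials.append({
                "cue_ms": 1000,
                "blank_ms": rng.choice(BLANK_MS),
                "visual_ti_ms": visual_ti,
                "auditory_ti_ms": auditory_ti,
                # Assumption: the pseudo-random period is drawn uniformly.
                "iti_ms": rng.randint(1600, 3000),
            })
    rng.shuffle(trials)
    return trials

block = make_block(random.Random(0))
print(len(block), block[0])
```

Under these assumptions, ten of the 30 trials in each block contain a 400 ms visual TI and ten contain a 400 ms auditory TI.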
2.3. Experimental Procedure
The formal experiment took place in an electrically shielded room and included six mental tasks. As described in
Figure 1c (left), the first three were visual tasks. In the first task, participants were required to indicate the onset of the second flash, so no specific moment was predicted. In the second task, participants had to indicate whether the second flash appeared 400 ms after the first. Under this condition, 400 ms after the first flash was the only predicted moment. In the third task, participants had to indicate whether the second flash appeared 600 ms after the first flash, i.e., 600 ms after the first flash was the predicted moment. The other three tasks were auditory tasks. In the first auditory task, participants were required to indicate the onset of the second beep. In the second auditory task, participants had to indicate whether the second beep appeared 400 ms after the first. In the third auditory task, participants had to indicate whether the second beep appeared 600 ms after the first beep. During the auditory discrimination tasks, participants were required to keep their gaze on the visual stimuli at all times, so that their visual inputs were identical to those in the visual tasks. Participants responded by pressing buttons with the right or left thumb, which was balanced across blocks. Each task had four blocks, giving twenty-four blocks in total, which were conducted in random order.
A sufficiently precise predictive template is a prerequisite for successfully manipulating precise TP. For this reason, participants were trained for three days before the formal experiment, and they could start the formal experiment only when their discrimination accuracy exceeded 80%. On the first training day, participants learnt the three timing intervals (i.e., TI400, TI600, and TI900) by watching double flashes with a single timing interval; the specific TI was cued by the experimenter before each trial. After ~20 min of learning, they were asked to indicate by button press which TI the presented double flash had. On the second training day, subjects performed a visual or auditory temporal discrimination task, in which they were required to judge whether the actual double flash/beep was TI400 or TI600. Notably, each block contained 10–15 trials whose timing was randomly selected from 300 ms, 500 ms, and 800 ms. These untrained TIs were added to prevent participants from realizing that there were only three kinds of actual stimuli in the formal test and therefore allocating more attentional resources to the three specific moments. On the third training day, subjects underwent the same training as on the second day.
2.4. Experimental Design for Forming Distinct MA-TP Conditions
MA was manipulated through task relevance. In visual tasks, attentional resources were allocated to the visual modality. Thus, in visual tasks 1–3, the visual modality was attended (Va) (
Figure 1d upper), whereas the auditory modality was unattended (Aua) (
Figure 1e lower). In auditory tasks, attention was allocated to the auditory modality. Thus, in tasks 4–6, the auditory modality was attended (Aa) (
Figure 1d lower), whereas the visual modality was unattended (Vua) (
Figure 1e upper).
Distinct TP conditions were formed by the interaction between the mental task and the actual onset of the second stimulus. Only trials containing the 400 ms TI were extracted, as the yellow (visual modality) and red (auditory modality) boxes in
Figure 1a show. Thus, the analyzed stimuli were physically identical, whereas the predicted moment in subjects’ minds varied with the mental task. Specifically, in tasks 1 and 4, no specific moment was predicted, i.e., the NTP condition. In this condition, the evoked neural activity was the least influenced by top-down processes, so it can serve as a baseline for studying how TP changes evoked neural responses. In tasks 2 and 5, the only predicted moment was 400 ms after the first stimulus, so the actual stimulus emerged exactly at the predicted moment, i.e., the MTP condition. In tasks 3 and 6, the only predicted moment was 600 ms after the first stimulus, so the actual stimulus occurred before the predicted moment, i.e., the VTP condition.
In summary, the tasks forming the distinct MA-TP conditions are listed in
Figure 1d,e. There were 12 MA-TP conditions in total: Va-NTP, Va-MTP, Va-VTP, Vua-NTP, Vua-MTP, Vua-VTP, Aa-NTP, Aa-MTP, Aa-VTP, Aua-NTP, Aua-MTP, Aua-VTP.
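The mapping from task and stimulus modality to the 12 MA-TP labels can be written compactly. The following Python sketch only restates the design described above; the table `TASKS` and the function names are hypothetical and introduced for illustration.

```python
# Task number -> (attended modality, predicted moment in ms; None = no prediction)
TASKS = {
    1: ("visual", None), 2: ("visual", 400), 3: ("visual", 600),
    4: ("auditory", None), 5: ("auditory", 400), 6: ("auditory", 600),
}
ANALYZED_TI_MS = 400  # only trials with a 400 ms TI in the analyzed modality are used

def tp_condition(predicted_ms, actual_ms=ANALYZED_TI_MS):
    """Classify the timing-prediction condition of an analyzed trial."""
    if predicted_ms is None:
        return "NTP"
    return "MTP" if predicted_ms == actual_ms else "VTP"

def ma_tp_label(task, stimulus_modality):
    """Label the response in `stimulus_modality` recorded during `task`."""
    attended_modality, predicted_ms = TASKS[task]
    prefix = "V" if stimulus_modality == "visual" else "A"
    attention = "a" if attended_modality == stimulus_modality else "ua"
    return f"{prefix}{attention}-{tp_condition(predicted_ms)}"

for task in TASKS:
    print(task, ma_tp_label(task, "visual"), ma_tp_label(task, "auditory"))
```

Each task contributes one attended and one unattended condition, which yields the 12 labels listed above.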
2.5. EEG Recording and Pre-Processing
EEG was recorded using a 64-electrode Neuroscan SynAmps2 system at a sampling rate of 10,000 Hz and was notch-filtered at 50 Hz. All electrodes were positioned on the scalp according to the International 10–20 system, referenced to the tip of the nose, and grounded at the frontal area. Additional bipolar electrodes recorded the electro-oculogram (EOG). Independent component analysis (ICA) was used to remove eye movement artifacts. Eye-related components were identified by comparing individual ICA components with the EOG channels and by visual inspection. To ensure signal quality, the impedance of all electrodes was kept below 10 kΩ.
In pre-processing, EEG data were filtered with a type I FIR low-pass filter with a 40 Hz cutoff and down-sampled to 200 Hz. According to the experimental design, TP mainly acted after the second onset of the TI400 double stimulus, whereas responses to the first stimulus were barely influenced. Therefore, the onset of the second stimulus was defined as the zero point. Correct trials with a reaction time of less than 80 ms (relative to the zero point) were defined as qualified trials; each MA-TP condition contained 35–40 qualified trials in total for subsequent EEG analyses.
2.6. Data Processing and Analyses
This study analyzed EEG from the O1, Oz, and O2 electrodes to probe evoked responses in the visual modality, and EEG from the F1, Fz, and F2 electrodes for auditory responses. These electrodes were chosen because earlier studies used them to investigate visual or auditory neural responses [
26,
31,
32].
For ERP analyses, baseline correction was performed using a 100 ms pre-stimulus baseline for the ERPs evoked by the first and second stimuli, respectively. This baseline was chosen because this study mainly investigated evoked responses rather than the CNV.
ERP and time-frequency analyses were used to measure the evoked neural responses under distinct MA-TP conditions. In the visual modality, the N1 component evoked by the first and second flashes and the N2 component evoked by the second flash were selected for further analyses. According to the separations of the visual ERP profiles, the temporal windows for the first N1, second N1, and second N2 were defined as 140–200 ms after the first flash (i.e., −240 to −200 ms relative to the zero point), 120–190 ms after the second flash, and 200–300 ms after the second flash, respectively. In the auditory modality, the temporal windows for the P2 component evoked by the first and second beeps were defined as −230 to −180 ms and 110–250 ms, respectively. The ERP amplitude was calculated as the mean amplitude within the specific temporal window.
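The mean-amplitude measure can be illustrated with a minimal NumPy sketch. It assumes epochs from one electrode stored as a `(n_trials, n_samples)` array at the down-sampled rate of 200 Hz; the function name `mean_amplitude` and the synthetic data are illustrative and are not taken from the study's analysis code.

```python
import numpy as np

FS_HZ = 200  # sampling rate after down-sampling, as stated in the text

def mean_amplitude(epochs, times_ms, window_ms, baseline_ms):
    """Baseline-corrected mean ERP amplitude within a temporal window.

    epochs: array of shape (n_trials, n_samples) for one electrode.
    times_ms: sample times relative to the zero point (second stimulus onset).
    """
    erp = epochs.mean(axis=0)  # average across trials
    base = (times_ms >= baseline_ms[0]) & (times_ms < baseline_ms[1])
    win = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    return (erp[win] - erp[base].mean()).mean()

# Synthetic example: a 3 uV offset with a -2 uV deflection at 120-190 ms.
times_ms = np.arange(-600, 600, 1000 / FS_HZ)
epochs = np.full((40, times_ms.size), 3.0)
epochs[:, (times_ms >= 120) & (times_ms <= 190)] -= 2.0

# Second N1 window (120-190 ms) with a 100 ms pre-stimulus baseline.
n1 = mean_amplitude(epochs, times_ms, window_ms=(120, 190), baseline_ms=(-100, 0))
print(n1)
```

Because the baseline mean is subtracted first, the constant offset in the synthetic data does not affect the result.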
ITC and ERSP were calculated to show the event-related neural dynamics as time-frequency distributions. ITC measures the phase synchronization of EEG trials to the experimental events to which they are time-locked, and it takes values between 0 and 1. The larger the ITC value, the stronger the phase synchronization. The ITC can be calculated as Equation (1). Moreover, the ERSP was used to visualize event-related changes in spectral power over time, with a baseline covering 100 ms before the first stimulus. It was calculated as Equation (2).
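As an illustration of these two measures, the following NumPy sketch uses the commonly used definitions: ITC as the magnitude of the trial-averaged unit-length spectral vectors, and ERSP as trial-averaged power expressed in dB relative to the baseline. These definitions, the dB normalization, and the function names are assumptions and may differ in detail from Equations (1) and (2). The input `tf` is assumed to hold complex time-frequency coefficients with shape `(n_trials, n_freqs, n_times)`.

```python
import numpy as np

def itc(tf):
    """Inter-trial coherence: |mean over trials of F / |F||, between 0 and 1.

    Assumes all coefficients are nonzero.
    """
    unit = tf / np.abs(tf)            # keep phase, discard amplitude
    return np.abs(unit.mean(axis=0))  # shape (n_freqs, n_times)

def ersp_db(tf, baseline_mask):
    """Trial-averaged power in dB relative to the mean baseline power."""
    power = (np.abs(tf) ** 2).mean(axis=0)                      # (n_freqs, n_times)
    base = power[:, baseline_mask].mean(axis=1, keepdims=True)  # per frequency
    return 10 * np.log10(power / base)

# Synthetic check: identical phase in every trial gives ITC = 1.
n_trials, n_freqs, n_times = 20, 3, 8
locked = np.exp(1j * 0.3) * np.ones((n_trials, n_freqs, n_times))
itc_locked = itc(locked)

# Half of the trials in anti-phase gives ITC = 0.
anti = locked.copy()
anti[::2] *= -1
itc_anti = itc(anti)

# Doubling the amplitude after the baseline raises power by about 6 dB.
boosted = locked.copy()
boosted[:, :, 4:] *= 2
baseline_mask = np.arange(n_times) < 4
ersp = ersp_db(boosted, baseline_mask)
print(itc_locked.max(), itc_anti.max(), ersp[0, -1])
```

ITC discards amplitude and keeps only phase consistency across trials, whereas ERSP discards phase and keeps only power, so the two measures can dissociate.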
Based on inspection of the time-frequency distributions, three temporal windows were selected for ITC analyses in the visual modality (−300 to −150 ms, 100–200 ms, and 200–300 ms relative to the zero point). In the auditory modality, three temporal windows were selected for ITC analyses (−300 to −150 ms, 100–200 ms, and 200–400 ms). For ERSP analyses, the temporal windows of −200 to −100 ms, 100–200 ms, and 200–400 ms were used for both modalities. The frequency windows of the delta, theta, alpha, and beta bands were defined as 1–3 Hz, 4–8 Hz, 8–14 Hz, and 15–30 Hz, respectively.
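Extracting one value per band and temporal window from a time-frequency map can be sketched as follows. The band limits are those given above; the function name `window_mean`, the inclusive window edges, and the synthetic map are assumptions for illustration.

```python
import numpy as np

# Frequency windows from the text; note that 8 Hz falls in both theta and alpha
# when the edges are treated as inclusive, as assumed here.
BANDS_HZ = {"delta": (1, 3), "theta": (4, 8), "alpha": (8, 14), "beta": (15, 30)}

def window_mean(tf_map, freqs_hz, times_ms, band, window_ms):
    """Mean of a (n_freqs, n_times) map within one band and one time window."""
    lo, hi = BANDS_HZ[band]
    f_mask = (freqs_hz >= lo) & (freqs_hz <= hi)
    t_mask = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    return tf_map[np.ix_(f_mask, t_mask)].mean()

# Synthetic map whose value at each frequency equals that frequency in Hz.
freqs_hz = np.arange(1, 31)
times_ms = np.arange(-600, 600, 5.0)
tf_map = freqs_hz[:, None] * np.ones((1, times_ms.size))

theta_val = window_mean(tf_map, freqs_hz, times_ms, "theta", (100, 200))
delta_val = window_mean(tf_map, freqs_hz, times_ms, "delta", (200, 400))
print(theta_val, delta_val)
```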
2.7. Statistical Analyses
In the behavioral analysis, the paired-samples t-test was used to compare visual and auditory tasks; one-way repeated-measures analysis of variance (ANOVA) was used to compare reaction time and accuracy rate across the NTP, MTP, and VTP conditions. EEG measures were analyzed by two-way repeated-measures ANOVA. In the visual modality, there were four separate hypotheses: there should be an MA-TP interaction on the visual (i) N1 amplitude, (ii) N2 amplitude, (iii) ITC, and (iv) ERSP. Similarly, in the auditory modality, there were three separate hypotheses: there should be an MA-TP interaction on the auditory (i) P2 amplitude, (ii) ITC, and (iii) ERSP. For each hypothesis, if there was no interaction, the main effects of modality attention and timing prediction were tested. If there was an interaction, we tested the simple effect of timing prediction under attended and unattended conditions separately. For each hypothesis, the Bonferroni method was used to correct for multiple comparisons.
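The follow-up logic and the Bonferroni adjustment can be summarized in a few lines of Python. The significance threshold of 0.05, the example p-values, and the function names are assumptions for illustration; the ANOVA itself is not reproduced here.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def follow_up(interaction_p, alpha=0.05):
    """Choose the follow-up tests after a two-way repeated-measures ANOVA.

    alpha = 0.05 is an assumed threshold, not stated in the text.
    """
    if interaction_p < alpha:
        return "simple effects of TP under attended and unattended conditions"
    return "main effects of MA and TP"

# Hypothetical pairwise comparisons among NTP, MTP, and VTP.
adjusted = bonferroni([0.01, 0.04, 0.5])
print(adjusted)
print(follow_up(0.01))
print(follow_up(0.30))
```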
4. Discussion
This study investigated how MA and precise single-interval TP modulated early sensory responses. In the visual modality, after the predicted moment (i.e., the second flash), distinct TP conditions affected the N1-N2 amplitude. Theta ITC differences were observed only in the Va condition; no difference was seen in the Vua condition. In the auditory modality, distinct TP conditions led to P2 amplitude variations and delta ITC differences only in the Aa condition; no difference was found in the Aua condition. These results suggest that MA increased the TP-related response differences.
4.1. The MA-TP Mainly Modulated Low-Frequency ITCs in Early Sensory Processing
We first analyzed ERP signatures, including the N1-N2 components in the visual modality and the P2 component in the auditory modality. The differences induced by distinct MA-TP conditions were small (which was measured by the parameter ). A possible reason is that the MA-TP changes were masked by the low-frequency ERPs, which had very large amplitudes. Therefore, it is necessary to analyze neural activities in distinct frequency bands.
As expected, ITC time-frequency distributions had more obvious distinctions amongst the six MA-TP conditions in both the visual and auditory modalities. In the visual modality, alpha ITC was significantly increased after the first flash by the MA, but was not sensitive to TP modulation. Furthermore, the theta ITC was sensitive to both MA and TP modulations and there was a significant MA-TP interaction. Theta-band activity is traditionally related to specific cognitive controls [
33], such as maintenance of working memory [
34]; sustained attention [
35]; shift of spatial attention [
36]; and prediction errors management in perceptual learning [
15,
37]. A recent study reported that theta ITC is instrumental in shaping temporal predictions in early sensory processing [
23]. For example, in rhythmic predictive timing, phase reset aligns the stimulus with the ideal phase of the delta-theta oscillation, which is correlated with the subsequent evoked ERPs [
23]. In a time estimation task with rotating intervals, theta ITC in the frontal area was modulated by error magnitude, possibly indexing the degree of surprise [
15]. The current study found that the theta ITC not only had a close association with the TP, but also reflected the interaction between the MA and TP, which went beyond the traditional role of theta ITC. Furthermore, previous studies proposed that the posterior theta is related to stimulus processing, but unaffected by task demands [
31]. However, this study found that specific theta changes could reflect top-down modulation in the posterior brain (O1, Oz, O2). Such observations suggest that top-down modulation of theta activity can be observed not only in the frontal area but also in the primary visual cortex.
The auditory ITC was primarily located in the delta and theta bands, and its TP-related differences were in lower frequency bands than those of the visual responses. Delta phase synchronization has been widely accepted as a neural mechanism underlying rhythmic timing prediction [
23], and recent studies further demonstrated that the delta phase also serves as a neural mechanism of single-interval timing prediction [
14]. Therefore, in the current single-interval precise TP cognitive process, it is reasonable to observe delta ITC changes among distinct TP conditions. ITC in the 200–400 ms period was concurrently modulated by the MA and TP. To the best of our knowledge, this may be novel neural evidence regarding the effect of delta ITC.
Additionally, the alpha and beta ERSP were affected by MA or TP. After the first stimulus onset, smaller alpha ERSP was found in both the Va and Aa conditions. Alpha oscillation has a key role in many mental tasks [
38], especially the attention process [
17,
39]. Therefore, it is reasonable to observe MA-related alpha changes here. The beta ERSP was modulated by the TP in Va conditions. Many predictive timing studies have reported the suppression in post-stimulus beta power, and data suggest that it may have a key role in time maintenance or prediction error encoding [
40,
41,
42,
43]. This study found much smaller beta ERSP 200–400 ms after the second flash in Va conditions. This suggests that VTP may have a longer period for time maintenance or have larger prediction error encodings.
4.2. Visual MA Promoted the TP-Related Neural Effect in N1 and N2 Period
In the visual modality, the NTP, MTP, and VTP responses of single-interval precise TP were compared in the Va and Vua conditions. In the 100–300 ms (N1) period of the Va condition, MTP had almost the same ERP and ITC performance as NTP, which was the least influenced by top-down factors, whereas VTP had a much smaller theta ITC than MTP. In the 300–400 ms (N2) period, VTP led to much larger negative waveforms than the other conditions. Given that TP is regarded as a neural implementation of predictive coding in the time domain [
23,
24], the N1 and N2 performance may be explained by the ‘sharpen’ and ‘dampen’ effects of predictive coding theory, respectively. According to predictive coding, two types of neurons are involved in a prediction process. One type encodes information that matches the expected feature; the other encodes information that differs from the expectation (i.e., prediction error). The ‘sharpen’ effect proposes that neurons not tuned to the expected information are suppressed, making the expected features more salient and selective. The ‘dampen’ effect suggests that information differing from the expected feature results in larger responses, which are used for encoding prediction errors [
44,
45,
46]. Correspondingly, the VTP condition, which represented unexpected information, was suppressed, whereas the MTP condition was not influenced, as it had almost the same ITC as the NTP. Such observations were consistent with the ‘sharpen’ effect. Regarding the N2 variations, VTP resulted in a more negative waveform than the MTP in Va conditions. Negative waveforms have been suggested as a neural representation of prediction error encodings [
47,
48], and the ‘dampen’ effect may explain this negative waveform. Therefore, in Va conditions, we found ‘sharpened’ N1 and ‘dampened’ N2 performance.
These results demonstrate that when the visual modality was attended, N1 was sharpened and N2 was dampened. This supports the results of our previous study and shows that the effect holds even when the brain is concurrently faced with visual-auditory stimuli. However, this N1-N2 pattern disappeared when the visual modality was unattended, which suggests that MA promoted the neural effects of TP in the visual modality.
4.3. Auditory MA Also Promoted the TP-Related Neural Effect
We then investigated auditory neural responses. In the Aa condition, the P2 component differed between the VTP and MTP conditions. The MTP waveform rose rapidly 100 ms after the second beep, whereas the VTP waveform rose more gradually, leading to a relatively negative waveform. This negative waveform may also be explained by the ‘dampen’ effect, which reflects the encoding of prediction error. However, in the Aua condition, no significant difference was found, suggesting that auditory MA promoted the TP neural effect.
There were some differences between the TP neural effects in the visual and auditory modalities. Compared with the visual results, auditory responses had clear TP-related differences in a lower frequency band. Visual responses revealed both the sharpen and dampen effects, but the auditory P2 showed only the dampen effect. There are two potential explanations for this phenomenon. First, early sensory processing may be essentially different for visual and auditory stimuli. Alternatively, the auditory ERP is primarily located in the frontal-central area, which is also where the neural circuits of predictive timing are found. This could mean that the neural signatures reflecting early auditory processing are mixed with signals related to top-down control for higher cognitive processing. Therefore, to better understand the neural effect of TP in the auditory modality, it is necessary to investigate how to separate purely auditory responses from prediction-related variations in the frontal area.