1. Introduction
Perceptual decision-making is the process of choosing a course of action based on available sensory information [
1]. It is a fundamental mechanism enabling our interaction with the environment. Neuroscientists usually analyze perceptual decision-making to understand how the brain structures interact to distinguish and classify information from the senses [
2,
3,
4]. They also consider how external and internal factors affect our perceptual decisions. The internal factors include the human condition, e.g., motivation, fatigue, and alertness [
5]. Furthermore, they rely on personal experience and memory [
6]. Thus, subjects better respond to the sensory stimuli if they saw them before and increase responses to the repeatedly presented stimuli [
7]. External factors reflect the quality of the sensory evidence. If the decision-relevant features or stimuli have well-pronounced distinctive features, e.g., color, they quickly grab our attention and speed up the final decision (bottom-up attention) [
8]. When the sensory information is ambiguous and contains many irrelevant features, we voluntarily shift our attention to the most relevant details (top-down attention) [
9].
In their early works, Philiastides and colleagues used ERP to characterize neuronal mechanisms of perceptual decision-making. They identified ERP whose latency depended on the strength of evidence. Thus, they reported that the evidence accumulation process starts at ate early processing stages, and its duration varies as a function of the evidence strength affecting decision speed [
10]. In 2007, the same authors used fMRI (functional magnetic resonance imaging) and reported that perceptual decision-making includes multiple spatially and temporally dissociated stages. These include early visual perception (early component), task/decision difficulty (difficulty component), and post-sensory/decision-related events (late component) [
11]. In 2009, Ratcliff et al. analyzed single trials during a perceptual decision-making task. They found that the late but not the early ERP component correlated with the single-trial behavioral performance. The authors concluded that the information used to drive the decision process—that is, reflected in the late EEG (electroencephalogram) component—is not the same as that which produces the early EEG response. Thus, they distinguished between neural responses to early perceptual encoding and, later, to post-sensory processing that provides the decision-relevant evidence [
12]. Note that the authors reported trial-to-trial variability in the decision performance even for identical images. It may reflect the reliance of the evidence accumulation on human internal factors that varied through experiment. One possible mechanism modulating the human state and affecting perceptual decisions is attention. With this in mind, Nunes and colleagues evaluated attention in single trials. Their results suggested that fluctuations in attention to visual stimuli account for some of the trial-to-trial variability in the brain’s speed of evidence accumulation on each trial [
13]. Recent work by Herding et al. tested the effect of another human factor on the evidence accumulation process. They reported that subjective assessment of the evidence plays a crucial role in the accumulation process. According to their results, sensory input appeared to be first modulated by the subjectively perceived features and later by its absolute value (i.e., the absolute strength of evidence) [
14]. Together, these works suggest that effects of attention and subjectiveness enhance when the evidence quality is poor, e.g., noisy or ambiguous. When the sensory information is not sufficient to reliably disambiguate the nature of a stimulus, it requires more attentional demands and efforts to acquire decision-related features. At the behavioral level, the ambiguity of sensory information influences many oculomotor variables, including fixation frequency and duration, the frequency, peak velocity, and amplitude of saccades, and phasic pupil diameter [
15]. At the neural activity level, increasing ambiguity induces high
-band power across the fronto-parietal areas, reflecting attentional modulation and the involvement of the internal processes e.g., memory and subjective experience [
6,
14].
Studying these attentional components also has a practical meaning. Attention is a fundamental cognitive function that develops in childhood and declines in old age. Thus, monitoring this vital brain function enables the brain-computer interface (BCI) development for the prediction and prevention of attentional deficits and disorders [
16,
17].
Attentional mechanisms are actively studied using within-subjects design and require large groups of participants. In contrast, BCI studies use within-trial methods due to the inter-subject variability [
18]. A common way of training BCI systems is subjecting a pre-trained algorithm to the additional calibration session, optimizing its performance for a particular subject. On the one hand, a pre-trained algorithm will be effective if it utilizes the between-subject features. On the other hand, calibration requires subject-specific features to learn. Thus, revealing potential electrophysiological biomarkers for BCI may require mixed within-trial and within-subject analysis.
A bulk of literature suggests that attention modulates neuronal activity in the
- and
-frequency bands (see [
4] for the literature review). On one hand, activity in these bands reflects the degree of attention in general regardless of the task [
19]. Thus, measuring the prestimulus
- and
-band power enables predicting the performance of the ongoing stimuli processing. Different studies report that low
- and high
-band power reflect high attention therefore, predicting high performance and vice-versa [
20]. On the other hand, some components of attention influence perceptual decisions when the sensory information is presented. Thus, stimulus onset involuntary draws our bottom-up attention. When the stimulus is ambiguous, its processing mobilizes top-down attentional components e.g., selective attention to facilitate the evidence accumulation process [
21]. Based on these studies, we suppose that stimulus-related attentional components also modulate the
- and
- band activity on the interval following stimulus onset depending on the attentional demands, e.g., the strength of evidence. Following [
13], we suppose that measuring power in these bands enables evaluating attentional demands on the single trials providing potential for BCI applications.
To test this hypothesis, we considered the
- and
-band wavelet power of an EEG signal registered during the perceptual decision-making task. This task required participants to respond to the particular orientation of the ambiguous bistable stimuli [
22,
23,
24]. According to the previous work, we supposed that high ambiguity makes subjects rely on the top-down rather than bottom-up mechanisms [
25]. Thus, we compared the stimuli with low and high ambiguity to reveal biomarkers of the bottom-up processes. First, we used the within-trial design to quantify the perceptual process for each subject. Then, we contrasted obtained quantities between low and high ambiguities following within-subject statistics. We observed that the
-band power decreased after the stimulus onset over most sensors. In contrast, the
-band power increased after the stimulus onset. The high ambiguity induced high frontal
-band power at 0.3–0.6 s post-stimulus onset. We suppose that ambiguous stimuli may induce the frontal
-band power evidencing internal top-down processes that control evidence accumulation to disambiguate stimulus, interpret it, and make an appropriate decision.
Since the observed -band biomarkers are significant in within-trial analysis, they may be pronounced in single EEG trials and may serve as control commands for BCI. Moreover, their significance in the group of subjects suggests that subjects share a common perceptual mechanism. Thus, revealed biomarkers may increase the effectiveness of BCI calibration for a particular user and its resistance to the between-subject variability.
2. Materials and Methods
2.1. Participants
Unpaid volunteers participated in the experiment. All subjects were conditionally healthy with normal acuity (thus, wearing no glasses or contact lenses) and with no history of eye diseases or pathologies. The experimental group consisted of 20 subjects (16 males and 4 females) at the age of 20–36. The volunteers were informed about the goals and methods of the experiment and possible inconveniences related to the experimental procedure. They were able to ask any questions about the experiment and received satisfactory answers. All subjects filled and signed a written consent form before their participation. The experimental study was designed and carried out in accordance with the Declaration of Helsinki. All works were approved by the local research ethics committee of the Innopolis University.
2.2. Visual Stimuli
We used Necker cube images as the visual stimuli [
26,
27]. The Necker cube is a drawing of a 3D-cube with transparent faces and visible edges projected on a 2D surface. Due to the specific arrangement of the edges, an observer with normal perception interprets the Necker cube as a 3D-object. The main feature of the Necker cube as a visual stimulus is the ambiguity of its perception: The cube can be interpreted as “left-oriented” or “right-oriented”. This bistability of perception can be further amplified and controlled by changing the contrast between some of the internal edges. In our work, we varied the contrast of three edges in the lower-left middle part of the image and three similar edges in the upper-right middle part (see
Figure 1A).
First, we introduced parameter
defining the inner edges contrast. It reflects the intensity of three lower-left lines, while
corresponds to the intensity of three upper-right lines. The parameter
I can be defined as
, where
y is the brightness level of three lower-left lines using the 8-bit gray-scale palette. The value of
y varies from 0 (black) to 255 (white). The Necker cube images with the different contrast parameters are shown in
Figure 1A. Then we introduced stimulus ambiguity,
a in the following way. We supposed that for
stimulus is unambiguously left-oriented, while for
its features barely reflect the orientation. Varying
I from 0 to 0.5, we increase the ambiguity of the left-oriented cube making it totally ambiguous at
. Thus setting
ambiguity for
and
ambiguity for
, we suggest that stimuli with
correspond to
ambiguity. Similarly, for the right-oriented stimuli, we obtain that cubes with
also correspond to
ambiguity. Finally, to exclude effects of the stimulus orientation (including the effects associated with the formation of the motor response), we combined left- and right-oriented stimuli for each ambiguity.
2.3. Experimental Protocol
During the experiment the subject was sitting in the chair with a monitor screen before their eyes and a joystick with two buttons, left and right, in their hands. The experiment lasted for ∼40 min and included an active session and short (∼120 s) background activity recording before and after the active session. During the active session, visual stimuli were demonstrated to the subject. Each visual stimulus consisted in demonstration of one Necker cube. The cube was demonstrated on the computer screen with a white background and red dot in the center to attract the subject’s attention to the cube. We used a 24”-LCD monitor with a spatial resolution of pixels and a 60 Hz refresh rate for stimulus presentation. Eye-to-monitor distance was 70–80 cm with visual angle of ∼0.25 rad. The Necker cube size on the screen was cm.
Each of the 8 cubes from the set were presented 50 times, which lead to total 400 presentations. The structure of each presentation is illustrated by
Figure 1B. Each
i-th presentation consisted of the cube’s presentation (
) and pause(
) before the next,
-th, trial.
The subjects were instructed to look at the cube and interpret it as either left- or right-oriented and then to press the corresponding button on the joystick—the left button with the left thumb and the right button with the right thumb.
In case of consecutive presentations previously demonstrated bistable images could affect the perception of subsequent images, known as the “memory effect” [
28]. For example, an observation of several left-oriented cubes in a row could stabilize the subject’s perception so the next right-oriented cube can be falsely interpreted as left-oriented. To reduce the influence of this effect we added a variation in the experimental design: The type of the Necker cube (the control parameter
a) was chosen randomly in each trial and the length of each cube’s presentation varied in range
s. Additionally, during the pause between consecutive cubes an abstract image was demonstrated (see
Figure 1B) to draw the observer’s attention and to make a perception of each Necker cube independent from the previous ones. Length of pause varied in range
s.
2.4. EEG Acquisition and Preprocessing
To record EEG signals we used an electroencephalograph “Encephalan-EEG-19/26” (Medicom MTD company, Taganrog, Russian Federation) with a two-button input device (joystick). EEG data were acquired using a monopolar registration method with the following set of electrodes: 31 recording EEG electrodes placed according to an extended “10–10” scheme, two reference electrodes (A1 and A2) placed on earlobes, and a ground electrode (N) placed above the forehead. EEG signals were recorded with the cup adhesive Ag/AgCl electrodes. During the experiments electrode, impedances were monitored to ensure the high quality of recorded EEG data. Common impedance values varied in a range of 1–25 k.
To remove low- and high-frequency noises as well as 50-Hz interference from the power grid, raw EEG data were filtered by the band-pass filter (cut-off frequencies 1 and 70 Hz) and 50-Hz notch filter. Recorded EEG signals were divided into a number of trials. Each trial was associated with a single Necker cube demonstration and included 0.5 s before the cube’s presentation and 1.5 s after the presentation, thus, forming a 2-s interval.
To reduce the presence of physiological artifacts on EEG signals, we performed preprocessing. To remove eye-movement and cardiac rhythm artifacts, we implemented a method based on empirical mode decomposition [
29]. Artifacts were considered removed if their amplitude reduced after preprocessing to at least 30% of the initial value. However, some artifacts could not be properly removed, so trials with such artifacts were rejected. This led to reducing the overall number of trials for each subject from 400 to ∼350. Additionally, muscle artifacts are known to be very difficult to remove on the preprocessing stage, so we asked participants to take a comfortable pose during the experiment to minimize spine, neck, and shoulder muscle tension.
To analyze the spatial properties of multichannel EEG data, we divided EEG channels on an extended “10–10” scheme into 7 regions (Fp, F, FC, C, CP, P, and O) as shown with the shaded bands in
Figure 1C. These regions are chosen to include channels that belong to one longitudinal brain axis and both hemispheres. Lateral effects were not considered since each group of visual stimuli (LA and HA) incorporates left- and right-oriented Necker cubes in equal ratio.
2.5. Behavioral Estimates
After the EEG preprocessing procedure, some trials were excluded due to high-amplitude artifacts. To keep the number of EEG trials constant for each ambiguity, we consider 320 trials out of the initial 400 including 40 trials for each ambiguity. For each stimulus, we estimated a behavioral response by measuring the response time (RT) corresponding to time passed from stimulus presentation to button pressing (
Figure 1C).
Figure 2A demonstrates the typical RT distributions obtained for each ambiguity for one single subject. One can see that the median RT shifts to higher values when the ambiguity increases. Simultaneously, an increased median RT accompanies increased dispersion leading to the overlapping of the distributions. To minimize overlap for each ambiguity, we consider 20 trials for which RT lies between the 25th and 75th of the distribution.
Figure 2B demonstrates that the median RT for the group of participants (group mean ± SE) increases with increasing stimulus ambiguity (*
,
t-test with Bonferroni correction).
In this work, we reduce the number of experimental conditions considering
as the low-ambiguity (LA) stimuli and
as high-ambiguity (HA) stimuli. Each group included 80 stimuli (20 per each ambiguity, 40 per each orientation). The reaction time (group mean) was
s for LA and
s for HA stimuli (
Figure 2C). This simplification is based on our previous works on the Necker cube images [
7,
25]. It enables revealing effects of ambiguity and provides a sufficient number of trials to minimize additional effects of orientation, a bias of the presentation moment, and the previously presented stimulus [
7]. Simultaneously, according to Brunye et al., very different perceptual and cognitive processes may underlie decision making at low, medium, and high uncertainty levels [
15]. Thus, our future studies will test this effect in Necker cubes with low, moderate, and high ambiguity.
Many studies used ambiguous visual stimuli, including Necker cube to analyze spontaneous perceptual reversals. They involve the presentation of totally ambiguous stimulus whose interpretation spontaneously alternates under endogenous or exogenous factors [
30,
31]. Our paradigm excluded completely ambiguous stimuli. Manipulating with the internal edge contrast, we vary ambiguity in the range where each stimulus had a particular interpretation defined by its morphology. The subjects were able to report a correct stimulus interpretation above chance. Their mean error rate (
Figure 2D) was
for HA stimuli and
for LA stimuli (
, Wilcoxon test).
2.6. Signal Analysis
We analyzed EEG spectral power in the
- and
-frequency bands, using continuous wavelet transform (CWT) [
32]. First, the wavelet power spectrum
was calculated for each EEG channel
in the frequency range
Hz including both
and
ranges. Here,
is the complex-valued wavelet coefficients calculated as:
where
is the EEG chanel number (
is the total number of chanels used for the analysis) and “*” defines the complex conjugation. The mother wavelet function
is the Morlet wavelet which is defined as:
where
is the wavelet parameter. According to [
33], the Morlet wavelet has advantages in the analysis of EEG recordings that includes many rhythms and oscillatory patterns. The comparison of EEG analysis studies with different mother wavelets [
32] shows that the Morlet wavelet provides a clear wavelet surface and better overall resolution in the time-frequency domain.
For
- and
-frequency bands, the wavelet amplitudes
and
were calculated as:
where
Hz,
Hz. In order to neglect the changes of the overall EEG signal amplitude, the values (
3) were normalized to the EEG spectral amplitude in the 1–30 Hz frequency band.
The time-series of the wavelet power (
3) was calculated for the whole time of the experimental session and then was split into the time segments
s and
s, before and after the
i-th visual stimulus presentation (see
Figure 3):
For each stimulus ambiguity, the difference between
and
for the
n-th EEG sensor was analyzed statistically via the paired samples
t-test based on 20 trials.
Figure 3 demonstrates the estimation of the change between
and
on the basis of an illustrative example. The dependences
are shown for 20 trials as the median and 25th–75th percentiles (
Figure 3A,C). In
Figure 3B,D the corresponding values
and
are shown as medians (bars), 25–75 percentiles (box) and outlines (whiskers).
To address the multiple comparison problem (MCP), we used the permutation tests in conjunction with cluster-based correction. A cluster was significant when its p-value was below 0.025, corresponding to a false alarm rate of 0.05 in a two-tailed test. The number of permutations was 2000. The sensors that passed a MCP correction were collected to the positive and negative clusters. Finally, we estimated the size of the positive and negative clusters as the number of EEG sensors in them. The values and were compared between the conditions via a repeated-measures ANOVA.
4. Discussion
We analyzed EEG sensor-level wavelet power in the
and
-bands during processing ambiguous stimuli. We observed that the
-band power decreased after the stimulus onset over most sensors. In the frontal regions F and Fp, the
-band power fell before the behavioral response (interval
in
Figure 5D). Moreover, at
, it becomes more pronounced for HA stimuli than for LA stimuli (
Figure 5C). In contrast, the
-band power increased after the stimulus onset. We found that HA stimuli induced high frontal
-band power at
and high parietal
-band power at
when compared to LA stimuli (
Figure 7C).
Some studies combined fMRI and EEG analysis and reported that the parietal and frontal area’s activation accompanied the decreased
-band EEG power reflecting attentional control (see [
36] for a literature review). Our results demonstrated that
-band power varied between HA and LA stimuli only at the latter processing stage,
. Thus, we suppose that HA stimuli require more attentional control to make a decision. Our results showed that frontal
-band power decreased only at
, supporting its potential role in the decision-making process [
37].
The
-band activity in the frontal and parietal regions during the visual stimulus processing is associated with attentional control. Stimuli processing relies on the top-down, and bottom-up attentional mechanisms originating in the frontal and parietal cortices [
38]. The bottom-up mechanism reflects involuntary attention induced by the stimulus onset (the stimulus presentation automatically makes the subject draw attention to it). The top-down mechanism shifts attention based on the knowledge about the current task. Research in monkeys [
38] and humans [
39] reported that neuronal activity in parietal areas at an earlier stage reflects bottom-up attention, whereases the neural activity in the prefrontal area reflects top-down attention.
In our paradigm, the stimulus onset may initiate bottom-up attention. Moreover, the morphology of LA and HA stimuli is almost the same, and we hypothesize that the bottom-up processes barely depend on ambiguity. In contrast, we suggest that the top-down mechanisms mainly contribute to goal-directed stimulus processing. On the one hand, the subject needs to draw attention to the particular properties (edges) of the Necker cube to define its interpretation. The latter requires selective attention, which according to the physiological studies, is supported by the
- band neuronal activity [
40]. Selective attention is a top-down mechanism that characterizes the ability to focus cognitive resources on information relevant to our goals [
21]. According to [
41], enhanced neuronal activity in the frontal area serves as a neurophysiological marker of selective visual attention. On the other hand, when ambiguity is high, subjects increase their reliance on the top-down processes, such as expectations and memory, rather than on stimulus features [
25]. Finally, influences of the top-down processes accompany the enhanced
-band activity [
6,
42].
According to our results, HA stimuli induces high frontal
-band power for 0.3–0.6 s post-stimulus onset. Ref. [
34] evidenced that the sensory-processing stage might last for 0.2–0.4 s depending on the strength of the evidence. These two intervals overlap, allowing us to suppose that the frontal beta-band power controls the evidence accumulation process. Our previous work on ambiguous processing reported increasing
-band power for 0.35–0.42 s post-stimulus onset. Together with [
43], we supposed that it might reflect the decision-making process. The current experimental design barely allows one to distinct the evidence accumulation and decision-making stages. At the same time, we suppose that frontal
-band power reflects top-down mechanisms that control these both stages. We speculate that ambiguous stimuli may induce the frontal
-band power evidencing internal top-down processes that control evidence accumulation to disambiguate stimulus, interpret it, and make an appropriate decision.