#### *2.4. Participants*

Six healthy subjects (four men, two women; ages 22–25 years, mean 23.2 years) were recruited from the campus and participated in the study. All were right-handed, and none reported a history of psychiatric disorders. In accordance with the Declaration of Helsinki, all subjects signed a written consent form after receiving a detailed description of the procedure.

#### *2.5. Signal Acquisition and Processing*

As shown in Figure 3, eye-movement data were collected with a Tobii Pro Nano eye tracker at a sampling rate of 60 Hz and an operating distance of 80 cm. The eye tracker was calibrated for each subject before the experiment. An LCD screen (LEGION Y27gq-25, 1920 × 1080 pixels, 240 Hz refresh rate) was used to present the stimuli.

**Figure 3.** Placement of data acquisition equipment.

EEG signals were recorded with a 64-channel cap arranged according to the extended international 10/20 system. Figure 4 shows the placement of the 9 electrodes used for EEG collection: Pz, PO7, PO3, POz, PO4, PO8, O1, Oz, and O2. The reference electrode was placed behind the right ear, and the ground electrode on the forehead. Before data acquisition with a BrainAmp DC amplifier (Brain Products GmbH, Germany), the impedance of each electrode was reduced to below 10 kΩ. Signals were sampled at 200 Hz, band-pass filtered at 4–35 Hz, and notch filtered at 50 Hz. BCI2000 [25] served as the control platform for EEG collection, PyGame [26], a Python package, presented the stimulus interface, and MATLAB performed real-time signal processing. The display interface and the control platform communicated over the TCP/IP protocol.

**Figure 4.** Placement of electrodes. The blue circles are the placements of the sampling electrode. The reference electrode is placed on the green circle behind the right ear, and the ground electrode is placed on the forehead.

Canonical correlation analysis (CCA) [27] was applied to extract features from the preprocessed EEG signals. CCA fuses the multi-channel data and identifies the target by computing the correlation coefficient between the multi-channel EEG signal and reference signals at each stimulus frequency; the target is the option with the maximum SSVEP response score. The periodic stimuli are square-wave signals that can be decomposed into a Fourier harmonic series:

$$Y_f(t) = \begin{bmatrix} \sin(2\pi ft) \\ \cos(2\pi ft) \\ \sin(2\pi \cdot 2ft) \\ \cos(2\pi \cdot 2ft) \\ \vdots \\ \sin(2\pi \cdot Nft) \\ \cos(2\pi \cdot Nft) \end{bmatrix}, \quad t = \frac{1}{S}, \frac{2}{S}, \dots, \frac{L}{S} \tag{1}$$

where *N* is the number of harmonics, *t* is the current time, *L* is the number of sampling points of the original signal, and *S* is the EEG sampling rate. CCA is a multivariate statistical analysis method that finds the linear combinations ($x = X^T \omega_x$, $y = Y^T \omega_y$) of the variables in two data sets (*X*, *Y*) that maximize the correlation coefficient (*ρ*) between them, reflecting the correlation of the two groups of signals. The formula for *ρ* is as follows:

$$\rho(x, y) = \max_{\omega_x, \omega_y} \frac{E\left[x^T y\right]}{\sqrt{E[x^T x]\, E[y^T y]}} = \max_{\omega_x, \omega_y} \frac{E\left[\omega_x^T X Y^T \omega_y\right]}{\sqrt{E\left[\omega_x^T X X^T \omega_x\right] E\left[\omega_y^T Y Y^T \omega_y\right]}} \tag{2}$$
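The CCA-based target identification described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the largest canonical correlation is computed via QR decomposition of the centered data blocks, and the reference set follows Equation (1). The function and variable names are our own.

```python
import numpy as np

def cca_correlation(X, Y):
    """Largest canonical correlation between two multichannel signals
    X (samples x channels) and Y (samples x references), computed as the
    leading singular value of Qx^T Qy after QR-decomposing the centered data."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def reference_signals(f, L, S, N=2):
    """Fourier reference set Y_f(t) from Equation (1): sin/cos pairs of the
    first N harmonics of stimulus frequency f, sampled at rate S for L points."""
    t = np.arange(1, L + 1) / S
    return np.column_stack(
        [fn(2 * np.pi * k * f * t) for k in range(1, N + 1)
         for fn in (np.sin, np.cos)])

def classify_ssvep(eeg, freqs, S, N=2):
    """Pick the stimulus frequency whose reference set yields the maximum
    CCA score against the multi-channel EEG segment (samples x channels)."""
    scores = [cca_correlation(eeg, reference_signals(f, len(eeg), S, N))
              for f in freqs]
    return freqs[int(np.argmax(scores))], max(scores)
```

The returned maximum score is the "SSVEP response score" that the threshold in Section 3.2 is later compared against.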

The velocity-threshold identification (I-VT) filter is a popular velocity-based eye-tracking classification method [28], which classifies eye tracks by analyzing eye-movement speed. As shown in Formula (3), the eye-movement velocity is the ratio of the distance between two consecutive sampling points to the corresponding sampling interval. Speed is commonly expressed in visual degrees per second (°/s). When the speed exceeds the set threshold, the corresponding sample is classified as a saccade; otherwise, it is classified as a fixation.

$$v_x = \frac{x_2 - x_1}{t_2 - t_1}, \quad v_y = \frac{y_2 - y_1}{t_2 - t_1}, \tag{3}$$

where *vx* represents the velocity in the *x* direction, *vy* represents the velocity in the *y* direction, and (*x*1, *y*1) is the coordinate of the eye position at moment *t*1; similarly, (*x*2, *y*2) is the coordinate of the eye position at moment *t*2.
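A minimal sketch of the I-VT classification step, assuming gaze coordinates have already been converted to visual degrees; the 30 °/s threshold is a commonly used default, not a value stated in the paper.

```python
import numpy as np

def ivt_classify(x, y, t, threshold=30.0):
    """I-VT filter: label each inter-sample interval as 'saccade' or
    'fixation' by comparing the point-to-point gaze speed (deg/s),
    computed per Equation (3), against a velocity threshold."""
    vx = np.diff(x) / np.diff(t)   # x-direction velocity, Equation (3)
    vy = np.diff(y) / np.diff(t)   # y-direction velocity, Equation (3)
    speed = np.hypot(vx, vy)       # combined angular speed in deg/s
    return np.where(speed > threshold, "saccade", "fixation")
```

For *n* samples the filter returns *n* − 1 interval labels; consecutive fixation intervals can then be merged into fixation events.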

#### **3. Results**

#### *3.1. Evaluation Metrics*

The performance of hybrid BCI selection is evaluated by accuracy and information transfer rate (ITR). In addition, ITR is calculated as follows (bits per minute):

$$ITR = \frac{60}{T}\left(\log_2 N + P \log_2 P + (1 - P) \log_2 \frac{1 - P}{N - 1}\right), \tag{4}$$

$$T = t_s + t_b, \tag{5}$$

where *N* represents the total number of targets, *P* is the target selection accuracy, and *T* represents the time of target selection, including the stimuli flicker time of the target (*ts*) and flicker interval time (*tb*). It can be seen that the ITR is not only related to the classification accuracy, but also related to the number of selected targets.
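Equations (4) and (5) translate directly into code. Note the special case *P* = 1, where the *P* log₂ *P* and (1 − *P*) terms vanish and the formula reduces to 60 log₂ *N* / *T*; the function name is our own.

```python
import math

def itr_bits_per_min(N, P, T):
    """Information transfer rate per Equations (4)-(5):
    N targets, selection accuracy P, selection time T in seconds
    (stimulus flicker time t_s plus flicker interval t_b)."""
    if P >= 1.0:
        bits = math.log2(N)  # P*log2(P) and (1-P) terms vanish at P = 1
    else:
        bits = (math.log2(N) + P * math.log2(P)
                + (1 - P) * math.log2((1 - P) / (N - 1)))
    return bits * 60 / T  # convert bits per selection to bits per minute
```

As the text notes, ITR grows with both accuracy *P* and the number of targets *N*, and shrinks with selection time *T*.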

#### *3.2. Performance of the Offline Experiment*

A threshold is set on the SSVEP output to distinguish between the idle and working states in the online experiments. If the maximum correlation coefficient is higher than the threshold, the system is considered to be in the working state; otherwise, it is in the idle state. The goal is that no result is output when the subject is not gazing at a target. In each trial of the offline experiment, participants select the threatening pedestrian by gazing at the flickering stimulus block indicated by a cue. Each trial consists of a 2 s interval and a 4 s stimulation period. Each participant completes 2 blocks of 10 trials each, with a 5-min break between blocks.

Since SSVEP responses differ between individuals, a specific threshold is set for each participant. Ten correct selections by each subject in the offline experiment are randomly chosen to calculate SSVEP response scores, and the minimum value is taken as the threshold. As shown in Figure 5, the SSVEP response scores of S5 in 10 correct selection tasks are 0.6387, 0.5647, 0.7696, 0.7065, 0.5630, 0.6896, 0.7323, 0.5981, 0.4541, and 0.6721; the minimum response score (0.45) is therefore set as the threshold for Subject 5. The SSVEP response scores of the 6 subjects in the ten randomly chosen correct selection tasks are shown in Table 1, and the resulting thresholds are 0.56, 0.62, 0.51, 0.47, 0.45, and 0.39, respectively.

**Figure 5.** The SSVEP response scores of S5 for selecting correctly for 10 trials. Multi-colored lines represent different stimuli frequencies. The minimum response score of 10 trials is 0.4541, which is set as the threshold for Subject 5.

**Table 1.** SSVEP response threshold of 6 subjects.
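The per-subject thresholding procedure above amounts to a one-line calculation; the sketch below reproduces it for Subject 5's listed scores, with rounding to two decimals assumed from the values reported in Table 1.

```python
def subject_threshold(scores, ndigits=2):
    """Per-subject idle/working threshold: the minimum SSVEP response
    score over the ten correct offline selections, rounded to the
    precision used in Table 1 (an assumption on our part)."""
    return round(min(scores), ndigits)
```

Applying this to S5's ten scores (minimum 0.4541) yields the 0.45 threshold quoted in the text.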


#### *3.3. Performance of Asynchronous Online Experiment*

Thresholds obtained from the offline experiments are used in the online experiments. In the online experiment, the subjects choose threatening pedestrians according to their subjective judgment instead of prompts, and there is no time limit for completing the task. The system continuously outputs control commands to realize near-real-time selection of threatening pedestrians. The other settings are the same as in the offline experiments. EEG and eye-tracking data are acquired simultaneously. At the beginning of each trial, the subjects saccade across the stimuli in the direction of the arrows until the flickering stimulus turns yellow and stays yellow for 0.5 s. Once a result is output, the subjects proceed to the next trial.

Figure 6 shows the change in gaze as the subject scans the arrows in the hybrid BCI system. In the one-second window of a target selection, one coordinate of the saccade points remains essentially unchanged, while the absolute change of the other coordinate approximately equals the length of the arrows (60 pixels).
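That geometric regularity suggests a simple direction check over the gaze window, sketched below. The 15-pixel tolerance is an assumed value, and the function is our own illustration rather than the authors' detection rule; the 60-pixel arrow length comes from the text.

```python
def saccade_direction(xs, ys, arrow_len=60, tol=15):
    """Classify the saccade in a one-second gaze window: one coordinate
    stays roughly constant while the other changes by about the arrow
    length (60 pixels). Returns 'up', 'down', 'left', 'right', or None.
    tol is an assumed pixel tolerance for 'roughly constant'."""
    dx = xs[-1] - xs[0]
    dy = ys[-1] - ys[0]
    if abs(dy) < tol and abs(abs(dx) - arrow_len) < tol:
        return "right" if dx > 0 else "left"
    if abs(dx) < tol and abs(abs(dy) - arrow_len) < tol:
        return "down" if dy > 0 else "up"  # screen y grows downward
    return None
```

Returning `None` when neither pattern matches provides the low-confidence rejection that keeps inattentive glances from producing a selection.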

SSVEP-BCI is introduced as a baseline to verify the effectiveness and availability of the hybrid BCI structure. Table 2 reports evaluation metrics, including accuracy, target selection time, and ITR, for the two models as six subjects select dynamic threatening pedestrians. The hybrid BCI achieves a higher selection accuracy (95.83%), a shorter selection time (1.33 s), and a higher ITR (67.5 bits/min). Compared with SSVEP-BCI, the selection time is shortened by 0.69 s, the accuracy is improved by 5%, and the ITR is increased by 25.2 bits/min. Subject 2 performs perfectly in both SSVEP-BCI and hybrid BCI, with a selection accuracy of 100%; notably, Subject 2 selects threatening pedestrians within 1 s, and the ITR reaches 92.88 bits/min in hybrid BCI. Subject 5 performs poorly in SSVEP-BCI, with an accuracy of 80% and an ITR of 25.71 bits/min; by combining eye tracks with EEG data, the selection accuracy improves to 90% and the selection time is shortened from 2.3 s to 1.6 s. These results show that, in the hybrid SSVEP architecture, the selection time and accuracy for dynamic threatening pedestrians meet the requirements of the online experimental tasks. The advantage of the hybrid BCI lies in the addition of eye tracks, which effectively avoids erroneous results caused by inattention. At the same time, the multi-modal fusion of eye movements and EEG enables subjects to make choices in a shorter time. A purely eye-tracking system is unstable, and the "Midas touch" phenomenon often occurs. In real traffic scenarios, a wrong choice of threatening target could cause a traffic accident through incorrect operation of the self-driving vehicle. The stability and robustness of the hybrid BCI help ensure that drivers can judge and select threatening targets quickly and accurately in assisted driving.

**Figure 6.** The change of eye movements during saccades in the direction of the arrows. (**a**) Top-to-bottom saccade; (**b**) Bottom-to-top saccade; (**c**) Right-to-left saccade; (**d**) Left-to-right saccade. The blue line represents the change in the *x* direction and the orange line represents the change in the *y* direction. The distance between the two gray lines is 60 pixels.


**Table 2.** Results of asynchronous online selection of threatening pedestrians by SSVEP-BCI and hybrid BCI.

#### **4. Discussion**

In complex road environments, pedestrians have a great impact on the safety of vehicle driving. Usually only a few pedestrians, with particular positions or trajectories, threaten driving safety; however, they significantly interfere with the driving route and may even directly determine whether the vehicle can pass safely. Therefore, marking potential threats among many pedestrian targets and feeding the location information of these pedestrians back to the computer can help vehicles make safer decisions in subsequent control.

This paper proposes a hybrid BCI paradigm for threatening-pedestrian selection based on object detection and tracking. The deep-learning-based object detection and tracking method obtains the coordinates and IDs of pedestrian targets, providing initial information for the hybrid BCI. This study takes traffic scenes as the background and combines computer vision with hybrid BCI, aiming at the judgment of dynamic threatening pedestrians. Participants judge and select pedestrians who pose a threat to driving safety according to their own subjective experience. Six subjects participated in offline experiments and asynchronous online experiments. The thresholds determined in the offline experiments are used to distinguish between the working and idle states in the online experiments. In the asynchronous online experiments, the average selection time is 1.33 s, the average accuracy reaches 95.83%, and the average ITR reaches 67.5 bits/min. These results show that hybrid BCI has great application potential in dynamic threatening-pedestrian selection.

#### **5. Conclusions**

This paper designs a hybrid BCI that combines eye-tracking and EEG for threatening-pedestrian recognition in the driving environment. The experimental results of six subjects show that the hybrid BCI achieves better performance than a single SSVEP-BCI, with an average selection time of 1.33 s, an average selection accuracy of 95.83%, and an average information transfer rate (ITR) of 67.50 bits/min. The three proposed decisions filter out results with low confidence, which effectively improves the selection accuracy of the hybrid BCI. The driver's understanding of the environment is fed back to the machine, realizing human–machine collaborative driving to a certain extent. Compared with methods that rely solely on computer vision, this method has more advanced environmental semantic understanding ability and is safer and more reliable in driving. The system has been verified online in several specific experimental scenarios, but its applicability needs to be further enhanced in scenarios where multiple threatening pedestrians exist or a threatening pedestrian appears suddenly. In future work, we will develop more rapid and accurate signal-processing methods to analyze SSVEP, and combine Bayesian probability to decide on threatening pedestrians in different scenarios.

**Author Contributions:** Conceptualization, Y.L.; methodology, J.S.; software, J.S.; validation, J.S.; writing—original draft preparation, J.S.; writing—review and editing, J.S. and Y.L.; supervision, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China, grant number U19A2083.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
