**1. Introduction**

In recent years, brain–computer interfaces (BCIs) have become a research hotspot in artificial intelligence, aiming to establish a communication pathway between the human brain and external devices. Electroencephalography (EEG), which reflects brain activity, is the most common signal source for BCI applications. As a non-invasive and low-cost signal, EEG has shown high reliability [1,2]. As a new interaction mode, BCI has been widely applied in medical assistance [3], automobile driving [4], robot control [5], and other fields.

As a complex BCI application, a Brain-Controlled Vehicle (BCV) establishes a direct control pathway between the brain and the vehicle. At present, the BCI paradigms adopted by BCV systems are mainly P300 [6], motor imagery (MI) [7], and steady-state visual evoked potential (SSVEP) [8]. P300, which is typically evoked by a visual stimulus and has poor real-time performance, can only be used to control static targets such as switches and wipers. The real-time performance of MI is also poor, and the available degrees of freedom are limited (generally fewer than four), which makes it impossible to complete the overall driving task. SSVEP, an electrophysiological response to a repetitive visual stimulus, offers a high information transfer rate (ITR) and good real-time performance. When subjects focus their attention on a stimulus, the corresponding frequency appears in the EEG signals recorded mainly over occipital regions [9]. Studies [10,11] have shown that the human cerebral cortex produces SSVEP components at the fundamental frequency of the target stimulus or its harmonics when exposed to a fixed-frequency visual stimulus, so the attended stimulus can be identified by detecting the dominant frequency of the SSVEP. Owing to its high applicability, simplicity, and high accuracy, an SSVEP-based BCI is well suited to selecting threat targets during automated driving. However, prolonged exposure to visual flicker easily causes fatigue.

**Citation:** Sun, J.; Liu, Y. A Hybrid Asynchronous Brain–Computer Interface Based on SSVEP and Eye-Tracking for Threatening Pedestrian Identification in Driving. *Electronics* **2022**, *11*, 3171. https://doi.org/10.3390/electronics11193171

Academic Editor: Jose Eugenio Naranjo

Received: 3 September 2022; Accepted: 30 September 2022; Published: 2 October 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

With the maturity of eye-tracking technology and growing demands for interaction comfort, interaction based on eye-tracking has attracted more and more attention. Compared with EEG, eye-tracking interaction is more natural and can further reduce fatigue. In addition, it is easy to learn, and most users can operate it without special training [12]. However, eye-tracking still has drawbacks. Some eye movements are not guided by volitional attention; if the system does not distinguish these movements, it is likely to misinterpret human intentions and cause false triggering, which is known as the "Midas Touch" problem [13]. Moreover, eye-tracking technology is not completely reliable, and random instability factors can cause system errors. Eye-movement interaction has nevertheless been applied to text spelling [14] and robot control [15].
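One common mitigation for the Midas Touch problem is to separate fixations from saccades before any selection logic runs, so that only deliberate gaze is considered. Below is a minimal velocity-threshold (I-VT-style) sketch; the threshold value, units, and function name are our illustrative assumptions, not parameters of any particular eye tracker.

```python
import numpy as np

def ivt_classify(gaze_xy, sample_rate_hz=60.0, velocity_threshold=100.0):
    """Classify each gaze sample as fixation (True) or saccade (False).

    gaze_xy: (N, 2) array of gaze positions (e.g., degrees of visual angle).
    velocity_threshold: units/s; slower-moving samples count as fixations.
    (Threshold and units are illustrative assumptions.)
    """
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    # Point-to-point velocity in units/s.
    deltas = np.diff(gaze_xy, axis=0)
    speeds = np.linalg.norm(deltas, axis=1) * sample_rate_hz
    # Pad so output length matches input; first sample inherits its neighbor.
    speeds = np.concatenate([speeds[:1], speeds])
    return speeds < velocity_threshold
```

Only fixation samples would then be passed on to dwell-time or confidence checks, which helps avoid triggering selections on involuntary saccades.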

A hybrid BCI system is generally composed of one BCI and another system (which might be another BCI) and can perform better than a conventional BCI [16]. Some studies [17] adopt hybrid systems that combine EEG and electrooculography (EOG) to recognize characters. In addition, eye-tracking, a popular technology in the field of computer vision, has gradually been combined with BCI to control games [18], robotic arms [19], and drones [20].

At present, rapid improvements in computer information fusion capability are driving the development of automated driving. Autonomous driving is gradually moving from specific scenarios (such as highways and experimental parks) to complex urban traffic, which involves many dynamic pedestrian targets with variable trajectories. In such complex road situations, environment perception based on computer vision alone cannot quickly and accurately predict which pedestrian poses a threat. Integrating driver intention into the vehicle's environment perception through a BCI can help improve driving comfort and safety.

In this work, a multi-modal hybrid BCI combining SSVEP with eye-tracking is proposed for the selection of potentially threatening pedestrians. Arrows pointing in different directions are randomly superimposed on pedestrian targets. SSVEP is evoked by the stimulus of the corresponding frequency while subjects scan the threatening pedestrian target according to the direction of the arrows. An I-VT filter is applied to process eye-movement tracks, and canonical correlation analysis (CCA) is adopted to detect EEG signals. Combining eye-tracking and EEG not only distinguishes between working and idle states, but also shortens target selection time and improves accuracy. Experimental results from six subjects show that the proposed hybrid asynchronous BCI system achieves better performance than a single SSVEP-BCI, with an average selection time of 1.33 s, an average selection accuracy of 95.83%, and an average information transfer rate of 67.50 bits/min.
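The accuracy, selection time, and ITR figures above are related by the standard Wolpaw ITR formula. A small sketch of the computation follows (the function name is ours; `n_targets` is the number of selectable targets, which depends on the scene and is not fixed by the paper):

```python
import math

def itr_bits_per_min(n_targets, accuracy, selection_time_s):
    """Wolpaw information transfer rate in bits/min.

    n_targets: number of selectable targets N.
    accuracy: selection accuracy P (0 < P <= 1).
    selection_time_s: average time per selection, in seconds.
    """
    n, p = n_targets, accuracy
    bits = math.log2(n)
    if 0 < p < 1:  # the correction terms vanish at P = 1
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * 60.0 / selection_time_s

# Sanity check: a perfect binary selection once per minute carries 1 bit/min.
```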

The remainder of this paper is organized as follows: Section 2 introduces the hybrid BCI system, target detection and tracking, the graphical stimuli interface, participants, and signal acquisition and preprocessing. Section 3 presents the experimental procedure, evaluation metrics, and results. Section 4 discusses the hybrid BCI system, and Section 5 summarizes the main work of this paper.

## **2. Materials and Methods**

Figure 1 shows the overall framework for threatening pedestrian identification. YOLOv5 is introduced to detect pedestrian targets, and DeepSORT is used to track them. SSVEP stimuli of different frequencies are superimposed at the obtained pedestrian coordinates, and subjects scan pedestrians according to the direction of the superimposed arrowhead stimuli. The three-stage decision process effectively reduces false positives and improves the reliability of threatening pedestrian identification.

**Figure 1.** Hybrid asynchronous BCI system for dynamic pedestrian detection.

### *2.1. System Description*

The purpose of this study is to evaluate the performance of a multi-modal BCI that combines eye-tracking and SSVEP for pedestrian tracking and selection. First, the ZED2 camera collects real-time video of the driving foreground and performs multi-target detection and tracking. The coordinates and IDs of the pedestrians are transmitted to the remote computer through the LAN. Second, after the data are received, flashing arrows with different stimulation frequencies are superimposed on the targets and follow their movements; the arrow directions are randomly assigned. After calibrating the eye tracker, participants gaze at the stimulation interface, and the eye tracker and the EEG acquisition instrument begin to collect the corresponding signals at the same time. The flow of online signal processing is shown in Figure 1. The sampling frequency of the eye tracker is 60 Hz. In the processing of eye-movement data, an I-VT filter is introduced to process visual trajectories. Decision I: when the confidence of the trajectory change over 60 consecutive sampling points exceeds 70%, the eye-tracking result {*r*1, *r*2, ... } is output. In the processing of EEG signals, the canonical correlation analysis (CCA) algorithm performs feature extraction on 1000 ms of EEG data and outputs the maximum correlation coefficient (*ρ*). Decision II: the EEG selection result {*s*1} is output when *ρ* exceeds the pre-set threshold. Decision III: the selected target is output when {*r*1, *r*2, ... } ∩ {*s*1} ≠ ∅; otherwise, no result is output and the system is considered to be in the idle state. The window then slides forward by 200 ms to acquire the next 1 s of eye-tracking and EEG data, and processing repeats until a target result is output.
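The CCA step can be sketched with synthetic sine–cosine references at the fundamental and second harmonic of each candidate frequency. In the sketch below, the function names, harmonic count, and threshold are illustrative assumptions rather than the system's actual parameters.

```python
import numpy as np

def max_canonical_corr(X, Y):
    """Largest canonical correlation between the column spaces of X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(s[0])

def reference_signals(freq, n_samples, fs, n_harmonics=2):
    """Sine/cosine reference matrix at the fundamental and its harmonics."""
    t = np.arange(n_samples) / fs
    refs = []
    for h in range(1, n_harmonics + 1):
        refs.append(np.sin(2 * np.pi * h * freq * t))
        refs.append(np.cos(2 * np.pi * h * freq * t))
    return np.stack(refs, axis=1)

def detect_ssvep(eeg, fs, candidate_freqs, threshold=0.4):
    """Return (best_freq, rho); best_freq is None if rho is below threshold.

    eeg: (n_samples, n_channels) window, e.g. 1000 ms of occipital EEG.
    """
    rhos = [max_canonical_corr(eeg, reference_signals(f, len(eeg), fs))
            for f in candidate_freqs]
    i = int(np.argmax(rhos))
    return (candidate_freqs[i] if rhos[i] >= threshold else None), rhos[i]
```

Decision III then reduces to a set intersection between the eye-tracking candidates and the detected frequency, e.g. `set(eye_candidates) & {best_freq}`; an empty intersection corresponds to the idle state.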

### *2.2. Target Detection and Tracking*

Detecting and tracking pedestrians in the driving environment can reduce the cognitive load of drivers and assist the vehicle's intelligence system in making decisions. It plays an important role in improving the safety of intelligent vehicles and is a hot research topic in intelligent driving and computer vision.

In recent years, with the development of big data and the improvement of computer performance, deep learning has been widely applied in the field of computer vision and has achieved good performance. As representative target-detection algorithms at the present stage, YOLO models offer excellent detection speed and accuracy and can be trained end-to-end. YOLO takes the whole image as the input of the network and directly outputs the coordinates and classes of the objects after inference. Compared with other algorithms, YOLOv5s [21] has higher detection accuracy, faster detection speed, and lower computational cost, which better meets real-time requirements and is easier to apply in actual systems. However, detecting the position of pedestrians is not enough: each object must be tracked before it can be chosen.
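YOLO-style detectors emit many overlapping candidate boxes that are pruned by confidence thresholding and non-maximum suppression (NMS) before being handed to a tracker. The sketch below is a generic greedy NMS, not YOLOv5's exact implementation; the IoU threshold is an illustrative assumption.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        # Drop any remaining box that overlaps the kept box too much.
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep
```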

Pedestrian detection determines the position of an object in a particular frame, while pedestrian tracking locks onto the target across consecutive frames. Most application scenarios involve tracking multiple targets. DeepSORT [22] extracts the appearance features of targets and adopts recursive Kalman filtering with frame-by-frame association [23] to match the trajectories of multiple objects, which effectively reduces the number of ID switches. In this study, we use YOLOv5 and DeepSORT to process the driving foreground video, which realizes multi-object detection and tracking accurately and quickly and obtains the position coordinates and IDs of pedestrians in real time.
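DeepSORT proper combines appearance features with Kalman prediction and the Hungarian assignment. As a much-simplified illustration of the frame-by-frame association idea, the sketch below greedily matches detections to previous tracks by IoU so that IDs persist across frames; function names and the threshold are our assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate_ids(prev_tracks, detections, iou_threshold=0.3, next_id=0):
    """Greedily match detections to previous tracks by IoU.

    prev_tracks: dict {track_id: box}; detections: list of boxes.
    Returns (new_tracks, next_id); unmatched detections get fresh IDs.
    """
    new_tracks, used = {}, set()
    for det in detections:
        best_id, best_iou = None, iou_threshold
        for tid, box in prev_tracks.items():
            if tid in used:
                continue
            overlap = iou(det, box)
            if overlap > best_iou:
                best_id, best_iou = tid, overlap
        if best_id is None:  # no sufficiently overlapping track: new target
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        new_tracks[best_id] = det
    return new_tracks, next_id
```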

### *2.3. Graphical Stimuli Interface*

According to the object positions and IDs obtained by the object-detection and tracking module, flicker stimuli of different frequencies are superimposed on each pedestrian (Figure 2), and participants make their selection by gazing at the stimuli. The arrows, which flash alternately in black and white, are 60 pixels long. The frequency list is set to accommodate a variable number of pedestrians. Studies [24] have shown that a frequency band of 8~15 Hz can induce a relatively strong SSVEP response. Moreover, the frequencies are chosen so that no fundamental frequency overlaps with the frequency doubling of another, and the intervals between frequencies are set as large as possible to ensure the distinguishability of signals. Considering the above factors, the frequency list is set to 6.10 Hz, 8.18 Hz, 15.87 Hz, 12.85 Hz, 10.50 Hz, 8.97 Hz, 13.78 Hz, 9.98 Hz, 11.23 Hz, 7.08 Hz, 14.99 Hz, and 11.88 Hz. The frequencies of the superimposed stimuli are selected sequentially from the frequency list according to the coding order of each pedestrian ID. During the experiment, participants find the threatening target and follow its movement until the superimposed stimulus stops flickering and turns yellow.
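The harmonic-overlap constraint above can be verified mechanically. The sketch below checks each frequency against the others' fundamentals and second harmonics; the 0.2 Hz tolerance and function name are our assumptions.

```python
def harmonic_conflicts(freqs, n_harmonics=2, tol_hz=0.2):
    """Return pairs (f_a, harmonic_of_b) closer together than tol_hz.

    Flags any stimulus frequency that lies within tol_hz of another
    frequency's harmonic (h = 1 covers fundamental-vs-fundamental
    collisions; h = 2 covers the frequency doubling).
    """
    conflicts = []
    for fa in freqs:
        for fb in freqs:
            if fa == fb:
                continue
            for h in range(1, n_harmonics + 1):
                if abs(fa - h * fb) < tol_hz:
                    conflicts.append((fa, h * fb))
    return conflicts

# The paper's frequency list, in its stated order.
freq_list = [6.10, 8.18, 15.87, 12.85, 10.50, 8.97,
             13.78, 9.98, 11.23, 7.08, 14.99, 11.88]
```

Under these illustrative settings, the list above yields no conflicts, whereas an obviously bad pair such as 8 Hz and 16 Hz is flagged.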


**Figure 2.** Stimuli presentation interface of hybrid BCI system on trial 6 of a block. (**a**) Arrows in different directions are randomly superimposed on pedestrians according to the frequency list corresponding to the ID order; (**b**) Subjects select the threatening object, and the arrow stops flashing and turns yellow.
