**1. Introduction**

The human eye offers an intuitive channel of human communication and interaction: it can be exploited to gather information about the surroundings and to respond accordingly. Diseases such as complete paralysis, multiple sclerosis, locked-in syndrome, muscular dystrophy, arthritis, Parkinson's disease, and spinal cord injury can severely restrict a person's ability to produce controlled movement in any of the limbs or even the head. Worldwide, about 132 million disabled people need a wheelchair, and only 22% of them have access to one [1]; many cannot operate even a technically advanced wheelchair. It is therefore important to investigate novel eye detection and tracking methods that can enhance human–computer interaction and improve the living standard of these disabled people.

Research on eye-tracking techniques has progressively found its way into many applications, such as driving fatigue-warning systems [2,3], mental health monitoring [4,5], eye-tracking-controlled wheelchairs [6,7], and other human–computer interface systems. However, such systems face several constraints: reliable real-time performance, high accuracy, availability of components, and the need for a portable and non-intrusive design [8–10]. It is also crucial to achieve robustness against challenges such as changing light conditions, the physical appearance of the eye, features surrounding the eye, and reflections from eyeglasses. Several related works have proposed eye-controlled wheelchair systems; however, these rarely address software performance, physical and environmental challenges, algorithmic novelty, and user comfort and safety all together.

Furthermore, Convolutional Neural Networks (CNNs) are a state-of-the-art and powerful tool for solving computationally and data-intensive problems. CNNs lead a wide spectrum of applications in object classification, speech recognition, natural language processing, and even wheelchair control [11–13]; a more detailed literature review is provided in later sections. However, these works lack high accuracy and real-time operation, and do not provide the design details that would be useful for further improvement. All of the above shortcomings are addressed in the current paper.

In this paper, we propose a low-cost and robust real-time eye-controlled wheelchair prototype using novel CNN methods that works under different surrounding conditions. The proposed system comprises two subsystems: a sensor subsystem, and an intelligent signal processing, decision-making, and wheelchair control subsystem, as illustrated in Figure 1. The sensor subsystem was designed using an eye-tracking device and ultrasound sensors, which were interfaced to the intelligent data processing and decision-making module. The motor control module was already available in the powered wheelchair; only control signals based on the eye tracker need to be delivered to the microcontroller of the original wheelchair joystick, bypassing the mechanical joystick input. An array of ultrasound sensors was used to stop the wheelchair in case of emergency. The proposed system can steer through a crowded place faster and with fewer mistakes than current eye-movement-tracking technologies, and the safety provision, ensured by the arrays of ultrasound sensors, helps the wheelchair steer through a congested place safely. Accordingly, the proposed system can help most disabled people with spinal cord injury. Furthermore, as the proposed system targets inexpensive hardware and open-source software platforms, it can even be used to convert non-motorized wheelchairs into a very economical motorized wheelchair solution for third-world countries.

The rest of this paper is outlined as follows. Section 2 provides background and reviews the relevant literature on state-of-the-art eye-tracking methods, existing eye-controlled wheelchair systems, and Convolutional Neural Networks (CNNs) for eye tracking. Section 3, the Methodology section, discusses the design of the various blocks of the work along with the details of the machine learning algorithm. Section 4 provides the details of the implementation along with the modifications made to the hardware. Section 5 summarizes the results and performance of the implemented system. Finally, the conclusion is stated in Section 6.

**Figure 1.** Block representation of the proposed system for pupil-based intelligent eye-tracking motorized wheelchair control.

#### **2. Background and Related Works**

Previous studies are explored in this paper within three contexts in order to investigate all major related aspects and to cover the relevant literature as fully as possible: state-of-the-art methods for eye tracking, existing eye-controlled wheelchair systems, and other convolutional neural network (CNN)-based works for eye-tracking applications.

#### *2.1. State-Of-The-Art Eye Tracking Methods*

Generally, two different methods have been widely investigated for eye tracking: video-based systems and Electrooculography (EOG)-based systems (Figure 2). A video-based system consists mainly of a camera placed at a distance from the user (remote) or attached to the user's head (head-mounted), and a computer for data processing [14,15]. The main challenge in remote eye tracking is robust face and eye detection [16,17]. The cameras can be visible-light cameras, referred to as Videooculography (VOG) [14], as in the examples proposed in [9,18–20], or infrared-illumination cameras such as in [21,22], where the infrared (IR) corneal reflection was extracted. Based on near-IR illumination, researchers in [17] investigated six state-of-the-art eye detection and tracking algorithms: Ellipse selector (ElSe) [23], Exclusive Curve Selector (ExCuSe) [24], Pupil Labs [25], SET [26], Starburst [27], and Swirski [28], and compared them against each other on four large datasets comprising 225,569 publicly labeled eye images with frequently changing sources of noise.

Commercial eye-tracking systems remain very expensive and use proprietary tracking algorithms that have not been commercially deployed on any powered wheelchair. Note that although pupil tracking is a widely used tool, it is still hard to achieve high-speed tracking with high-quality images, particularly in a binocular system [29].

**Figure 2.** Different eye tracking techniques.

#### *2.2. Existing Eye-Controlled Wheelchair Systems*

Wheelchair control systems based on the user's voice [30] and facial expressions [31] have been explored by different groups. However, voice control is laborious for the user, and sound-wave interference or distractions in a noisy environment can introduce undesired commands into the system. Facial-expression control, on the other hand, is not helpful to all users, especially those with restricted facial expressions due to diseases such as facial paralysis. Moreover, classifying facial expressions is more challenging than an eye-controlled system, where only the eye is targeted.

Eye-tracking techniques have previously been employed in wheelchair systems for disabled people. An eye-controlled wheelchair prototype was developed in [32] using an infrared (IR) camera fitted with LEDs for automatic illumination adjustment during illumination changes. A sequence of image processing techniques was deployed for gaze detection, beginning with eye detection, followed by pupil location detection using pupil knowledge, and then conversion of the pupil location into the user's gaze based on a simple eye model. The gaze was finally converted into a wheelchair command using a threshold angle. Although the algorithm had a fast processing time and robustness against changing circumstances, the chosen threshold suited only a specific illumination condition enabling automatic illumination adjustment by the camera; performance degrades significantly under strong illumination such as sunlight. In addition, the conceptual key control was quite inconvenient for the user: the chair stops when the user blinks or when the gaze deviates from the direction key, unless the user looks upwards for free head movement. The user also has to look downwards for forward movement, which is impractical.
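For illustration, a minimal sketch of such a threshold-angle mapping is given below; the dead-zone value, the blink-to-stop rule, and the command set are assumptions made for illustration, not parameters taken from [32].

```cpp
// Hypothetical command set; [32] actually uses direction keys on a screen.
enum class Command { Stop, Forward, Left, Right };

// Map a horizontal gaze angle (degrees, negative = left) to a command.
// The 15-degree dead zone is an assumed value, not taken from [32].
Command gazeToCommand(double gazeAngleDeg, bool blinking) {
    const double kThresholdDeg = 15.0;
    if (blinking) return Command::Stop;               // blinking stops the chair
    if (gazeAngleDeg < -kThresholdDeg) return Command::Left;
    if (gazeAngleDeg >  kThresholdDeg) return Command::Right;
    return Command::Forward;                          // near-center gaze
}
```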

In [33], another eye-controlled wheelchair was proposed that processes the images of a head-mounted camera. Gaussian filtering was applied to remove Gaussian noise from the image, a threshold was then employed to produce a binary image, and erosion followed by dilation was applied to remove white noise. The wheelchair moves in three directions (left, right, and forward) depending on the relative iris position, and starts and stops when the user blinks for 2 seconds. Although the proposed techniques were simple to implement, the evaluation parameters of the system were not reported. In addition, the system's performance during pupil transition from one direction to another was not discussed.
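This pipeline maps directly onto standard OpenCV calls; the following is a minimal sketch, with the kernel sizes and threshold value assumed for illustration.

```cpp
// Sketch of the pupil-segmentation pipeline described in [33] using OpenCV.
// Kernel sizes and the threshold value are illustrative assumptions.
#include <opencv2/imgproc.hpp>

cv::Mat segmentPupil(const cv::Mat& grayEye) {
    cv::Mat blurred, binary;
    // Gaussian filtering to suppress sensor noise
    cv::GaussianBlur(grayEye, blurred, cv::Size(5, 5), 0);
    // Fixed threshold: the dark pupil becomes the foreground
    cv::threshold(blurred, binary, 40, 255, cv::THRESH_BINARY_INV);
    // Erosion followed by dilation (morphological opening) removes white noise
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
    cv::erode(binary, binary, kernel);
    cv::dilate(binary, binary, kernel);
    return binary;
}
```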

Apart from interfaces such as joystick control, head control, or sip-and-puff control [34], several researchers have used optical eye-tracking systems that control a powered wheelchair by translating the user's eye movement into screen positions; these are reported below. In [34], the eye image was divided into nine blocks of three columns and three rows and, depending on the location of the pupil's center, the output of the algorithm was an electrical signal steering the wheelchair left, right, or straight ahead. System evaluation parameters such as response speed, accuracy, and behavior under changing illumination conditions were not reported. In addition, safety measures for the wheelchair's movement, such as ultrasound or IR sensors for obstacle detection, were not discussed.
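A minimal sketch of the nine-block mapping is shown below; restricting the movement commands to the middle row is an assumption made for illustration, since [34] only specifies left, right, and straight outputs.

```cpp
// Sketch of the nine-block scheme of [34]: the eye image is divided into a
// 3x3 grid and the block containing the pupil centre selects the command.
enum class Drive { Left, Straight, Right, None };

Drive blockToDrive(int px, int py, int width, int height) {
    int col = (3 * px) / width;        // grid column: 0, 1, or 2
    int row = (3 * py) / height;       // grid row: 0, 1, or 2
    if (row != 1) return Drive::None;  // assumed: only the middle row drives
    if (col == 0) return Drive::Left;
    if (col == 2) return Drive::Right;
    return Drive::Straight;            // centre block
}
```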

Another wheelchair control system was proposed in [35], where the positions of the eye pupil were tracked using image processing techniques on a Raspberry Pi board, with a motor drive steering the chair left, right, or forward. The Open Computer Vision (OpenCV) library provided the image processing functions: the Haar cascade algorithm was used for face and eye detection, Canny edge detection was used to find edges, and Hough transform methods were used for circle detection to identify the border of the eye's pupil. The pupil's center was located as the average of two corner points obtained from a corner detection method. The pupil was tracked by measuring the distance between this average point and the eye circle's center point, where the minimum distance indicated that the pupil was at the left and the maximum indicated that the eye had moved to the right. If there was no movement, the eye would be in the middle position and the chair would move forward. Eye blinking was needed to start a directional operation, and the system was activated or deactivated when the eye was closed for 3 seconds. Although visual outputs of the system were provided, the system was not quantitatively assessed, and no rigorous evaluation scheme was shown; accuracy, response latency, and speed, for instance, are therefore unknown.
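The detection chain reported in [35] corresponds to a few standard OpenCV calls; the sketch below assumes illustrative Hough parameters and a pre-loaded eye cascade.

```cpp
// Sketch of the detection chain of [35]: Haar cascade for the eye region,
// then a Canny-based Hough transform for the pupil circle. The Hough
// parameters are illustrative assumptions, not values from the paper.
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

cv::Vec3f detectPupilCircle(const cv::Mat& grayFrame,
                            cv::CascadeClassifier& eyeCascade) {
    std::vector<cv::Rect> eyes;
    eyeCascade.detectMultiScale(grayFrame, eyes);
    if (eyes.empty()) return cv::Vec3f();   // no eye found in this frame

    cv::Mat eyeRoi = grayFrame(eyes[0]);
    std::vector<cv::Vec3f> circles;
    // HoughCircles runs Canny internally; param1 is the Canny high threshold
    cv::HoughCircles(eyeRoi, circles, cv::HOUGH_GRADIENT, 1,
                     eyeRoi.rows / 4, 100, 20, 5, 30);
    return circles.empty() ? cv::Vec3f() : circles[0];  // (x, y, radius) in ROI coords
}
```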

On the other hand, Electrooculography (EOG) is a camera-independent method for gaze tracking that generally requires a lower response time and less operating power than video-based methods [10]. In EOG, electrodes are placed on the skin at different positions around the eyes, along with a reference (ground) electrode placed on the subject's forehead. The eye is modeled as an electric dipole whose negative pole is at the retina and whose positive pole is at the cornea. When the eyes are in their resting state, the electrodes measure a steady electric potential; when an eye movement occurs, the dipole's orientation changes, producing a measurable change in the corneal–retinal potential.

An EOG-based eye-controlled wheelchair with an attached on-board microcontroller was proposed in [36] for disabled people. The acquired biosignals were amplified, noise-filtered, and fed to a microcontroller, where voltage ranges for each movement (left, right, up, and down) or for the stationary condition were defined, and the wheelchair moved in the respective direction according to the corresponding voltage values of the EOG signals. The EOG-based system was cost-effective, independent of changing light conditions, and relied on lightweight signal processing with reasonable functional feasibility [10]; however, the system was not yet fully developed for commercial use because of the electrodes needed for signal acquisition. Moreover, it was restricted to particular horizontal and vertical eye movements and did not respond effectively to oblique rotation of the eye.
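Such a voltage-band mapping can be sketched as follows; the channel names and voltage limits are assumptions for illustration, as [36] does not publish its threshold values.

```cpp
// Sketch of an EOG command mapping in the spirit of [36]: amplified and
// filtered horizontal/vertical channel voltages are compared against fixed
// bands around the resting potential. All limits are illustrative assumptions.
enum class EogCommand { Stationary, Left, Right, Up, Down };

EogCommand classifyEog(double hVolts, double vVolts) {
    const double kBand = 0.2;  // assumed dead band (volts) around rest
    if (hVolts >  kBand) return EogCommand::Right;
    if (hVolts < -kBand) return EogCommand::Left;
    if (vVolts >  kBand) return EogCommand::Up;
    if (vVolts < -kBand) return EogCommand::Down;
    return EogCommand::Stationary;  // steady corneal-retinal potential
}
```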

Another EOG-guided wheelchair system was proposed in [37] using the Tangent Bug algorithm. The microcontroller identified the target point's direction and distance by calculating the user's gaze angle. Gaze angle and blinks were measured and used as inputs to the controlling method: the user only has to look at the desired destination and blink to signal the controlling unit to start navigation. The wheelchair then calculated the desired target position and distance from the measured gaze angle, moved towards the destination in a straight line, and went around obstacles when they were detected by sensors. Overall, EOG-based systems are largely exposed to signal noise, drift, and artifacts that affect EOG signal acquisition, owing to noise interference from residential power lines, electrodes, or circuitry [14]. In addition, because electrodes must be placed at certain distances around the eye, EOG-based systems are considered impractical for everyday use.
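As a sketch of how a target point could be derived from a measured gaze angle, assume the user gazes at a point on the floor plane from a known eye height; the geometry below is our illustrative reconstruction, not the exact computation of [37].

```cpp
// Illustrative reconstruction of gaze-to-target geometry: if the user gazes
// at a floor point, the ground distance follows from eye height and the
// downward pitch angle. All inputs are assumed, not taken from [37].
#include <cmath>

struct Target { double x; double y; };  // metres, in the wheelchair frame

Target gazeToTarget(double yawRad, double pitchDownRad, double eyeHeightM) {
    // Ground distance to the gazed floor point (pitchDownRad must be > 0)
    double d = eyeHeightM / std::tan(pitchDownRad);
    return { d * std::sin(yawRad), d * std::cos(yawRad) };
}
```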

There are some recent works that use commercially available sensors for eye-controlled powered wheelchairs for amyotrophic lateral sclerosis (ALS) patients [38]. A similar work using commercial brain–computer interface and eye-tracker devices was done in [6].

#### *2.3. Convolutional Neural Networks (CNNs) for Eye Tracking*

CNNs lead a wide spectrum of applications in the context of object classification; however, only a few previous studies have presented CNNs for the specific task of real-time eye-gaze classification to control a wheelchair system, as in the current paper. Some examples of CNNs employed for eye-gaze classification are discussed below.

The authors of [39] proposed an eye-tracking algorithm that can be embedded in mobile phones and tablets. A large training dataset of almost 2.5 M frames was collected via crowdsourcing from over 1450 subjects and used to train the designed deep end-to-end CNN. The face image was used as the original input, and the images of the eyes were used as additional inputs to the model. For real-time practical application, dark knowledge was then applied to reduce computation time and model complexity by learning a smaller network that achieves similar performance while running at 10–15 frames per second (FPS). The model's performance increased significantly when calibration was performed and when the collected data were more variable, i.e., with a higher number of subjects rather than a higher number of images per subject, although the latter still matters. Although the model achieved robustness in eye detection, the frame rate should not fall below 20–25 FPS for reliable real-time performance; at the reported 10–15 FPS, the error would increase significantly if this were not addressed.
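Dark knowledge here refers to knowledge distillation in the style of Hinton et al., where a small student network is trained on the teacher's temperature-softened outputs; a standard formulation (not necessarily the exact loss used in [39]) is

$$
\mathcal{L}_{\text{student}} = \alpha\, H\big(y,\ \sigma(z_s)\big) + (1-\alpha)\, T^{2}\, \mathrm{KL}\big(\sigma(z_t/T)\ \|\ \sigma(z_s/T)\big),
$$

where $z_s$ and $z_t$ are the student and teacher logits, $\sigma$ is the softmax, $T$ is the distillation temperature, and $\alpha$ weights the hard-label cross-entropy $H$ against the soft-target term.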

The authors of [40] proposed a real-time framework for classifying eye-gaze direction using CNNs and low-cost, off-the-shelf webcams. First, the face region was localized using a modified version of the Viola–Jones algorithm. The eye region was then obtained using two different methods: geometrically from the face bounding box, or (with better performance) by localizing facial landmarks through a facial landmark detector to find the eye corners and other fiducial points. The eye region was then fed to the classification stage, where eye-accessing cues (EAC) were predicted and classified by the CNN into seven classes (center, up right, up left, right, left, down right, and down left), of which three classes (left, right, and center) showed higher accuracy rates. The algorithm was evaluated on two equal 50% subsets for training and testing, where CNNs were trained for the left and right eyes separately, but EAC accuracy improved when combining information from both eyes (98%). The algorithm achieved an average rate of 24 FPS.

The authors of [12] implemented an eye-controlled wheelchair driven by eye movements captured by a webcam placed in front of the user, using the Keras deep learning pre-trained VGG-16 model. The authors also discuss the benefits of the system working for people with glasses. Details of its real-time implementation, in terms of FPS, were not mentioned in the paper.

#### **3. Methodology**

Considering the pros and cons of the previous works, we propose an eye-controlled wheelchair system running in real time (30 FPS) that uses CNNs for eye-gaze classification and is built from several low-cost, lightweight controller subsystems. The system takes input images from an IR camera attached to a simple headset, providing the user with comfortable and convenient movement control and completely free eye movement. Ultrasonic sensors were mounted to avoid collisions with obstacles. The system classifies the targeted direction using a robust CNN algorithm implemented on an Intel NUC (a mini-computer with an i7-5600U) at 2.4 GHz with two central processing units (CPUs) and 8 GB of random access memory (RAM), using C++ in Microsoft Visual Studio 2017 (64-bit). Although the implementation is not optimized for a graphics processing unit (GPU), shared-memory multiprocessing was attained by deploying the Intel® OpenMP (Open Multi-Processing) application programming interface (API).
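As a rough illustration of the kind of shared-memory parallelism OpenMP provides on the NUC's CPU cores, the sketch below parallelizes the rows of a single convolution across threads; the layer shapes and data layout are assumptions, not the actual implementation.

```cpp
// Illustrative OpenMP sketch: rows of a valid 2D convolution are computed
// in parallel. The caller must size `in` as (rows+k-1) x (cols+k-1).
#include <omp.h>
#include <vector>

void convolveRows(const std::vector<std::vector<float>>& in,
                  const std::vector<std::vector<float>>& kernel,
                  std::vector<std::vector<float>>& out) {
    int k    = static_cast<int>(kernel.size());   // square kernel side
    int rows = static_cast<int>(out.size());
    int cols = static_cast<int>(out[0].size());
    #pragma omp parallel for                      // split output rows across CPUs
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < k; ++i)
                for (int j = 0; j < k; ++j)
                    acc += in[r + i][c + j] * kernel[i][j];
            out[r][c] = acc;
        }
}
```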

A block diagram of the eye-controlled wheelchair is shown in Figure 3. The diagram shows the different steps, from capturing a new image to issuing the appropriate command to the wheelchair.

**Figure 3.** Eye Controlled Wheelchair block diagram.

The wheelchair is primarily controlled through eye movements that are translated into commands for the chair's motor drives; this is achieved through a gaze detection algorithm implemented on the minicomputer. In addition, a secondary commanding system was added to the design for the user's safety. This ultrasonic-based safety system can stop the wheelchair in cases of emergency, suddenly appearing objects, unawareness of the user, and so on.

The minicomputer decides whether the wheelchair should move, mainly according to the gaze direction, or stop if the safety system is activated. In either case, a command is sent to the control unit, which in turn produces the corresponding command code for the wheelchair controller to drive the motors and move the wheelchair in the respective direction.
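This arbitration reduces to a simple priority rule, sketched below with assumed names and an assumed stopping distance.

```cpp
// Sketch of the decision rule described above: an ultrasonic emergency
// stop always overrides the gaze command. The command names and the
// distance limit are illustrative assumptions.
enum class WheelCmd { Stop, Forward, Left, Right };

WheelCmd decide(WheelCmd gazeCmd, double nearestObstacleMetres) {
    const double kSafeDistance = 0.5;  // assumed emergency-stop range
    if (nearestObstacleMetres < kSafeDistance)
        return WheelCmd::Stop;         // safety system takes priority
    return gazeCmd;                    // otherwise follow the gaze direction
}
```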

The Titan x16 power wheelchair was converted into an eye-tracking motorized wheelchair (Figure 4). It is originally equipped with a joystick placed at the right arm of the chair. An eye-tracking controller (image acquisition system and mini-computer) was connected to the electronic control system alongside the joystick-based controller. The functionality of the main joystick was not altered; the new control system can be overridden by the original control system.

**Figure 4.** Titan x16 wheelchair.

Based on this overview, the system implementation can be divided into the following steps: design and implementation of an image acquisition mechanism, implementation of the gaze estimation algorithm, design and implementation of the ultrasonic safety system, and finally modification of the joystick controller. Each of these parts is discussed separately below.
