**Automatic Distortion Rectification of Wide-Angle Images Using Outlier Refinement for Streamlining Vision Tasks**

#### **Vijay Kakani <sup>1</sup>, Hakil Kim <sup>1,</sup>\*, Jongseo Lee <sup>2</sup>, Choonwoo Ryu <sup>2</sup> and Mahendar Kumbham <sup>3</sup>**


Received: 16 January 2020; Accepted: 4 February 2020; Published: 7 February 2020

**Abstract:** This study proposes an outlier refinement methodology for the automatic distortion rectification of wide-angle and fish-eye lens camera models in the context of streamlining vision-based tasks. Line-member sets are estimated in a scene by accumulating line candidates emerging from the same edge source. An iterative optimization with an outlier refinement scheme is applied to the loss value to simultaneously remove extremely curved outliers from the line-member sets, update the robust line members, and estimate the best-fit distortion parameters with the lowest possible loss. The proposed algorithm rectifies the distortions of wide-angle and fish-eye cameras even under extreme conditions such as heavy illumination changes and severe lens distortions. Experiments were conducted using various evaluation metrics both at the pixel level (image quality, edge stretching effects, pixel-point error) and at the level of higher-level use-cases (object detection, height estimation) on real and synthetic data from publicly available and privately acquired sources. The performance of the proposed algorithm was investigated through an ablation study on various datasets, analyzing the significance of the refinement scheme and the loss function. Several quantitative and qualitative comparisons were carried out between the proposed approach and various self-calibration approaches.

**Keywords:** automatic distortion rectification; wide-angle lens; fish-eye lens; advanced driver-assistance system (ADAS); video-surveillance; vision tasks

#### **1. Introduction**

The usage of wide-angle camera lenses in vision-based applications demands greater precision in terms of image projection geometry, such as distortion compensation and maintaining pixel consistency. A plethora of challenges is involved in employing wide-angle lens models for applications such as advanced driver-assistance systems (ADAS) and video surveillance.

#### *1.1. Challenges*

The image projections from wide-angle and fish-eye lenses are generally affected by radial distortions, which create severe pixel inconsistencies along the edges depending on lens properties such as horizontal FOV and curvature [1,2]. This influences the performance of the lens in various metric-based tasks, such as height estimation and single-view metrology, and even in geometrical tasks such as camera localization and stereo vision. This analogy can be observed in Figure 1, where a self-calibrated camera frame is used for streamlining various vision-based tasks in scenarios of diverse vision applications.

**Figure 1.** Effects of larger FOV distortions on high-end vision-based tasks. (**a**) Metric-based tasks and (**b**) feature/appearance-based tasks.

The flexibility to handle diverse lens models is another major concern in formulating a robust self-calibration technique. The presence of various larger-FOV lens models, such as fish-eye (165° < FOV < 190°), wide-angle (120° < FOV < 150°), and super wide-angle (160° < FOV < 180°), imposes severe challenges in determining the distortion parameters of each class and compensating the specific lens models automatically. The variations in lens models and real-time scenarios are depicted in Figure 2.

The fish-eye and wide-angle lens models are manufactured with a basic notion of the coverage area that the lens can capture. Accordingly, the lens usually possesses severe distortions, due to which scene aspects on the image plane deviate from the factual representation of the 3D real-world plane. Under such circumstances, calibration is essential to retrieve the distortion-rectified scene while simultaneously preserving automatic adaptability without involving any chessboards or calibration objects. Self-calibration depends entirely on scene aspects such as lines, curves, points at infinity, edge candidates, and other special elements. Several methodologies have been proposed to overcome these challenges and formulate robust self-calibration techniques [3–5], but they are still severely affected by inevitable real-world conditions such as illumination variations, shadow castings, different times of the day and night, and scenes with limited scene attributes to rely on.

**Figure 2.** Real-time challenging scenarios: (**a**) fish-eye model (165° < FOV < 190°); (**b**) wide-angle (120° < FOV < 150°); (**c**) super wide-angle (160° < FOV < 180°).

#### *1.2. Purpose of Study*

The primary purpose of this work is to develop a flexible automatic distortion rectification methodology that refines the outliers while simultaneously optimizing the best-fit parameters with the minimum error possible. As an underlying investigation, the study streamlines the distortion-rectified frames to acquire better performance on tasks such as object detection and fixed monocamera-based height estimation. The two main aspects this work studies are how the proposed system can be made robust to various real-time scenarios with diverse challenges, and how vision tasks can be streamlined on the distortion-rectified frames. The main contributions are as follows: (1) proposing an iterative optimization with refinement of the outliers from the pool of the robust line-member set; (2) formulating a plumbline angular cumulative loss over the refined line-member set and investigating its significance through an ablation approach; (3) validating the proposed system with respect to quantitative (accuracy, processing time, practical significance) and qualitative (adaptability, practical significance) aspects on diverse real/synthetic, public, and private datasets for ADAS and video-surveillance applications. The scope of this study targets high-end vision-based applications such as intelligent transportation, video surveillance, and advanced driver-assistance systems (ADAS).

The paper is organized as follows. Section 2 extensively discusses the previous works and their characteristics regarding the automatic distortion rectification. Section 3 elaborates on the proposed outlier refinement enabling the automatic distortion rectification process. Section 4 is dedicated to investigating the significance of proposed aspects with respect to various datasets and metrics. Section 5 illustrates the experimental design and evaluation metrics employed in the study. Section 6 reports the outcomes and corresponding discussions based on employed data and evaluations. Finally, Section 7 concludes the paper with a summary.

#### **2. Literature Review**

#### *2.1. Automatic Distortion Rectification*

In the literature, there are a plethora of studies that were designed to deal with radial distortion rectification via autocalibration of the camera systems [6–8]. Most of these simply followed the approach of employing the calibration object such as a checkerboard or circular patterns [9]. In practice, these camera systems tend to suffer from the variations in the weather conditions with respect to overheating or cold [10,11]. In situations as such, the calibration of the camera must be done to adjust the intrinsic parameters. Automatic distortion rectification, being a more practical approach, can come in handy in such circumstances. Especially, lens models such as fish-eye and wide-angle camera systems demand a better algorithm that can rectify the radial distortions.

A few works, such as Zhang et al. [12] and Barreto et al. [13], proposed approaches to solving this problem through autocalibration of the visual sensor using scene attributes. However, their approaches demand a specific type of environment, such as precisely structured lines (the presence of at least three orthogonal straight lines). Brown et al. [14] was the first study to coin the term plumbline, specifying the usage of scene geometry to retrieve the camera's intrinsic parameters. Additionally, that study specified the radial distortions using a polynomial lens distortion model. Later, the one-parameter rational model was proposed in [15,16] and was extensively used in automatic camera calibration. In the literature, several variants of plumbline approaches were used, among which employing vanishing points to calibrate the camera yielded better results [17]. Yet, that approach was not able to handle wide-angle lens models with heavy distortions.

#### *2.2. Previous Works*

The automatic distortion rectification problem is typically addressed by two main families of methodologies: traditional and deep-learning approaches. In the traditional approach, various geometrical aspects are exploited to estimate the distortion parameters of the lens. On the other hand, deep-learning approaches estimate the distortion parameters through learned radial distortion values and image samples. Though there are various algorithms in these two portfolios, they have limitations which make them vulnerable to various real-world conditions.

In the past decade, a few remarkable studies were proposed in the context of the automatic rectification of wide-angle and fish-eye lens models. Some explored an arithmetic approach on the line curvatures to estimate the distortions [4]. Others exploited the scene lines to estimate the parameters with intense iterative optimizations [3,5] within parametric Hough spaces, and a few employed a semiautomatic algebraic approach of tracing line segments over the curved lines to estimate distortions [18]. The semiautomatic study proposed by Alvarez et al. [18] heavily requires user interaction in the line-tracing approach, which is not appropriate for real-time usage. Although Bukhari et al. [4] was able to rectify the distortions with reliable performance for nonsevere distortion cases, it suffers from longer processing times and deformed outputs in the case of heavy distortions. The Hough parametric space approaches from Aleman et al. [3] and Santana et al. [5] were able to rectify the wide-angle and fish-eye lens models with reasonable performance. However, their heavy dependency on hyper-parameters and inability to handle samples acquired using low-quality camera sensors under low-light conditions make them less reliable for ADAS and video-surveillance applications. The algorithm proposed by Kakani et al. [19,20] was able to rectify multiple lens models, including wide-angle and fish-eye lenses. Yet, its schematic includes a model-specific empirical *γ*-residual rectification factor for heavy fish-eye distortions with FOV > 165°. The design of this factor requires a certain amount of prior knowledge about the lens models from an optical perspective.

CNN-based deep-learning approaches such as Bogdan et al. [21] and Lopez et al. [22] cannot rectify distorted samples with illumination changes, and certain higher distortion ranges cannot be handled consistently. Additionally, deep GANs such as Liao, Kang et al. [23] are used to generate corresponding rectified samples for a distorted image. Yet, the trained distortions are confined to certain ranges, such as <−10<sup>−5</sup>. Another GAN-based architecture, proposed by Park et al. [24], was able to rectify synthetic distorted samples as well as real sensor data within a specified distortion range. However, in the context of heavy distortion ranges, the model fails to rectify the samples. The major concern regarding these learning approaches is that the training examples must cover almost all sensor types and distortion ranges in order to develop a model that can rectify all possible sensor units, which is not feasible with currently available advancements. This restricts learning-based methodologies to a certain sensor type and distortion range per application in order to attain the best performance on that sensor unit, and this must be repeated for each and every sensor unit in correspondence with the use-case to be deployed on the rectified frames. Due to this ambiguity, the present work ruled out the learned methods in its performance evaluations. The summarized state-of-the-art automatic distortion rectification techniques are stated in Table 1.


**Table 1.** Insights of traditional and learning-based automatic distortion rectification methods.

This study focuses mostly on the drawbacks encountered in our previous work [19] and proposes a solution to handle heavy distortions without having to use any model-specific residual factors. In particular, this work introduces the outlier refinement scheme in conjunction with the plumbline angular loss function, which makes the whole system more robust to outliers and thereby able to handle heavy distortions (FOV > 190°). The significance of the novel aspects of the outlier refinement scheme, such as loss aggregation over line-member sets, was extensively tested through an ablation study, and the corresponding results are discussed in Section 4. The major differences between our previous work [19] and the current study are as follows:


#### **3. Outlier-Refinement-Enabled Distortion Estimation**

#### *3.1. Lens Distortion Parameter Modeling*

In this study, the distortion estimation and optimization procedures follow the odd polynomial lens-distortion model with up to two distortion coefficients $D_1$, $D_2$, as per the design in our previous work [19], which maps rectified pixel coordinates to the distorted pixel coordinates, as shown in Equation (1) below.

$$\begin{cases} r_{dist} = r_{undist} + D_1 \cdot r_{undist}^3 + D_2 \cdot r_{undist}^5 \\ r_{dist} = r_{undist} \left( 1 + D_1 \cdot r_{undist}^2 + D_2 \cdot r_{undist}^4 \right) \end{cases} \tag{1}$$

where $r\,(radius) = \sqrt{(a - a_0)^2 + (b - b_0)^2}$, $(a, b)$ is a point coordinate, $(a_0, b_0)$ is the image center, and $D_1, D_2, \dots, D_N$ are distortion coefficients.
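For concreteness, the following is a minimal Python sketch of the radial model in Equation (1); the function names and the example coefficients are illustrative, not taken from the paper.

```python
import numpy as np

def distort_points(pts, center, d1, d2):
    """Apply the odd polynomial model of Equation (1) to (a, b) pixel
    coordinates around the image center (a0, b0):
    r_dist = r_undist * (1 + D1 * r^2 + D2 * r^4)."""
    offs = pts - center                      # (a - a0, b - b0)
    r = np.linalg.norm(offs, axis=1)         # undistorted radius of each point
    scale = 1.0 + d1 * r**2 + d2 * r**4      # radial scaling factor
    return center + offs * scale[:, None]

# Example: mild barrel distortion (negative D1) around a 640x480 image center
pts = np.array([[100.0, 80.0], [500.0, 400.0]])
print(distort_points(pts, np.array([320.0, 240.0]), d1=-1e-7, d2=1e-13))
```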

#### *3.2. Plumbline Angular Loss Estimation*

The plumbline angular loss is estimated on the robust line-member set; the line members are extracted using the parameter-free edge drawing algorithm [25]. Line members emerging from the same edge sources are further filtered based on length-threshold heuristics. Each line-member set is formed with line members emerging from the same edge as its elements. Several line-member sets exist, all of which are considered to calculate the cumulative loss as a whole.

Let $I_{w \times h \times 3}$ represent an image and $\bar{n}$ denote the number of line-member sets within the image *I*. All line-member sets are collected as a matrix $L_{\bar{n} \times 4}$, where each line-member set consists of several line members. Each line member is a 4-tuple $(x_0, y_0, x_1, y_1)$, where $(x_0, y_0)$ is the starting point and $(x_1, y_1)$ is the ending point of the line member. The grouped line members are collected as

$$l_{ki} = \begin{bmatrix} (x_{0,1}, y_{0,1}, x_{1,1}, y_{1,1}) \\ (x_{0,2}, y_{0,2}, x_{1,2}, y_{1,2}) \\ \vdots \\ (x_{0,k}, y_{0,k}, x_{1,k}, y_{1,k}) \end{bmatrix}, \tag{2}$$

$$L_{\bar{n} \times 4} = \begin{bmatrix} l_{k1 \times 4} \\ l_{k2 \times 4} \\ \vdots \\ l_{k\bar{n} \times 4} \end{bmatrix}, \tag{3}$$

where $k \in \{1, 2, \dots, \bar{n}\}$; for instance, $l_{ki} = l_{23}$ indicates the second line-member set, consisting of three line members.

The angular plumbline error *α* can be estimated through the function $A(l_1, l_2)$, which computes the angular difference between the line members in a set, as shown below:

$$A\left(l_1, l_2\right) = \begin{cases} \Delta a & \text{if } \Delta a \le 180 \\ 360 - \Delta a & \text{if } \Delta a > 180 \end{cases} \tag{4}$$

$$\Delta a = |a_1 - a_2|, \tag{5}$$

$$a_i = \arctan2\left(y_1 - y_0,\; x_1 - x_0\right). \tag{6}$$
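A direct transcription of Equations (4)–(6) might look as follows; this is a sketch, and the helper names are ours.

```python
import math

def line_angle(line):
    """Equation (6): orientation of a 4-tuple (x0, y0, x1, y1) in degrees."""
    x0, y0, x1, y1 = line
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def angular_error(l1, l2):
    """Equations (4)-(5): angular difference A(l1, l2) between two line
    members, folded into the range [0, 180]."""
    delta = abs(line_angle(l1) - line_angle(l2))
    return 360.0 - delta if delta > 180.0 else delta

print(angular_error((0, 0, 10, 0), (0, 0, 10, 1)))  # ~5.7 degrees
```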

The angular plumbline error *α* is estimated with respect to all *N* line members, and the individual line-member error *LE* for the *i*th element of the line-member set is calculated by applying a cross-entropy over the angular plumbline errors:

$$LE_{i \times \bar{n}} = -\frac{1}{N} \begin{pmatrix} |\Delta a_{i,i+1}| \log |\Delta a_{i,i+1}|, \; |\Delta a_{i,i+2}| \log |\Delta a_{i,i+2}|, \; \dots \\ |\Delta a_{i,i+k}| \log |\Delta a_{i,i+k}|, \; |\Delta a_{i+1,i+2}| \log |\Delta a_{i+1,i+2}|, \\ |\Delta a_{i+1,i+3}| \log |\Delta a_{i+1,i+3}|, \; \dots \\ |\Delta a_{i+1,i+k-1}| \log |\Delta a_{i+1,i+k-1}|, \; \dots \\ |\Delta a_{i+k-2,i+k-1}| \log |\Delta a_{i+k-2,i+k-1}| \end{pmatrix}, \tag{7}$$

where $k = |LE_{i \times \bar{n}}|$, i.e., the length of the *i*th row;

$$\text{SE} = \left(\frac{\sum\_{1}^{|LE\_1|} \, LE\_1}{|LE\_1|}, \frac{\sum\_{1}^{|LE\_2|} \, LE\_2}{|LE\_2|}, \dots, \frac{\sum\_{1}^{|LE\_{\overline{n}}|} \, LE\_{\overline{n}}}{|LE\_{\overline{n}}|}\right),\tag{8}$$

where *SE* (line-member set errors) is a row vector of length $\bar{n}$ whose *i*th element is the average error of the *i*th line-member set.

The mean cumulative loss *SMCE*, which computes the mean of the line-member set errors given $L_{\bar{n} \times 4}$, is defined as follows:

$$SMCE\left(L_{\bar{n} \times 4}\right) = \frac{\sum_{1}^{|SE|} SE}{|SE|}, \tag{9}$$

where |*SE*| is the cardinality of the set of all line-member set errors.
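The sketch below assembles Equations (7)–(9) under one possible reading of the pairwise cross-entropy terms, reusing `angular_error` from the sketch after Equation (6); the exact pairing and normalization used in the paper may differ.

```python
import itertools
import numpy as np

def set_error(line_set, n_total):
    """Equations (7)-(8): average the -(1/N) |da| log|da| cross-entropy
    terms over all line-member pairs within one line-member set."""
    pairs = itertools.combinations(line_set, 2)
    terms = [angular_error(l1, l2) for l1, l2 in pairs]   # from earlier sketch
    terms = [-(da * np.log(da)) / n_total for da in terms if da > 0]
    return float(np.mean(terms)) if terms else 0.0

def smce(line_sets):
    """Equation (9): mean of the per-set errors SE over all sets."""
    n_total = sum(len(s) for s in line_sets)              # N, total line members
    return float(np.mean([set_error(s, n_total) for s in line_sets]))
```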

This overall error loss must be minimized such that we can accomplish two things in one shot:


#### *3.3. Refinement Optimization Scheme*

The Levenberg–Marquardt (LM) optimization employed in the current study estimates the best-fit parameters with simultaneous outlier elimination, where the camera lens parameters are initialized with a default initial guess:

$$\text{Parameters} = \begin{bmatrix} f_x \\ f_y \\ c_x \\ c_y \\ D_1 \\ D_2 \end{bmatrix}. \tag{10}$$


Let $f_x$, $f_y$, $c_x$, $c_y$, $D_1$, $D_2$ represent the focal length along *x* (in pixels), the focal length along *y* (in pixels), the *x* position of the camera center, the *y* position of the camera center, and the distortion parameters, respectively.

$$r_{\bar{n} \times 1} = \begin{bmatrix} r_{1 \times 1} \\ r_{2 \times 1} \\ \vdots \\ r_{\bar{n} \times 1} \end{bmatrix}; \quad x_{\bar{n} \times 1} = \begin{bmatrix} x_{1 \times 1} \\ x_{2 \times 1} \\ \vdots \\ x_{\bar{n} \times 1} \end{bmatrix}; \quad y_{\bar{n} \times 1} = \begin{bmatrix} y_{1 \times 1} \\ y_{2 \times 1} \\ \vdots \\ y_{\bar{n} \times 1} \end{bmatrix}, \tag{11}$$

where $r_{\bar{n} \times 1}$ is the column vector of radial distortions for each line member within the line-member set given by $L_{\bar{n} \times 4}$, and $r_{i \times 1} = \sqrt{x_{i \times 1}^2 + y_{i \times 1}^2}$, in which $x_{i \times 1}$, $y_{i \times 1}$ are the corresponding *x* and *y* coordinates of the *i*th radial distortion, $i \in \{1, 2, \dots, \bar{n}\}$.

$$x_i = \frac{x_i'}{1 + D_1 r_i^2 + D_2 r_i^4}, \quad y_i = \frac{y_i'}{1 + D_1 r_i^2 + D_2 r_i^4}, \tag{12}$$

where the undistorted points $x_i$ and $y_i$ are mapped using the distortion parameters $D_1$ and $D_2$ with respect to $r_i$, resulting in the distorted points $x_i'$ and $y_i'$. In addition, $Pts' = \begin{pmatrix} L_{\bar{n} \times [1,2]} \\ L_{\bar{n} \times [3,4]} \end{pmatrix}_{2\bar{n} \times 2}'$ represents the matrix of undistorted start and end points of the line-member set.

$$L_{\bar{n} \times [1,2]}' = x_i \times f_x + c_x = \begin{bmatrix} (x_{0,1}, y_{0,1}) \\ (x_{0,2}, y_{0,2}) \\ \vdots \\ (x_{0,\bar{n}}, y_{0,\bar{n}}) \end{bmatrix}', \tag{13}$$

$$L_{\bar{n} \times [3,4]}' = y_i \times f_y + c_y = \begin{bmatrix} (x_{1,1}, y_{1,1}) \\ (x_{1,2}, y_{1,2}) \\ \vdots \\ (x_{1,\bar{n}}, y_{1,\bar{n}}) \end{bmatrix}'. \tag{14}$$

Let $L_{\bar{n} \times 4}$ represent a matrix for the set of line members of an image, where $l_{ki}$ is the matrix formed by all the line members. The overall mean cumulative line-member set error (*SMCE*) in the image is estimated using the initial parameters and the line members $L_{\bar{n} \times 4}^0$:

$$L_{\bar{n} \times 4}^0 = \begin{bmatrix} l_{k1 \times 4} \\ l_{k2 \times 4} \\ \vdots \\ l_{k\bar{n} \times 4} \end{bmatrix}; \quad l_{ki} = \begin{bmatrix} (x_{0,1}, y_{0,1}, x_{1,1}, y_{1,1}) \\ (x_{0,2}, y_{0,2}, x_{1,2}, y_{1,2}) \\ \vdots \\ (x_{0,k}, y_{0,k}, x_{1,k}, y_{1,k}) \end{bmatrix}. \tag{15}$$

The parameters are used to refine the outliers by eliminating unwanted line members with respect to the minimum error. An iterative elimination process then checks whether the error is minimized further by removing an unwanted outlier (the *i*th line member) and forming a new line-member set $l_{(k-1)i}$ for distortion estimation, as shown below:

$$Err_{L_{\bar{n} \times 4}^0} = SMCE\left(D_1, D_2, I, L_{\bar{n} \times 4}^0\right); \tag{16}$$

$$l_{(k-1)i} = \begin{bmatrix} (x_{0,1}, y_{0,1}, x_{1,1}, y_{1,1}) \\ (x_{0,2}, y_{0,2}, x_{1,2}, y_{1,2}) \\ \vdots \\ (x_{0,k-1}, y_{0,k-1}, x_{1,k-1}, y_{1,k-1}) \end{bmatrix}. \tag{17}$$

Similarly, $L_{\bar{n},(j-1) \times 4}$ is the submatrix formed by removing the outliers and retaining $j - 1$ line members from the $\bar{n}$th line-member set; thereby, the error $Err_{L_{\bar{n},(j-1) \times 4}}$ corresponding to the outlier refinement can be estimated simultaneously, such that the sequence of submatrices $L_{\bar{n},(1) \times 4}, L_{\bar{n},(2) \times 4}, \dots, L_{\bar{n},(j-1) \times 4}$ and their corresponding line-member set errors $Err_{L_{\bar{n},(1) \times 4}}, Err_{L_{\bar{n},(2) \times 4}}, \dots, Err_{L_{\bar{n},(j-1) \times 4}}$ are formed:

$$L_{\bar{n},(j-1) \times 4} = \begin{bmatrix} l_{k1 \times 4} \\ l_{k2 \times 4} \\ \vdots \\ l_{k(j-1) \times 4} \end{bmatrix}, \tag{18}$$

$$Err_{L_{\bar{n},(j-1) \times 4}} = SMCE\left(D_1, D_2, I, L_{\bar{n},(j-1) \times 4}\right).$$

The final line-member sets containing refined line members with minimum error are elected for the distortion parameter estimation. The election process of the robust line-member set (ELS) is depicted in Figure 3.

$$ELS = \begin{cases} L_{\bar{n},(j-1) \times 4} & \text{if } \min\left(Err_{L_{\bar{n} \times 4}}, Err_{L_{\bar{n},(j-1) \times 4}}\right) = Err_{L_{\bar{n},(j-1) \times 4}} \\ L_{\bar{n} \times 4} & \text{otherwise} \end{cases} \tag{19}$$

where *j* ∈ {1, 2, . . . *i*}; *i* ∈ {1, 2, . . . *n*¯}.
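A simplified greedy sketch of this election scheme is given below, pairing `scipy.optimize.least_squares` (Levenberg–Marquardt) with one-member-at-a-time elimination and reusing `set_error` from the loss sketch above. The helper `undistort_sets`, the list-of-tuples data layout, and the search order are our assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import least_squares

def undistort_sets(line_sets, fx, fy, cx, cy, d1, d2):
    """Equation (12), approximately: divide centered, normalized endpoint
    coordinates by the radial factor (using the distorted radius), then
    regroup into line 4-tuples."""
    out = []
    for s in line_sets:
        new = []
        for x0, y0, x1, y1 in s:
            pts = []
            for x, y in ((x0, y0), (x1, y1)):
                xc, yc = (x - cx) / fx, (y - cy) / fy
                r2 = xc * xc + yc * yc
                f = 1.0 + d1 * r2 + d2 * r2 * r2
                pts += [xc / f * fx + cx, yc / f * fy + cy]
            new.append(tuple(pts))
        out.append(new)
    return out

def residuals(params, line_sets):
    """Per-set errors SE (Equation (8)) under the current parameter guess."""
    fx, fy, cx, cy, d1, d2 = params
    rect = undistort_sets(line_sets, fx, fy, cx, cy, d1, d2)
    n_total = sum(len(s) for s in rect)
    return np.array([set_error(s, n_total) for s in rect])

def refine(line_sets, p0):
    """Equation (19): elect the refined sets whenever dropping one line
    member lowers the fitted SMCE. Assumes at least six line-member sets
    so that method='lm' has enough residuals; line sets are lists."""
    fit = least_squares(residuals, p0, args=(line_sets,), method='lm')
    improved = True
    while improved:
        improved = False
        for k, s in enumerate(line_sets):
            if len(s) <= 2:
                continue                      # keep each set usable
            for i in range(len(s)):
                trial = [s[:i] + s[i + 1:] if j == k else t
                         for j, t in enumerate(line_sets)]
                f2 = least_squares(residuals, fit.x, args=(trial,), method='lm')
                if f2.cost < fit.cost:        # ELS elects the refined set
                    line_sets, fit, improved = trial, f2, True
                    break
            if improved:
                break
    return fit.x, line_sets
```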

**Figure 3.** Outlier refinement scheme based on line-member set aggregations.

#### **4. Ablation Study**

#### *Practical Significance Analysis*

The ablation study serves as a practical significance analysis investigating the novel aspects introduced in this work. Additionally, it differentiates the method using a straightness loss constraint on individual line candidates [19] from the proposed method of cumulative set aggregation loss with a refinement scheme. This investigation assists in understanding the real significance of using these aspects in the proposed system and their influence on the output performance:


Figures 4–6 below depict the quantitative and qualitative significance analysis of the proposed novel elements over various public and private datasets with respect to diverse metrics. The clear influence of the proposed elements, such as the cumulative set aggregation loss and the refinement scheme, can be observed in the qualitative analysis depicted in Figure 6. The following acronyms are used: B—Baseline; B + RO—Baseline + Refined optimization scheme; B + SC—Baseline + Set cumulative aggregation; B + RO + SC—Baseline + Refined optimization scheme + Set cumulative aggregation. Various combinations were used in the ablation study mainly to understand the practical significance of the proposed elements. A clear explanation of the combinations is as follows:


**Figure 4.** Quantitative: Significance of proposed cumulative set aggregation loss and refinement scheme with respect to image quality, edge stretching, and processing time on distorted KITTI dataset.

**Figure 5.** Quantitative: Significance of proposed cumulative set aggregation loss and refinement scheme with respect to pixel-point error and processing time on distortion center benchmark dataset.

**Figure 6.** Qualitative: Significance of proposed cumulative set aggregation loss and refinement scheme with respect to severe distortions.

#### **5. Experiments and Evaluations**

#### *5.1. Pixel Quality and Consistency Experiments*

The experiments were carried out to examine the pixel quality and consistency of the rectified image, and low-level image-quality metrics were considered accordingly. The synthetic distorted KITTI dataset generated using [26,27] was employed to evaluate the rectified image with respect to the GT (distortion-free KITTI sample). The accuracy of the distortion-rectified image can be evaluated in two different ways: image quality metrics, namely the peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM), the spectral, spatial, and sharpness metric (*S*3), and the local phase coherence sharpness index (LPC-SI); and pixel consistency metrics, namely the pixel-point error (PPE). The subsections below illustrate the individual significance of each evaluation method present in both strategies.

#### 5.1.1. Image Quality Evaluations

The image quality of the distortion-rectified image must be preserved; it can be validated using comparative measures against the original distortion-free samples in terms of similarity and noise aspects.
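As a concrete illustration, the two most common of these scores can be computed with scikit-image (≥ 0.19, which provides `channel_axis`); this function is a sketch, not the paper's evaluation harness.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_scores(rectified, ground_truth):
    """PSNR and SSIM of a rectified frame against the distortion-free
    ground truth (both HxWx3 uint8 arrays)."""
    psnr = peak_signal_noise_ratio(ground_truth, rectified)
    ssim = structural_similarity(ground_truth, rectified, channel_axis=-1)
    return psnr, ssim
```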


#### 5.1.2. Pixel-Point Error Evaluation

The pixel-point error was calculated by estimating the distance between the ground-truth pixel-point location and the corresponding pixel point in the rectified image. For this experiment, the synthetic distortion center benchmark dataset [4] was utilized, as shown in Figure 7 below:

**Figure 7.** Pixel-point error calculation on distortion center synthetic dataset [4].
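Since the PPE reduces to a Euclidean distance between the two point estimates, a minimal sketch suffices (the coordinates below are illustrative):

```python
import numpy as np

def pixel_point_error(est_point, gt_point):
    """Euclidean distance between an estimated pixel point in the
    rectified image and its ground-truth location."""
    return float(np.linalg.norm(np.asarray(est_point) - np.asarray(gt_point)))

print(pixel_point_error((322.4, 241.1), (320.0, 240.0)))  # ~2.64 px
```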

#### *5.2. High-Level Metrics: ADAS and Video-Surveillance Experiments*

This subsection elaborates on the essential usage of wide-angle and fish-eye lens models with the proposed automatic distortion rectification technique to yield better performance in ADAS and video-surveillance vision tasks. In the ADAS context, state-of-the-art (SOTA) pretrained models were employed to evaluate the proposed algorithm in terms of object detection on real and synthetic data. In the video-surveillance tasks, height estimation using fixed camera intrinsics from [31] was employed to evaluate the proposed algorithm. The datasets used in this study were collected at the Computer Vision Laboratory, Inha University; some are publicly available [31] and a few were stated in our previous works [19].

#### 5.2.1. Datasets Used

The datasets utilized in the experiments were of three types:


#### 5.2.2. Object Detection Using Pretrained Models

Various pretrained models were employed, such as YOLOv3 (pretrained on PASCAL VOC) and SSD (pretrained on MS COCO), as object detectors. These experiments were carried out on diverse lens models such as fish-eye (190°) and wide-angle (120°). Qualitative comparisons were made between various automatic rectification algorithms with respect to detection along the edges. Additionally, for the quantitative measure, the distorted KITTI data samples were rectified using various algorithms alongside the proposed method, and the detection mean average precision (mAP) scores were recorded. The major intent of investigating the proposed algorithm against various algorithms on SOTA pretrained object detectors is to validate the improved performance of rectified frames in streamlining (deploying) object detection tasks. In normal raw samples, the detection accuracy drops due to distortions along the edges, and using SOTA object detectors on those frames would not help, as shown in Figure 8:

**Figure 8.** Performance of pretrained state-of-the-art (SOTA) models on different larger-FOV raw samples: (**a**) pretrained YOLOv3 on a 190° fish-eye sample (car undetected along the edge); (**b**) pretrained SSD on a 120° wide-angle sample (person undetected along the edge).
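As a hedged illustration of this streamlining step, the sketch below runs a pretrained detector on a rectified frame using torchvision's SSD as a stand-in for the detectors named above; the weights, threshold, and function names are assumptions, not the paper's setup.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Pretrained SSD300 (COCO) as a stand-in detector; requires torchvision >= 0.13
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()

def detect(frame_rgb, score_thresh=0.5):
    """Run the detector on one HxWx3 uint8 frame and keep confident boxes."""
    with torch.no_grad():
        out = model([to_tensor(frame_rgb)])[0]
    keep = out["scores"] > score_thresh
    return out["boxes"][keep], out["labels"][keep], out["scores"][keep]
```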

#### 5.2.3. Height Estimation on Fixed Monocamera Sensor

Height estimation is considered a metric-based task, as the pixel distribution in the image plays a vital role in determining the metric information. For a fixed camera setup, the experiments were designed on the basis of estimating the intrinsics using walking-human metrology, proposed by Li, Shengzhe et al. [31], employing the Computer Vision Lab's video-surveillance dataset collected at Inha University.

During this study, we modified the previous height estimation method [31] such that the rectified pixel points are retrieved and used to initialize the pixel locations of the walking human (top and bottom) for intrinsic-based height estimation. The modified scheme is illustrated in Figure 9, where the objects are not deformed as they are in the raw distorted samples. The camera sensors used to evaluate the algorithm in this portfolio are wide-angle lens cameras. They were employed to capture all the data, as specified in [31], and the subjects used in that study were used in our study as well to maintain consistency in the ground truth. The height estimation error in cm is used as the metric for better comparison.

**Figure 9.** Retrieval of distortion-rectified reference pixel correspondences for better accuracy: (**a**) Top and bottom reference points in distorted case; (**b**) Corresponding top and bottom rectified reference points in rectified case.

#### **6. Results and Discussions**

#### *6.1. Pixel Quality and Consistency*

The consistency in the pixel information, especially regarding the stretching issue, was clearly investigated, as shown in Figure 10 below. The stretching along the edges caused the inconsistency in the case of traditional OpenCV and Santana et al. [5]. Due to the refinement of outliers, the stretching was significantly reduced in the proposed method.

**Figure 10.** Qualitative analysis: pixel quality and consistency.

#### 6.1.1. Quantitative Analysis: Image Quality

The proposed method was able to rectify the random synthetic distortions, and the average image quality scores in terms of similarity metrics and spectral context are high compared to those of the manual and automatic methods. The corresponding results are illustrated in Table 2.


**Table 2.** Quantitative analysis: image quality metrics on the synthetic distorted dataset.

#### 6.1.2. Quantitative Analysis: Pixel-Point Error

The pixel-point error calculations were made using the difference of distances between two pixel points, the distortion center in the rectified image and the given GT distortion center, on different samples. The average pixel-point errors were calculated against the algorithms of [5,18], and the results are stated in Table 3 below. The average pixel-point error in the case of Alvarez et al. [18] and Santana et al. [5] appears to be higher for the examples that have higher variations in the distortion center. The filtering of the line-member set for robust line candidate selection enables the proposed method to attain a lower average pixel-point error. For a better understanding of the quantitative analysis, the average pixel-point errors of all three methods are indicated in bold.


**Table 3.** Quantitative analysis: pixel-point error metrics on synthetic distorted dataset.

#### *6.2. High-Level Metrics: ADAS Use-Case*

The data samples utilized in the experiments were mainly ADAS-centered and are heavily distorted in terms of field-of-view and real-time challenges. The performance analysis was carried out both qualitatively and quantitatively against various automatic distortion rectification methodologies.

#### 6.2.1. Qualitative Performance Analysis

The performance comparisons were carried out between the original samples, Aleman et al. [3], Santana et al. [5], and the proposed method with respect to two pretrained models on three different cameras. The results are depicted in Figures 11–15 to illustrate the case-by-case robustness of object detection. Objects such as person, car, truck, motorbike, and bus were successfully detected in samples rectified using the proposed method. Although the same pretrained detector was employed on all the SOTA-rectified frames, the frames from the proposed method yield the best performance.

**Figure 11.** Pretrained YOLOv3 object detection on various rectified 190° fish-eye frames: car detected along the edge in the frame rectified by the proposed algorithm.

**Figure 12.** Pretrained YOLOv3 object detection on various rectified 190° fish-eye frames: van detected along the edge in the frame rectified by the proposed algorithm.

**Figure 13.** Pretrained YOLOv3 object detection on various rectified 190° fish-eye frames: motorbike detected along the edge in the frame rectified by the proposed algorithm.

**Figure 14.** Pretrained YOLOv3 object detection on various rectified 190° fish-eye frames: bus detected along the edge in the frame rectified by the proposed algorithm.

**Figure 15.** Pretrained SSD object detection on various rectified 120° wide-angle frames: person detected in the frame rectified by the proposed algorithm.

#### 6.2.2. Quantitative Performance Analysis

The quantitative analysis was carried out using the synthetic distorted KITTI dataset on the various rectification algorithms (Aleman et al. [3], Santana et al. [5], and the proposed method) alongside distortion-free and randomly distorted samples. The SOTA pretrained YOLOv3 and SSD were employed to detect the objects in the scene, and comparisons were made across the various cases. The corresponding quantitative analysis in terms of mAP is depicted in Figure 16. The pretrained SSD achieved 72.4 mAP on samples rectified using the proposed method, which is higher than on the distorted and other rectified samples. Similarly, the pretrained YOLOv3 achieved 79.8 mAP on samples rectified with the proposed method, again higher than on the distorted and other rectified samples. Rectified samples used in the streamlining of trained detectors must perform well in order to improve the detection accuracy, and this must be validated against distortion-free samples for a proper analysis. The original samples are considered a ground-truth benchmark, such that the algorithm which produces better rectified samples can be streamlined onto pretrained detectors for better accuracy. This demonstrates that the samples rectified using the proposed method are more pixel-consistent and preserve the object characteristics through stretch-free rectification compared to the other rectification algorithms.

**Figure 16.** SOTA pretrained YOLOv3 and SSD were employed to detect the objects in the scene on distorted KITTI samples rectified with various algorithms.

#### *6.3. High-Level Metrics: Video-Surveillance Use-Case*

The quantitative and qualitative analyses were carried out on various samples retrieved from different camera systems. Primarily, the comparisons were carried out for the use cases where the inevitability of distortion is high. Both analyses relied on experiments in which the distortions were rectified, followed by intrinsic estimation and height calculation. This process was done for both cases: the distortion rectification proposed in this study and the manual rectification following the approach of Li, Shengzhe et al. [32]. The accuracy of the height measurements was estimated with the straightforward method of retrieving the errors between the estimated heights and the available ground truth.

The results corresponding to camera IDs 03, 04, and 08 are depicted in Figures 17–19, respectively, as they span samples retrieved from both indoor and outdoor settings. The distortion effect was nullified using both rectification methods, and the rectified pixel points were used in the further process of estimating the heights of all 11 subjects recorded using the same camera ID. The red plot line represents the height error values in the case of manual rectification, where the distortions are not completely rectified; this resembles a concave effect due to inappropriate estimation of the distortion parameters. The blue plot line represents the error in height estimation in the case of rectification using the proposed method.

The results clearly show that the method used in Li, Shengzhe et al. [32], which is manual in nature and can be termed manual distortion-rectification-guided intrinsic-based height estimation (DR-IE), suffers from pixel irregularities. This inconsistency in pixel locations, and the corresponding error in metric information, increases with the distortion level. The method proposed by Li, Shengzhe et al. [32] is unable to handle such irregularities through manual rectification. In contrast, the proposed method uses the rectified frames to obtain pixel locations with relatively low pixel inconsistency, resulting in a low height estimation error in cm. This can be clearly seen in the error plots, where the height estimation errors are considerably larger for Li, Shengzhe et al. [32] than for the proposed method.

**Figure 17.** Height Estimation errors using (Li, Shengzhe et al. [31] vs. proposed method) on Outdoor camera ID.03: (**a**) Qualitative pixel-consistency. (**b**) Height estimation error plot corresponding to all the 11 subjects.

**Figure 18.** Height Estimation errors using (Li, Shengzhe et al. [31] vs. proposed method) on Indoor camera ID.04: (**a**) Qualitative pixel-consistency. (**b**) Height estimation error plot corresponding to all the 11 subjects.

**Figure 19.** Height Estimation errors using (Li, Shengzhe et al. [31] vs. proposed method) on Indoor camera ID.08: (**a**) Qualitative pixel-consistency. (**b**) Height estimation error plot corresponding to all the 11 subjects.

The effect of the distortion-rectification-guided height estimation can be observed clearly in the context of the wide-angle camera scenario. Figure 20 below illustrates the robustness of the proposed system in the presence of darkness and severe illumination changes.

**Figure 20.** Robustness of proposed distortion-rectification-guided height estimation on wide-angle camera at night time.

The overall height estimation errors with respect to the various camera sensors and the 11 subjects were extensively tested with the Li, Shengzhe et al. [31] result as a baseline. The proposed method preserved the pixel consistency in the distortion-rectified image; thereby, when those rectified pixels were used for the height estimations, the errors declined. These quantitative comparisons are clearly illustrated in Table 4 below. Camera IDs 1, 2, 6, and 7 were used to compare the distortion effects on the metric height estimation because these camera sensors possess a slightly higher amount of distortion compared to the other camera sensors used in the study. The average height estimation errors are indicated in bold in the table, which clearly demonstrates the effectiveness of height estimation via the proposed automatic distortion rectification method.



#### **7. Conclusions**

An outlier refinement methodology for the automatic distortion rectification of wide-angle and fish-eye lens camera models was proposed. The novel cumulative plumbline angular loss over line-member set aggregation exhibits better performance in conjunction with the outlier refinement optimization scheme. The design elements were evaluated using various metrics on real datasets (wide-angle: 120° < FOV < 150°; fish-eye: 165° < FOV < 190°) and synthetic distortions on distorted KITTI comprising several real-time challenges and diverse distortion variations. The practical significance of the proposed novel elements was investigated using an ablation study on public and private datasets with image quality and pixel consistency metrics. The novel cumulative plumbline angular loss in conjunction with the outlier refinement optimization scheme exhibited better performance in rectifying severe distortions compared to the other rectification options in the ablation study. A diverse range of experiments was conducted on low-level aspects such as image quality, stretching, and pixel-point error using metrics such as PSNR, SSIM, S3, and LPC-SI. In addition, most of the experiments were carried out in the context of streamlining vision tasks on the rectified frames. High-level scenarios, such as object detection in ADAS and metric height estimation in video surveillance, were extensively exercised on the distortion-rectified frames to validate the proposed method. Application-oriented metrics such as mean average precision (mAP) and height estimation error (in cm) were employed to investigate the adaptability of the proposed method in both learning-based appearance tasks and metric-based tasks. Both quantitative and qualitative metrics were employed in all the streamlined experiments to examine the practical usage of the proposed method. The proposed rectification algorithm with the outlier refinement optimization scheme guided the streamlined vision-based tasks to achieve better accuracy.

**Author Contributions:** Conceptualization, V.K., H.K.; methodology, V.K.; validation, V.K., H.K.; formal analysis, V.K., M.K. and H.K.; writing—original draft preparation, V.K.; writing—review and editing, V.K., M.K. and H.K.; visualization, V.K., J.L., C.R.; supervision, H.K.; project administration, H.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received funding from INHA UNIVERSITY Research Grant: 60507-01.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Analyzing Passive BCI Signals to Control Adaptive Automation Devices**

**Ghada Al-Hudhud <sup>1,</sup>\*<sup>,†</sup>, Layla Alqahtani <sup>2,†</sup>, Heyam Albaity <sup>1,†</sup>, Duaa Alsaeed <sup>1,†</sup> and Isra Al-Turaiki <sup>1,†</sup>**


Received: 26 May 2019; Accepted: 4 July 2019; Published: 10 July 2019

**Abstract:** Brain computer interfaces are currently considered to greatly enhance assistive technologies and improve the experiences of people with special needs in the workplace. The proposed adaptive control model for smart offices provides a complete prototype that senses an environment's temperature and lighting and responds to users' feelings in terms of their comfort and engagement levels. The model comprises the following components: (a) sensors to sense the environment, including temperature and brightness sensors, and a headset that collects *electroencephalogram* (EEG) signals, which represent workers' comfort levels; (b) an application that analyzes workers' feelings regarding their willingness to adjust to a space based on an analysis of collected data and that determines workers' attention levels and, thus, engagement; and (c) actuators to adjust the temperature and/or lighting. This research implemented independent component analysis to remove eye movement artifacts from the EEG signals and used an engagement index to calculate engagement levels. This research is expected to add value to research on smart city infrastructures and on assistive technologies to increase productivity in smart offices.

**Keywords:** passive brain signals; adaptive automation and controller; EOG artifact; independent component analysis; engagement index

#### **1. Introduction**

Worker engagement and concentration are essential to ensure productivity in the workplace. However, busy workers may find it hard to concentrate since their focus can be easily broken by many factors, and this may affect their engagement at work. The environment surrounding the employee is one of these factors [1]. A room's temperature [2], brightness level, window size [3], and noise level [4] can affect focus at work, especially when employees have a busy schedule. For instance, small changes in room temperature may directly affect engagement, which influences the productivity of employees; this may sometimes occur without anyone noticing the causes for it. Therefore, providing a control system to maintain an environment that helps increase user engagement can improve productivity at work. Furthermore, the control system's level of interaction with users in maintaining an appropriate environment is critical, as most busy workers find it difficult or time consuming to track their environment in order to continually adjust it. Although such a control system sounds promising as an assistive technology to accommodate workers with movement disabilities, it would be impractical in offices with more than one worker. Hence, the proposed research assumed the environment of a small office with one or two workers that is equipped with sufficient infrastructure to assist workers with disabilities.

#### *1.1. Passive Brain Computer Interaction*

The original goal of *Brain Computer Interaction* (BCI) is to provide a communication and control channel for people with severe disabilities, especially those who are completely paralyzed. Most BCIs are used for direct or *explicit* control, which might involve users controlling a cursor or selecting letters on a computer screen using mental activity. The channel transfer rate of these applications remains under 25 bits per minute [5]. Such explicit BCIs often require a long training period but remain a solution for patients.

*Non-command* user interfaces [6] have been proposed to use a BCI as an implicit communication channel between a user and a computer. Implicit or passive BCIs refer to BCIs in which the user does not try to control their brain activity. Passive BCIs have been deployed in recent research on adaptive automation. In the field of adaptive automation, the first brain-based system was developed by Pope et al. [7]. In this system, tracking tasks were allocated between a human and a machine based on an engagement index, which was calculated using users' brain activity. More recently, Kohlmorgen et al. presented the use of implicit BCI in the context of a real driving environment [8]. In this study, the user was engaged in a task mimicking an interaction with the vehicle's electronic warning and information systems. This task was interrupted when a high mental workload was detected. This experiment showed good reaction times on average using BCI based on implicit interaction.

#### *1.2. BCIs in the Office*

The environment around a person working in an office has a direct effect on their engagement and productivity [2,4]. In the last few years, BCI researchers have studied BCI technology and its uses, both for disabled and healthy users [9]. Finding an easy and smart way to automatically detect the best environmental conditions and then adjust them accordingly would make the work environment a more enjoyable and proactive place. Recent research has been conducted to develop smart offices using various technologies and techniques, such as smartphones [10], speech commands, gestures [11], and even active brain signals [12]. While some systems do not require direct interaction with the user, most do. Such an interaction involves using passive BCI signals and concentrating on only one environmental factor, such as a window [13]. However, to our knowledge, no research has been conducted that includes adjusting multiple factors in the office environment to enhance worker engagement and concentration [13]. Thus, an intelligent system that passively observes workers' mental status, automatically and passively acquires their brain signals, analyzes these signals alongside environmental measurements to find the perfect state to enhance workers' concentration, and adjusts environmental factors to meet the required state is needed.

#### **2. Background and Literature Review**

Using BCIs as a tool in smart offices [12] and homes [14,15] varies in terms of purpose, methodology, and environment. Hence, in this section, a background on brain wave types and the use of BCI technologies and methodologies in smart offices is discussed.

#### *2.1. Brainwaves*

Brain waves are classified based on bandwidth, and each type serves a different function. Low-frequency waves dominate when a person is tired or daydreaming. High-frequency waves appear more often when a person is active. The bandwidths of brain waves are shown in Table 1.


**Table 1.** Brainwave bandwidths and functions.

#### *2.2. BCI*

*Electroencephalogram* (EEG) electrodes can be used to measure the voltage resulting from brain activities. In the signal-processing stage, several steps are taken to obtain a control signal, including preprocessing, feature extraction, and classification. In the last stage, the processed signal is interpreted into the desired action. Figure 1 describes these stages.

**Figure 1.** Typical Brain Computer Interaction structure, including data acquisition and signal processing; finally, the interpreted action is shown as a result.

Advancements in the development of BCI systems in recent years have helped to make them more appealing to a wider range of user groups. The cost of such systems has dramatically dropped, and they have become more convenient to use; the electrodes are now wireless, dry, and easy to move during wear. Currently, many commercial BCI devices are available, including NeuroSky [16] and EMOTIV EPOC [17]. EMOTIV EPOC is a BCI device that was developed for research and development applications. It contains 14 sensors to acquire brain signals at the following locations: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4, as shown in Figure 2 [18].

**Figure 2.** Emotiv EEG neuroheadset sensor position [18].

BCIs vary in terms of their properties and the ways they acquire, analyze, and translate signals. Typical BCIs involve brain signal acquisition, processing, and interpretation. Brain signals can be acquired using various methods, such as EEG.

#### *2.3. Electrooculography/Electromyogram Artifact Removal*

Artifacts are undesirable signals and noise that can interfere with acquired brain signals. Artifacts have a much stronger amplitude than EEG signals and may contaminate the acquired brain signals, thereby reducing the performance of BCIs. There are two types of artifacts: physiological and non-physiological. Common physiological artifacts include eye movements, which are detected by *electrooculography* (EOG), and muscle movements, which are detected by *electromyography* (EMG). They usually appear as large-amplitude, high-frequency distortions within brain signals. Non-physiological artifacts are usually technical and caused by the environment; they include power-line noise and disturbances caused by recording equipment (e.g., changes in electrode impedances). Non-physiological artifacts are easy to handle and prevent (by applying filtering and the proper recording procedure, respectively). However, physiological artifacts are challenging to eliminate from brain signals and are a significant problem in designing BCIs [19]. Recently, researchers have published many methods to remove eye movement and blinking artifacts from EEG data. Among these methods is rejecting contaminated EEG epochs; however, this method results in a significant loss of collected information. Another method is performing regression on simultaneous EEG recordings, including EMG and EOG recordings, in the time or frequency domain. This method aims to derive the parameters that characterize the appearance and spread of EOG artifacts in the EEG channels. EOG records may also contain brain signals; hence, removing EOG activity would result in the loss of relevant EEG signals. As there is no clear reference channel for the artifacts, regression methods cannot be used to remove them. A recent method was proposed by Hsu et al. (2016) [20] that involves applying *Independent Component Analysis* (ICA) to eliminate artifacts from EEG sensors. In comparison with results obtained using regression-based methods and principal component analysis, Hsu et al.'s published results show that ICA can effectively detect, separate, and remove artifacts from EEG records.
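A minimal ICA cleanup sketch in Python with scikit-learn's FastICA is shown below as a generic stand-in; Hsu et al.'s pipeline and the component-selection step are more involved, and the component indices here are purely illustrative.

```python
from sklearn.decomposition import FastICA

def remove_eog_components(eeg, bad_components):
    """Decompose a channels x samples EEG array into independent
    components, zero out those judged to be EOG/blink artifacts,
    and reconstruct the cleaned signals."""
    ica = FastICA(n_components=eeg.shape[0], random_state=0)
    sources = ica.fit_transform(eeg.T)        # samples x components
    sources[:, bad_components] = 0.0          # drop artifact components
    return ica.inverse_transform(sources).T   # back to channels x samples

# e.g. a 14-channel EMOTIV recording at 128 Hz, with components 0 and 3
# flagged by visual inspection as eye artifacts (indices are illustrative):
# clean = remove_eog_components(raw_eeg, bad_components=[0, 3])
```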

#### *2.4. Processing EEG to Measure Engagement Levels*

The growth of BCI technology has attracted many researchers, who often use it to measure user engagement. The purposes of measuring user engagement vary, from enhancing user experiences and interfaces to games and online learning systems. For example, [21] developed an attention-aware system that monitors a student's attention in an online education system and alerts their teacher when attention decreases. The system uses NeuroSky and achieved an 89.52% accuracy rate on average. Ref. [22] developed a prototype system to enhance the user experience in museums. This system uses real-time feedback regarding user engagement, using a BCI to provide a tailored museum experience based on a user's taste. The system provides guided tours and suggests exhibits based on a user's engagement level.

A smart office was presented for the first time in [11], in which a user was observed and their intentions anticipated in order to augment their environment and communicate useful information. At first, the system was controlled using voice commands and gestures [11]. In recent years, much research has adapted BCIs for use in smart offices in order to enhance worker experience and productivity. The system in [13] was designed to improve user engagement by blocking outside distractions; this was done by controlling the opacity of an office's glass wall. The system uses a BCI to passively measure a user's level of engagement through NeuroSky's ThinkGear device. The user's level of engagement was used to determine the opacity of an electrochromic smart glass tile, which could change from fully opaque to fully transparent. As the user focused, the system increased the opacity of the window as a signal for others not to disturb the user. However, this system ignores other surrounding factors, such as lighting and room temperature. Ref. [12] used an EMOTIV headset to actively acquire a worker's brain signals and translate them to control the office environment. The system allows users to control the temperature and brightness using their thoughts. However, the system does not control the environment passively; it requires user intervention. Therefore, using BCIs to develop smart offices is a growing research area that still has room for improvement.

#### **3. Methodology**

#### *3.1. Proposed Structure*

The basic structure of the proposed system is divided into the following phases: brain and environment signal acquisition, signal processing, user engagement calculation, and decision making, as shown in Figure 3. These phases continue working in a cycle to ensure the continuing functionality of the system in order to provide a suitable environment for the worker.

**Figure 3.** Basic structure of the proposed smart office controller.
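As a rough illustration of this cycle, the sketch below wires the four phases of Figure 3 into a loop; all of the stage functions are hypothetical placeholders, not the system's actual implementation (which was built in LabVIEW, as described in Section 3.2).

```python
import time

def control_cycle(read_eeg, read_sensors, process, compute_engagement, decide,
                  period_s=1.0):
    """One illustrative pass per period through the four phases of Figure 3.

    All callables are placeholders for the corresponding stage of the
    proposed system; the loop repeats so the office is adjusted continuously.
    """
    while True:
        raw_eeg = read_eeg()                        # signal acquisition (headset)
        temperature, light = read_sensors()         # signal acquisition (Arduino)
        features = process(raw_eeg)                 # signal processing
        engagement = compute_engagement(features)   # engagement calculation
        decide(engagement, temperature, light)      # decision making
        time.sleep(period_s)
```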

#### *3.2. Development Environments and Tools*

LabVIEW (whose graphical language is also called G) is a dataflow programming environment in which programs are represented as virtual instruments. In BCI research, the system in [23,24] used LabVIEW as the development environment to build a smart-house system; its purpose is to monitor the temperature, humidity, lighting, fire and burglar alarms, and gas density in the house in order to ensure safety. Ref. [25] built a smart home system based on a wireless sensor network designed to ensure the safety of elderly people living alone. However, neither of these studies used BCIs, and although many researchers have used LabVIEW to develop BCIs, none have used it to create a smart office.

#### *3.3. Signal Acquisition*

The proposed system controls the office environment automatically: it detects the user's engagement level, together with the light intensity and temperature of the environment, and maps these onto the user's comfort level. The system then adjusts the temperature and light intensity as needed. An EMOTIV headset is placed on the user's scalp to collect real-time brain signals in various situations (comfortable, stressed, engaged, and distracted), while environmental sensors acquire the temperature and brightness level. The sampling frequency of the headset is fixed at 128 samples per second. Since electrodes placed over the frontal and occipital lobes obtain cognitive EEG data better than electrodes in other locations, the electrodes are placed at the following locations: F3, F4, FC5, FC6, P7, P8, O1, and O2. These channels were chosen because they are closest to those used in other engagement research [22,26–28].

To acquire the environmental signals, a temperature sensor was used to measure the temperature, and a photoresistor was used to measure the light intensity. The sensors were mounted on an electronic circuit connected to an Arduino UNO board, which functions as an interface between the sensors and the computer.
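A minimal Python-side sketch of this interface is shown below, assuming (hypothetically) that the Arduino firmware streams one comma-separated "temperature,light" line per reading over USB serial; the port name and baud rate are placeholder assumptions.

```python
import serial  # pyserial

# Assumed setup: the Arduino UNO prints "temperature,light" lines over USB.
ser = serial.Serial("COM3", 9600, timeout=1)

def read_sensors():
    """Read one temperature (°C) and light-intensity sample from the board."""
    line = ser.readline().decode("ascii", errors="ignore").strip()
    temperature_str, light_str = line.split(",")
    return float(temperature_str), float(light_str)
```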

#### *3.4. Signal Processing*

After the brain and environmental signals are acquired, they are imported for processing. The temperature and light signals are analog, so their processing consists of signal conditioning: converting them into digital signals for the next stage. This conversion is done using an existing LabVIEW function.

The signal processing of the acquired EEG signals is considered difficult due to noise and artifacts. As such, this process extracts the frequencies required for the task in stages; the first stage removes eye-blink artifacts by decomposing the recording into independent components. Figure 4 shows all of the components, including those that contain eye artifacts and spatial mixtures of brain and artifact activities; after the artifact components are removed, the channel data are reconstructed (Figure 5).

**Figure 4.** Mixed electroencephalogram/electrooculography (EEG/EOG) data. Note the pulses in the independent components.

**Figure 5.** EEG data after eye blink removal.
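A minimal sketch of this kind of ICA-based blink removal is given below, using scikit-learn's FastICA; selecting artifact components by their kurtosis is a common heuristic assumed here for illustration, and the threshold value is not a detail specified in this paper.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

def remove_blinks(eeg, n_components=8, kurtosis_threshold=5.0):
    """Sketch of ICA-based eye-blink removal.

    eeg : (n_samples, n_channels) array. Blink components tend to be
    spiky, so components with unusually high kurtosis are zeroed out
    before the channel data are reconstructed. The threshold is an
    illustrative assumption.
    """
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(eeg)           # (n_samples, n_components)
    k = kurtosis(sources, axis=0)
    sources[:, k > kurtosis_threshold] = 0.0   # drop artifact components
    return ica.inverse_transform(sources)      # back to channel space
```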

#### *3.5. Engagement Index Computations and Environmental Control*

This system uses the engagement value to determine the best action. Whenever a new maximum engagement value is recorded, the system saves the associated temperature and light intensity values for later use in controlling the environment. Since temperature and light intensity do not change suddenly in an office, the load on the system is reduced, making it more efficient. The system logs all environmental data and the related engagement levels as a reference for finding the best conditions for the user.
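A minimal sketch of this logging behavior, with illustrative variable names:

```python
best = {"engagement": 0.0, "temperature": None, "light": None}
log = []  # full history of (engagement, temperature, light) tuples

def update_log(engagement, temperature, light):
    """Record every reading; remember the environment of the best one."""
    log.append((engagement, temperature, light))
    if engagement > best["engagement"]:
        best.update(engagement=engagement, temperature=temperature, light=light)
```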


#### *3.6. Feature Selection and Classification of Actions*

The purpose of feature extraction is to convert the digitized brain signals recorded at various locations into features [20]. For the feature extraction, a fast Fourier transform (FFT) is used: a Hanning window is first applied, and then the FFT is performed on each epoch. Afterward, the average power spectral density is extracted for the alpha, beta, and theta frequency bands to obtain the features of each band. The FFT reduces computational complexity by obtaining results faster than a direct evaluation of the discrete Fourier transform. In addition, the FFT is a commonly used algorithm in signal processing; in this project, it is used to capture the frequency components of the EEG signals recorded by the EMOTIV headset. Figure 6 shows a real-time EEG data spectrum. The system acquires and processes new EEG data in each iteration. It then identifies the user's engagement level by analyzing the extracted features associated with their mental status using a suitable classifier. The alpha, beta, and theta powers are used to calculate the engagement score, and the engagement index is calculated using the following function:

$$\text{Engagement level} = \frac{\beta}{\alpha + \theta} \tag{1}$$
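As a concrete illustration, the following numpy sketch computes the windowed FFT band powers and Equation (1) for a single-channel epoch; the exact band edges (4–8, 8–13, and 13–30 Hz) are conventional values assumed here for illustration, not figures stated in the paper.

```python
import numpy as np

FS = 128  # headset sampling rate (samples per second)

def band_power(epoch, low, high):
    """Mean power spectral density of one EEG epoch in [low, high) Hz."""
    windowed = epoch * np.hanning(len(epoch))      # Hanning window first
    psd = np.abs(np.fft.rfft(windowed)) ** 2       # then the FFT
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / FS)
    mask = (freqs >= low) & (freqs < high)
    return psd[mask].mean()

def engagement_index(epoch):
    """Equation (1): beta / (alpha + theta)."""
    theta = band_power(epoch, 4.0, 8.0)    # band edges are common
    alpha = band_power(epoch, 8.0, 13.0)   # conventions, assumed here
    beta = band_power(epoch, 13.0, 30.0)
    return beta / (alpha + theta)
```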

**Figure 6.** EEG data spectrum.

The engagement values are then scaled from zero to one; the higher the value, the higher the engagement level. In this stage, the system makes a decision and applies changes depending on the results of the previous stage, and all environmental and engagement data are logged. If the engagement level falls below the threshold, the system takes action: it first checks whether the environmental data are within the normal range, to ensure that the environment is the likely cause of the loss of engagement. If so, the system adjusts the environment to a suitable range (taken from the logged data) and then notifies the user about the changes. The decision algorithm is shown in Algorithm 1.

#### **Algorithm 1** Decision algorithm.

**if** engagement value ≤ Threshold **then**
  check whether the temperature and light intensity are within the normal range
  **if** they are (i.e., the environment is the likely cause of the engagement loss) **then**
    set the temperature and light intensity to the logged optimal values
    notify the user about the changes
  **end if**
**end if**
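A Python rendering of this branch might look as follows; the 0.4 threshold follows the value reported in Section 5.2, while the helper callables and the assumed normal temperature range are illustrative placeholders.

```python
THRESHOLD = 0.4  # engagement threshold reported in Section 5.2

def decide(engagement, temperature, light, best, set_environment, notify_user):
    """Sketch of Algorithm 1; helper callables are hypothetical."""
    if engagement <= THRESHOLD:
        # Act only when the environment is plausibly the cause of the drop;
        # the normal office range used here is an assumption.
        if 18.0 <= temperature <= 26.0:
            # Restore the logged optimal condition and tell the user.
            set_environment(best["temperature"], best["light"])
            notify_user("Temperature and lighting adjusted to your "
                        "best-known settings.")
```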


#### **4. Experimental Setup**

Raw EEG data were collected from two participants, both male and aged 20–29 years. The mental status and health conditions of both participants were normal. No prior knowledge of how to use a computer or operate office appliances was required of the participants. The experiment took place in a room where the participants were asked to perform specific tasks that required their attention, including sending emails, editing a Word document, and reading a document. The participants performed the tasks while wearing a headset connected to a computer running the system software. The headset sampling frequency was set to 256 samples per second. In addition, an Arduino board with two circuits, containing a temperature sensor and a light sensor, was connected to the same computer.

#### **5. Discussion**

In this section, the results are presented in more detail and analyzed in three phases. The first phase records the user's engagement level at a neutral status in order to learn and record the user's baseline engagement. The second phase analyzes the impact of changing the temperature, with the light intensity fixed, on the user's engagement level. The third phase analyzes the effect of changing the light intensity, with the temperature fixed, on the user's engagement level.

Figure 7 shows the engagement across one full session. Figure 8 presents the maximum engagement, temperature, and light intensity values over time, allowing a rapid analysis of the system's performance and the user's engagement.

**Figure 7.** Engagement data for one session.

**Figure 8.** Engagement and sensor results for one session.

Table 2 shows the maximum engagement value and its related environmental sensor readings; the system correctly saved the highest engagement value. The highest score in this session was achieved at a temperature of 24 °C and a light intensity of 86 lux. By observing the engagement recorded after the office environment was changed, an improvement in engagement was seen for a reasonable time (Table 2). The recording was at first below the threshold (0.342805); after the temperature was set to 24.9 °C and the light intensity to 86.14 lux, the engagement score slightly increased.


**Table 2.** Maximum engagement results for one session.

#### *5.1. Maximum Engagement Records and Associated Temperature and Light Intensity*

Table 3 shows the temperature and light intensity values related to the maximum engagement value from each session. The engagement values were recorded over 15 min in each of three sessions. The first session was run at a low temperature (18–20 °C), as shown in Figure 9; the second session was run at a high temperature (24–26 °C); and the third session was run at an intermediate temperature (21–23 °C). The system calculated the engagement values and saved the maximum values, while the light intensity remained in the same range. Figure 10 shows the changes in the saved maximum engagement values over 15 min for each session.



**Figure 9.** Engagement values for different temperatures.

**Figure 10.** Engagement values for different light intensity values.

#### *5.2. Engagement versus Synthesized Changing Temperature Values*

In this phase, the temperature values were changed to record the associated engagement values. The results show the influence of changes in temperature on the engagement level of the user; our results conform to those reported in [7,29].

When the temperature and light intensity were changed (Figures 9 and 10), all sessions started with low engagement values: 0.25056, 0.266476, and 0.225049. Since these values are below the threshold (0.4), the system set the temperature and light intensity to the saved optimal values for each session. Afterward, as shown in Figure 11, the engagement values gradually increased over time until they reached high values (above 0.6) at the end of the sessions.

**Figure 11.** Engagement values over 15 min.

#### **6. Conclusions**

EEG is expected to be a future user-input technology. This research provides a prototype and a first step toward relating EEG data to the office environment in order to enhance and develop smart offices. The environmental sensory data were acquired from sensors connected to an Arduino board. Alpha, beta, and theta powers were extracted from the EEG data and used to calculate user engagement. Based on the engagement value, the system sets the temperature and light intensity values; in our sessions, the system saved 22.3 °C as the optimal temperature and 76 lux as the optimal light intensity.

The experimental results show efficient control in terms of the focus level of users by correctly adjusting the office temperature and light intensity.

In this study, the best possible environment was determined based on the engagement value. Two factors were considered: room temperature and illumination. The value of each was set based on the highest engagement value obtained. In the future, artificial intelligence algorithms may be utilized to determine the best environment. The temperature and illuminance values in this study were assumed to be fixed; however, during the testing phase, we changed them within a limited range (the range of normal office conditions).

Many factors affect EEG, including emotional state, fatigue, sleepiness, age, body temperature, and blood oxygen saturation. All of these factors are important to consider. Thus, we suggest collecting more feedback from the subjects and varying the duration and frequency of the training sessions.

Due to time and device limitations, our system only deals with room temperature and lighting. Future enhancements should consider more than two parameters and investigate other mechanisms, such as tracking eye blinking as a sign of discomfort. In addition, low engagement could be classified using linear discriminant analysis applied to the feature vectors extracted from the ICA components. Further enhancements could use a more comfortable EEG headset; this may produce more effective results, since the user would not feel a difference in their daily office routine.

**Author Contributions:** Conceptualization, G.A.; Methodology, G.A., H.A., D.A. and I.A.-T.; Software, L.A.; Validation, G.A. and L.A.; Writing original draft, L.A.; Writing, review, and editing, G.A., H.A., D.A. and I.A.-T.

**Acknowledgments:** This research project was supported by a grant from the *Research Center of the Female Scientific and Medical Colleges*, the Deanship of Scientific Research, King Saud University. The tools and technical software used in this research were partially supported by *King Abdulaziz City for Science and Technology*, grant number 1-38-089.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
