*2.2. Broad Learning System (BLS) and Logistic Regression (LogR)*

#### 2.2.1. Broad Learning System (BLS)

BLS [8] is a recently proposed learning technique. BLS and its variants [8,30,31] connect the hidden nodes of a neural network broadly: as shown in Figure 4, the nodes are arranged in a broad, flat structure. A BLS network contains two hidden layers, namely the feature-mapped layer and the enhancement layer.

**Figure 4.** A typical structure of a BLS network.

The concept introduced in the BLS framework is promising: it is an efficient and simple learning algorithm. Owing to the effective feature-extraction capacity of the nodes in its feature-mapped and enhancement layers, the original BLS, as well as hybrid methods that combine the feature-mapped layer of BLS with other techniques, has been used in many applications. However, little work has been done on applying neural-network-based algorithms to APS failure prediction. In view of this gap, and because the feature-mapped and enhancement nodes of BLS can extract effective features from the input data and thereby enhance the performance of a classifier, we combine BLS with logistic regression (LogR) to study APS failure prediction. We thus propose broad embedded logistic regression (BELR) for APS failure prediction.

#### 2.2.2. Operation of BLS Networks

This subsection gives background knowledge on the operation of a BLS network. This paper proposes BLS for solving the air pressure system failure classification problem. The classification problem is formulated as nonlinear logistic regression, where the input of the logistic regression algorithm is the feature vector produced by the BLS network. The final output is the sign of the predicted value or the probability of the predicted output; in other words, if the sign is positive, the predicted value belongs to the positive class, and otherwise the predicted class is negative. Let $\mathbf{x} \in \mathbb{R}^D$ be the input data to the BLS network, where $D$ is the dimension of the input data, and let $o \in \mathbb{R}$ be the output of the BLS network. In this section, for a smooth and clear presentation, we write the input $\mathbf{x}$ augmented with 1 as $\chi = [\mathbf{x}^{\mathrm{T}}, 1]^{\mathrm{T}}$.
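To make this setup concrete, the following is a minimal NumPy sketch (not the authors' implementation) of the input augmentation and the sign-based decision rule described above; the dimension `D = 5` is an arbitrary illustrative choice.

```python
import numpy as np

def augment(x):
    """Append a constant 1 to the input: chi = [x^T, 1]^T."""
    return np.concatenate([x, [1.0]])

def predict_class(o):
    """Sign-based decision: positive output -> positive class, else negative."""
    return +1 if o > 0 else -1

x = np.random.randn(5)       # toy input with D = 5
chi = augment(x)
print(chi.shape)             # (6,), i.e., D + 1
print(predict_class(0.7))    # 1
```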

#### (a) Feature-mapped nodes

The BLS network has two main layers, namely the feature-mapped layer and the enhancement layer. The feature-mapped layer extracts features from the input data. It contains *n* groups of feature-mapped nodes, which are concatenated to form one main feature-extraction block. The output of this block is passed both to the output layer and to the enhancement layer. Each of the *n* groups extracts distinctive features, and each group has its own number of nodes. In this paper, $f\_i$ denotes the number of nodes in the *i*-th group of feature-mapped nodes. Hence, for *n* groups of feature-mapped nodes, the total number of feature-mapped nodes is the following:

$$f = \sum\_{i=1}^{n} f\_i \tag{1}$$

It should be noted that the $f\_i$, for $i = 1, \ldots, n$, need not be equal. Each group of feature-mapped nodes, say the $i$-th group, has an associated learned projection matrix; the $i$-th learned projection matrix is given by:

$$\mathbf{\Psi}\_{i} = \begin{pmatrix} \psi\_{i,1,1} & \cdots & \psi\_{i,1,(D+1)} \\ \vdots & \ddots & \vdots \\ \psi\_{i,f\_i,1} & \cdots & \psi\_{i,f\_i,(D+1)} \end{pmatrix} \tag{2}$$

where $\mathbf{\Psi}\_i \in \mathbb{R}^{f\_i \times (D+1)}$. It is designed to generate features from the input data. The $i$-th group of mapped features $\mathbf{g}\_i$ is obtained by projecting the input data with the matrix $\mathbf{\Psi}\_i$, as follows:

$$\mathbf{g}\_i = \begin{bmatrix} g\_{i,1}, \cdots, g\_{i,f\_i} \end{bmatrix}^{\mathrm{T}} = \mathbf{\Psi}\_i \chi, \quad \forall\, i = 1, \cdots, n, \tag{3}$$

where $g\_{i,u}$ is the $u$-th feature of the $i$-th group, with $i = 1, \cdots, n$ and $u = 1, \cdots, f\_i$.

In the classical BLS, the $\mathbf{\Psi}\_i$'s are constructed through sparse optimization steps. There are many ways to achieve these steps; one way is to solve the sparse optimization problem with the alternating direction method of multipliers (ADMM) [32], and a simplified sketch of such a sparse solve is given after Equation (4). In Section 2.2.3-(a), we present the construction procedure of the $\mathbf{\Psi}\_i$'s. In the classical BLS scheme, a linear operation is applied to the $\mathbf{g}\_i$'s. It should be noted that the $\mathbf{g}\_i$'s are not $\chi$ itself but features extracted from $\chi$. A nonlinear operation could be applied to the $\mathbf{g}\_i$'s as well; this paper applies a linear operation, following the classical BLS framework. The outputs from the *n* groups of feature-mapped nodes are gathered as

$$\mathbf{g} = \begin{bmatrix} \mathbf{g}\_1^{\mathrm{T}}, \cdots, \mathbf{g}\_n^{\mathrm{T}} \end{bmatrix}^{\mathrm{T}} \in \mathbb{R}^f \tag{4}$$
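As a rough illustration of the sparse solve mentioned above (the full ADMM-based procedure is given in Section 2.2.3-(a)), the following NumPy sketch uses ISTA, a simpler proximal-gradient method, in place of ADMM; the operands `A` and `B` and the regularization weight `lam` are illustrative placeholders rather than the paper's actual quantities.

```python
import numpy as np

def soft_threshold(V, tau):
    """Elementwise soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def sparse_projection(A, B, lam=1e-3, n_iter=200):
    """Approximate argmin_W ||A W - B||_F^2 + lam * ||W||_1 via ISTA."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    W = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ W - B)         # gradient of the quadratic term
        W = soft_threshold(W - grad / L, lam / L)
    return W

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))               # placeholder random features
B = rng.standard_normal((50, 6))               # placeholder targets
W = sparse_projection(A, B, lam=0.5)
print(np.mean(W == 0.0))                       # fraction of exactly-zero entries
```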

Additionally, we let

$$\mathbf{q} = \begin{bmatrix} \mathbf{g}^{\mathrm{T}}, 1 \end{bmatrix}^{\mathrm{T}} \in \mathbb{R}^{f+1} \tag{5}$$

be the augmented vector of **g**, which simplifies the presentation of the mathematical model.
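Putting Equations (3)-(5) together, here is a minimal NumPy sketch of the feature-mapped layer; the group sizes `f_sizes` are arbitrary, and the projection matrices are random placeholders standing in for the sparsely learned $\mathbf{\Psi}\_i$'s.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 5, 3                        # input dimension and number of groups
f_sizes = [4, 4, 2]                # f_i per group; the f_i need not be equal

# Placeholder projection matrices of shape (f_i, D + 1); in the paper these
# are learned by sparse optimization rather than drawn at random.
Psis = [rng.standard_normal((f_i, D + 1)) for f_i in f_sizes]

x = rng.standard_normal(D)
chi = np.concatenate([x, [1.0]])           # augmented input chi = [x^T, 1]^T

g_groups = [Psi @ chi for Psi in Psis]     # Eq. (3): g_i = Psi_i chi (linear)
g = np.concatenate(g_groups)               # Eq. (4): g in R^f, f = sum_i f_i
q = np.concatenate([g, [1.0]])             # Eq. (5): augmented feature vector
print(g.shape, q.shape)                    # (10,) (11,)
```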

#### (b) Enhancement nodes

Like the feature-mapped layer, the enhancement layer of the BLS network is organized into groups. It has *m* groups of enhancement nodes, and the *j*-th group has $e\_j$ nodes. The total number of enhancement nodes in the BLS network is given by

$$e = \sum\_{j=1}^{m} e\_j \tag{6}$$

In addition, the output of the *j*-th group of enhancement nodes is given by

$$\mathbf{h}\_j = \begin{bmatrix} h\_{j,1}, \cdots, h\_{j,e\_j} \end{bmatrix}^{\mathrm{T}} = \xi(\mathbf{W}\_j \mathbf{q}) \tag{7}$$

where $j = 1, \cdots, m$, and $\mathbf{W}\_j$ is the weight matrix that connects the output of the feature-mapped nodes to the input of the *j*-th group of enhancement nodes. It should be noted that, in the original BLS framework, $\mathbf{W}\_j$ is randomly generated. The elements of $\mathbf{W}\_j$ are denoted as

$$\mathbf{W}\_j = \begin{pmatrix} w\_{j,1,1} & \cdots & w\_{j,1,f+1} \\ \vdots & \ddots & \vdots \\ w\_{j,e\_j,1} & \cdots & w\_{j,e\_j,f+1} \end{pmatrix} \tag{8}$$

Here, $\xi(\cdot)$ is the activation function of the enhancement nodes. Each group of enhancement nodes can have its own activation function; in the original BLS algorithm, the hyperbolic tangent is employed for all the enhancement nodes, and this paper follows that choice. We gather all the enhancement node outputs together as

$$\mathbf{h} = \begin{bmatrix} \mathbf{h}\_1^{\mathrm{T}}, \cdots, \mathbf{h}\_m^{\mathrm{T}} \end{bmatrix}^{\mathrm{T}} \in \mathbb{R}^e \tag{9}$$
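Continuing in the same spirit, a minimal NumPy sketch of Equations (7)-(9); the group sizes `e_sizes` are arbitrary, and `q` stands in for the augmented feature vector of Equation (5).

```python
import numpy as np

rng = np.random.default_rng(1)
f, m = 10, 2                       # total feature-mapped nodes, enhancement groups
e_sizes = [6, 3]                   # e_j per group

q = rng.standard_normal(f + 1)     # placeholder for the augmented feature vector

# Eq. (8): random weight matrices W_j of shape (e_j, f + 1), as in original BLS
Ws = [rng.standard_normal((e_j, f + 1)) for e_j in e_sizes]

h_groups = [np.tanh(W @ q) for W in Ws]    # Eq. (7): h_j = xi(W_j q), xi = tanh
h = np.concatenate(h_groups)               # Eq. (9): h in R^e, e = sum_j e_j
print(h.shape)                             # (9,)
```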

#### (c) Network Output

For a given input vector **x**, the output of the network is

$$o = \begin{bmatrix} \mathbf{g}^{\mathrm{T}}, \mathbf{h}^{\mathrm{T}} \end{bmatrix} \boldsymbol{\beta} \tag{10}$$

where $\boldsymbol{\beta}$ is the output weight vector. The number of elements in $\boldsymbol{\beta}$ is equal to $f + e$. Hence, its components are given by

$$\boldsymbol{\beta} = \begin{bmatrix} \beta\_1, \cdots, \beta\_{f+e} \end{bmatrix}^{\mathrm{T}} \tag{11}$$
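A minimal NumPy sketch of Equations (10) and (11); here `beta` is a random placeholder, whereas in practice the output weights are learned as described in Section 2.2.3.

```python
import numpy as np

rng = np.random.default_rng(2)
f, e = 10, 9
g = rng.standard_normal(f)          # placeholder feature-mapped output
h = rng.standard_normal(e)          # placeholder enhancement output
beta = rng.standard_normal(f + e)   # Eq. (11): output weight vector

o = np.concatenate([g, h]) @ beta   # Eq. (10): o = [g^T, h^T] beta
label = 1 if o > 0 else -1          # sign-based decision from Section 2.2.2
print(o, label)
```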

#### 2.2.3. Construction of Weight Matrices and Vectors

Given $N$ training pairs $\mathcal{D}\_{train} = \{(\mathbf{x}\_k, y\_k) : k = 1, \ldots, N\}$, where $\mathbf{x}\_k = [x\_{k,1}, \cdots, x\_{k,D}]^{\mathrm{T}}$ is a $D$-dimensional training input and $y\_k$ is the corresponding target output, the training data matrix is formed by stacking all the inputs $\mathbf{x}\_k$ together. The augmented data matrix, denoted as $\mathbf{X}$, is given by

$$\mathbf{X} = \begin{pmatrix} \mathbf{x}\_1^{\mathrm{T}} & 1 \\ \vdots & \vdots \\ \mathbf{x}\_N^{\mathrm{T}} & 1 \end{pmatrix} \tag{12}$$
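A minimal NumPy sketch of Equation (12), stacking the training inputs row-wise and appending a column of ones; `N` and `D` are arbitrary illustrative sizes.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 4, 5
X_raw = rng.standard_normal((N, D))        # row k holds the training input x_k^T

# Eq. (12): append a column of ones to form the augmented data matrix X
X = np.hstack([X_raw, np.ones((N, 1))])
print(X.shape)                             # (N, D + 1) = (4, 6)
```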
