**1. Introduction**

Advanced sensing technologies are increasingly applied in data collection systems for areas including public health, geological monitoring, environmental pollution, and manufacturing. When the output of sensors carries both spatial and temporal structure, it is termed spatial-temporal data [1]. A great deal of research focuses on abnormality identification in spatial-temporal data, such as identifying outliers in hourly air quality measurements [2], detecting abnormal ozone measurements caused by air pollution or by correlations among neighboring sensors [3], and diagnosing whether a disease is randomly distributed over space and time [4]. With the development of manufacturing technology, many sensors have been installed in production lines, and large volumes of spatial-temporal data can be collected from such processes. To improve the quality of the manufacturing process, abnormality identification in such spatial-temporal data has attracted much attention. Wang et al. [5] proposed a spatial-temporal data modeling method to identify abnormality in a wafer production process. The identification scheme developed by Megahed et al. [6] can quickly detect the emergence of a fault in a nonwoven textile production process. Yu et al. [1] presented a rapid spatial-temporal quality control procedure for detecting systematic and random outliers. Current research concentrates on identifying whether a process with spatial-temporal data is normal or not, with the common objective of accurately detecting the time and location of changes in the occurrence rate as soon as possible [7]. In other words, existing research focuses only on process monitoring of spatial-temporal data; the root cause of abnormal data is rarely considered. In a real process, however, both normal and abnormal data can be collected, so process monitoring and fault diagnosis can be applied simultaneously with spatial-temporal data. Normal and abnormal data display different variation patterns that can be observed in the process. Hence, how to identify such patterns precisely is the key problem in modern manufacturing process control.

Generally speaking, spatial-temporal data collected from a production process contain many observations over time and location, and adjacent observations are highly correlated. The high volume and strong correlation of spatial-temporal data thus pose a considerable challenge for identifying an abnormal process. Moreover, the curse of dimensionality and the complex data structure make it difficult to build an identification model, so dimension reduction techniques are required beforehand. To capture the intrinsic spatial and temporal correlations in an abnormal process state, principal component analysis (PCA), a widely used dimension reduction technique, can be applied to extract features from spatial-temporal data by unfolding the original data set [8]. However, PCA cannot be applied directly to two- or higher-dimensional tensor data unless such data are reshaped into a vector. Because the vectorization operation breaks the spatial and temporal correlation structure, it loses potentially useful information contained in the original data [9]; analyzing spatial-temporal data is therefore more challenging than analyzing one-dimensional data. To overcome this issue, multilinear PCA (MPCA) and uncorrelated multilinear PCA (UMPCA) have been proposed as alternatives to PCA [10]. These methods preserve the tensor structure of spatial-temporal data, so more effective representations can be extracted [11,12]. Although they perform well in processing spatial-temporal data, they treat feature extraction from raw data and the construction of an effective identification model for an abnormal process as two separate tasks [13]. If the extracted features cannot sufficiently characterize abnormal processes, or the identification model cannot exploit the extracted features, the performance is not robust. Hence, an effective approach integrating feature self-learning with the identification of an abnormal process from spatial-temporal data is still a challenge to be overcome.

Convolutional neural networks (CNN), among the most effective deep learning models for tensor data processing, have been widely applied in natural language processing [14], image recognition [15], electrocardiogram (ECG) analysis [16], and fault diagnosis [17]. Benefiting from the mechanism of CNN in tensor data processing, the correlation structure of spatial-temporal data is well preserved. Meanwhile, a CNN does not need abnormal features to be extracted manually, as the features can be learned from spatial-temporal data hierarchically and automatically. Taking advantage of these properties, a novel method for identifying an abnormal production process with spatial-temporal data is proposed in this paper. The case study considered is a pasting process, a critical process in lead-acid battery production whose sensor output constitutes a typical example of spatial-temporal data. Motivated by this process, a CNN-based identification approach for an abnormal process with spatial-temporal data is presented. To show the recognition accuracy and effectiveness of this approach, UMPCA is used as a benchmark in our study.

This paper is organized as follows. In Section 2, the pasting process is introduced as a motivating example and its spatial-temporal data are acquired. Section 3 develops a general CNN framework for identifying an abnormal process with spatial-temporal data. We investigate the validity of the CNN recognition model in Section 4. In Section 5, the CNN method is applied to identify the abnormal pasting process online, and the performance of the proposed method is evaluated. Suggestions and directions for further research are discussed in the conclusions.

#### **2. Case Study: Pasting Process**

#### *2.1. Spatial-Temporal Data Acquisition*

A lead-acid battery consists of basic cell blocks, and each cell block contains several plates. Plates are the basic components of lead-acid batteries, and unqualified plates directly affect the initial capacity and cycle life of batteries. In general, plate production includes five processes: ball-milling, paste mixing, grid casting, pasting, and plate curing. Pasting is a critical process in plate production [18,19], and most components of poor-quality batteries can be traced back to this process. The identification of an abnormal pasting process is therefore key to ensuring battery quality. In the pasting process, lead oxide paste is squeezed into the gaps between the two sides of the grid, which is thereby turned into a plate. The mechanism of the whole pasting process is shown in Figure 1. The uniformity of the lead oxide paste on the plate surface is a critical quality characteristic, which can be measured by plate thickness; a change in uniformity therefore directly reflects an abnormal state of the pasting process.

**Figure 1.** Pasting process.

To obtain plate thickness data, a laser sensor is installed at the end of the pasting equipment to collect observations of the uniformity, as shown in Figure 1. When a plate moves through the pasting process, the laser sensor records its thickness values at different locations over time, as shown in Figure 2. In the pasting example, there are *m* locations at which the uniformity of the plate is measured. When a plate moves past the sensor, the data observed over the *m* locations are collected at one time. In other words, the uniformity of a plate is described by the observations measured at the *m* locations, and the uniformity of different plates is observed over time to indicate the condition of the current process. The observations of uniformity collected at time *t* thus form a vector, and these vectors accumulate into a matrix over time, which indicates the stability of the pasting process. The matrix collected from the pasting process is bi-dimensional, with one space dimension and one time dimension; it is visualized as a surface in Figure 2.

**Figure 2.** Spatial-temporal data collection in the pasting process.

The abnormal changes of plate thickness in the pasting process often result from unexpected causes, such as the failure of the pasting machine and unqualified grids, which are the root causes of the abnormal process. Once these root causes are identified and removed, the pasting process will return to normal. The plate thickness data change randomly in space and time when the pasting process is running normally, which is referred to as a normal process pattern, *F*0, seen in Figure 3a. In general, different causes will lead to various abnormal process patterns, which can be reflected by the changes of plate thickness in space and time domains. For example, when the plate thicknesses have an upward shift, it is usually caused by the wear of the parts in the pasting machine.

According to the changes of the spatial-temporal data from the pasting process and engineering experience, seven common abnormal process patterns are identified. When the uniformity of the lead oxide paste worsens, the plate thickness data change unevenly in time and space, which is denoted as abnormal process pattern *F*1, seen in Figure 3b. This pattern results from the failure of the compression roller or the acid spouting system, such as a blocked sprayer in the acid spouting system or an aging spring in a compression roller. When the plate thickness is not uniform between the two sides, that is, one side is thick and the other is thin, the spatial-temporal data of plate thickness at different locations gradually become steeper with time. This pattern is labeled *F*2, and its corresponding cause is unusual clearance between the pasting machine and the conveyor belt. When the plate thickness gradually increases, there is a steady rise over time in the spatial-temporal data, seen in Figure 3d. This pattern, denoted *F*3, is caused by insufficient conveyor belt tension under the pasting machine. When the thickness uniformity of plates suddenly worsens, the spatial-temporal data become nonuniform abruptly, which is denoted as abnormal process pattern *F*4, seen in Figure 3e. In general, this situation can be attributed to the low strength of the steel in a new batch of grids. When the thickness between the two sides of plates suddenly becomes nonuniform, the spatial-temporal data of plate thickness on one side step up and become steep suddenly, as seen in Figure 3f. This situation is denoted as abnormal process pattern *F*5, and its usual cause is that the roller in the pasting machine slants to one side.
When the plate thickness suddenly increases, an overall upward step appears in the spatial-temporal data of plate thickness, seen in Figure 3g. Abnormal process pattern *F*6 labels this situation, which results from an electromagnetic fault in the pressure machine of the compressed air system. When the plate thickness increases in a periodic manner, a periodical change is observed in the spatial-temporal data, seen in Figure 3h. This situation is labeled as abnormal process pattern *F*7, and engineers are usually required to check the corrosion status of the pasting conveyor belt. All types of abnormal process patterns are shown in Figure 3.

**Figure 3.** Normal and abnormal process patterns in the pasting process.

From Figure 3, it can be observed that the obvious shape differences among the spatial-temporal data of plate thickness reveal different abnormal process patterns. Therefore, the identification of an abnormal process with spatial-temporal data is converted into the problem of identifying these abnormal process patterns. Once a certain abnormal process pattern is identified, its corresponding root causes can be found simultaneously.

#### *2.2. Abnormal Process Image Collection*

Although the abnormal process patterns describe the abnormal states of the process with spatial-temporal data very well, high dimensionality, spatial-temporal correlation, and a large amount of noise make it difficult to identify them directly. A grayscale image can visually capture important abnormal patterns of observation data without any parameters predefined by user experience [20]. To effectively distinguish the normal and seven abnormal process patterns, the spatial-temporal data from the pasting process are transformed into grayscale images, called process images. Given a spatial-temporal data matrix *X*, *xi*,*j* refers to a measured value, where *i* and *j* represent the location and time indices of the spatial-temporal data matrix, respectively. To generate the process image of the spatial-temporal data, each grayscale value is calculated from the data matrix *X* by normalizing, multiplying by 255, and rounding to the nearest integer. The transformation formula is given by:

$$y_{i,j}^{(0)} = \mathrm{rounding}\left(\frac{x_{i,j} - \mathrm{Min}(X)}{\mathrm{Max}(X) - \mathrm{Min}(X)} \times 255\right),\tag{1}$$

where $y_{i,j}^{(0)}$ is the grayscale value corresponding to $x_{i,j}$, $\mathrm{rounding}(\cdot)$ is the function for taking the nearest integer, and $\mathrm{Min}(X)$ and $\mathrm{Max}(X)$ extract the minimum and maximum elements of matrix $X$, respectively. Thus, the spatial-temporal data are represented as a process image that visualizes the operating status of the manufacturing process. Process images of the normal and seven abnormal process patterns discussed above are shown in Figure 4. For convenience, they are still denoted as *F*0, *F*1, *F*2, *F*3, *F*4, *F*5, *F*6, and *F*7.
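As a concrete illustration of Equation (1), the following sketch (using NumPy; the function and variable names are ours, not from the original implementation) converts a small spatial-temporal matrix into an 8-bit process image:

```python
import numpy as np

def to_process_image(X):
    """Equation (1): normalize a spatial-temporal data matrix X to [0, 1],
    scale by 255, and round, giving an 8-bit grayscale process image."""
    X = np.asarray(X, dtype=float)
    span = X.max() - X.min()
    return np.rint((X - X.min()) / span * 255).astype(np.uint8)

# Toy 3 x 4 matrix: rows = measurement locations, columns = time.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 3.0, 4.0, 5.0],
              [3.0, 4.0, 5.0, 6.0]])
img = to_process_image(X)   # smallest value maps to 0, largest to 255
```

The smallest and largest measurements always map to grayscale 0 and 255, so images from different plates are directly comparable regardless of their absolute thickness ranges.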

In the grayscale image of the normal process pattern, the pixels are arranged randomly without obvious change. Because the spatial-temporal data of plate thickness vary unevenly, some pixels in grayscale image *F*1 are white and some are dark. As the spatial-temporal data of plate thickness become steeper gradually, the pixels corresponding to the steep data gradually appear white in process image *F*2. The important features of the other abnormal process patterns are represented directly in their process images, as shown in Figure 4. From the above discussion, each process image captures the trend and variation features of its corresponding pattern shown in Figure 3. Therefore, the problem of identifying abnormal process patterns is converted into detecting abnormal process images, and developing an effective recognition method for abnormal process images is the challenge addressed in this paper.

**Figure 4.** Normal and abnormal process images in the pasting process.

#### **3. CNN Framework for Process Images Recognition**

#### *3.1. Architecture of the CNN Model*

A convolutional neural network (CNN) is a kind of feed-forward artificial neural network (ANN) inspired by biological vision, which proceeds from pixels to abstract features [21]. Unlike traditional neural networks, which need raw data concatenated into a vector, a CNN can deal with image data directly. Taking this advantage of CNN, a recognition model for process images is constructed in this paper. A CNN recognition model comprises an input layer, several convolution and pooling layers, fully-connected (FC) layers, and an output layer, as illustrated in Figure 5. The input layer imports process images for detection. The convolution and pooling layers extract abnormal information from process images. The fully-connected layer integrates the abnormal process information. The output layer provides the categories of abnormal process images.

**Figure 5.** The general architecture of convolutional neural network (CNN) model for the process image.

A CNN model usually consists of several convolution and pooling layers. For convenience, suppose that there are *R* convolution and pooling layers in the CNN model and that a process image is a square of size *N* × *N*, as seen in Figure 5. The convolution layer is the most important component of the CNN model; it assigns weights to the grayscales of the input image through a convolution kernel so as to extract abnormal process features.

#### *3.2. Underlying Mechanism of the CNN Model*

In the first convolution layer, suppose that there are $L_1$ convolution kernels. Let $(w_{k,i,j}^{(1)})_{M \times M}$ $(k = 1, 2, \dots, L_1;\ i = 1, 2, \dots, M;\ j = 1, 2, \dots, M)$ represent the $k$th convolution kernel of size $M \times M$ ($M < N$), where $w_{k,i,j}^{(1)}$ is the weight at row $i$ and column $j$ of the $k$th kernel. To obtain the convolution results, a moving window of size $M \times M$ is set up. The window moves one stride at a time, and the area it covers is called a receptive field. When all pixels in the process image have been covered by the moving window, $(N - M + 1) \times (N - M + 1)$ receptive fields are obtained. For all receptive fields and convolution kernels in the first layer, the convolution result is obtained as follows:

$$y_{k,i,j}^{(1)} = f\left(\sum_{s=1}^{M} \sum_{t=1}^{M} w_{k,s,t}^{(1)} \cdot y_{i+s-1,j+t-1}^{(0)} + b_k^{(1)}\right),\quad i, j = 1, \dots, (N-M+1),\ k = 1, \dots, L_1,\tag{2}$$

where $y_{k,i,j}^{(1)}$ is the convolution output of the current receptive field for the $k$th convolution kernel, $w_{k,s,t}^{(1)}$ is the weight at row $s$ and column $t$ of the $k$th convolution kernel, $y_{i+s-1,j+t-1}^{(0)}$ is the grayscale in the current receptive field, and $b_k^{(1)}$ $(k = 1, 2, \dots, L_1)$ is the bias value. $f(\cdot)$ is the ReLU activation function [22]. For the $k$th convolution kernel, all of the $y_{k,i,j}^{(1)}$ values from the receptive fields form a matrix, called the feature map corresponding to the $k$th convolution kernel $(k = 1, 2, \dots, L_1)$. The feature maps of the first convolution layer are input to the pooling layer.
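Equation (2) is a plain valid-mode convolution followed by ReLU; the loop-based NumPy sketch below is illustrative only (a real implementation would use an optimized library, and the toy kernels are our own example):

```python
import numpy as np

def conv2d_valid_relu(Y0, kernels, biases):
    """First convolution layer of Equation (2): slide each M x M kernel
    over the N x N image with stride 1 ('valid' mode, no padding) and
    apply the ReLU activation f(z) = max(z, 0).

    Y0      : (N, N) input image
    kernels : (L1, M, M) weights w^(1)_{k,s,t}
    biases  : (L1,) bias terms b^(1)_k
    Returns : (L1, N-M+1, N-M+1) feature maps y^(1)_{k,i,j}
    """
    N = Y0.shape[0]
    L1, M, _ = kernels.shape
    out = np.empty((L1, N - M + 1, N - M + 1))
    for k in range(L1):
        for i in range(N - M + 1):
            for j in range(N - M + 1):
                field = Y0[i:i + M, j:j + M]           # receptive field
                z = np.sum(kernels[k] * field) + biases[k]
                out[k, i, j] = max(z, 0.0)             # ReLU
    return out

Y0 = np.arange(25, dtype=float).reshape(5, 5)          # toy 5x5 image
K = np.ones((2, 3, 3))                                 # two 3x3 kernels
fmap = conv2d_valid_relu(Y0, K, np.zeros(2))           # shape (2, 3, 3)
```

With N = 5 and M = 3, each of the two kernels yields a (5 − 3 + 1) × (5 − 3 + 1) = 3 × 3 feature map, matching the receptive-field count above.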

The pooling layer is mainly used to compress feature maps and obtain condensed feature maps via a pooling function. In the first pooling layer, the $L_1$ feature maps are entered respectively. Each feature map is partitioned into several non-overlapping square areas, known as pooling fields. In general, pooling fields of size 2 × 2 are preferred [23]. Maximum pooling and average pooling are the most widely used pooling functions [24]. For the $k$th feature map, the pooling results, also known as the condensed feature map, are obtained. The pooling value at row $i$ and column $j$ of the $k$th condensed feature map, $y_{k,i,j}^{(2)}$, can be computed by the following equation:

$$\begin{aligned} y\_{k,i,j}^{(2)} &= \text{pooling}(y\_{k,2i-1,2j-1}^{(1)}, y\_{k,2i-1,2j}^{(1)}, y\_{k,2i,2j-1}^{(1)}, y\_{k,2i,2j}^{(1)}),\\ i &= 1, \dots, \frac{N-M+1}{2}, j = 1, \dots, \frac{N-M+1}{2}, \end{aligned} \tag{3}$$

where $y_{k,2i-1,2j-1}^{(1)}$, $y_{k,2i-1,2j}^{(1)}$, $y_{k,2i,2j-1}^{(1)}$, and $y_{k,2i,2j}^{(1)}$ are the four values in the current pooling field connected to $y_{k,i,j}^{(2)}$, and $\mathrm{pooling}(\cdot)$ refers to the pooling function. After the pooling operation, $L_1$ condensed feature maps are obtained, which are input into the next convolution layer.
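The 2 × 2 max-pooling case of Equation (3) can be sketched as follows (illustrative NumPy code; the feature-map dimensions are assumed to be even):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Equation (3) with the max-pooling function: partition each feature
    map into non-overlapping 2x2 pooling fields and keep the maximum of
    the four values, halving both spatial dimensions."""
    L, H, W = fmap.shape
    out = np.empty((L, H // 2, W // 2))
    for k in range(L):
        for i in range(H // 2):
            for j in range(W // 2):
                out[k, i, j] = fmap[k, 2*i:2*i + 2, 2*j:2*j + 2].max()
    return out

fmap = np.arange(16, dtype=float).reshape(1, 4, 4)   # one 4x4 feature map
pooled = max_pool_2x2(fmap)                          # shape (1, 2, 2)
```

Each condensed feature map is a quarter of the original size, which is what makes repeated convolution-pooling stages progressively compress the process image.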

*Processes* **2020**, *8*, 73

Similarly, the above operations are repeated over the alternating convolution and pooling layers until the last pooling layer. The condensed feature maps of the last pooling layer are unfolded and imported into the fully-connected (FC) layer. Before the fully-connected operation, the $L_R$ condensed feature maps are unfolded into a vector $y_i^{(FC)}$, $i = 1, \dots, L_R \times q \times q$, where $q \times q$ is the size of each condensed feature map. Suppose there are $H$ nodes in the FC layer; for the $h$th node, the weight connecting the $i$th input node is denoted $w_{i,h}^{(FC)}$ $(i = 1, 2, \dots, L_R \times q \times q;\ h = 1, 2, \dots, H)$. The value of the $h$th node in the FC layer can be computed as follows:

$$y_h^{(F)} = f\left(\sum_{i=1}^{L_R \times q \times q} y_i^{(FC)} \times w_{i,h}^{(FC)} + b_h^{(F)}\right),\quad h = 1, \dots, H,\tag{4}$$

where $y_h^{(F)}$ is the output of the $h$th node in the FC layer, $b_h^{(F)}$ $(h = 1, 2, \dots, H)$ is the bias of the $h$th node, and $f(\cdot)$ is the ReLU activation function. In the output layer, the connection between the $h$th FC node and the $j$th output node is represented by $w_{h,j}^{(O)}$ $(h = 1, 2, \dots, H;\ j = 1, 2, \dots, T)$. To classify the input data, the probability output of the $j$th output node is computed as follows:

$$P_j = f\left(\sum_{h=1}^{H} y_h^{(F)} \times w_{h,j}^{(O)} + b_j\right),\quad j = 1, \dots, T,\tag{5}$$

where $T$ is the number of output nodes, $P_j$ is the result of the $j$th node in the output layer, and $b_j$ is the bias of the $j$th output node. $f(\cdot)$ is the normalized exponential (softmax) function, through which the probability is obtained:

$$f(y_j) = \frac{e^{y_j}}{\sum_{t=1}^{T} e^{y_t}}, \quad j = 1, \dots, T. \tag{6}$$
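Equation (6) is the standard softmax function; a minimal sketch follows (subtracting the maximum before exponentiating is a numerical-stability trick that leaves the result unchanged, and the input vector is a made-up example):

```python
import numpy as np

def softmax(y):
    """Equation (6): normalized exponential mapping raw output-layer
    values to class probabilities that sum to one."""
    e = np.exp(y - np.max(y))   # max-shift for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))   # three output nodes (T = 3)
```

Larger raw node values receive larger probabilities, so the most strongly activated output node determines the predicted process pattern.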

As discussed above, the weights and biases in each layer of the CNN model can directly affect the final results of the output layer. If the difference between the actual output and the expected output is too large to be accepted, the weights and biases are required to be updated. The difference can be measured by the following loss function:

$$F = -\frac{1}{T} \sum\_{j=1}^{T} [P\_j^\* \ln(P\_j) + (1 - P\_j^\*) \ln(1 - P\_j)],\tag{7}$$

where $T$ refers to the total number of trained categories, and $P_j^*$ and $P_j$ are the expected output and the actual output, respectively. A small loss value means an accurate probability output. To reduce the loss value, the back-propagation (BP) algorithm [25] is utilized. For convenience, the weights and biases in all layers are referred to as $w$ and $b$, respectively; thus, $F$ is a function of $w$ and $b$, and the parameters are updated by gradient descent as follows:

$$w_t = w_{t-1} - \eta \frac{\partial F}{\partial w}, \qquad b_t = b_{t-1} - \eta \frac{\partial F}{\partial b}, \tag{8}$$

where $w_t$ and $b_t$ represent the updated weights and biases after the $t$th iteration, and $\eta$ is the learning rate, for which $\eta = 0.01$ is a common choice [26]. The CNN architecture parameters consist of the numbers of convolution layers and convolution kernels, the size of the convolution kernels per layer, the choice of pooling function, and the number of nodes in the FC layer. These parameters have to be determined stepwise, as discussed in the next section.
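Equations (7) and (8) can be illustrated with a minimal sketch; the toy probability vectors below are our own examples, not data from the pasting process:

```python
import numpy as np

def cross_entropy_loss(P_star, P):
    """Equation (7): cross-entropy between expected output P* and
    actual output P over T output nodes."""
    P = np.clip(P, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -np.mean(P_star * np.log(P) + (1 - P_star) * np.log(1 - P))

def gradient_step(param, grad, eta=0.01):
    """Equation (8): one gradient-descent update with learning rate eta."""
    return param - eta * grad

P_star = np.array([1.0, 0.0, 0.0])     # expected (one-hot) output
P_good = np.array([0.9, 0.05, 0.05])   # confident, correct prediction
P_bad = np.array([0.2, 0.4, 0.4])      # poor prediction
# A smaller loss value indicates a more accurate probability output.
good, bad = cross_entropy_loss(P_star, P_good), cross_entropy_loss(P_star, P_bad)
```

In the actual BP algorithm, the gradients in Equation (8) are obtained by propagating this loss backward through the output, FC, pooling, and convolution layers.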

#### *3.3. CNN Identification Framework*

A CNN framework for identifying an abnormal process with spatial-temporal data is presented in this paper. This framework can be divided into two phases: offline learning and online identifying. The offline learning phase aims to train a CNN model from offline-collected process images and establish an appropriate CNN recognition model. In the online identifying phase, the CNN model trained offline is applied to identify the abnormal process with real-time spatial-temporal data. The CNN identification framework is shown in Figure 6, and the details are introduced in the following.

There are two main steps in offline learning: The first step is to obtain training samples, which consist of process images and their corresponding categories. Process images are converted from spatial-temporal data, and their corresponding categories can be obtained according to engineering experience. The second step is to determine the architecture parameters of the CNN and update the weights and biases of the convolution, pooling, and FC layers using the BP algorithm. After offline learning, the CNN recognition model is obtained.

**Figure 6.** The CNN identification framework.

After the offline learning phase, the CNN recognition model is established and applied to online recognition of the abnormal process with spatial-temporal data. Two main steps in the online identifying phase are as follows: The first step is to collect the real-time spatial-temporal data from a process through the moving identification window. The size of the window should be determined by the product processing time, so that the spatial-temporal data in the current window can be mapped into a suitable process image. The second step is to identify the abnormal process images. The process images in the current identification window will be recognized by the CNN recognition model. If the decision result made by the CNN recognition model is a normal process image, the sliding window will move forward to collect new observations until an abnormal process image is identified.
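The online identifying phase above can be sketched as follows; `model_predict` is a hypothetical stand-in for the trained CNN recognition model, and the contrast-based rule in the example is a toy substitute used only to exercise the loop:

```python
import numpy as np

def online_identify(stream, window, model_predict):
    """Slide an identification window over a spatial-temporal data
    stream (rows = locations, columns = time), convert each window to a
    grayscale process image via Equation (1), and stop at the first
    window that the classifier flags as abnormal (any label but 'F0')."""
    m, T = stream.shape
    for t in range(T - window + 1):
        X = stream[:, t:t + window]
        span = X.max() - X.min()
        img = np.rint((X - X.min()) / max(span, 1e-12) * 255)
        label = model_predict(img)
        if label != 'F0':
            return t, label            # time index and abnormal type
    return None, 'F0'                  # process stayed normal

# Deterministic toy stream with a sudden overall thickness step at column 20,
# and a toy stand-in classifier that flags high-contrast windows as F6.
stream = np.zeros((5, 40))
stream[:, 20:] = 8.0
predict = lambda img: 'F6' if img.std() > 60 else 'F0'
t, label = online_identify(stream, 10, predict)
```

The window is flagged as soon as the step enters it, which mirrors the goal of detecting the time of change as early as possible.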

According to the above two phases, the abnormal process with spatial-temporal data can be identified in a real-time way. It benefits from the powerful learning ability of the CNN recognition model.

## **4. Validation Results**

The validity of the proposed approach depends on the architecture parameters of the CNN model, which are investigated in this section using the spatial-temporal data collected from the pasting process. To evaluate the ability to detect process images under various architectures, the recognition accuracy of test data (ATD) is used as the evaluation index; it is the ratio of the number of correctly recognized samples to the total number of samples. When an abnormal pattern occurs in the process, the CNN model is expected to identify the abnormal type as precisely as possible; when the process is normal, the CNN model is required to recognize the normal pattern accurately, so a higher ATD indicates a more valid CNN model. The following validation analysis covers the determination of the layer and node numbers and the selection of the kernel size and pooling function. In addition, 700 process images of size 50 × 50 are collected from the pasting process under each process pattern separately, of which 200 and 500 process images are selected randomly as training and test samples, respectively. The training data are used to learn CNN recognition models with different architectures, and the test data are used to compare their recognition accuracy to find the best one. All experiments are conducted in Caffe on Ubuntu 16.04 with a Tesla K80 GPU.
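The ATD index amounts to a one-line computation; the labels below are a made-up example, not results from the study:

```python
def recognition_accuracy(predicted, actual):
    """ATD: ratio of correctly recognized samples to total samples."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# 4 of 5 hypothetical test images recognized correctly -> ATD = 0.8
atd = recognition_accuracy(['F0', 'F1', 'F3', 'F2', 'F0'],
                           ['F0', 'F1', 'F3', 'F2', 'F4'])
```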

#### *4.1. Determination of Layer and Node Numbers*

The number of layers and nodes discussed here contains the numbers of convolutional layers, convolution kernels per layer, and FC nodes. To achieve the optimal performance of CNN model, the number of convolution layers and kernels per layer are important parameters required to be specified in advance [27]. Adding more convolutional layers and kernels to the network could capture high-level features of input data at the price of making the model complex to train [28]. Thus, to obtain an optimal network architecture, the numbers of convolutional layers and convolution kernels will be determined layer by layer.

For a process image of size 50 × 50, the size of a condensed feature map becomes 2 × 2 after four convolution and pooling operations, and it cannot be convolved further. Thus, we consider a CNN model with at most four convolution layers. Twenty scenarios for the number of convolution kernels, from 5 to 100, are tested via recognition accuracy. The average accuracy and standard deviation are used to measure the performance of the proposed CNN model. To evaluate recognition accuracy, all results of the proposed method are replicated at least 100 times; the results are shown in Table 1. Comparing the different scenarios for the first convolution layer, the highest average recognition accuracy reaches 95.18% when the number of convolution kernels is 65. Considering the same scenarios for the second convolution layer, the number of convolution kernels is set to 70, for which the highest average accuracy is 96.36%. In a similar way, the optimal numbers of convolution kernels can be determined layer by layer, as shown in Table 1. The recognition accuracy is expected to increase until the optimal number of convolution layers is found.

Generally, recognition accuracy improves as convolution layers are added. However, Table 1 shows that the recognition accuracy begins to decrease after the third convolution layer. It can therefore be inferred that the optimal number of convolution layers is 3, with 65, 70, and 80 convolution kernels respectively, which is denoted as 65–70–80.

Under the 65–70–80 structure of the convolution layers, the number of nodes in the FC layer needs to be determined. Ten scenarios for the number of FC nodes, from 100 to 1000, are shown in Figure 7. Comparing the average recognition accuracy and standard deviation across scenarios, we find that the accuracy increases to 98.08% at 800 nodes and does not exceed 98.08% when more nodes are added. Additionally, the variation of recognition accuracy becomes lower as the number of FC nodes increases. From Figure 7, the difference in recognition accuracy among FC node numbers is not large, especially between 600 and 800. In this research, the node number with higher recognition accuracy and lower variation is preferred; because the accuracy with 800 nodes is slightly better than with 600, the optimal number of FC nodes is set to 800. From the above discussion, the optimal architecture of the CNN model is denoted as 65–70–80–800.
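The layer-by-layer determination described in this section amounts to a greedy search; the sketch below uses a stand-in `evaluate` function (in the real procedure, it would train a CNN with the given kernel numbers and return its average test accuracy):

```python
def greedy_architecture_search(candidates_per_layer, evaluate):
    """Fix the layers chosen so far, try every candidate kernel number
    for the next layer, and keep the one with the best score; repeat
    layer by layer, as in the stepwise procedure of Section 4.1."""
    chosen = []
    for candidates in candidates_per_layer:
        best = max(candidates, key=lambda n: evaluate(chosen + [n]))
        chosen.append(best)
    return chosen

# Toy score whose optimum is 65 kernels in layer 1 and 70 in layer 2.
peaks = [65, 70]
def evaluate(arch):
    return -sum((n - p) ** 2 for n, p in zip(arch, peaks))

arch = greedy_architecture_search([range(5, 105, 5)] * 2, evaluate)
```

Greedy layer-wise search is much cheaper than an exhaustive search over all kernel-number combinations, at the cost of possibly missing a jointly better architecture.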


**Table 1.** The recognition accuracy comparison among different number of layers and convolution kernels. The boldface entries represent the highest average accuracy under the current layer. Numbers in parentheses are standard deviation of recognition accuracy.

**Figure 7.** The accuracy comparison among fully-connected (FC) node numbers.

#### *4.2. Selection of Kernel Size and Pooling Function*

The sizes of the convolution kernels and the pooling function are also important architecture parameters of a CNN model. The size of the convolution kernels is discussed first. In general, a convolution kernel of small size can capture the details of abnormal information in process images. Thus, under the optimal architecture 65–70–80–800, CNN recognition models with kernel sizes 3 × 3, 5 × 5, and 7 × 7 are considered respectively, as seen in Table 2. From the results of the CNN model with the three kernel sizes, the optimal kernel size is determined as 3 × 3. The optimal pooling function is selected next.



As discussed above, max-pooling and average-pooling are the most widely used choices for the pooling function in Equation (3). The comparison of recognition accuracy for the two pooling functions is shown in Table 3. The max-pooling function performs best and is therefore used in the CNN recognition model.

**Table 3.** The recognition accuracy comparison for pooling functions. The boldface entries represent the highest accuracy.


To summarize, the CNN recognition model for abnormal pasting process images, shown in Table 4, has been constructed and its validity has been verified.


**Table 4.** The architecture parameters of the proposed convolutional neural network (CNN) recognition model for the pasting process.
