## 3.2. Data Analysis

### 3.2.1. Layer 1: Sensor Data

Data capture in the sensor layer is accomplished by three boards, which gather the variables listed in Table 1. Once the data of the different failure cases have been obtained, they are analyzed and compared with those of the normal operation case. It is important to know the shapes and values of the data distributions in order to better understand the indicators to be extracted in subsequent layers, since this allows us to assess the best analysis strategies and perform maintenance more efficiently.

We will begin by analyzing the data obtained by the energy management board. Among the measurements taken are voltages at different points and temperatures. When comparing the voltage data of various banknotes, grouped by orientation and obtained in different situations, differences can be observed. Figure 13 shows how the failure case presents higher values of *Vaux* than the case of normal operation. Although these differences are of a very small magnitude, the values themselves present very little variation, so they must be taken into account.

The engines board has sensors that take measurements related to the mechanical operation of the machine, such as the current consumption of the machine's motors or an FFT for vibration analysis. Figure 14 shows the current consumption during the passage of different banknotes. We can see that introducing the modification in the machine has altered the current figures. Although there is still a significant overlap in the ranges, the failure case presents values below what would be considered normal.

**Figure 13.** Comparison of the *Vaux* voltage measurements of a 5 € banknote in reverse back orientation in the case of normal operation (**left**) and a failure case (**right**).

**Figure 14.** Comparison of the measurements of the transport motor current of a 50 € banknote in reverse front orientation in the case of normal operation (**left**) and that of a failure case (**right**).

The last of the boards included in the complex machine analyzed is responsible for monitoring the condition of the banknotes through various measurements. One of the sensors used, called the doubles sensor, provides values proportional to the thickness of the banknote passing through the machine. Figure 15 shows the measurements obtained in the normal case versus one of the failure cases analyzed, presenting clear differences that would allow such operation to be identified as erroneous.

### 3.2.2. Layer 2: Board Data

The amount of data provided by the machine is very high: for each banknote (a banknote takes 610 ms to pass through the machine), 33,000 bytes would be received from the engines board, 9150 from the banknotes board and 7320 from the energy management board. Therefore, the need to reduce this volume through filtering and extraction of indicators becomes evident.

In order to perform an initial filtering to reduce the number of data to be processed, we decided to use only the data relating to the passage of a banknote through the machine. In this way, all data taken between banknotes are discarded. In addition, since the measurements of some sensors are only of interest while the banknote passes through them, it is necessary to generate a specific window for each of them. For this purpose, position sensors are used, which allow us to know the position of the banknote in the machine and to select the data only for those moments. Through this filtering alone, we reduce the volume to 8808 bytes per banknote from the engines board, 640 from the banknotes board and 394 from the energy management board, a reduction of roughly an order of magnitude, as the arithmetic sketch below illustrates.
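The following snippet reproduces this arithmetic. It is illustrative only: the byte counts and the 610 ms per-banknote period come from the text above, and the board labels are descriptive, not real identifiers.

```python
# Per-banknote byte counts from the text, before and after windowing.
PERIOD_S = 0.610  # time for one banknote to pass through the machine
RAW = {"engines": 33_000, "banknotes": 9_150, "energy management": 7_320}
FILTERED = {"engines": 8_808, "banknotes": 640, "energy management": 394}

for board, raw in RAW.items():
    filt = FILTERED[board]
    print(f"{board}: {raw / PERIOD_S:,.0f} B/s raw -> "
          f"{filt / PERIOD_S:,.0f} B/s filtered "
          f"(x{raw / filt:.1f} reduction)")
```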

When proposing the indicators to be extracted, a layered data analysis was chosen, as shown in Figure 16. The first layer is the sensorization layer, whose output is the raw data. After the filtering process that follows the sensorization layer, the next layer is that of the indicators per banknote, in which various indicators corresponding to each note are obtained. Finally, the last layer is that of the indicators per bundle, which aggregates the indicators per banknote into groups of a given size.

**Figure 16.** Diagram of the designed monitoring infrastructure.

Observing the output data rates of the last of the layers, it can be seen that a reduction of three orders of magnitude in the bytes per second obtained from each board has been achieved. With this extraction of indicators, by comparing the values of the training phase with the values obtained in subsequent measurements, it will be possible to detect deviations that will allow the identification of possible failures.

The indicators per banknote to be used are, in general, the means, medians, maximums, minimums, standard deviations, skewness and kurtosis of the different measurements available in the frames, adding the effective (RMS) value in the case of currents.
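A minimal sketch of this per-banknote extraction follows; the paper lists these statistics but not its implementation, so the function and key names (matching the AVG, Med, MAX, MIN, DES, SK, KUR nomenclature used later) are illustrative.

```python
import numpy as np
from scipy import stats

def banknote_indicators(samples: np.ndarray, is_current: bool = False) -> dict:
    """Seven statistical indicators per banknote; RMS added for currents."""
    ind = {
        "AVG": np.mean(samples),
        "Med": np.median(samples),
        "MAX": np.max(samples),
        "MIN": np.min(samples),
        "DES": np.std(samples),          # standard deviation
        "SK": stats.skew(samples),       # asymmetry of the distribution
        "KUR": stats.kurtosis(samples),  # peakedness of the distribution
    }
    if is_current:
        ind["RMS"] = np.sqrt(np.mean(samples ** 2))  # effective value
    return ind
```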

Regarding the FFTs with a Hanning window (Figure 6) obtained for the vibration data, a more complex analysis is conducted. To obtain the indicators extracted in the layer of indicators per banknote, the areas under the curve of different parts of the FFT are computed. In order to discover the most interesting parts, we begin by identifying the existing peaks. This identification consists of two phases: a first one in which the base noise is eliminated, leaving a flatter FFT in which the peaks stand out more; and a second one in which the peaks higher than half the maximum value are marked. Having identified the most interesting parts as those that concentrate the majority of the peaks, the areas under these parts of the FFTs are used as the indicators of the vibration data (the limits of the areas are defined based on observation). In a preliminary analysis of the FFTs, it was seen that in most of them there are two areas of interest in which most of the peaks are concentrated (see Figure 17). Therefore, it was decided to work with these two areas for subsequent analysis.
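A sketch of this two-phase procedure is given below, under stated assumptions: the median-filter baseline removal and the integration of the de-noised spectrum are illustrative choices, since the paper only states that base noise is removed and that peaks above half the maximum are marked; `bands` holds the observation-defined bin-index limits of the two areas of interest.

```python
import numpy as np
from scipy.signal import find_peaks, medfilt

def vibration_indicators(signal, bands):
    """Area-under-curve indicators from a Hanning-windowed FFT."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    # Phase 1: remove the base noise, leaving a flatter FFT.
    flat = np.clip(spectrum - medfilt(spectrum, kernel_size=51), 0.0, None)
    # Phase 2: mark the peaks higher than half the maximum value.
    peaks, _ = find_peaks(flat, height=0.5 * flat.max())
    # Indicators: areas under the parts of the FFT where peaks concentrate.
    areas = [flat[lo:hi].sum() for lo, hi in bands]
    return areas, peaks
```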

**Figure 17.** Graphs of the different phases of the identification process, from the original FFT, to the FFT without the base noise, with the detected peaks and the colored areas of interest.

Once the indicators per banknote have been extracted, they are passed to the layer of the indicators per bundle. Since the objective is data reduction for a fault detection application, an instantaneous view of the machine operation is not sought; a broader view that captures variations over a larger temporal window is more interesting. Therefore, the banknote-level indicators are integrated in groups of the same number of banknotes, from which indicators are extracted per bundle. The indicators extracted from those of the previous layer are the same seven statistical values as above: the means, medians, maximums, minimums, standard deviations, skewness and kurtosis.
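A minimal aggregation sketch, assuming the per-banknote indicators sit in a DataFrame with one row per banknote (indexed in order of passage) and one column per indicator; the bundle size is illustrative, as the paper only says groups of a fixed number are used.

```python
import pandas as pd

BUNDLE_SIZE = 100  # illustrative; the paper fixes some group size

def bundle_indicators(per_note: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-banknote indicators into per-bundle indicators."""
    groups = per_note.groupby(per_note.index // BUNDLE_SIZE)
    # The same seven statistics, now computed over each bundle of banknotes.
    return groups.agg(["mean", "median", "max", "min", "std", "skew",
                       lambda s: s.kurt()])
```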

### 3.2.3. Layer 3: Machine Data

Finally, after obtaining the indicators from the bundle layer, the indicators reach the machine level. In this layer, conclusions are drawn from the indicators extracted in the previous processes. For this purpose, this analysis seeks to identify the most relevant indicators for each failure case, as well as the type of variation to be expected based on their probability distributions. Next, we comment on the results obtained by comparing the distributions of the indicators of each failure case with those of the normal case.

The objective is to indicate whether the indicators of the failure case have higher or lower values than those of the normal case, as well as the degree of discordance between the distributions of these indicators. The indicators analyzed will be the per-bundle means of the per-banknote indicators. If no specific indicator is mentioned (AVG, Med, MAX, MIN, DES, SK or KUR), the mean values are assumed to be the ones referred to.

To assess the direction of the variation, the median of the distributions is used. To assess the degree of discordance, the Kullback–Leibler divergence is used. This is a unitless measure that compares the probability densities of two distributions: it yields values close to zero for two similar distributions and grows as the difference between them increases. It is not symmetric, so two calculations are made, considering first the normal case and then the failure one, (*P*‖*Q*), and then vice versa, (*Q*‖*P*); the larger of the two values is the one considered. The choice of the Kullback–Leibler divergence as a measure of comparison of the data distributions obtained in each failure case is based on the fact that the final model for failure classification will be implemented by neural networks trained with the cross-entropy cost function, which is directly related to this divergence measure. Thus, the nomenclature used is the one shown in Table 3 (the limits used are based on experimental observation) and the results obtained can be seen in Table 4.
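For reference, the divergence of *P* from *Q* is $D_{KL}(P\|Q) = \sum_i P(i)\log\frac{P(i)}{Q(i)}$. A sketch of the symmetric comparison used here follows; the binning and smoothing choices (`n_bins`, `eps`) are assumptions, since KL is undefined on empty bins and the paper does not detail how the densities were estimated.

```python
import numpy as np
from scipy.stats import entropy

def kl_discordance(normal: np.ndarray, failure: np.ndarray,
                   n_bins: int = 50, eps: float = 1e-9) -> float:
    """Max of the two Kullback-Leibler divergences between two samples."""
    lo = min(normal.min(), failure.min())
    hi = max(normal.max(), failure.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(normal, bins=bins)
    q, _ = np.histogram(failure, bins=bins)
    p = p + eps  # smooth so that no bin has zero probability
    q = q + eps
    p, q = p / p.sum(), q / q.sum()
    # entropy(p, q) computes sum(p * log(p / q)), i.e., D_KL(P || Q).
    return max(entropy(p, q), entropy(q, p))
```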

**Table 3.** Nomenclature used to classify the differences between distributions.


The data in the summary table should be taken with caution, since there are distributions that, although they do not present divergences greater than the minimum threshold, do show variations with respect to the normal case.


**Table 4.** Summary of the information of the indicators of interest in each case.

Next, the indicators of interest for the three failure cases in group five, associated with defects in the doubles sensor (case 5), will be shown. This failure case has been chosen because it presents deviations in a great variety of indicators.

In the data of the **transport motor current** (Figure 18), it can be seen how, in all failure cases, the values obtained are reduced, the most notable being that of the first failure. For the following cases, Table 5 shows how the medians of the distributions decrease and the divergences increase as the eccentricity increases.

**Figure 18.** Comparison of the probability distributions obtained for the mean values of the transport motor current in the case of faults associated with the double sensors. Indicator: mean.

**Table 5.** Kullback–Leibler divergence values associated with the mean values of the transport motor current for the identification of the faults associated with the double sensors.


The data from the **doubles sensor 1** (Figure 19) show deviations in all three cases analyzed, something that might be expected, as the defects are introduced in the sensor itself. Regarding the average values, the one that suffers the greatest divergence is the first failure, that of the chopped roller, while those associated with eccentricities show smaller variations that grow linearly with increasing eccentricity. Analyzing the shape statistics, it can be seen that the first failure shows standard deviations and coefficients of skewness and kurtosis quite similar to those of the normal case. However, the failures associated with eccentricities show much larger differences, most notably the standard deviation in the case of greater eccentricity, which is more than five times higher than that of the normal case. All this is supported by the divergence values obtained in Table 6.

**Figure 19.** Comparison of the probability distributions obtained for the mean values of the doubles sensor 1 measurements in the case of failures associated with the doubles sensors. Indicators: (**top left**) mean, (**top right**) standard deviation, (**bottom left**) skewness and (**bottom right**) kurtosis.

**Table 6.** Kullback–Leibler divergence values associated with the means, standard deviations, skewness and kurtosis of the doubles sensor 1 measurements for the identification of the failures associated with the doubles sensors.


Regarding the **doubles sensor 2** (Figure 20), the distributions of the mean values are very similar to those of the previous sensor. However, in the shape statistics, there are differences with respect to the previous sensor in the cases of failures associated with eccentricities. The standard deviations are no longer so far apart, although they still show considerable divergences (Table 7). The skewness values no longer lie so far from the normal ones, and only in the case of higher eccentricity is a significant divergence observed.

**Figure 20.** Comparison of the probability distributions obtained for the mean values of the doubles sensor 2 measurements in the case of failures associated with the doubles sensors. Indicators: (**top left**) mean, (**top right**) standard deviation, (**bottom**) skewness.

**Table 7.** Kullback–Leibler divergence values associated with the means, standard deviations and skewness of the doubles sensor 2 measurements for the identification of the failures associated with the doubles sensors.


Analyzing the values obtained for the *Vint* voltage (Figure 21), it can be seen that the second fault shows hardly any variation with respect to the normal case, while the other two show values higher than usual. These effects are reflected in the divergences in Table 8, with the highest value obtained in the fault with the highest eccentricity.

**Figure 21.** Comparison of the probability distributions obtained for the mean values of the voltage Vint in the case of failures associated with the double sensors. Indicator: mean.

**Table 8.** Kullback–Leibler divergence values associated with the mean values of the *Vint* voltage for the identification of the failures associated with the double sensors.


Observing the results of this first analysis (Table 4), it is clear that some measurements, such as the feed motor currents (I\_feed), some infrared pass-times or the intervals between encoder pulses, do not provide much information. In the case of the feed currents, this may be because none of the altered elements affected them directly, so there are hardly any changes from one case to another. As for the T\_IR31 and T\_IR33 times, they are associated with short distances compared to the T\_IR11 time, and with sections far from the middle zone where the defects are located, so they are not easily altered. Finally, the intervals between encoder pulses present a multimodal distribution that makes the extraction of information more difficult.

Since these differences can be detected with the naked eye on a measure related to the cross-entropy cost function used to train the network, it is expected that the neural model in charge of processing these indicators will also be able to identify the failures. To this end, a feedforward multilayer neural network with one hidden layer was explored and, after testing several architectures, a 39:128:14 architecture was chosen. The 39 input neurons correspond to the most relevant indicators, while the 14 output neurons are associated with the 13 failure cases analyzed and the case of normal operation.

Due to the large variety of ranges present in these input variables, a normalization is performed to equalize the contributions of each variable to the multilayer perceptron. This normalization is performed before the data enter the network. Between the two processing layers that compose the network, a batch normalization layer has been added with the same objective.

The processing layers that compose the model are two dense layers of 128 and 14 neurons, respectively. The first layer uses the ReLU function as the activation function. In the second one, there is one neuron for each class and the chosen activation function is softmax. Thus, the output of each neuron lies between 0 and 1 and all outputs sum to 1 (generating a probability distribution), with the output with the maximum value taken as the predicted failure/normal case.

The supervision labels follow one-hot encoding (13 labels set to zero and the corresponding class set to one) and the training was performed with the Adam optimization algorithm [28]. Figure 22 shows a summary of the implemented MLP architecture. Cross-entropy was used as the cost function, and training ran for 40 epochs.


**Figure 22.** Summary of the implemented MLP.
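A hedged Keras sketch of this architecture is given below. The layer order (dense 128 with ReLU, batch normalization between the two processing layers, dense 14 with softmax), the Adam optimizer, the cross-entropy loss and the 40 epochs follow the text; everything else is an assumption, not the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(39,)),             # 39 indicators, normalized
                                             # before entering the network
    layers.Dense(128, activation="relu"),    # hidden processing layer
    layers.BatchNormalization(),             # equalizes contributions between
                                             # the two processing layers
    layers.Dense(14, activation="softmax"),  # 13 failure cases + normal case
])
model.compile(optimizer="adam",                 # Adam optimization algorithm
              loss="categorical_crossentropy",  # cross-entropy cost function
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=40)        # one-hot labels, 40 epochs
```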

A cross-validation methodology was implemented to train several networks that together form the full model. To reduce the possibility of the cross-validation data sets becoming unbalanced due to sparse data, a low k of 5 folds was used. In addition, to maintain data representativeness, these subsets were randomly generated using a stratified split, ensuring that each test and training subset has a label distribution statistically equivalent to that of the whole dataset. In this way, five MLPs are generated, with the final model prediction being the mean or majority vote of the five independent results.
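A sketch of this stratified 5-fold scheme follows. Here `X` (the indicator matrix) and `y` (integer class labels) are assumed inputs, and `build_model()` is a hypothetical stand-in for the Keras constructor sketched above.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = []
for train_idx, _test_idx in skf.split(X, y):
    model = build_model()  # hypothetical: returns the MLP sketched above
    model.fit(X[train_idx], tf.keras.utils.to_categorical(y[train_idx], 14),
              epochs=40, verbose=0)
    models.append(model)

def vote(x):
    """Final prediction: majority vote of the five independent networks."""
    preds = np.array([m.predict(x).argmax(axis=1) for m in models])
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
```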

At the end of the training, five MLPs were obtained with accuracies around 90%. However, as the classification result is the vote of the five networks, the reliability of the final model is even higher. The probability of three or more networks being wrong, producing a bad prediction, would be close to 1%, which considerably increases the confidence of the classification. Figure 23 shows the confusion matrices of two of the five networks, which confirm the good performance of the methodology when applied to a specific case of fault detection on complex machinery.
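For illustration, assuming each network errs independently with probability 0.1, the figure of roughly 1% follows from the binomial distribution:

$$
P(\text{majority wrong}) = \sum_{k=3}^{5} \binom{5}{k}(0.1)^{k}(0.9)^{5-k}
= 0.0081 + 0.00045 + 0.00001 \approx 0.0086 \approx 1\%.
$$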

Despite the good results obtained, it must be taken into account that this model was built with data that were not isolated for validation. This was the case because, as a large amount of data was not available, we worked with the available data, extracted manually by the manufacturers themselves. That is why the neural model, although it shows good results and could be further developed at some point, must currently be taken with caution. However, the fact that a manual rule-based analysis of the results shows perceptible differences proves that the effort put into preprocessing the data to reduce the number of bytes to be sent has been successful. In this way, the objective of our proposal is reached: being able to identify the failures with a smaller amount of information, and being able to implement this methodology in complex machines with limited capacities.

**Figure 23.** Confusion matrices of two neural networks designed to recognize the 14 failure cases.
