*2.3. Deep Learning Architectures Tested*

The efficiency of different DNN architectures was evaluated by dividing the Z\_all dataset into training (70%), validation (15%), and testing (15%) subsets. These subsets were balanced, i.e., each contained an equal number of sequences from each zone. All models were trained for up to 100 epochs with the categorical cross-entropy loss function and the Adam optimizer [36]. Once trained, each model's performance was measured on the testing dataset in terms of precision, recall, and F1 score.
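The balanced split can be sketched as follows; the sequence length, number of channels, and the per-zone count of 567 (reported in Section 3) are used purely for illustration:

```python
# Sketch of a balanced 70/15/15 split; array shapes are assumptions.
import numpy as np

def balanced_split(X, y, train=0.70, val=0.15, seed=0):
    """Split so each zone contributes the same proportion to each subset."""
    rng = np.random.default_rng(seed)
    idx_tr, idx_va, idx_te = [], [], []
    for zone in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == zone))
        n_tr = int(train * len(idx))
        n_va = int(val * len(idx))
        idx_tr.extend(idx[:n_tr])
        idx_va.extend(idx[n_tr:n_tr + n_va])
        idx_te.extend(idx[n_tr + n_va:])
    return ((X[idx_tr], y[idx_tr]),
            (X[idx_va], y[idx_va]),
            (X[idx_te], y[idx_te]))

# 567 labelled sequences per zone; (time steps, channels) are illustrative
X = np.zeros((5 * 567, 100, 1))
y = np.repeat(np.arange(5), 567)
(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = balanced_split(X, y)
```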

The DNN models evaluated were a CNN, an RNN, a bidirectional RNN, and a CNN combined with an RNN. For the RNN, the GRU was used, since it has recently been shown [34] that the GRU slightly outperforms the vanilla LSTM on most tasks and is faster to train. The models studied were the following:
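The gating mechanism that makes the GRU well suited to sequence data can be sketched in a few lines; the weight shapes, initialization, and the toy voltage values below are assumptions for illustration only, not the trained models:

```python
# Minimal GRU cell in NumPy (Cho et al. formulation) -- a sketch of the
# gating mechanism only; all shapes and values are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    z = sigmoid(W["z"] @ x + U["z"] @ h + b["z"])        # update gate
    r = sigmoid(W["r"] @ x + U["r"] @ h + b["r"])        # reset gate
    h_tilde = np.tanh(W["h"] @ x + U["h"] @ (r * h) + b["h"])
    return (1.0 - z) * h + z * h_tilde                   # gated state update

rng = np.random.default_rng(0)
n_in, n_hid = 1, 8                                       # one voltage channel
W = {k: rng.normal(0, 0.1, (n_hid, n_in)) for k in "zrh"}
U = {k: rng.normal(0, 0.1, (n_hid, n_hid)) for k in "zrh"}
b = {k: np.zeros(n_hid) for k in "zrh"}

h = np.zeros(n_hid)
for v in [0.2, 0.5, -0.1]:                               # a toy voltage sequence
    h = gru_step(np.array([v]), h, W, U, b)
```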


#### **3. Results and Discussion**

The results in Table 3 clearly show that the architecture combining a convolutional layer and a GRU network outperformed the other models on all the metrics analyzed. Therefore, this model was used with the other datasets (Z\_135 and Z\_15) to analyze its performance and the complexity of each dataset. Given the reduced dataset for classifying between Zones 1 and 5, and since selecting the best model was not among the aims of the current study, that dataset was divided into training (70%) and testing (30%) only.


**Table 3.** Model results for Z\_all dataset.

The results achieved with fewer zones (Table 4) were much higher than those obtained with all the zones: the F1 score for the Z\_135 dataset was 0.9169 and for Z\_15 it was 1. These results are outstanding, as they highlight the ability of the CGRU (Convolutional Gated Recurrent Unit) network to classify voltage sequences with high accuracy. The models with GRU units clearly outperformed those based only on CNNs, which appears logical, since current research indicates that GRU units handle sequences accurately. In fact, a CNN without any gating unit cannot satisfactorily classify WEDM spark sequences, yielding an F1 score below 60%. Therefore, for classifying WEDM spark sequences, DNNs with GRU units are highly recommended.

**Table 4.** CGRU model results for Z\_135 and Z\_15 datasets.


Focusing on the models with GRU units, it is interesting to see that, in terms of precision, the bidirectional GRU achieved almost the same result as the GRU model but outperformed it in terms of recall and F1 score. This is interesting because the BiGRU model has fewer GRU units in its input layer (10 in the BiGRU versus 50 in the GRU). However, the results were not sufficiently clear to conclude that a BiGRU model would outperform a GRU model in all sequence-classification cases.
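A back-of-the-envelope parameter count illustrates the size difference between the two recurrent input layers; one input channel is assumed, and biases are counted per the classic GRU formulation, so exact framework totals may differ slightly:

```python
# Approximate trainable-parameter count for the two recurrent layers
# discussed above (1 input channel assumed).
def gru_params(n_in, n_hid):
    # three gate/candidate blocks, each with input, recurrent and bias weights
    return 3 * (n_hid * n_in + n_hid * n_hid + n_hid)

gru_50   = gru_params(1, 50)          # plain GRU layer, 50 units
bigru_10 = 2 * gru_params(1, 10)      # bidirectional: two 10-unit GRUs
```

Despite having roughly an order of magnitude fewer recurrent parameters, the BiGRU matched or exceeded the larger GRU layer on recall and F1 score.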

Similarly, the results in Table 4 suggest that adding a convolutional layer at the input of a GRU model helped to classify WEDM spark sequences: the first convolutional layer extracted features from the spark sequences, and the GRU units then modeled the new sequences generated by the convolutional layer. The results show that a CGRU model classifies WEDM sequences accurately, with high precision (0.7260). Moreover, Table 4 shows that this model classified almost perfectly when dealing with the less complicated datasets; indeed, it achieved 100% precision when classifying sequences from Zones 1 and 5.

From the process point of view, these results can be analyzed as follows. As the wire approached the thickness variation point, the behavior of the signal changed, as can be observed in the confusion matrix of Figure 4. As explained in Section 2.2, Zone 1 (Z1) described a stable process, while Zone 5 (Z5) was the zone closest to the point at which the thickness change occurred.

Figure 4 displays the confusion matrix for the five zones (Z\_all dataset). It can be observed that there was no interference between Zones 1 and 2 on the one hand and Zones 4 and 5 on the other, in either direction, nor between Zones 3 and 5. From this result it can be stated that, when the cutting process became unstable, the neural network always successfully detected it in advance. In the experiment carried out, since the average feed for a part thickness of 100 mm was 1.4 mm/min, there were at least 1.4 min in which to act before the thickness variation occurred. Clearly, this time decreases for smaller part thicknesses because of the higher feed, but in any case it would be tens of seconds (about 30 s for a part thickness of 50 mm), which is more than enough time to take corrective action. Moreover, the misclassification between consecutive zones throughout the degradation process was below 3%, except between Zones 1 and 2 (in any case, below 10%), which marked the start of the cutting degradation process.

**Figure 4.** Confusion matrix for Z\_all datasets with the CGRU model.
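The lead time quoted above follows directly from the remaining distance and the feed rate; the 2 mm zone width used below is an assumption consistent with this section's discussion:

```python
# Lead time before the thickness variation, from distance and feed rate.
def lead_time_min(distance_mm, feed_mm_per_min):
    return distance_mm / feed_mm_per_min

# 100 mm part: feed 1.4 mm/min over an assumed 2 mm zone width
t_100 = lead_time_min(2.0, 1.4)   # roughly 1.4 min of warning
```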

Figure 5 shows the confusion matrix when only Zones 1, 3, and 5 were used, that is, a stable cutting zone, an intermediate degraded zone, and the zone in which the thickness variation occurred. Again, the first and most obvious observation is that there was no confusion between Zones 1 and 5, meaning that a stable cut was clearly distinguished from the region nearest the thickness variation point. Confusion between Zones 1 and 3 occurred in fewer than 4% of cases, although a degraded cut was more likely to be mistaken for a stable cut than vice versa. Finally, it is worth noting that in 1% of cases Zone 5 was classified as Zone 3. In conclusion, even if in 4% of cases the thickness variation could not be predicted 4 mm in advance, it could always be predicted 2 mm before it occurred.

**Figure 5.** Confusion Matrix for Z\_135 dataset with CGRU model.
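The percentages quoted above are read off the confusion matrix by normalizing each row (true zone) to 100%; the matrix below is a toy example, not the paper's actual counts:

```python
# Row-normalizing a confusion matrix into per-zone percentages.
import numpy as np

def row_percent(cm):
    """Each row (true class) is scaled so it sums to 100%."""
    cm = np.asarray(cm, dtype=float)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

# toy 3-zone matrix (rows: true Z1, Z3, Z5; columns: predicted)
cm = [[82, 3, 0],
      [2, 83, 0],
      [0, 1, 84]]
pct = row_percent(cm)   # pct[i, j] = % of true zone i predicted as zone j
```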

To conclude, even though the zones were chosen arbitrarily, a clear variation in the behavioral pattern between a stable cut and a degraded cut was shown in both cases (Z\_all and Z\_135 datasets). Thus, the DNN enabled rapid action to be taken during the WEDM process.

Finally, Goodfellow et al. [10] stated that, in supervised deep learning, more than 5000 labelled examples per category are needed to achieve acceptable results, and that with at least 10 million labelled examples it is possible to match or exceed human performance. In this study, however, only 567 labelled examples per category were available, considerably fewer than the recommended 5000. Although from a process point of view the results are excellent, there is still much room for improvement, because with more labelled examples, deeper ANNs could be used to reach higher precision. However, as stated in the introduction, the difficulty of collecting large amounts of data in machine-tool workshops makes DNNs infeasible in many cases. Therefore, if improvements in deep learning are to be exploited to their full potential for pattern recognition, the companies and researchers involved in manufacturing and data collection must ensure that new data-collection strategies are put in place.

#### **4. Conclusions**

The aim of this study was to evaluate the possibility of predicting an unexpected event during an industrial WEDM machining process by using DNNs to recognize hidden patterns in raw process voltage signals. A comparison of precision, recall, and F1 score for different DNN models and datasets was provided. The results clearly showed that a model with a first convolutional layer followed by two GRU layers outperformed the other models. Moreover, this model achieved outstanding performance on the other datasets, with a precision of around 100%. From the process point of view, the confusion matrices showed that the thickness variation could be predicted at least 2 mm in advance, which allows sufficient time to act on the machining parameters. New possibilities for applying DNNs in the field of advanced manufacturing and high-performance machine tools must be examined in the future. In particular, given the difficulty of collecting large quantities of labelled examples from machining processes, new strategies will need to be developed to resolve this problem. When large amounts of labelled data become available, the possibility of extensively applying DNNs in manufacturing will become a reality.

**Author Contributions:** Conceptualization, J.A.S. and A.A.; Methodology, A.C.; Software, A.A.; Validation, S.P. and J.W.; Formal Analysis, J.A.S. and A.A.; Investigation, A.C.; Resources, J.A.S.; Data Curation, A.C. and S.P.; Writing-Original Draft Preparation, A.A.; Writing-Review & Editing, J.A.S.; Visualization, A.A.; Supervision, J.A.S.; Project Administration, J.A.S.; Funding Acquisition, J.A.S.

**Funding:** The authors gratefully acknowledge the funding support received from the Spanish Ministry of Economy and Competitiveness and the FEDER operation program for funding the project "Scientific models and machine-tool advanced sensing techniques for efficient machining of precision components of Low Pressure Turbines" (DPI2017-82239-P) and UPV/EHU (UFI 11/29). The authors would also like to thank Euskampus and ONA-EDM for their support in this project.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

