#### **1. Introduction**

Machine tools, and manufacturing industries in general, have traditionally relied on empirical approaches to process optimization. Because each operation involves a large number of phenomena and variables, theoretical models are difficult to apply in practice. Although such models are very useful for understanding the underlying physical phenomena, they usually have important limitations in industrial practice.

This is particularly evident in the manufacture of components for high-added-value sectors such as aircraft manufacturing. The aerospace industry has grown exponentially in recent years, and the total number of aircraft worldwide is expected to double by 2032 [1]. This trend has led manufacturing companies to invest heavily in adapting their products to the increasingly tight tolerance and accuracy requirements demanded by this sector. Unconventional machining processes have gained acceptance, and among them, much attention has been directed towards wire electrical discharge machining (WEDM). This technology can process difficult-to-cut, extremely hard materials with very tight tolerances and an excellent surface finish [2,3]. Nonetheless, because of the limited accuracy of the theoretical models mentioned above, process optimization still requires trial-and-error approaches [4]. In this context, artificial intelligence (AI), and more specifically deep learning (DL) techniques, appear to be a promising approach, provided that massive amounts of data can be collected from the process.

Deep learning using deep neural networks (DNNs) has achieved impressive state-of-the-art results in very difficult learning tasks such as image recognition [5], handwriting recognition [6], natural language processing [7], image description [8], and mitosis detection [9]. In contrast to shallow neural networks (SNNs), a DNN uses a series of hidden layers to extract abstract features from sequences or images [10]. As a result, a very large number of new applications, including automatic translation and voice recognition, are being developed in the field of information and communication technologies (ICTs), increasing interest from both academia and industry. The main network architectures for deep learning are briefly summarized in the following paragraphs.

For processing sequential data, recurrent neural networks (RNNs) are a common approach in many fields [10]. The earliest attempts to train RNNs were made by Rumelhart et al. using back-propagation through time [11]. Later, Elman introduced the Elman network, in which the output of the hidden layer is fed back to the input of that same layer [12]. However, these training methods and architectures do not deal properly with long-term time dependencies because of vanishing and exploding gradients [13]. To solve the vanishing gradient problem, Hochreiter and Schmidhuber introduced long short-term memory networks (LSTMs) in 1997 [14]. Unlike a classic RNN, an LSTM uses gates to decide whether or not to keep its existing memory [15]. An LSTM unit can therefore retain an important feature over a long distance and thus deal with long-term time dependencies. Although several variations of the LSTM architecture have been proposed, probably the most commonly used is the gated recurrent unit (GRU), which replaces the input, forget, and output gates with an update gate and a reset gate, reducing the number of gates from three to two [16].
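To make the two-gate structure concrete, the following NumPy sketch implements one GRU time step in the formulation of Cho et al., where the reset gate is applied to the previous state before the recurrent matrix multiplication. It is illustrative only: the parameter layout, the helper names (`gru_step`, `gru_forward`, `init_params`), and the random initialization are assumptions, not the networks trained in this work. Libraries differ slightly on where the reset gate is applied and on which state the update gate weights.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)               # update gate: how much old state to keep
    r = sigmoid(Wr @ x + Ur @ h + br)               # reset gate: how much old state feeds the candidate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)   # candidate state
    return z * h + (1.0 - z) * h_tilde              # blend of old and candidate state

def gru_forward(xs, params):
    """Run the cell over a sequence xs of shape (T, input_size); return the last state."""
    h = np.zeros(params[1].shape[0])                # hidden size taken from Uz
    for x in xs:
        h = gru_step(x, h, params)
    return h

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical small random initialization: (W, U, b) for z, r and h_tilde."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(3):
        params += [rng.normal(0.0, 0.1, (hidden_size, input_size)),
                   rng.normal(0.0, 0.1, (hidden_size, hidden_size)),
                   np.zeros(hidden_size)]
    return params

# Usage: summarize a single-channel sequence of 50 samples into an 8-dimensional state.
params = init_params(input_size=1, hidden_size=8)
xs = np.random.default_rng(1).normal(size=(50, 1))
h_last = gru_forward(xs, params)
```

Because the new state is a convex combination of the previous state and a `tanh`-bounded candidate, every component of `h_last` stays strictly inside (−1, 1).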

Convolutional neural networks (CNNs) exhibit outstanding performance in extracting features from sequences of data. CNNs are feed-forward neural networks that combine three ideas: local receptive fields, subsampling, and shared weights [17]. Local receptive fields and subsampling were already present in the neocognitron model proposed by Fukushima [18]. Thanks to local receptive fields, neurons in a CNN can extract features from images (2D structures) as well as from sequences or time series (1D structures). A convolutional layer can extract multiple features by using several feature maps, and by combining these features in subsequent layers, CNNs can detect higher-order features. Generally, each convolutional layer is followed by a subsampling layer that reduces the resolution of the feature map, which reduces the sensitivity of the output to distortions [19]. Unlike the neocognitron, a CNN is trained with the back-propagation technique. In addition, weight sharing reduces the number of free parameters, which improves generalization.
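The three ideas can be seen in a minimal NumPy sketch: each kernel slides over a 1D signal (local receptive fields) with the same weights at every position (weight sharing), producing one feature map per kernel, and max-pooling provides the subsampling. Max-pooling is assumed here as one common subsampling choice; the function names and the toy signal are hypothetical, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """'Valid' 1D convolution (computed as cross-correlation, as in most DL libraries).

    x: (length,) signal; kernels: (n_maps, k); bias: (n_maps,).
    Returns (n_maps, length - k + 1): one feature map per kernel.
    """
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # local receptive fields
    return (windows @ kernels.T).T + bias[:, None]            # same weights at every position

def relu(a):
    return np.maximum(a, 0.0)

def max_pool1d(fmaps, size):
    """Non-overlapping max-pooling; trailing samples that do not fill a window are dropped."""
    n_maps, length = fmaps.shape
    trimmed = fmaps[:, : length - length % size]
    return trimmed.reshape(n_maps, -1, size).max(axis=2)

# Two feature maps on a toy step signal: a rising-edge detector and a smoother.
kernels = np.array([[-1.0, 1.0], [0.5, 0.5]])
bias = np.zeros(2)
signal = np.array([0.0, 0.0, 60.0, 60.0, 60.0, 0.0, 0.0])
fmaps = relu(conv1d(signal, kernels, bias))   # shape (2, 6)
pooled = max_pool1d(fmaps, 2)                 # [[60, 0, 0], [30, 60, 30]]
# Free parameters: n_maps * k + n_maps = 6, independent of the signal length.
n_params = kernels.size + bias.size
```

A fully connected layer producing the same 2 × 6 outputs from 7 inputs would need 2 × 6 × 7 weights plus 12 biases; weight sharing is what keeps the count at 6 here.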

Outside the scope of ICTs, DNNs have traditionally been used for fault diagnosis in various industrial sectors. For instance, Yin et al. [20] presented a novel method for fault diagnosis in high-speed railways, where diagnosis currently relies on manual operation. In particular, the authors proposed an automated diagnosis network to detect failures in vehicle on-board equipment. Their results show that a deep belief network outperforms the other trained networks and improves the accuracy of fault diagnosis by up to 95%. A further illustration can be found in a study of different techniques for improving fault diagnosis in rolling bearings [21]. Following a thorough study of different AI techniques, the authors concluded that rule-based methods could become an extremely versatile tool for fault diagnosis in rotating machinery.

Efficient training of deep neural networks is only possible if massive amounts of labeled data are available for back-propagation training algorithms, which is not always the case in manufacturing environments. Nevertheless, some interesting approaches can be found in the scientific literature. Most published research is devoted to optimizing process parameters in advanced machining techniques (such as laser cutting, electro-chemical machining, ion beam micro-milling, and grinding) using SNNs [22–25]. Only a very limited number of studies have examined the use of DL in machining. In a very interesting recent study, Wang developed a DL-based approach to material removal rate prediction in polishing technologies [26]. Gunter has also developed a pattern recognition, identification, and process control system to achieve an intelligent laser-welding machine [27].

To the authors' knowledge, none of the published research on the WEDM process has addressed process modeling using deep learning techniques. Selecting optimum process parameters has been a classic application of SNNs [28], which have also been used to extract information about the degradation of cutting conditions during WEDM [29]. In a more recent study, Conde et al. [30] proposed using a variant of SNNs to predict the accuracy of components machined by WEDM. By combining the network predictions with the simulated annealing optimization technique, wire paths of variable radii can be designed to minimize radial deviations due to wire deformation. The results revealed that the average deviation between network predictions and actual components is below 6 μm, which falls within the current limits of process accuracy. Caggiano et al. [31,32] studied the search for and recognition of behavioral patterns in the voltage and current signals of the WEDM process, and presented an SNN that effectively correlates these signals with the defects and marks generated on the machined component during the process. In all cases, the success rate of the network exceeded 81%.

At this point, it must be highlighted that extremely large amounts of data can be collected during the WEDM process using high-frequency voltage and current probes. Data from every single discharge can be recorded, and patterns containing useful information about the actual machining process can be examined regardless of the process conditions. Given the efficiency shown by DNNs in pattern recognition tasks, this work presents an original contribution to the prediction of unexpected events during practical WEDM operations using deep learning techniques. The occurrence of an unexpected event, namely a change in the thickness of the machined part, can be predicted in advance by recognizing hidden patterns in the process signals. Raw data were obtained directly from a machining process carried out under industrial conditions, using voltage and current probes and a high-frequency oscilloscope. Various DNN architectures were studied, and the combination of a convolutional layer with gated recurrent units was found to achieve the best performance. With this approach, thickness variation can be predicted in 97.4% of cases at least 2 mm in advance, which is early enough to act before the WEDM process degrades. New possibilities for applying DNNs in the field of advanced manufacturing and high-performance machine tools should therefore be examined in the future.

#### **2. Materials and Methods**

#### *2.1. Instrumentation and Measured Variables*

The WEDM erosion mechanism is based on the generation of discrete short discharges between two electrically conductive electrodes (wire and workpiece) in a fluid dielectric medium. In a recent paper, Almacinha et al. [33] suggested that, in the sinking electrical discharge machining (EDM) process with hydrocarbon oil as the dielectric, multiple discharges may occur during pulses with long on-times. However, in wire EDM, given the short duration of the pulses (1.2 μs in commercial machines such as the one used in the experiments) and the use of deionized water as the typical dielectric medium, the occurrence of multiple discharges during a single pulse has not yet been proven. Therefore, this work adopted the common hypothesis in WEDM [2] that material is removed from the workpiece by the consecutive occurrence of discharges. Each discharge generated a crater a few micrometers in size on the workpiece, and the combined contribution of millions of these craters removed part material, thus drawing the shape of the part [34].

Figure 1 shows the voltage evolution during a single discharge, as collected in an actual WEDM operation. A discharge can be divided into three phases. Before a discharge occurred (Phase 1, Figure 1), an off-time (zero voltage) was programmed, during which the gap between the electrodes cooled down and the dielectric flow tried to remove the debris resulting from the previous discharge. In Phase 2 (Figure 1), the insulating capacity of the dielectric (deionized water in this case) was locally broken down by applying a voltage between the wire and the workpiece (commonly known as the open-circuit voltage, see Table 1). The open-circuit voltage was applied by the machine generator. Ionization then started (the voltage signal was constant and the current was zero). This period, known as the ionization time, was not controlled by the machine generator but by the local conditions of the dielectric. In other words, ionization time was not a machine parameter: for each discharge, it depended on the specific electrical conductivity conditions of the dielectric. If flushing was effective and the gap was clean, ionization time was long. Conversely, if flushing was difficult and debris was present in the gap, ionization time was short or even zero. Finally, in Phase 3 (Figure 1), once the local electrical conductivity of the dielectric between wire and workpiece was high enough (ionization ended), the voltage dropped and current flowed during the on-time, removing part material through the generated heat. For the experiments carried out, the discharge current during on-time was 5 A (see Table 1). This general pattern repeated throughout the process, although phenomena that are difficult to model, such as the presence of debris between the electrodes, introduced differences between discharges that affected process performance.
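Because ionization time varies from discharge to discharge and reflects gap cleanliness, it is a natural quantity to extract from a recorded voltage trace. The sketch below estimates it under simplifying assumptions: a sample is treated as part of Phase 2 when |V| is at least 80% of the 60 V open-circuit voltage, and the test trace is an idealized, hypothetical signal (square phases, an assumed 20 V discharge voltage in Phase 3) rather than measured data. Taking the absolute value copes with the polarity alternation of commercial machines. The function names and the 0.8 threshold are illustrative choices, not part of the reported method.

```python
import numpy as np

DT = 100e-9  # sample spacing at 10.0 MS/s (100 ns)

def ionization_times(voltage, v_open=60.0, frac=0.8, dt=DT):
    """Duration (s) of each contiguous run of samples near the open-circuit voltage.

    Assumption: |V| >= frac * v_open marks the ionization phase (Phase 2).
    Discharges with zero ionization time produce no run and are not reported.
    """
    mask = np.abs(voltage) >= frac * v_open            # abs() copes with polarity reversal
    padded = np.concatenate(([0], mask.astype(int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)                 # first sample of each run
    ends = np.flatnonzero(edges == -1)                  # one past the last sample of each run
    return (ends - starts) * dt

def synthetic_discharge(off, ion, on, v_open=60.0, v_arc=20.0, polarity=1):
    """Hypothetical idealized discharge (lengths in samples): Phase 1 at 0 V,
    Phase 2 at v_open, Phase 3 at an assumed discharge voltage v_arc."""
    return polarity * np.concatenate([np.zeros(off), np.full(ion, v_open), np.full(on, v_arc)])

# Clean gap (long ionization) followed by a debris-laden gap (short ionization),
# the second at reversed polarity; 12 samples correspond to the 1.2 us on-time.
trace = np.concatenate([synthetic_discharge(30, 20, 12),
                        synthetic_discharge(30, 5, 12, polarity=-1)])
t_ion = ionization_times(trace)   # approximately [2.0e-6, 0.5e-6] s
```

On real signals, a fixed threshold would need tuning against ringing and random voltage peaks; it is used here only to show how per-discharge ionization time can be read off the sampled voltage.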

**Figure 1.** Voltage signal evolution during a single discharge in an industrial wire electrical discharge machining (WEDM) operation.

**Table 1.** Electrical parameters as selected by machine table look-up.


Each discharge contains valuable information about process performance. As shown in the previous section, other authors have recently attempted to correlate process signals with final part quality. However, advanced pattern recognition may be far more efficient at evaluating process performance. The large amount of information that can be collected during the process (with sampling rates as high as 10.0 MS/s, as explained below) makes it possible to train DNNs, which have already proven their excellence in other fields (see Section 1).

Voltage sequences were acquired during WEDM using a Tektronix DPO5034B high-frequency oscilloscope (Tektronix UK Ltd., Berkshire, UK) and a Tektronix TMDP0200 voltage probe (Tektronix UK Ltd., Berkshire, UK) connected to the WEDM machine. Figure 2 shows an example of a voltage signal measurement. The sampling rate was 10.0 MS/s, giving a time resolution of 100 ns, which is required to record any possible event during ionization of the discharge channel. A significant number of consecutive discharges had to be recorded because, as shown in Figure 2, there was a certain degree of variability between them. Therefore, a signal length of 200 μs was chosen. The measuring range of the voltage probe was set to −120 V to +120 V because, although the open-circuit voltage was set to 60 V (see Table 1), some random voltage peaks appeared, and these also had to be recorded. In addition, commercial WEDM machines ensure a zero average voltage to avoid parasitic currents that may affect process performance. This is why the wire polarity alternated, as shown in Figure 2.
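The acquisition settings above fix the size of each record. The following sketch reproduces that arithmetic and adds two sanity checks one might run on a record: the fraction of samples at the ±120 V limits (possible clipping of random peaks) and the mean voltage, which should be close to zero given the polarity alternation. The helper `record_summary` and the square-wave test signal are illustrative assumptions, not part of the reported setup.

```python
import numpy as np

SAMPLE_RATE = 10.0e6      # 10.0 MS/s
RECORD_LENGTH = 200e-6    # 200 us signal length
FULL_SCALE = 120.0        # probe range: -120 V to +120 V

dt = 1.0 / SAMPLE_RATE                              # 100 ns between samples
n_samples = round(RECORD_LENGTH * SAMPLE_RATE)      # 2000 samples per record

def record_summary(voltage, full_scale=FULL_SCALE):
    """Return (mean voltage, fraction of samples at or beyond the probe limits)."""
    voltage = np.asarray(voltage, dtype=float)
    clipped = np.mean(np.abs(voltage) >= full_scale)
    return float(voltage.mean()), float(clipped)

# Hypothetical idealized signal alternating between +60 V and -60 V.
square = np.tile(np.r_[np.full(50, 60.0), np.full(50, -60.0)], n_samples // 100)
mean_v, clipped_frac = record_summary(square)   # (0.0, 0.0)
```

A non-negligible clipped fraction would indicate that the chosen range is too narrow for the voltage peaks, while a mean far from zero would suggest the record does not span enough polarity reversals.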

**Figure 2.** Example of voltage data sequence.

#### *2.2. Experimental Methodology: WEDM Tests*

In order to simulate the conditions of a degraded WEDM operation, our experiments reproduced a typical industrial situation in which process parameters cannot be controlled in advance. During an industrial operation, WEDM process parameters are set by machine-table look-up. These parameters apply to a given combination of factors, including part material, part thickness, and machining time. They are stored in the machine and have been obtained by the machine manufacturer through controlled experiments. The machine user therefore finds in the WEDM machine the optimum combination of parameters for his/her application.

However, operating conditions may vary during WEDM cutting. A typical example is an unexpected variation in part thickness. Parameters can be changed on-line once a thickness change has occurred [35], but anticipating a thickness change before it occurs has not been addressed. In fact, this feature is not available in commercial machines.

Controlled experiments involving a variation of part thickness during a WEDM operation were designed and carried out. The sample material was AISI D2 (ISO 160CrMoV12) tool steel, quenched and tempered to 62 HRC. Stepped sample parts were prepared for the experiments, in which the WEDM cut faced a sudden thickness change from 100 mm to 80 mm. Figure 3 shows a scheme of the process during the cutting of the test part.

**Figure 3.** Scheme of the process with the sample geometry.

The experiments were conducted under industrial conditions using commercial WEDM machinery (ONA AX3 WEDM machine, ONA Electroerosión S.A., Durango, Spain). The wire was an uncoated brass wire (CuZn37) of 0.25 mm diameter, with an ultimate strength of 900 N/mm² and 1% elongation. As explained previously, the WEDM electrical parameters (listed in Table 1) were selected by machine-table look-up and correspond to roughing conditions.

When the wire approached the point of thickness change, the cut began to degrade because dielectric pressure was lost. This resulted in discharge voltage patterns that differed from those occurring when the cut was performed under optimum conditions. To the best of our knowledge, no industrial system can yet detect such a pattern change. To conduct a systematic analysis, voltage sequences were collected at different distances from the point of thickness change. To do so, five zones were defined: 5 mm away from the point of thickness change (Zone 1), 4 mm away (Zone 2), 3 mm away (Zone 3), 2 mm away (Zone 4), and 1 mm away (Zone 5). The closer the wire was to Zone 5 (in other words, to the point of thickness change), the more degraded the cut.

Hence, five zones, each 1 mm long, were established to adequately describe the degradation of the cutting process. Within each zone, the recording length was set to 0.8 mm so that the oscilloscope could be reset during the remaining time. This allowed a total of 567 sequences of 2 ms to be recorded, with a resolution of 100 ns and a sampling rate of 10.0 MS/s. On average, each sequence contained 140 discharges. Furthermore, this process was repeated 16 times to accumulate an appropriate number of tests.
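The zone definition can be written as a simple labeling rule. The sketch below assumes that Zone k spans the 1 mm interval between (6 − k) mm and (5 − k) mm from the thickness change, so that Zone 1 starts 5 mm away and Zone 5 ends at the step, and that recording covers the first 0.8 mm of wire travel in each zone. Both the interval convention and the function names are hypothetical.

```python
import math

ZONE_LENGTH_MM = 1.0
RECORD_LENGTH_MM = 0.8
N_ZONES = 5

def zone_of(distance_mm):
    """Zone label (1..5) for a wire position `distance_mm` before the thickness change.

    Assumption: Zone k covers (5 - k, 6 - k] mm; positions outside (0, 5] return None.
    """
    if not 0.0 < distance_mm <= N_ZONES * ZONE_LENGTH_MM:
        return None
    return N_ZONES + 1 - math.ceil(distance_mm / ZONE_LENGTH_MM)

def is_recorded(distance_mm):
    """True if the position lies in the first 0.8 mm of its zone (wire moving toward the step)."""
    zone = zone_of(distance_mm)
    if zone is None:
        return False
    zone_start = (N_ZONES + 1 - zone) * ZONE_LENGTH_MM   # distance at which the wire enters the zone
    return zone_start - distance_mm < RECORD_LENGTH_MM   # last 0.2 mm left for oscilloscope reset

labels = [zone_of(d) for d in (4.5, 3.5, 2.5, 1.5, 0.5)]   # [1, 2, 3, 4, 5]
```

Under this convention, a position 4.9 mm from the step is recorded as Zone 1, whereas one 4.1 mm away falls in the reset gap of the same zone.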

The collected data were used to generate three datasets. The first, Z\_all, was used to study different DNN architectures for classifying the voltage sequences of all five zones. The second dataset was used to check the performance of the DNN in classifying the sequences of Zones 1, 3, and 5. Finally, the third dataset was used to check the performance of the DNN on the slightly less ambitious task of classifying the sequences of the first and fifth zones. The Z\_all dataset was balanced, so that it contains an equal number of sequences for each class. For the other two datasets, all available data were used to avoid significantly reducing the training set. Nevertheless, the number of samples per class was similar, with little difference between zones, as can be seen in Table 2.
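The three datasets can be derived from one pool of labeled sequences by filtering zones and, for Z\_all, balancing the classes. The sketch below assumes random undersampling to the size of the smallest class as the balancing method (the text does not state how balance was achieved); `make_dataset` and the toy arrays are hypothetical and stand in for the recorded voltage sequences.

```python
import numpy as np

def make_dataset(X, zones, keep, balance=False, seed=0):
    """Select the sequences whose zone label is in `keep`.

    If `balance` is True, each kept zone is randomly undersampled (without
    replacement) to the size of the smallest kept zone.
    """
    X, zones = np.asarray(X), np.asarray(zones)
    mask = np.isin(zones, keep)
    X, zones = X[mask], zones[mask]
    if balance:
        rng = np.random.default_rng(seed)
        n_min = min(int(np.sum(zones == z)) for z in keep)
        idx = np.concatenate([
            rng.choice(np.flatnonzero(zones == z), size=n_min, replace=False)
            for z in keep
        ])
        idx.sort()                      # preserve the original acquisition order
        X, zones = X[idx], zones[idx]
    return X, zones

# Toy pool: 12 "sequences" of 4 samples each, with uneven zone counts.
X = np.arange(48, dtype=float).reshape(12, 4)
zones = np.array([1] * 3 + [2] * 2 + [3] * 2 + [4] * 2 + [5] * 3)
X_all, y_all = make_dataset(X, zones, keep=[1, 2, 3, 4, 5], balance=True)  # Z_all-like: 2 per zone
X_135, y_135 = make_dataset(X, zones, keep=[1, 3, 5])                      # all data: 8 sequences
X_15, y_15 = make_dataset(X, zones, keep=[1, 5])                           # all data: 6 sequences
```

Undersampling discards data, which is one reason to prefer using all available sequences when the class counts are already close, as was done for the two smaller tasks.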


**Table 2.** Dataset used.
