1. Introduction
In the production of electric motors, the automotive industry relies on a new technique known as hairpin technology. Instead of twisted copper coils, single copper pins that are bent like hairpins are used, which give the technology its name. These copper pins are inserted into the sheet metal stacks of a stator and afterward welded together in pairs. As with conventional winding, the result is a coil that generates the magnetic field required by the electric motor. This method replaces complex winding operations and enables a more compact motor design while saving copper material [1,2]. Depending on the motor design, between 160 and 220 pairs of hairpins are welded per stator. If even one weld is defective, the entire component may be rejected. Therefore, continuous quality control is necessary and the weld of every hairpin pair should be monitored [1].
In most cases, a laser is used to weld the pins. The laser welding process enables a very specific and focused energy input, which ensures that the insulation layer is not damaged during the process. In addition, unlike electron beam welding, no vacuum is required, and laser welding is a flexible process that can easily be automated with a short cycle time [1]. The lower-cost laser sources that are scalable in power emit in the infrared wavelength range, which is comparatively difficult to work with on copper. At this wavelength of about 1030 nm or 1070 nm, copper is highly reflective at room temperature, so very little of the incoming laser light is absorbed [1,3]. Just before the melting temperature is reached, the absorption level rises from 5% to 15%, and it reaches almost 100% when the so-called keyhole is formed. Because of these dynamics, the process is prone to defects and spattering [4]. A spatter occurs when the keyhole closes briefly and the vapor pressure causes material to be ejected from the keyhole. If the ejected material gets into the stator, it may cause short circuits or other defects [1]. In addition, less material remains to form the weld, which often leads to a loss of stability. For these reasons, it is extremely important to prevent spatter as much as possible. Various process strategies can improve the welding result on copper; three approaches are briefly outlined below. By moving the laser spot rapidly in a motion superimposed on the forward feed (wobbling), stable weld pool dynamics can be created. This can improve process quality when welding with an infrared laser. Another approach is welding with different power levels in the inner and outer fiber core. The inner fiber core is used to create the desired welding depth with high intensity, while the weld pool is stabilized by an outer fiber core, the fiber ring. In addition, there is the possibility of using the visible wavelength of a green laser, which results in higher absorption of the laser light and thus higher process reliability [5,6,7]. Furthermore, there may also be external causes that lead to spattering, for example contamination, gaps, misalignment, or an oxidized surface.
The correct setting of the laser welding parameters, such as laser power, speed, and focus size, is very important in copper welding. In addition, the process must not drift, or such drift must be detected at an early stage. The presence of spatter on the component can be used as an indicator of an unstable welding process, as its occurrence is closely related to the quality of the weld seam [8,9]. For these reasons, it is essential to monitor the welding process with a particular focus on spattering. This allows conclusions about the quality of individual welds, the occurrence of defects, and the overall quality of the stator. An important requirement is also a fast processing time, which is a prerequisite for a system to be used in large-scale production. The welding of an entire stator takes just over one minute, and quality monitoring should not slow down the process [1,2].
Currently, there are only a few machine learning applications used for quality assessment in laser welding [10]. Some approaches are presented by Mayr et al. [11], including an approach for posterior quality assessment based on images using a convolutional neural network (CNN). They use three images of a hairpin, in front, back, and top view, to detect quality deviations [12]. In [13], a weld seam defect classification with the help of a CNN is shown. The authors achieve a validation accuracy of 95.85% in classifying images of four different weld defects, demonstrating the suitability of CNNs for defect detection. Nevertheless, some defects cannot be seen visually on the cooled weld seam. For example, pores in the weld seam or a weld bead that is too small due to material ejection cannot be visually distinguished from a good weld seam.

That is why imaging during the hairpin welding process offers more far-reaching potential for machine learning than subsequent image-based inspection of the weld. Important criteria are the mechanical strength of the pin connection and the material loss [12]. Both criteria correlate with a stable welding process and the occurrence of spatter. For spatter detection, a downstream visual inspection of the component is also possible [14]. However, this approach is problematic for hairpins, since there is little material around the hairpins that can be inspected for spatter.
Therefore, this paper presents an approach that enables spatter detection during hairpin welding. One of the main challenges of spatter detection directly during the welding process is the fast execution time on hardware with low computing power. The algorithm should be executed directly in the production line, where the installed hardware is often fanless and only passively cooled due to ingress protection. Another important issue is the amount of training data. Since this is an application in an industrial environment, training should only be done on a small data set so that the labeling effort is low and the algorithm can be quickly adapted to new processes. These two aspects are considered in the following.
In Section 2, the data basis and the analysis methods are presented in detail: both the network architecture and the comparison algorithms, such as the morphological filters and their configurations, are discussed. Subsequently, in the results section, the training parameters and the results are presented. Finally, the results are discussed and summarized in Section 4.
3. Results
We trained separate models of the small SDU-Net architecture for each input data generation approach: coaxial single, coaxial complete, and lateral complete. All models were trained with a batch size of 6, gray-scale input images of fixed size, and 500 steps per epoch. We used the Adam optimizer; the learning rate was reduced by 5% after 3 epochs without improvement until a minimum learning rate was reached. The training process was stopped when no further improvement occurred in 20 consecutive epochs. This results in training times of different lengths for the different models. The loss value and the accuracy of the different models can be seen in Table 1. To verify the results during training, we used validation data sets. These contained 3 images each for coaxial complete and lateral complete and 18 images for coaxial single, in accordance with the small database. The validation data sets were also enlarged with strong use of data augmentation. After training, we used separate test data sets, each containing 50 images with the corresponding ground truth masks.
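This schedule maps directly onto standard training callbacks. The following is a minimal sketch assuming a Keras setup with a compiled model `model` and data generators `train_gen` and `val_gen` (hypothetical names; the batch size of 6 is assumed to be set inside the generators, and `MIN_LR` is a placeholder for the unspecified minimum learning rate), not the original training code:

```python
# Minimal sketch of the described training schedule (not the original code).
# Assumptions: `model`, `train_gen`, and `val_gen` already exist; the batch
# size of 6 is configured in the generators; MIN_LR is a placeholder value.
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

MIN_LR = 1e-6  # placeholder; the paper's minimum learning rate is not reproduced here

callbacks = [
    # Reduce the learning rate by 5% after 3 epochs without improvement.
    ReduceLROnPlateau(monitor="val_loss", factor=0.95, patience=3,
                      min_lr=MIN_LR, verbose=1),
    # Stop training when no improvement occurs for 20 consecutive epochs.
    EarlyStopping(monitor="val_loss", patience=20,
                  restore_best_weights=True, verbose=1),
]

model.fit(train_gen, validation_data=val_gen,
          steps_per_epoch=500, epochs=1000, callbacks=callbacks)
```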
Because the number of pixels per class is very unbalanced, and especially the less important background class contains the most pixels, we used the weighted dice coefficient loss (DL) and the categorical focal loss (FL) [25] as loss functions. The network results are shown in Figure 4 and in Table 2.
The advantage of the focal loss is that no class weights have to be defined. The loss function, which is a dynamically scaled cross entropy (CE) loss, down-weights the contribution of easy examples and focuses learning on hard examples:

$$\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)$$

The two parameters $\alpha$ and $\gamma$ have to be defined. The parameter $\alpha$ represents the balancing factor, while $\gamma$ is the focusing parameter. The CE loss is multiplied by the modulating factor $(1 - p_t)^{\gamma}$. This means that with $\gamma = 2$ and a prediction probability of $p_t = 0.9$, the multiplier would be $(1 - 0.9)^2$, i.e., 0.01, making the FL in this case 100 times lower than the comparable CE loss. With a prediction probability of $p_t \approx 0.968$, the multiplier would be approximately 0.001, making the FL already 1000 times lower. This gives less weight to the easier examples and creates a focus on the misclassified data. With $\gamma = 0$, the FL works analogously to the cross entropy. Here, the values $\gamma = 2$ and $\alpha = 0.25$ were chosen.
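As a minimal sketch, the described loss could be implemented as follows for one-hot encoded masks; this follows the general formulation in [25] rather than reproducing the exact implementation used here:

```python
# Minimal sketch of the categorical focal loss described above (after Lin et al. [25]).
# Assumes one-hot encoded ground truth masks and softmax predictions of
# shape (batch, height, width, n_classes).
import tensorflow as tf

ALPHA = 0.25   # balancing factor
GAMMA = 2.0    # focusing parameter

def categorical_focal_loss(y_true, y_pred):
    # Clip predictions to avoid log(0).
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    # Standard cross entropy term per pixel and class.
    cross_entropy = -y_true * tf.math.log(y_pred)
    # Modulating factor (1 - p_t)^gamma down-weights easy examples.
    modulating = tf.pow(1.0 - y_pred, GAMMA)
    # Sum over the class axis, average over pixels and batch.
    return tf.reduce_mean(tf.reduce_sum(ALPHA * modulating * cross_entropy, axis=-1))
```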
For comparison, we used the weighted dice coefficient loss, where the loss value is calculated for each class, weighted with the respective class weight, and then summed. The class weights were calculated based on the pixel ratio of the respective class in the training images. Classes that contain only a few pixels, such as the spatter class, must be weighted more heavily so that they are considered appropriately during training. Since the weights are calculated based on the number of pixels in the training data, they vary between the different input data sets.
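A corresponding sketch of the weighted dice coefficient loss, under the same assumptions about mask encoding, could look as follows; the class weights shown are placeholders, since in this work they are derived from the pixel ratios of the training data:

```python
# Minimal sketch of the weighted dice coefficient loss described above.
# Assumes one-hot masks of shape (batch, height, width, n_classes); the
# CLASS_WEIGHTS values are placeholders, not the values used in the paper.
import tensorflow as tf

CLASS_WEIGHTS = tf.constant([0.1, 1.0, 5.0, 2.0])  # placeholder weights

def weighted_dice_loss(y_true, y_pred, smooth=1.0):
    # Per-class intersection and denominator, summed over batch and pixels.
    axes = (0, 1, 2)
    intersection = tf.reduce_sum(y_true * y_pred, axis=axes)
    denominator = tf.reduce_sum(y_true, axis=axes) + tf.reduce_sum(y_pred, axis=axes)
    dice_per_class = (2.0 * intersection + smooth) / (denominator + smooth)
    # Weight the per-class dice losses and add them up, as described above.
    return tf.reduce_sum(CLASS_WEIGHTS * (1.0 - dice_per_class))
```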
Besides training individual models for each input data approach, we also trained a single model for the prediction of all data: the summed coaxial and lateral views as well as the single images. Since the different data sets have the same classes and a similar appearance, a single-model approach is also feasible. The advantage of this approach is that a higher variance of data can be covered by one model, so no new models or parameters need to be defined for each data type. To train the global model, we used 14 images each from the coaxial complete and lateral complete data sets and 34 coaxial single images.
Another advantage is that additional classes can be added to the model. We introduced a new class that covers the cooling process of the welding bead: from the moment the process light turns off, the weld is assigned to the cool-down class. This class cannot be identified via the previously described structure recognition with the subsequent exclusion procedure using the morphological filter, which can only detect image elements of different sizes and distinguish them from each other. For elements of similar size but different properties, the method reaches its limits.
The result of training the small SDU-Net as a single model for all data is also shown in Table 1. All four classes were considered in the training process; in the summed images, the cooling process is not visible. For comparison, an SDU-Net model with twice the number of filters was trained. This network has more trainable parameters, but in our test, it did not achieve significantly better results in loss, accuracy, or the evaluation. In addition, the results of the comparatively small U-Net model are shown.
The classification results are compared using the Intersection over Union (IoU) metric. The metric measures the similarity between the predicted classification mask and a manually annotated ground truth mask by dividing the size of their intersection by the size of their union:

$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ is the set of pixels predicted for a class and $B$ the corresponding set of ground truth pixels.
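As a minimal sketch, with integer-coded class masks, the metric and the class-specific variant discussed below could be computed as follows (the class indices are illustrative):

```python
# Minimal sketch of the per-class IoU evaluation, assuming integer-coded
# class masks (e.g., 0 = background, 1 = process light, 2 = spatter,
# 3 = cool-down); numpy only.
import numpy as np

def class_iou(pred_mask, true_mask, class_id):
    pred = pred_mask == class_id
    true = true_mask == class_id
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return np.nan  # class absent from both prediction and ground truth
    return np.logical_and(pred, true).sum() / union

def mean_iou_without_background(pred_mask, true_mask, n_classes=4):
    # Average the IoU over the specific classes, excluding class 0 (background).
    values = [class_iou(pred_mask, true_mask, c) for c in range(1, n_classes)]
    return np.nanmean(values)
```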
In Table 2, the evaluation results of the different approaches are shown. The second value in each row shows the IoU over all pixels in the entire image. The dark area around the weld is almost always correctly classified as background. Especially for the coaxial single data set, the background takes up the main part of the image, so the IoU over the entire image is very high for all methods. Therefore, the class-specific pixels are also considered separately. This is shown by the first values in the table, which represent the IoU based only on the pixels assigned to a specific class, excluding the background class. This value is more informative about the actual result than the total IoU. However, the larger the background area, the fewer pixels are included in the calculation of the class-specific IoU. As a result, this value is more strongly influenced by individual misclassified pixels.
In Figure 4 and Figure 5, the first value, without consideration of the background pixels, is used. As shown in Figure 4, the two weighted loss functions, FL and DL, result in comparable distributions, with the DL performing only marginally better.

When using the single model trained on all three data sets, an outlier with an IoU close to 0 can be seen in each of our test sets in Figure 4. There, an image captured during the cooling process that contained spatter was misclassified as process light. When using separate models per data set, this error case did not occur. On the one hand, the error can be attributed to an underrepresentation of the cooling class in the overall data set, since this class only occurs in the coaxial single images. On the other hand, the occurrence of spatter in images at this point of the welding process is very rare, which is why this case was not sufficiently present in the training data. In productive use, it is assumed that the data is taken from only one perspective. Nevertheless, this experiment shows that the model generalizes well even on different input data with only very little training data and thus covers a high data variance.
Figure 5 shows the IoU without consideration of the background pixels for the different data sets, all trained with the dice coefficient loss. This graph shows that the largest deviations are contained in the coaxial single data set. In these images, the background occupies the largest image area, which makes small deviations of the other classes more significant, as shown in Table 3.
For all three input data sets, coaxial single, coaxial complete, and lateral complete, the SDU-Net provides the best results compared to the other methods. The disadvantage of the U-Net architecture arises from the fact that only simple and small receptive fields are used, which leads to a loss of information about the image context. In our use case, this means that the classes cannot always be clearly distinguished from each other. The SDU-Net processes the feature maps of each resolution with multiple dilated convolutions applied successively and concatenates all convolution outputs as input to the next resolution. This increases the receptive field, and both smaller and larger receptive fields are considered in the result.
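As an illustration of this principle, a block of this kind could be sketched as follows; the filter count and dilation rates are illustrative assumptions, not the exact configuration of the small SDU-Net:

```python
# Rough sketch of an SDU-Net-style block: several dilated 3x3 convolutions
# applied successively at one resolution, with all intermediate outputs
# concatenated as input to the next resolution. Filter count and dilation
# rates are illustrative, not the values used in this work.
from tensorflow.keras import layers

def sdu_block(x, filters=16, dilation_rates=(1, 2, 4)):
    outputs = []
    for rate in dilation_rates:
        x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                          dilation_rate=rate)(x)
        outputs.append(x)
    # Concatenating all outputs preserves both small and large receptive fields.
    return layers.Concatenate()(outputs)
```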
Visualized results of the different methods are shown in Table 3. In comparison to the small SDU-Net, results of the small U-Net, the binary opening, and the gray-scale opening are shown. The SDU-Net and U-Net models are trained on all data and with four classes, while the morphological filter requires a structural element of a different size per data set and cannot distinguish between process light and the cooling process. For better visualization, the pixel-wise classification of the neural networks is displayed in different colors and superimposed on the input image. The process light class is shown in green, the spatters in red, and the cooling process in blue. The resulting images of the morphological filter are displayed analogously to Figure 3.
With morphological filtering, small regions always remain at the edge of the process light area, since the structural element can never fill the shape exactly. As a result, the exclusion method always recognizes some pixels as spatter, which must be filtered out in post-processing. Even small reflections, which occur mainly in the lateral images, are detected as spatter by the exclusion procedure. Conversely, spatter that is larger than the defined structural element is detected as process light rather than spatter, which also leads to a wrong result. Compared to the binary opening, the vapor generated during welding, which is mainly visible in the lateral images, is usually detected as process light by the gray-scale opening; with the binary opening, the vapor area is usually already eliminated during binarization.
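For reference, the two baselines could be sketched as follows with scipy.ndimage; the threshold and the size of the structural element are illustrative and, as noted above, would have to be tuned per data set:

```python
# Minimal sketch of the morphological baselines, assuming 8-bit gray-scale
# images. Threshold and structural element size are illustrative values.
import numpy as np
from scipy import ndimage

def opening_baselines(image, threshold=128, element_size=15):
    structure = np.ones((element_size, element_size), dtype=bool)
    # Binary opening: binarize, then remove all bright structures smaller
    # than the structural element (the large process-light area survives).
    binary = image > threshold
    process_light = ndimage.binary_opening(binary, structure=structure)
    # Exclusion procedure: bright pixels outside the opened area are
    # treated as spatter candidates.
    spatter = np.logical_and(binary, ~process_light)
    # Gray-scale opening operates directly on the intensities instead.
    opened_gray = ndimage.grey_opening(image, footprint=structure)
    return process_light, spatter, opened_gray
```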
Without runtime optimization, the classification time of our small SDU-Net model on the CPU is about 20 ms. In comparison, the binary and gray-scale opening reached 12 ms for coaxial single, 1.61 ms for coaxial complete, and 40 ms for the lateral complete images. The differing processing times of the opening operation are caused by the different image sizes and the different sizes of the structural element. By using a larger or differently shaped structural element, the processing times can be further improved, but the resulting detection quality suffers.
The production time of a stator is about one minute for all 160 to 220 pairs of hairpins; in the fastest case, 270 ms are available for welding one hairpin pair. The quality assessment with the SDU-Net needs 20 ms, which is less than 10% of the welding time of a single pair and only in the per mille range of the whole welding process. With a time-delayed evaluation of the images, the previous pin can be evaluated while the next pin is being welded. Thus, due to the fast prediction time, the time sequence of the welding of a stator remains unaffected by this setup. The evaluation was deliberately performed on the CPU, since the model is to be executed directly at the production plant on an industrial PC, where GPUs are not always available. In this way, strong spatter formation, which can indicate a drifting process or contaminated material, can be reported directly to the user, who can then react immediately and stop or adjust the process.
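As a quick plausibility check with the numbers given above, a single prediction occupies only a small fraction of the process time:

```latex
% One 20 ms prediction relative to one welding cycle and to one stator:
\frac{20\,\mathrm{ms}}{270\,\mathrm{ms}} \approx 7.4\,\%,
\qquad
\frac{20\,\mathrm{ms}}{60\,\mathrm{s}} \approx 0.33\ \text{per mille}.
```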
4. Discussion and Outlook
By training one model on all three input data sets, it could be shown that a high variance of data can be covered with this approach. The data varies greatly in terms of both recording position and spatter appearance. This high level of data variance will not occur in a production line, where the data acquisition position as well as the type of pre-processing will be fixed. However, this experiment suggests that it is possible to use one model for different applications and with slightly different recording setups. With this approach, we obtain a higher average IoU for the specific classes (without the background class) than the morphological filters, even though those methods were parameterized specifically for each particular data set. This generalization opens up the possibility of using one model for different optics without having to make adjustments. With an execution time of 20 ms, we are also in a similar range as the morphological filter, which requires 12 ms, 1.61 ms, or 40 ms depending on the input data. For some input data, especially coaxial complete with 1.61 ms, the morphological filter is faster, but for other input data it takes even longer.
The spatter prediction works on the summed images as well as on the single images; Table 2 lists the average IoU values for the specific classes for coaxial single and coaxial complete. To record single images, on which every spatter appears as a point, a high recording frequency is required. Cheaper hardware usually has a lower capture frequency, which means that individual spatters would be missed. To counteract this, the exposure time can be increased so that the spatters become visible as lines in the images, similar to the summed images we used. The tests showed that the spatter can be detected in the image even in this case, and thus cheaper hardware can be used for quality monitoring.
By using a segmentation approach and a model architecture that works well with strong data augmentation, it is possible to work with very small training data sets. This keeps the labeling effort for new processes, e.g., for new customer data, manageable and thereby saves time and costs. By using a small network architecture with few parameters, both the training time and the prediction time are short. Thanks to the short prediction time, the application can be run directly on the production line on a conventional industrial computer. By analyzing the data during production, it is possible to react immediately, which is more efficient than a completely downstream analysis. The algorithm can also be continuously optimized by feeding new data into the neural network under defined monitoring conditions and training it further. Further knowledge can be generated through the proper application of such data feedback. However, it is important to ensure that the application is not retrained on a drifting process. In addition, an online learning approach for the laser parameters would also be conceivable: the algorithm can be used to check whether spatter occurs with a certain configuration, and the laser settings can be readjusted accordingly.
The data can be recorded coaxially through the laser optics or laterally to the welding process. Spatter detection works well with both recording methods, although the average IoU of the process light and spatter classes is higher for the coaxial view than for the lateral images. It should be noted that the input images look very different in the two cases and the relevant image area has a different size; in the lateral images, a larger area is covered. In both cases, care must be taken that the distance to the weld seam is large enough for the spatter to remain within the camera's field of view. The coaxial camera setup is often already available on production lines and can therefore be integrated more easily. In this case, spatter detection could be added to an existing production line with a software update.
When considering the entire welding process, welding monitoring with a focus on spatter can be seen as one part of a desired automated 100% inspection of the welding result. This step could be integrated into a three-stage quality monitoring system. In the first step, a deviating position of the hairpins can be detected during process preparation and the welding position corrected accordingly; it also makes sense to check that both hairpins are present and correctly aligned in parallel. In the second step, spatter monitoring can be carried out directly in the process. This provides information on whether the welding process is unstable and enables a rapid response. In the third step, a subsequent quality control of the welding results can be carried out; thanks to the in-process monitoring, random samples are sufficient in this step.
However, if 100% monitoring of spatter occurrence is to be implemented, additional hardware is required. As mentioned before, an industrial camera installed on a production line usually does not have a frame rate high enough to record images without short gaps during which spatters can be missed. This can be counteracted with an extended exposure time and a larger field of view in which the spatter can be detected, but a 100% view of the process is unrealistic. In this case, an event-based camera or other sensor technology would have to be used. The approach presented in this paper therefore focuses on quick and easy integration into an existing production system without the need for investment in additional hardware, which is often very costly and can lead to additional calibration effort.
Furthermore, the presented approach can be extended by additionally considering laser parameters or other sensor data that are already available in the system. With such an information fusion, into which the camera-based in-process monitoring for spatter is integrated, it is possible to control the process even more comprehensively with existing hardware.