Article

Data Attributes in Quality Monitoring of Manufacturing Processes: The Welding Case

by Panagiotis Stavropoulos *, Alexios Papacharalampopoulos * and Kyriakos Sabatakakis
Laboratory for Manufacturing Systems and Automation (LMS), Department of Mechanical Engineering and Aeronautics, University of Patras, 26504 Patras, Greece
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(19), 10580; https://doi.org/10.3390/app131910580
Submission received: 30 August 2023 / Revised: 18 September 2023 / Accepted: 20 September 2023 / Published: 22 September 2023

Abstract
Quality monitoring of manufacturing processes is a field where data analytics can thrive. The attributes of the data, denoted by the well-known ‘7Vs’, can potentially be used to measure the different respects in which a data analytics task may, in some cases, qualify as big data. The current work is a step towards such a perspective, despite the fact that the method, the application and the data are coupled in some way. In fact, a framework is presented herein through which a heuristic match between the big data attributes and the quality monitoring characteristics of manufacturing is used to extract insights about the value and the veracity of datasets in particular. The case of simple machine learning is used, and the results are very interesting, indicating the difficulty of extracting attribute characterization metrics in an a priori manner. Eventually, a roadmap is created with respect to integrating the data attributes into design procedures.

1. Introduction

Data analytics in manufacturing occupies a vast portion of the literature, and as a subset of this, so-called big data refers to a particularly interesting set of techniques [1]. The first hint, however, of moving from data analytics to big data is the size of the dataset itself [2]. There are, of course, more elaborate approaches that generalize the determining factor from size to complexity [3]. The 7Vs are an attempt to qualitatively describe this complexity [4]. All the involved analytics, however, are closely related to desired production properties, such as agility [5], and hence to business strategies. This renders data, in general, a business asset [6] that is highly useful in the case of smart manufacturing [7].
As a matter of fact, market forecasts state that “In order to harness big data, businesses rely on storage and processing power as well as strong analytics capabilities and skills. By 2025, annual revenue from the global big data analytics market is expected to reach 68.09 billion U.S. dollars” [8]. Additionally, the global revenue for the advanced and predictive analytics software market amounted to 3.47 billion USD in 2019 [9]. It is also foreseen that, by 2026, the global analytics-as-a-service (AaaS) market value will reach 101 billion USD [10].
There are, apparently, quite a few challenges regarding the quantification of any “big data”-relevant issues. Recently, one such approach has been the introduction of “subjectively big data” [11], which attempts to link big data with computational complexity, as also previously sketched by others [12]. That work was limited to the first two Vs, namely volume and velocity, while the difficulty of extracting the value index of a dataset before training (a priori) was shown. There remains, in any case, the challenge of computing all these metrics, if possible, as they are case-dependent. If one uses computational complexity as the main (causal) driver, this statement is almost self-proving.
The work focuses on the other Vs, as they are given in the table hereafter (Table 1). In general, these characteristics interact with the computing system, so extra characteristics could be involved [13].
The 7Vs are interrelated most of the time, depending on the application. For instance, in real-time image processing applications, a high-speed camera will typically produce a large volume of data [19]. Thus, velocity is, in this case, directly linked to the volume of the data. In multimodal sensing applications, an increase in data volume is directly related to the number of sensors used for monitoring a particular process or asset [20].
Regarding the size of the data in bytes, estimations lie on the spectrum of peta-, exa-, and zettabytes [21,22], while at the same time, there is a close link with cloud systems [23,24,25] as well as with distributed computing systems [26].
In addition, a variety of methods are applied in the area of big data [27,28]. It is worth noting the case of randomized numerical linear algebra, which fits this setting perfectly [29]. Herein, however, the case of classification through machine learning is considered in order to extract some conclusions. In particular, as manufacturing is quite relevant to big data, some insight can be expected. Additive manufacturing [30], supply chain management [31,32], cyber-physical systems [33], fault diagnosis and early warning [34], manufacturing processes [35], as well as product development [36], are all thematics that can potentially be related to the 7Vs. More particularly, the current work focuses on welding [37,38], the quality monitoring of which is expected to easily increase the levels of any of the 7Vs. As aforementioned, volume and velocity have been associated with computational complexity [11]. This can be the beginning of the 7Vs’ quantification. However, this is not enough; a space characterized by V-related axes could be used for the full definition. Thus, herein, the concepts of veracity and value are investigated within the context of specific applications.
At the same time, the challenge remains to quantify these characteristics using a simple, intuitive, and a priori method. The approach described in the following section aims at demystifying this procedure for the case of veracity and value, in particular, through specific cases of welding quality monitoring and thus classification. It is noted that for the case of veracity, only the part of applicability is checked here.
The current approach utilizes a heuristic match between the data attributes and the classification problem characteristics. The case of quality monitoring for welding is utilized to this end in two different welding configurations. The results are promising for this approach, and eventually, a roadmap will be provided for the integration of data attributes in manufacturing workflow design.
Section 2 is about drafting the approach. To this end, the data attributes, also known as 7V’s, are interpreted towards the specific case of welding monitoring. Then, in Section 3, two different case studies are presented, in the context of which the data attributes are investigated. Subsequently, the results for the relevant correspondence between the data attributes and how they are matched to quality monitoring characteristics are given in Section 4. Finally, in Section 5, the concepts are generalized, and the business-added value of the overall characterization is presented.

2. Approach

A classification problem can be complex enough so that the derived computational problem is considered “big”, either in terms of size needs purely or even in terms of complexity, thus affecting (at least in principle) the veracity of a dataset.
As a matter of fact, a classification problem could be defined by the tuple (application, dataset, method). This is also denoted in the blue part of Figure 1. The first term refers to the physical problem, i.e., welding quality monitoring, and the subsequently defined classes, such as defect types. The dataset refers to the particular data that are used to train the system, which is a function of monitoring devices and data types. Finally, the method refers to the classification method used. It is noted here that deep learning [39] is probably the most relevant method in the case of big data; however, pure machine learning can be utilized for the sake of controlling the feature extraction procedure [40]. The positioning of the 7Vs can then be provided based on a heuristic view of the factors that affect them.
The dataset can then be used to extract useful features; this results in having some extra characteristics: the separability of the points in the feature space [41] as well as their interdependence (a consequence of feature interaction [42]). Finally, also related to the post-training characteristics, lie the performance (classification rate) and the efficiency (i.e., training time) of training itself.
It can be presumed that V1 and V2 depend on the classification problem tuple; V3 and V4 are also affected by the separability and the metrics dependence, while V5 and V6 should be affected by all the aforementioned plus efficiency and performance. V7 is even more generic in terms of a number of factors, as there are also business aspects. This would exceed, however, the purposes of the current work.
After defining these dependencies, it becomes feasible to check all the factors that affect the 7Vs in the case of welding quality monitoring. The following table (Table 2) lists some indicative factors, in a rather heuristic manner, based on the observations above. The list is not, in any case, exhaustive. Thermal cameras are presumed to be used in order to be consistent with previous works; hence, the characteristics of the retrieved signals are taken into consideration in a fashion that is convenient for modeling.

3. Case Studies

3.1. Case Study I

The electrical contact resistance (ECR) of joints during the assembly of battery packs is a critical factor for accepting or rejecting battery packs or modules [44]. Postprocess inspection of ECR, in general, poses many challenges, as adding a new step in production [45] is not feasible due to accessibility issues; the latter differ among product variants, depending on battery pack architectures.
The approach herein aims to establish a connection between the ECR level of joints and the infrared (IR) images captured during the welding of battery tabs in the context of a supervised machine-learning problem (Figure 2). For that, a fiber laser equipped with an off-axis high-speed IR camera was used to monitor the welding of Al-Cu overlapped battery tabs (as of 31 December 2021, the ZELD-e EU project, http://zeld-e.eu/ (accessed on 15 September 2023)).
The data acquisition included three different power levels (1.2, 1.4, and 1.6 kW) using a fiber laser with a 0.15-mm spot size and a constant welding speed of 12 m/min for the welding of 270 Al-Cu overlap tab-pairs with dimensions of 45 × 45 mm2 and thicknesses of 0.4 mm and 0.2 mm, respectively. The resulting seams were 40 mm in length and were inspected as per their electrical resistance across the Al-Cu tabs. Each video recorded during the welding of a tab pair was matched with the corresponding measured ECR value.
The baseline supervised approach for developing a model able to classify the electrical quality included a list of actions as described in the following points:
(1)
ECR threshold definition:
An industrial empirical threshold rule was applied to the measured values of the ECR, as would normally be implemented for statistical process control. The threshold was defined both by taking into consideration the production requirements of the statistical distributions and the measured values.
(2)
Data cleaning:
Data entries were rejected either due to problems during image acquisition or due to problems during the measurement of the resistance (measuring equipment was affected by external E/M fields).
(3)
Data preprocessing:
The high gain setting used for capturing the IR emissions of the Al tab during welding introduced a lot of thermal noise, making it difficult to easily identify the thermal signature. Thus, noise reduction was incorporated into this study to remove the background noise.
(4)
Feature extraction:
For every frame of each video, a single feature, the standard deviation, was calculated, converting practically every video into a time-series signal. Consequently, in a sense, a pooling operation was performed to summarize the information in each frame into one point. However, due to the high framerate of the camera, the number of points is still quite large for creating a model that would not most certainly overfit [46]. For that, the resulting signals were split into equal parts, and after first removing the background-only frames at the start and the end of these signals, the mean, maximum, and standard deviation were calculated for each part. Finally, the best class separation was achieved by constructing a 1D signal from the standard deviation of each frame, splitting it into three parts, and calculating their mean value.
(5)
Model training:
To reduce the complexity of the subsequent data-investigation actions, increase the overall explainability of the decision-making mechanism, and harmonize with the distribution of the data in the feature space, a simple perceptron was selected for the classification. This level of simplicity is preferred in order to minimize the model’s capacity for overfitting and bias, which are inevitable when a large number of parameters is tuned using a small dataset. In this case, the number of tunable parameters is defined by the number of inputs. Additionally, keeping the overall model complexity low, as well as that of its training process, enables reproducibility among the different cases of varying the Vs, thus making it easier to track the cause of the corresponding performance variation.
The perceptron was equipped with a sigmoid activation in order to be differentiable during training by utilizing gradient-based algorithms. Weights were initialized to zero, and training took place in batches using 2/3 of the available 270 data entries. An adaptive moment estimation (ADAM) algorithm [47] was used as an optimizer along with a cross-entropy loss function, as typically performed in classification problems. As mentioned, the number of inputs was defined by the number of features, which, along with the bias element, formed the available tunable parameters.
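The feature extraction of step (4) can be sketched as follows. This is a minimal illustration, assuming the recording is available as a NumPy array of frames; the function name, the trim length for background-only frames, and the toy data are purely illustrative stand-ins for the actual pipeline.

```python
import numpy as np

def extract_features(video, n_parts=3, trim=10):
    """Summarize an IR video (frames x H x W) into a few scalars.

    The per-frame standard deviation acts as a pooling operation,
    turning the video into a 1D signal; the signal is then trimmed of
    background-only frames at its start and end, split into equal
    parts, and each part is reduced to its mean value.
    """
    # One point per frame: spatial standard deviation of the pixels.
    signal = video.reshape(video.shape[0], -1).std(axis=1)
    # Drop background-only frames at the start and end.
    signal = signal[trim:len(signal) - trim]
    # Split into equal parts and keep the mean of each part.
    parts = np.array_split(signal, n_parts)
    return np.array([p.mean() for p in parts])

# A toy 320-frame, 32 x 32 recording stands in for a real video.
rng = np.random.default_rng(0)
video = rng.normal(size=(320, 32, 32))
features = extract_features(video)
print(features.shape)  # (3,)
```

Each video is thereby reduced to a three-element feature vector, matching the best class separation reported above.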
Finally, the following table (Table 3) summarizes the potential effect of the various stages on the veracity of the dataset.
In the context of this study, the major goal is to investigate and confirm otherwise theoretical statements about the link between selected Vs and the performance of data-driven applications. All the training algorithms were implemented using Python 3.7 (NumPy 1.21 and TensorFlow 2.2/Keras 2.3.1).
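The perceptron training of step (5) can be sketched as follows; this is an illustrative NumPy stand-in for the TensorFlow/Keras implementation, with the ADAM update rule written out explicitly and toy separable data replacing the real features. Hyperparameter values and names are assumptions, not the values used in the study.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for saturated units.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -60.0, 60.0)))

def train_perceptron(X, y, epochs=300, lr=0.1,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Single sigmoid neuron trained with the ADAM update rule on a
    cross-entropy loss; weights and bias start at zero, so the tunable
    parameters are the inputs plus the bias element."""
    n = X.shape[1]
    w, b = np.zeros(n), 0.0
    m, v = np.zeros(n + 1), np.zeros(n + 1)  # ADAM moment estimates
    for t in range(1, epochs + 1):
        p = sigmoid(X @ w + b)
        # Gradient of the mean cross-entropy w.r.t. the pre-activation
        # of a sigmoid unit is simply (p - y).
        g = np.concatenate([X.T @ (p - y) / len(y), [(p - y).mean()]])
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
        step = lr * m_hat / (np.sqrt(v_hat) + eps)
        w, b = w - step[:-1], b - step[-1]
    return w, b

# Toy separable data standing in for the extracted features;
# 2/3 of the 270 entries are used for training, as in the text.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (135, 3)), rng.normal(4, 1, (135, 3))])
y = np.repeat([0.0, 1.0], 135)
idx = rng.permutation(270)[:180]
w, b = train_perceptron(X[idx], y[idx])
acc = float(((sigmoid(X @ w + b) > 0.5) == y).mean())
print(acc > 0.9)
```

With well-separated classes, as in the baseline case above, such a model converges to a near-perfect classification score within a few hundred epochs.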

3.2. Case Study II

In resistance spot welding (RSW), the quality of the welds is affected by many variables [48], including the properties of the materials used, the smoothness and purity of their surface, the size and shape of the electrodes, and the parameters of the welding machine. Expulsion is a very common defect during RSW [49], and in extreme situations, it can lead to nugget thinning that may compromise the mechanical properties of the joint and the aesthetic appearance of the part’s surface [50]. Expulsion is typically accompanied by sparks during the process caused by high welding current, but it can also be due to electrode misalignment and thus not easily identifiable.
Typically, in the industry, postprocess inspection is carried out to determine the quality of the weld; however, it is sample-based, and when it includes destructive techniques, it definitely cannot be applied to products that will reach the market [51]. For that matter, in-process methods and systems have been implemented for assessing the quality aspects of a joint. As with the previous case study, the majority of them aim to establish a relationship between process emissions and joints’ qualities.
Herein, a system based on the same monitoring hardware (Figure 3) is used to record the resistance spot welding of 202 overlapped galvanized steel-316 coupons with dimensions of 200 × 25 mm2 and 1 mm thickness. To weld these coupons, six different sets of welding parameters were used, in which the welding current and duration were varied while the clamping force was kept constant. These sets were the following: 15A-10cycles, 15A-5cycles, 10A-15cycles, 15A-5cycles, 18A-5cycles, and 8A-10cycles.
A label was assigned to each video to indicate if expulsion occurred after visually inspecting the welded samples. Once again, the current approach aims to classify the IR image data according to the expulsion binary label by reducing the problem into a supervised one.
Based on the previous case study, it is obvious that data veracity can indeed affect the performance of a data-driven application; however, the intensity of its effect is strictly related to the nature of the data and the value that is exploited from them and eventually ends up in each model. With that in mind, in this case study, another factor, the value of the data, is introduced so that its effect can be assessed in the context of a real-world application.
A series of steps described in the following bullet points are used sequentially to construct a single approach as described in the following table.
1.
Data pre-processing:
As the recording time in this application was significantly longer (including the cool-down phase), apart from the noise, thermal drift was also an issue that had to be addressed. Two steps were applied, one for correcting the thermal drift and one for noise cancellation. Apart from these, for reasons that become clear in the subsequent steps (feature extraction), image stabilization was implemented to remove any vibrations transferred to the monitoring system during the process, which shifted the thermal signature off-center.
2.
Feature extraction:
With all the videos having the same dimensions, each one was flattened and appended to a single array. Then, principal components analysis (PCA) [51] was utilized as a linear, component-based transformation to extract a new set of feature vectors from the initial dataset (principal components—PC). Their dimensions defined the captured variance from the initial data.
3.
Model training:
In line with the previous case study, the model used here was a simple perceptron with the same loss function (cross-entropy) and weight initialization (weights initialized to zero). The only difference is the optimization function; due to the different software platform (MATLAB 2022a) used here, a built-in ADAM optimization algorithm was not available, and the scaled conjugate gradient algorithm [52] was used instead.
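The PCA-based feature extraction of step 2 can be sketched as follows; this is a minimal NumPy illustration (via SVD of the centered data matrix) rather than the MATLAB implementation actually used, and the toy video dimensions are assumptions.

```python
import numpy as np

def pca_features(videos, n_components=3):
    """Flatten each video and project onto its principal components.

    Each row of the data matrix is one flattened video; PCA (here via
    SVD of the centered matrix) yields a compact feature vector whose
    length controls how much of the initial variance is captured.
    """
    X = np.stack([v.ravel() for v in videos])   # one row per video
    Xc = X - X.mean(axis=0)                     # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T           # PC scores (features)
    explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return scores, explained

# Twenty toy 50-frame, 8 x 8 videos stand in for the real recordings.
rng = np.random.default_rng(2)
videos = [rng.normal(size=(50, 8, 8)) for _ in range(20)]
scores, explained = pca_features(videos, n_components=3)
print(scores.shape)            # (20, 3)
print(0.0 < explained < 1.0)   # fraction of variance captured
```

Increasing `n_components` retains a larger share of the initial data variance, which is exactly the "value" knob varied in the full factorial design below.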
Based on the above definitions/descriptions, the levels of the two factors affecting the data are listed in the following table (Table 4), which depicts the three different levels of each of the two variables involved; based on them, a full factorial test design was implemented.
The strategies followed for data collection, labeling, cleaning, and other data normalization and pre-processing tasks (e.g., for compensating class imbalance), as well as a number of feature and data visualizations, can be found in more detail in the studies [51,53] with respect to the first and second case studies, respectively.

4. Results

As the data for each case study have been captured using the same hardware, their differences can mainly be found in the duration. For the first case study, the duration of a single welding event is 200 ms; thus, with the triggering signal for recording slightly preceding and succeeding the corresponding start and finish points, the overall duration of the video is around 300 frames. On the other hand, in the RSW case, the recording time is prolonged by the cool-down of the spot, due to the larger welded area and the different heat capacity of the specimens. This results in videos with a duration of up to 5000 frames.
Considering that the camera’s resolution is 32 by 32 pixels and each pixel is represented by a 10-bit value (stored in a 2-byte variable), the total amount of data for a single instance is 600 KB and 9.76 MB for the two cases, respectively. Apparently, this type of classification problem could create enough data volume to become problematic in the sense of subjectively big data. However, it is also shown hereafter that other Vs, such as veracity and value, can be accounted for during data modeling. For simplicity, smaller datasets and simpler cases have been taken into consideration.
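The arithmetic behind these per-instance sizes can be checked directly (the 9.76 MB figure corresponds to 9.765625 MB before rounding):

```python
# Each pixel is a 10-bit value stored in a 2-byte variable.
frame_bytes = 32 * 32 * 2                # 2048 bytes per frame

laser_kb = frame_bytes * 300 / 1024      # ~300-frame laser welding video
rsw_mb = frame_bytes * 5000 / 1024 ** 2  # up-to-5000-frame RSW video

print(laser_kb)          # 600.0 (KB)
print(round(rsw_mb, 3))  # 9.766 (MB)
```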
To cope with the inherent thermal noise, the pixel noise was dynamically modeled, and a criterion that is capable of separating background and foreground areas was derived. By constructing a confidence map (pixel-wise confidence), a foreground mask was extracted. The initial and final results of noise cancellation are depicted in the following figure (Figure 4).
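The background/foreground separation described above can be sketched as follows; this is a simplified illustration in which the per-pixel noise model is estimated from an assumed background-only calibration segment, and the calibration length and confidence threshold are assumptions for the sake of the example.

```python
import numpy as np

def foreground_mask(video, calib_frames=30, k=4.0):
    """Separate the thermal signature from background noise.

    Pixel noise is modeled from the first frames (assumed to contain
    background only) as a per-pixel mean and standard deviation; a
    pixel-wise confidence map then marks as foreground the pixels that
    rise well above the noise floor.
    """
    bg = video[:calib_frames]
    mu, sigma = bg.mean(axis=0), bg.std(axis=0) + 1e-9
    # Pixel-wise confidence that a value exceeds the noise floor.
    confidence = (video - mu) / sigma
    return confidence > k  # boolean foreground mask per frame

rng = np.random.default_rng(3)
video = rng.normal(0, 1, size=(100, 32, 32))
video[50:, 10:20, 10:20] += 10.0  # synthetic "weld" hot spot
mask = foreground_mask(video)
print(mask[50:, 12:18, 12:18].mean() > 0.99)  # hot spot kept
print(mask[:, :8, :8].mean() < 0.01)          # background rejected
```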
In the case of RSW, except for noise cancellation, drift correction was also required due to the prolonged recording duration. Thus, a heuristic procedure was used to compensate for the linear trend of the data (Figure 5). Herein, the algorithm calculates the mean of each frame, constructs a signal, and fits it to a linear equation. It then uses the coefficients of this equation to filter all the video pixels, which are defined as vectors (arrays) in a 2D space based on their value and frame number (Figure 5).
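The drift-correction procedure can be sketched as follows, assuming the video is a NumPy array; subtracting the fitted trend frame-wise is an illustrative reading of how the fitted coefficients are used to filter the pixels.

```python
import numpy as np

def remove_thermal_drift(video):
    """Compensate the linear thermal drift of a long IR recording.

    The mean of each frame forms a 1D signal; a line is fitted to it,
    and the fitted trend is subtracted from every pixel of the
    corresponding frame.
    """
    frame_means = video.reshape(video.shape[0], -1).mean(axis=1)
    t = np.arange(len(frame_means))
    slope, intercept = np.polyfit(t, frame_means, 1)  # linear fit
    trend = slope * t + intercept
    return video - trend[:, None, None]

# Toy recording with a synthetic linear drift superimposed.
rng = np.random.default_rng(4)
drift = np.linspace(0.0, 5.0, 200)
video = rng.normal(size=(200, 16, 16)) + drift[:, None, None]
corrected = remove_thermal_drift(video)
means = corrected.reshape(200, -1).mean(axis=1)
print(abs(means[-1] - means[0]) < 0.5)  # trend removed
```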

4.1. Laser Welding and Veracity

For the case of laser welding, three different factors have been tested for their effect on dataset veracity and, hence, on training performance/efficiency. For the baseline approach described in the previous section, the decision boundary was formed easily between the two classes, as their points were well separated (Figure 6—Baseline). As such, a perfect score was achieved and maintained during training as early as epoch 50 (Figure 7—left). This is made apparent by the values of sensitivity (how well the model identifies the positive class) and specificity (how well the model identifies the negative class), which together result in a perfectly balanced accuracy score. Thus, it can be said that the baseline model’s performance is not biased toward a specific class.
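For reference, the metrics used throughout this section reduce to simple confusion-matrix ratios; the toy labels below are illustrative only.

```python
def class_metrics(y_true, y_pred):
    """Sensitivity (TPR), specificity (TNR), and balanced accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # how well the positive class is found
    specificity = tn / (tn + fp)  # how well the negative class is found
    return sensitivity, specificity, (sensitivity + specificity) / 2

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
sens, spec, bal = class_metrics(y_true, y_pred)
print(sens, spec, bal)  # 0.75 0.75 0.75
```

A balanced accuracy of 1.0 therefore requires both classes to be identified perfectly, which is why the baseline model can be said to be unbiased toward either class.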
Compared to the baseline approach, in the case where the preprocessing steps were removed, the number of epochs taken by the algorithm to reach a perfect classification score was significantly increased (above 150 epochs). This can be easily justified, as the otherwise distant “defective” points (see baseline approach) are now very close to the “good” ones (Figure 6—remove preprocessing), rendering the creation of a linear separation boundary between them challenging, as reflected in the oscillation of specificity. Thus, even though the veracity of the data decreased in some way, the information required for class separation was still embedded in the corresponding features. However, this is not the case for the three clusters indicating different power levels, as they seem to partially “disappear”.
In the same context of manipulating data veracity, the entries otherwise rejected due to faulty ECR measurements were included. These entries can be observed (Figure 6—include ill entries) within the main geometrical structure that the “good” points seem to form. Within the current feature space and type of classifier, this reduction in data veracity has indeed limited the potential for better performance. This is obvious from the oscillation of specificity, which was affected by the random distribution of defective points in the feature space (Figure 6—include ill entries). These points forced the orientation of the decision boundary to split the feature space roughly in half, causing the specificity to oscillate endlessly below a perfect classification score. Eventually, the optimization algorithm was not able to find a global minimum (Figure 7—right).
While data normalization (centering and scaling) is a typical procedure in machine learning to speed up and improve the training procedure, it surely seems to contribute to increasing data veracity. In the context of the first use case, the exclusion of this step resulted in an increase in the overall training time; however, once again, a perfect classification score was achieved after 150 epochs.

4.2. RSW Case: Veracity and Value

Moving on to the next case study, the results regarding the specificity (TNR), sensitivity (TPR), and number of epochs for the gradient to stop changing (Figure 8) for the full-factorial design are presented in the following table (Table 5). True negative rate (TNR) and true positive rate (TPR) metrics were used, as the classes have not been balanced, on purpose; the bias can help in controlling the veracity.
At first glance, removing preprocessing steps, and thus reducing veracity, affects both classification and training time negatively, similarly to the previous use case. By excluding noise cancellation and thermal drift correction, for the majority of cases (F1, F2), the number of training epochs for the gradient to stop changing is nearly doubled (from P1 to P3). In the case of P2, while the classification performance decreased as predicted, reaching a constant gradient value was not possible, as it oscillated endlessly (Figure 9). As such, while this reduction in the veracity factor did not have a significant effect on the classification performance, it inserted uncertainty into the training performance and, consequently, into the time required for the model optimization algorithm to converge.
By increasing the number of features, the required epochs for the model to converge are increased, as more parameters must be fine-tuned. Using more features seems to increase, in general, the classification performance, as expected (Figure 10). Increasing the value gained from the data by introducing a higher number of features, and thus including more of the initial data variance, seems to always lead to better classification performance, with a training-time trade-off. However, as observed, the gains are not the same for P1, P2, and P3. From P1 to P2, the increase in classification performance across F1, F2, and F3 is inferior compared to the transition towards P3. This could be an indication of a relationship between data value and data veracity, meaning that adding value may potentially eliminate the need for data with increased veracity.

5. Discussion

The processing of data leads to the need to characterize datasets with respect to the V-concepts. However, the latter, especially veracity and value, are not defined in a straightforward manner. Processes themselves, moreover, appear to have different geometrical configurations, different defects, and different monitoring approaches, leading to differentiated datasets and diversified data processing and machine learning techniques. This is quite a hindrance to defining the V-concepts in a unified manner.
The different nature of the signals (i.e., photos vs. videos), as well as the various steps of machine learning, namely denoising, feature extraction, and classification (or regression), seem to provide a more systemized way of matching the 7Vs with monitoring characteristics. For both the cases of RSW and laser welding, it can be noted that the veracity and the value are quite case-dependent and are highly coupled to both the performance and the efficiency of the quality monitoring systems.
It has to be noted that, deliberately, all the aforementioned metrics refer only to the second stage of machine learning, the classification itself. In general, the first stage is about extracting the features, which have been taken for granted herein. This is particularly valid in the case of deep learning, where big data would indeed be relevant. Herein, in the second case, the PCA algorithm has been utilized, as it has been proven to work, while in the first one, the extraction of variance is used. This reduces the “computational load”, as for each training sample the amount of information of 32 × 32 pixels × 5000 frames (i.e., up to 5,120,000 values) is reduced to 3 and 10 features for cases 1 and 2, respectively. It is, however, noted that Figure 1 could be extended to incorporate feature extraction as a fourth characteristic of the classification problem. In that case, the technical value of the dataset could be defined as in Equation (1) below. This does not include, of course, the business part of the value.
Value = (true information of the dataset towards successful training) / (size of the dataset), (1)
As far as veracity is concerned, partial metrics per aspect (i.e., usability and trustworthiness) have to be taken into consideration. Taking into account usability in particular, a tentative definition for veracity can be the following (Equation (2)), given that a case refers to a specific change in the characteristics. This is closely related to the concept of robustness [54,55]. The “scenario” refers to the potential transformations applied to the dataset. This could also cover the case of security [56]; however, this exceeds the purposes of the current work.
Veracity = (number of scenarios where the performance is good) / (number of all scenarios), (2)
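Equation (2) can be read operationally as follows; the performance threshold defining a "good" scenario is an assumption introduced only for this illustration.

```python
def veracity(scenario_performances, threshold=0.9):
    """Fraction of dataset-transformation scenarios in which the
    resulting performance is considered good (Equation (2))."""
    good = sum(1 for p in scenario_performances if p >= threshold)
    return good / len(scenario_performances)

# e.g., balanced accuracy under baseline, denoised, drift-corrected,
# and unnormalized scenarios (illustrative values).
print(veracity([1.0, 0.95, 0.72, 0.88]))  # 0.5
```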
This could transform a bit the way that the 7Vs are regarded. As a matter of fact, they can be considered filters that gradually transform the required size of data D. Thus, a roadmap could be formed (Figure 11). The various Vs are shown here as filters, transforming the need for data (green circular disks). They help a company, in a way, respond to critical questions (red rhombuses).
The relationship between the various data sizes can be given algebraically through coefficients η. The business value, however, also depends on other factors, denoted by the letter B, implying the need to change the overall workflow within the company.
Using big data analytics in manufacturing carries significant ethical, social, and environmental implications. On the ethical front, concerns arise over privacy infringements and potential biases in data collection and decision-making, while social implications include job displacement, skills gaps, and potential unrest. In this case, where AI applications are researched for enhancing quality inspection, it is obvious that this will result in automating inspection and thus lowering resource consumption and waste generation. However, the capacity of even simple AI models to encapsulate the decision-making mechanisms of a quality expert poses questions about how their adoption could affect the entire quality control system in the industry.

6. Conclusions and Future Outlook

Based on the previous discussion, a match between the problem parameters and veracity seems possible in the case of classification problems in manufacturing. As a matter of fact, and as expected, reducing data veracity and its correlated factors generally affects the performance of data-driven applications mainly in terms of increased training time, while classification performance is affected mainly by the cause of the reduced data veracity rather than by the reduction itself.
Data value, on the other hand, despite seemingly being highly coupled with veracity, seems to be able to compensate for poor veracity (or vice versa, perhaps). Adding more useful information that holds a larger portion of the data variance can ensure high classification performance, at the cost, however, of a higher training time. This is demonstrated by the dependence on the number of PCs.
Based on the above, when it comes to big data applications, it can be realized that training time will be scaled up. Fine-tuning the data-driven application’s development process on a smaller data portion can save a lot of time, as both poor data veracity and/or value can increase training time as well as the steps for improving them. Defining, however, the value and the veracity of a dataset may differ slightly from case to case, especially in the case of deep learning, where the feature selection is automated. Additionally, in terms of future work, trustworthiness, as a critical characteristic of data, needs to be studied under the concept of creating secure digital twins.

Author Contributions

Conceptualization, A.P. and P.S.; methodology, A.P.; software, K.S.; writing—original draft preparation, A.P. and K.S.; writing—review and editing, P.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Belhadi, A.; Zkik, K.; Cherrafi, A.; Sha’ri, M.Y. Understanding big data analytics for manufacturing processes: Insights from literature review and multiple case studies. Comput. Ind. Eng. 2019, 137, 106099. [Google Scholar] [CrossRef]
  2. Cui, Y.; Kara, S.; Chan, K.C. Manufacturing big data ecosystem: A systematic literature review. Robot. Comput.-Integr. Manuf. 2020, 62, 101861. [Google Scholar] [CrossRef]
  3. Helms, J. Big Data: It’s About Complexity, Not Size. IBM Center for The Business of Government. 2015. Available online: https://www.businessofgovernment.org/blog/big-data-it%E2%80%99s-about-complexity-not-size (accessed on 15 September 2023).
  4. Tunc-Abubakar, T.; Kalkan, A.; Abubakar, A.M. Impact of big data usage on product and process innovation: The role of data diagnosticity. Kybernetes 2022. [Google Scholar] [CrossRef]
  5. Papacharalampopoulos, A.; Michail, C.K.; Stavropoulos, P. Manufacturing resilience and agility through processes digital twin: Design and testing applied in the LPBF case. Procedia CIRP 2021, 103, 164–169. [Google Scholar] [CrossRef]
  6. Kortelainen, H.; Happonen, A.; Hanski, J. From asset provider to knowledge company—Transformation in the digital era. In Asset Intelligence through Integration and Interoperability and Contemporary Vibration Engineering Technologies; Springer: Cham, Switzerland, 2019; pp. 333–341. [Google Scholar]
  7. Alexopoulos, K.; Nikolakis, N.; Chryssolouris, G. Digital twin-driven supervised machine learning for the development of artificial intelligence applications in manufacturing. Int. J. Comput. Integr. Manuf. 2020, 33, 429–439. [Google Scholar] [CrossRef]
  8. Statista Research Department. Big Data—Statistics & Facts. Available online: https://www.statista.com/topics/1464/big-data/ (accessed on 7 April 2022).
  9. Statista Research Department. Advanced and Predictive Analytics Software Revenue Worldwide from 2013 to 2019. Available online: https://www.statista.com/statistics/1172729/advanced-and-predictive-analytics-software-revenue-worldwide/ (accessed on 7 April 2022).
  10. Statista Research Department. Analytics as a Service (AaaS) Market Size Forecast Worldwide in 2018 and 2026. Available online: https://www.statista.com/statistics/1234242/analytics-as-a-service-global-market-size/ (accessed on 7 April 2022).
  11. Stavropoulos, P.; Papacharalampopoulos, A.; Sabatakakis, K.; Mourtzis, D. Quality Monitoring of Manufacturing Processes based on Full Data Utilization. Procedia CIRP 2021, 104, 1656–1661. [Google Scholar] [CrossRef]
  12. Marx, V. The big challenges of big data. Nature 2013, 498, 255–260. [Google Scholar] [CrossRef]
  13. Muniswamaiah, M.; Agerwala, T.; Tappert, C. Big data in cloud computing review and opportunities. arXiv 2019, arXiv:1912.10821. [Google Scholar] [CrossRef]
  14. Tomaz, R.B. Big Data Analytics as a Service: How Can Services Influence Big Data Analytics Capabilities in Small and Mid-Sized Companies? Master’s Thesis, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, 2020. [Google Scholar]
  15. Gandomi, A.; Haider, M. Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manag. 2015, 35, 137–144. [Google Scholar] [CrossRef]
  16. Rubin, V.; Lukoianova, T. Veracity roadmap: Is big data objective, truthful and credible? Adv. Classif. Res. Online 2013, 24, 4. [Google Scholar] [CrossRef]
  17. Reimer, A.P.; Madigan, E.A. Veracity in big data: How good is good enough. Health Inform. J. 2019, 25, 1290–1298. [Google Scholar] [CrossRef] [PubMed]
  18. Cappa, F.; Oriani, R.; Peruffo, E.; McCarthy, I. Big data for creating and capturing value in the digitalized environment: Unpacking the effects of volume, variety, and veracity on firm performance. J. Prod. Innov. Manag. 2021, 38, 49–67. [Google Scholar] [CrossRef]
  19. Stavropoulos, P.; Papacharalampopoulos, A.; Sabatakakis, K. Online Quality Inspection Approach for Submerged Arc Welding (SAW) by Utilizing IR-RGB Multimodal Monitoring and Deep Learning. In Proceedings of the International Conference on Flexible Automation and Intelligent Manufacturing, Detroit, MI, USA, 19–23 June 2022; pp. 160–169. [Google Scholar] [CrossRef]
  20. Segreto, T.; Teti, R. Data quality evaluation for smart multi-sensor process monitoring using data fusion and machine learning algorithms. Prod. Eng. 2023, 17, 197–210. [Google Scholar] [CrossRef]
  21. Stephens, Z.D.; Lee, S.Y.; Faghri, F.; Campbell, R.H.; Zhai, C.; Efron, M.J.; Iyer, R.; Schatz, M.C.; Sinha, S.; Robinson, G.E. Big data: Astronomical or genomical? PLoS Biol. 2015, 13, e1002195. [Google Scholar] [CrossRef]
  22. Rodríguez-Mazahua, L.; Rodríguez-Enríquez, C.A.; Sánchez-Cervantes, J.L.; Cervantes, J.; García-Alcaraz, J.L.; Alor-Hernández, G. A general perspective of Big Data: Applications, tools, challenges and trends. J. Supercomput. 2016, 72, 3073–3113. [Google Scholar] [CrossRef]
  23. IBM. How to Manage Complexity and Realize the Value of Big Data. Available online: https://www.ibm.com/blogs/services/2020/05/28/how-to-manage-complexity-and-realize-the-value-of-big-data/ (accessed on 28 May 2022).
  24. Agrawal, D.; Das, S.; El Abbadi, A. Big data and cloud computing: Current state and future opportunities. In Proceedings of the 14th International Conference on Extending Database Technology, New York, NY, USA, 21–24 March 2011; pp. 530–533. [Google Scholar] [CrossRef]
  25. Mourtzis, D.; Vlachou, E.; Milas, N. Industrial big data as a result of IoT adoption in manufacturing. Procedia CIRP 2016, 55, 290–295. [Google Scholar] [CrossRef]
  26. Rashid, Z.N.; Zebari, S.R.M.; Sharif, K.H.; Jacksi, K. Distributed cloud computing and distributed parallel computing: A review. In Proceedings of the International Conference on Advanced Science and Engineering, Duhok, Iraq, 9–11 October 2018. [Google Scholar]
  27. Vanani, I.R.; Majidian, S. Literature review on big data analytics methods. In Social Media and Machine Learning; IntechOpen: London, UK, 2019. [Google Scholar]
  28. UnnisaBegum, A.; Hussain, M.A.; Shaik, M. Data mining techniques for big data. Int. J. Adv. Res. Sci. Eng. Technol. 2019, 6, 396–399. [Google Scholar]
  29. Drineas, P.; Mahoney, M.W. RandNLA: Randomized numerical linear algebra. Commun. ACM 2016, 59, 80–90. [Google Scholar] [CrossRef]
  30. Wang, L.; Alexander, C.A. Additive manufacturing and big data. Int. J. Math. Eng. Manag. Sci. 2016, 1, 107. [Google Scholar] [CrossRef]
  31. Seyedan, M.; Mafakheri, F. Predictive big data analytics for supply chain demand forecasting: Methods, applications, and research opportunities. J. Big Data 2020, 7, 53. [Google Scholar] [CrossRef]
  32. Kamble, S.S.; Gunasekaran, A. Big data-driven supply chain performance measurement system: A review and framework for implementation. Int. J. Prod. Res. 2020, 58, 65–86. [Google Scholar] [CrossRef]
  33. Qiao, F.; Liu, J.; Ma, Y. Industrial big-data-driven and CPS-based adaptive production scheduling for smart manufacturing. Int. J. Prod. Res. 2021, 59, 7139–7159. [Google Scholar] [CrossRef]
  34. Jieyang, P.; Kimmig, A.; Dongkun, W.; Niu, Z.; Zhi, F.; Jiahai, W.; Liu, X.; Ovtcharova, J. A systematic review of data-driven approaches to fault diagnosis and early warning. J. Intell. Manuf. 2022, 34, 3277–3304. [Google Scholar] [CrossRef]
  35. Shang, C.; You, F. Data analytics and machine learning for smart process manufacturing: Recent advances and perspectives in the big data era. Engineering 2019, 5, 1010–1016. [Google Scholar] [CrossRef]
  36. Tsang, Y.P.; Wu, C.H.; Lin, K.Y.; Tse, Y.K.; Ho, G.T.S.; Lee, C.K.M. Unlocking the power of big data analytics in new product development: An intelligent product design framework in the furniture industry. J. Manuf. Syst. 2022, 62, 777–791. [Google Scholar] [CrossRef]
  37. Fronius International GmbH. Big Data in Welding Technology. Fronius International GmbH 2018. Available online: https://blog.perfectwelding.fronius.com/wp-content/uploads/2018/12/Fronius-PW_Whitepaper_Big-Data_EN-US.pdf (accessed on 14 April 2022).
  38. Ennsbrunner, H. Exploring the Role of Big Data in Welding Technology. Efficient Manufacturing 2019. Available online: https://www.industr.com/en/exploring-the-role-of-big-data-in-welding-technology-2360956 (accessed on 14 April 2022).
  39. Francis, J.; Bian, L. Deep learning for distortion prediction in laser-based additive manufacturing using big data. Manuf. Lett. 2019, 20, 10–14. [Google Scholar] [CrossRef]
  40. Jieyang, P.; Kimmig, A.; Wang, J.; Liu, X.; Niu, Z.; Ovtcharova, J. Dual-stage attention-based long-short-term memory neural networks for energy demand prediction. Energy Build. 2021, 249, 111211. [Google Scholar] [CrossRef]
  41. Zhou, L.; Wang, Ζ.; Luo, Υ.; Xiong, Z. Separability and compactness network for image recognition and superresolution. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3275–3286. [Google Scholar] [CrossRef]
  42. Song, W.; Shi, C.; Xiao, Z.; Duan, Z.; Xu, Y.; Zhang, M.; Tang, J. Autoint: Automatic feature interaction learning via self-attentive neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1161–1170. [Google Scholar] [CrossRef]
  43. Braun, M.; Kellner, L.; Schreiber, S.; Ehlers, S. Prediction of fatigue failure in small-scale butt-welded joints with explainable machine learning. Procedia Struct. Integr. 2022, 38, 182–191. [Google Scholar] [CrossRef]
  44. Beck, D.; Dechent, P.; Junker, M.; Sauer, D.U.; Dubarry, M. Inhomogeneities and Cell-to-Cell Variations in Lithium-Ion Batteries, a Review. Energies 2021, 14, 3276. [Google Scholar] [CrossRef]
  45. Yang, Y.; Pan, L.; Ma, J.; Yang, R.; Zhu, Y.; Yang, Y.; Zhang, L. A high-performance deep learning algorithm for the automated optical inspection of laser welding. Appl. Sci. 2020, 10, 933. [Google Scholar] [CrossRef]
  46. Ying, X. An overview of overfitting and its solutions. J. Phys. Conf. Ser. 2019, 1168, 022022. [Google Scholar] [CrossRef]
  47. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  48. Kumar, R.; Chohan, J.S.; Goyal, R.; Chauhan, P. Impact of process parameters of resistance spot welding on mechanical properties and micro hardness of stainless steel 304 weldments. Int. J. Struct. Integr. 2020, 12, 366–377. [Google Scholar] [CrossRef]
  49. Manladan, S.M.; Yusof, F.; Ramesh, S.; Fadzil, M.; Luo, Z.; Ao, S. A review on resistance spot welding of aluminum alloys. Int. J. Adv. Manuf. Technol. 2017, 90, 605–634. [Google Scholar] [CrossRef]
  50. Xia, Y.J.; Su, Z.W.; Li, Y.B.; Zhou, L.; Shen, Y. Online quantitative evaluation of expulsion in resistance spot welding. J. Manuf. Process. 2019, 46, 34–43. [Google Scholar] [CrossRef]
  51. Stavropoulos, P.; Sabatakakis, K.; Papacharalampopoulos, A.; Mourtzis, D. Infrared (IR) quality assessment of robotized resistance spot welding based on machine learning. Int. J. Adv. Manuf. Technol. 2022, 119, 1785–1806. [Google Scholar] [CrossRef]
  52. Møller, M.F. A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw. 1993, 6, 525–533. [Google Scholar] [CrossRef]
  53. Stavropoulos, P.; Bikas, H.; Sabatakakis, K.; Theoharatos, C.; Grossi, S. Quality assurance of battery laser welding: A data-driven approach. Procedia CIRP 2022, 111, 784–789. [Google Scholar] [CrossRef]
  54. Stavropoulos, P.; Papacharalampopoulos, A.; Michail, C.K.; Chryssolouris, G. Robust additive manufacturing performance through a control oriented digital twin. Metals 2021, 11, 708. [Google Scholar] [CrossRef]
  55. Stavropoulos, P.; Papacharalampopoulos, A.; Sabatakakis, K. Robust and Secure Quality Monitoring for Welding through Platform-as-a-Service: A Resistance and Submerged Arc Welding Study. Machines 2023, 11, 298. [Google Scholar] [CrossRef]
  56. Luo, G.; Yuan, Q.; Li, J.; Wang, S.; Yang, F. Artificial Intelligence Powered Mobile Networks: From Cognition to Decision. arXiv 2021, arXiv:2112.04263. [Google Scholar] [CrossRef]
Figure 1. Quality monitoring and data: The Performance Indicators space.
Figure 2. Laser welding of batteries (configuration on the left) and quality monitoring (data processing on the right).
Figure 3. Resistance spot welding (configuration on the top left) and quality monitoring (data processing on the top-right and detailed procedure below).
Figure 4. Noise cancelation.
Figure 5. Correct thermal drift—Sample data from the second case study.
Figure 6. Decision threshold of the trained models—How changes in data veracity are affecting the distribution of the features (yellow points—acceptable ECR, purple points—unacceptable ECR).
Figure 7. Training progress for the first case study—Baseline (left), ill entries (right).
Figure 8. Training results from the second case study—P1-F1 case.
Figure 9. Gradient during training in the second case study—P2-F1 case.
Figure 10. Second case study—Geometric mean of TPR and TNR.
Figure 11. A roadmap for integrating Vs in metamodeling. Red nodes are critical questions; green nodes are metrics.
Table 1. Definition of the 7Vs.

# | Feature | Definition [14,15]
V1 | Volume | Represents the size of the dataset.
V2 | Velocity | Reflects the speed at which data are collected and analyzed.
V3 | Variety | Stems from the plurality of structured and unstructured data sources, such as text, videos, networks, and graphics, among others.
V4 | Variability | Constant change in the data, e.g., in their rate.
V5 | Veracity | Ensures that the data used are trusted, implying, among other things, security. Herein, the concept of usability, as expressed primarily through uncertainty [16] as well as completeness [17], is addressed.
V6 | Visualization/Verification | Can be described as interpreting the patterns and trends present in the data.
V7 | Value | Represents the extent to which big data generate economically worthwhile insights and benefits through extraction and transformation. The literature already clearly states that value is coupled with veracity [18].
Table 2. The 7Vs for the case of welding quality monitoring.

Feature | Factors
Volume | 1. Process duration; 2. Resolution of the camera; 3. Sampling rate of the camera.
Velocity | Depends on the speed of the process monitoring: how many single frames per second (and how many different files need handling)?
Variety | The monitoring system may not rely on only one camera (single spectrum, single angle) but on multispectral and stereo vision, possibly with different sampling rates.
Variability | Related to potential reuses of the system in different welding configurations, or to the use of the data in multiple control loops with different sampling rates.
Veracity | Depends on the combination of material and camera sensitivity, the thermal drift of the camera, and all the features that favor the introduction of noise, both from the camera itself (e.g., Lambertian source) and from the environment (background radiation).
Visualization/Verification | At a minimum, related to the ease of extracting features that correlate well with the ultimate goal of the application.
Value | Besides technical factors such as monitoring performance, business-wise it could also be linked to explainable machine learning in the case of human-centric manufacturing [43].
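The Volume factors listed in Table 2 combine multiplicatively; a back-of-the-envelope sketch (all numbers are assumed for illustration and are not taken from the case studies):

```python
# Raw size of a thermal-camera stream scales with process duration,
# frame resolution, and sampling rate (all values below are assumed).
duration_s = 2.0            # process duration per weld [s]
width, height = 320, 240    # camera resolution [px]
fps = 1000                  # camera sampling rate [frames/s]
bytes_per_px = 2            # e.g., 16-bit thermal intensity

frames = int(duration_s * fps)
bytes_per_weld = frames * width * height * bytes_per_px
print(f"{frames} frames, {bytes_per_weld / 1e6:.0f} MB per weld")

# Over a production run, the per-weld volume multiplies by the weld count.
welds = 10_000
print(f"{welds * bytes_per_weld / 1e12:.1f} TB for {welds} welds")
```

Even modest assumed parameters place a production run in the terabyte range, which is why Volume alone can push such monitoring data toward "big data" territory.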
Table 3. Case study I—factors that compromise data veracity.

Action | Description
(Removing) Data preprocessing steps | Remove noise cancelation, as already described in the previous step.
(Including) Ill entries | Leave entries that correspond to faulty ECR measurements.
(Removing) Normalization | Do not scale the features.
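The effect of the last action in Table 3 (skipping normalization) can be sketched as follows; the two feature names are hypothetical, not the actual monitored quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
peak_temp = rng.normal(1500.0, 120.0, 100)  # hypothetical peak temperature [K]
cool_rate = rng.normal(0.8, 0.1, 100)       # hypothetical cooling-rate feature
X = np.column_stack([peak_temp, cool_rate])

# Unscaled, the features differ by roughly three orders of magnitude,
# which skews gradient-based training toward the large-scale feature.
scale_ratio = X[:, 0].std() / X[:, 1].std()
print(f"scale ratio before normalization: ~{scale_ratio:.0f}x")

# Min-max scaling restores comparable [0, 1] ranges for both features.
Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print("ranges after scaling:", Xn.min(axis=0), Xn.max(axis=0))
```

In this sense, removing normalization does not corrupt the information itself but degrades how usable it is for training, which is exactly the veracity aspect probed in case study I.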
Table 4. Case study II—Factors affecting the Veracity (grey) and Value (white) of data.

Veracity Code | Veracity Factor | Value Code | Value Factor
P1 | All steps included | F1 | PCA: 3 PCs
P2 | Remove video stabilization | F2 | PCA: 5 PCs
P3 | Remove thermal-drift correction and noise cancelation | F3 | PCA: 10 PCs
Table 5. Case study II—Full factorial design results for data veracity and value.

 | F1 (TPR / TNR / Epochs) | F2 (TPR / TNR / Epochs) | F3 (TPR / TNR / Epochs)
P1 | 0.907 / 0.993 / 35 | 0.926 / 0.993 / 230 | 0.944 / 0.993 / 375
P2 | 0.907 / 0.993 / 200 | 0.907 / 0.993 / 200 | 0.920 / 0.993 / 200
P3 | 0.782 / 0.971 / 75 | 0.873 / 0.986 / 274 | 0.909 / 0.993 / 689
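The geometric mean of TPR and TNR (plotted in Figure 10) summarizes each cell of Table 5 in a single score that penalizes imbalance between the two rates; a short sketch using three of the cells:

```python
import math

# (veracity, value) -> (TPR, TNR), values taken from Table 5
results = {
    ("P1", "F1"): (0.907, 0.993),
    ("P3", "F2"): (0.873, 0.986),
    ("P3", "F3"): (0.909, 0.993),
}

# G-mean = sqrt(TPR * TNR); it drops sharply when either rate is low.
gmeans = {case: math.sqrt(tpr * tnr) for case, (tpr, tnr) in results.items()}
for case, g in gmeans.items():
    print(case, f"G-mean = {g:.3f}")
```

Restoring veracity (P3 to P1) and adding PCs (F1 to F3) both raise the G-mean, consistent with the compensation between value and veracity noted in the conclusions.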
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
