1. Introduction
The recent effort toward meeting dynamically changing market demands has shifted the manufacturing industry from the era of mass production to an era of mass customization, with the main goals of a manufacturing system being the optimization of time, cost, flexibility, and quality [
1]. To meet this challenge, significant focus has been oriented toward the deployment of more “intelligent” systems that meet the product design requirements [
2]. As a result of this shift, various sectors involving welding operations have gained significant traction, as welding is considered a fundamental process for manufacturing more customized products that accommodate a wide range of materials, shapes, and sizes. Additionally, the advent of state-of-the-art technologies such as artificial intelligence (AI), introduced with the emergence of Industry 4.0 [
3], has enabled the implementation of systems and applications capable of addressing challenges, such as the quality inspection of different customized products.
In general, quality inspection is a very important aspect of modern manufacturing practice, as it enables high precision and reliability of the manufactured products. This is particularly evident in the welding sector, which focuses on the fabrication of objects and structures such as ships, automobiles, and even space shuttles [
4]. In these instances, any error prior to or during the welding process can lead to costly component rejections and significant delays, potentially necessitating a complete restart of the manufacturing process or the scrapping of the component altogether. Nevertheless, only limited research efforts have focused on the development of methodologies that provide fast and accurate results regarding the state of the components about to be welded, so that possible rejections can be eliminated.
Throughout the years, many techniques have been proposed to address the task of welding quality inspection, which until the end of the 20th century was conducted by human operators [
5]. These techniques mostly incorporate computer vision capabilities, as they offer a reliable and non-destructive way of inspecting the visual characteristics of welding procedures (
Figure 1). In these cases, the inspection process can be divided into three main categories [
6]:
Pre-welding inspection: Its primary purpose is to verify that all prerequisites for a successful weld are met.
Per-welding inspection: It refers to the ongoing monitoring and assessment of the welding procedure.
Post-welding inspection: It occurs after the completion of the welding process, and it involves an examination of the weld against specified acceptance criteria.
This study primarily focuses on the inspection process prior to welding, which involves estimating the gap characteristics between two components (
Figure 1). To achieve this, various methods incorporate laser vision sensors for computing the relevant seam values [
7]. However, in many high-accuracy applications, this gap between the components falls within the narrow range of 0.1 to 0.5 millimetres, depending mostly on the thickness of the materials. This limitation has driven the research community to develop more complex systems capable of meeting the welding requirements. For instance, in [
8], the authors propose a setup that combines a 2D vision camera with three laser emitters to implement 3D structured light vision techniques for extracting the butt joint characteristics. Another approach can be found in [
9], where only a Charge-Coupled Device (CCD) camera is used. In this work, dedicated image processing techniques are applied sequentially to the input image to calculate the gap width between two parts of various shapes. While these approaches offer satisfactory results, they rely heavily on patterns projected onto the materials or on traditional image processing techniques. Therefore, they are sensitive to environmental factors, such as lighting changes or surface conditions, that may degrade the quality of the input image.
The recent advances in artificial intelligence and particularly in the field of machine learning have brought to the forefront a whole new area of research for developing systems that ensure the quality assessment of products across various production stages. Especially with the establishment of Convolutional Neural Networks (CNNs) in the early 2010s [
10], machine vision systems are able to perform a variety of industry-related tasks automatically and exceed even human perception in certain cases. These breakthroughs had a significant impact on the quality inspection applications, leading to the development of various strategies tailored to address diverse manufacturing scenarios. One notable example is found in [
11], where the authors propose a novel framework for identifying missing or misaligned parts on assembled products. In the context of welding inspection, several CNN architectures have been proposed. Specifically, Ref. [
12] utilizes a network to categorize input images into four distinct classes based on their characteristics. More intricate implementations leverage these networks to tackle more complex tasks beyond classification. These approaches involve object detection [
13] for localizing post-welding defects on images, and semantic segmentation [
14], which focuses on identifying different types of defects, such as welding cracks.
In the field of computer vision, semantic segmentation stands as a fundamental concept that has gained significant attention from the research community over the years. It mainly focuses on associating a label, or category, with every pixel in an input image, resulting in a detailed segmentation map that highlights distinct objects. Unlike traditional image classification or object detection operations, the goal of semantic segmentation is not only to localize or identify the object(s) of interest but also to accurately delineate their boundaries within the image. This unique functionality has made semantic segmentation a favoured choice for applications demanding precise scene understanding, such as autonomous navigation [
15] and medical image analysis [
16]. Consequently, it is considered well suited for the current study, which requires accurately defining the boundaries of a pre-welding gap. However, to turn a CNN, which is designed for image classification, into a model able to achieve per-pixel classification, further alterations are required. To address this challenge, Fully Convolutional Networks (FCNs) were introduced [
17]. These models, as the name suggests, replace the final fully connected layers of a typical CNN with only convolutional and pooling layers. This modification allows them to retain spatial information throughout the entire network and to generate predictions for inputs of arbitrary size, resulting in an output image of the same dimensions as the network’s input, but with the object boundaries estimated. FCNs represent a fundamental technology for modern semantic segmentation applications, and they have enabled the development of more advanced architectures, in terms of speed and accuracy, such as the DeepLabV3+ model originally introduced in [
18].
Although the recent developments have successfully tackled various challenges in the field of quality inspection in manufacturing, there remain areas that require further optimization, particularly in the domain of pre-welding inspection. Automating this task is a complex endeavour, and many production processes still rely on human operators to carry it out. Ongoing research in this field explores deep learning techniques that diverge from the estimation of the gap characteristics and instead focus on different tasks. For example, in [
19], a model is implemented for detecting the continuity of the laser vision sensor’s projected line in order to make a decision regarding the welding, while in [
20], the authors propose a CNN for evaluating the alignment between two components, by inspecting the projected laser pattern offset on top of their surface. In this context, it is evident that there is still a need for an approach that directly estimates the seam characteristics from an input image.
Considering the limitations of the presented methods, the scope of this study is to propose a vision-based framework that incorporates deep learning capabilities for extracting the gap between two parts during the pre-welding phase and calculating its geometric characteristics. The novelty of the proposed work lies in the utilization of a state-of-the-art semantic segmentation network, which can isolate the desired gap area without interference from lighting or surface conditions. The output of this model is further processed to calculate the detected gap’s width and centre. The proposed framework is able to provide real-time gap estimation with an accuracy level of 0.1 mm. The remainder of this paper is organized as follows.
Section 2 provides a description of the functionalities of the discussed framework, while
Section 3 presents the implementation of both the hardware and software configuration. In
Section 4, the case study inspired by the manufacturing industry and the experimental campaign employed to test and validate the proposed framework are presented, together with the results that have been achieved. Finally, the concluding remarks and the potential for future work are outlined.
2. Materials and Methods
This section presents the methodology designed to estimate the gap characteristics between components before welding. A high-level overview of the proposed system is illustrated in
Figure 2, which outlines the main elements, including the vision sensor, processing module, and operator interface. These components work together to capture, analyze, and make informed decisions about the gap characteristics. The hardware configuration of the vision system is detailed in the Method Implementation Section, while the core processing components and their functionalities are discussed in the following subsections.
2.1. Semantic Segmentation Network
As already mentioned in the introduction, the DeepLabV3+ model is adopted in this paper to enable real-time semantic segmentation for pre-welding inspection, as it provides the precision and robustness that such industrial applications require. Unlike traditional image processing techniques, which struggle with varying lighting conditions and complex textures, DeepLabV3+ handles these challenges effectively through its advanced feature extraction mechanisms.
This model essentially combines the encoder–decoder structure commonly found in such networks with various techniques aiming to enhance detection performance. One such technique, frequently employed in this series of models, is the atrous, or dilated, convolution. In typical deep CNNs, conventional convolution and max pooling operations are employed to generate feature maps, which tend to reduce the input’s size and result in sparse feature extraction. While this does not pose a significant problem for classification operations, where spatial information is not critical, it can be detrimental in tasks demanding precise localization, such as semantic segmentation. Atrous convolution is designed to tackle this issue by introducing a parameter called “rate”. This parameter governs the spacing between elements within a convolutional kernel, granting control over the receptive field of each convolutional layer. Therefore, the network is able to examine larger regions of the input feature map without compromising spatial resolution. DeepLabV3+ further improves upon this concept by incorporating atrous separable convolution, which combines two convolutional operations: the described atrous convolution applied in a depthwise manner, followed by a pointwise convolution with 1 × 1 kernels that merges the information across channels.
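For illustration, the following is a minimal PyTorch sketch of an atrous separable convolution block as described above; the layer composition, channel sizes, and rate value are illustrative choices rather than the exact configuration used in DeepLabV3+.

```python
import torch
import torch.nn as nn

class AtrousSeparableConv(nn.Module):
    """Depthwise atrous convolution followed by a pointwise 1 x 1 convolution."""

    def __init__(self, in_ch: int, out_ch: int, rate: int):
        super().__init__()
        # Depthwise part: one dilated 3 x 3 kernel per channel; padding=rate keeps the size
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=rate,
                                   dilation=rate, groups=in_ch, bias=False)
        # Pointwise part: 1 x 1 kernels merging the information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

# A rate of 1 corresponds to a standard 3 x 3 convolution; larger rates widen the receptive field
x = torch.randn(1, 64, 128, 128)
print(AtrousSeparableConv(64, 128, rate=6)(x).shape)  # torch.Size([1, 128, 128, 128])
```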
The model’s encoder primarily consists of a conventional CNN architecture, customized with separable atrous convolution and the Spatial Pyramid Pooling (SPP) module [
21]. This adaptation is intended to shift the model’s focus from classification to the task of semantic segmentation. The SPP module plays a critical role in accurately estimating object masks within the same category, even when they appear at varying scales across the dataset. This is achieved by applying distinct pooling operations to the same feature map and subsequently combining the resulting outputs into a fixed-length vector. Originally, this final feature vector was employed for classification purposes. However, in the case of DeepLabV3+, this module undergoes modification beyond mere pooling. Here, the generated feature maps come from separable atrous convolutions with different rates and are transferred to the decoder, where the pixel class predictions occur. As a result, SPP for semantic segmentation is commonly referred to as Atrous Spatial Pyramid Pooling (ASPP).
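A simplified ASPP module is sketched below, assuming three atrous rates plus an image-level pooling branch; normalization and activation layers are omitted for brevity, and the rate values are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel atrous branches with different rates plus image-level pooling,
    concatenated and fused by a 1 x 1 convolution (normalization layers omitted)."""

    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False)
             for r in rates]
        )
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        size = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Image-level features are upsampled back to the feature-map resolution
        feats.append(F.interpolate(self.image_pool(x), size=size,
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```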
As for the decoder architecture, this is relatively straightforward, and it is primarily focused on enhancing the resolution of the feature maps received from the encoder to provide an output of the same size as the input, leading to an image where each pixel is assigned to a specific category. In the context of DeepLabV3+, the features obtained from the encoder are enlarged by a factor of 4, compensating for the output stride of 16 usually employed in the backbone network. Additionally, a 1 × 1 convolution operation is applied to the lower-level encoder features, reducing their channel dimensions to prevent them from dominating the richer semantic features coming from the ASPP module. The outcome of this convolution is then concatenated with the output of the ASPP module, and the resulting feature maps undergo several 3 × 3 convolution layers for further refinement before being upsampled by a factor of 4. The overall architecture of DeepLabV3+ is displayed in
Figure 3.
By utilizing a semantic segmentation network, the corresponding mask of the desired gap can be obtained in a robust way without being affected by environmental factors such as changes in the lighting and surface conditions. Therefore, given an RGB input, the model can provide an output in which the desired region is isolated from the rest of the image. This output serves as a basis for the next module in which the calculation of the gap characteristics is performed.
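As a minimal sketch of this step, the following function assumes a trained DeepLabV3+ network (built with PyTorch, as detailed in Section 3) and returns a binary gap mask for a single camera frame; the preprocessing and the way the model handle is passed in are illustrative assumptions, not the exact implementation of the study.

```python
import cv2
import numpy as np
import torch

def segment_gap(model: torch.nn.Module, frame_bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask (255 = gap pixel) for a single camera frame.

    `model` is assumed to be the trained DeepLabV3+ network in evaluation mode, and the
    frame dimensions are assumed to be compatible with the network's output stride.
    """
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)   # HWC -> NCHW
    with torch.no_grad():
        logits = model(tensor)                                     # (1, num_classes, H, W)
    mask = logits.argmax(dim=1).squeeze(0).numpy().astype(np.uint8) * 255
    return mask
```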
2.2. Weld Gap Estimation
The analysis of the weld gap characteristics relies on the image generated by the semantic segmentation model, which highlights the relevant area. However, this model does not merely outline the boundaries of this region, which are what this approach requires; its output also contains the filled area inside these boundaries. Therefore, further processing is required to remove the unwanted details. To achieve this, the Canny edge detection technique originally proposed in [
22] is utilized. This technique identifies significant changes in intensity between neighbouring pixels in an image; it first smooths the grey-scaled image with a filter and then performs gradient calculations based on kernel operations. Additionally, area thresholds are employed to eliminate any small edges (noise) that may be introduced by the method. The resultant image displays two continuous lines that essentially define the desired gap boundaries (
Figure 4).
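A possible OpenCV implementation of this step is sketched below; the Canny thresholds and the minimum contour length used to reject noisy edges are illustrative values (a contour-length threshold is used here as a stand-in for the area threshold mentioned above, since Canny edges are thin).

```python
import cv2
import numpy as np

def extract_gap_boundaries(mask: np.ndarray, min_length: float = 100.0):
    """Return the two continuous boundary lines of the segmented gap region.

    `mask` is the grey-scale output of the segmentation model; the Canny thresholds and the
    minimum contour length used to reject noisy edges are illustrative values.
    """
    blurred = cv2.GaussianBlur(mask, (5, 5), 0)        # smoothing before gradient computation
    edges = cv2.Canny(blurred, 50, 150)                # kernel-based gradient edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    # Discard small edge fragments (noise) and keep the two largest remaining contours,
    # which correspond to the two sides of the gap
    contours = [c for c in contours if cv2.arcLength(c, False) > min_length]
    contours = sorted(contours, key=lambda c: cv2.arcLength(c, False), reverse=True)[:2]
    return [c.reshape(-1, 2) for c in contours]        # each as an (N, 2) array of (x, y) points
```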
For estimating the width and centre of the gap, the following steps are employed, as displayed in
Figure 5. Specifically, each continuous boundary line is divided into segments, with each one containing a maximum of 20 points. These segments are assumed to represent small linear sections of the boundaries. A simple linear regression approach is applied to each boundary segment, aiming to determine the line that best fits the set of 20 data points. In general, linear regression assumes that a line can be constructed to represent the provided data, which include both dependent ($y$) and independent ($x$) values. This relationship is typically expressed using the following formula:

$$y = a x + b$$

where $a$ and $b$ are the slope and intercept of that line, respectively. The estimation of the optimal values for these parameters is achieved through a least squares fitting method, which basically seeks the values that minimize the following expression:

$$\sum_{i} \left( y_i - \hat{y}_i \right)^2$$

where $y_i$ are the actual dependent values of the point dataset, and $\hat{y}_i$ are the corresponding predicted values given the independent values $x_i$.
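The segment-wise fitting described above can be sketched as follows, where np.polyfit performs the least squares estimation of $a$ and $b$ for each group of at most 20 boundary points; the assumption that the boundaries run roughly horizontally in image coordinates is an illustrative simplification.

```python
import numpy as np

def fit_segments(boundary: np.ndarray, seg_len: int = 20):
    """Split a boundary (an (N, 2) array of x, y pixel coordinates) into segments of at most
    `seg_len` points and fit a line y = a*x + b to each one by least squares.

    Assumes the boundary runs roughly horizontally in image coordinates; for near-vertical
    boundaries the axes would have to be swapped before fitting.
    """
    segments = []
    for start in range(0, len(boundary), seg_len):
        pts = boundary[start:start + seg_len]
        if len(pts) < 2:
            continue
        x, y = pts[:, 0].astype(float), pts[:, 1].astype(float)
        a, b = np.polyfit(x, y, deg=1)       # minimizes the sum of squared residuals
        segments.append({"slope": a, "intercept": b, "midpoint": pts.mean(axis=0)})
    return segments
```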
To calculate the width and centre of the gap for each segment, two factors are considered:
Firstly, the lines of the other boundary that are nearly parallel to the line of the currently examined segment are obtained by comparing the corresponding slopes with a predefined threshold.
Secondly, from these lines, the one with the minimum Euclidean distance is selected as the corresponding line of the other boundary.
The calculation of the distance relies on utilizing the midpoints to indicate the gap’s width in that specific segment. The formula for this calculation is given as follows:

$$d = \sqrt{\left(x_{M_1} - x_{M_2}\right)^2 + \left(y_{M_1} - y_{M_2}\right)^2}$$

with $M_1 = (x_{M_1}, y_{M_1})$ and $M_2 = (x_{M_2}, y_{M_2})$ denoting the midpoints of the final chosen line segments from the two boundaries. As for the centre of the gap of that area, this is selected as the midpoint of the vertical line that connects the two segments at their midpoints. The process is repeated for every other segment until the entire region defined by the boundaries is examined. To obtain a single value for the total gap width, the mean value of the calculated widths across the whole image is considered.
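Assuming the segment representation produced in the previous step, the pairing and width/centre computation described above can be sketched as follows; the slope tolerance is an illustrative value, and the widths are expressed in pixels before the pixel-to-millimetre conversion discussed in Section 3.

```python
import numpy as np

def estimate_gap(segments_a, segments_b, slope_tol: float = 0.1):
    """Pair every segment of one boundary with the closest nearly parallel segment of the
    other boundary and derive the per-segment gap width and centre (widths in pixels)."""
    widths, centres = [], []
    for seg in segments_a:
        # Step 1: keep only segments of the other boundary whose slope is close to this one
        parallel = [s for s in segments_b if abs(s["slope"] - seg["slope"]) < slope_tol]
        if not parallel:
            continue
        # Step 2: of these, select the segment whose midpoint is nearest (Euclidean distance)
        nearest = min(parallel,
                      key=lambda s: np.linalg.norm(s["midpoint"] - seg["midpoint"]))
        widths.append(float(np.linalg.norm(nearest["midpoint"] - seg["midpoint"])))
        centres.append((nearest["midpoint"] + seg["midpoint"]) / 2.0)
    # The total gap width is reported as the mean of the per-segment widths
    total_width = float(np.mean(widths)) if widths else float("nan")
    return total_width, centres
```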
2.3. Control User Interface
To enable seamless communication between the inspection system and the welding operator, a dedicated User Interface (UI) has been designed and implemented, allowing the operator to monitor and control the pre-welding inspection operation in real time (
Figure 6). The UI is designed with a simple and intuitive layout to ensure accessibility for operators without requiring extensive training. Through the UI, the operator can initiate the vision system, adjust inspection parameters, and receive visual feedback on the detected gap characteristics. More specifically, several key functionalities are provided through the UI, such as the following:
Process Initiation and Termination: Operators can start or stop the inspection process with a single button press.
Real-Time Visualization: The interface displays both the original camera feed and the processed segmentation output, allowing operators to verify the detected welding gap in real time.
Measurement Indication: A numerical display shows the estimated gap width, with a colour-coded warning system—turning red if the gap exceeds acceptable tolerances, and green if it is within limits.
Threshold Adjustment: A slider control enables operators to define the acceptable gap range, customizing the system for different welding requirements.
Error Messaging: In cases where the system detects an issue (e.g., excessive gap width or poor image quality), a notification message appears, suggesting corrective actions to the operator.
A summary of the functionalities provided by the UI is presented in
Table 1.
The main role of the UI is to enable communication between the human operator (client) and the backend server that has been developed to serve as the central hub for processing. The implemented server receives requests from the operator through the UI’s frontend environment, conducts the quality inspection accordingly, and returns responses based on the inspection outcome (
Figure 7). In addition to the response returned to the operator by the server via the UI, dedicated signals are generated that can be used to trigger external indicators and/or be communicated directly to the robot controller to govern the trajectory execution according to the inspection outcome.
3. Method Implementation
This section describes the implementation of the architecture of the proposed system, which is displayed in
Figure 8. The overall method can be divided into two parts. The first involves building the deep learning model, which serves the purpose of isolating the desired gap. This entails activities such as dataset collection and augmentation, as well as the training and validation of the network. The second part refers to the integration of the trained model into real-time operations and encompasses various processes aimed at optimizing the control and visualization of the inspection procedure. Detailed descriptions of the fundamental components of the framework’s implementation as well as its hardware configuration are provided in the following subsections.
3.1. Model Training and Validation
As already mentioned, the proposed DeepLabV3+ model for accomplishing semantic segmentation relies on CNNs, making it a supervised approach. Therefore, a training process is essential for enabling the network to produce satisfactory outcomes. This training procedure requires the collection and the annotation of a dedicated dataset tailored to the specific application at hand. For the purposes of this study, 800 training images featuring diverse welding gaps were manually collected and annotated. The dataset encompasses gaps of varying dimensions, exposed under various lighting conditions, to enhance the model’s ability to generalize effectively (
Figure 9). Specifically, the dataset includes welding gaps ranging from 0.1 mm to 0.5 mm and multiple environmental lighting conditions (low, medium, and high intensities) to improve robustness.
Given the limited number of images, additional augmentation techniques were employed, using Python 3.8, to increase the dataset diversity and further improve the network’s adaptability. These methods involve the random cropping, rotation, or translation of the input images as well as colour transformations such as gamma or HSV adjustment. These augmentation techniques ensure that the model is exposed to a wide range of real-world conditions, reducing the risk of overfitting and improving its generalization capabilities. Additionally, dedicated Python modules, such as OpenCV, were utilized for data visualization and preprocessing to enable the successful operation of the training process.
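The augmentation pipeline is not specified beyond the operations listed above; a possible sketch using the Albumentations library (an assumption, since the text only mentions Python and OpenCV) applies the same geometric transforms to both the image and its annotation mask, with illustrative parameter values.

```python
import albumentations as A   # assumed augmentation library; not named in the text
import cv2

# Each transform mirrors an augmentation mentioned above; all parameter values are illustrative
augment = A.Compose([
    A.RandomCrop(height=960, width=1216, p=0.5),                 # random cropping
    A.Rotate(limit=15, p=0.5),                                   # random rotation
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.0,
                       rotate_limit=0, p=0.5),                   # random translation
    A.RandomGamma(gamma_limit=(80, 120), p=0.5),                 # gamma adjustment
    A.HueSaturationValue(p=0.5),                                 # HSV adjustment
])

image = cv2.imread("weld_gap_sample.png")                        # hypothetical file names
mask = cv2.imread("weld_gap_sample_mask.png", cv2.IMREAD_GRAYSCALE)
augmented = augment(image=image, mask=mask)                      # geometric ops applied to both
aug_image, aug_mask = augmented["image"], augmented["mask"]
```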
The semantic segmentation network was built and tested using PyTorch 2.1, a Python-based machine learning framework dedicated to implementing, training, and validating machine learning models. To enhance the training process, numerous experiments were conducted to fine-tune the model’s hyperparameters with the aim of minimizing the loss function for this specific application. The final model, used for obtaining real-time results, was trained for 200 epochs. The chosen loss function is the categorical cross-entropy, which operates at the pixel level to effectively measure the dissimilarity between predicted pixel-wise class probabilities and ground-truth labels.
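A minimal training-loop sketch consistent with this description is shown below; the batch size, optimizer, and learning rate are assumptions, while the 200 epochs and the pixel-wise categorical cross-entropy follow the text.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def train(model: torch.nn.Module, train_dataset: Dataset, epochs: int = 200) -> torch.nn.Module:
    """Train the segmentation model with pixel-wise categorical cross-entropy.

    `train_dataset` is assumed to yield (image, mask) pairs, with masks holding per-pixel
    class indices; the batch size, optimizer, and learning rate are illustrative choices.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
    criterion = torch.nn.CrossEntropyLoss()                # operates on per-pixel class logits
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(epochs):                            # 200 epochs, as reported above
        model.train()
        for images, masks in loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            logits = model(images)                         # shape (N, num_classes, H, W)
            loss = criterion(logits, masks.long())
            loss.backward()
            optimizer.step()
    return model
```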
Following each training epoch, a validation process was conducted, involving the evaluation of the model on an unseen dataset consisting of 200 images. This step is used to assess the model’s performance throughout training and to prevent potential problems such as overfitting the training data. The metric employed for this evaluation was the mean Intersection over Union (m-IOU), which is widely used in semantic segmentation tasks; it calculates the average IOU between the predicted and ground-truth masks, providing an indication of the model’s performance on data that the neural network has not used for its training. The trained model achieved an m-IOU score of 0.95, meaning that its predicted gap masks overlapped the ground-truth masks by 95% on average across the validation dataset (
Figure 10), suggesting that the dataset used for the model training is sufficiently diverse for this specific application.
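For reference, the computation underlying this metric can be expressed as follows; this is a generic sketch of the m-IOU calculation, not the exact evaluation code used in the study.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 2) -> float:
    """Mean Intersection over Union between predicted and ground-truth label maps."""
    ious = []
    for cls in range(num_classes):
        intersection = np.logical_and(pred == cls, target == cls).sum()
        union = np.logical_or(pred == cls, target == cls).sum()
        if union > 0:                        # ignore classes absent from both maps
            ious.append(intersection / union)
    return float(np.mean(ious))

# Perfect agreement on a tiny 2 x 2 label map yields an m-IOU of 1.0
print(mean_iou(np.array([[0, 1], [0, 1]]), np.array([[0, 1], [0, 1]])))
```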
3.2. Real-Time Operation
This section provides an overview of the implementation of the real-time operation of the proposed system. As previously mentioned, during this phase, the trained semantic segmentation model is initialized and generates an output image that is forwarded to the module responsible for calculating the welding gap characteristics. The different techniques employed for further processing the output image, as well as for building the backend of the proposed UI, were implemented using Python. For detecting the desired gap boundaries, the OpenCV framework’s built-in Canny function was employed, facilitating edge detection. Following this, contours were extracted from the Canny image, effectively breaking down these boundaries into separate data structures, each corresponding to one side of the gap. An iterative approach, as discussed in
Section 2, was then pursued to segment each boundary into smaller lines. The calculation of these lines, along with the estimation of the corresponding distances and gap centres, was accomplished through implemented Python functions designed to facilitate fundamental Euclidean geometry operations. Finally, to transform the pixel information into the corresponding millimetres, a camera calibration procedure was conducted to calculate the intrinsic parameters that allow pixel-to-world coordinate mapping.
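A typical OpenCV calibration and pixel-to-millimetre conversion of the kind described here is sketched below; the chessboard geometry, image paths, and working distance are illustrative assumptions, not the calibration settings used in the study.

```python
import glob
import cv2
import numpy as np

# Chessboard-based intrinsic calibration with OpenCV; the pattern geometry, image folder,
# and working distance below are illustrative assumptions.
pattern = (9, 6)                 # inner corners of the calibration target
square_mm = 2.0                  # size of one chessboard square in millimetres
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calibration/*.png"):                # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        image_size = gray.shape[::-1]                      # (width, height)

_, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, image_size, None, None)

# With a fixed camera-to-part distance Z (in mm), the pinhole model gives mm per pixel = Z / fx,
# which converts a gap width measured in pixels into millimetres.
Z_mm = 250.0                                               # assumed working distance
mm_per_pixel = Z_mm / K[0, 0]                              # fx is the (0, 0) entry of K
gap_width_mm = mm_per_pixel * 40.0                         # e.g. a gap measured as 40 px wide
```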
The proposed UI’s backend was constructed using Flask, a Python framework specifically designed for creating web applications and web services. This module’s primary function is to initiate the vision sensor and support the various computer vision techniques intended to deliver real-time outputs to the frontend for monitoring. These outputs primarily include two streams: the original feed of the camera and the outcomes of the quality inspection. To accomplish this, a communication channel was established between the backend and the frontend to facilitate data exchange, ensuring the publication of the desired information in the respective fields and granting the operator control over the entire process. This was achieved through the utilization of Server Sent Events (SSEs) that provide a straightforward way to establish a unidirectional, real-time communication from the server to the browser. On the frontend side, HTML and CSS were employed to create an operator-friendly interface for the UI. The different functions offered by the displayed buttons, as well as the implementation of SSEs, were executed using JavaScript.
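A minimal Flask/SSE sketch of this backend pattern is given below; the endpoint name, update rate, payload fields, and the placeholder function that supplies the latest measurement are all hypothetical.

```python
import json
import time
from flask import Flask, Response, stream_with_context

app = Flask(__name__)

def read_latest_width() -> float:
    """Placeholder for the vision pipeline's most recent gap-width estimate (mm)."""
    return 0.25

def inspection_events():
    """Yield the latest inspection result as a Server-Sent Event roughly 20 times per second."""
    while True:
        width = read_latest_width()
        payload = {"gap_width_mm": width,
                   "status": "accepted" if width <= 0.4 else "rejected"}
        yield f"data: {json.dumps(payload)}\n\n"            # SSE message framing
        time.sleep(0.05)

@app.route("/inspection/stream")
def stream():
    # The text/event-stream MIME type keeps the connection open for server-to-browser push
    return Response(stream_with_context(inspection_events()), mimetype="text/event-stream")
```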
3.3. Hardware Configuration
It is evident that the effectiveness of the proposed system relies heavily on the hardware configuration of the vision sensor, as this component serves as the primary data source for all employed techniques. Typically, in laser welding operations, ensuring a high-quality weld requires maintaining a gap of approximately 10% of the thickness of the parts [
23]. Therefore, within the scope of the present research, the tolerances that allow seamless welding range from 0.1 to 0.4 millimetres. These gaps are imperceptible to the naked eye, and their inspection by an operator necessitates specialized gap-measurement tools and significant experience. Consequently, a standard camera would not be suitable for this purpose, as it would fail to capture the essential information required by the implemented algorithms.
The challenge in this context is to establish a vision setup capable of delivering high-quality images with sufficient magnification while also maintaining a minimum frame rate of 20 frames per second (fps) to allow real-time functionality. To achieve this, a specialized industrial camera was employed, equipped with a magnifying lens capable of capturing gaps as narrow as 0.05 millimetres. Specifically, the Basler acA2440-20gc GigE camera, manufactured by Basler AG (Ahrensburg, Germany) and featuring the Sony IMX264 CMOS sensor produced by Sony Semiconductor Solutions Corporation (Kanagawa, Japan), is utilized; it provides high-resolution, 5.0-megapixel images at a frame rate of 23 fps. For the lens component, the Ricoh FL-BC7528 model, manufactured by Ricoh Imaging Company, Ltd. (Tokyo, Japan), was selected due to its exceptional ability to achieve maximum magnification, which is critical for the proposed research. This lens has a focal length of 75 millimetres and operates at a minimum distance of 0.25 metres. Additionally, an external light source was integrated with the vision sensor to ensure that the captured images remain unaffected by variations in environmental lighting conditions. For the development and training of the proposed model, an RTX 3060 12 GB GPU was utilized, which enabled rapid and parallel model training while also facilitating smooth real-time operation.
4. Case Study and Results
4.1. Overview
In this section, the implementation, testing, and validation of the envisioned system are detailed within the context of an actual industrial case study. More specifically, the operational system’s assessment took place in a laser welding scenario inspired by the industrial module manufacturing industry, where various modules and systems are manufactured and assembled in small to medium batches.
In this context, the manufactured parts, comprising a module or components of it, are gathered in the laser welding area, where a robot equipped with a laser head carries out the welding. During the pre-welding phase, the various steel components are positioned and held together on dedicated fixtures by human operators. Then, the jig–parts assembly is transferred to the laser welding cell to be welded by the robot. Thus, any faults during the assembly of the parts on the jig can lead to faulty welding and, therefore, to the rejection of the whole component.
To address this issue, the developed vision-based inspection system was integrated into the workflow to provide real-time, automated verification of pre-welding conditions. The system is designed to inspect the positioning of components before the welding process begins, identifying any excessive gaps that could compromise weld integrity. This inspection step is critical given the tight tolerances required for successful laser welding, depending on the material thickness and welding type. In butt welds (welds between two flat surfaces), which are the most common type of welds in manufacturing and the current focus of this case study, the acceptable root gap ranges from 0.1 to 0.4 mm (
Figure 11).
The envisioned workflow involves the welding robot executing a pre-welding dry-run trajectory along the intended seam path, with the proposed vision system capturing and analyzing the gap dimensions in real time. This approach ensures that misaligned parts are detected before welding, reducing the risk of defective welds and minimizing scrap rates (
Figure 12).
Since the robot is programmed by the human operator to have a constant distance from the gap in the
z-axis, it can be assumed that the gap will always be at the centre of the vision sensor’s field of view. Therefore, to simulate the described robot behaviour in the presented case study, pre-industrial testing and validation of the implemented system were carried out. Specifically, a testbed was designed and deployed in a laboratory environment, as depicted in
Figure 13. Under this setup, the camera was fixed and the parts to be inspected moved through its field of view on a controlled transportation system, simulating the movement of the welding robot across the fixture where the parts to be welded would be positioned. For the scenario demonstrated in the laboratory setup, some indicative parts of the examined frame, which were to be assembled on the fixture and then welded, were used.
Implementing inline quality control in laser welding offers significant advantages, primarily by detecting gaps before the welding process begins and promptly alerting the operator to rectify any faulty assemblies. This initiative serves the dual purpose of reducing the occurrence of defective welds caused by unnoticed gaps and, consequently, lowering the number of rejected parts. Additionally, such a system empowers human operators to recognize recurring error patterns that may give rise to these gaps during the assembly process, enabling them to provide more precise instructions to those handling the assembly. Consequently, the likelihood of errors during the assembly phase is diminished, ultimately enhancing the efficiency of the production line.
4.2. Results
To validate the implemented system in a real-world scenario, a series of tests was conducted within the proposed pre-industrial setup. These tests focused on evaluating the system’s consistency and precision while considering two primary factors: variations in scene illumination and changes in the velocity at which the gap passes through the vision sensor’s field of view. In the first test, the lighting conditions were manipulated using four distinct levels of light intensity, as illustrated in
Figure 14.
To evaluate the performance of the system under these four different lighting conditions, four pre-measured gaps—each confirmed to fall within the acceptable range of 0.1 to 0.4 mm—were used as test cases, and the objective was to determine whether the system could consistently classify the gaps as Accepted or Rejected under varying lighting intensities. The system’s decision was compared against the known ground truth (the real gap was pre-measured each time using a certified measuring instrument), with the expected outcome being “Accepted” for all cases if the system functioned correctly under optimal conditions. The results of this evaluation are presented in
Table 2, where the effect of different lighting intensities on system accuracy is presented. In this table, the columns represent the width of the gap, while the rows correspond to variations in illumination.
The results indicate that the system encounters challenges in accurately estimating gap dimensions under near-zero light intensities. Specifically, under no light, all gaps were classified as rejected, demonstrating the system’s dependency on sufficient illumination to function correctly. Under low light, the system successfully identified the gaps for the mid-range threshold values (0.2 mm and 0.3 mm) but failed at the extreme values (0.1 mm and 0.4 mm). This limitation arises because the semantic segmentation network fails to generate precise masks when the gap features are less visible. This is particularly true for conditions with no lighting, where the gap remains invisible even to the human eye. However, under medium to high illumination, the conditions under which the model was mostly trained, the system demonstrates exceptional detection accuracy across a range of gap acceptance thresholds. The primary reason for this degradation is the model’s dependence on contrast between the weld gap and the surrounding material. Inadequate lighting reduces the visibility of edge boundaries, causing the segmentation network to misclassify pixels. This is a known limitation of deep learning-based vision systems, especially when the training data are biased toward well-lit conditions, leading to poor generalization in these scenarios. These findings highlight the importance of appropriate lighting conditions for accurate gap detection and underscore the system’s limitations in environments with poor illumination.
Regarding the second test, two components that form a predefined gap (accurately measured using a certified measuring instrument) are passed through the sensor’s field of view at five different velocities provided by the controller of the transportation system. To calculate how the velocity affects the accuracy of the quality inspection module, the mean and standard deviation values of the calculated gaps for each frame are obtained (
Table 3). Based on these values, a statistical analysis is carried out to further understand the system’s outcomes.
Specifically, the correlation coefficient $r$ is calculated to measure the strength and direction of the relationship between the velocity and the gap width values. The results showcase a positive correlation coefficient, which means that as the velocity increases, the gap width estimated by the vision system also increases. This is most visible in the standard deviation correlation diagram, which, in comparison to the mean value, displays a strong correlation value of 0.75, meaning that the deviation of the gap measurements (in millimetres) around the mean value increases significantly with changes in the velocity (
Figure 15), indicating that a balance needs to be struck between the speed at which the robot can move during the inspection process and the gap measurement accuracy.
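For clarity, the correlation coefficient can be computed as shown below; the velocity and standard deviation values are placeholders used only to illustrate the calculation and are not the experimental data of Table 3.

```python
import numpy as np

# Placeholder measurements (not the data of Table 3): transport velocity versus the standard
# deviation of the estimated gap width at each velocity, used only to show how r is computed.
velocity_mm_s = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
width_std_mm = np.array([0.010, 0.014, 0.017, 0.022, 0.025])

# Pearson correlation coefficient between the velocity and the spread of the width estimates
r = np.corrcoef(velocity_mm_s, width_std_mm)[0, 1]
print(f"correlation coefficient r = {r:.2f}")
```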
5. Discussion
This research focuses on the deployment of an optical system designed to evaluate the gap between two components and determine whether they meet the necessary criteria for a successful welding procedure. The core processing of this system lies in the employment of a state-of-the-art semantic segmentation network, which can estimate and isolate the targeted region within an input image. Subsequently, the information generated by this network undergoes additional processing through conventional computer vision methods and fundamental Euclidean geometry operations that extract the gap’s width and centre, ultimately enabling a decision regarding the welding operation. In addition to the technical aspects, an operator-controlled UI is proposed, which serves as a tool for supervising and initiating the entire quality inspection process. This UI offers essential functionalities, providing real-time updates on the status of the inspection. The practical implementation of the proposed method demonstrates its potential for seamless, real-time operation with high precision, achieving an accuracy level of 0.1 mm. Furthermore, the testing of the system in a real-world case study derived from the laser welding domain showcases its ability to optimize the overall process by reducing the scrap rate that currently results from the absence of pre-welding inspections.
Although the proposed model demonstrates high accuracy in detecting and segmenting welding gaps, its generalization to different welding scenarios requires further consideration. The model has been trained primarily on the butt weld gaps of sheet metal components, which may limit its performance when applied to different joint geometries (e.g., fillet welds, lap joints) or materials with significantly different surface properties (e.g., aluminium, titanium). Differences in material reflectivity, thickness, and edge irregularities could introduce variations in the captured image quality, potentially affecting segmentation accuracy.
Future work could involve the further optimization of the deep semantic segmentation network by expanding the training dataset to include gaps of diverse shapes and materials under various welding setups. This broader dataset would render the network more versatile and suitable for a wider variety of welding operations. Additional training performance metrics, such as Precision, Recall, and F1-score, could also be utilized to provide a more comprehensive assessment of the model’s robustness across different welding scenarios. Furthermore, the direct integration of the implemented system with a welding robot to enable online control of its behaviour based on the inspection outcome could also be explored. This could significantly enhance the system’s automation capabilities by enabling the control of the welding process based on the extracted gap characteristics.