1. Introduction
With the recent advancement of automation and AI (artificial intelligence) across industries, these technologies have become essential for enhancing productivity and reducing manufacturing costs. In the manufacturing process, tasks such as defect detection, quality control, and inventory management must be carried out in real time by automated systems, where high accuracy and efficiency are essential [1]. Manufacturing sectors that adopt a multi-product flexible production system must be able to produce a wide variety of products, which demands precise adjustments whenever environmental conditions change. Given the nature of the manufacturing industry, new data must be collected and models retrained whenever a new product is introduced or environmental conditions shift, leading to significant costs in technology and human resources. Computer vision addresses these challenges by detecting defects through image analysis and performing precise quality inspections, making it well suited to flexible, multi-product production. In particular, image-comparison-based computer vision can adapt to environmental changes without new data collection or model retraining, allowing manufacturers that operate multi-product flexible production systems to respond quickly to new product launches or changes in environmental conditions. Using these capabilities, computer vision technology improves both precision and productivity through rapid, precise quality inspection of product images, while also saving technological and human resources [2,3,4]. Key techniques include the SSIM (structural similarity index measure), MSE (mean squared error), and PSNR (peak signal-to-noise ratio), which are mainly used to evaluate structural similarity and pixel differences between images [5,6,7]. Notably, SSIM has been shown to be effective in resolving ambiguity when tracking multiple objects, making it useful not only for quality assessment but also for tasks requiring precise object recognition [8].
However, several challenges must be addressed to build such a system. First, pixel-aligned metrics such as SSIM, PSNR, and MSE are not robust to image rotation or positional changes: if the angle or apparent size of the product varies between images, the metric values change even when the product itself has not, so defects may not be detected accurately. Second, load-cell-based weight measurement systems are prone to counting errors caused by limited resolution. For instance, if a load cell reports weight in 500 g increments, overcounting or undercounting may occur when the actual weight does not align with these increments. Third, errors can arise when workers inadvertently add their own weight to the load cell while placing products. If a worker briefly steps onto the load cell while placing a product, the system registers the combined weight of the worker and the product, resulting in an incorrect count. This lowers the reliability of the load cell measurement system and disrupts inventory management. Left unaddressed, these issues seriously disrupt real-time counting systems, and the accuracy of inventory management and quality control cannot be assured. In particular, when the weights of people and products cannot be differentiated, inventory calculations become inaccurate, reducing the overall efficiency of the manufacturing process.
This study aims to improve the automation and accuracy of defect detection, quality control, and inventory management by integrating SIFT (scale-invariant feature transform)-based defect detection, a real-time counting correction system using YOLOv8 (You Only Look Once version 8) Pose, and a precision counting mechanism based on difference images. The system is designed to resolve the consistency, time, and labor cost issues caused by manual inspection in large-scale manufacturing environments. Products are inspected with camera-based computer vision, and if a product's measured values deviate from those of the reference (standard) product, the system notifies the operator so that the product can be reprocessed. Reprocessed products are re-examined with the difference image technique to ensure that they meet quality standards. In tests, the system achieved high accuracy in defect detection and quality control, reducing human error and significantly improving the efficiency of the overall manufacturing process.
Figure 1 presents a schematic diagram of the entire system. When a product is placed on the conveyor belt, a camera installed above the belt detects defects and assesses uniformity, after which accepted products proceed to the counter. At this stage, Camera 2 identifies the products, determines their quantity through difference images, and computes the average (per-product) weight. Weight measurement and counting are then performed to ensure accurate inventory management. To maximize efficiency in defect detection and quality control, this study used a high-speed camera with a variable focus function, 1280 × 720 resolution, and a frame rate of 120 fps (frames per second). This high frame rate allows even minor defects to be detected on products moving rapidly along the conveyor belt. The variable focus function lets the camera adapt to various product sizes and shapes, enabling stable defect detection without additional focus adjustments. The camera was installed approximately 50 cm above the conveyor belt and oriented at 90 degrees to the belt surface (looking straight down) to capture detailed images of the product surfaces. This configuration provides a standardized setup that can be reliably applied across diverse manufacturing processes. Additionally, the lighting system illuminates the inspection area from both sides to optimize the brightness and uniformity of illumination.
This process addresses the issues mentioned above; specifically, this study proposes three improved technologies. First, we use the SIFT algorithm to compare images and detect defects in a way that remains robust to rotation and size changes in product images. SIFT extracts feature points from an image, enabling accurate similarity analysis regardless of rotation or scale [9,10,11,12,13]. Recent studies show that SIFT remains effective in modern applications, helped by continued improvements in memory storage. For instance, advances in data compression allow two consecutive 4-bit values (a nibble pair) to be stored in a single byte, halving memory usage without causing alignment issues; this bit-level improvement supports faster comparative analysis while preserving storage efficiency and matching accuracy [14]. After the product's size and orientation are corrected using SIFT, product defects and uniformity are determined by combining the SSIM, PSNR, and MSE values in a weighted scoring formula and by applying the difference image technique. Second, we introduce the YOLOv8 Pose algorithm to build a system that corrects counting errors in real time whenever an operator is detected on the load cell. This solution ensures accurate product counting by temporarily pausing weight measurement when an operator steps onto or approaches the load cell. Third, we develop an accurate product counting method that uses the difference image technique to address errors caused by the load cell's resolution limitations. During initial setup, a specific number of products is placed on the load cell and detected through difference images, and the unit weight is then calculated. The product count is subsequently derived from the total weight. This approach minimizes counting errors arising from product placement or external factors.
The remainder of this paper is structured as follows. Section 1 provides an introduction, detailing the study's purpose and scope. Section 2 reviews related research, focusing on advancements in computer-vision-based defect detection, body motion detection, and automated inventory management. Section 3 outlines the methods proposed in this paper for product defect detection, uniformity assessment, worker recognition, precision counting, and inventory management. Section 4 presents the experimental setup and results, analyzing the performance of the proposed techniques. Finally, Section 5 concludes the paper with a summary of the findings and recommendations for future research directions.
2. Related Works
2.1. Computer-Vision-Based Defect Detection
2.1.1. Limitations of Deep Learning Techniques Based on Image Classification
Recently, deep learning technology has made great progress and is being used in various fields. In particular, deep learning models such as CNNs (convolutional neural networks) and RNNs (recurrent neural networks) are attracting attention in image classification. Additionally, newer architectures such as the vision transformer (ViT) expand the possibilities of image processing, achieving performance comparable to that of CNNs. However, although these approaches perform well, they still have several limitations.
First, there is the class imbalance problem. In many datasets, the imbalance between majority and minority classes degrades the performance of deep learning models [15]. In particular, when class imbalance is severe, features of the minority classes are not learned properly, and the model tends to perform well only on the majority classes [16]. A recent study analyzed the performance degradation of deep learning models across various class imbalance problems and found that the larger the imbalance, the more model accuracy tends to deteriorate [17,18].
To solve this problem, one can either undersample the majority-class data with random sampling to match the size of the minority class or oversample the minority-class data, for example with a generative adversarial network (GAN). As shown in Figure 2, the undersampling process (left) reduces the number of samples in the majority class (Class A) to match the sample size of the minority class (Class B). This technique balances the dataset but may cause information loss, as potentially valuable majority-class data are discarded. Conversely, the oversampling process (right) increases the sample size of the minority class by duplicating existing data or generating synthetic samples. This approach preserves all the majority-class data while balancing the dataset. GANs have been widely used to generate realistic synthetic data for the minority class, improving model performance by effectively addressing class imbalance. However, even with these methods, achieving optimal performance remains challenging because of the risk of overfitting to oversampled data and the difficulty of accurately measuring model performance [19].
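The two rebalancing strategies illustrated in Figure 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming labelled samples held in plain lists; the function names are hypothetical, and the oversampling shown here simply duplicates minority samples rather than generating synthetic ones with a GAN.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_undersample(samples, labels, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    n_min = min(len(group) for group in groups.values())
    balanced = []
    for label, group in groups.items():
        balanced.extend((sample, label) for sample in rng.sample(group, n_min))
    return balanced


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    n_max = max(len(group) for group in groups.values())
    balanced = []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(n_max - len(group))]
        balanced.extend((sample, label) for sample in group + extra)
    return balanced


# Class A (majority) has 8 samples, class B (minority) has 2.
samples = list(range(10))
labels = ["A"] * 8 + ["B"] * 2
print(Counter(label for _, label in random_undersample(samples, labels)))
print(Counter(label for _, label in random_oversample(samples, labels)))
```

Undersampling leaves two samples per class and discards six class-A samples, while oversampling keeps every original sample and brings class B up to eight, which mirrors the information-loss versus overfitting trade-off described above.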
Second, deep learning models such as CNNs have high computational complexity and require substantial resources during training and inference [18]. In real-time settings in particular, research is needed to optimize the computational load of real-time product quality evaluation with cameras operating at multiple frame rates and with various deep learning models [20,21]. A CNN extracts diverse spatial features of an image through multiple convolutional layers, but this structure makes its computational cost very high. This can make deep learning models difficult to apply where real-time processing is required in an actual industrial environment, and several studies have argued that lightweight network design or hardware acceleration is needed to solve these problems. Third, using deep learning models in a manufacturing environment is subject to further limitations. Due to the nature of the manufacturing industry, the model must be retrained every time a new product is added to the existing product line, which inevitably requires significant time and cost for data collection and labeling. Additionally, if the manufacturing facility or environment changes, the measurement environment must be rebuilt and the model retrained, which can harm operational efficiency.
To overcome these limitations, this study used computer vision techniques such as SSIM, MSE, PSNR, and SIFT to compare image data. The goal was to distinguish good products from defective ones without requiring a predefined training dataset.
2.1.2. Application of Vision Algorithm in Product Classification
Vision algorithms currently make an important contribution to the detection and classification of product defects and to quality control in the manufacturing industry, and they continue to develop toward greater real-time processing capability and accuracy. They therefore play an important role in maximizing productivity and quality control efficiency in manufacturing. When vision algorithms are applied to product classification, image comparison techniques such as SSIM, MSE, and PSNR are commonly used. These techniques evaluate the structural similarity and pixel differences between two images. Although they are useful for basic image quality evaluation, they are not sufficiently robust to rotation or size changes.
In contrast, feature-based algorithms such as SIFT can extract feature points robustly even when the image is rotated or rescaled, enabling accurate defect detection [22,23]; the stability of SIFT feature points under rotation and size changes within an image has also been reported in [24,25]. Figure 3 shows results for SURF (Speeded-Up Robust Features), SIFT, and a hybrid technique: the hybrid technique performed well under scale changes, whereas SIFT was more accurate under rotation [26]. Based on this, this study aimed to develop a model that increases the product recognition rate and enables more accurate defect detection by using SIFT to overcome the limitations of existing techniques.
2.2. Research on Body Movement Detection Based on Body Skeletal Structure
Latest Research Trends in Body Movement Detection
Body movement detection is advancing as a technology that uses deep learning, a branch of machine learning, to identify human movement, detect hazards, and recognize specific actions. In HAR (human activity recognition), deep learning models play a crucial role in accurately recognizing various movements by analyzing video and sensor data, and the development of HAR technology has largely involved integrating diverse deep learning architectures [27]. These advances enable applications such as real-time hand gesture recognition, and research has extended to gesture recognition based on body skeletal structure [28]. These studies highlight the capability of deep learning to track and recognize body movements in real time and, by comparing different architectures, indicate which approaches may be more effective in particular situations.
Figure 4 provides an overview of the YOLOv8 Pose architecture, which extends the standard YOLOv8 model with pose estimation capabilities [29,30]. The architecture consists of three main components: the backbone, neck, and head. The backbone performs feature extraction using convolutional layers and modules such as C2f and SPPF, ensuring efficient and robust feature representation. The neck fuses features from multiple scales using upsampling and concatenation layers, integrating spatial and semantic information. The head is adapted for pose estimation, providing dual outputs for bounding box coordinates and keypoint locations, which enables simultaneous object detection and pose estimation. Unlike anchor-based predecessors such as YOLOv5, YOLOv8 Pose adopts an anchor-free design, simplifying the detection process and improving computational efficiency [31,32]. With its speed and accuracy, the architecture is particularly suitable for real-time applications such as human motion tracking and activity recognition. The C2f module for lightweight computation and the SPPF module for global feature extraction further contribute to its high performance. In summary, YOLOv8 Pose combines these features with real-time capability, making it a versatile tool for pose estimation and activity recognition in challenging environments.
2.3. Automated Inventory Management and Product Counting System
2.3.1. Inventory Management and Product Counting Market Trends
Recently, research and advancements in inventory management and product counting systems have gained attention as key components of smart factory implementation [33]. In manufacturing, intelligent systems are essential to maximize inventory management efficiency and enhance the accuracy of product counting [34,35]. According to a survey, technologies such as barcodes, QR codes, AI, cloud computing, IoT (internet of things), RFID (radio frequency identification), and WMS (warehouse management systems) are widely used.
Barcode and QR code technologies provide cost-effective and reliable methods for tracking products throughout the supply chain. Barcodes are scanned at various production and distribution stages for quick identification and status updates, while QR codes, with their higher data capacity, allow for the inclusion of product specifications or batch details and are easily integrated into modern systems. In addition, AI, cloud computing, IoT, and RFID technologies have become central to improving inventory management and product counting in contemporary manufacturing systems [36]. AI-based systems, in particular, are effective in reducing human error and enhancing accuracy by using computer vision and data analytics to streamline product calculations and inventory management, supporting real-time tracking and automated decision making.
2.3.2. Existing Product Counting Methods and Problems
The problem with existing load cells is that their limited resolution prevents accurate product counting. For example, if a load cell reports weight rounded to 500 g increments, overcounting or undercounting may occur when the actual weight deviates from these increments. Additionally, in a manufacturing environment, if a worker temporarily stands on or near a load cell while loading products, the system may mistakenly include the worker's weight, leading to incorrect counts. These inaccuracies have serious implications for inventory management and reduce the overall efficiency of the manufacturing process. Research on smart load cells has shown that they achieve measurement errors of less than 100 g in industrial applications weighing up to 400 kg, providing more accurate readings than traditional systems [37]. However, despite these improvements, such systems are still limited in maximum weight capacity and resolution, making them unsuitable for a wide range of industrial applications.
The goal of this study is to develop a system that improves the accuracy of product counting and inventory management by counting products in terms of a per-product unit weight. This is achieved by combining a technology that recognizes when a worker approaches the load cell, so that counting is not affected, with the difference image technique, which determines the unit weight. We expect this system to reduce product counting errors and increase the reliability of inventory management by overcoming the limitations of existing techniques.
3. Methods and Results
This study aimed to judge whether a product is normal or defective by analyzing the similarity between images. Precise image comparison is essential for automated quality inspection in manufacturing, especially for detecting subtle differences. Higher SSIM and PSNR values indicate greater similarity between images, while a lower MSE value indicates a smaller difference and thus greater similarity. However, high SSIM and PSNR values and a low MSE value do not guarantee that two images are visually identical, because these metrics reflect only specific aspects of image similarity and may miss subtle differences. Conversely, because all three metrics compare pixels at the same coordinates, they are sensitive to changes in object position, orientation, and scale: an identical product that is merely rotated or shifted can yield low SSIM and PSNR values and a high MSE value and thus be mistaken for a defective product. Since SSIM reflects structural similarity whereas PSNR and MSE focus on pixel differences, their reliability also decreases when misalignment and genuine detail differences are mixed. To address this issue, this study first applies the SIFT algorithm to detect the product's orientation. SIFT identifies keypoints within an image and extracts features that are robust to rotation and translation, enabling the product to be restored to its reference orientation even when it is positioned at various angles. This alignment makes the subsequent use of SSIM, PSNR, and MSE more reliable.
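The sensitivity of pixel-aligned metrics to pose can be illustrated with a small synthetic example. In this sketch (pure Python, with a hypothetical 8 × 8 image in which a bright bar stands in for a product), the product itself is unchanged apart from a 2-pixel shift or a 90° rotation, yet MSE rises from 0 to over 8000, exactly the kind of false alarm that SIFT-based alignment is meant to prevent.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2-D grayscale images."""
    n_pixels = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b)) / n_pixels


def shift_right(img, dx, fill=0):
    """Translate every row dx pixels to the right, filling with `fill`."""
    return [[fill] * dx + row[:len(row) - dx] for row in img]


def rotate90_clockwise(img):
    """Rotate an image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


# Hypothetical 8x8 scene: a bright 2x4 "product" on a dark belt.
reference = [[0] * 8 for _ in range(8)]
for r in range(3, 5):
    for c in range(1, 5):
        reference[r][c] = 255

print(mse(reference, reference))                       # identical: 0.0
print(mse(reference, shift_right(reference, 2)))       # same product, shifted
print(mse(reference, rotate90_clockwise(reference)))   # same product, rotated
```

Both misaligned cases give an MSE of 8128.125 because eight pixels switch between 0 and 255; no defect is present, so the difference comes entirely from pose.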
Additionally, a combined formula merges the SSIM, PSNR, and MSE values into a final evaluation score that determines whether a product requires inspection. The formula weights each metric appropriately, integrating quality information that no single metric can capture alone. This allows potential defects to be identified more clearly when the SSIM or PSNR value falls below, or the MSE value rises above, its threshold. Furthermore, if the result deviates from a specified standard, the difference image technique is applied to visually confirm changes in the product. The difference image technique computes pixel-level differences between two images, effectively highlighting surface defects or color changes. This method allows defect locations to be identified accurately, giving operators a basis for addressing issues immediately if necessary. In conclusion, by using the SIFT algorithm to align the product's position and orientation, applying the formula to assess quality, and employing the difference image technique to visually detect minor defects, this study contributes to improving overall quality management.
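A minimal sketch of the difference image step follows, assuming two already aligned grayscale images stored as nested lists: the absolute pixel difference is thresholded, and the bounding box of the changed pixels is reported as the defect location. The threshold of 30 gray levels and the function names are hypothetical choices, not values from this study.

```python
def difference_image(reference, test):
    """Absolute per-pixel difference of two aligned grayscale images."""
    return [[abs(a - b) for a, b in zip(row_r, row_t)]
            for row_r, row_t in zip(reference, test)]


def locate_defect(reference, test, threshold=30):
    """Bounding box (top, left, bottom, right) of pixels whose difference
    exceeds `threshold`, or None when the images agree everywhere."""
    diff = difference_image(reference, test)
    hits = [(r, c) for r, row in enumerate(diff)
            for c, value in enumerate(row) if value > threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


# Uniform gray reference; the test image has a dark two-pixel scratch
# plus one pixel of mild sensor noise that stays below the threshold.
reference = [[200] * 6 for _ in range(6)]
test = [row[:] for row in reference]
test[2][3] = 90
test[3][3] = 90
test[1][1] = 210
print(locate_defect(reference, test))
```

The scratch is reported at rows 2 to 3 of column 3, while the 10-level noise pixel is ignored; in practice the threshold would be tuned to the camera's noise level and the lighting.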
3.1. Image-Based Product Quality Inspection and Feature Matching Techniques
3.1.1. Rotation and Scale Invariance in Image Comparison Using SIFT Algorithm
As shown in Figure 5, after two images are loaded, the SIFT (scale-invariant feature transform) algorithm detects keypoints in each image and computes descriptors for them. SIFT detects keypoints that are invariant to scale and rotation and robust to illumination changes, enabling it to find consistent features reliably under various transformations [38]. The first step of the algorithm is to locate keypoints in scale space. The image is smoothed with Gaussian filters at different scales, and important keypoints are detected by identifying extrema (maxima and minima) across scales. Keypoints are located in the Gaussian-blurred scale space using the difference of Gaussians (DoG) [39]:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y)$$

Here, G(x, y, σ) is the Gaussian kernel at scale σ, * denotes convolution with the original image I(x, y), and k is the constant factor between adjacent scales. The differences between the blurred images at adjacent scales are computed, and their extrema (maxima/minima) are detected as keypoints [39,40,41]. To ensure rotational invariance, a principal orientation is assigned to each keypoint from the gradient magnitude and orientation of the surrounding pixels. The gradient magnitude m(x, y) and orientation θ(x, y) are computed as follows:

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$

$$\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$

Here, L(x, y) is the intensity of the Gaussian-smoothed image at the keypoint's scale, m(x, y) is the gradient magnitude at that position, and θ(x, y) is the gradient orientation. Based on this orientation information, a principal direction is assigned to each keypoint, which keeps the keypoints rotationally invariant [39,40,41]. To match keypoints between the two images, the Euclidean distance between descriptors is calculated to assess their similarity. The distance between two descriptors p and q is defined as follows:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
Here, $p_i$ and $q_i$ represent the i-th components of the two descriptors, and n is the descriptor length (128 for standard SIFT descriptors). The smaller the Euclidean distance, the more similar the two descriptors, which allows keypoints to be matched between the two images. Once matching is complete, the rotation angle between the two images can be estimated from two matched keypoint pairs:

$$\Delta\theta = \operatorname{atan2}(y'_2 - y'_1,\; x'_2 - x'_1) - \operatorname{atan2}(y_2 - y_1,\; x_2 - x_1)$$

Here, $(x_1, y_1)$ and $(x_2, y_2)$ are two keypoints in the reference image, and $(x'_1, y'_1)$ and $(x'_2, y'_2)$ are their matched keypoints in the other image. With this estimate of the rotation angle, the image can be rotation-corrected and restored. Additionally, the ratio of the distances between the matched keypoints in the two images,

$$s = \frac{\sqrt{(x'_2 - x'_1)^2 + (y'_2 - y'_1)^2}}{\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}},$$

indicates the scale change between the two images, that is, whether the image has been scaled up (s > 1) or down (s < 1).
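The matching and pose-estimation steps above can be sketched in pure Python. This is an illustration under stated assumptions rather than the implementation used in this study: descriptors are plain lists (toy 4-D vectors here instead of 128-D SIFT descriptors), the nearest-neighbour search is brute force, and Lowe's ratio test with a threshold of 0.75 is added as a common way to reject ambiguous matches. In practice, keypoints and descriptors would come from a SIFT implementation such as OpenCV's `cv2.SIFT_create`.

```python
import math


def euclidean(p, q):
    """Euclidean distance d(p, q) between two descriptors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def match_descriptors(desc_ref, desc_test, ratio=0.75):
    """Return (i, j) pairs matching desc_ref[i] to its nearest neighbour
    desc_test[j]; a match is kept only if the nearest distance is clearly
    smaller than the second-nearest one (Lowe's ratio test)."""
    matches = []
    for i, p in enumerate(desc_ref):
        ranked = sorted((euclidean(p, q), j) for j, q in enumerate(desc_test))
        if not ranked:
            continue
        best_dist, best_j = ranked[0]
        if len(ranked) == 1 or best_dist < ratio * ranked[1][0]:
            matches.append((i, best_j))
    return matches


def rotation_and_scale(p1, p2, q1, q2):
    """Estimate the rotation (degrees) and scale factor that map the segment
    p1->p2 in the reference image onto its matched segment q1->q2."""
    angle_ref = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    angle_test = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    angle = math.degrees(angle_test - angle_ref)
    angle = (angle + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    scale = math.dist(q1, q2) / math.dist(p1, p2)
    return angle, scale


# Toy descriptors: each reference descriptor has one clear counterpart.
desc_ref = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
desc_test = [[0.0, 0.9, 0.1, 0.0], [0.95, 0.0, 0.0, 0.05], [0.0, 0.0, 1.0, 0.0]]
print(match_descriptors(desc_ref, desc_test))

# Coordinates (x, y) of two matched keypoint pairs, hypothetical values.
angle, scale = rotation_and_scale((0, 0), (10, 0), (5, 5), (5, 25))
print(f"rotation = {angle:.1f} deg, scale = {scale:.2f}")
```

Because image y-coordinates grow downward, a positive angle here corresponds to a clockwise rotation on screen. With many matches, a robust estimator (for example, a median over all pairs) would normally replace the single-pair estimate.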
3.1.2. Defect Detection Using a Combined SSIM, PSNR, and MSE Evaluation
Figure 6 shows the SIFT-preprocessed images of product A and product B, together with the difference image between the two. To quantitatively evaluate the similarity between the two restored images, SSIM, PSNR, and MSE were applied; these metrics were used to compare normal and defective products accurately and to identify product defects efficiently. The formulas for each metric are shown in Table 1 [42,43].
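For reference, the standard definitions behind the three metrics can be written compactly in Python. This sketch operates on flattened 8-bit grayscale images (lists of pixel values) and, for brevity, computes SSIM over the whole image as a single window with the usual constants K1 = 0.01 and K2 = 0.03; standard SSIM instead averages this quantity over local windows, so its values will differ from library implementations such as scikit-image's `structural_similarity`.

```python
import math


def mse(a, b):
    """Mean squared error between two flattened images of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(a, b)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / error)


def global_ssim(a, b, max_val=255.0, k1=0.01, k2=0.03):
    """SSIM evaluated over the whole image as a single window."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / (n - 1)
    var_b = sum((y - mu_b) ** 2 for y in b) / (n - 1)
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / (n - 1)
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


reference = [10, 20, 30, 40]
test = [12, 18, 33, 40]
print(mse(reference, test), psnr(reference, test), global_ssim(reference, test))
```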
Applying SSIM, PSNR, and MSE to the restored images yields a quantitative evaluation based on the structural similarity, signal-to-noise ratio, and mean squared error between the two images: SSIM measures structural similarity, PSNR assesses the signal-to-noise ratio, and MSE captures detailed pixel differences, together enabling an overall evaluation of similarity. Acceptance criteria for these metrics are set by the user and may be interpreted differently depending on the goals, application area, and quality requirements of the image processing task. Users can therefore establish a standard for each metric according to the project's objectives and requirements and evaluate image similarity against these standards. For example, in manufacturing processes where defect detection is critical, even minor defects may significantly affect results; an SSIM value of 0.95 or higher could then indicate an acceptable product, and a PSNR value of 40 dB or higher could indicate good quality [42]. For MSE, a lower value signifies smaller differences between the two images. To keep the interpretation of the combined similarity score S consistent, so that a higher S always indicates greater similarity between the images, the inverse of MSE is used in the calculation. The similarity score S, reflecting the weights of SSIM, PSNR, and MSE, is calculated as follows:

$$S = w_{\mathrm{SSIM}} \cdot \mathrm{SSIM} + w_{\mathrm{PSNR}} \cdot \mathrm{PSNR} + w_{\mathrm{MSE}} \cdot \frac{1}{\mathrm{MSE}}, \qquad w_{\mathrm{SSIM}} + w_{\mathrm{PSNR}} + w_{\mathrm{MSE}} = 1$$

The weights $w_{\mathrm{SSIM}}$, $w_{\mathrm{PSNR}}$, and $w_{\mathrm{MSE}}$ sum to 1 and are adjusted according to the characteristics that each metric evaluates. This adjustment does not merely evaluate the images; it focuses the score on the specific defects and characteristics of the product. First, if the overall appearance or structure of the product is important, the SSIM weight $w_{\mathrm{SSIM}}$ is increased. SSIM evaluates the structural similarity of the image, focusing on structural elements such as the patterns, edges, and textures of the product, which suits situations where structural defects are critical, such as bending in metal products or pattern consistency in textiles. Second, when noise or distortion on the product's surface is the focus, the PSNR weight $w_{\mathrm{PSNR}}$ is increased; PSNR plays a significant role in examining surface scratches or the finish of lens surfaces. Third, when fine pixel-level defects are of particular importance, the MSE weight $w_{\mathrm{MSE}}$ is increased, because MSE precisely calculates the differences between pixels, making it well suited to processes that must detect very small defects. In short, the weights are set according to the product's characteristics and the type of defect being emphasized, and tuning them to the features of each metric can improve the efficiency and accuracy of defect detection.
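The weighted score can be sketched as follows. Because PSNR is expressed in decibels and 1/MSE is unbounded (and undefined for identical images), this sketch assumes two normalizations that the text does not specify: PSNR is divided by a cap of 50 dB and clipped to [0, 1], and the inverse of MSE is taken as 1/(1 + MSE). The weights and the acceptance thresholds (SSIM ≥ 0.95, PSNR ≥ 40 dB, following the example values above) are illustrative, not the settings used in this study.

```python
import math


def similarity_score(ssim, psnr_db, mse, weights=(0.5, 0.3, 0.2), psnr_cap=50.0):
    """Weighted similarity S = w_ssim*SSIM + w_psnr*PSNR' + w_mse*inv(MSE).

    PSNR' = min(PSNR, cap) / cap and inv(MSE) = 1 / (1 + MSE) are assumed
    normalizations that keep every term in [0, 1]."""
    w_ssim, w_psnr, w_mse = weights
    if not math.isclose(w_ssim + w_psnr + w_mse, 1.0):
        raise ValueError("weights must sum to 1")
    psnr_term = max(0.0, min(psnr_db, psnr_cap)) / psnr_cap
    mse_term = 1.0 / (1.0 + mse)
    return w_ssim * ssim + w_psnr * psnr_term + w_mse * mse_term


def passes_thresholds(ssim, psnr_db, min_ssim=0.95, min_psnr_db=40.0):
    """Example acceptance rule using the SSIM and PSNR levels quoted above."""
    return ssim >= min_ssim and psnr_db >= min_psnr_db


# Structure-critical product (e.g., bent metal parts): emphasize SSIM.
structure_weights = (0.6, 0.2, 0.2)
print(similarity_score(1.0, math.inf, 0.0, structure_weights))   # identical images
print(similarity_score(0.90, 30.0, 50.0, structure_weights))     # degraded image
print(passes_thresholds(0.97, 42.0), passes_thresholds(0.90, 30.0))
```

Identical images score 1, and any deviation lowers the score; shifting weight toward `w_psnr` or `w_mse` would instead emphasize surface noise or fine pixel-level defects, as described above.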
3.2. Product Counting Algorithm Using Camera-Based Skeleton Tracking and Body Part Detection
3.2.1. Classification of Counting Classes Based on Body Part Detection
In this study, we developed a product counting system that applies the YOLOv8 Pose algorithm to detect the human body and correct errors that occur when a person steps onto the load cell [44,45]. The core of the research lies in using a camera and algorithm to detect human body parts in real time, distinguishing factors that affect weight data measured by the load cell and reducing counting errors. Body parts that influence load cell weight measurements were categorized into four classes, and the load cell's weight measurement actions were controlled according to each class. As shown in Figure 7, the control images illustrate the situations of full upper body detection, partial upper body detection, lower body detection, and no detection.
Full Upper Body Detection: When the full upper body is detected within the load cell area, weight measurement is temporarily paused. Because the camera views the load cell from above, the lower body and other body parts may be occluded and go undetected, so a fully visible upper body is treated as a sign that the worker may be standing on the load cell. Pausing measurement eliminates the body's influence on the load cell, and measurement resumes once the upper body moves away from the load cell area.
Partial Upper Body Detection: Only part of the upper body is detected within the load cell area. In this case, the load cell continues to read weight data in real time, but a weight change is considered valid only when it reaches the stabilization threshold: changes smaller than 0.5 kg are regarded as insignificant fluctuations and are not included in the count. Thus, measurement continues during partial upper body detection, small weight changes are ignored, and only meaningful changes are reflected in the count.
Lower Body Detection: When the lower body is detected within the load cell area, weight measurement is paused. The weight of the lower body directly affects the load cell, so weight changes are not measured while the lower body is detected. Measurement resumes when the lower body moves away from the load cell area.
No Detection: If the camera does not detect any part of the body over the load cell, the load cell continuously measures weight changes in real time and counts the product based on the weight variations. In the no detection state, the load cell counting process proceeds normally, and the measured weight changes are used to calculate the product count.
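The four-class control rule can be expressed as a small decision function. The sketch below is a hypothetical reconstruction, not the study's code: it assumes the 17 COCO-format keypoints that YOLOv8 Pose models output, given as (x, y) pixel coordinates or None when a keypoint is not detected, and a rectangular load-cell region in image coordinates. Shoulders, elbows, and wrists stand for the upper body, and hips, knees, and ankles for the lower body; head keypoints are ignored. These groupings are assumptions.

```python
# COCO keypoint indices (the 17-point layout used by YOLOv8 Pose models).
UPPER_BODY = {5, 6, 7, 8, 9, 10}       # shoulders, elbows, wrists
LOWER_BODY = {11, 12, 13, 14, 15, 16}  # hips, knees, ankles
MIN_CHANGE_KG = 0.5                    # stabilization threshold from the text


def inside(point, region):
    """True if an (x, y) point lies in the region (x0, y0, x1, y1)."""
    x, y = point
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1


def classify_body(keypoints, load_cell_region):
    """Map detected keypoints to one of the four control classes."""
    on_cell = {i for i, pt in enumerate(keypoints)
               if pt is not None and inside(pt, load_cell_region)}
    if on_cell & LOWER_BODY:
        return "lower_body"
    upper = on_cell & UPPER_BODY
    if upper == UPPER_BODY:
        return "full_upper_body"
    if upper:
        return "partial_upper_body"
    return "no_detection"


def accept_weight_change(body_class, delta_kg):
    """Decide whether a load-cell weight change may update the count."""
    if body_class in ("full_upper_body", "lower_body"):
        return False                            # measurement paused
    if body_class == "partial_upper_body":
        return abs(delta_kg) >= MIN_CHANGE_KG   # ignore small fluctuations
    return True                                 # no person: count normally


region = (100, 100, 300, 300)
keypoints = [None] * 17
keypoints[9], keypoints[10] = (150, 150), (170, 160)  # only the wrists reach in
body_class = classify_body(keypoints, region)
print(body_class, accept_weight_change(body_class, 0.3), accept_weight_change(body_class, 1.2))
```

With several people in view, one plausible extension is to classify each detected person and pause measurement if any of them falls into a pausing class.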
3.2.2. Overall Flowchart of the Product Counting Algorithm
The load cell used in the product counting system shown in Figure 8 has a capacity of 2000 kg and a resolution of 500 g. A finer resolution increases the precision of the load cell system but also raises its cost, and in many cases the precision of high-resolution load cells exceeds what industrial settings require. Because the products measured in this system mostly weigh more than 1 kg, a resolution of 500 g is sufficient for accurate counting by weight. Under these conditions, this study develops a method for counting products from weight readings in 500 g increments.
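To make the resolution argument concrete, the sketch below quantizes raw readings to the 500 g step and shows that a 1 kg item spans two steps, so a single item is always resolvable. It assumes round-to-nearest quantization and clipping to the 2000 kg capacity; how the actual load cell indicator rounds is not stated in the text, and the function names are hypothetical.

```python
# Load-cell figures quoted in the text; the rounding policy is an assumption.
CAPACITY_KG = 2000.0
RESOLUTION_KG = 0.5


def quantize(raw_kg: float) -> float:
    """Snap a raw reading to the nearest 0.5 kg step within [0, capacity]."""
    clipped = min(max(raw_kg, 0.0), CAPACITY_KG)
    # Python's round() resolves exact ties (x.25, x.75 kg) to the even step.
    return round(clipped / RESOLUTION_KG) * RESOLUTION_KG


def steps_per_item(item_kg: float) -> int:
    """Number of whole resolution steps spanned by one item of this weight."""
    return int(item_kg // RESOLUTION_KG)


print(int(CAPACITY_KG / RESOLUTION_KG))  # → 4000
print(quantize(12.34))                   # → 12.5
print(steps_per_item(1.0))               # → 2
```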
3.2.3. Unit Weight Calculation and Counting Method Using Image Differencing Technique
As presented in Figure 9, image differencing techniques are employed to determine the unit weight of a product [46,47,48]. The image differencing method detects the number of products placed on the load cell, and the unit weight is then calculated from the total weight of those products. During the initial setup, a number of products are placed on the load cell, and the unit weight is calculated from the number of products detected through image differencing and the total weight measured by the load cell. This technique enables accurate detection of the number of products and minimizes weight measurement errors caused by product placement. To enhance the reliability of the image differencing method, adjustments were made to compensate for external factors such as camera angle, lighting, and background noise. This minimizes the impact of environmental changes on product detection and improves the accuracy of the product counting process.
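The counting and unit-weight steps described above can be sketched with a plain background-subtraction approach: threshold the absolute difference between an image of the empty load cell and the current image, count connected changed regions, and divide the load-cell weight by that count. This is a simplified, dependency-free sketch, not the authors' pipeline; it does not reproduce their compensation for camera angle, lighting, or background noise, and `count_blobs`, `unit_weight`, the threshold of 30, and the minimum area of 4 pixels are all hypothetical. Touching products would merge into a single region, so it assumes the products are laid out separately.

```python
from collections import deque


def count_blobs(background, frame, threshold=30, min_area=4):
    """Count regions where `frame` differs from an empty-scale `background`.

    Both images are equal-sized 2-D lists of grayscale values (0-255).
    """
    h, w = len(background), len(background[0])
    # Binary change mask: pixels whose absolute difference exceeds the threshold.
    mask = [[abs(frame[y][x] - background[y][x]) > threshold for x in range(w)]
            for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over 4-connected changed pixels.
            area, queue = 0, deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:  # drop small specks caused by noise
                count += 1
    return count


def unit_weight(total_kg, n_detected):
    """Unit weight = total load-cell weight / number of detected products."""
    if n_detected <= 0:
        raise ValueError("no products detected")
    return total_kg / n_detected


# Two separated 2x2 "products" on an 8x6 synthetic image.
bg = [[10] * 8 for _ in range(6)]
frame = [row[:] for row in bg]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2), (3, 5), (3, 6), (4, 5), (4, 6)]:
    frame[y][x] = 200
n = count_blobs(bg, frame)
print(n, unit_weight(5.0, n))  # → 2 2.5
```

The 5.0 kg reading in the usage lines is illustrative only.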
The unit weight of the product, W_unit, is the total weight measured by the load cell divided by the number of products detected using the image differencing technique. Once the initial unit weight is established, the number of products is calculated from the change in weight measured by the load cell, following these steps:
Weight Change Calculation: When products are added or removed, the weight change is calculated as the difference between the current and previous weights measured by the load cell. The weight change must exceed a set fraction of the unit weight (e.g., 0.5, i.e., half the unit weight) to be considered valid. This prevents counting errors caused by minor weight fluctuations.
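A minimal sketch of the validity check, taking the example fraction of 0.5 from the text and reading "exceed" as a strict inequality; the function names are hypothetical.

```python
VALID_FRACTION = 0.5  # example fraction of the unit weight quoted in the text


def weight_change(current_kg: float, previous_kg: float) -> float:
    """Signed change between consecutive stable readings (negative = removal)."""
    return current_kg - previous_kg


def is_valid_change(delta_kg: float, unit_kg: float,
                    fraction: float = VALID_FRACTION) -> bool:
    """A change counts only if its magnitude exceeds fraction * unit weight."""
    return abs(delta_kg) > fraction * unit_kg
```

Using the magnitude means removals are validated in the same way as additions.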
Stabilization Process: To improve counting accuracy, the system detects the point at which the weight change stabilizes. Based on the number of data points received per second, NDATA, if the same weight change is detected over a certain number of consecutive readings, the weight is considered stabilized, and the counting is performed. This helps to reduce errors caused by temporary weight fluctuations or noise.
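One way to implement the stabilization check is a sliding window over the most recent readings. In this sketch, which is an assumption rather than the authors' code, the window length is N_DATA (readings per second) multiplied by a hypothetical hold time `hold_s`, and the readings count as stable when they stay within `tolerance_kg` of one another. Requiring constant readings is equivalent to requiring a constant weight change relative to a fixed previous weight.

```python
from collections import deque


class StabilityDetector:
    """Report a reading as stable once `window` consecutive readings agree."""

    def __init__(self, n_data: int, hold_s: float, tolerance_kg: float = 0.0):
        # Window length in samples: readings per second times hold time.
        self.window = max(1, round(n_data * hold_s))
        self.tolerance_kg = tolerance_kg
        self.recent = deque(maxlen=self.window)

    def update(self, reading_kg: float):
        """Add a reading; return it if the window is stable, else None."""
        self.recent.append(reading_kg)
        if len(self.recent) < self.window:
            return None  # not enough readings yet
        if max(self.recent) - min(self.recent) <= self.tolerance_kg:
            return reading_kg
        return None
```

For example, with 6 readings per second and a 0.5 s hold, three identical consecutive readings are required before a weight is accepted.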
Precise Decimal Handling: The weight change is calculated to one decimal place, and rounding or truncation is applied only at the time of counting. This minimizes counting errors that may occur when multiple products are loaded simultaneously. For example, a value ending in .5 is rounded up and then reflected in the final count.
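Python's built-in `round()` rounds exact halves to the nearest even integer, so a sketch of the rule that values ending in .5 are rounded up needs an explicit rounding mode. The example below assumes the `decimal` module's `ROUND_HALF_UP` (ties away from zero) matches the intended behavior; `to_tenths` and `round_half_up` are hypothetical helper names.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_tenths(weight_kg: float) -> Decimal:
    """Keep a weight change to one decimal place (0.1 kg), rounding half up."""
    return Decimal(str(weight_kg)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def round_half_up(value: Decimal) -> int:
    """Round to the nearest integer, with .5 always rounded away from zero."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# The built-in round() uses round-half-to-even, which is why it is not used here.
print(round(2.5), round_half_up(Decimal("2.5")))  # → 2 3
print(to_tenths(7.25))                            # → 7.3
```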
Counting Execution: Once the stabilization process is complete, the number of products is calculated by dividing the weight change by the unit weight. Decimal values are handled carefully during this step: when the first decimal digit of the result is 5, rounding up or down is applied to ensure the accuracy of the count.
$$C_{product} = \frac{\Delta W}{W_{unit}}$$
Here, C_product represents the number of products, ΔW is the weight change, and W_unit is the unit weight.
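A sketch of the counting step divides the weight change by the unit weight (C_product = ΔW / W_unit), keeping ΔW to 0.1 kg and rounding only at the end. The half-up rounding mode and the function name `count_products` are assumptions consistent with the decimal handling described in the text, not the authors' implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def count_products(delta_kg: float, unit_kg: float) -> int:
    """C_product = ΔW / W_unit, rounded to the nearest whole product.

    ΔW is first kept to 0.1 kg; the quotient is rounded half away from zero
    only at this final step. Negative results mean products were removed.
    """
    if unit_kg <= 0:
        raise ValueError("unit weight must be positive")
    delta = Decimal(str(delta_kg)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    ratio = delta / Decimal(str(unit_kg))
    return int(ratio.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

Using `Decimal` avoids binary floating-point artifacts such as a quotient of 2.4999999 when the intended value is 2.5.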
Count Error Correction: In this system, only products weighing 1 kg or more are counted. If the weight change is less than 1 kg, it is excluded from the count. Specifically, weight changes of 0.5 kg or less are considered minor fluctuations and are disregarded in the count results. This prevents errors caused by small weight changes and ensures that only the actual weight changes of the products are accurately reflected in the count.
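The two thresholds in the error-correction rule can be sketched as a classifier applied to each stabilized change before counting. The thresholds come from the text; the labels and the function name `classify_change` are hypothetical.

```python
FLUCTUATION_KG = 0.5   # changes at or below this are treated as noise
MIN_PRODUCT_KG = 1.0   # only products of 1 kg or more are counted


def classify_change(delta_kg: float) -> str:
    """Label a stabilized weight change before it reaches the counter."""
    magnitude = abs(delta_kg)
    if magnitude <= FLUCTUATION_KG:
        return "fluctuation"     # disregarded as a minor fluctuation
    if magnitude < MIN_PRODUCT_KG:
        return "below_minimum"   # excluded from the count
    return "countable"
```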
The experimental results presented in Figure 10 and Table 2 were obtained in collaboration with an automobile parts manufacturer to verify counting accuracy. The test was conducted by counting 10 products ranging in weight from 1 kg to 45 kg and achieved an accuracy of 99.268% in the product counting test.
4. Discussion
This study proposes a robust image comparison method that combines traditional image similarity metrics such as SSIM, PSNR, and MSE with the SIFT algorithm to quantitatively evaluate differences between normal and defective products in the manufacturing process. Conventional metrics such as SSIM, PSNR, and MSE are not invariant to rotation and positional changes, which makes it challenging to detect defects accurately when products are captured from various angles. In particular, for circular products or products with important surface patterns, photographing the product in a rotated state lowers the image similarity and increases the likelihood of misclassifying a normal product as defective. To overcome this limitation, the SIFT algorithm was employed to correct for rotation and scale changes, allowing for more accurate and dependable image comparisons. The SIFT algorithm detects key points within the image, calculates the orientation and scale of each point, and generates descriptors invariant to these changes. This process allows images to be restored to the same orientation even when the product is rotated, significantly enhancing the reliability of the SSIM, PSNR, and MSE similarity metrics. For products such as wheels or those with critical surface patterns, the accuracy of similarity evaluations improved significantly after SIFT was applied for angle restoration. Furthermore, to compensate for the individual limitations of SSIM, PSNR, and MSE, a weighted combination score was introduced. This score reflects the characteristics of each metric, allowing for a comprehensive evaluation of both the overall structure and the fine differences of the product. The experimental results demonstrated that the combined score enabled more precise defect detection than SSIM, PSNR, or MSE alone, and that adjusting the weights for specific defect types allowed flexible adaptation to various defect scenarios.
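For reference, the sketch below computes MSE and PSNR from their standard definitions, a single-window (global) SSIM, and one possible weighted combination. It assumes the two grayscale images have already been aligned (for example, after SIFT-based rotation correction, which is not shown) and flattened into equal-length pixel lists. The weights, the PSNR cap, and the MSE mapping in `combined_score` are hypothetical placeholders, not the weighting used in this study, and the global SSIM is a simplification of the standard windowed SSIM.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length grayscale pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(max_val ** 2 / err)


def global_ssim(a, b, max_val=255.0):
    """SSIM over the whole image as one window (standard SSIM averages
    this statistic over local windows)."""
    n = len(a)
    mx, my = sum(a) / n, sum(b) / n
    vx = sum((x - mx) ** 2 for x in a) / n
    vy = sum((y - my) ** 2 for y in b) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(a, b)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def combined_score(a, b, weights=(0.5, 0.3, 0.2), psnr_cap=50.0, mse_scale=100.0):
    """Hypothetical weighted score; higher means more similar (1.0 if identical).

    PSNR is divided by `psnr_cap` and clipped at 1; MSE is mapped to
    1 / (1 + MSE / mse_scale). Weights and scales are illustrative only.
    """
    w_ssim, w_psnr, w_mse = weights
    p = min(psnr(a, b) / psnr_cap, 1.0)
    m = 1.0 / (1.0 + mse(a, b) / mse_scale)
    return w_ssim * global_ssim(a, b) + w_psnr * p + w_mse * m
```

Mapping PSNR and MSE onto comparable ranges before weighting is one design choice among several; the weights would be tuned per defect type, as the discussion notes.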
Additionally, the product counting system developed in this study applied the YOLOv8 Pose algorithm to effectively reduce counting errors caused when a person steps on the load cell. By detecting the presence of a person through body detection, the system automatically paused weight measurement when human influence was detected and resumed measurement when nobody was detected, ensuring accurate product counting. To overcome the resolution limitations of the load cell, the system used an image differencing technique to determine the unit weight of the product, dividing the total weight by the number of detected products to calculate an accurate count. This method involved loading a predetermined number of products onto the load cell, measuring the total weight, and calculating the unit weight based on the number of products detected using the image differencing technique. Using this calculated unit weight, subsequent product counts were derived by dividing the total measured weight by the unit weight. A key aspect of this process was detecting the point at which weight changes stabilized before collecting data. Since errors are likely to occur if weight measurements are not stabilized, the system included a procedure for collecting data only when the weight had not changed for a period of time. This helped to prevent errors caused by transient weight fluctuations and improved counting accuracy. However, one limitation of this study is that the SIFT algorithm can become computationally intensive in complex environments, which may affect performance in real-time applications. Additionally, SIFT’s performance in detecting key points may be influenced by external variables such as lighting changes and background complexity, necessitating the use of complementary algorithms.
5. Conclusions
This study proposes a robust defect detection method that combines the SIFT algorithm with traditional image similarity evaluation techniques such as SSIM, PSNR, and MSE, enabling reliable quality inspection despite product rotation and scale changes. By detecting key points in the image and correcting for rotation and scale using the SIFT algorithm, the system demonstrated the capability for accurate quality control. Additionally, the introduction of the weighted combination score allowed for precise defect analysis by leveraging the strengths of SSIM, PSNR, and MSE, offering flexible responses to various defect scenarios. In the load-cell-based counting system, the YOLOv8 Pose algorithm was employed to correct counting errors in real time when a person was on the load cell. Furthermore, the image differencing technique was used to calculate the unit weight, enabling accurate product counting. Experimental results showed a high accuracy of 99.268% for products weighing between 1 kg and 45 kg. In conclusion, this research demonstrated that reliable defect detection can be achieved despite rotation and scale changes using the SIFT algorithm, and that accurate quality inspection and counting systems can be implemented in manufacturing processes by utilizing the image differencing technique and YOLOv8. This confirmed that high reliability and accuracy can be maintained even in real-time manufacturing environments.
Future research should focus on improving the processing speed of the SIFT algorithm and optimizing the system to maintain stable performance in conditions with lighting changes or complex backgrounds. In particular, efforts should be made to simplify the algorithm for enhanced real-time performance and to integrate machine-learning-based predictive models for advanced automation in defect detection. Additionally, for the load-cell-based counting system, it is essential to introduce technologies that enhance resolution or develop methods capable of detecting smaller weight changes with greater precision. Algorithms that can correct counting errors in real time should be advanced, and the system’s stability must be reinforced to remain unaffected by external environmental factors such as temperature and vibration. These advancements are expected to play a crucial role in enhancing the efficiency of quality inspection and counting processes in automated manufacturing systems.