1. Introduction
Sweaters, as knitted outer garments, offer a canvas for self-expression and a means of making a statement in the fashion world beyond their ability to keep their wearers warm; growing social media exposure and fashion influence have accordingly boosted global knitwear demand [1,2,3]. According to an analysis from the Knitting Association, the global knitwear market was valued at USD 644.29 billion in 2021 and is expected to reach USD 1606.67 billion by 2029, registering a CAGR of 12.10% during the forecast period of 2022–2029. Thus, to meet the ever-increasing knitwear demand and overcome the labor shortage, it is critical to develop automatic sweater machines with improved production efficiency. Ningbo Cixing Co., Ltd. (stock code: 300307), one of the global suppliers of intelligent knitting machinery, is dedicated to improving the technical level of knitting machinery, promoting the development of knitting technology, and realizing the upgrading of the knitting industry. In recent years, it has mastered the core technology of automatic knitting machines by equipping its computerized automatic knitting machines with the Steiger System. In the course of this intelligent equipment development, the company's automatic knitting machine business sold approximately 30,000 units in 2023, making it the leading enterprise in the field. However, since the head of the automatic knitting machine has been accelerated to 1.6 m/s on a horizontal platform, abnormal floating yarn is inevitable during rapid knitting. This is because a breakage of the needle tongue is not directly detectable, and a broken tongue cannot separate the loops from the yarn, so looped yarn piles up continuously on the needle. When the yarn accumulates to a certain extent, it prevents the needle from continuing its reciprocating motion; the needle then breaks under the interaction force within a relatively short period of time, leading to the breakage of the needle board, as shown in Figure 1.
If this floating-yarn anomaly is not detected in time, it seriously hinders the normal reciprocating motion of the needles, leading to a catastrophic fracture of the whole needle plate. Once the needle board is damaged, the equipment owner must not only spend considerable manpower and resources on professional repairs, but the machine downtime caused by the damage also greatly affects order production. For example, it takes at least 3–5 days to completely repair the needle board and resume production. According to statistics, one automated knitting machine can produce 20 pieces of clothing per day, and the average production price of a piece of clothing is about USD 10–20, which means that, excluding maintenance costs, damage to the needle board results in losses of at least USD 600–2000 in production profit, or even more. Extended to the whole country or even the whole world, this leads to millions in economic losses, which seriously deviates from the original intention of technological innovation: the pursuit of higher productivity and profit. Therefore, it is crucial for knitting machines to detect this undesired floating-yarn phenomenon in a timely and accurate manner.
Traditionally, floating-yarn detection methods are primarily based on physical probe-detection equipment, which has limitations in terms of accuracy and reliability [4]. For instance, the height of the physical probe is set manually based on experience, which leads to the following issues: if the probe is set too low, accidental triggers may cause false alarms; if it is set too high, it may fail to detect floating yarn in a timely manner. Therefore, a more intelligent and reliable method is needed to monitor the yarn status during the knitting process. Recently, modern intelligent detection technologies, including 3D laser scanning, X-ray digital imaging, ultrasonic testing, and visual inspection, have been well studied [5,6,7,8]. Among them, visual inspection is considered the most effective detection method for abnormal floating yarn, because it allows for the early detection and elimination of possible yarn flaws with the lowest facility costs and minimal part preparation [9,10,11]. Visual inspection methods, involving computer vision, deep learning, and big data analysis, have been utilized for defect detection across various fields. For example, Tangbo Bai et al. proposed a new railroad surface defect detection method based on an improved YOLOv4, using MobileNetv3 as the backbone network to extract image features while applying depthwise separable convolution in the PANet layer to realize a lightweight network that achieved higher accuracy in detecting railroad surface defects [12]. Weiwei Jiang et al. created a semi-supervised framework for automotive paint defect detection, called YOLOv7-EA, by introducing external attention modules into the spatial pyramid pooling fast structure to improve the traditional YOLOv7 network, and by introducing a Wise-intersection-over-union loss function that takes the quality of the anchor boxes into account, achieving good automotive paint defect detection performance [13]. However, in practical applications, accurately distinguishing between patterns and defects in knitted fabrics is a challenging task due to the variety of knitted patterns and the complexity of the knitting process [14,15,16].
In the knitting industry, attention has focused on detecting defects in yarns and finished knitted products, refining detection accuracy by improving the network framework. For instance, Noman Haleem et al. proposed an online yarn uniformity testing system for measuring a certain type of yarn defect called a nep, using imaging and computer vision techniques and detecting knot defects in real time with the Viola–Jones object detection algorithm, which helped guarantee the production of high-quality yarns [17]. Zahra Pourkaramdel et al. demonstrated a rotation-invariant fabric defect detection approach, introducing a new local binary pattern called completed local quartet patterns to extract local texture features from fabric images. They used a benchmark dataset of patterned fabric images comprising three groups of fabric patterns and six defect types; the experimental results demonstrated the main advantages of the method: a simple, rotation-invariant, grayscale-invariant process and a high detection rate compared with well-known methods [18]. Imane Koulali et al. developed an unsupervised knitting anomaly detection method consisting of five main steps: preprocessing, automatic pattern cycle extraction, patch extraction, feature selection, and anomaly detection. The method combined the advantages of traditional convolutional neural networks with an unsupervised learning paradigm, requiring only a single flawless sample and no additional data labeling. The trained network was then used to detect anomalies in flawed knitting samples. Their algorithm produced reliable and competitive results in less time and at a lower computational cost than state-of-the-art unsupervised methods [19]. However, in the actual production process of the knitting industry, accurately detecting floating-yarn anomalies during the high-speed operation of knitting equipment remains an urgent problem due to the multi-process nature and continuous mass production of knitting.
In this work, to overcome the abovementioned issues, we propose a multimodal floating-yarn anomaly recognition framework based on a CNN-BiLSTM network augmented with knit feature sequences. The input is an image sequence composed of photographs captured during the knitting process; because the feature area in which a floating-yarn anomaly occurs is small, the CNN-BiLSTM network is well suited to accurately extracting feature information from such small regions. We add a new knit feature sequence to the original network framework, combining the knitting machine head speed with the knitting pattern row data. As shown in
Figure 2, the image acquisition module feeds the collected image sequences into the recognition algorithm. By using the contextual information of the knitting data, the model achieves more accurate and efficient floating-yarn recognition in complex knitting structures through a continuous cycle of image acquisition, anomaly detection, re-acquisition, and re-detection. The captured images are first preprocessed. Because external light sources, dust, and other interfering factors are present in the actual photography environment, the captured images contain a significant amount of noise; we therefore apply median filtering, which effectively removes outliers and noise while preserving edge details, improving both image quality and the model's recognition accuracy. Next, to facilitate network input, we uniformly crop the input images to 150 × 180 pixels, reducing unnecessary information and ensuring consistency in the input data, which helps the network better learn the features of floating yarn. Then, we apply thresholding to the knitted images, converting them into binary images in which the accumulated floating-yarn parts are marked as the foreground and the remaining areas are designated as the background (black). Ultimately, a series of image sequences reflecting the laser-irradiated yarn is obtained. The serial communication module sends an alert message when a buildup of floating yarns is recognized, so that staff can deal with such anomalies as soon as they arise, ensuring the safety of the knitting equipment. To demonstrate the effectiveness of the method, the response time and the recognition accuracy when a floating-yarn anomaly is detected are used as evaluation metrics. The results show that the improved network framework outperforms both the probe-based detection method and the original network framework, with an overall recognition accuracy of 93%, and is able to recognize and raise an alarm for most knitting structures only 4–5 yarn feeds after floating yarn begins to build up. In conclusion, we provide a pioneering method for automatic floating-yarn identification which, once tested and put into mass production, should contribute greatly to advancing the knitting industry. It will help major manufacturers recognize and address floating yarns at the initial stage of their formation, thereby improving the safety, service life, and productivity of knitting equipment. The method greatly improves the accuracy and precision of identifying floating-yarn build-ups, saving potentially significant labor and production time costs. It is designed to work with different automated knitting machines and a variety of knitting configurations, ensuring its versatility. We believe it represents a significant advancement in knitting manufacturing, paving the way for smarter, more automated knitting production.
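To make the preprocessing concrete, the following OpenCV sketch applies the three steps described above (median filtering, a uniform 150 × 180 crop, and thresholding to a binary image). The crop offsets and the threshold value are illustrative assumptions, not the parameters used in our system:

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray, thresh: int = 60) -> np.ndarray:
    """Sketch of the preprocessing chain: denoise, crop, binarize.

    `thresh` is an assumed intensity cutoff separating the
    laser-illuminated yarn (foreground) from the background.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Median filtering removes outliers and noise while preserving edges.
    denoised = cv2.medianBlur(gray, 5)
    # Uniform crop to 150 x 180 pixels (height x width) around the
    # needle-bed region; the offsets here are placeholders.
    y0, x0 = 0, 0
    cropped = denoised[y0:y0 + 150, x0:x0 + 180]
    # Thresholding: floating-yarn pile-up becomes white foreground,
    # everything else black background.
    _, binary = cv2.threshold(cropped, thresh, 255, cv2.THRESH_BINARY)
    return binary
```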
3. Methods
3.1. Image Acquisition
The image acquisition module was built around a camera capturing images at a resolution of 1920 × 1080 pixels and a frame rate of 240 frames per second, together with an MTK Helio G90T processor. The module measured 76.4 mm × 78.4 mm × 18.5 mm, and both its power supply and its data interaction with the automatic knitting machine were realized through a USB Type-C interface. The module was mainly used to drive the camera in combination with a laser of specific wavelength to acquire knitting images. Notably, because the imaging quality directly affects recognition accuracy, we effectively reduced interference from other regions of the image through the combination of the laser and a filter, tested a variety of lighting configurations experimentally, and finally chose an appropriate lighting position to improve recognition accuracy and speed. By adjusting parameters such as the camera's exposure time and sensitivity, the image acquisition quality could be optimized (Figure 4). Reducing the exposure time and sensitivity reduced the drag (motion blur) effect in the image, while increasing them made the floating-yarn phenomenon more obvious.
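For illustration, such parameters can be set through OpenCV's capture interface, as sketched below; the specific property values are assumptions, and actual support for each property depends on the camera driver:

```python
import cv2

# Minimal sketch of exposure/sensitivity tuning, assuming a camera
# exposed through OpenCV; property support is driver-dependent.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
cap.set(cv2.CAP_PROP_FPS, 240)
# Shorter exposure / lower gain reduces motion drag at high head speed;
# longer exposure / higher gain makes the floating yarn more visible.
cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.25)  # manual mode on many drivers
cap.set(cv2.CAP_PROP_EXPOSURE, -6)         # illustrative value
cap.set(cv2.CAP_PROP_GAIN, 32)             # "sensitivity" analogue
ok, frame = cap.read()
```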
3.2. Serial Port Communication
The serial communication module primarily handled the interaction between the recognition program and the automatic knitting machine. The communication protocol used was the UART protocol. The serial communication module allowed the camera parameters as well as the recognition parameters to be adjusted, and the accuracy could be improved by sending information such as knitting data. Through that interaction, it was possible to set the recognition parameters and camera parameters of the detection module and feed back the floating yarn situation to the automatic knitting machine. There were four serial port modes: (1) off, (2) on, (3) active, (4) edit. In the activation mode, the automatic knitting machine transmitted the head data (number of knitting rows, color of knitting yarn) to the detection module, which automatically adjusted the recognition parameters according to the head data and the built-in parameter file to achieve accurate floating-yarn detection. In the editing mode, the automatic knitting machine could adjust the camera parameters (such as exposure time, sensitivity, and fixed focal length) as well as the recognition parameters by commands and generate a configuration file of these parameters for storage.
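A minimal sketch of this serial interaction using the pySerial library is given below. The port name, baud rate, and message framing are assumptions for illustration; the real frames are defined by the knitting machine's controller protocol:

```python
import serial

# Sketch of the UART link to the knitting machine. Port name, baud
# rate, and message format are assumptions, not the actual protocol.
MODES = ("off", "on", "active", "edit")

def open_link(port: str = "/dev/ttyUSB0", baud: int = 115200) -> serial.Serial:
    return serial.Serial(port, baudrate=baud, timeout=0.1)

def send_alarm(link: serial.Serial) -> None:
    # Alert the machine so it halts before the needle board is damaged.
    link.write(b"ALARM:FLOATING_YARN\n")

def read_head_data(link: serial.Serial) -> bytes:
    # In "active" mode the machine streams head data (knitting row
    # number, yarn color) used to adjust the recognition parameters.
    return link.readline()
```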
3.3. Recognition Algorithm
The main function of the recognition algorithm module was to process the captured images. Then, the module recognized the floating yarns according to their characteristics and sent the recognized results to the serial communication module. When there was a floating-yarn phenomenon, the serial module would send an alarm signal to the automatic knitting machine system, and thus the automatic knitting machine would stop working. In order to realize this function, we designed a floating-yarn anomaly detection framework, which was based on CNN-BiLSTM networks that were capable of incorporating knitting data.
A CNN is a deep learning model commonly used in machine vision and widely applied in artificial intelligence [34]. CNNs include 1-dimensional, 2-dimensional, and 3-dimensional variants, which are used to process sequential signals, images, and videos, respectively. In our framework, the network input was a sequence of images after feature extraction, together with the knitting program data corresponding to the current knitting structure, which strengthened the connection between the framework and actual production. The design of the convolutional neural network was inspired by the human visual system, especially the functioning of the visual cortex [35], and implements mechanisms such as localized receptive fields and weight sharing. The basic structure of a convolutional neural network consists of a convolutional layer, an activation function, a pooling layer, and a fully connected layer.
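A minimal PyTorch sketch of this basic structure, sized for the 1 × 150 × 180 preprocessed frames, is shown below; the channel counts and feature dimension are illustrative rather than the exact values of our network:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Convolution + activation + pooling + fully connected layer,
    sized for 1 x 150 x 180 preprocessed frames (channel counts are
    illustrative, not the exact ones used in our framework)."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # -> 16 x 75 x 90
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # -> 32 x 37 x 45
        )
        self.fc = nn.Linear(32 * 37 * 45, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(x).flatten(1))
```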
CNNs can effectively extract local features of the input image sequence and combine them with the knitting data but suffer from information loss when processing sequence data. To solve this problem, recurrent architectures (RNN, GRU, and LSTM) were introduced specifically for extracting time-series features from sequence data. A Recurrent Neural Network (RNN) receives inputs and generates outputs at each time step through a recurrent structure while retaining a hidden state, through which information from previous time steps is passed on and saved. This allows RNNs to effectively model contextual and sequential relationships in sequential data. However, when the input sequence is long, the RNN update is realized by successive matrix multiplications, so the matrix multiplications in backpropagation may lead to vanishing or exploding gradients. The Long Short-Term Memory (LSTM) network effectively solves these problems by introducing a gating mechanism [36]. An LSTM network learns long- and short-term time-series features of sequence data by controlling the weights of its input, forget, and output gates, and it is suitable for the prediction and classification of long sequence data. The parameters of the memory cell are updated at each time step $t$.
The forget gate is updated as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right),$$

where $W_f$ and $b_f$ are the weights and biases of the forget gate, $x_t$ is the input at the current time, $\sigma$ is the sigmoid function, and $h_{t-1}$ is the hidden state of the previous time step.
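For completeness, the remaining gate and state updates follow the standard LSTM formulation (restated here in textbook form, since the original equations were given for the forget gate only):

$$\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right),\\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right),\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,\\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right),\\
h_t &= o_t \odot \tanh(C_t),
\end{aligned}$$

where $i_t$ and $o_t$ are the input and output gates, $\tilde{C}_t$ is the candidate cell state, $C_t$ is the cell state, and $\odot$ denotes element-wise multiplication.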
The Bidirectional Long Short-Term Memory (BiLSTM) network is an extension of the LSTM network that introduces a bidirectional structure. Its output hidden state concatenates a forward and a backward pass,

$$h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t],$$

where $h_t$ achieves the learning of the relationships before and after each position in the sequence by retaining step information in both directions. The network computes hidden states in both the forward and backward processes and joins or merges the two, allowing it to capture the contextual relationships in the sequence more comprehensively. The BiLSTM network structure is shown in Figure 5.
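In PyTorch, this bidirectional structure reduces to a single flag on the LSTM module; the dimensions below mirror those reported later in Table 1:

```python
import torch
import torch.nn as nn

# Bidirectional LSTM: forward and backward hidden states are
# concatenated, so the output feature size is 2 x hidden_size.
bilstm = nn.LSTM(input_size=512, hidden_size=256,
                 num_layers=2, batch_first=True, bidirectional=True)

x = torch.randn(16, 10, 512)      # (batch, sequence, features)
out, (h_n, c_n) = bilstm(x)       # out: (16, 10, 512) = 2 * 256
```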
Although the CNN-LSTM network performs well when processing feature image sequences, when the number of feature image sequences grows large, the LSTM network may have difficulty retaining key features effectively. This is because, when processing a large number of feature image sequences, LSTM networks are vulnerable to information loss or flooding, which reduces model performance. To achieve higher accuracy, this paper, inspired by a multi-view synthetic aperture radar (SAR) automatic target recognition model [37], introduces an attention mechanism and a bidirectional long short-term memory network to construct a floating-yarn anomaly recognition network. The overall network structure is shown in Figure 6.
The feature-extracted image sequence is first fed into a deep convolutional network. Through convolution and pooling operations, the trained model accurately extracts the local spatial information of the input image and generates a series of feature maps. These feature maps contain key information about the image, but their dimensions and formats may not be directly applicable to the input requirements of LSTM networks. Therefore, to integrate the local features of these feature maps and adjust their dimensions and formats to fit the LSTM input, the feature maps are first fed into a fully connected layer for further feature extraction and integration. This fully connected layer receives the feature vectors obtained from feature extraction and feeds them into the BiLSTM network. Meanwhile, the knit feature sequence is passed through a 1D convolutional layer to extract its salient features and is then input into the BiLSTM network together with the image features. The BiLSTM incorporates an attention mechanism designed to model and learn the relationships between feature sequences, capturing the key information in the sequences more efficiently. Subsequently, the weighted tensor produced by the attention mechanism is transformed into a fixed-dimensional vector by global average pooling. This vector is then passed to a fully connected layer that maps it into a space with the same number of classification categories; the fully connected layer is followed by a softmax activation function that converts the output into a probability distribution.
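The data flow just described can be summarized in the following PyTorch sketch, which reuses the SimpleCNN feature extractor sketched above. The concatenation-based fusion of image and knit features and the exact layer ordering are assumptions made for illustration, not the verbatim definition of our network:

```python
import torch
import torch.nn as nn

class CNNBiLSTMKFS(nn.Module):
    """Sketch of the CNN-BiLSTM network with a knit feature sequence
    (KFS) branch; layer sizes follow Table 1 where stated, otherwise
    they are illustrative."""

    def __init__(self, kfs_channels: int = 2, num_classes: int = 2):
        super().__init__()
        self.cnn = SimpleCNN(feat_dim=256)           # per-frame features
        # 1D convolution over the knit feature sequence
        # (head speed + pattern row data per time step).
        self.kfs_conv = nn.Conv1d(kfs_channels, 256, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(input_size=512, hidden_size=256,
                              num_layers=2, batch_first=True,
                              bidirectional=True)
        self.attn = nn.MultiheadAttention(embed_dim=512, num_heads=8,
                                          batch_first=True)
        self.head = nn.Linear(512, num_classes)      # softmax applied in the loss

    def forward(self, frames: torch.Tensor, kfs: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 1, 150, 180); kfs: (B, kfs_channels, T)
        b, t = frames.shape[:2]
        img_feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        kfs_feats = self.kfs_conv(kfs).transpose(1, 2)    # (B, T, 256)
        seq = torch.cat([img_feats, kfs_feats], dim=-1)   # (B, T, 512)
        ctx, _ = self.bilstm(seq)                         # (B, T, 512)
        ctx, _ = self.attn(ctx, ctx, ctx)                 # multi-head attention
        pooled = ctx.mean(dim=1)                          # global average pooling
        return self.head(pooled)                          # class logits
```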
However, different knitting structures tend to be more susceptible to floating-yarn buildup at specific locations during production, and better identification can be achieved by paying increased attention to these critical areas. Therefore, a multi-head attention mechanism was added after the BiLSTM modeling stage. It computes multiple sets of attention weights in parallel, each set called an attention head, enabling the model to learn several different attention patterns simultaneously. In operation, the final hidden states and multiple groups of attention score vectors are obtained by processing the data from the different channel inputs of the BiLSTM network. The attention scores are computed with a dot-product function, and the softmax function is applied to normalize them. The normalized scores are then used to form a weighted sum of the input data, yielding the final context vector. In this way, the model is able to focus on the input sequences at different levels and from different perspectives, better capturing the complex relationships between the feature map sequences and the knitting data. The specific process is shown in Figure 7.
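The per-head computation described here, i.e., dot-product scores normalized by a softmax and used to form a weighted sum, corresponds to the standard scaled dot-product attention, sketched below (a generic formulation, not our module's exact code):

```python
import torch
import torch.nn.functional as F

def dot_product_attention(q, k, v):
    """q, k, v: (batch, heads, seq_len, d_head). Returns the context
    vectors obtained from softmax-normalized dot-product scores."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # dot-product scores
    weights = F.softmax(scores, dim=-1)           # normalize per query
    return weights @ v                            # weighted sum = context
```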
4. Experiments and Results
The test experiments were conducted on a KS3-72C-10.2G machine. In deep learning, the quantity and quality of the training data have an important impact on the network model. In this study, the dataset consisted of real knitting image sequences captured in a factory environment together with their corresponding knitting data. The preprocessed image sequences and knit feature sequences were taken as the network inputs. Based on the presence or absence of floating yarns during the knitting process, the network was trained to classify the output as working normally or abnormally. The detection interface and the acquired image sequences are shown in Figure 8.
While acquiring the window data, the single-image output size was set to 1 × 150 × 180. When inputting data into the recognition network for training, random flipping was used for data augmentation, thereby increasing the robustness of the model; such a random flip operation does not lose information. The dataset was then randomly divided, with 80% used for training and 20% for performance validation. During the experiments, we decomposed, reconstructed, and comparatively analyzed CNN architectures for the different signals, which helped to find the optimal combination for effective recognition of the floating-yarn phenomenon. The recognition model was trained using the Adam optimizer (torch 1.10). The loss function was the cross-entropy, calculated as follows:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic} \log p_{ic},$$

where $N$ is the number of samples, $C$ is the number of categories, $y_{ic}$ is the true label indicating whether the $i$th sample belongs to category $c$, and $p_{ic}$ is the probability that the model predicts the $i$th sample belongs to category $c$.
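A condensed training loop matching this setup (80/20 random split, random flipping, cross-entropy loss, Adam) might look as follows; `dataset` and `model` are placeholders for our data pipeline and the CNN-BiLSTM-KFS network, and the learning rate is an assumed intermediate value between the extremes we tested:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split

# Assumed: `dataset` yields (frames, kfs, label) tuples and `model`
# is the CNN-BiLSTM-KFS network sketched above.
n_train = int(0.8 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

criterion = nn.CrossEntropyLoss()     # the cross-entropy defined above
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):
    for frames, kfs, labels in train_loader:
        # Random horizontal flip as augmentation (no information loss).
        if torch.rand(1).item() < 0.5:
            frames = torch.flip(frames, dims=[-1])
        optimizer.zero_grad()
        loss = criterion(model(frames, kfs), labels)
        loss.backward()
        optimizer.step()
```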
Suitable network parameters can tremendously improve the performance of the network model. We initially set the LSTM hidden-layer vector length to 128, but in experiments we found that the model captured richer sequence features as the vector length increased, so we gradually increased it. At a length of 256, the model's performance on the validation set improved significantly; however, when it was further increased to 512, although the model's learning capacity was enhanced, the training time increased significantly, the performance improvement was marginal, and overfitting even occurred. Similarly, enlarging the attention mechanism significantly improved accuracy, but blindly increasing it brought no significant performance gains once the computational cost became too high. We also tried initial learning rates from 0.001 to 0.1. With a learning rate of 0.1, the model trained very quickly in the early stages, but because the steps were too large, it tended to skip over the global optimum, leading to unstable performance on the validation set; with a learning rate of 0.001, the model converged too slowly and the training time increased dramatically. The tuning of the batch size and of the fully connected layer output vector length likewise showed that blindly increasing or decreasing these parameters negatively impacts model performance. In terms of optimizer selection, Adam performs well when dealing with sparse gradients and differing feature scales. Compared with the traditional SGD optimizer, Adam dynamically adjusts the learning rate, which improves the convergence speed and stability of the model. During the experiments, we also tried the RMSprop and SGD optimizers, but with the same hyperparameter settings, Adam performed best on the validation set. To fully utilize the computational resources, the mini-batch size of the model was set to 16, with a BiLSTM hidden-state dimension of 256, two layers, and an input dimension of 512. The output dimension of the convolutional layer was kept the same as that of the BiLSTM layer, and the input dimension of the fully connected layer was 512. The specific training parameters of the model are shown in
Table 1.
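For reference, the stated settings can be collected into a single configuration; entries marked as assumed are illustrative choices rather than reported values:

```python
# Training configuration; values follow the text and Table 1 where
# stated, entries marked "assumed" are illustrative.
CONFIG = {
    "batch_size": 16,
    "bilstm_hidden_size": 256,
    "bilstm_num_layers": 2,
    "bilstm_input_dim": 512,
    "fc_input_dim": 512,
    "optimizer": "Adam",
    "learning_rate": 0.01,   # assumed: between the tested 0.001 and 0.1
    "epochs": 100,
    "loss": "cross_entropy",
}
```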
To train and test the network, we compared the performance of the whole system in all experimental aspects after completing the construction of the experimental platform. To verify that the method could effectively recognize floating-yarn stacks in actual production, we conducted recognition tests on different knitting structures in full compliance with industry standards; the test samples are shown in Figure 9.
The recognition of the different structures is shown in Figure 10a, where it can be seen that the constructed recognition device successfully recognized floating-yarn anomalies in knitwear with different structures during the knitting process. However, when the recognition experiments were conducted in real production environments, there were slight differences in recognition efficiency between structures. For example, in the case of a single jersey, the needles were more closely spaced during knitting, so more yarn was pulled up when a needle break occurred, and the framework was therefore more likely to recognize yarn build-up. Conversely, for more complex structures, the needles were farther apart, making the characteristic areas more difficult to recognize. In actual production, the use of yarns of different colors is unavoidable if public demand is to be met. Therefore, to ensure practical relevance, the exposure times and sensitivities required for yarns of different colors to be successfully captured and identified during knitting were also tested, and the results are presented in Figure 10b. The results indicate that, for white yarns, the laser-illuminated area was the brightest and most reflective, so the framework recognized stacking more easily, requiring shorter exposure times and lower sensitivity. As the yarn color deepened, the darker colors absorbed more light and the characteristic areas of the image became less visible; consequently, more yarn feeds were required for recognition, along with longer exposure times and higher sensitivities. Finally, we show the actual floating yarns produced during the knitting of different yarns and their recognition by the vision system in Figure 10c, which proves that the experimental platform was able to accurately recognize floating-yarn anomalies under the various conditions encountered in real production.
After training on the experimentally obtained image sequences for 100 epochs, we obtained the overall recognition accuracy before and after modifying the CNN-BiLSTM network. Additionally, a deconvolutional neural network (DCNN) model built in our previous work was selected and tested on the floating-yarn detection problem [38]. This deconvolutional neural network was designed to solve the problem of trajectory generation in a moving state. The training results of all the network models are shown in Figure 11.
It can be observed that the performance of the network after introducing the knit feature sequences was much better than that of the original network, which proves that recognition accuracy can be substantially improved by linking the network closely with the actual production process. The accuracy of the framework proposed in this paper already reached a high level after 10 epochs, improved further after 20 epochs, and quickly stabilized. In particular, a comparison of the results showed that the overall recognition accuracy of our proposed CNN-BiLSTM-KFS network was slightly higher than that of the deconvolutional neural network, indicating that both network models performed excellently on the floating-yarn accumulation detection problem. However, the deconvolutional neural network was built to solve the problem of accurate trajectory generation, which requires a high-quality dataset as well as strong computational power to ensure high recognition accuracy. For floating-yarn detection, by contrast, floating-yarn anomalies are often obvious enough in the images to be observed directly. Therefore, in the detection process, we did not need high-quality images to ensure recognition accuracy, which also saved computational resources to a large extent. The deconvolutional neural network thus ran counter to our original aim of efficient and accurate recognition at low cost, so the design of the CNN-BiLSTM-KFS network was necessary, and the results showed that its performance was also excellent.
In particular, we conducted experimental tests on different knitting structures to verify that the method proposed in this paper performs much better than other detection methods in all aspects. The comparative experimental results are shown in Table 2, which shows that the proposed method had the fastest response time to the floating-yarn anomaly at the same speed, and the detection time remained low even at high knitting speeds. Moreover, since white and red are common and popular clothing colors, we tested different structures in these two colors at the same running speed; the results are shown in Table 3 and Table 4. They prove that the vision-based floating-yarn anomaly detection method proposed in this paper has excellent performance and can play an important role in actual production, ensuring the safety of the knitting equipment and improving production efficiency.
Finally, we conducted further ablation experiments on the CNN-BiLSTM-KFS architecture to evaluate the contribution of each component to overall model performance. The change in model performance was observed by removing each component of the model in turn, and the results are shown in Table 5. The data analysis showed that when the CNN module was removed, the important feature extraction step was missing, leading to a significant reduction in the overall recognition rate. When the LSTM module was removed, the model, faced with the huge volume of data produced by CNN feature extraction, could not effectively process the sequence information or find the interrelationships between the data, leading to performance degradation. When the attention module was removed, the model could not efficiently acquire the key information when faced with long sequences of complex patterns, since the attention module helps the model focus on the key parts of the input when processing sequences; this also resulted in performance degradation. These results make it clear that each component contributes substantially to the whole model, and together they improve its performance.
5. Conclusions
In this paper, we constructed a complete floating-yarn anomaly detection system, which provides a very effective solution for the early detection of floating yarns during automatic knitting machine operation. Because the inclusion of knit feature sequences strengthens the connection between the network and the actual production process, the overall network can be used for floating-yarn detection without adjusting the framework for different types of knitting equipment, and different knitting structures can be detected with high accuracy, making the method universal. At the same time, the safety of the knitting equipment is guaranteed and its service life is extended without sacrificing productivity. Experimental results showed that the proposed method outperformed both traditional probe detection and the original CNN-BiLSTM framework in detecting floating yarns across various knitting structures and speeds, as evidenced by improved accuracy and reduced detection time for floating-yarn anomalies. Finally, the experimental conditions of the method were verified to fit the actual production process, demonstrating its feasibility. Overall, the method provides a valuable contribution to the knitting industry by addressing the limitations of existing methods, better meeting the actual requirements of knitting production, and providing a more effective solution for the automatic detection of floating-yarn anomalies. However, several pressing issues still need to be addressed: (1) The external light environment, the laser irradiation angle, and the camera shooting angle at the time of image capture have a significant impact on image quality, requiring the user to make constant adjustments when knitting different structures on different models of automatic knitting equipment. (2) As automatic knitting equipment is upgraded, problems with image clarity and frame rate arise during rapid knitting; to ensure image quality, the cost of the camera equipment necessarily increases, and companion computers with higher computing power are required to ensure the accuracy of model recognition. Future research will continue to explore how to solve these problems while employing different recognition algorithms based on the style of the knitted fabrics, ensuring intelligent production with higher accuracy while maintaining lower production costs.