Article

Knitting Robots: A Deep Learning Approach for Reverse-Engineering Fabric Patterns

1 Bharti School of Engineering and Computer Science, Laurentian University, Sudbury, ON P3E 2C6, Canada
2 Shanghai International Fashion Education Centre, Shanghai 200060, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(8), 1605; https://doi.org/10.3390/electronics14081605
Submission received: 23 February 2025 / Revised: 3 April 2025 / Accepted: 10 April 2025 / Published: 16 April 2025

Abstract

Knitting, a cornerstone of textile manufacturing, is uniquely challenging to automate, particularly in terms of converting fabric designs into precise, machine-readable instructions. This research bridges the gap between textile production and robotic automation by proposing a novel deep learning-based pipeline for reverse knitting to integrate vision-based robotic systems into textile manufacturing. The pipeline employs a two-stage architecture, enabling robots to first identify front labels before inferring complete labels, ensuring accurate, scalable pattern generation. By incorporating diverse yarn structures, including single-yarn (sj) and multi-yarn (mj) patterns, this study demonstrates how our system can adapt to varying material complexities. Critical challenges in robotic textile manipulation, such as label imbalance, underrepresented stitch types, and the need for fine-grained control, are addressed by leveraging specialized deep-learning architectures. This work establishes a foundation for fully automated robotic knitting systems, enabling customizable, flexible production processes that integrate perception, planning, and actuation, thereby advancing textile manufacturing through intelligent robotic automation.

1. Introduction

Knitting has long been a cornerstone of textile manufacturing, from traditional hand-crafted designs to industrial whole-garment knitting machines. In conventional processes, a predefined stitch label serves as the input for knitting machines like those controlled by the m1plus [1] programming language. These inputs generate corresponding fabrics (outputs), which can be visualized or verified through images of the resulting physical objects. However, creating knittable programs or stitch labels from existing images of knitted patterns remains a complex challenge. This gap defines the scope of reverse knitting (Figure 1), an emerging paradigm that takes real images of knitted fabrics as input and outputs knittable stitch labels through deep learning models. Such systems hold immense potential to transform the design and manufacturing pipeline, especially in customization and replication tasks.
The primary objective of this project is to develop a two-stage pipeline for reverse knitting, leveraging deep learning to transform real images of knitted fabrics into knittable stitch labels. This pipeline includes a generation phase for predicting front labels and an inference phase for generating complete labels. This research aims to improve the accuracy of label prediction for diverse yarn structures, addressing the challenges of single-yarn (sj) and multi-yarn (mj) knitting patterns through specialized architectures. Additionally, this study evaluates the pipeline’s performance across realistic scenarios, focusing on its ability to handle imbalanced datasets and generalize effectively. This research explores three key questions. The first focuses on pipeline design, investigating how a two-stage deep learning pipeline can be effectively structured to transform real fabric images into knittable stitch labels. The second addresses performance optimization, examining the impact of design choices such as stricter loss functions, yarn-specific training, and different model architectures on the accuracy and robustness of front label and complete label predictions. Lastly, this research evaluates the pipeline’s ability to handle realistic constraints, such as imbalanced datasets and unknown yarn types, while maintaining high performance across various application scenarios.
Early efforts, such as those of Kaspar et al. [2], introduced the Refiner+Img2prog architecture using a SimGAN framework [3] to refine real images into synthetic rendering images for front label prediction. However, challenges like dataset imbalance (e.g., FK dominance) and the “one-to-many” problem—where identical instructions yield varied visuals—limit precision and are often exacerbated by multiple-instance learning (MIL) strategies [2]. Stricter loss functions have since been explored to enhance accuracy. Trunz et al. [4] and Melnyk [5] extended GAN-based approaches [6,7,8] to model yarn properties and fabric styles, while CNNs [9] excel at capturing spatial stitch relationships and LSTMs [10] address row dependencies [11]. However, these works focus more on creatively generating stitches, unlike Kaspar’s Inverse Knitting, which only translates fabric images into stitch labels.
For the inference phase, inter-stitch logic is critical. Yuksel et al. [12] pioneered stitch meshes, which were later refined by Wu et al. [13,14] for knittability. Conditional Random Fields (CRFs) [15] model probabilistic dependencies, though long-range limitations persist [16]. Residual architectures [16] and 3D-to-instruction methods [17] further enhance complete label generation. Yet, gaps remain: imbalanced datasets underrepresent rare labels (e.g., “E”, “V”) [2,4,5], multi-yarn scalability is limited [12,13], and computational efficiency lags are common with resource-heavy models [10,11,15].
Unlike Transformers, which favor global features but demand substantial computational resources, our residual CNNs, UNets, and multi-layer CNNs prioritize local dependencies, which matter here because each stitch label must be consistent with its neighbors [12], and capture them effectively with 3 × 3 kernels. GANs [2] remain cost-efficient, while GNNs, though well suited to relational structure, face convergence issues on dense grids, where CNNs offer greater stability.
The rest of this paper is organized as follows: Section 2 details the data collection and preparation process, including the acquisition of ground truth data and the categorization of yarn and stitch types. Section 3 presents the proposed two-stage deep learning architecture, describing the generation phase for front label prediction and the inference phase for complete label generation. Section 4 describes the experimental setup, including the four key usage scenarios, and evaluates the model’s performance across different yarn types and stitch patterns. Section 5 discusses the results, highlighting the model’s strengths and limitations, while Section 6 concludes the study and outlines future research directions. Finally, Appendix A and Appendix B provide additional details about the model’s implementation and the experimental environment setup.

2. Data Collection and Preparation

The data collection and preparation process involves acquiring ground truth data and defining yarn and stitch concepts. As illustrated in Figure 2, the ground truth data consist of four key components: the complete label, designed by textile experts as the foundation for other data types; the front label, which is derived from the complete label through mapping; the rendering image, generated by rendering the complete label; and the real image, produced by knitting the complete label into physical fabric. These data types are systematically derived through processes such as mapping the complete label to the front label, rendering these into images, and knitting these into physical fabrics.
Our dataset is categorized by yarn types and stitch patterns. The yarn types include single-yarn samples (sj) and multi-yarn samples (mj), which are further divided into 2j, 3j, and 4j for fabrics containing two, three, and four yarn colors, respectively. Practical limitations restrict structures to no more than four colors. Each dataset sample consists of a 20 × 20 grid of stitches labeled with stitch types like FK, BK, and others, which represent the structural details of the fabric.
To address the limited availability of real images in the Kaspar dataset, which contains only 1043 front-facing fabric images, data augmentation is performed using transfer images (Figure 3). Both rendering images and transfer images are generated from the complete label using the m1plus software provided by Stoll [1]. These transfer images serve as substitutes for real images captured through physical knitting and photography, allowing for a larger and more balanced dataset. This augmentation strategy significantly enhances the dataset’s consistency and availability, ensuring the more reliable training and evaluation of the models.
The complete label is not provided by the designers as raw data but is derived from the colored complete label through label aggregation. It incorporates both structure and color tags. For example, in multi-yarn (mj) cases containing two yarns (2j), labels such as (1/2j)FK,MAK and (2/2j)FK,MAK are aggregated into FK,MAK. Here, 1/2j indicates the first yarn in a two-yarn structure, while 2j specifies the yarn type. This label aggregation simplifies structure recognition by focusing on stitch labels without considering color-specific information. Notably, color recognition is outside the scope of this project, which centers on identifying the stitch structures required for knitting instructions.
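As a concrete illustration of this aggregation step, the minimal Python sketch below strips the per-yarn prefix from a colored complete label. The label strings come from the example above; the regular expression is an assumption about how such prefixes could be parsed, not a description of the actual pipeline code.

    import re

    def aggregate_label(colored_label: str) -> str:
        """Drop the '(k/nj)' yarn prefix, keeping only the stitch structure."""
        return re.sub(r"^\(\d+/\d+j\)", "", colored_label)

    for lbl in ["(1/2j)FK,MAK", "(2/2j)FK,MAK", "FK"]:
        print(lbl, "->", aggregate_label(lbl))
    # (1/2j)FK,MAK -> FK,MAK
    # (2/2j)FK,MAK -> FK,MAK
    # FK -> FK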

2.1. Front Label Acquisition

The front label is obtained using predefined mappings, as illustrated in Figure 4. These mappings contain the label number (numerical values used for training), the label name (technical stitch types such as FK and BK), and the label color, which is employed for the intuitive visualization of the 20 × 20 stitch grid during testing. Additionally, the front label provides a representation within the rendering image, ensuring easy identification. In contrast, the complete label focuses on generating knittable instructions and includes the label number, label name, and label color, emphasizing the structural information required for producing feasible patterns.
A key distinction exists between the front label and the complete label. The front label simplifies the recognition of visual patterns in real images or rendering images, enabling a model to map directly from the visual domain. Meanwhile, the complete label requires an inference step to generate knittable instructions, making it more suited to logical reasoning tasks rather than visual recognition. To ensure effective model training, separate mappings are provided for single-yarn (sj) and multi-yarn (mj) structures, as shown in Figure 4a,b. However, because the front label exhibits minimal structural differences between sj and mj cases, combined training is feasible when using front labels.

2.2. Rendering Image Acquisition

Rendering images were synthesized using the m1plus software by Stoll [1], which simulates virtual fabric structures using complete labels. The process involves inputting the complete label into the m1plus software, which generates a corresponding rendering image. These images serve as intermediary outputs for downstream tasks, bridging the gap between virtual designs and real-world knitting data.

2.3. Real Image Acquisition

Real images were generated using Stoll’s computerized flat knitting machines, which are programmed with outputs from the m1plus software. To improve efficiency, multiple samples were knitted together on a single sheet, as illustrated in Figure 5. The Region of Interest (ROI) for each sample was then manually marked to distinguish between the different patterns within the sheet. Finally, physical images were captured using an iPhone 15 Pro Max in RAWMAX photo mode within a controlled lightbox environment, ensuring high-quality imaging for further processing and analysis.
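For illustration, a manually marked ROI can be cropped from a sheet photograph with a few lines of Python, as sketched below. The file names and box coordinates are hypothetical placeholders; the 160 × 160 target size matches the pipeline's fixed input resolution noted later in this paper.

    from PIL import Image

    # Photo of a full sheet containing several knitted samples (hypothetical file name).
    sheet = Image.open("sheet_01.png")

    # Manually marked ROI boxes: (left, upper, right, lower) in pixels (hypothetical values).
    roi_boxes = {"sample_01": (120, 80, 920, 880)}

    for name, box in roi_boxes.items():
        # Crop the sample and resize it to the pipeline's fixed 160 x 160 input resolution.
        sample = sheet.crop(box).resize((160, 160))
        sample.save(f"{name}.png")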

2.4. Distribution of Yarn Types and Stitches

The distribution of yarn types, illustrated in Figure 6, includes a total of 4950 samples. The dataset consists of 3000 single-yarn (sj) samples, evenly distributed across 10 subcategories (Hem, Move1, Miss, Cable1, Links2, Move2, Cable2, Mesh, Tuck, and Links1), with approximately 300 samples in each category. Additionally, there are 1950 multi-yarn (mj) samples. Although a larger dataset of 12,392 sj samples from Kaspar et al. [2] was available, the sj samples were limited to 3000 to maintain a balance with the smaller number of mj samples. This curated subset ensures a representative distribution of both categories for effective model training.
The stitch distribution of the dataset reveals a significant label imbalance, as depicted in Figure 7. For the front labels, the FK label dominates the dataset, accounting for approximately 75% of the samples, while other labels remain underrepresented. Similarly, for the complete labels, FK makes up about 46.5% of the samples, followed by FK,MAK at approximately 27.1%, with the other labels being comparatively rare. This imbalance reflects the real-world variability in textile patterns and highlights the need for robust training strategies to mitigate its impact on model performance.

3. Model Architecture

The architecture we propose for generating knitting instructions from fabric images is structured to address the key limitations identified in previous methods, particularly in the approach used by Kaspar et al. (2019) [2]. In contrast to Kaspar et al., who utilized a single generation phase to directly predict 17 single-yarn (sj) labels, our methodology introduces an additional inference phase that significantly enhances the scope and precision of the generated knitting instructions.
In our approach, the generation phase initially predicts 14 distinct front labels instead of final sj labels. These front labels are carefully chosen to capture the visual differences that are more discernible in the fabric images, making them easier for the model to identify accurately. By focusing on these foundational features, we simplify the initial prediction task, which improves the robustness and accuracy of front label generation.
The inference phase then takes these 14 front labels and processes them through a residual model to generate a comprehensive set of 34 complete labels. This complete label set is not only compatible with single-yarn knitting (sj) but is also designed to support multi-yarn (mj) knitting applications, enabling the production of more intricate and multi-colored patterns. This two-step approach—starting with visually distinctive front labels and refining them into complete knittable labels—represents a major advancement in automated knitting instruction generation. This architecture thus enhances both the scalability and flexibility of the model, making it suitable for a broader range of textile applications.
The overall architecture can be visualized in the schematic provided below (Figure 8), which illustrates the flow from real image to front label and finally to complete label through the proposed generation and inference phases.

3.1. Front Label Generation: Refiner and Img2prog

The generation phase of our model converts input real images into front labels in two steps: a refiner produces intermediate rendering images, and an Img2prog module maps these images to front labels (Figure 9). While the refiner acts as the generator and Img2prog as the encoder, their specific roles are enhanced with innovations that improve the accuracy and robustness of the label prediction pipeline. This approach builds upon and improves Kaspar et al.’s work [2].
This study improves upon Kaspar et al. [2] in two key areas. First, the precise mapping from real images to rendering images replaces Kaspar’s indirect transfer approach, which aligns real images to the rendering domain without creating specific one-to-one mappings. In contrast, our refiner directly maps each real image to a corresponding predefined rendering image, reducing ambiguity and improving the accuracy of downstream label predictions.
Second, this work introduces a stricter cross-entropy loss by eliminating the multiple-instance learning (MIL) tolerance used by Kaspar et al., which allowed neighboring labels to be considered correct. While MIL provided flexibility, it compromised precision, especially on imbalanced datasets where labels like FK dominate (accounting for 75% of data points). By enforcing stricter accuracy, our approach avoids overfitting to dominant labels and ensures improved performance across all classes [2].
The refiner is a GAN model introduced by Shrivastava et al. (2017) [3]. Their work proposed the idea of refining synthetic images into realistic ones using adversarial training. This foundational concept inspired the design of the refiner used in our pipeline, which adapts their approach to refine residual images into accurate rendering images.
The refiner, implemented as oper_img2img_bottleneck, refines residual images derived from real images into rendering images, which serve as intermediary representations. Acting as the generator in the pipeline, the refiner operates through a series of distinct stages. Initially, during input preprocessing, a residual image is generated by subtracting a mean value from the real image, which normalizes the input and highlights key patterns for further processing. The encoder process then extracts features from the residual image using convolutional layers and a stride of two to progressively downsample and encode essential information, aided by ReLU activations to ensure non-linearity. At the model’s core, the bottleneck comprises residual blocks that preserve spatial dimensions while capturing deeper and more intricate features, crucial for generating accurate rendering images. Finally, the decoder process reconstructs the processed features into a refined rendering image using upsampling techniques, such as bilinear interpolation followed by convolutional layers. The output is a rendering image that closely matches the ground truth, offering clear visual cues for subsequent tasks like label prediction.
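The sketch below illustrates an encoder-bottleneck-decoder refiner of the kind described above (strided convolutions, a residual bottleneck, and bilinear upsampling), written with modern tf.keras for readability. It is not the authors' TensorFlow 1.11 oper_img2img_bottleneck implementation; the layer widths, depths, and channel counts are assumptions.

    import tensorflow as tf
    from tensorflow.keras import layers

    def residual_block(x, filters):
        # Two 3x3 convolutions with an identity shortcut.
        y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        y = layers.Conv2D(filters, 3, padding="same")(y)
        return layers.ReLU()(layers.Add()([x, y]))

    def build_refiner(input_shape=(160, 160, 1), base=32, n_res=4):
        # Input: residual image (real image minus a mean value); channel count is an assumption.
        inp = tf.keras.Input(shape=input_shape)
        # Encoder: strided convolutions downsample and encode features.
        x = layers.Conv2D(base, 3, strides=2, padding="same", activation="relu")(inp)
        x = layers.Conv2D(base * 2, 3, strides=2, padding="same", activation="relu")(x)
        # Bottleneck: residual blocks preserve spatial dimensions while deepening features.
        for _ in range(n_res):
            x = residual_block(x, base * 2)
        # Decoder: bilinear upsampling followed by convolutions reconstructs the rendering image.
        x = layers.UpSampling2D(interpolation="bilinear")(x)
        x = layers.Conv2D(base, 3, padding="same", activation="relu")(x)
        x = layers.UpSampling2D(interpolation="bilinear")(x)
        out = layers.Conv2D(1, 3, padding="same")(x)
        return tf.keras.Model(inp, out, name="refiner")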
The Img2prog module, implemented as oper_img2prog_final_complex, transforms the refined rendering images output from the refiner into front labels that encode knitting instructions. Acting as the encoder in the pipeline, Img2prog begins with feature extraction, where convolutional layers reduce the spatial dimensions of the input while capturing its essential features. During this stage, intermediate outputs are preserved for use in skip connections. These skip connections integrate intermediate features, by transforming them through space-to-depth operations, reducing them via convolution, and then concatenating them to form a combined feature map. This approach ensures the fusion of both low-level and high-level information for enhanced feature learning. The residual blocks, similar to those in the refiner, further refine the combined features by processing the local and global dependencies within the knitting patterns. Finally, the output layer uses convolutional layers to map the refined features into logits corresponding to the 14 front labels. The predicted labels are derived through an argmax operation, which determines the most probable class for each label.
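A minimal sketch of an Img2prog-style encoder follows: downsampling convolutions, space-to-depth skip connections, residual refinement, and a final per-stitch classification over the 14 front labels. The widths and the exact skip wiring are assumptions for illustration; the real module is the authors' oper_img2prog_final_complex in TensorFlow 1.11.

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_img2prog(input_shape=(160, 160, 1), n_labels=14, base=32):
        inp = tf.keras.Input(shape=input_shape)
        # Feature extraction: three stride-2 convolutions take 160x160 down to the 20x20 stitch grid.
        s1 = layers.Conv2D(base, 3, strides=2, padding="same", activation="relu")(inp)      # 80x80
        s2 = layers.Conv2D(base * 2, 3, strides=2, padding="same", activation="relu")(s1)   # 40x40
        x = layers.Conv2D(base * 4, 3, strides=2, padding="same", activation="relu")(s2)    # 20x20
        # Skip connections: bring earlier maps to 20x20 via space-to-depth, reduce, and concatenate.
        k1 = layers.Conv2D(base, 1)(layers.Lambda(lambda t: tf.nn.space_to_depth(t, 4))(s1))
        k2 = layers.Conv2D(base, 1)(layers.Lambda(lambda t: tf.nn.space_to_depth(t, 2))(s2))
        x = layers.Concatenate()([x, k1, k2])
        # Residual refinement of the combined feature map.
        for _ in range(3):
            y = layers.Conv2D(x.shape[-1], 3, padding="same", activation="relu")(x)
            y = layers.Conv2D(x.shape[-1], 3, padding="same")(y)
            x = layers.ReLU()(layers.Add()([x, y]))
        # Output layer: per-stitch logits over the 14 front labels; argmax gives the predicted grid.
        logits = layers.Conv2D(n_labels, 1)(x)
        return tf.keras.Model(inp, logits, name="img2prog")

    # labels = tf.argmax(build_img2prog()(images), axis=-1)  # shape (batch, 20, 20)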
By refining specific rendering images with the refiner and mapping them to front labels with Img2prog, our method achieves significant improvements in precision and robustness. These two steps form the backbone of our generation phase, addressing key shortcomings in Kaspar et al.’s approach and laying the foundation for subsequent inference-phase processing.
Our project employs multiple loss functions to guide the training process effectively. These losses include cross-entropy, adversarial, style (perceptual), and syntax losses. Each loss component contributes uniquely to the generation of accurate and realistic knitting instructions and images.
Cross-entropy loss is a standard metric used in classification tasks to quantify the difference between the predicted probability distribution and the ground truth labels. The formula for cross-entropy loss is
L_{CE} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)
where y_i represents the ground truth label for class i, \hat{y}_i is the predicted probability for class i, and N denotes the total number of classes.
In this project, cross-entropy loss is applied to the logits generated by the Img2prog module to compare the predicted front labels with the ground truth instructions. This loss is computed for both real and synthetic data. Unlike Kaspar et al. [2], who implemented a multiple-instance learning (MIL) strategy to allow label tolerance, our approach strictly evaluates exact matches between predictions and the ground truth. This stricter evaluation avoids inflating accuracy metrics and ensures precise predictions, particularly for imbalanced datasets. Overall, cross-entropy loss is crucial for improving the model’s ability to generate accurate knitting instructions as it effectively aligns the model’s predictions with the target labels.
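A minimal sketch of this strict per-stitch cross-entropy (with no MIL neighborhood tolerance) is shown below; the function name and tensor shapes are illustrative assumptions, not the project's actual training code.

    import tensorflow as tf

    def strict_stitch_loss(labels, logits):
        """labels: (batch, 20, 20) integer class ids; logits: (batch, 20, 20, n_classes).
        Every cell is scored against its exact ground-truth class; no neighboring
        label is accepted as correct."""
        per_stitch = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
        return tf.reduce_mean(per_stitch)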
Adversarial loss, inspired by Generative Adversarial Networks (GANs) [6], is employed to enhance the realism of the generated rendering images. We utilize a Conditional GAN (cGAN) framework [7], where the discriminator distinguishes between real and generated images based on the input instructions.
The adversarial loss of a generator is computed using the least squares GAN approach [8]:
L_{adv} = \mathbb{E}_{x}\left[ \left( D(G(x), y) - 1 \right)^{2} \right]
where x represents the input image, y is the conditioning information (instructions), G(x) is the image generated by the refiner, and D is the discriminator.
In our method, the refiner serves as the generator, producing rendering images from real images, while the discriminator evaluates their authenticity. Specifically, the discriminator compares ground truth rendering images, or positive samples, to the generated rendering images or ground truth real images, which are considered negative samples. Adversarial loss forces the refiner to generate rendering images that are indistinguishable from real ones, ensuring they achieve a higher degree of visual realism. This improved realism enhances the quality of the generated outputs, making them more suitable for subsequent tasks such as accurate front label prediction.
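For reference, the least-squares adversarial terms can be written as the sketch below, where d_real and d_fake are the discriminator outputs on ground truth and generated rendering images. The generator term follows Equation (2); the discriminator term is the standard LSGAN counterpart and is an assumption about the exact form used.

    import tensorflow as tf

    def lsgan_generator_loss(d_fake):
        # Push the refiner toward D(G(x), y) = 1, as in Equation (2).
        return tf.reduce_mean(tf.square(d_fake - 1.0))

    def lsgan_discriminator_loss(d_real, d_fake):
        # Standard LSGAN objective: real pairs toward 1, generated pairs toward 0.
        return 0.5 * (tf.reduce_mean(tf.square(d_real - 1.0)) + tf.reduce_mean(tf.square(d_fake)))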
Perceptual loss evaluates the similarity between the generated and ground truth rendering images in the feature space of a pre-trained VGG network [18]. This loss builds upon the work of Gatys et al. (2016) [19] on neural style transfers. Perceptual loss, specifically when focusing on style similarity, is defined as follows:
L_{style} = \sum_{l} w_l \left\| G^{l}(\hat{x}) - G^{l}(x) \right\|_{F}^{2}
where G^{l}(\cdot) represents the Gram matrix of the feature maps from layer l, \hat{x} and x are the generated and ground truth images, w_l denotes the weight assigned to each layer, and \|\cdot\|_{F} is the Frobenius norm.
In our method, perceptual loss evaluates the features extracted from layers of the VGG network. By comparing these features, the loss ensures that the generated rendering images maintain their structural and textural consistency with the ground truth. This encourages the model to produce outputs that not only visually resemble the target images but also retain the stylistic and structural details necessary for accurate instruction generation.
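The Gram-matrix style loss of Equation (3) can be sketched as follows; the choice of VGG layers, the per-layer weights, and the batch averaging are assumptions left to the caller rather than details taken from the implementation.

    import tensorflow as tf

    def gram_matrix(feat):
        """feat: (batch, H, W, C) feature map -> (batch, C, C) Gram matrix."""
        b, h, w, c = tf.unstack(tf.shape(feat))
        f = tf.reshape(feat, tf.stack([b, h * w, c]))
        return tf.matmul(f, f, transpose_a=True) / tf.cast(h * w, feat.dtype)

    def style_loss(feats_gen, feats_gt, weights):
        """Weighted squared Frobenius distance between Gram matrices over VGG layers."""
        return tf.add_n([
            w * tf.reduce_mean(tf.reduce_sum(tf.square(gram_matrix(a) - gram_matrix(b)), axis=[1, 2]))
            for w, a, b in zip(weights, feats_gen, feats_gt)
        ])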
Syntax loss is designed to enforce syntactic correctness in the predicted knitting instructions. It penalizes invalid transitions between stitches based on a predefined syntax of knitting patterns. This loss function does not have a straightforward mathematical formula but operates by referencing a transition matrix that defines valid stitch combinations.
In our method, the predicted instruction logits are compared against these allowable transitions to ensure logical and structural compliance with knitting patterns. This approach guarantees that the generated instructions are not only accurate at the individual stitch level but also syntactically valid. By adhering to these constraints, the syntax loss ensures that the resulting instructions can be feasibly manufactured, making them both practical and consistent with real-world knitting requirements.
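The paper gives no closed-form expression for the syntax loss, but one way such a transition-matrix penalty could be realized is sketched below. The valid matrix is a hypothetical 0/1 table of allowed horizontal label pairs, and the whole formulation is an illustrative assumption rather than the project's implementation.

    import tensorflow as tf

    def syntax_loss(probs, valid):
        """probs: (batch, H, W, C) softmax outputs; valid: (C, C) matrix with 1 for allowed
        left-right label pairs. Penalizes probability mass placed on horizontally adjacent
        pairs that the transition matrix forbids."""
        left, right = probs[:, :, :-1, :], probs[:, :, 1:, :]
        invalid = 1.0 - tf.cast(valid, probs.dtype)
        # Expected indicator of an invalid (left, right) pair at each position.
        pair_penalty = tf.einsum("bhwi,ij,bhwj->bhw", left, invalid, right)
        return tf.reduce_mean(pair_penalty)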
The total loss function for the generator phase combines several components to ensure the generated front labels are accurate, realistic, and syntactically valid. The total loss can be expressed as follows:
L_{total} = L_{CE} + L_{adv} + L_{style} + L_{syntax}
These components work together to achieve high-quality front label predictions that not only align with the knitting patterns but are also realistic and compliant with knitting syntax.

3.2. Complete Label Inference: Residual Model

The inference phase focuses on transforming the 20 × 20 front label, generated by the generator (refiner+Img2prog), into a 20 × 20 complete label. This step is essential for producing knittable instructions, as the front label only represents the visible front side of the fabric, omitting the back layers necessary to create comprehensive knitting instructions.
The residual model transforms the front label into a complete label using a three-stage architecture that includes an encoder, bottleneck, and decoder (Figure 10).
The encoder begins by extracting features from the input using an initial convolution layer with Conv2D, BatchNorm, and ReLU activations, ensuring efficient feature representation. Residual blocks are then applied to process these features while preserving their spatial consistency, followed by max pooling to reduce their spatial resolution. Intermediate outputs are retained through skip connections, enabling the decoder to access earlier feature maps for precise reconstruction. The bottleneck stage refines these features further using a deep residual block and includes dilated convolutions to expand the receptive field without increasing the number of parameters, capturing the long-range dependencies critical for accurate predictions. Finally, the decoder reconstructs the complete label by gradually upsampling the features to restore spatial resolution. It combines the earlier encoder outputs via skip connections to ensure that both high-level and low-level features contribute to the final output. A final convolution layer produces the 20 × 20 complete label, encoding comprehensive knitting instructions with structural consistency.
The residual model leverages residual blocks, skip connections, and dilated convolutions to ensure precise and efficient transformation of the front label into a fully knittable complete label.
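The sketch below illustrates such a residual inference model (an encoder with residual blocks and max pooling, a dilated-convolution bottleneck, and a skip-connected decoder mapping a 20 × 20 front-label grid to a 20 × 20 complete-label grid). It uses modern tf.keras for readability; the widths, depths, and dilation rates are assumptions, not the authors' TensorFlow 1.11 implementation.

    import tensorflow as tf
    from tensorflow.keras import layers

    def res_block(x, filters, dilation=1):
        y = layers.Conv2D(filters, 3, padding="same", dilation_rate=dilation)(x)
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
        y = layers.Conv2D(filters, 3, padding="same", dilation_rate=dilation)(y)
        y = layers.BatchNormalization()(y)
        if x.shape[-1] != filters:
            x = layers.Conv2D(filters, 1)(x)  # 1x1 projection when channel counts differ
        return layers.ReLU()(layers.Add()([x, y]))

    def build_residual_inference(n_front=14, n_complete=34, base=64):
        inp = tf.keras.Input(shape=(20, 20, n_front))   # one-hot front-label grid
        # Encoder: initial convolution, residual block, and max pooling (skip output retained).
        e1 = res_block(layers.Conv2D(base, 3, padding="same", activation="relu")(inp), base)
        p1 = layers.MaxPool2D()(e1)                      # 10x10
        # Bottleneck: dilated residual blocks widen the receptive field without extra parameters.
        b = res_block(res_block(p1, base * 2, dilation=2), base * 2, dilation=2)
        # Decoder: upsample back to 20x20 and fuse the encoder features via a skip connection.
        d1 = layers.UpSampling2D()(b)
        d1 = layers.Concatenate()([d1, e1])
        d1 = res_block(d1, base)
        out = layers.Conv2D(n_complete, 1)(d1)           # 20x20 complete-label logits
        return tf.keras.Model(inp, out, name="residual_inference")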
The front label exclusively contains information about the front-facing stitches, leaving out the structural details of the back side. A complete label combines the information from both the front and back stitches, ensuring that the generated instructions are machine-compatible and create knittable outcomes. As demonstrated by Yuksel et al. (2012), in traditional stitch patterning, the label of the current stitch is strongly correlated with its four neighboring stitches (up, down, left, and right) [12]. This intrinsic relationship validates our choice to incorporate 3 × 3 kernels in the network architecture, as they effectively capture local spatial dependencies.
The residual model utilizes residual blocks inspired by He et al.’s ResNet architecture, which addresses the vanishing gradient problem in deep networks [16]. Each block consists of convolutional layers with skip connections, enabling gradients to flow directly through the network and allowing the model to learn more effectively. This architecture is particularly well suited to tasks requiring spatial consistency, such as inferring complete labels from front labels, where minor errors can propagate throughout the knitting instruction set.
To train the inference phase, we employed a cross-entropy loss function (Equation (1)). Unlike the generator phase, where pixel-level differences in real images and front labels have a minimal impact on accuracy, the inference phase demands precise stitch-level predictions. Introducing multiple-instance learning (MIL) techniques, as done by Kaspar et al. (2019) [2], would exacerbate the prediction error by neglecting the fine-grained differences between adjacent stitches. Given the small input resolution (a 20 × 20 front label), such approximations lead to substantial inaccuracies in the complete label. By strictly adhering to a standard cross-entropy loss, our model effectively balances the interdependencies between adjacent stitches and enforces stricter accuracy requirements.

4. Experimental Setup and Results

Our model supports four key usage scenarios, as detailed in Appendix A. The first scenario focuses on front label generation, where real knitting images are processed using the refiner and Img2prog modules to produce front labels. The second scenario, complete label generation (unknown yarn type), generates a complete label without prior knowledge of the input yarn type by utilizing the full pipeline, including inference. The third scenario, complete label generation (known yarn type), further optimizes this process by leveraging yarn-specific residual models, distinguishing between the single-yarn (sj) and multi-yarn (mj) categories. Finally, the fourth scenario, complete label generation (using ground truth front label), directly uses ground truth front labels as input to produce complete labels, reducing the model’s dependence on the front label generation step. Each scenario includes detailed commands for execution and training, with TensorBoard support for monitoring the model’s performance and visualizing its results. For more information, see Appendix A.
The experiments for this project were conducted on a system configured with an NVIDIA RTX 2070 GPU running on Windows 11 with WSL2. The Ubuntu 18.04.6 operating system was installed within WSL2 to provide the necessary Linux-based environment. The deep learning framework used was TensorFlow 1.11, with Python 3.6, CUDA Toolkit 9.0, and cuDNN 7.1 used for GPU acceleration. The following sections detail the environment configuration process and essential software installations. The full setup process, with detailed and reproducible commands, is documented in Appendix B.
To evaluate the model’s performance comprehensively, we designed four scenarios, each reflecting varying levels of input information and output requirements. The first scenario focuses on generating front labels from real images using the generation phase (refiner + Img2prog), which forms the basis for subsequent tasks. The second scenario generates complete labels without prior knowledge of yarn types by first producing the front label and then inferring the complete label. In the third scenario, in which yarn type information (sj or mj) is known, the process uses a yarn-specific residual model for complete label generation. Finally, the fourth scenario bypasses front label generation entirely by directly inputting a ground truth front label into the residual model to predict the complete label.
These scenarios provide a structured framework for analyzing the model’s adaptability and performance across a diverse range of tasks.

4.1. Generation Phase Evaluation—Scenario 1

Table 1 summarizes the results from different models used in the generation phase. The metrics evaluated include sample size, parameter count (indicating model complexity), training time (in hours), and F1-score, which acts as the performance indicator. The F1-score was chosen due to the highly imbalanced label distribution in the dataset, ensuring a robust evaluation of the model’s performance across all classes.
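For clarity, the F1-score can be computed per stitch over the flattened 20 × 20 grids, for example with scikit-learn as sketched below; the macro averaging shown is an assumption about how the reported scores aggregate classes.

    import numpy as np
    from sklearn.metrics import f1_score

    def grid_f1(y_true, y_pred, average="macro"):
        """y_true, y_pred: integer class-id arrays of shape (n_samples, 20, 20)."""
        return f1_score(np.ravel(y_true), np.ravel(y_pred), average=average)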
The first two models, RFI_complex_a0.5 and RFI_notran_noaug_newinst, were trained on Kaspar’s dataset of 12,392 single-yarn (sj) samples. The baseline model, RFI_complex_a0.5, used a label space with 17 classes, which made visual differentiation challenging and reduced its flexibility in generating extended knittable instructions. In contrast, RFI_notran_noaug_newinst addressed these issues by reducing the label space to 14 front labels, simplifying the classification task and improving its adaptability for complete label generation. Additionally, it used paired input–output data, explicitly mapping real images to rendering images, and introduced transfer images generated with m1plus software as augmented data to compensate for the scarcity of real images. These improvements increased the F1-score from 90.2% to 97.3% while reducing the model’s complexity and training time.
The next two models, RFINet_front_xferln_MIL_160k and RFINet_front_xferln_160k, were trained on a curated dataset of 4950 samples, consisting of 3000 sj and 1950 mj samples. Both models leveraged transfer learning, using pretrained weights from RFI_notran_noaug_newinst to enable faster convergence. The RFINet_front_xferln_MIL_160k model incorporated MIL into its cross-entropy loss, allowing tolerance for neighboring labels. While this approach improved its fault tolerance, it reduced precision due to over-generalization, achieving an F1-score of 82.1%. By eliminating MIL and enforcing stricter label predictions, the RFINet_front_xferln_160k model achieved a higher F1-score of 83.1%, particularly improving its predictions for underrepresented labels.
We selected RFINet_front_xferln_160k as the generation phase model for Scenario 1 (front label prediction) based on three factors. First, it was trained on a balanced dataset of 4950 samples, ensuring compatibility with multi-yarn (mj) data, which were limited in number. Second, its stricter cross-entropy loss resulted in improved accuracy when handling the imbalanced label distribution. Finally, the use of 14 front labels provided the flexibility needed for generating complete labels, making this model a robust and versatile solution for downstream tasks.
The performance of the RFINet_front_xferln_160k model in predicting each stitch in Scenario 1 (sj + mj) is presented in Table 2. The results demonstrate the model’s success in identifying frequently occurring labels, such as FK and BK, for which it achieved F1-scores of 90.5% and 78.2%, respectively. These high scores can be attributed to the dominance of these stitches in the dataset, allowing the model to make robust predictions.
However, the analysis highlights significant challenges encountered with underrepresented labels. For instance, stitch E, for which there were only 166 samples, achieved an F1-score of 0.0%, showing the model’s inability to predict this rare label. Similarly, stitch V, for which there were only 1,471 samples, achieved a low F1-score of 34.9%, indicating the adverse impact of insufficient training data. In contrast, labels such as VR, VL, and X(R), which had moderate sample counts, achieved F1-scores of 69.1%, 64.1%, and 68.5%, respectively, reflecting the model’s ability to generalize when the distribution of data is relatively balanced.
These findings emphasize the importance of addressing data imbalances to improve predictions for less common labels. Increasing the sample size for stitches like E and V through targeted data augmentation or rebalancing efforts could enhance the model’s performance on these stitches. Overall, the model shows strong results for common stitches and lays a promising foundation for further improvements in handling underrepresented labels.

4.2. Inference Phase Evaluation—Scenarios 2, 3, and 4

Table 3 presents the results for various models used in the inference phase across three scenarios: Scenario 2 (unknown yarn type), Scenario 3 (known yarn type), and Scenario 4 (ground truth front label). The columns are similar to those in the generation phase table, detailing sample size, parameter count, training time, and F1-score. This table highlights several trends and key observations about the performance of the inference phase models.

4.2.1. Scenario 2: Complete Label Generation (Unknown Yarn Type)

In this scenario, where predicted front labels are used as input without prior knowledge of the yarn type (sj or mj), two key observations emerge. First, the impact of MIL techniques is notable. Models incorporating MIL, such as RFINet_complete_MIL and xfer_complete_frompred_2lyr_MIL, underperform compared to their non-MIL counterparts. For instance, RFINet_complete_MIL achieves an F1-score of only 71.6%, while the stricter RFINet_complete has an improved score of 80.8%. Similarly, xfer_complete_frompred_2lyr_MIL scores 39.4%, whereas its non-MIL variant achieves 52.7%. This discrepancy highlights the unsuitability of MIL for the inference phase, where precise stitch-level predictions are essential, aligning with previous findings on MIL’s limitations in handling fine-grained spatial dependencies at a 20 × 20 resolution.
Second, the effectiveness of the new CNN architectures becomes evident. Models using CNN-based architectures, such as the 2-layer, 5-layer, residual, and UNet variants, demonstrate varying levels of performance. Among these, the residual model (xfer_complete_frompred_residual) achieves the best balance between accuracy and complexity, with an F1-score of 85.9%. It outperforms RFINet_complete (80.8%) and delivers comparable results to more complex models like xfer_complete_frompred_5lyr (78.1%). These results emphasize the residual model’s efficiency and effectiveness in this task.

4.2.2. Scenario 3: Complete Label Generation (Known Yarn Type)

In this scenario, knowing the yarn type (sj or mj) enables the use of yarn-specific models, resulting in significant performance improvements. Residual models, such as xfer_complete_frompred_residual_sj and xfer_complete_frompred_residual_mj, achieve F1-scores of 97.0% and 90.2%, respectively, outperforming their Scenario 2 counterparts. This highlights the advantage of yarn-specific training in enhancing prediction accuracy. UNet models, including xfer_complete_frompred_unet_sj and xfer_complete_frompred_unet_mj, also perform well; however, the residual models maintain a slight edge in terms of accuracy, demonstrating their efficiency and robustness in this task.

4.2.3. Scenario 4: Complete Label Generation (Using Ground Truth Front Label)

In this ideal scenario, where the input is the ground truth front label, bypassing the generation phase, the inference phase achieves its best theoretical performance. Residual models excel, with xfer_complete_fromtrue_residual_sj achieving an F1-score of 99.8% and xfer_complete_fromtrue_residual_mj reaching 96.0%. These results validate the effectiveness of CNN-based approaches in capturing the spatial dependencies between neighboring stitches using 3 × 3 kernels, consistent with previous theoretical insights [12].
Key observations further highlight the pipeline’s robustness. The incremental improvement from Scenario 2 to Scenario 4 demonstrates the inference phase’s capacity to leverage additional input information, with its accuracy consistently increasing as more precise data are provided. Notably, despite the generation phase achieving an F1-score of 83.1% in front label prediction, the inference phase compensates for these errors, achieving higher F1-scores even in less favorable conditions, such as Scenario 2.
Moreover, the residual model (xfer_complete_frompred_residual) consistently balances complexity and performance, outperforming 5-layer CNN models and achieving comparable results to UNet models with fewer parameters. This analysis underscores the inference phase’s critical role in ensuring the pipeline’s overall accuracy and robustness by compensating for errors and leveraging spatial correlations to produce knittable complete labels.
Table 4 presents the performance of the inference phase models across Scenarios 2, 3, and 4, offering several insights. In Scenario 2, where the yarn type is unknown, common stitches like FK and BK perform well, with F1-scores of 95.9% and 82.6%, respectively. However, rare stitches such as AO(2) (31.8%) and FO(2) (32.3%) struggle due to insufficient data. Additionally, multi-yarn-specific labels like V,HM achieve a modest accuracy (46.1%), further underscoring the challenges posed by data scarcity.
In Scenario 3, where the yarn type is known, separating the single-yarn (sj) and multi-yarn (mj) datasets leads to significant performance improvements. For instance, FK reaches 98.5% accuracy on the sj dataset, while FK,MAK reaches 94.6% on the mj dataset. The separation allows the model to specialize in stitches exclusive to either sj or mj, capturing their unique structural characteristics.
In the ideal case of Scenario 4, where ground truth front labels are provided, the model achieves near-perfect results. FK achieves 100% accuracy with the sj dataset, and FK,MAK reaches 99.0% accuracy with the mj dataset, validating the theoretical feasibility of inferring complete labels from accurate front labels.
The key takeaways from this analysis include the benefits of separating yarn types during training, which significantly boosts stitch-specific accuracy, and the potential for near-ideal performance with perfect front label inputs. Nonetheless, rare stitches remain challenging, highlighting the need for targeted data augmentation to improve the model’s predictions of these stitches. The results also reinforce the importance of the structural insights gained from yarn type separation in enhancing the model’s specialization and performance.

4.3. Case Study

Figure 11 provides a detailed visualization of the results from multiple scenarios and yarn types, showcasing two samples each from the 4j, 3j, 2j, and sj categories. The figure is organized into columns that represent various stages: the ground truth (real image, rendering image, front label, and complete label), predictions from Scenario 1 (rendering image and front label), predictions from Scenario 3 (complete label, obtained using the predicted front label and yarn-specific inference models), and predictions from Scenario 4 (complete label obtained using the ground truth front label).
This analysis reveals several key observations. The model demonstrates a strong overall performance, accurately predicting front and complete labels across all yarn types. Notably, Scenario 3 highlights the model’s ability to correct errors in front label predictions, as seen in samples from the 4j and 2j yarn types. The model’s predictions for sj samples are near-perfect for both the front and complete labels, while mj samples achieve high accuracy despite their more complex patterns. Finally, Scenario 4 serves as a benchmark, producing near-perfect complete labels, underscoring the model’s potential when given ideal inputs.
This case study highlights the system’s robustness and adaptability, demonstrating its ability to handle diverse yarn types and patterns while effectively correcting errors, making it highly applicable to real-world knitting tasks.

5. Discussion

The results demonstrate a clear progression in accuracy across the four scenarios, with Scenario 4 achieving near-perfect results while using ground truth front labels. The generation phase, powered by the RFINet_front_xferln_160k model, performs well on common stitches but struggles with rare ones, highlighting the importance of balanced datasets. The inference phase compensates for generation phase errors, showing a state-of-the-art performance with yarn-specific models and ground truth inputs.
Our case study reinforces these findings, showcasing the model’s consistent performance across single-yarn (sj) and multi-yarn (mj) samples, with effective error correction in complex cases like 4j and 2j patterns. Key insights include the benefits of yarn type separation, strict loss functions, and residual CNN architectures, which enhance accuracy and robustness. However, data imbalance remains a challenge, particularly for rare stitches.
This research emphasizes the creation of structurally complete labels over colored complete labels, meaning the generated labels are truly knittable. While the ultimate goal of reverse knitting includes full visual and structural fidelity, this research is limited to non-color-encoding structural outputs. Furthermore, challenges related to 3D garment shaping and material variability are beyond the scope of this study.

6. Conclusions

This study addresses critical challenges in reverse knitting by introducing a modular two-stage pipeline that separates front label generation from complete label inference. The pipeline leverages residual CNN models to capture spatial dependencies and generate precise knitting instructions, achieving F1-scores of 83.1% in Scenario 1 (generation phase) and of up to 97.0% in Scenario 3 (inference phase). The inference phase’s ability to correct errors in the front label predictions ensures the production of knittable instructions.
Future research should focus on expanding the relevant datasets, particularly those for multi-yarn samples; addressing label imbalance; and implementing data augmentation (e.g., rotation, brightness adjustments). Incorporating color recognition, flexible input–output dimensions, and advanced loss functions like focal loss could further enhance the system. Optimizing the pipeline for industrial knitting machines and extending it to 3D garment shaping and cross-domain textile processes (e.g., weaving, embroidery) will broaden its applicability and scalability. The current pipeline is constrained by fixed input dimensions (160 × 160 pixels) and stitch grids (20 × 20). Future research should explore handling variable input and output dimensions, leveraging object detection models like YOLO [20] to dynamically detect and label individual stitches, potentially supported by fine-grained stitch-level annotations. These advancements lay the foundation for fully automated, customizable textile manufacturing systems, meeting modern design and production demands.

Author Contributions

Conceptualization, X.Z. and H.S.; methodology, H.S. and M.L.; software, H.S.; validation, H.S., M.L. and S.C.; formal analysis, H.S.; investigation, X.Z. and M.L.; resources, X.Z.; data curation, S.C. and X.Z.; writing—original draft preparation, H.S.; writing—review and editing, M.L.; visualization, H.S.; supervision, M.L.; project administration, H.S. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available on FigShare at https://doi.org/10.6084/m9.figshare.28333379.v2, accessed on 18 February 2025. The complete source code required for reproducing the results is available on GitHub at https://github.com/SHolic/neural_inverse_knitting/tree/main, accessed on 18 February 2025. Additionally, a demonstration video showcasing the inference procedure can be viewed on YouTube at https://www.youtube.com/watch?v=GYR-Pck013s, accessed on 1 April 2025.

Acknowledgments

We would like to extend our heartfelt gratitude to Xingyu Zheng, Jiahui Shu, Tong Zhou, Ying Teng, and other summer interns from Shanghai Sanda University for their invaluable assistance throughout this project. Our deepest thanks also go to STOLL (by KARL MAYER) and SHIMA SEIKI for generously providing equipment, technical support, and expertise that were critical to the success of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

    The following abbreviations are used in this manuscript:
sj    single-yarn
mj    multi-yarn
MIL   multiple-instance learning
CNN   Convolutional Neural Network

Appendix A

Our model supports four distinct use cases:
1. Scenario 1: Front Label Generation
  • Goal: Obtain a front label from a real image.
  • Components Used: Generation phase (refiner + Img2prog).
  • Command:
           python main.py
           --checkpoint_dir=./checkpoint/RFINet_front_xferln_160k
           --training=False
  • Input: Real image.
  • Output: Front label.
  • Training Command:
           ./run.sh -g 0
           -c ./checkpoint/RFINet_front_xferln_160k
           --learning_rate 0.0005
           --params discr_img=1,bvggloss=1,gen_passes=1,
           bloss_unsup=0,decay_steps=50000,
           decay_rate=0.3,bunet_test=3,
           use_tran=0,augment=0,bMILloss=0
           --weights loss_D*=1.0,loss_G*=0.2 --max_iter 160000
  • TensorBoard: We can use TensorBoard to view various loss changes, generated images, multi-class confusion matrices, etc.
           tensorboard
           --logdir "./checkpoint/RFINet_front_xferln_160k/val"
2. Scenario 2: Complete Label Generation (Unknown Yarn Type)
  • Goal: Obtain a complete label from a real image without prior knowledge of its sj/mj classification.
  • Components Used:
    - Generation phase (refiner + Img2prog) for front label generation.
    - Inference phase for complete label prediction.
  • Command:
           python xfernet.py test
           --checkpoint_dir ./checkpoint/xfer_complete_frompred_residual
           --model_type residual
           --dataset default
           --input_source frompred
  • Input: Real image.
  • Output: Complete label.
3. Scenario 3: Complete Label Generation (Known Yarn Type)
  • Goal: Obtain a complete label with knowledge of the sj/mj classification of the input image.
  • Components Used:
    - Generation phase (refiner + Img2prog).
    - Yarn-specific residual model for inference phase.
  • Command:
           python xfernet.py test
           --checkpoint_dir ./checkpoint/xfer_complete_frompred_sj
           --model_type residual
           --dataset sj
           --input_source frompred
    Or for mj data:
           python xfernet.py test
           --checkpoint_dir ./checkpoint/xfer_complete_frompred_mj
           --model_type residual
           --dataset mj
           --input_source frompred
  • Input: Real image.
  • Output: Complete label.
4. Scenario 4: Complete Label Generation (Using Ground Truth Front Label)
  • Goal: Generate complete labels using a ground truth front label and knowledge of yarn type.
  • Components Used: Yarn-specific residual model.
  • Command:
           python xfernet.py test
           --checkpoint_dir ./checkpoint/xfer_complete_fromtrue_sj
           --model_type residual
           --dataset sj
           --input_source fromtrue
    Or for mj data:
           python xfernet.py test
           --checkpoint_dir ./checkpoint/xfer_complete_fromtrue_mj
           --model_type residual --dataset mj
           --input_source fromtrue
  • Input: Ground truth front label.
  • Output: Complete label.
For Scenarios 2 through 4, if you wish to execute the training process, simply change the parameter test to train. Additionally, you can utilize TensorBoard to monitor the loss progression and view the confusion matrix by specifying the given checkpoint_dir.

Appendix B

The experimental environment was set up using Miniconda for dependency management and package installation. The following steps outline the configuration process, with all commands provided for reproducibility:
1. Install Miniconda
     wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
     bash Miniconda3-latest-Linux-x86_64.sh
     source ~/.bashrc
2. Create and Activate Python 3.6 Environment
     conda create -n tf1.11 python=3.6
     conda activate tf1.11
3. Install GPU-Compatible TensorFlow and Its Dependencies. Install TensorFlow 1.11 and its associated dependencies, ensuring compatibility with the RTX 2070 and CUDA 9.0.
     conda install tensorflow-gpu=1.11.0
     conda install numpy=1.15.3
     conda install scipy=1.1.0
4. Install CUDA Toolkit and cuDNN
     conda install cudatoolkit=9.0 cudnn=7.1
5. Install Python Package Requirements. The required Python packages were installed using the requirements.txt file provided in the project repository.
     pip install -r requirements.txt
6. Install ImageMagick for Image Processing. ImageMagick was used for image manipulation during the preprocessing stage. The following commands were used:
     sudo apt update
     sudo apt install imagemagick
     sudo apt install zip unzip
7. Set Up Jupyter Notebook 6.4.3 for Interactive Development. Jupyter Notebook was installed to facilitate interactive code testing and experimentation.
     conda install jupyter
     jupyter notebook --ip=0.0.0.0 --no-browser
8. Install Additional Libraries. For additional functionalities, the Scikit-learn library was installed.
     conda install scikit-learn
The experimental environment was designed to leverage the computational power of the RTX 2070 GPU and the stability of TensorFlow 1.11, ensuring compatibility with older dependencies and toolkits. The use of Miniconda allowed for efficient dependency resolutions, while WSL2 provided a seamless bridge between the Windows and Linux environments. By documenting the environmental setup with reproducible commands, this configuration can be easily replicated for future experiments or debugging purposes.

References

  1. Bohm, G.; Suteu, M.D.; Doble, L. Study on knitting with 3D drawings using the technology offered by Stoll. Fascicle Text. Leatherwork 2022, 23, 5–9.
  2. Kaspar, A.; Oh, T.-H.; Makatura, L.; Kellnhofer, P.; Matusik, W. Neural inverse knitting: From images to manufacturing instructions. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 3272–3281.
  3. Shrivastava, A.; Pfister, T.; Tuzel, O.; Susskind, J.; Wang, W.; Webb, R. Learning from simulated and unsupervised images through adversarial training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2107–2116.
  4. Trunz, E.; Klein, J.; Müller, J.; Bode, L.; Sarlette, R.; Weinmann, M.; Klein, R. Neural inverse procedural modeling of knitting yarns from images. Comput. Graph. 2024, 118, 161–172.
  5. Melnyk, V.E. Punch card patterns designed with GAN. In Proceedings of the 2021 DigitalFUTURES; Springer: Singapore, 2022; pp. 83–94.
  6. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
  7. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
  8. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.K.; Wang, Z.; Smolley, S.P. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
  9. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
  10. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  11. Scheidt, F.; Ou, J.; Ishii, H.; Meisen, T. deepKnit: Learning-based generation of machine knitting code. Procedia Manuf. 2020, 51, 485–492.
  12. Yuksel, C.; Kaldor, J.M.; James, D.L.; Marschner, S. Stitch meshes for modeling knitted clothing with yarn-level detail. ACM Trans. Graph. 2012, 31, 37:1–37:12.
  13. Wu, K.; Swan, H.; Yuksel, C. Knittable stitch meshes. ACM Trans. Graph. 2018, 37, 121:1–121:14.
  14. Wu, K.; Gao, X.; Ferguson, Z.; Panozzo, D.; Yuksel, C. Stitch meshing. ACM Trans. Graph. 2018, 37, 130:1–130:14.
  15. Lafferty, J.; McCallum, A.; Pereira, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, USA, 28 June–1 July 2001; pp. 282–289.
  16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  17. Narayanan, V.; Albaugh, L.; Hodgins, J.; Coros, S.; McCann, J. Automatic knitting of 3D meshes. ACM Trans. Graph. 2018, 37, 109:1–109:14.
  18. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556.
  19. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
  20. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Figure 1. Reverse knitting.
Figure 2. Data processing workflow.
Figure 3. Comparison of transfer image with real image.
Figure 4. Mapping from front label to complete label. The "No" column represents the numerical identifiers assigned to each stitch type. The "name" column lists the abbreviated names of the stitch types. The "color" column indicates the encoded colors associated with each stitch type, which are used for visualization purposes. The "image" column (left table) is reserved for displaying physical representations or diagrams of the corresponding stitch types. (a) sj map; (b) mj map.
Figure 5. Physical sheet.
Figure 6. Yarn type distribution.
Figure 7. Stitch distribution for (a) front labels and (b) complete labels.
Figure 8. Overall architecture: (a) architecture diagram; (b) input and output.
Figure 9. Refiner+Img2prog architecture: (a) refiner+Img2prog diagram; (b) input and output.
Figure 10. Residual model architecture: (a) residual diagram; (b) input and output.
Figure 11. Case study.
Table 1. Results of models used for generation phase.

Model | Sample Size | Params Count * | Time (h) | F1-Score
RFI_complex_a0.5 | 12,392 | 2,934,605 | 6.50 | 90.2%
RFINet_notran_noaug_newinst | 12,392 | 2,934,398 | 5.00 | 97.3%
RFINet_front_xferln_MIL_160k | 4950 | 2,934,398 | 3.00 | 82.1%
RFINet_front_xferln_160k ** | 4950 | 2,934,398 | 3.00 | 83.1%

* Params Count indicates the model's complexity. ** This is the selected model.
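The Params Count column reports the number of trainable parameters in each network. As an illustration only (not the original training code), such a count could be obtained for a TensorFlow 1.x graph like the one used in this environment as follows; the single convolutional layer and input shape are hypothetical placeholders:

     # Illustrative parameter count for a TensorFlow 1.x graph (placeholder model)
     import numpy as np
     import tensorflow as tf

     # Build a small dummy graph purely for demonstration (arbitrary input shape)
     x = tf.placeholder(tf.float32, [None, 64, 64, 3])
     h = tf.layers.conv2d(x, filters=32, kernel_size=3)

     # Sum the sizes of all trainable variables in the current graph
     params = int(np.sum([np.prod(v.get_shape().as_list())
                          for v in tf.trainable_variables()]))
     print("Params Count:", params)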
Table 2. Results from generation phase for each stitch.

Stitch * | Scenario 1 (sj + mj) Count | Scenario 1 (sj + mj) F1-Score
FK | 1,484,133 | 90.5%
BK | 209,577 | 78.2%
T | 87,510 | 53.6%
H | 41,059 | 67.8%
M | 37,223 | 35.5%
E | 166 | 0.0%
V | 1471 | 34.9%
VR | 25,359 | 69.1%
VL | 25,733 | 64.1%
X(R) | 7031 | 68.5%
X(L) | 7043 | 59.8%
O | 18,933 | 43.0%
Y | 22,904 | 63.3%
FO | 11,858 | 30.8%

* Stitch labels and images can be observed in Figure 4.
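The per-stitch values reported in Tables 2 and 4 are per-class F1-scores. The sketch below is a minimal illustration (not the authors' evaluation script) of how such scores can be computed with the Scikit-learn library installed earlier, assuming the ground-truth and predicted label grids have been flattened into 1D arrays of integer stitch-class indices:

     # Illustrative per-stitch F1 computation (placeholder label arrays)
     import numpy as np
     from sklearn.metrics import f1_score

     # Flattened label grids; each entry is an integer stitch-class index (e.g., FK, BK, T, ...)
     y_true = np.array([0, 0, 1, 2, 1, 0])   # hypothetical ground-truth labels
     y_pred = np.array([0, 1, 1, 2, 1, 0])   # hypothetical predicted labels

     # average=None returns one F1 value per stitch class, as reported per stitch in the tables
     print(f1_score(y_true, y_pred, average=None))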
Table 3. Results for models used in inference phase.

Model | Sample Size | Params Count | Time (h) | F1-Score
RFINet_complete_MIL | 4950 | 2,935,778 | 4.67 | 71.6%
RFINet_complete | 4950 | 2,935,778 | 4.67 | 80.8%
xfer_complete_frompred_2lyr_MIL | 4950 | 21,026 | 2.75 | 39.4%
xfer_complete_frompred_2lyr | 4950 | 21,026 | 2.75 | 52.7%
xfer_complete_frompred_5lyr | 4950 | 1,585,422 | 3.00 | 78.1%
xfer_complete_frompred_residual * | 4950 | 872,034 | 3.00 | 85.9%
xfer_complete_frompred_unet | 4950 | 279,138 | 3.00 | 83.9%
xfer_complete_frompred_2lyr_sj | 3000 | 21,026 | 1.75 | 95.0%
xfer_complete_frompred_residual_sj * | 3000 | 872,034 | 1.75 | 97.0%
xfer_complete_frompred_unet_sj | 3000 | 279,138 | 1.75 | 96.2%
xfer_complete_frompred_2lyr_mj | 1950 | 21,026 | 1.00 | 74.0%
xfer_complete_frompred_residual_mj * | 1950 | 872,034 | 1.00 | 90.2%
xfer_complete_frompred_unet_mj | 1950 | 279,138 | 1.00 | 84.2%
xfer_complete_fromtrue_2lyr_sj | 3000 | 21,026 | 1.75 | 98.4%
xfer_complete_fromtrue_residual_sj * | 3000 | 872,034 | 1.75 | 99.8%
xfer_complete_fromtrue_unet_sj | 3000 | 279,138 | 1.75 | 99.3%
xfer_complete_fromtrue_2lyr_mj | 1950 | 21,026 | 1.00 | 86.3%
xfer_complete_fromtrue_residual_mj * | 1950 | 872,034 | 1.00 | 96.0%
xfer_complete_fromtrue_unet_mj | 1950 | 279,138 | 1.00 | 95.4%

* These are the selected models for each scenario.
Table 4. Results from inference phase for each stitch.

Stitch * | Scenario 2 (sj + mj) Count | Scenario 2 (sj + mj) F1-Score | Scenario 3 (sj) Count | Scenario 3 (sj) F1-Score | Scenario 3 (mj) Count | Scenario 3 (mj) F1-Score | Scenario 4 (sj) Count | Scenario 4 (sj) F1-Score | Scenario 4 (mj) Count | Scenario 4 (mj) F1-Score
FK | 920,228 | 95.9% | 90,474 | 98.5% | - | 0.0% | 90,474 | 100.0% | - | 0.0%
BK | 183,030 | 82.6% | 13,375 | 95.2% | 11,366 | 81.8% | 13,375 | 99.7% | 11,366 | 92.8%
T | 87,698 | 65.2% | 11,366 | 89.1% | - | 0.0% | 3207 | 98.9% | - | 0.0%
H,M | 15,618 | 66.2% | - | 0.0% | - | 0.0% | 1001 | 97.1% | - | 0.0%
M | 22,342 | 65.0% | 885 | 97.1% | 2308 | 80.4% | 885 | 97.1% | 2308 | 96.0%
E,V(L) | 15,901 | 64.5% | 158 | 84.9% | 158 | 94.8% | 158 | 94.8% | - | 0.0%
V,HM | 1179 | 46.1% | - | 0.0% | - | 0.0% | - | 0.0% | - | 0.0%
VR | 8214 | 84.0% | 809 | 92.1% | 809 | 92.1% | 809 | 99.7% | 809 | 99.7%
VL | 7919 | 82.0% | 864 | 91.9% | 864 | 91.9% | 864 | 99.5% | 864 | 99.5%
X(R) | 7031 | 90.1% | 741 | 95.9% | 741 | 95.9% | 741 | 99.6% | 741 | 99.6%
X(L) | 7043 | 90.0% | 713 | 95.9% | 713 | 95.9% | 713 | 99.6% | 713 | 99.6%
T(F) | 25,024 | 78.5% | 3207 | 89.1% | - | 0.0% | 3207 | 98.9% | - | 0.0%
V,M | 265 | 80.0% | - | 0.0% | 68 | 0.0% | - | 0.0% | - | 0.0%
E,V(R) | 15,703 | 85.0% | 1747 | 89.1% | - | 0.0% | 1747 | 99.7% | - | 0.0%
FK,MAK | 536,224 | 88.9% | - | 0.0% | 107,069 | 94.6% | - | 0.0% | 107,069 | 99.0%
FT,FMAK | 65,413 | 88.9% | - | 0.0% | 13,106 | 92.1% | - | 0.0% | 13,106 | 92.6%
Y,MATBK | 22,904 | 63.3% | 4741 | 98.9% | - | 0.0% | 4741 | 99.6% | - | 0.0%
FO(2) | 7324 | 32.3% | - | 0.0% | 1924 | 82.3% | - | 0.0% | 1924 | 69.4%
O(5),BK | 7774 | 43.6% | - | 0.0% | 1500 | 51.9% | - | 0.0% | 1500 | 67.9%
VR,FMAK | 23,014 | 88.8% | 4419 | 95.9% | - | 0.0% | 4419 | 99.9% | - | 0.0%
AO(2) | 7767 | 31.8% | - | 0.0% | 1441 | 64.7% | - | 0.0% | 1441 | 64.0%

* Stitch labels and images can be observed in Figure 4.