1. Introduction
Industry 4.0 refers to the ongoing revolution in automation and data exchange in manufacturing technologies and approaches. It encompasses numerous technologies, including cyber-physical systems, the Internet of Things (IoT), cloud computing, and cognitive computing. Smart manufacturing builds on Industry 4.0 principles to create adaptive, self-optimizing production systems.
The Digital Twin (DT) is one of the prominent technologies for smart production. In simple terms, a DT is a digital representation of physical objects and their processes and operations. Digital Twins integrate IoT sensors, artificial intelligence, and simulation to mirror the state of their physical counterparts, enabling remote monitoring, management, and optimization of the system.
For production systems such as Flexible Manufacturing Systems (FMS), Digital Twins are particularly valuable. The defining capacity of an FMS is its flexibility to manufacture a range of products, and this flexibility leads to complexity in physical layouts, material flows, and logistics. A Digital Twin of an FMS provides complete visibility into machine operations and allows operators to simulate adjustments and optimize the system remotely. However, creating accurate Digital Twins requires tracking physical assets and inventory in real time. Emerging technologies such as indoor localization through Wi-Fi fingerprinting and deep learning for location prediction provide promising solutions. By combining these technologies, precise real-time localization and a complete virtual representation of the FMS can be achieved.
This paper explores the application of Wi-Fi fingerprinting and deep learning to create Digital Twins of FMS that enable smart manufacturing. The Digital Twin mirrors the physical system and allows efficient planning, scheduling, and optimization of manufacturing operations.
This research proposes a comprehensive framework that integrates Digital Twins, Wi-Fi-based indoor localization, and advanced deep learning models to enhance real-time asset tracking and optimize manufacturing processes. It aims to explore how the combination of Digital Twin technology and Wi-Fi-based localization can improve the operational efficiency of Flexible Manufacturing Systems (FMS). The study investigates which machine-learning algorithms and optimization techniques deliver the highest accuracy and reliability for real-time asset tracking in manufacturing environments. Additionally, it compares traditional sensor-based localization systems with Wi-Fi fingerprinting and deep learning models in terms of scalability, cost-effectiveness, and accuracy. The research also examines how reinforcement learning can enhance the autonomous navigation capabilities of Automated Guided Vehicles (AGVs) within dynamic manufacturing settings. By addressing these questions, the study seeks to significantly advance Industry 4.0 technologies, providing innovative solutions for creating smarter and more efficient manufacturing systems.
2. Literature Review
2.1. Deep Learning Techniques for Indoor Localization
Many deep learning models have been presented previously; some State-of-the-Art models are compared at the end of this section in Table 1. From the literature review, it is evident that the dominant approach is feature extraction from Wi-Fi Received Signal Strength (RSS) and Channel State Information (CSI), as demonstrated in Árvai et al. [1], Zhao et al. [2], Mittal et al. [3], Song et al. [4], Bregar et al. [5], Chen et al. [6], Chen et al. [7], Zhang et al. [8], Njima et al. [9], Liu et al. [10], and Ashraf et al. [11]. These authors applied the technique to tasks such as location classification and simultaneous classification of location and orientation, while Koike-Akino et al. [12] used it for direct coordinate estimation. Location-specific services are frequently used in outdoor environments, and their indoor counterparts are gaining popularity; using a digital indoor map as a reference, the indoor position can be refined by detecting walking steps, turns, or stair actions. Zhang et al. [8] and Liu et al. [10] have shown that these models can outperform traditional machine-learning methods such as K-Nearest Neighbors (KNN) and Support Vector Machines (SVM).
Jang et al. [13] used Recurrent Neural Networks (RNNs) to capture sequential dependencies in the data, which is potentially beneficial for continuous trajectory tracking. Kim et al. [14] and Liu et al. [15] offered scalability for multi-building and multi-floor environments; the latter's technique can be deployed on Field Programmable Gate Arrays (FPGAs).
Wei et al. [16] provided an indoor localization and semantic mapping framework that uses images as input. The underlying principle of the framework is a feature extraction network that allows component-level association with 6-DoF poses and labeling. Zhong et al. [17] provided a database for indoor localization and trajectory estimation using a CNN and Long Short-Term Memory (LSTM), mapping Wi-Fi Received Signal Strength and geomagnetic field intensity into an image-like array.
Tiku et al. [18] provided adaptive deep learning for fast indoor localization, describing a method for lowering the computing demands of a deep learning-based indoor localization framework while preserving accuracy targets. Later, Tiku et al. [19] provided a deep learning-based indoor localization framework with a novel way to maintain accuracy even when access point (AP) attacks are present.
Wang et al. [20] provided joint activity recognition and indoor localization by proposing a dual-task convolutional neural network with one-dimensional convolutional layers. Lin et al. [21] recommended using richer regional features instead of the raw RSS, suggesting a deep learning network that combines three components: a one-dimensional convolutional neural network for extracting regional RSS features, a Siamese architecture for dealing with similarity inconsistency, and a regression network for user placement.
Chenning et al. [22] suggested an object-based indoor localization algorithm that correctly recognized 81.7% of the items in the photos, with a 1–5 m localization accuracy of 59.5%. Abbas et al. [23] provided a deep learning-based indoor localization system that achieves fine-grained and reliable accuracy even in noisy environments. In [24], a new convolutional neural network was created to learn the relevant features automatically. Experiments revealed that the suggested system can recognize nine different behaviors with 98% accuracy in around 2 s: standing still, walking, going upstairs, going downstairs, riding the elevator up or down, riding the escalator up or down, and turning.
Wang et al. [25] provided Deep Convolutional Neural Networks (DCNNs) using Wi-Fi devices in the 5 GHz band. They tested performance in two representative indoor scenarios, extracting phase data from the channel state information (CSI) with a modified device driver to determine the angle of arrival (AoA).
Table 1. Comparative Analysis.
| Reference | Published | Dataset Description | Techniques Used | Accuracy |
|---|---|---|---|---|
| Tiku et al. [10,18] | 2021 | Four buildings | SVM and DNN | Average 90% |
| Árvai et al. [1] | 2021 | Several participants and their mobile phones | CNN | Average 83% |
| Ashraf et al. [11] | 2020 | Sony Xperia M2 dataset | NN | 95% |
| X. Wang et al. [25] | 2020 | 5 GHz CSI dataset | CSI and DCNN | 85% |
| Koike-Akino et al. [12] | 2020 | Own dataset | RNN | 96% |
| C. Liu et al. [15] | 2020 | RSS dataset | DNN | 87% |
| Zhou et al. [24] | 2019 | 10 participants | CNN | 98% |
| Song et al. [4] | 2019 | UJIIndoorLoc and Tampere | SAE CNNLoc | Average 97.5% |
| Zhang et al. [8] | 2019 | Training dataset | KNN | 45.8% |
| Zhao et al. [2] | 2019 | ImageNET | CSI | 51.8% |
| Abbas et al. [23] | 2019 | Public dataset | WiDeep | 90% |
| Z. Liu et al. [10] | 2019 | UJIIndoorLoc and Tampere | SVM and KNN | Average 82% |
| F. Wang et al. [20] | 2019 | Wireless dataset | CNN and CSI | 92% |
| Lin et al. [21] | 2019 | Training dataset | CNN | 90% |
| Mittal et al. [3] | 2018 | RSSI dataset | CNN | 99.67% |
| Bregar and Mohorcic [5] | 2018 | 1394 samples | CSI | 88.13% |
| Chenning et al. [22] | 2018 | Public dataset | R-CNN | 81.7% |
| Zhong et al. [17] | 2018 | Hotel 5th-floor lobby and 4th-floor corridor | CNN and LSTM | 95% |
| Kim et al. [14] | 2018 | UJIIndoorLoc dataset | DNN | 89% |
| Jang et al. [13] | 2017 | Own dataset | RNN | |
2.2. Digital Twins in Manufacturing: Current Approaches and Limitations
The idea of a Digital Twin, which may be defined as a digital representation of a physical asset or process, has received considerable attention in the manufacturing industry as a way to enable smart, data-driven decision-making [26]. Digital Twins have been used to optimize production processes, improve asset performance, and enhance supply chain visibility [27].
In the manufacturing domain, Digital Twins have been leveraged to model production systems, simulate various scenarios, and support decision-making [28]. Researchers have investigated the integration of Digital Twins with other Industry 4.0 technologies, such as the Internet of Things (IoT), data analytics, and additive manufacturing, to create more sophisticated and intelligent manufacturing environments [29].
In the context of Flexible Manufacturing Systems (FMS), Digital Twins have been explored to improve system reconfigurability, optimize material flows, and enhance production planning [30,31]. Researchers have highlighted the potential of Digital Twins to address the complexity and dynamics of FMS by providing real-time visibility, simulation capabilities, and decision support [32].
Real-time asset tracking was proposed by Samir et al. [33], who focused on requirements collection and the design of a real-time positioning system for asset tracking. Zhang et al. [34] explored the Device-Free Localization (DFL) paradigm, proposing a two-phase approach: in the first phase, the large domain is subdivided into small domains via K-means clustering and the system is trained on these smaller domains; in the second phase, the distribution is normalized through a Class-specific Cost Regulation Extreme Learning Machine (CCR-ELM).
As discussed previously, Wei et al. [16] used vision-based localization for the DT repository, employing LiDAR and a camera to identify the objects logged on the localization map. Furthermore, Park et al. [35] proposed the Fi-Vi scheme, in which components are first fingerprinted and then localized through a visual system. The same paradigm is discussed by Shu et al. [36], where an RGBD camera is used for visual localization along with Wi-Fi signal localization.
Hu et al. [37] integrated BIM-enabled Digital Twins with autonomous robotics, LiDAR-based mapping, IoT sensing, and indoor positioning technologies, using third-party software to create BIM environments populated with localization data from autonomous robotic mobile sensing and Wi-Fi communications. Furthermore, Pauwels et al. [38] used similar methodologies for building Digital Twins for robot navigation, focusing on communication between localization data schemes and BIM models. In addition, Wong et al. [39] worked on indoor navigation for fire emergency response, integrating inertial sensors into the BIM system via a particle filter. In the same paradigm, Mahmoud et al. [40] used digital twinning and localization through BIM-extracted data for personal thermal comfort modeling.
Recently, Morais et al. [41] used Digital Twins in outdoor wide-area 6G localization, proposing to combine the digital twin's ray-tracing feature with a fingerprinting database. In 2023, Karakusak et al. [42] presented a notable study, devising a Digital Twin indoor positioning system driven by Artificial Intelligence and RSS. For the localization algorithm, they used an MLP and two LSTM models, achieving an average localization error of less than 2.16 m, and showcased their results with autonomous mobile robots physically present in the experimental area.
However, most existing Digital Twin approaches in manufacturing have relied on sensor technologies such as computer vision, RFID, and multi-modal sensor fusion, which can be constrained by line-of-sight requirements, infrastructure changes, or complex integration challenges [43,44,45,46]. The Wi-Fi-based localization and deep learning approach proposed in this paper offers a novel solution for creating Digital Twins of FMS, leveraging the ubiquity of wireless networks in modern factories.
2.3. Critical Comparison with Existing Implementations
The concept of Digital Twins has been widely explored in the manufacturing domain, and the literature shows significant advancements in monitoring, optimizing, and simulating production systems. Existing works have demonstrated the potential of Digital Twins to enhance supply chain visibility, improve asset performance, and support data-driven decision-making. However, most prior implementations exhibit certain limitations, which the proposed framework seeks to address. This section critically compares the proposed Digital Twin framework with existing implementations to highlight its unique contributions.
- a.
Deployment Cost and Infrastructure
Existing Digital Twin implementations often rely heavily on expensive, infrastructure-intensive sensor technologies such as RFID and computer vision systems. While these provide accurate tracking and monitoring, their deployment is costly and often constrained by line-of-sight requirements. Another approach is multi-modal sensor fusion, which improves data accuracy but requires extensive calibration and integration effort, increasing implementation complexity, as in Hu et al. [37].
The proposed framework addresses these challenges by leveraging Wi-Fi fingerprinting for indoor localization, which utilizes existing wireless network infrastructure. This approach significantly reduces deployment costs and enhances scalability, making it suitable for modern factory environments with widespread Wi-Fi availability.
- b.
Localization Accuracy
Prior studies, such as those by Wang et al. [47] and Abbas et al. [23], have utilized machine-learning models like SVMs and CNNs for localization tasks. While effective, these approaches often achieve limited accuracy in multi-floor or complex environments; for example, SVM models typically exhibit higher mean coordinate errors in environments with dynamic obstacles, as noted by Morais et al. [41]. Some implementations, like Bregar et al. [5], rely on CSI data, which is sensitive to environmental changes and requires additional hardware modifications.
The proposed framework demonstrates superior localization accuracy, achieving mean coordinate errors between 1.2 and 2.8 m using KNN and CNN-ADAM models. This performance surpasses traditional machine-learning methods and aligns with State-of-the-Art benchmarks, as evidenced by the evaluation against the UJIIndoorLoc dataset.
2.4. Research Contributions
This paper introduces a new framework that integrates Artificial Intelligence (AI) and Digital Twin (DT) technologies with Wi-Fi-based indoor localization. This framework offers several advantages including low cost, dynamic updates, and robustness.
Key features of the framework include the following:
Training on a comprehensive public dataset: the system leverages a large public dataset called “UJIIndoorLoc” encompassing data from multiple floors.
Exploration of various models and optimization algorithms: the framework evaluates different machine-learning models (SVM, RF, DT, KNN, CNN) coupled with three optimizers (ADAM, SGD, RMSPROP) to determine the most effective combination.
Superior performance by KNN: the KNN model consistently outperforms the baseline in terms of localization accuracy, except at the 95th and 100th percentiles. The CNN-ADAM combination, however, shows a higher mean squared error than the benchmark.
Obstacle Avoidance with deep reinforcement learning: the framework incorporates a deep reinforcement learning algorithm that utilizes localization data. This enables an Automated Guided Vehicle (AGV) within a lab environment to successfully navigate and avoid both static and mobile obstacles with a 100% success rate in the area.
3. Methodology
This article presents a Digital Twin creation technique as well as the dataset used for the deep learning models; the proposed work is discussed in detail in this section. A publicly available dataset named "UJIIndoorLoc" is used for model training. The dataset was assembled using different types of Android phones. Every entry, termed a Wi-Fi "fingerprint", consists of the logged strengths of the signals received by the device from more than 500 wireless access points (WAPs) at the device's location. Signal strength is expressed as the Received Signal Strength Indicator (RSSI), in decibel-milliwatts (dBm). RSSI values range from −104 dBm, indicating the weakest detectable signal, toward 0 dBm, the strongest.
The dataset represents an area of 1.2 million ft². It covers three buildings, numbered 0, 1, and 2. The first two buildings have three floors each, numbered 0, 1, and 2, while the last building has five floors, numbered 0 through 4. A location in this dataset is therefore quantified by building number, floor number, longitude, latitude, space ID, and relative position. The dataset also logs the following metadata for each entry: the user ID, the phone ID, and a timestamp of when the entry was recorded.
The UJIIndoorLoc dataset is provided in two separate CSV files. The first file, named “UJIIndoorLoc_trainingData.csv”, contains 19,937 data points collected from 933 unique locations. The second file, “UJIIndoorLoc_validationData.csv”, comprises 1111 data points spanning 1074 distinct locations. Notably, the validation set incorporates examples derived from users and smartphone models that were not involved in generating the training data file. This separation allows for evaluating the performance of models trained on the first file against a distinct set of data points, facilitating robust assessment and prevention of overfitting.
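For concreteness, a minimal loading sketch in Python is shown below, assuming the column naming of the published UJIIndoorLoc files (WAP001 through WAP520 for the fingerprints, followed by the label and metadata columns):

```python
import pandas as pd

# Paths are assumed to point at the published UJIIndoorLoc files.
train = pd.read_csv("UJIIndoorLoc_trainingData.csv")
test = pd.read_csv("UJIIndoorLoc_validationData.csv")

# The first 520 columns (WAP001..WAP520) hold the RSSI fingerprints; the
# remaining columns hold the location labels and metadata.
wap_cols = [c for c in train.columns if c.startswith("WAP")]
X_train, X_test = train[wap_cols], test[wap_cols]

print(train.shape)  # expected: (19937, 529)
print(test.shape)   # expected: (1111, 529)
```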
This work proposes an optimized CNN model using Adaptive Moment Estimation (ADAM), Stochastic Gradient Descent (SGD), and Root Mean Square Propagation (RMSProp). The results were then compared with the performances of Support Vector Machines, Decision Trees, K-Nearest Neighbors, and Random Forests. This Wi-Fi fingerprinting and deep learning-based approach provides precise indoor localization capability.
The Digital Twin is then used to create radio maps of the desired location and through the Digital Twin interface, the data are used for practical applications in Flexible Manufacturing Systems.
Figure 1 shows the complete methodology of the localization.
3.1. Convolutional Neural Network
One-dimensional convolutional neural networks (1D CNNs) are closely related to the more familiar 2D CNNs but are applied chiefly to text and 1D signals. Convolutional Neural Networks (ConvNets) apply filters of varying sizes and shapes that transform the input matrix into lower-dimensional feature maps; in text classification, they operate over distributed word embeddings. Max pooling extracts the maximum value from each block, reducing the representation so that subsequent convolutions can discover patterns at different scales; for 1D data, the same operation is applied along a single dimension across all channels. Convolutions are typically followed by pooling, then another convolution, and so on, which allows longer-range dependencies to be found. These two procedures, convolution and pooling, can be considered feature extractors, and the extracted features are then passed to the rest of the network as a flattened, single-row vector. This work extends the traditional approach by optimizing the CNN with the following three optimizers.
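To make the architecture concrete, the following is a minimal Keras sketch of a 1D CNN fingerprint classifier. The layer sizes, kernel widths, and the 520-input/933-class dimensions are illustrative assumptions, not the exact tuned architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_WAPS = 520          # one RSSI feature per access point
NUM_LOCATIONS = 933     # one class per UNIQUE LOCATION (illustrative)

def build_cnn(optimizer="adam"):
    model = models.Sequential([
        layers.Input(shape=(NUM_WAPS, 1)),          # fingerprint as a 1D signal
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),           # keep the max of each block
        layers.Conv1D(128, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),                           # reshape features to one row
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_LOCATIONS, activation="softmax"),
    ])
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```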
3.1.1. RMSProp
RMSProp addresses AdaGrad's rapidly diminishing learning rates by keeping a moving average of the squared gradients, so that each parameter receives its own automatically adapted learning rate. The moving average uses exponential decay:

$$E[g^2]_t = \beta\, E[g^2]_{t-1} + (1-\beta)\, g_t^2$$

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\, g_t$$

where $\beta$ is the decay term, taking a value between 0 and 1, $g_t$ is the gradient at step $t$, $E[g^2]_t$ is the moving average of the squared gradients, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability.
3.1.2. Adaptive Moment Estimation (Adam)
This technique calculates a learning rate for each parameter using estimates of the first and second moments of the gradients, addressing AdaGrad's diminishing learning rates by combining the ideas of AdaGrad and RMSProp. It maintains exponential moving averages of the gradient ($m_t$) and the squared gradient ($v_t$), whose decay rates are controlled by $\beta_1, \beta_2 \in [0, 1)$:

$$m_t = \beta_1\, m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2$$

In particular, the moving averages are zero at first, which biases the moment estimates toward zero in the first steps. This initial bias is easily counteracted with the bias-corrected estimates

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$

Finally, the parameter $\theta$ is updated as

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t$$
3.1.3. Stochastic Gradient Descent (SGD)
SGD computes the gradient on a small random subset of data instances instead of on the entire dataset, which would be redundant and inefficient. Adam, discussed above, is essentially an extension of this idea: an algorithm to optimize stochastic objective functions through gradients.
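As a minimal illustration of how the three optimizers were swapped during tuning, the same architecture can be compiled and fitted under each optimizer and the validation losses compared. The build_cnn helper is the hypothetical builder sketched in Section 3.1, and the data arrays are assumed to be prepared as described in Section 3.2:

```python
from tensorflow.keras.optimizers import Adam, SGD, RMSprop

# Illustrative learning rates; the tuned values are not assumed here.
optimizers = {
    "ADAM": Adam(learning_rate=1e-3),
    "SGD": SGD(learning_rate=1e-2, momentum=0.9),
    "RMSPROP": RMSprop(learning_rate=1e-3),
}

best_val_loss = {}
for name, opt in optimizers.items():
    model = build_cnn(optimizer=opt)   # build_cnn as sketched in Section 3.1
    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        epochs=20, batch_size=64, verbose=0)
    best_val_loss[name] = min(history.history["val_loss"])

print(best_val_loss)  # lowest validation loss reached by each optimizer
```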
One may wonder why we did not simply use the tuned models' cross-validation score. For the neural network, the tuned model's performance on the validation set is the final indicator of how well it performs. The general reason is that when searching across many sets of hyperparameters, it is possible, by random chance alone, for one set to achieve a good cross-validation score. This is a valid concern, especially since most models have more than two or three hyperparameters to tune, and the number of hyperparameter combinations grows multiplicatively, producing many trials. By reporting the tuned model's performance on a separate test set as the final indicator, we avoid overestimating model performance due to random chance: it is unlikely that a "lucky" model will be lucky on cross-validation, the validation set, and the test set simultaneously.
The test set is the "UJIIndoorLoc_validationData.csv" dataset, which contains fingerprints taken by devices that do not appear in "UJIIndoorLoc_trainingData.csv". Performance on this test set is therefore quite indicative of real-world performance: it contains examples the model never saw during training, generated by devices the model never saw during training. In the chosen design, the model is validated on unseen examples and then tested on unseen examples from unseen devices; a tuned model that performs well on this test set must generalize to new devices, which is desirable. In the alternative design, where the two datasets are mixed, the validation and test sets come from the same distribution, so devices found only in "UJIIndoorLoc_validationData.csv" are also shown to the model during training. Either design choice can be justified, and either accomplishes the goal of this study: evaluating the feasibility of Wi-Fi signals for indoor positioning.
3.2. Data Preprocessing
No missing values were found. This study used the Wi-Fi fingerprints (columns WAP001 through WAP520) as the features. Each received signal strength value was converted to a positive representation, with 0 representing no signal and 1 to 105 representing weak to strong signals. In any given example, only a few WAPs were detected; a sparse matrix therefore represents the data better, which required changing the no-signal representation from 100 to 0.
Since the longitude, latitude, floor, and building number are enough to define a precise location, the space ID and relative position were not used. Note that unlike a typical regression or classification problem with a single label taking two or more values or classes (e.g., "what is the sales volume of this product?", "what is the object in this image?", or "what is the brand preference of this user, Sony or Acer?"), this problem consists of multiple labels, each containing multiple values or classes. To handle this, a single categorical label called UNIQUE LOCATION was created, which takes on integer values. As the name implies, the UNIQUE LOCATION label takes a different value for each unique location, defined by the longitude, latitude, floor number, and building ID.
Features were not centered since that would destroy the sparse structure of the data. However, since gradient descent algorithms converge faster with normalized values, the features in the training set for the neural network were normalized to a 0-to-1 range by dividing by 105.
The package used to train the neural network required categorical variables to be one-hot-encoded into dummy-variable form; this was conducted for the UNIQUE LOCATION label before neural network training. All features contained numerical values.
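A minimal sketch of this preprocessing, assuming the standard UJIIndoorLoc column names, is as follows:

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame):
    wap_cols = [c for c in df.columns if c.startswith("WAP")]

    # The raw data encodes "no signal" as +100 and real readings as
    # -104..0 dBm. Map no-signal -> 0 and -104..0 dBm -> 1..105 so the
    # matrix stays sparse, then scale to a 0-1 range (no centering,
    # which would destroy the sparsity).
    rssi = df[wap_cols].to_numpy(dtype=float)
    features = np.where(rssi == 100, 0.0, rssi + 105.0) / 105.0

    # Build the single categorical UNIQUE LOCATION label from the four
    # location-defining columns, then one-hot encode it for the network.
    loc = df[["LONGITUDE", "LATITUDE", "FLOOR", "BUILDINGID"]].astype(str)
    unique_location = loc.agg("_".join, axis=1).astype("category").cat.codes
    labels = pd.get_dummies(unique_location).to_numpy(dtype=float)
    return features, labels
```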
3.3. Model Explanation
Neural network classification is a layered architecture inspired by the structure of biological neurons. These layers consist of mathematical constructs designed to process and transform input data through a series of interconnected computations.
The training procedure involves repeatedly cycling through the entire training set, where each complete iteration is called an epoch. After each epoch, the model’s parameters are updated based on the cumulative gradients computed from the batches of training examples, thereby progressively refining the network’s predictions.
- (1)
Hyperparameters Tuned
Epochs: This refers to the number of complete passes through the entire training dataset during the model’s training process. It is represented as an integer value.
Batch_size: This parameter determines the number of samples that are propagated through the neural network at once during the training process. It is an integer value, representing the size of each mini-batch used in the optimization algorithm (Adam, a variant of stochastic gradient descent).
Hidden_layers: This integer value specifies the number of hidden layers in the neural network architecture.
Neurons_per_hidden_layer: An integer representing the fixed number of neurons or units present in each hidden layer of the neural network. Note that in this case, the same number of neurons was used for all hidden layers.
L2_reg_lambda: A floating-point value denoting the regularization strength of the L2 regularization technique, which helps prevent overfitting by adding a penalty term to the loss function.
Dropout: This float value represents the probability of randomly dropping out (or deactivating) a fraction of neurons during the training process, another regularization technique to reduce overfitting.
All other hyperparameters were left at their default values as specified by the package used for training the neural network models.
- (2)
Model Tuning and Evaluation
A manual grid search was conducted. Cross-validation was avoided due to the higher computational cost and training time required. For the best model, we selected the hyperparameter values that offered the highest accuracy. To estimate the degree of overfitting, we computed the accuracies on the training and validation sets and took the difference between them.
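A minimal sketch of such a manual grid search is shown below; the grid values and the build_mlp helper are illustrative assumptions, not the exact ranges or builder used in this work:

```python
import itertools

# Illustrative grid over the hyperparameters listed above.
grid = {
    "epochs": [20, 40],
    "batch_size": [32, 64],
    "hidden_layers": [2, 3],
    "neurons_per_hidden_layer": [128, 256],
    "l2_reg_lambda": [0.0, 1e-4],
    "dropout": [0.2, 0.5],
}

best_acc, best_params, best_gap = 0.0, None, None
for values in itertools.product(*grid.values()):
    p = dict(zip(grid.keys(), values))
    model = build_mlp(hidden_layers=p["hidden_layers"],        # hypothetical
                      neurons=p["neurons_per_hidden_layer"],   # model builder
                      l2_lambda=p["l2_reg_lambda"],
                      dropout=p["dropout"])
    model.fit(X_train, y_train, epochs=p["epochs"],
              batch_size=p["batch_size"], verbose=0)
    _, train_acc = model.evaluate(X_train, y_train, verbose=0)
    _, val_acc = model.evaluate(X_val, y_val, verbose=0)
    if val_acc > best_acc:
        # Keep the best configuration and its train/validation gap
        # as an estimate of overfitting.
        best_acc, best_params, best_gap = val_acc, p, train_acc - val_acc

print(best_params, best_acc, best_gap)
```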
3.4. Machine-Learning Models
For each model type (Random Forest, k-NN, and neural network), we used the following approach to perform our data analysis and model building more systematically.
3.4.1. Random Forest Classifier
This technique is an ensemble method that combines outputs from multiple decision trees. In a single decision tree, overfitting is a major issue. This classifier uses the predicted class of different trees, ultimately reducing overfitting. A random sample of size n is used to construct each unique tree from the training set.
Model Training and Evaluation
A grid search was conducted over the hyperparameters using 10-fold cross-validation. The hyperparameter values that yielded the highest cross-validation accuracy were selected as the optimal model. Additionally, the cross-validation kappa was calculated. To assess overfitting, the differences between the cross-validation scores and the average scores on the training folds were computed. The optimal model was then used to predict unique locations in the test set. Subsequently, a reference table was used to convert the predicted unique locations back to their corresponding longitude, latitude, floor number, and building ID. For example, for a unique location value of 1151, the longitude is −7541.26 m, the latitude is 4.86492 × 10⁶ m, the floor number is 2, and the building ID is 1. Finally, the following metrics are reported for the predicted test-set locations:
Mean positional error: the Euclidean distance between the actual and predicted positions, averaged over all test-set examples. A position is defined by longitude and latitude, in meters.
25th, 50th, 75th, 95th, and 100th percentiles of the positional error: also based on the Euclidean distance, these indicate how close the most accurate predictions were and how far away the most inaccurate predictions (100th percentile) were, in meters.
Building hit rate: the percentage of examples where the predicted building ID was correct.
Floor hit rate: the percentage of examples where the predicted floor was correct.
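A minimal sketch of how these metrics can be computed from arrays of actual and predicted longitudes, latitudes, floors, and building IDs:

```python
import numpy as np

def report_metrics(lon_true, lat_true, lon_pred, lat_pred,
                   floor_true, floor_pred, bld_true, bld_pred):
    # Euclidean positional error per test example, in meters.
    err = np.sqrt((lon_true - lon_pred) ** 2 + (lat_true - lat_pred) ** 2)
    print("Mean positional error:", err.mean())
    for p in (25, 50, 75, 95, 100):
        print(f"{p}th percentile:", np.percentile(err, p))
    print("Building hit rate:", np.mean(bld_true == bld_pred) * 100, "%")
    print("Floor hit rate:", np.mean(floor_true == floor_pred) * 100, "%")
```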
3.4.2. K-Nearest Neighbors Classifier
The K-Nearest Neighbors (K-NN) classifier is a non-parametric algorithm that makes predictions based on the similarity of data points. Unlike neural networks, it does not learn parameters during a training phase; instead, it relies on a distance metric to classify new examples.
The K-NN algorithm calculates the distance between a new data point and all the examples in the training set. Common distance metrics include the following two:
Euclidean Distance.
Manhattan Distance.
After calculating the distances, the algorithm identifies the k closest examples and assigns the class label based on majority voting. If there is a tie, the class of the nearest neighbor is chosen. K-NN has the following characteristics:
Simplicity: K-NN is very straightforward. It does not need training explicitly.
Versatility: It can be applied to both classification and regression tasks.
Sensitivity: the algorithm is sensitive to the choice of k. A small value may lead to overfitting, while a large one may result in underfitting.
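As a minimal scikit-learn sketch (with an assumed preprocessed feature matrix and the UNIQUE LOCATION label from Section 3.2):

```python
from sklearn.neighbors import KNeighborsClassifier

# Illustrative settings: k and the distance metric are tuning choices,
# not the values selected in this work.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")  # or "manhattan"
knn.fit(X_train, y_unique_location)   # UNIQUE LOCATION as the class label
pred_locations = knn.predict(X_test)  # majority vote among the k neighbors
```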
3.4.3. Support Vector Machine
Support Vector Machines (SVMs) are very robust algorithms designed for both linear and non-linear classification. They identify a hyperplane that maximizes the margin, i.e., the distance between the hyperplane and the nearest data points from each class; these nearest points are called support vectors.
For a linearly separable dataset, the SVM optimization problem aims to

$$\min_{w,\,b}\; \frac{1}{2}\lVert w\rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \;\;\forall i$$

For non-linear data, SVMs employ kernel functions, such as the radial basis function (RBF) and polynomial kernels, which map the data into higher-dimensional spaces where a linear decision boundary can be constructed.
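A minimal scikit-learn sketch under the same assumptions; the C and gamma values are illustrative, not the tuned ones:

```python
from sklearn.svm import SVC

# The RBF kernel implicitly maps fingerprints into a higher-dimensional
# space where a linear boundary can separate the location classes.
svm = SVC(kernel="rbf", C=10.0, gamma="scale")
svm.fit(X_train, y_unique_location)
pred_locations = svm.predict(X_test)
```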
3.4.4. Decision Trees
Decision trees classify data by splitting it into subsets based on feature values. Each internal node in the tree represents a decision rule, while the terminal nodes (leaves) correspond to class labels or predicted values.
During training, the algorithm recursively partitions the dataset by selecting the feature that provides the highest information gain or the greatest reduction in impurity (for example, the Gini index or entropy). This iterative process continues until a stopping criterion is met, such as a maximum tree depth or a minimum number of samples per leaf node.
Decision trees are intuitive and interpretable, yet they are prone to overfitting. Techniques like pruning or ensembling are often employed to improve their generalization capabilities.
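For concreteness, the Gini impurity used to score candidate splits can be computed as in the following minimal sketch:

```python
import numpy as np

def gini_impurity(labels: np.ndarray) -> float:
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

print(gini_impurity(np.array([1, 1, 1])))     # pure node -> 0.0
print(gini_impurity(np.array([0, 1, 0, 1])))  # 50/50 split -> 0.5
```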
3.5. Data Analysis
An access point can potentially cover an area of 10,000 square feet, but for the current discussion we rely on an average coverage of 1600 square feet per access point.
Figure 2 illustrates the active WAPs per sample.
Figure 3 shows the building and floor counts that have been considered in the study.
Access points can be classified into two types: those that communicate with devices via radio (Wi-Fi), and those that connect to a wired network such as Ethernet.
Several factors contribute to a weak Wi-Fi signal, with the primary factor being the distance from the router. Wireless routers and access points can only transmit at low power levels, limiting their effective range to approximately 100 feet indoors to prevent interference with other devices.
Figure 4 illustrates the relationship between distance and intensity in WAPs, while Figure 5 displays the radio map in 3D; both figures use the same color configuration. Figure 6 depicts the preprocessing steps taken to separate null values from normal values, and Figure 7 demonstrates the distribution of the dataset.
The data were split into training and testing sets to check the algorithm's performance; Figure 8 visualizes the training and testing datasets.
3.6. Digital Twin Integration
Achieving a functional Digital Twin of the Flexible Manufacturing System requires the seamless integration of the Wi-Fi-based localization and deep learning prediction models developed in this research. This integration process involves several key steps to create a comprehensive virtual representation and enable remote monitoring, control, and optimization of the physical FMS. The integration steps are as follows:
- A.
The physical layout of the FMS, including the equipment, workstations, and material handling systems, needs to be captured and digitized. This can be conducted through a combination of techniques, such as 3D laser scanning, photogrammetry, or computer-aided design (CAD) modeling. The resulting 3D virtual environment serves as the foundation for the Digital Twin.
- B.
Next, the Wi-Fi access points deployed throughout the FMS are mapped to their corresponding locations within the Digital Twin. This spatial alignment allows the real-time localization data from the deep learning models to be seamlessly integrated into the virtual representation. As IoT sensors on the physical assets (e.g., Co-Bots, materials, personnel) collect Wi-Fi RSSI data, the deep learning models predict their coordinates, which are then visualized within the Digital Twin (a minimal sketch of this update loop is given after this list).
- C.
To further enhance the Digital Twin, additional data sources can be integrated, such as production schedules, inventory levels, and equipment status. By combining the localization information with these operational data points, the Digital Twin can provide a holistic view of the FMS, enabling remote monitoring and analysis of the manufacturing processes.
- D.
The integration of the Digital Twin with optimization algorithms and simulation engines is another crucial step. This allows operators to explore different scenarios, such as changes in product mix, equipment maintenance, or layout reconfiguration, without disrupting the physical system. The Digital Twin can serve as a testbed for evaluating the impact of these changes and identifying the most efficient and effective manufacturing strategies.
- E.
Finally, the Digital Twin platform should provide intuitive user interfaces and visualization tools to enable real-time monitoring, control, and decision-making. This could include features such as 3D visualizations of the FMS, data dashboards, and predictive analytics to support the optimization of flexible manufacturing operations.
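The following is a minimal sketch of the real-time update loop behind steps B and C. The MQTT topic layout, the predict_xy call, the preprocess_rssi helper, and the twin client are illustrative assumptions, not the actual interfaces of the implemented platform:

```python
import json
import time

def localization_loop(mqtt_client, model, twin):
    """Continuously map incoming RSSI fingerprints to coordinates in the twin.

    `mqtt_client`, `model`, and `twin` are illustrative stand-ins for the
    IoT broker, the trained localization model, and the Digital Twin API.
    """
    def on_message(client, userdata, msg):
        payload = json.loads(msg.payload)               # {"asset_id": ..., "rssi": [...]}
        fingerprint = preprocess_rssi(payload["rssi"])  # same transform as training
        x, y, floor = model.predict_xy(fingerprint)     # hypothetical helper
        twin.update_asset(payload["asset_id"], x=x, y=y, floor=floor,
                          timestamp=time.time())        # move the asset in the twin

    mqtt_client.subscribe("fms/+/wifi_rssi")            # assumed topic layout
    mqtt_client.on_message = on_message
    mqtt_client.loop_forever()
```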
3.7. Reinforcement Learning in Digital Twin
In this research, the Digital Twin was integrated with a deep reinforcement learning (RL) algorithm to enhance autonomous navigation. The RL agent was specifically designed for an Automated Guided Vehicle (AGV) and works as follows:
- a.
The deep RL algorithm was trained in a simulation environment generated from the Digital Twin of the AMP Lab. The agent learned to calculate optimal trajectories between designated start and end points and, through learning, to avoid collisions with static and dynamic obstacles. The static obstacles are the FMS and other manufacturing equipment, while the dynamic obstacles are humans.
- (1)
The RL agent's reward function incentivized safe navigation and goal achievement, with penalties for collisions or deviations from efficient paths (a minimal sketch of such a reward function follows this list).
- b.
Static and dynamic obstacles were incorporated in real time. Using data from IoT sensors and mobile devices, the system recalculated trajectories dynamically to adapt to environmental changes.
- c.
Wi-Fi RSSI data was processed to provide real-time position tracking with an average error of 1 m. This localization accuracy enabled the RL agent to operate effectively with the aid of a Digital Twin.
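A minimal sketch of a reward function of the kind described in item (1) above; the constants and helper names are illustrative assumptions, not the tuned values used in the experiments:

```python
import numpy as np

def reward(agv_pos, goal_pos, obstacles, prev_dist, collision_radius=0.5):
    """Reward safe progress toward the goal; penalize collisions and detours."""
    dist_to_goal = np.linalg.norm(np.asarray(goal_pos) - np.asarray(agv_pos))

    # Heavy penalty and episode termination on collision with any obstacle,
    # static (equipment) or dynamic (people).
    for obs in obstacles:
        if np.linalg.norm(np.asarray(obs) - np.asarray(agv_pos)) < collision_radius:
            return -100.0, True

    # Large positive reward for reaching the goal.
    if dist_to_goal < 0.25:
        return +100.0, True

    # Shaping term: reward progress toward the goal; the small step cost
    # penalizes deviations from an efficient path.
    progress = prev_dist - dist_to_goal
    return 10.0 * progress - 0.1, False
```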
3.8. Integration with Industry 4.0 Technologies
The proposed Digital Twin framework integrates seamlessly with key Industry 4.0 technologies, enabling enhanced predictive analytics and real-time decision-making capabilities. By leveraging IoT sensors deployed across the manufacturing environment, the framework collects real-time data on equipment status, material flow, and environmental conditions. These data are processed using machine-learning models embedded within the Digital Twin, allowing the system to predict potential failures, optimize production schedules, and adapt to unexpected disruptions.
Real-time decision-making is supported through dynamic dashboards that visualize key performance indicators (KPIs) and suggest optimal actions based on the Digital Twin’s simulations. This integration of Industry 4.0 technologies ensures that the Digital Twin framework not only mirrors the physical system but also enhances operational efficiency and resilience through predictive and adaptive capabilities.
Results demonstrated in our previous paper [49] on the same framework show that the Digital Twin significantly improved FMS performance. Productivity was enhanced by 14.53% compared to conventional methods, energy consumption was reduced by 13.9%, and quality was increased by 15.8% through intelligent machine coordination. The dynamic optimization and closed-loop control capabilities of the Digital Twin significantly improved overall equipment effectiveness.
4. Results
This section shows the results of all machine-learning models, i.e., Random Forests, KNN, Support Vector Machine, Decision Trees, and Convolutional Networks with Optimized Models.
4.1. Random Forests
Ground-truth and prediction data for the model were computed using the longitude and latitude features of the data. The predicted results for phone IDs 0 to 9 can be seen in Figure 9, which shows the predicted values as well as the ground-truth values for each ID; the ID is shown at the top of each sub-image. The blue dots show the predictions of the Random Forest Regressor model, and the orange dots show the ground-truth positions in the actual space. The vertical axis of each sub-image displays latitude in meters and the horizontal axis displays longitude in meters, creating a radio map of the ID locations over the collection period. The predicted and actual locations are in very close proximity, demonstrating the accuracy of the model.
Furthermore, the positions of phone IDs were also predicted by Random Forests using Longitude/Latitude vs. Timestamp parameters. The results can be seen in Figure 10; the corresponding phone ID is shown at the top of each sub-image. Each sub-image has two graphs, one showing Longitude vs. Timestamp and the other showing the corresponding Latitude. The blue curve shows the values predicted by the Random Forest Regressor model, while the orange curve shows the ground-truth locations. The visualizations show that predicted and actual values are quite similar.
Table 2 shows the complete results of Random Forests with Mean Coordinate Error (MCE), Standard Error (SE), Building % Error (BPE), and Floor % Error (FPE) for each Phone ID.
4.2. K-Nearest Neighbors
To validate the model's performance, the predicted locations were compared with the actual ground-truth data, which are based on the longitude and latitude features within the dataset.
Figure 11 displays the predicted results for phone IDs 0 to 9. Each sub-image showcases the ground-truth and predicted locations for a single phone ID, with the phone ID positioned at the top of each sub-image for easy reference. The blue dots represent the locations predicted by the K-Nearest Neighbors model, while the orange dots depict the actual ground positions in real space. Each sub-image uses a radio-map format, where the vertical axis represents latitude in meters and the horizontal axis represents longitude in meters, allowing the phone ID locations to be visualized over the data collection period. The close proximity between the predicted blue dots and the actual orange dots demonstrates the model's accuracy in pinpointing phone locations and highlights its effectiveness in utilizing longitude and latitude features for indoor localization.
Furthermore, the positions of phone IDs were also predicted by K-Nearest Neighbors using Longitude/Latitude vs. Timestamp parameters. The results can be seen in Figure 12; the corresponding phone ID is shown at the top of each sub-image. Each sub-image has two graphs, one showing Longitude vs. Timestamp and the other showing the corresponding Latitude. The blue curve shows the values predicted by the K-Nearest Neighbors model, while the orange curve shows the ground-truth locations. The visualizations show that predicted and actual values are quite similar.
Table 3 shows the complete results of KNN with Mean Coordinate Error (MCE), Standard Error (SE), Building % Error (BPE), and Floor % Error (FPE) for each Phone ID.
4.3. Support Vector Machine
To evaluate the model's performance, the predicted locations were compared with the actual ground-truth data, which are based on the longitude and latitude features within the dataset.
Figure 13 displays the predicted results for phone IDs 0 to 9. Each sub-image shows the ground-truth and predicted locations for a single phone ID, with the phone ID positioned at the top of each sub-image for easy reference. The blue dots represent the locations predicted by the Support Vector Machine model, while the orange dots depict the actual ground positions in real space. Each sub-image uses a radio-map format, where the vertical axis represents latitude in meters and the horizontal axis represents longitude in meters, allowing the phone ID locations to be visualized over the data collection period. The close proximity between the predicted blue dots and the actual orange dots demonstrates the model's accuracy in pinpointing phone locations and highlights its effectiveness in utilizing longitude and latitude features for indoor localization.
Furthermore, the positions of phone IDs were also predicted by the Support Vector Machine using Longitude/Latitude vs. Timestamp parameters. The results can be seen in Figure 14; the corresponding phone ID is shown at the top of each sub-image. Each sub-image has two graphs, one showing Longitude vs. Timestamp and the other showing the corresponding Latitude. The blue curve shows the values predicted by the Support Vector Machine model, while the orange curve shows the ground-truth locations. The visualizations show that predicted and actual values are quite similar.
Table 4 shows the complete results of SVM with Mean Coordinate Error (MCE), Standard Error (SE), Building % Error (BPE), and Floor % Error (FPE) for each Phone ID.
4.4. Decision Trees
To establish the model's performance, the predicted locations were compared with the actual ground-truth data, which are based on the longitude and latitude features within the dataset.
Figure 15 displays the predicted results for phone IDs 0 to 9. Each sub-image shows the ground-truth and predicted locations for a single phone ID, with the phone ID positioned at the top of each sub-image for easy reference. The blue dots represent the locations predicted by the Decision Tree model, while the orange dots depict the actual ground positions in real space. Each sub-image uses a radio-map format, where the vertical axis represents latitude in meters and the horizontal axis represents longitude in meters, allowing the phone ID locations to be visualized over the data collection period. The close proximity between the predicted blue dots and the actual orange dots demonstrates the model's accuracy in pinpointing phone locations and highlights its effectiveness in utilizing longitude and latitude features for indoor localization.
Furthermore, the positions of phone IDs were also predicted by the Decision Tree using Longitude/Latitude vs. Timestamp parameters. The results can be seen in Figure 16; the corresponding phone ID is shown at the top of each sub-image. Each sub-image has two graphs, one showing Longitude vs. Timestamp and the other showing the corresponding Latitude. The blue curve shows the values predicted by the Decision Tree model, while the orange curve shows the ground-truth locations. The visualizations show that predicted and actual values are quite similar.
Table 5 showcases the error metrics associated with the decision tree technique.
According to the predictions of the machine-learning models, KNN is considered the best regressor because it has the minimum SE, BPE, and FPE, achieving 0.00% error rates, which shows the best performance of the KNN algorithm. Further, K-fold cross-validation was applied to KNN, as shown in Figure 17. KNN showed good performance with an optimized error rate at threefold cross-validation, as can be seen in Figure 18.
4.5. Convolutional Neural Network
As discussed above, convolution and pooling can be considered feature extractors whose output is passed to the network as a flattened, single-row vector. To validate the current research work, the CNN model was optimized using the three optimizers, and it showed good performance.
Figure 19 shows MSE loss during the Validation of Testing data by CNN-ADAM.
Figure 20 shows the validation set performance of CNN with the ADAM optimizer, SGD optimizer, and RMSProp optimizer.
ADAM shows an MSE of 0.83, while SGD and RMSProp show MSEs of 0.98 and 1.01, respectively. The figure below shows the comparative analysis of the average mean squared error of the three hyper-tuned CNN models.
4.6. Comparative Study
This study applied different machine-learning models and hyper-tuned deep learning models. All the models showed promising results; among them, CNN-ADAM performed best in terms of MSE, while KNN performed best in terms of FPE and BPE.
Figure 21 shows the performance of each model in terms of metric evaluation.
Five solutions were created for the UJIIndoorLoc indoor positioning dataset. The KNN model gave the lowest positional and localization errors, beating the baseline in the paper in all except the 95th and 100th percentiles. The building hit rate was 100%, and the floor hit rate, at 90.4%, was around 5% higher than the benchmark. The CNN-ADAM model had a slightly higher overall positional error, as mean squared error, than the benchmark. Finally, the Random Forest model gave higher positional errors and a slightly lower building hit rate at 98.6%, although its floor hit rate, at 88%, still outperformed the baseline. Future research could extend this work to outdoor localization and positioning, which could be very beneficial for predicting localization and positional errors.
In comparing the results of this study with the existing literature, the performance of the machine-learning models used for indoor localization can be evaluated. The Random Forest approach in this study has shown competitive accuracy, with a Mean Coordinate Error (MCE) ranging from 1.28 to 19.19 m across different phone IDs. This places these results in line with or better than previous research, such as Wei and Akinci [16], which achieved 90% accuracy using an image-based method, and Tiku et al. [18], which achieved 85% to 95% accuracy using Support Vector Machines (SVMs) and deep neural networks (DNNs). Additionally, the K-Nearest Neighbors (KNN) method used in this study stands out, with an MCE ranging from 1.2 to 2.28 m, demonstrating accuracy levels comparable to or exceeding those of many referenced works. Furthermore, the Support Vector Machine (SVM) models in this study showed MCEs ranging from 34.63 to 58.37 m, placing these results within the range of other studies using SVMs.
Methodologically, this study aligns with the prior literature in its use of machine-learning models such as Random Forests, KNN, and SVMs for indoor localization. However, the results show significant improvements in accuracy, particularly with KNN outperforming other studies in terms of MCE, SE, Building % Error (BPE), and Floor % Error (FPE). This suggests that the current optimizations have produced more accurate localization outcomes than several referenced works. Additionally, visualizing the results using radio maps for each phone ID, as in other studies, allows a clear comparison of predicted and actual locations, enhancing the transparency and understanding of the methodology and results.
5. Case Study
To demonstrate the application of the proposed localization and deep reinforcement learning techniques, a case study was developed using a collaborative robot (Co-Bot) in a laboratory environment. The lab is 51 × 30 feet in area. The lab was leveraged in this study due to its existing capabilities and flexible layout, which allowed the implementation of the proposed Digital Twin framework. While not originally designed exclusively for this research, the lab provided an optimal environment for experimentation due to its integration of IoT-enabled equipment and the spatial arrangement conducive to deploying Wi-Fi access points.
The Co-Bot has a LiDAR sensor, but it was disconnected for this experiment to verify the accuracy and application impact of the proposed localization. The Digital Twin is populated with the CAD models of all the static obstacles. As can be seen in Figure 22, the layout shows two Flexible Manufacturing Systems in the upper two corners. The selective laser jet, pneumatic jet, 3D scanner, coordinate measuring machine, and binder jet printer are located in between, denoted as 1, 2, 3, etc. They act as static obstacles for the Co-Bot, while humans (technicians, researchers, etc.) act as dynamic obstacles.
The front end of the Digital Twin offers two coordinate points for the Co-Bot. The first one is the starting point, and the second one is the endpoint. At the back end of the Digital Twin, the deep reinforcement learning algorithm is used. In addition to localization, a deep reinforcement learning (RL) agent was developed to generate optimal paths between destinations while avoiding collisions. The RL agent was trained in a simulation environment created from the Digital Twin layout. The agent learned to maximize rewards for reaching goals safely.
The agent calculates the trajectory based on the starting and ending points. Moreover, static and dynamic obstacles are incorporated in online mode, and the agent recalculates its trajectory accordingly. The DT is fed with data from the models.
Through the localization data from the learning models, the Digital Twin radio map was constructed; the map is shown in Figure 23.
For the experiment, the Co-Bot was deployed in this lab environment and tasked with performing material handling jobs. For precise localization, Wi-Fi access points were installed throughout the lab, and IoT sensors were mounted on all the equipment to act as static obstacles for the Co-Bot, while the mobile phones of the technicians and researchers were used to create dynamic obstacles. The system gathered Wi-Fi RSSI data, which were fed to the trained deep CNN model for real-time position tracking.
Figure 24 shows the Co-Bot operating in the lab environment. The CNN model predicted the location coordinates within an average error of 1 m. This allowed the construction of an accurate Digital Twin of the lab with the Co-Bot’s current position.
This case study successfully demonstrated the integration of Wi-Fi fingerprinting, deep learning for localization, and deep RL for planning behaviors in Co-Bots. The techniques can be extended to other Automated Guided Vehicles in industrial environments to improve navigation accuracy and safety.
6. Conclusions
The study presents a novel framework for Flexible Manufacturing Systems (FMS) that combines Wi-Fi fingerprinting, deep learning for indoor localization, and Digital Twin technology to optimize real-time tracking in FMS. The approach leverages the widespread presence of Wi-Fi networks in modern factories, offering a cost-effective, dynamic, and scalable solution for creating Digital Twins of FMS environments.
The evaluation of the framework included several machine-learning models. Among these, KNN and CNN demonstrated exceptional accuracy, with KNN achieving a mean coordinate error (MCE) between 1.2 and 2.28 m and a 100% building detection rate. The CNN-ADAM combination, with a mean squared error of 0.83, further highlighted the potential of deep learning for indoor localization. Additionally, deep reinforcement learning was used in conjunction with an Automated Guided Vehicle (AGV), enabling it to navigate and avoid obstacles with 100% success in a laboratory setting.
In comparison with existing sensor-based Digital Twins, Wi-Fi-based localization offers a more flexible and scalable solution. Future research could explore incorporating additional data sources, with particular focus on testing the framework's scalability in larger, more congested manufacturing environments. Finally, this framework advances Industry 4.0 through more efficient, data-driven manufacturing processes that contribute to greater productivity, cost savings, and competitiveness.
6.1. Future Work
Future studies could focus on expanding the proposed framework’s scalability and functionality to address more complex and diverse industrial scenarios. Testing this approach in larger, real-world manufacturing setups, where numerous machines, Automated Guided Vehicles (AGVs), and human operators interact simultaneously, would provide critical insights into its adaptability and robustness.
Additionally, integrating data from other advanced sources, such as LiDAR, RFID, and multi-modal sensors, could further enhance localization accuracy and provide richer, more nuanced operational insights.
Hybrid localization techniques that combine Wi-Fi fingerprinting with other technologies, such as vision-based localization or ultra-wideband (UWB) systems, also represent a promising area for future exploration. These techniques could offer more resilient solutions across diverse environments, particularly in challenging industrial layouts.
6.2. Limitations
The computational demands of integrating machine-learning models and real-time localization could restrict deployment in resource-constrained settings. Small and medium-sized enterprises may also find the initial setup costs for implementing IoT sensors, Wi-Fi access points, and Digital Twin platforms prohibitive.
Additionally, the framework has primarily been validated in controlled laboratory environments. This limited scope may not capture the full spectrum of challenges encountered in operational industrial contexts, necessitating further real-world testing.
By addressing these limitations and pursuing the outlined future research directions, the proposed framework has the potential to evolve into a more versatile, scalable, and impactful solution for advancing smart manufacturing systems.