Federated Learning-Based Framework to Improve the Operational Efficiency of an Articulated Robot Manufacturing Environment

So, Junyong; Lee, In-Bae; Kim, Sojung

doi:10.3390/app15084108

Open AccessArticle

Federated Learning-Based Framework to Improve the Operational Efficiency of an Articulated Robot Manufacturing Environment

by

Junyong So

,

In-Bae Lee

and

Sojung Kim

^*

Department of Industrial and Systems Engineering, Dongguk University-Seoul, Seoul 04620, Republic of Korea

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(8), 4108; https://doi.org/10.3390/app15084108

Submission received: 11 March 2025 / Revised: 30 March 2025 / Accepted: 7 April 2025 / Published: 8 April 2025

(This article belongs to the Special Issue Advanced Artificial Intelligence Technologies and Applications in Manufacturing and Image Processing)

Download

Browse Figures

Versions Notes

Abstract

:

Although articulated robots with flexible automation systems are essential for implementing smart factories, their high initial investment costs make them difficult for small and medium-sized enterprises to implement. This study proposes a federated learning-based articulated robot control framework to improve the task completion of multiple articulated robots used in automated systems under limited computing resources. The proposed framework consists of two modules: (1) a federated learning module for the cooperative training of multiple joint robots on a part-picking task and (2) an articulated robot control module to balance the efficiency of limited resources. The proposed framework is applied to cases with different numbers of joint robots, and its performance is evaluated in terms of training completion time, resource share ratio, network traffic, and completion time of a picking task. Under the devised framework, the experiment demonstrates object recognition by three joint robots with an accuracy of approximately 80% at a minimum number of learning rounds of 76 and with a network traffic intensity of 2303.5 MB. As a result, this study contributes to the expansion of federated learning use for articulated robot control in limited environments, such as small and medium-sized enterprises.

Keywords:

federated learning; smart factory; digital twin; edge computing; flexible automation

1. Introduction

Smart factories are advanced manufacturing facilities that are highly automated, digitalized, and integrated with state-of-the-art technologies, such as Cyber-Physical Systems (CPSs), Internet of Things (IoT), Internet of Services, big data analysis, artificial intelligence (AI), cloud computing, semantic web, and Augmented Reality (AR) [1]. Recently, they have been adapted by multiple manufacturers to resolve the challenges they are confronted with involving the increased demand for customized products with a short lead time on the market [2], as well as those related to meeting the needs of sustainable production in terms of waste and emission generation and the reduction in energy use [3]. Eventually, the implementation of smart factories will become a necessary option for the survival of manufacturers, regardless of the scale of their facilities and of their economic conditions. However, its high investment cost can be another challenge, especially for the operational management of small and medium-sized enterprises [4].

In general, smart factory technologies are applied to automation systems that replace traditionally manual tasks with machine- and robot-based operations, which include both physical aspects of manufacturing processes and their non-physical aspects, such as information processing and managerial decision-making [5]. Automation systems can be classified as follows by production quantity and variety: (1) programmable automation, enabling changes in the sequence of operations to process product configurations with a coded instruction program; (2) flexible automation, producing items without wasting time changing operations in other systems (i.e., fixed automation systems and programmable systems); and (3) fixed automation, utilizing equipment to automate a fixed sequence of either processing or assembly operations [6]. In these automation systems, industrial robots (IRs) are a key factor in the successful conversion from traditional manual tasks to automated tasks, with improved performance in reconfigurability, accuracy, productivity, versatility, repeatability, and safety [7]. According to the International Organization for Standardization (ISO) 8373:2021, an IR is defined as an “automatically controlled, reprogrammable multipurpose manipulator, programmable in three or more axes, which can be either fixed in place or fixed to a mobile platform for use in automation applications in an industrial environment” [8].

Among IRs, the articulated robot consisting of multiple interconnected links and joints is the most popular IR used in smart factories; because of its use of links and joints, it is capable of replicating the distinctive structural characteristics of human joints, enabling it to represent diverse motions [9]. Thus, articulated robots are well suited for a wide range of industrial applications, including the material handling, processing, assembly, and inspection operations of a product [6]. Picking tasks performed by IRs have been investigated by multiple researchers to achieve a high operational accuracy of IRs in industrial applications. Wada et al. [10] proposed a joint learning methodology using semantic occlusion segmentation to enhance the reasoning instance process of a pick-and-place task. Mohammed et al. [11] presented a raw RGB-D image-based framework generating a robot picking workspace while maintaining reasonable computational complexity. Zeng et al. [12] proposed an object-agnostic grasping system that processed observed images for different grasping actions. Meanwhile, Matenga et al. [13] applied an articulated robot arm under a flexible automation system to mimic a traditional manual rewinding process, and they found the devised automation system to be six times faster than the manual rewinding process.

As an extension of the existing studies, this study aims to introduce a federated learning (FL)-based framework to train multiple articulated robots under a flexible automation system. In particular, we consider the control of articulated robots with 3 degrees of freedom (3-DOF) in terms of picking up individual parts under the proposed framework, with limited computing resources. To this end, Raspberry Pi 4 is utilized as a controller of the 3-DOF articulated robots and of the demonstration equipment of the devised FL algorithm, which recognizes and analyzes the manufacturing environment and part picking conditions in real time. As mentioned above, articulated robot control technology is very important for the efficient operation of smart factories, and the proposed FL framework is important in that it shortens the learning time of robots under limited resources, ultimately improving manufacturing process efficiency. Rauch et al. [14] pointed out that small and medium-sized enterprises (SMEs) that have difficulty hiring highly qualified employees due to limited resources should prioritize the use of low-investment robots to introduce advanced technologies. Therefore, the proposed framework will contribute to the expansion of FL use for articulated robot control under limited resources, especially for SMEs. In the experiment, FL performance with four different articulated robots is evaluated in terms of training completion time, resource share ratio, network traffic, and the completion time of a picking task. The results show that when three articulated robots are used, the proposed FL-based framework can achieve object recognition with an accuracy of approximately 80 %, with a minimum number of learning rounds of 76 and with a network traffic intensity of 2303.5 MB. Therefore, this study will contribute to the expansion of FL use for articulated robot control in SME manufacturing environments with limited resources.

The remaining sections are organized as follows: Section 2 reviews recent methodologies related to FL, the introduction of articulated robots in smart factories, and techniques for robot control such as forward kinematics, inverse kinematics, and Pulse Width Modulation (PWM). Then, the FL-based articulated robot control framework is proposed. Section 3 reports on the performance of the articulated robot control module and the FL module in the framework. Section 4 discusses the potential challenges that may be faced when applying the proposed framework to actual manufacturing sites and suggests solutions to address them. Finally, Section 5 presents the conclusion and suggests future work associated with this study.

2. Materials and Methods

2.1. Federated Learning

FL refers to a machine learning technique that enables the cooperative training of a global machine learning model with local data via individual clients (e.g., mobile or IoT devices) [15]. FL can be classified into three different categories [16,17,18]: (1) horizontally federated learning (HFL), consisting of clients sharing a homogeneous feature space with different sample spaces (e.g., patient information in different hospitals); (2) vertically federated learning (VFL), consisting of clients sharing heterogeneous feature spaces with a sample space (e.g., heterogeneous customer in a shopping mall); and (3) federated transfer learning (FTL), consisting of clients with heterogeneous feature spaces with different sample spaces (e.g., wearable healthcare using both personalized local models and a global model). In this way, FL can be classified in terms of the characteristics of feature space and sample space utilization. In this study, HFL is adopted to create individual models based on the same feature space from different sample spaces composed of multiple articulated robots. The individual models are eventually utilized for the generation of a global model (see Section 2.3 for more detail).

In general, to train the global model, traditional distributed and parallel machine learning techniques, such as Parallel SGD [19], ADMM [20], and Downpour SGD [21], collect the local data of each client to a central server (or a global server) and train the model at once. However, in the training process of FL, a local model of each client is solely trained using its data, without sharing them with other clients, and the trained model’s parameters (e.g., the weights, biases, and gradients of a model) are iteratively aggregated into the global model to enhance the overall performance of global and local models [22].

FL, which shares characteristics with distributed systems, utilizes distributed data to enhance the performance of a model on a task. Therefore, early FL-based approaches can be regarded as extensions of distributed computation, while recent advances in privacy preservation within distributed systems have further reinforced this connection. The CodedPrivateML protocol [23] consists of quantization, encoding, polynomial approximation and gradient computation, and decoding the gradient and model update process, which preserves the privacy of the data and model information while enabling an efficient parallelized training process for distributed workers. However, distributed processing connects multiple client computers in distributed locations to a centralized control server via a communication protocol to complete a task, while each computer completes a different task [24]. As a result, while distributed processing focuses on accelerating processing, FL aims to improve privacy and collaborative machine learning capabilities, even with non-independent and identically distributed (non-IID) data.

In FL, a global model is trained according to two approaches of the Federated Stochastic Gradient Descent (FedSGD) and Federated Averaging (FedAVG) algorithms [25]. Both algorithms use three hyperparameters: (1) the fraction of clients (C) participating in each round (i.e., one client update model parameter to the server) among all clients

(0 \leq C \leq 1)

; (2) the mini-batch size of local data (B) used for training in each epoch; and (3) the number of epochs (E) to train locally for each round. The FedSGD algorithm is built upon the stochastic gradient descent (SGD) algorithm, which is an effective optimization technique that iteratively minimizes the cost or loss function using a minibatch of data. To be more specific, each client trains its local model using the gradient descent method from its full local dataset. Once each client completes the training process, the central server collects the parameter of the trained client model, updates a global model with the collected model parameters, and deploys the updated global model to the clients. This iterative process is then repeated until global and local models are converged. However, considering the communication cost of updating the global model every round from selected local clients, FedAVG is a more efficient algorithm than FedSGD, which performs only one local update for each client. In FedAVG, the local training process for the local model of each client is conducted for epochs (

E

) on the B size mini-batched local dataset, and the global model is then updated [25]. In the process of training a model through the FL methodology, the efficiency of the model aggregation is expressed by a quantified metric, namely the convergence rate, which is the number of communication rounds (number of global updates) required to reach the global optimum of the global objective function

F

. According to [26], the convergence rate of the local SGD, i.e., FedSGD in the previously described FL algorithm, has the following impact on hyperparameters

K

and

B

in the case of independent and identically distributed (IID) data:

B

has an

𝒪 (\sqrt{T})

relationship with the convergence rate, while

K

is positively correlated with the convergence rate.

T

, as mentioned in the definition of convergence rate with big-O notation, represents the total number of rounds required for the global model to converge (i.e., the total number of applications of an algorithm). However, in heterogeneous data (i.e., in the case of non-IID data), the convergence rate becomes

𝒪 (1 / T)

[15].

Due to its advantages, FL has been widely incorporated in various industries and research fields. Zhao et al. [27] proposed a privacy-preserving block chain-based FL framework for IoT devices that can predict customers’ requirements and consumption patterns. Moreover, the FL model has been used in conjunction with smart devices to predict human trajectory and behavior. Feng et al. [28] presented a Privacy-Preserving Mobility (PMF) Framework that can predict human mobility patterns while maintaining a balance between predictive accuracy and privacy protection. Lee et al. [29] presented a machine learning framework for the preservation of patient privacy and medical information via the FL environment, while Szegedi et al. [30] proposed an FL method to improve classification accuracy based on Electroencephalogram (EEG) data collected from different medical institutes. Considering previous studies, FL is a machine learning methodology applicable to various industries, and given its characteristic of operating without sharing internal client information with a central server, it is also expected to be applicable to the manufacturing industry. However, existing studies have primarily focused on data analysis and prediction itself, without considering the actual computational devices and hardware specifications of the applied targets. Additionally, despite the recent research on and applications of FL, there are still issues that need to be addressed in the field. Firstly, due to the characteristics of FL, which creates a global model through communication with each device, there are limitations in the field of ensuring communication in terms of latency, bandwidth usage, and the synchronization of updates from multiple devices. Moreover, it is necessary to improve the methodology to efficiently conduct the FL process based on a networked system with heterogeneous data, hardware, and software specifications [15].

2.2. Industrial Articulated Robot Control in Smart Factories

The autonomous manufacturing process of smart factories, which offers higher efficiency, flexibility, and customization than traditional manufacturing processes, is made possible by the integration of cutting-edge technologies with advanced manufacturing facilities [31]. To this end, various countries, including the U.S., Germany and other countries in Europe, India, and China, have been digitized and have modernized their manufacturing facilities to enhance their competitiveness [32].

As mentioned in the Introduction, robotics is one of the most crucial technologies in the creation of a smart factory, and it is continuously evolving to meet the growing demand for flexible automation systems in modern manufacturing industries [33]. Autonomous robots have been devised by the integration of traditional robots with advanced technologies, such as IoT and AI, so that they can connect to other robots and share their distributed resources, context information, and environmental data with other pieces of equipment in the smart factory environment. This results in the generation of a connected ecosystem that involves heterogeneous technologies in flexible automation manufacturing systems, which is particularly beneficial to improve manufacturing performance [34].

However, it can be challenging to demonstrate advanced technologies in industrial articulated robot control, because technologies such as IoT and AI should be tightly coupled with an existing robot control mechanism. In general, an articulated robot (or a robot manipulator) consists of six main components: (1) a base connecting the robot to the ground or a fixed surface; (2) a link, which is a rigid part of the robot; (3) a joint, which causes relative motion between the connected links; (4) a sensor, sending responses from internal or external sources to a controller; (5) an end-effector, interacting with an object to demonstrate a task; and (6) a controller, sending operational commands to the joints and the end-effector based on data observed by the sensors to accomplish a task [6]. It is important to conduct precise joint control to perform manufacturing operations, and the logic of sending an appropriate command to actuators involving multiple joints and an end-effector has been widely studied and is well established [35]. In general, an articulated robot is controlled regarding its kinematic and dynamic features because they are used to address the relationship between the task space of the end-effector and the configuration space of the joints (i.e., their angles and positions). Note that kinematic features are associated with physics to express the configuration of the end-effector, as well as of the joints, and that dynamic features are used to predict robot motion considering input torque and mechanical parameters [36].

Kinematics mainly includes forward kinematics, velocity kinematics, statics, and inverse kinematics [6]. Forward kinematics is a method that derives the position of the end-effector from the joint parameters using kinematic equations. Contrary to forward kinematics, to derive each joint’s angle from the end-effector configuration (angles and positions), the inverse kinematics method is used, and the derived joint angle is called the inverse kinematics solution. While forward kinematics, which uses the angle of the joints to derive the configuration of the end-effector [37], has only one solution, inverse kinematics has the characteristic that it cannot derive a closed solution unless it is used to describe a simple type of robot. Therefore, many studies have been conducted to solve this problem, including the Jacobian inverse method, heuristic methods, e.g., Cyclic Coordinate Descent (CCD) [38], and Forward and Backward Reaching Inverse Kinematics (FABRIK) [39]. However, due to its inherent complexity, the solution of inverse kinematics is computationally demanding. Therefore, the optimization of the computational process or algorithm is necessary when solving problems on lower-specification computing devices.

2.3. Federated Learning for Articulated Robot Control

This study introduces a framework to control multiple articulated robots using FL to improve the performance of picking tasks assigned to each robot under limited computing resources. Figure 1 shows the structure of the proposed framework, which considers two major functions: (a) the picking process of an articulated robot manipulator via an articulated robot control module and an FL module and (b) the FL process via communication between the central server and the clients, represented by articulated robots

j

and

k

in round

i

.

The articulated robot manipulator in Figure 1a is composed of a camera sensor, which recognizes an object waiting for a picking task, and actuators (i.e., T joint (Joint_t), R joints (Joint_r1 and Joint_r2), and Gripper), which perform the picking motion. The articulated robot control module derives a motion plan for picking an object via the aggregated global model, which consists of an image preprocessor to process the raw image acquired through a camera sensor, a kinematics solver to derive the parameters (i.e., angles) of each actuator for the picking task, and a PWM controller to set the duty cycle based on the derived parameters. The FL modules represented in Figure 1a,b are placed in a central server and clients to infer, train, and aggregate local models to deploy a global model. In addition, the components of the FL module involve a parameter database to store the parameters of the local models. The communicator transmits the parameters of the global model as a result of the aggregation. The client consists of an inference node to conduct the inference process, a database with local training data to store the data of each client, a training node to train the local model, and a communicator to send the parameters of the trained local model. The following sections explain how both functions operate in the proposed framework in detail.

2.3.1. FL Module

As mentioned in Section 2.1, the FL module trains local models, makes inferences to plan picking tasks, and updates the global model to enhance the performance of overall clients in different locations with non-IID data. To this end, there are three modes in the FL module: (1) the train/aggregate mode, which trains the local model to update the global model with local data and aggregates the global model with local model parameters; (2) the deploy mode, which initializes the global model at round 0 and deploys the aggregated global model to overall clients; (3) and the inference mode, which inferes a task object with the deployed local model to plan the picking task of the articulated robot.

The proposed FL module is designed to conduct cost-efficient training and inference operations under low-end computing resources without GPU. Object detection, which involves the identification of the location and class of objects within an image, has gained significant attention from both academics and industry due to its wide range of applications. In general, a Convolutional Neural Network (CNN)-based model, such as Region-based CNN (RCNN) [40], You-Only-Look-Once (YOLO) [41], Single Shot Detector (SSD) [42], and Residual Neural Network (ResNet), is used for object detection [43]. The proposed model adopts the network architecture shown in Figure 2 for cost-effective object recognition.

The Convolution–Batch Normalization–ReLu–6 (ConvBNR) layer is composed of a convolution layer, a batch normalization layer, and an activation function of ReLu–6, which contributes to reducing the complexity of multiple layers of the existing CNN-based algorithms into one layer. In particular, the ReLu–6 activation function [44] is selected due to its practicality in mobile and embedded applications requiring reasonable precision in computationally limited environments. Algorithm 1 shows the process for training and updating a global model in the train/aggregate and deploy operating modes.

Algorithm 1 Pseudo Code of Train/Aggregate and Deploy modes that occur in the FL Module of the Articulated Robot and Central Server

1 Initialize initial global model parameter

ω_{0}

2 Deploy model parameter

ω_{0}

across overall clients
3 while global model performance does not achieve the set performance
4 for round

t

5 Sample clients set

S_{t}

6 for client

k

in set

S_{t}

7 Train local model of client

k

8 Send updated model parameter

ω_{t - 1}^{k}

to server
9 Aggregate process with received model parameter set

\sum_{k \in S_{t}} ω_{t - 1}^{k}

10 Deploy global model parameter

ω_{t}

11 end for
12 end for

As depicted in Algorithm 1, each process occurs in the FL module of the clients and the central server. First, to enhance the training model’s performance, weight initialization is conducted at round 0

(t = 0)

, which is the beginning of the FL process. The ConvBNR layer uses ReLu–6 as an activation function, and He Normal Initialization, or Kaiming Initialization [43], is applied for the weight initialization of the model. Next, the global model parameter is deployed to overall clients via Secure File Transfer Protocol (SFTP) before round 1

(t = 1)

starts. The set of clients participating in the local training process,

S_{t}

, is randomly sampled from all clients, with its own size determined by the client participation rate per round,

C, (0 \leq C \leq 1)

. To update the weights of the deployed model with each client’s local dataset, the train process proceeds in parallel based on the hyperparameters

E

(local epoch) and

B

(mini-batch size). Finally, each participating client sends the updated model parameters back to the server at the end of the local training process; when all clients in the sampled client set have completed the training process, the parameters of the collected clients are aggregated and used to update the weights of the global model. This process is then repeated until the performance of the global model is converged.

2.3.2. Articulated Robot Control Module

The proposed articulated robot control module uses the global model generated by the proposed FL module addressed in Section 2.3.1. It recognizes a picking object and its workspace position from the images taken by the camera sensor, and it derives the parameters of each joint of the articulated robot accordingly. Finally, the movements of the end-effector (a gripper) and joints (two R joints and one T joint) are controlled by PWM. The 3−DOF robot arm is an articulated robot that performs a picking task using two R joints (Joint_r1 and Joint_r2), one T joint (Joint_t), and an end-effector (G; Gripper). In Figure 3, the base is used to connect the robot to the ground;

l_{1}

,

l_{2}

, and

l_{3}

are the links of the robot;

θ_{1}

,

θ_{2}

, and

θ_{3}

represent the actuation angles of Joint_t, Joint_r1, and Joint_r2, respectively.

The major role of the articulated robot control module is to capture an image of the picking object and recognize its location via the trained client local model so that the joint movement plan can be computed with the inverse kinematics solution, as shown in Algorithm 2.

Algorithm 2 Pseudo code of the articulated robot control module of a client

1 Capture

I_{o}

of a picking object using a camera sensor
2 Initialize coordinates of task plan

X

and

Y

3 Initialize actuator angles

θ_{1}

,

θ_{2}

, and

θ_{3}

4 Perform image preprocessing process
5 Return the preprocessed image

I_{A T}

6 Update

X

and

Y

from inference node
7 Update

θ_{1}

,

θ_{2}

, and

θ_{3}

from the kinematics solver
8 Set duty cycle to each actuator with the derived

θ_{1}

,

θ_{2}

, and

θ_{3}

Figure 4 shows sample images taken by a camera sensor attached to an articulated robot. After a picking object (i.e., a cuboidal part) is placed on a panel shown in Figure 4a, an adaptive threshold method based on the Gaussian filter is applied to the original image to improve the performance of the training and inference process of the trained client local model under limited power conditions and lighting effects (see Figure 4b).

In Algorithm 3, let the central pixel coordinates of sub-area

s

(

b \times b

resolution) be represented as (

x, y

). If the resolution of the sub-area is

3 \times 3

, sub-area s can be illustrated as Figure 5.

Algorithm 3 Pseudo code of the adaptive threshold method with Gaussian filter

1 Set

C

which is constant for adjusting threshold
2 Set

b \times b

which is resolution (number of pixels) of sub-area

s

3 Initialize

x

and

y

which are central coordinates of sub-area

s

4 For

s

in

I_{o}

5 Compute

μ_{x, y}

which is gaussian weighted average of pixels

p \in s

6 Compute threshold

θ_{x, y}

with

μ_{x, y}

and

C

of

s

7 Binarize pixels

p \in s

8 End for
9 Return processed image

I_{A T}

The weighted average (

μ_{x, y}

) of sub-area

s

is derived using the two-dimensional Gaussian filter

G

(

x, y

) described in Equation (1). With the derived Gaussian weighted average of sub-area

s

, threshold

θ_{x, y}

of sub-area

s

can be expressed as Equation (2) with integral image

I (x, y)

. For example,

p_{x, y} = p_{2,2} = I (2,2) - I (2,1) - I (1,2) + I (1,1)

.

G (x, y) = \frac{1}{\sqrt{2 π σ}} \exp (- \frac{(x^{2} + y^{2})}{2 σ^{2}})

(1)

θ_{x, y} = \frac{1}{b^{2}} \sum_{x_{i}} \sum_{y_{i}} I (x + x_{i}, y + y_{i}) - C

(2)

The preprocessed image goes to the inference node in the aggregated global model. The object detection process feeds the image to a neural network for feature extraction, and the extracted features (e.g., edges, corners or contours, and pixels) are then used to predict the location and class of an object in the image using anchor boxes. Anchor boxes, also known as prior boxes, are pre-defined bounding boxes with specific shapes and sizes that are placed at different locations across the image. These anchor boxes serve as references for the neural network to predict where the object is located and what its size might be. This predicted information, which includes the adjusted anchor box parameters (e.g., width, height, and position) and associated class probabilities, is obtained as an output of the trained client local model. However, the output can sometimes result in multiple overlapping bounding box predictions for the same picking object. To overcome this problem, a non-maximum suppression (NMS) process is conducted. Non-maximum suppression is a post-processing step that removes redundant bounding box predictions. From all of the predicted bounding boxes, it iteratively selects bounding boxes with high confidence scores until there is no more overlapping between the selected boxes. As a result, the final output of the object detection process consists of the selected bounding box and the class of the picking object.

Once the inference node of the FL module identifies the location of an object, a movement plan is generated for the joint angles to set the configuration of the end-effector. Therefore, in the kinematics solver processor, the angle of each joint is derived to conduct the picking task via the end-effector from its current location to the location of the object. To this end, the articulated robot anatomy condition is considered to be a closed form by the geometric analytic inverse kinematics method (see Figure 6).

In Figure 6,

θ_{1}

, which is

θ_{2}

, and

θ_{3}

are the three joint angles; the end-effector position is represented as

P_{e e} = (P_{X}, P_{Y}, P_{Z})

;

l_{1}

,

l_{2}

, and

l_{3}

are the lengths of the three links. Radius

r

of the circle that contains coordinates

(P_{X}, P_{Y})

implies

\sqrt{P_{X}^{2} + P_{Y}^{2}}

.

θ_{1}

, which is the angle for joint 3, can be derived from Equation (3). The length of

D

, which is the confronting side of angle

π - θ_{3}

in the triangle that is created from

l_{2}

and

l_{3}

, can be described as Equation (4). According to this triangle,

θ_{3}

can be derived as Equation (5). Finally, the last angle,

θ_{2}

, is derived from the relationship between

θ_{3}

and

\emptyset

, the angle contained between the two sides

w_{1}

and

D

, as described in Equation (6). The inverse kinematics formulation establishes the relationship between the angles and the link lengths:

θ_{1}

depends on the projection of the end-effector onto the XY plane; distance

D

is calculated using the Pythagorean theorem in 3D space; angle

θ_{3}

follows from the cosine rule applied to the triangle formed by

l_{2}

,

l_{3}

, and

D

; and

θ_{2}

is determined by considering both

θ_{3}

and

\emptyset

, which account for the vertical and radial positioning of the end-effector.

θ_{1} = {t a n}^{- 1} \frac{P_{X}}{P_{Y}}

(3)

D = \sqrt{r^{2} + {(l_{1} - P_{z})}^{2}}

(4)

θ_{3} = {c o s}^{- 1} \frac{l_{2}^{2} - l_{3}^{2} - D^{2}}{2 l_{2} l_{3}}

(5)

θ_{2} = {t a n}^{- 1} \frac{l_{1} - P_{Z}}{r} - {t a n}^{- 1} \frac{l_{3} s i n θ_{3}}{l_{2} + l_{3} c o s θ_{3}}

(6)

After the kinematics solver processor computes each joint angle for the operating task, the PWM controller processor sets the duty cycle of each joint actuator to operate the picking task (i.e., a pick-and-place task). Note that PWM is an analog modulating method using the width, duration, and time of the pulse [6] to control the joint angle to a specific target angle. Due to its advantages (i.e., the convenience of implementation and control, high compatibility with a recent and past digital microprocessor, low power consumption, and robustness under stable temperature variation, machine aging, and degradation), it is widely used to control the motors controlling the joint angles of the articulated robot [45].

3. Results

This section discusses a quantitative analysis of the computational and a power efficiency aspects of the robot control process and the FL process. The analysis is conducted on low-end computing hardware through the framework proposed in this study for learning. It also compares the proposed object detection model with standard CNN models. Section 3.1 introduces the experimental conditions and quantitative indicators. These include a standard CNN model and the object detection model used in this study, the hardware resources of the robot manipulator, the working environment of the robot manipulator, and the hyperparameters of the FL process.

3.1. Scenario

The object detection model trained by the FL module in Section 2.3.1 is compared to a standard CNN model in a comparative experiment to analyze whether it can be trained effectively on low-end hardware equipment. The standard CNN models to be compared are Faster R-CNN [46], SSD [42], and YOLO-LITE [47]. The experiments were conducted on a GPU-equipped hardware setup (Intel Xeon^® Silver 4210 Processor (Intel, Santa Clara, CA, USA), NVIDIA RTX Quadro 4000 (NVIDIA, Santa Clara, CA, USA)) with the objective of comparing not only the proposed object detection model but also popular existing CNN models. This comparison was conducted with a focus on their operation on low-end hardware devices. The metrics selected to compare and analyze each model with the proposed object detection model in the experiments include FLOPs (Floating Point OPerations), number of parameters (#param.), epochs to achieve 80% accuracy, and average GPU usage percentages (%). FLOPs is a metric for measuring the computational requirement of a deep learning model. Therefore, FLOPs and GPU usage generally exhibit a proportional relationship. Therefore, they are highly relevant not only for the speed of the training process but also for running deep learning smoothly on low-end devices. However, in specific cases, such as lightweight models with low parallel processing efficiency, high GPU utilization may occur despite low FLOPs. Next, the proposed framework mentioned in Section 2.3 is evaluated while it is performing picking tasks under limited hardware resources. Figure 7 and Figure 8 show the performance evaluation environment involving four articulated robots.

In Figure 7, each articulated robot picks a small cube (20 mm × 20 mm × 20 mm) located at a random place on the grid (i.e., the task place) and moves it to another location with a distance in the range (1 to 8.5) cm. At this time, to demonstrate the performance of FL, one of the robots is also randomly selected to complete the pick-and-place task. For example, if robot A performs the pick-and-place task and collects data for the FL of a global model, any robot, including robot A, is able to perform the same task, because the global model can be utilized to train other local models of robots. By designing an experiment that applies the FL methodology to various numbers of articulated robots, each with its own local data collection target (i.e., picking object), the applicability of FL for adapting flexible automation systems can be demonstrated. In addition, to avoid the issue of overheating of the robots, their operation speed is set to 1 degree per 0.1 s. Also, the number of replications is set to 30 for the collection of statistically meaningful experiment results. Figure 8 describes the specification of the experiment environment.

In Figure 8, Robots_i

(i = 1, 2, 3, 4)

are installed on a table of 650 mm width and 600 mm height. The base of each robot is installed at a distance of (50 and 77.5) mm from the horizontal and vertical corners, respectively. The grid (i.e., Task Place_i

(i = 1, 2, 3, 4)

) where the picking object of the articulated robot is located has the shape of a square with width = height = 105 mm, while the distance between the center point of each grid and the center point of the base is 92 mm.

Under the given picking task, the performance of the articulated robot control module and the FL module is measured. Table 1 presents the quantitative performance metrics. The FL module is evaluated in terms of training completion time (

T_{f}

), average hardware resource share percentage of clients (

R_{f}

), and total network traffic (

N_{f}

). The articulated robot control module is evaluated in terms of the average picking task completion time for the ith articulated robot over 30 iterations (

T_{a, i}

) and the average hardware resource share percentage of the ith articulated robot control module performing the picking task over 30 iterations (

R_{a, i}

).

In each task, each articulated robot performs the pick-and-place motion to move a 2 cm × 2 cm × 2 cm cubic part of the class specified for each articulated robot a minimum of 1 cm and a maximum of about 8.5 cm. Also, to avoid overheating the actuators in each motion, i.e., positioning the actuators to satisfy each of the derived joint parameters, a time step of 0.1 s/° was used when setting the joint parameters.

The experiments were conducted on four clients (i.e., articulated robots) and one central server (Intel Xeon^® Silver 4210 Processor, NVIDIA RTX Quadro 4000, and Ubuntu 20.04). Table 2 details the hardware specification (i.e., a system on chip (SoC), CPU, GPU, memory, network, and power) of clients related to the joint learning process and the joint parameter derivation task of the articulated robot, while Table 3 details the specification of the articulated robots (i.e., small torque, operating speed, dead bandwidth and frequency of the actuators, and supplied power) that can be highly correlated with the performance of pick-and-place tasks.

3.2. Object Detection Model Performance

First, the performance of the object detection model, which plays an important role in the picking task using the proposed federated learning-based framework, was evaluated. Notice that the proposed object detection model was successfully implemented on low-specification hardware equipment. The experiments were conducted on the GPU-equipped machines described in Section 3.1. This allowed for the testing of not only YOLO-LITE and our proposed model, which were developed on low-end machines, but also standard CNN models such as Faster R-CNN and SDD. In this instance, Faster R-CNN sets ResNet-50 as a backbone, with a 1x learning schedule, and SSD uses VGG16 [42,46,47] as a backbone. The experimental results, based on an 80% accuracy rate with a batch size of 64, a learning rate of 0.001, and a weight decay of 0.1, are presented in Table 4.

According to the experimental results in Table 4, the proposed object detection model requires fewer epochs to train than that of YOLO-LITE, which focuses solely on lightweight models. This is due to the presence of the batch normalization layer, which contributes to a reduction in the number of required epochs. Furthermore, Faster R-CNN and SSD, which require fewer epochs, have 33.2 billion and 34.9 billion FLOPs and 42.5 million and 35.6 million parameters, respectively, while the proposed model requires only 1.8 billion FLOPs and 3.5 million parameters. Therefore, the total GPU usage of the proposed model is 18.2%, indicating that the proposed model can be applied even to low-end computing devices. To make it possible to train an object detection model by performing FL on low-end hardware devices, the proposed object detection model shows viable performance in terms of FLOPs, the number of parameters, and GPU usage, making it suitable for use on low-end hardware devices.

3.3. Robot Control Performance

To evaluate the efficiency of the proposed robot control module, which derives joint parameters for picking tasks, the experiments were conducted over 30 iterations under the given condition (see Section 3.1). Figure 9 and Figure 10 present the results of the average picking task completion time (

T_{a, i}

) and average hardware resource share percentage (

R_{a, i}

) for the four articulated robots over 30 replications.

In Figure 9, although the case with two articulated robots showed the best average picking task completion time of 79.87 s, there are no statistical differences between the four cases at a significance level of 0.05. Theoretically, all the cases are supposed to provide a similar picking task completion time because each robot performs its own picking task independently. In other words, each articulated robot only utilizes its trained client model to perform the inference process on to picking object and conducts the picking task. Therefore, during the picking task, there is no communication between the server and clients. The result actually implies that the variation in picking task completion time between the four cases is caused by factors inherent to the articulated robots themselves. That is why, at a significance level of 0.05, there are no statistical differences between the four cases.

In Figure 10, the case with two articulated robots shows the lowest average hardware resource share percentage, at 20.25%. However, similarly to the results for the average picking task completion time shown in Figure 9, the variation in average hardware resource share percentage between the four cases is caused by factors inherent to the articulated robots themselves, so at a significance level of 0.05, there are no statistical differences between the four cases. From the results in Figure 9 and Figure 10, we can conclude that the proposed framework is appropriately designed to perform identical picking tasks via multiple articulated robots.

3.4. Federated Learning Performance

In the FL module, experiments were conducted to compare the learning performance of relevant machine learning models based on the relevant FL hyperparameters, i.e.,

C

,

B

,

E

, rounds, and other performance metrics, i.e.,

T_{f}

,

R_{f}

, and

N_{f}

, as described in Table 5. In particular, the number of communication rounds was set to 1000 by default, and the modeling completion time was measured when the model reached an object recognition accuracy of 80% in terms of communication rounds.

In Table 5, since FedSGD trains all their local models using the gradient descent method from their full local dataset at once, there is only one case in terms of hyperparameters of

C = 1

,

B = \infty

,

E = 1

. Although it takes a training completion time (

T_{f}

) of 82,980 s, it cannot achieve the targeted object recognition accuracy (i.e., 80%), so the number of communication rounds becomes NA (not available), instead of 1 round. Due to the heavy traffic intensity for the model training, FedSGD has a relatively higher average hardware resource share percentage of clients (

R_{f}

) of 45.8% and a total network traffic (

N_{f}

) of 14,125.62 Mb compared to some of the FedAVG cases. Unlike FedSGD, FedAVG shows different performance in terms of the number of rounds,

T_{f}

,

R_{f}

, and

N_{f}

under the given federated learning hyperparameters, such as

C

(0.25, 0.5, 0.75, 1.0),

B

(5, 10), and

E

(5, 10). When

C = 0.25

, there is only one case that can complete model training with 823 rounds. On the other hand, the number of model training completion cases of

C = (0.5

,

0.75

, and

1.0

) is (3, 4, and 4), respectively. On average, the cases with

C = 1.0

show the best performance, with an average of 165 rounds, which is better than cases with

C = 0.75

, which have an average of 216 rounds. Since

C

represents the number of clients participating in the model training per round, as the value of

C

increases, the number of rounds decreases. As the values of the rounds and

E

increase, the values of training completion time (

T_{f}

) increase. This trend is obvious, considering that epoch (

E

) refers to one complete pass of the entire training dataset utilized in the model training, while the number of rounds (rounds) refers to the number of iterations of the model training. Most of the average hardware resource share percentages of clients (

R_{f}

) in FedAVG cases are lower than those of FedSGD (i.e., 45.8%). This is because FedAVG requires randomly selected clients to participate in the model training in each round, but FedSGD requires all clients to participate in the model training in each round. As the values of the rounds and the fraction of the clients (

C

) participating in each round increase, the total network traffic (

N_{f}

) in FL increases.

Figure 11 illustrates the number of communication rounds (rounds) of FL required for the global model to achieve 80% object recognition accuracy within 1000 rounds of model training. In particular, the train loss (L) of client models is measured according to Equation (7).

\begin{matrix} L = λ_{c o o r d} \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} 𝟙_{i j}^{o b j} [{(b_{x_{i}} - b_{{\hat{x}}_{i}})}^{2} + {(b_{y_{i}} - b_{{\hat{y}}_{i}})}^{2}] \\ + λ_{c o o r d} \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} 𝟙_{i j}^{o b j} [{(\sqrt{b_{w_{i}}} - \sqrt{b_{{\hat{w}}_{i}}})}^{2} + {(\sqrt{b_{h_{i}}} - \sqrt{b_{{\hat{h}}_{i}}})}^{2}] \\ + \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} 𝟙_{i j}^{o b j} {(C_{i} - {\hat{C}}_{𝚤})}^{2} + λ_{n o o b j} \sum_{i = 0}^{S^{2}} \sum_{j = 0}^{B} 𝟙_{i j}^{o b j} {(C_{i} - {\hat{C}}_{𝚤})}^{2} \\ + \sum_{i = 0}^{S^{2}} 𝟙_{i}^{o b j} \sum_{c \in c l a s s e s} {(p_{i} (c) - {\hat{p}}_{𝚤} (c))}^{2} \end{matrix}

(7)

where

b_{x}

and

b_{y}

are the coordinates of the centroid of a ground-truth bounding box;

b_{\hat{x}}

and

b_{\hat{y}}

are the coordinates of the centroid of an estimated bounding box;

b_{w}

and

b_{h}

are the width and height of a ground-truth bounding box;

b_{\hat{w}}

and

b_{\hat{h}}

are the width and height of an estimated bounding box;

C_{i}

is the ground-truth confidence score at the ith grid;

\hat{C}

is the estimated confidence score at the ith grid;

λ_{c o o r d}

and

λ_{n o o b j}

are multipliers to increase or decrease the weight of the loss to resolve the class imbalance problem;

𝟙_{i j}^{o b j}

is a binary indicator for object recognition (0: not recognized; 1: recognized);

p_{i} (c)

is the ground-truth conditional class probability for class

c

in ith grid;

{\hat{p}}_{𝚤} (c)

is the predicted conditional class probability for class

c

in ith grid;

S^{2}

is the set of grids in the task place; and

B

is the set of anchor box. Equation (7) is the train loss function given by [41], and it consists of three parts: (1) the first and the second terms represent the localization loss between the detected bounding box and the ground truth; (2) the third and fourth terms represent the confidence loss between the estimated selection probability of the class (or an articulated robot) and the observed probability from the actual selection of that class; (3) the fifth term represents the classification loss between the detected class and the ground truth. There are four cases in each graph, with different

C

values. As mentioned in Table 4, there is only one case

(B = 0.5

and

E = 0.5)

that can complete model training with 823 rounds when

C = 0.25

; however, most of the other cases complete their modeling training within 1000 rounds. Due to the required object detection accuracy of 80% (i.e., the stopping condition of modeling training), the train loss in different cases reaches similar values of 11. However, each case has a different conversion pattern of training loss. In particular, when E = 5, cases with B = 5 require a lesser number of rounds to reach the stopping condition of the modeling training than other cases with

B = 10

, as shown in Figure 11a,c,d. On the other hand, when E = 10, all cases in Figure 11b–d only require less than 200 rounds. Under the given scenario, the best performance of 26 rounds is achieved when C = 0.75, B = 10, and E = 10 (see Figure 11c). This implies that the appropriate setting of hyperparameters can influence the model training performance of FL.

4. Discussion

This study presented an FL-based framework to achieve operational efficiency in SME manufacturing environments using industrial articulated robots through FL techniques in a low-end equipment environment. The devised framework conducts FL in a non-GPU environment through a reasoning process. However, since the proposed framework is an a priori study on the application of federated learning, which has not yet been widely used in industrial environments, there are some points to note when applying it to existing manufacturing sites. First, as proposed in this study, existing manufacturing equipment can be improved to perform FL. To do this, the location of the camera sensor (gripper or robot support, etc.) must be clearly defined, and the robot controller must be able to transmit the machining parameters (coordinate and angle information of the robot, etc.) as digital signals so that the gripper of the robot can move to the inferred location of a recognized object. In this study, the robot was assembled directly, a camera was installed, and the robot controller was developed so that the robot could be controlled as desired. Second, in the proposed framework, all data currently being worked on are digitized, and since individual robots automatically make decisions about the current work’s content, the security of the data within a smart factory is important. This part was not mentioned in this study, but if research on industrial Operation Technology (OT) security, which is becoming more important with the current development of smart factories, is conducted together, it is expected that potential information security issues can be resolved. However, since the proposed framework is designed for network use within the factory (i.e., intra-network use), this will not be a major problem if external access to the factory is properly supervised. The third point is the potential for problems with communication latency, which commonly occurs in the implementation of smart factories and digital twin technologies. This is a problem that can also occur in FL that performs individual client model learning and global model learning, but as described in Section 3.4, if the machine learning time is reduced by selecting appropriate parameters or if model learning and updates are performed during times when the factory has low workloads, the possibility of communication latency in the network can be reduced. In addition, if an independent network is implemented within the factory for FL only, without using the existing network, it will be possible to solve both security and communication latency issues. However, additional research seems necessary for economic feasibility analysis related to improved factory productivity resulting from the establishment of an FL environment.

In relation to the aforementioned issues, when applying the proposed framework to an actual manufacturing environment, problems that occur when implementing a smart factory using technologies such as digital twins and AI may occur. Specifically, there are problems such as communication problems between the pieces of data collection equipment (robots, machine vision cameras, etc.) and equipment connected to each piece of data collection equipment and responsible for model learning, unbalanced data problems (in a typical manufacturing environment, it is difficult to obtain data in a defective state), and security problems in OT networks. However, in the case of the FL-based robot control performance improvement framework proposed in this study, if existing problem-solving methods are applied, the problems that may occur when applying it to an actual manufacturing environment can be minimized. First, in terms of security issues in manufacturing sites, from the perspective of network infrastructure, the OT network to which industrial robots and PLCs are connected can be improved by constructing a closed network separated from the IT network, to which manufacturing execution systems (MESs) and enterprise resource planning (ERP) are connected [48], and from the perspective of hardware, it is possible to resolve the inherent security problems that may occur in a manufacturing environment (security vulnerabilities in the equipment and infrastructure themselves, excluding the malicious actions of workers within the process) by installing security equipment such as security switches and firewall equipment [49]. Next, the unbalanced data problem (difficulty in securing data for specific classes, such as bad classes) can be solved by utilizing the Data Augmentation technique [50] and the few-shot learning technique [51]. In addition, it is expected that communication problems between devices will continue to be solved in the future as the development of high-bandwidth communication becomes possible through the development of next-generation wireless and wired communication technologies such as P-5G (Private-5G) and 6G.

5. Conclusions

This study presented an FL-based framework to achieve operational efficiency in SME manufacturing environments using industrial articulated robots through FL techniques in a low-end equipment environment. The devised framework conducts FL in a non-GPU environment, and through a reasoning process, it simplifies the process of deriving the picking tasks and execution plans of articulated robots. In addition, the control algorithm of the 3−DOF articulated robots is devised and utilized to demonstrate how the proposed FL framework recognizes and analyzes the manufacturing environment and part-picking conditions in real time. Experiments were conducted to compare a standard CNN model with the object detection model designed and proposed in this work for low-specification computing devices. Additionally, experiments were performed on a framework that includes multiple robots (or clients) and a single server. Therefore, the proposed methodologies were analyzed and validated using quantified performance metrics. As a result, the proposed object detection model is suitable for application on low-end computing devices, and the framework demonstrates that the FL model can achieve 80% object recognition accuracy under limited resource conditions. Moreover, there are no statistical differences in terms of picking performance between articulated robots at a significance level of 0.05. Therefore, we can conclude that the proposed framework is appropriately designed to perform identical picking tasks via multiple articulated robots. Furthermore, this study is notable for proposing and conducting experiments on methodologies based on actual hardware to assess the viability of robots in flexible automation systems that manufacturing companies can implement to compete on the market. This approach contrasts with existing federated learning, which has primarily been utilized in the medical and telecommunications sectors due to concerns about data privacy. Furthermore, experiments conducted on low-specification hardware devices to verify the applicability of the technology for SMEs demonstrate the applicability of the methodology proposed in this study for various hardware specifications and manufacturing environments. As a result, this study is expected to advance the research on model architecture for non-GPU low-end specification hardware, which is one of the methodologies applied in the proposed framework, and the research on multiple complex structures of articulated robots, such as real manufacturing environments. In addition, in this study, security is considered by using SSH and SFTP for each task occurring in the framework, including model weight transfer and training commands from the central server. For future applications in real industrial environments, security can be further strengthened by implementing design measures such as closed-network configurations, physical/logical firewalls, VLAN configurations for multiple managed switches, and packet-based communication control to minimize potential security risks.

Notwithstanding the contributions mentioned above, this study was only limited to picking tasks using small objects performed by a certain articulated robot. To generalize the proposed FL framework, various articulated robot types with different tasks should be considered in the future. Moreover, it is expected that in future works, variations in machine learning tendency according to each hyperparameter obtained through experiments on the proposed FL module can be generalized by applying a variety of model structures, loss functions, and target tasks to the proposed FL framework. It is also necessary to conduct additional comparative studies on the performance of the proposed framework through improvements to various object recognition algorithms, which are rapidly developing. In addition, studies utilizing the Federated Group Knowledge Transfer (FedGKT) training algorithm have demonstrated that reducing the computational load on in-vehicle edge devices can achieve performance similar to centralized learning methods [52]. By referring to such studies, the applicability of the proposed FL framework to low-spec edge devices can be further enhanced. In addition, this study lacks an implementation and performance test of FL for large-scale and high-performance robots used in industrial sites. This is because the FL framework proposed in this study is designed to perform only picking tasks using low-spec computers such as Raspberry Pi, so it is expected to have computational difficulties in performing large-scale and complex tasks. Just as high-performance computers are utilized for controlling large-scale or high-performance industrial robots, the utilization of higher-performance computers should be considered in order to operate the proposed platform for complex tasks and multiple industrial robots, and further research on this is needed. In addition, scalability research related to our findings needs to be conducted. Despite these limitations, the proposed FL framework has demonstrated smooth operation even on low-spec devices, suggesting that this methodology could be extended to more complex tasks in real manufacturing environments and to robots with more complex structures.

Author Contributions

Conceptualization, J.S., I.-B.L. and S.K.; methodology, J.S. and I.-B.L.; software, J.S. and S.K.; validation, J.S. and S.K.; formal analysis, J.S. and I.-B.L.; investigation, J.S., I.-B.L. and S.K.; resources, I.-B.L. and S.K.; writing—original draft, J.S., I.-B.L. and S.K.; writing—review and editing, J.S. and S.K.; visualization, J.S. and S.K.; funding acquisition, S.K.; supervision, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. RS-2023-00239448).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that have been used in this study are confidential. The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the support of the National Research Foundation of Korea (NRF) and the Korea International Cooperation Agency (KOICA). The views expressed in this paper are solely those of the authors and do not represent the opinions of the funding agency.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Lasi, H.; Fettke, P.; Kemper, H.G.; Feld, T.; Hoffmann, M. Industry 4.0. Bus. Inf. Syst. Eng. 2014, 6, 239–242. [Google Scholar]
Prashar, G.; Vasudev, H.; Bhuddhi, D. Additive manufacturing: Expanding 3D printing horizon in industry 4.0. Int. J. Interact. Des. Manuf. 2023, 17, 2221–2235. [Google Scholar]
Al-Alimi, S.; Yusuf, N.K.; Ghaleb, A.M.; Lajis, M.A.; Shamsudin, S.; Zhou, W.; Altharan, Y.M.; Abdulwahab, H.S.; Saif, Y.; Didane, D.H.; et al. Recycling aluminium for sustainable development: A review of different processing technologies in green manufacturing. Results Eng. 2024, 23, 102566. [Google Scholar]
Krishnan, R. Challenges and benefits for small and medium enterprises in the transformation to smart manufacturing: A systematic literature review and framework. J. Manuf. Technol. Manag. 2024, 35, 918–938. [Google Scholar]
Mattila, J.; Ala-Laurinaho, R.; Autiosalo, J.; Salminen, P.; Tammi, K. Using digital twin documents to control a smart factory: Simulation approach with ROS, gazebo, and Twinbase. Machines 2022, 10, 225. [Google Scholar] [CrossRef]
Groover, M.P. Automation, Production Systems, and Computer-Integrated Manufacturing, 5th ed.; Pearson Education: London, UK, 2019. [Google Scholar]
Bahrin, M.A.K.; Othman, M.F.; Azli, N.H.N.; Talib, M.F. Industry 4.0: A review on industrial automation and robotic. J. Teknol. 2016, 78, 137–143. [Google Scholar]
ISO 8373:2021; Robotics-Vocabulary. International Organization for Standardization: Geneva, Switzerland, 2021. Available online: https://www.iso.org/standard/75539.html (accessed on 11 March 2025).
Sun, Y.; Bai, L.; Dong, D. Lighter and more efficient robotic joints in prostheses and exoskeletons: Design, actuation and control. Front. Robot. AI 2023, 10, 1063712. [Google Scholar]
Wada, K.; Okada, K.; Inaba, M. Joint learning of instance and semantic segmentation for robotic pick-and-place with heavy occlusions in clutter. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019. [Google Scholar]
Mohammed, M.Q.; Chung, K.L.; Chyi, C.S. Pick and place objects in a cluttered scene using deep reinforcement learning. Int. J. Mech. Eng. Mechatron. 2020, 20, 50–57. [Google Scholar]
Zeng, A.; Song, S.; Yu, K.T.; Donlon, E.; Hogan, F.R.; Bauza, M.; Ma, D.; Taylor, O.; Liu, M.; Romo, E.; et al. Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching. Int. J. Robot. Res. 2022, 41, 690–705. [Google Scholar]
Matenga, A.; Murena, E.; Kanyemba, G.; Mhlanga, S. A novel approach for developing a flexible automation system for rewinding an induction motor stator using robotic arm. Procedia Manuf. 2019, 33, 296–303. [Google Scholar]
Rauch, E.; Dallasega, P.; Unterhofer, M. Requirements and barriers for introducing smart manufacturing in small and medium-sized enterprises. IEEE Eng. Manag. Rev. 2019, 47, 87–94. [Google Scholar]
Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
Wen, J.; Zhang, Z.; Lan, Y.; Cui, Z.; Cai, J.; Zhang, W. A survey on federated learning: Challenges and applications. Int. J. Mach. Learn. Cybern. 2023, 14, 513–535. [Google Scholar]
Qi, P.; Chiaro, D.; Guzzo, A.; Ianni, M.; Fortino, G.; Piccialli, F. Model aggregation techniques in federated learning: A comprehensive survey. Future Gener. Comput. Syst. 2024, 150, 272–293. [Google Scholar]
Jiang, X.; Zhang, J.; Zhang, L. Fedradar: Federated multi-task transfer learning for radar-based internet of medical things. IEEE Trans. Netw. Serv. Manag. 2023, 20, 1459–1469. [Google Scholar]
Zinkevich, M.; Weimer, M.; Li, L.; Smola, A. Parallelized stochastic gradient descent. Adv. Neural Inf. Process. Syst. 2010, 23, 1–9. [Google Scholar]
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122. [Google Scholar]
Dean, J.; Corrado, G.; Monga, R.; Chen, K.; Devin, M.; Mao, M.; Ranzato, M.; Senior, A.; Tucker, P.; Yang, K.; et al. Large scale distributed deep networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1–9. [Google Scholar]
Kang, J.; Xiong, Z.; Niyato, D.; Zou, Y.; Zhang, Y.; Guizani, M. Reliable federated learning for mobile networks. IEEE Wirel. Commun. 2020, 27, 72–80. [Google Scholar]
So, J.; Güler, B.; Avestimehr, A.S. CodedPrivateML: A fast and privacy-preserving framework for distributed machine learning. IEEE J. Sel. Areas Inf. Theory 2021, 2, 441–451. [Google Scholar]
Van Steen, M.; Tanenbaum, A.S. A brief introduction to distributed systems. Computing 2016, 98, 967–1009. [Google Scholar]
McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 20–22 April 2017. [Google Scholar]
Kundroo, M.; Kim, T. Demystifying impact of key hyper-parameters in federated learning: A case study on CIFAR-10 and Fashion MNIST. IEEE Access 2024, 12, 120570–120583. [Google Scholar]
Zhao, Y.; Zhao, J.; Jiang, L.; Tan, R.; Niyato, D.; Li, Z.; Lyu, L.; Liu, Y. Privacy-preserving blockchain-based federated learning for IoT devices. IEEE Internet Things J. 2020, 8, 1817–1829. [Google Scholar]
Feng, J.; Rong, C.; Sun, F.; Guo, D.; Li, Y. PMF: A privacy-preserving human mobility prediction framework via federated learning. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 1–21. [Google Scholar]
Lee, J.; Sun, J.; Wang, F.; Wang, S.; Jun, C.H.; Jiang, X. Privacy-preserving patient similarity learning in a federated environment: Development and analysis. JMIR Med. Inform. 2018, 6, e7744. [Google Scholar]
Szegedi, G.; Kiss, P.; Horváth, T. Evolutionary Federated Learning on EEG-data. In Proceedings of the Information Technologies—Applications and Theory (ITAT), Donovaly, Slovakia, 20–24 September 2019. [Google Scholar]
Ryalat, M.; ElMoaqet, H.; AlFaouri, M. Design of a Smart Factory Based on Cyber-Physical Systems and Internet of Things towards Industry 4.0. Appl. Sci. 2023, 13, 2156. [Google Scholar] [CrossRef]
Steiber, A.; Alänge, S.; Ghosh, S.; Goncalves, D. Digital transformation of industrial firms: An innovation diffusion perspective. Eur. J. Innov. Manag. 2021, 24, 799–819. [Google Scholar]
Evjemo, L.D.; Gjerstad, T.; Grøtli, E.I.; Sziebig, G. Trends in smart manufacturing: Role of humans and industrial robots in smart factories. Curr. Robot. Rep. 2020, 1, 35–41. [Google Scholar]
Ray, P.P. Internet of robotic things: Concept, technologies, and challenges. IEEE Access 2016, 4, 9489–9500. [Google Scholar]
Mick, S.; Lapeyre, M.; Rouanet, P.; Halgand, C.; Benois-Pineau, J.; Paclet, F.; Cattaert, D.; Oudeyer, P.; de Rugy, A. Reachy, a 3D-printed human-like robotic arm as a testbed for human-robot control strategies. Front. Neurorobotics 2019, 13, 65. [Google Scholar]
Xie, F.; Chen, L.; Li, Z.; Tang, K. Path smoothing and feed rate planning for robotic curved layer additive manufacturing. Robot. Comput. Integr. Manuf. 2020, 65, 101967. [Google Scholar] [CrossRef]
Paul, R.P. Robot Manipulators: Mathematics, Programming, and Control: The Computer Control of Robot Manipulators; MIT Press: Cambridge, MA, USA, 1981. [Google Scholar]
Luenberger, D.G. Linear and Nonlinear Programming; Addison-Wesley: Reading, MA, USA, 1989. [Google Scholar]
Aristidou, A.; Lasenby, J. FABRIK: A fast, iterative solver for the Inverse Kinematics problem. Graph. Models 2011, 73, 243–260. [Google Scholar]
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014. [Google Scholar]
Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
Krizhevsky, A.; Hinton, G. Convolutional Deep Belief Networks on Cifar-10. 2010. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=bea5780d621e669e8069f05d0f2fc0db9df4b50f (accessed on 6 April 2025).
Yu, Z.; Mohammed, A.; Panahi, I. A review of three PWM techniques. In Proceedings of the American Control Conference, Albuquerque, NM, USA, 4–6 June 1997. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1–9. [Google Scholar] [CrossRef] [PubMed]
Huang, R.; Pedoeem, J.; Chen, C. YOLO-LITE: A real-time object detection algorithm optimized for non-GPU computers. In Proceedings of the 2018 IEEE International Conference on Big Data (big Data), Seattle, WA, USA, 10–13 December 2018. [Google Scholar]
Kampa, T.; Müller, C.K.; Großmann, D. Interlocking IT/OT security for edge cloud-enabled manufacturing. Ad Hoc Netw. 2024, 154, 103384. [Google Scholar]
Lin, C.C.; Tsai, C.T.; Liu, Y.L.; Chang, T.T.; Chang, Y.S. Security and privacy in 5g-iiot smart factories: Novel approaches, trends, and challenges. Mob. Netw. Appl. 2023, 28, 1043–1058. [Google Scholar] [CrossRef]
Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11. [Google Scholar]
Doshi, K.; Yilmaz, Y. Federated learning-based driver activity recognition for edge devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3338–3346. [Google Scholar]

Figure 1. Overview of the proposed FL framework for articulated robot control: (a) picking process; (b) FL process.

Figure 2. Model network architecture.

Figure 3. Coordinate system of the 3−DOF articulated robot (TRR-type);

l_{1}

,

l_{2}

, and

l_{3}

represent the lengths of each joint;

θ_{1}

,

θ_{2}

, and

θ_{3}

indicate the angle of each joint;

G

represents the coordinates of the end-effector.

Figure 3. Coordinate system of the 3−DOF articulated robot (TRR-type);

l_{1}

,

l_{2}

, and

l_{3}

represent the lengths of each joint;

θ_{1}

,

θ_{2}

, and

θ_{3}

indicate the angle of each joint;

G

represents the coordinates of the end-effector.

Figure 4. Sample images of a picking object: (a) task raw image

I_{o}

captured by a camera sensor; (b) image

I_{A T}

preprocessed via adaptive threshold method with Gaussian filter.

Figure 4. Sample images of a picking object: (a) task raw image

I_{o}

captured by a camera sensor; (b) image

I_{A T}

preprocessed via adaptive threshold method with Gaussian filter.

Figure 5. Example of sub-area

s

with

3 \times 3

resolution.

Figure 5. Example of sub-area

s

with

3 \times 3

resolution.

Figure 6. Articulated robot in spherical coordinates for inverse kinematics analysis.

Figure 7. Experiment environment of the proposed framework.

Figure 8. Specification of the experiment environment.

Figure 9. Average task completion time of articulated robots over 30 iterations.

Figure 10. Average hardware resource share percentage of articulated robots over 30 iterations.

Figure 11. Training effectiveness as a function of the percentage of participating clients per round (

C

= {0.25, 0.5, 0.75, 1.0}), local epochs (

E

= {5, 10}), and mini-batch size (

B

= {5, 10}): (a) training loss versus communication rounds with C = 0.25; (b) training loss versus communication rounds with

C

= 0.5; (c) training loss versus communication rounds with

C

= 0.75; (d) training loss versus communication rounds with

C

= 1.0.

Figure 11. Training effectiveness as a function of the percentage of participating clients per round (

C

= {0.25, 0.5, 0.75, 1.0}), local epochs (

E

= {5, 10}), and mini-batch size (

B

= {5, 10}): (a) training loss versus communication rounds with C = 0.25; (b) training loss versus communication rounds with

C

= 0.5; (c) training loss versus communication rounds with

C

= 0.75; (d) training loss versus communication rounds with

C

= 1.0.

Table 1. Performance metrics of the FL module and the articulated robot control module.

Metric	Description
$T_{f}$	Training completion time (s) for participating clients in FL module at each round.
$R_{f}$	Average hardware resource share percentage (%) of memory and CPU in FL module at each round.
$N_{f}$	Total network traffic (download and upload; Mb) per client in FL module at each round.
$T_{a, i}$	Average picking task completion time (s) for ith articulated robot over 30 iterations.
$R_{a, i}$	Average hardware resource share percentage (%) of memory and CPU of ith articulated robot control module performing picking task over 30 iterations.

Table 2. Hardware specification of the clients.

Category	Specification
SoC	Broadcom BCM2711 SoC
CPU	1.5 GHz ARM Cortex-A72 MP4
GPU	Broadcom VideoCore VI MP2 500 MHz
Memory	8 GB LPDDR4 with 2 GB swap memory
Network	802.11b/g/n/ac Dual-Band
Power	5 V, 3 A

Table 3. Hardware specification of the articulated robots.

Category		Specification
Actuator	Small torque	9.4 kg/cm (4.8 V)
	Operating speed	0.17 s per 60°
	Dead bandwidth	$5 μ s$
	Frequency	50 Hz
Power		5 V, 3 A

Table 4. Training process results of proposed model compared to standard CNN models.

Model	Epochs	FLOPs	#Param	GPU Usage (%)
Proposed model	36	1.8 B	3.5 M	18.2%
Faster R-CNN ¹	15	33.2 B	42.5 M	87.5%
SSD ²	11	34.9 B	35.6 M	78.3%
YOLO-LITE	47	1.6 B	2.2 M	14.7%

¹ Faster R-CNN with ResNet-50 backbone and 1x learning schedule; ² SSD with VGG16 backbone.

Table 5. Federated learning results (Rounds,

T_{f}

(s),

R_{f}

(%), and

N_{f}

(Mb)) in different hyperparameter sets (C, B, and E).

Table 5. Federated learning results (Rounds,

T_{f}

(s),

R_{f}

(%), and

N_{f}

(Mb)) in different hyperparameter sets (C, B, and E).

Type	C	B	E	Rounds ¹	$T_{f}$ (s)	$R_{f}$ (%)	$N_{f}$ (Mb)
FedSGD	1.0	∞	1	NA	82,980	45.8	14,125.62
FedAVG	0.25	5	5	823	199,140	32.5	14,625.37
		5	10	NA	493,140	33.7	14,255.37
		10	5	NA	265,560	35.7	14,154.28
		10	10	NA	503,940	38.6	15,053.75
	0.5	5	5	NA	323,940	31.7	29,046.56
		5	10	106	581,220	33.1	3000.76
		10	5	700	228,780	36.2	19,828.44
		10	10	76	570,480	37.5	2303.5
	0.75	5	5	415	368,452	32.6	17,622.48
		5	10	30	235,325	32.8	1363.89
		10	5	393	364,380	38.3	17,864.67
		10	10	26	423,423	38.2	1259.88
	1.0	5	5	93	46,324	32.3	6008.68
		5	10	190	203,285	32.8	10,983.64
		10	5	244	86,325	37.3	15,764.74
		10	10	133	232,152	37.2	8593.08

¹ NA (not available) when the number of communication rounds is greater than 1000 rounds.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

So, J.; Lee, I.-B.; Kim, S. Federated Learning-Based Framework to Improve the Operational Efficiency of an Articulated Robot Manufacturing Environment. Appl. Sci. 2025, 15, 4108. https://doi.org/10.3390/app15084108

AMA Style

So J, Lee I-B, Kim S. Federated Learning-Based Framework to Improve the Operational Efficiency of an Articulated Robot Manufacturing Environment. Applied Sciences. 2025; 15(8):4108. https://doi.org/10.3390/app15084108

Chicago/Turabian Style

So, Junyong, In-Bae Lee, and Sojung Kim. 2025. "Federated Learning-Based Framework to Improve the Operational Efficiency of an Articulated Robot Manufacturing Environment" Applied Sciences 15, no. 8: 4108. https://doi.org/10.3390/app15084108

APA Style

So, J., Lee, I.-B., & Kim, S. (2025). Federated Learning-Based Framework to Improve the Operational Efficiency of an Articulated Robot Manufacturing Environment. Applied Sciences, 15(8), 4108. https://doi.org/10.3390/app15084108

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Federated Learning-Based Framework to Improve the Operational Efficiency of an Articulated Robot Manufacturing Environment

Abstract

1. Introduction

2. Materials and Methods

2.1. Federated Learning

2.2. Industrial Articulated Robot Control in Smart Factories

2.3. Federated Learning for Articulated Robot Control

2.3.1. FL Module

2.3.2. Articulated Robot Control Module

3. Results

3.1. Scenario

3.2. Object Detection Model Performance

3.3. Robot Control Performance

3.4. Federated Learning Performance

4. Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI