Article

Large Language Model-Assisted Reinforcement Learning for Hybrid Disassembly Line Problem

1 College of Information and Control Engineering, Liaoning Shihua University, Fushun 113001, China
2 Department of Computer Science and Software Engineering, Monmouth University, West Long Branch, NJ 07764, USA
3 College of Economics and Management, Shangqiu Normal University, Shangqiu 476000, China
4 Department of Computer Science and Technology, Kean University, Union, NJ 07083, USA
5 Department of Artificial Intelligence, Shandong University of Science and Technology, Qingdao 266590, China
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(24), 4000; https://doi.org/10.3390/math12244000
Submission received: 1 November 2024 / Revised: 13 December 2024 / Accepted: 16 December 2024 / Published: 19 December 2024

Abstract:
Recycling end-of-life products is essential for reducing environmental impact and promoting resource reuse. In the realm of remanufacturing, researchers are increasingly concentrating on the challenge of the disassembly line balancing problem (DLBP), particularly on how to allocate work tasks effectively to enhance productivity. However, many current studies overlook two key issues: (1) how to reasonably arrange the posture of workers during disassembly, and (2) how to reasonably arrange disassembly tasks when the disassembly environment is not a single type of disassembly line but a hybrid disassembly line. To address these issues, we propose a mixed-integer programming model suitable for linear and U-shaped hybrid disassembly lines, while also considering how to reasonably allocate worker postures to alleviate worker fatigue. Additionally, we introduce large language model-assisted reinforcement learning to solve this model, which employs a Dueling Deep Q-Network (Duel-DQN) to tackle the problem and integrates a large language model (LLM) into the algorithm. The experimental results show that compared to solutions that solely use reinforcement learning, large language model-assisted reinforcement learning reduces the number of iterations required for convergence by approximately 50% while ensuring the quality of the solutions. This provides new insights into the application of LLMs in reinforcement learning and the DLBP.

1. Introduction

The economy is currently experiencing a period of growth, marked by a significant influx of products into everyday life. However, this continuous introduction of new products also leads to a simultaneous increase in discarded items, which poses a clear environmental concern. The effective management of these discarded products merits further research. Scholars widely advocate for disassembly as the optimal approach for handling discarded products [1]. Disassembly enables the recovery of useful components from discarded items, which can then be reintegrated into manufacturing processes to reduce production costs and generate additional revenue [2]. As a result, the search for a viable automated disassembly process solution brings attention to the disassembly line balancing problem (DLBP).
In actual production processes, both linear disassembly lines and U-shaped disassembly lines are widely used. Linear disassembly lines have a notably low deployment cost, making them ideal for handling small-scale products. Wang et al. [3] investigate key issues related to linear disassembly lines, while Li et al. [4] find that U-shaped disassembly lines can achieve higher disassembly efficiency with the same number of workstations as linear lines. Although there is substantial research on single disassembly lines, studies that integrate both linear and U-shaped disassembly lines remain scarce [2]. This paper focuses on the application scenarios in this area, as illustrated in Figure 1.
Each scrapped product possesses unique characteristics; while overarching disassembly operations may be mechanized, intricate details require human oversight. Tang [5], Zheng et al. [6], and Tuncel et al. [7] elucidated how component variances can influence the disassembly process. Currently, human expertise is indispensable in disassembly operations [8]. Moreover, Tang et al. [9] incorporated human factors into the study of DLBP.
Previous research has primarily focused on employee safety. Although safety is of the utmost importance, another factor weighs heavily on employees: Li et al. [10] assert that disassembly is exceedingly labor-intensive and poses risks to human health, making employee fatigue a significant concern. Guo et al. [11] suggest that the adoption of specific postures for the disassembly of various product types and the distribution of multiple tasks within a single posture can mitigate employee fatigue. In practical disassembly scenarios, the commonly observed postures include standing for the disassembly of larger components and sitting for the disassembly of delicate parts. Evidently, the improper execution of postures in disassembly tasks not only diminishes efficiency but may also result in disassembly failures. Consequently, this paper introduces a hybrid disassembly line balancing problem (HDLBP) that incorporates human postures to minimize costs.
The hybrid disassembly line balancing problem (HDLBP) described in this paper is a distinct NP-complete problem due to its integration of hybrid line configurations (linear and U-shaped), ergonomic constraints like worker posture allocation, and multi-objective optimization requirements. Its complexity arises from the combinatorial explosion of task assignments and the need to balance productivity and worker well-being, setting it apart from traditional disassembly line problems and making it a unique challenge in sustainable manufacturing. Given the complexity of the issue, the large language model (LLM) emerges as a strategic choice for addressing such challenges. Zhao et al. [12] highlight the LLM’s exceptional capabilities in complex reasoning tasks. Furthermore, in scenarios characterized by limited information, the LLM is adept at deducing rational outcomes. Currently, the application of the LLM in solving simpler downstream tasks is regarded as a superior solution [13]. Liu et al. [14] further emphasize that the LLM’s advantages can be fully utilized by decomposing problems into multiple sub-problems. Considering the characteristics of the HDLBP, it is decomposed into the disassembly sequence problem (DSP) and the DLBP, as illustrated in Figure 2. The former is more aligned with logical reasoning, whereas the latter is inclined toward numerical computations.
For the DLBP, reinforcement learning methods have proven effective [15]. Suleyman Mete et al. [16] employ reinforcement learning to tackle the DLBP. Zhang et al. [17] utilize DDPG to address human–robot collaboration challenges. Likewise, DRL [8] has also been applied, showcasing effective performance in resolving the DLBP.
Mei et al. [18] suggest that the DQN algorithm excels in the DLBP. Furthermore, Liu et al. [19] and Zhao et al. [20] enhance the DQN algorithm to address related challenges. However, despite existing research demonstrating the effectiveness of various reinforcement learning algorithms in the DLBP, there are still some challenges. The current DQN algorithm may face issues of slow convergence and poor stability when dealing with complex scenarios. Given the maturity of reinforcement learning methods in the DLBP and the long-term perspective required to address the aforementioned challenges, this paper adopts the Duel-DQN algorithm based on the Dueling network [21] to tackle the issue.
This work endeavors to bridge the research gap and introduces the following novel contributions:
(1) It proposes a hybrid disassembly line environment that combines both linear and U-shaped disassembly lines to enhance compatibility with workstation postures, optimizing profitability through a mathematical framework for the HDLBP.
(2) It recommends a strategy that decomposes the HDLBP into the DSP and the DLBP, applying the LLM and Duel-DQN algorithms, respectively.
(3) It identifies the optimal LLM model through comparisons with various alternatives, validating its feasibility and improvement rates against non-LLM algorithms.
The remainder of this paper is structured as follows: The HDLBP is delineated in Section 2. The methodology for its solution is articulated in Section 3. The experimental outcomes are presented in Section 4. The paper concludes with considerations for future work in Section 5.

2. Problem Description

2.1. Problem Statement

The traditional DLBP focuses solely on individual disassembly lines and neglects employee status considerations. In practical situations, multiple disassembly lines often merge into a single system, adding complexity by requiring appropriate product assignments to each line.
The HDLBP encompasses various product types, involving the allocation of products to designated disassembly lines, establishing a disassembly sequence, and distributing disassembly tasks across the line. Disassembly lines generally comprise multiple workstations, each tailored for an employee to maintain a specific posture. Given these constraints, the problem has the potential to yield numerous solutions.
To optimize efficiency, each part within a product is labeled with the optimal disassembly posture prior to disassembly. This study classifies two postures: “standing posture” (denoted as 1) and “sitting posture” (denoted as 0). If parts are assigned to workstations with mismatched postures, disassembly tasks cannot be executed. After completing disassembly operations, reusable subcomponents are acquired, incurring associated costs.
Each product’s part combination is distinct. One complete way to describe product structure is by using an AND/OR graph [22]. This paper makes the following assumptions:
(1) Product parameters are well defined, including the cost associated with the product’s disassembly, resultant cost, and the AND/OR graph representation.
(2) The hybrid disassembly line bifurcates into two distinct configurations: a linear line and a U-shaped line.
(3) Each workstation within the system is allocated a predetermined fixed cycle time.
(4) The requisite disassembly posture for each subassembly within the product is established and aligns with the disassembly posture allocated to the workstation.
(5) The configuration of the posture remains invariant, as do the disassembly postures allocated to the product.
(6) Parameters pertinent to the disassembly line are delineated, including, but not limited to, the operational cost of the workstation and the cost associated with initiating the disassembly line.
(7) To the greatest extent feasible, the distribution of tasks among the workstations on the disassembly line should be equitable to ensure uniform labor intensity among workers.

2.2. AND/OR Graph

The disassembly process of a product constitutes a discrete event, permitting it to be represented using AND/OR graphs. Figure 3 illustrates the AND/OR graph corresponding to product PC, highlighting the precedence and conflicting relationships among various operations. In this graph, subcomponents are denoted by angle brackets indexed by integer i, while arcs symbolize disassembly actions. A node may embody multiple disassembly actions, indicating an “or” relationship, whereas operations emanating from a singular initial node denote an “and” relationship.
In Figure 3, a total of 13 disassembly actions are presented. Component <1> deconstructs into components <2> and <3>, constituting action 1, with components <2> and <3> sharing an “and” relationship. Component <2> is subject to actions 2, 3, and 4, establishing an “or” relationship among these actions, indicating that at any specific instance, only one action may be selected for implementation. Commencing from component <1> in the graph, a disassembly pathway can be delineated, incorporating the execution of actions 1, 3, and 8.
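To make the traversal concrete, the sketch below encodes the graph fragment described above as plain Python dictionaries and extracts the pathway mentioned in the text (actions 1, 3, and 8). The dictionaries and the fixed choice of action 3 for component <2> are illustrative assumptions, not the paper's full dataset.

```python
# For each component, the list of alternative ("or") actions that remove it.
# The entry for component 3 (action 8) is an assumption of this sketch.
or_actions = {
    1: [1],
    2: [2, 3, 4],
    3: [8],
}

# For each action, the components it yields ("and" relationship).
yields = {
    1: [2, 3],
    3: [],   # terminal in this fragment
    8: [],
}

def one_path(component, chosen=None):
    """Depth-first extraction of one feasible disassembly pathway."""
    if chosen is None:
        chosen = []
    actions = or_actions.get(component, [])
    if not actions:
        return chosen
    # Mirror the text's example: pick action 3 for component <2>.
    act = 3 if component == 2 else actions[0]
    chosen.append(act)
    for child in yields.get(act, []):
        one_path(child, chosen)
    return chosen

print(one_path(1))  # [1, 3, 8]
```

Starting from component <1>, the sketch reproduces the pathway of actions 1, 3, and 8 delineated above.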

2.3. Mathematical Model

2.3.1. Disassembly Matrix

The disassembly matrix D = [ d i j p ] describes the relationship between components and disassembly tasks, where i represents the components, j represents the task, and p represents the product number.
$$d_{ijp} = \begin{cases} 1, & \text{if component } i \text{ is obtained by task } j \text{ in product } p; \\ -1, & \text{if component } i \text{ is disassembled by task } j \text{ of product } p; \\ 0, & \text{otherwise.} \end{cases}$$

2.3.2. Precedence Matrix

The precedence matrix S = [ s j k p ] describes the relationship between two tasks, where j and k represent the disassembly tasks, and p represents the product number.
$$s_{jkp} = \begin{cases} 1, & \text{if task } j \text{ is a predecessor of task } k \text{ in product } p; \\ 0, & \text{otherwise.} \end{cases}$$

2.3.3. Incidence Matrix

The incidence matrix R = [ r j k p ] describes the conflicting relationship between two tasks, where j and k represent the disassembly tasks, and p represents the product number.
$$r_{jkp} = \begin{cases} 1, & \text{if tasks } j \text{ and } k \text{ conflict with each other in product } p; \\ 0, & \text{otherwise.} \end{cases}$$
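As a minimal illustration of the three matrices, the sketch below builds D, S, and R for a hypothetical two-task product; the entries are invented for illustration and use 0-based indices.

```python
# Hypothetical product: task 1 splits component 1 into components 2 and 3;
# task 2 then disassembles component 2. Rows are components, columns tasks.

# Disassembly matrix D[i][j]: +1 if task j yields component i,
# -1 if task j disassembles component i, 0 otherwise.
D = [[-1,  0],
     [ 1, -1],
     [ 1,  0]]

# Precedence matrix S[j][k] = 1 iff task j is a predecessor of task k.
S = [[0, 1],
     [0, 0]]

# Incidence matrix R[j][k] = 1 iff tasks j and k conflict (none here).
R = [[0, 0],
     [0, 0]]

# Consistency check: each task disassembles exactly one component.
for j in range(2):
    assert sum(1 for i in range(3) if D[i][j] == -1) == 1
```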

2.3.4. Constraint Model

This section shows a linear model of the HDLBP, where the required notation and decision variables are defined as follows.
(1)
Notations:
$P$: Set of all end-of-life products, $P = \{1, 2, \ldots, P\}$.
$I_p$: Set of all components/parts in product $p$, $I_p = \{1, 2, \ldots, I_p\}$.
$J_p$: Set of all tasks in product $p$, $J_p = \{1, 2, \ldots, J_p\}$.
$L$: Set of all disassembly lines, $L = \{1, 2\}$.
$W_1$: Set of all linear workstations, $W_1 = \{1, 2, \ldots, W_1\}$.
$W_2$: Set of all U-shaped workstations, $W_2 = \{1, 2, \ldots, W_2\}$.
$S$: Set of edges of the U-shaped workstations, $S = \{1, 2\}$.
$I_{jp,D}$: Set describing the relationship between component $i$ and task $j$ in product $p$.
$J_{jp,R}$: Set of tasks that conflict with task $j$ in product $p$.
$J_{jp,S}$: Set of immediate predecessor tasks of task $j$ in product $p$.
$P$: Number of products.
$I_p$: Number of components/parts in product $p$.
$J_p$: Number of tasks in product $p$.
$W_1$: Number of linear workstations.
$W_2$: Number of U-shaped workstations.
$t_{pj}$: Time to execute the $j$-th task of the $p$-th product.
$c_{pj}$: Unit time cost of executing the $j$-th task of the $p$-th product.
$c_l$: Unit time cost of operating the $l$-th disassembly line.
$c_{lw}$: Fixed cost of opening the $w$-th workstation in the $l$-th disassembly line.
$a_{pj}$: Posture required to execute the $j$-th task of the $p$-th product.
$b_{lw}$: Posture configuration of the $w$-th workstation in the $l$-th disassembly line.
(2)
Decision variables
$$z_{pl} = \begin{cases} 1, & \text{if product } p \text{ is assigned to the } l\text{-th disassembly line}; \\ 0, & \text{otherwise.} \end{cases}$$
$$x_{pjw}^{1} = \begin{cases} 1, & \text{if disassembly task } j \text{ of product } p \text{ is performed at workstation } w \text{ on the linear disassembly line}; \\ 0, & \text{otherwise.} \end{cases}$$
$$x_{pjws}^{2} = \begin{cases} 1, & \text{if disassembly task } j \text{ of product } p \text{ is performed at the } s\text{-th position of workstation } w \text{ on the U-shaped disassembly line}; \\ 0, & \text{otherwise.} \end{cases}$$
$$y_{l} = \begin{cases} 1, & \text{if the } l\text{-th disassembly line is opened}; \\ 0, & \text{otherwise.} \end{cases}$$
$$u_{lw} = \begin{cases} 1, & \text{if workstation } w \text{ on the } l\text{-th disassembly line is opened}; \\ 0, & \text{otherwise.} \end{cases}$$
$T_l$: the cycle time of the $l$-th disassembly line.
Based on these symbols and decision variables, the objective function and constraint formulas of the mathematical model are as follows.
(3)
Minimize Cost Model (CS Model):
$$\min f = \sum_{p\in P}\sum_{j\in J_p}\Big(\sum_{w\in W_1} c_{pj}\, t_{pj}\, x_{pjw}^{1} + \sum_{w\in W_2}\sum_{s\in S} c_{pj}\, t_{pj}\, x_{pjws}^{2}\Big) + \sum_{l\in L} c_l\, T_l + \sum_{l\in L}\sum_{w\in W_l} c_{lw}\, u_{lw}$$
The objective function describes the process of minimizing cost, including the cost of the linear disassembly line and the cost of the U-shaped disassembly line. In the actual situation, it is impossible to simply rely on the objective function to solve the problem; it is also necessary to define the constraints according to the actual situation.
(4)
Linear Disassembly Line Constraint:
$$x_{pjw}^{1} \le u_{lw}, \quad \forall p\in P,\ j\in J_p,\ l=1,\ w\in W_1,\ \text{and } a_{pj}=b_{lw}$$
$$x_{pjw}^{1} \le z_{pl}, \quad \forall p\in P,\ j\in J_p,\ l=1,\ w\in W_1,\ \text{and } a_{pj}=b_{lw}$$
$$\sum_{j\in J_p} t_{pj}\, x_{pjw}^{1} \le T_1, \quad \forall w\in W_1$$
$$\sum_{w\in W_1} w\,\big(x_{pjw}^{1} - x_{pkw}^{1}\big) + W_1\Big(\sum_{w\in W_1} x_{pkw}^{1} - 1\Big) \le 0, \quad \forall p\in P,\ j,k\in J_p \text{ and } s_{pjk}=1$$
$$s_{pjk}\Big(\sum_{w\in W_1} x_{pkw}^{1} - \sum_{w\in W_1} x_{pjw}^{1}\Big) \le 0, \quad \forall p\in P,\ j,k\in J_p$$
$$\sum_{w\in W_1}\big(x_{pjw}^{1} + x_{pkw}^{1}\big) \le 1, \quad \forall p\in P,\ j,k\in J_p \text{ and } r_{pjk}=1$$
The above constraints define the conditions that must be satisfied when assigning a product to a linear disassembly line. Constraint (2) ensures that task j of product p can only be assigned to workstation w within the product’s designated disassembly line and that the workstation posture matches. Constraint (3) guarantees that a disassembly line must be active when a product is assigned to it. Constraint (4) ensures that each workstation on the disassembly line does not exceed the cycle time. Constraint (5) ensures that task assignments on the disassembly line adhere to precedence relationships. Constraint (6) mandates that a task’s predecessor is assigned before it. Finally, Constraint (7) ensures that task assignments comply with conflict constraints on the disassembly line.
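A sketch of how a candidate linear-line assignment could be checked against these rules (posture match, cycle time, and precedence order). The dictionary encoding and all names are assumptions of this sketch, not the paper's implementation.

```python
def feasible_linear(assign, times, postures, station_postures, cycle_time, prec):
    """Check a candidate linear-line assignment.

    assign: dict task -> workstation index
    times: dict task -> processing time
    postures: dict task -> required posture (1 standing / 0 sitting)
    station_postures: dict workstation -> configured posture
    prec: iterable of (j, k) pairs, task j must precede task k
    """
    load = {}
    for task, w in assign.items():
        if postures[task] != station_postures[w]:   # posture mismatch
            return False
        load[w] = load.get(w, 0) + times[task]
    if any(t > cycle_time for t in load.values()):  # cycle-time overflow
        return False
    for j, k in prec:  # a predecessor may not sit on a later workstation
        if j in assign and k in assign and assign[j] > assign[k]:
            return False
    return True

ok = feasible_linear(
    assign={1: 0, 3: 0, 7: 1},
    times={1: 3, 3: 4, 7: 5},
    postures={1: 1, 3: 1, 7: 0},
    station_postures={0: 1, 1: 0},
    cycle_time=8,
    prec=[(1, 3), (3, 7)],
)
print(ok)  # True
```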
(5)
U-shaped Disassembly Line Constraint:
$$x_{pjws}^{2} \le u_{lw}, \quad \forall p\in P,\ j\in J_p,\ l=2,\ w\in W_2,\ s\in S,\ \text{and } a_{pj}=b_{lw}$$
$$x_{pjws}^{2} \le z_{pl}, \quad \forall p\in P,\ j\in J_p,\ l=2,\ w\in W_2,\ s\in S,\ \text{and } a_{pj}=b_{lw}$$
$$\sum_{j\in J_p}\sum_{s\in S} t_{pj}\, x_{pjws}^{2} \le T_2, \quad \forall w\in W_2$$
$$\sum_{w\in W_2}\Big[w\,\big(x_{pjw1}^{2} - x_{pkw1}^{2}\big) + \big(2W_2 - w\big)\big(x_{pjw2}^{2} - x_{pkw2}^{2}\big)\Big] + 2W_2\Big(\sum_{w\in W_2}\sum_{s\in S} x_{pkws}^{2} - 1\Big) \le 0, \quad \forall p\in P,\ j,k\in J_p \text{ and } s_{pjk}=1$$
$$s_{pjk}\Big(\sum_{w\in W_2}\sum_{s\in S} x_{pkws}^{2} - \sum_{w\in W_2}\sum_{s\in S} x_{pjws}^{2}\Big) \le 0, \quad \forall p\in P,\ j,k\in J_p$$
$$\sum_{w\in W_2}\sum_{s\in S}\big(x_{pjws}^{2} + x_{pkws}^{2}\big) \le 1, \quad \forall p\in P,\ j,k\in J_p \text{ and } r_{pjk}=1$$
The above constraints outline the conditions that must be satisfied when the product is assigned to a U-shaped disassembly line. Constraint (8) specifies that when product p is assigned to a U-shaped disassembly line, disassembly task j must be allocated to that line, and the required disassembly posture a must match the workstation’s configuration b. Constraint (9) guarantees that the U-shaped disassembly line is active when task j is assigned and that it meets the necessary posture requirements. Constraint (10) limits the cycle time of each workstation w on the U-shaped disassembly line to ensure that it does not exceed the line’s overall cycle time. Constraint (11) enforces that task j assigned to product p on the U-shaped line adheres to precedence relationships. Constraint (12) mandates that predecessor task k is assigned before task j, and Constraint (13) ensures that task j complies with the conflict constraints of the U-shaped disassembly line.
(6)
Hybrid Disassembly Line Constraint:
$$\sum_{l\in L} z_{pl} = 1, \quad \forall p\in P$$
$$\sum_{w\in W_1} x_{pjw}^{1} + \sum_{w\in W_2}\sum_{s\in S} x_{pjws}^{2} \le 1, \quad \forall p\in P,\ j\in J_p$$
$$z_{pl} \le y_l, \quad \forall p\in P,\ l\in L$$
$$u_{lw} \le y_l, \quad \forall w\in W_l,\ l\in L$$
$$z_{pl} \in \{0,1\}, \quad \forall p\in P,\ l\in L$$
$$x_{pjw}^{1} \in \{0,1\}, \quad \forall p\in P,\ j\in J_p,\ w\in W_1$$
$$x_{pjws}^{2} \in \{0,1\}, \quad \forall p\in P,\ j\in J_p,\ w\in W_2,\ s\in S$$
$$y_l \in \{0,1\}, \quad \forall l\in L$$
$$u_{lw} \in \{0,1\}, \quad \forall w\in W_l,\ l\in L$$
$$T_l \in \mathbb{R}^{+}, \quad \forall l\in L$$
The above constraints are based on both linear and U-shaped disassembly lines, aiming to combine them into a hybrid disassembly line and add necessary adjustments for practical applications. Constraint (14) ensures that each product can only be assigned to one disassembly line. Constraint (15) guarantees that each disassembly task is performed no more than once per product. Constraint (16) restricts product assignment to open disassembly lines. Constraint (17) ensures that only workstations w on active disassembly lines can be used. Constraints (18)–(23) define the range of decision variables.
Of the constraints mentioned, Constraints (2) and (3) apply to the linear disassembly line model and Constraints (8) and (9) to the U-shaped model; both pairs address posture-related considerations.

3. Proposed Algorithm

The LLM is designed to comprehend and produce human-like text, enabling it to perform various language-related tasks based on the input it receives. It offers extensive assistance in content generation and research by producing diverse textual content tailored to research needs. In this paper, we guide the LLM to achieve specific outcomes using carefully crafted prompt words.
Reinforcement learning algorithms involve dynamic interactions between agents and the environment. Agents observe the state of the environment, make decisions, and receive rewards accordingly. Here, the agent performs tasks informed by the LLM’s guidance.

3.1. Environment

The problem is decomposed into two parts: DSP and DLBP. In the DSP, the focus is on assigning products to a disassembly line and selecting the disassembly sequence. In the DLBP, attention shifts to allocating the selected disassembly sequence to specific workstations.
The solution process is illustrated in Figure 2. We first address the DSP by interacting with the LLM to obtain the disassembly sequence for each product. This sequence is then forwarded to the DLBP, where we modify the environment based on the information transmitted by the LLM and use the Duel-DQN algorithm to optimize task allocation. This yields an allocation scheme and corresponding cost, which are relayed back to the DSP, informing the LLM to generate subsequent disassembly sequences.
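This solution loop can be sketched as follows; `query_llm`, `parse_reply`, and `allocate` are hypothetical stand-ins for the LLM call, the sequence extraction, and the Duel-DQN allocator, respectively.

```python
def solve_hdlbp(base_prompt, rounds, query_llm, parse_reply, allocate):
    """DSP/DLBP loop: the LLM proposes a disassembly sequence (DSP),
    the Duel-DQN side allocates it to workstations and prices it (DLBP),
    and the resulting cost is fed back to the LLM."""
    best_cost, best_plan = float("inf"), None
    feedback = ""
    for _ in range(rounds):
        reply = query_llm(base_prompt + feedback)     # DSP: get a sequence
        sequence, line = parse_reply(reply)
        plan, cost = allocate(sequence, line)         # DLBP: assign tasks
        if cost < best_cost:
            best_cost, best_plan = cost, plan
        feedback = f" The previous plan cost {cost}; please reduce the cost."
    return best_plan, best_cost
```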

3.2. Large Language Model

The DSP emphasizes logical reasoning, where reliance on reinforcement learning algorithms could lead to numerous trial-and-error attempts, with minor mistakes potentially resulting in incorrect outcomes. In contrast, the LLM can reduce this issue by utilizing pre-existing dialogues and applying contextual knowledge to handle logical problems more effectively, thus deducing improved results.

3.2.1. Prompt Engineering

Prompt engineering [23] is a technique focused on crafting prompts to guide LLMs in generating desired outputs. The goal is to fully leverage LLM capabilities for specific tasks. With the rise of advanced LLMs, such as ChatGPT-3, prompt engineering has become increasingly crucial, as well-constructed prompts can significantly enhance user efficiency.
For the HDLBP, simply requesting answers from an LLM may not yield the desired outcomes. We must design prompts that effectively guide the LLM to act in alignment with our expectations.

3.2.2. Zero-Shot Prompting

Zero-shot prompting [24] is a strategy used in natural language processing and machine learning tasks to address unseen categories or tasks, particularly with pre-trained language models. The HDLBP is a novel problem for large language models; however, the LLM’s strong logical reasoning capabilities can be leveraged to generate accurate responses, though guiding it effectively requires carefully crafted prompts. For the tasks in this paper, we need the LLM to understand the following concepts:
  • The problem is a branch of the disassembly line balancing problem.
  • The problem has two disassembly lines: one linear and one U-shaped.
  • The problem requires solving for multiple products, each represented by its own AND/OR graph.
  • The objectives that the LLM needs to accomplish.
This setup enables the LLM to grasp the relationships between disassembly tasks and common sequences within each product. Given the task complexity, the LLM requires both expert-level knowledge and user-provided guidance. Here, AND/OR graphs are used to describe products, and users are expected to clarify the relationships within each graph to ensure the LLM’s comprehensive understanding.

3.2.3. Prompt Generation Algorithm

The DLBP is not a problem with specific parameters; instead, its surrounding parameters can vary significantly. Given the wide variety of product forms, it is impractical to explain each product’s component structure in detail during every interaction with the LLM. To address this, we require an algorithm to improve efficiency. Before introducing the algorithm, we will first explain two key formulas.
$$F(i,j) = \begin{cases} \text{True}, & \text{if } j \text{ is a successor of } i; \\ \text{False}, & \text{otherwise.} \end{cases}$$
where i and j represent the task numbers within the product.
We use a relationship matrix R p , i , j to assist in describing the connections between tasks. The specific equation is as follows:
$$R_{p,i,j} = \{(p,i,j) \mid i,j \in J_p,\ F(i,j) = \text{True}\}$$
where J p represents the task set of product p.
Based on the two formulas above, this paper provides an algorithm to generate LLM prompt words through the description of product disassembly tasks, as shown in Algorithm 1.
Algorithm 1 Prompt Word Generation Algorithm
  • Require:  D p (the task set of product p) and L l (set of disassembly lines)
  • Ensure: the final generated prompt word S
  •       Initialization: generate empty string S, string buffer S′, relationship matrix R_{p,i,j}
  •        S ← “There are len(L) disassembly lines. There are len(P) products.”
  •       for p in D do
  •          for i in D_p do
  •            for j in D_p do
  •               if i ≠ j and F(i, j) then
  •                  R ← R ∪ (p, i, j)
  •               end if
  •            end for
  •          end for
  •       end for
  •        R ← sort R with priority determined first by p and then by i
  •        S′ ← convert R to a string
  •        S ← S + S′
  •        return S
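Algorithm 1 can be rendered in runnable form roughly as follows; `is_successor` is a hypothetical helper standing in for F(i, j) on product p, and the sentence template for serializing the triples is a simplification of the generated prompt.

```python
def generate_prompt(products, lines, is_successor):
    """Sketch of Algorithm 1: collect the relationship triples R_{p,i,j}
    and serialize them into a prompt string.
    products maps each product id to its task set; lines is the line set."""
    s = (f"There are {len(lines)} disassembly lines. "
         f"There are {len(products)} products.")
    triples = []
    for p, tasks in products.items():
        for i in tasks:
            for j in tasks:
                if i != j and is_successor(p, i, j):
                    triples.append((p, i, j))
    # Sort with priority first by p and then by i, as in Algorithm 1.
    triples.sort(key=lambda t: (t[0], t[1]))
    body = " ".join(f"In product {p}, task {j} follows task {i}."
                    for p, i, j in triples)
    return s + " " + body
```

For example, calling it with a single product whose successor pairs are {(1, 2), (1, 3), (2, 5)} yields a prompt beginning “There are 2 disassembly lines. There are 1 products.” followed by one sentence per pair.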
The research content assumes that both disassembly lines have five workstations and are dedicated to disassembling the PC product. An example of the prompting words is shown in the dialogue below.
Generate results:
Q: Imagine you are in an environment similar to a disassembly line balancing problem. There are two disassembly lines: one linear and one U-shaped. There is one product: a PC.
The PC has a total of 13 disassembly tasks: Task 1 initiates the disassembly process, followed by three options: Task 2, Task 3, or Task 4. If Task 2 is chosen, the next task is either Task 5 or Task 6. If Task 3 is chosen, the next task is either Task 7 or Task 8. If Task 4 is chosen, you can proceed to either Task 9 or Task 10. Task 7 leads to Task 13, Task 9 leads to Task 11, and Task 10 leads to Task 12.
Respond in the format “I Choose [x] for [product name] in [disassembly line name]”, where [x] is a disassembly sequence representing your choices; for example, [x] could be [1, 2, 3, 4, 5].
I will now assign the PC to the linear disassembly line. Please provide a possible disassembly plan for the product. Afterward, I will give you the cost of the plan, and you may adjust it to minimize cost as much as possible.
A: I Choose [1,3,7,13] for PC in linear disassembly line.
In this setup, we need to convert the LLM’s response into a format that is compatible with the reinforcement learning algorithm. Since the LLM’s output is consistently formatted, we can create a program to perform this transformation using regular expressions, as illustrated in Algorithm 2. In the example dialogue above, the LLM’s response is converted to the disassembly sequence D = [1, 3, 7, 13].
Algorithm 2 Disassembly sequence extraction algorithm
  • Require: LLM text, S
  • Ensure: set of disassembly sequences D , selected disassembly line l
  •       Initialization: a pattern-matching string Re, a set of disassembly sequences D, selected disassembly line l
  •        Re ← “\[\d+(?:,\d+)*\] for PC in (?:linear|U-shaped)”
  •        D ← use Re to match S and extract the array part obtained after splitting
  •        l ← split to obtain the disassembly line segment
  •       return  D , l
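A minimal Python counterpart of Algorithm 2, assuming the “I Choose [x] for [product] in [line]” response format shown in the dialogue above; the exact regular expression is an approximation of the paper's pattern.

```python
import re

# Matches "[1,3,7,13] for PC in linear" (whitespace-tolerant; an assumption).
PATTERN = re.compile(
    r"\[(\d+(?:\s*,\s*\d+)*)\]\s+for\s+(\w+)\s+in\s+(linear|U-shaped)")

def extract(reply):
    """Return (disassembly sequence, selected line) from an LLM reply,
    or (None, None) if the reply does not match the expected format."""
    m = PATTERN.search(reply)
    if m is None:
        return None, None
    sequence = [int(x) for x in m.group(1).split(",")]
    return sequence, m.group(3)

seq, line = extract("I Choose [1,3,7,13] for PC in linear disassembly line.")
print(seq, line)  # [1, 3, 7, 13] linear
```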

3.2.4. Combining with Duel-DQN

Liu et al. [14] delegate the cross-compilation portion in swarm intelligence computing to an LLM for resolution. On the reinforcement learning side, we can either insert information provided by the LLM before the environment update or have the LLM assist in updating the experience pool of the Duel-DQN algorithm. In this study, we choose the former approach, where environmental information is adjusted with each environment reset. As shown in Figure 4, after each transmission, the LLM generates a string, which the environment converts into a disassembly sequence, updating the environmental parameters accordingly. Once the Duel-DQN algorithm produces results, they are converted into a string and sent back to the LLM to continue the sequence generation.

3.3. Duel-DQN Algorithm

The Duel-DQN algorithm in this paper primarily handles the task disassembly segment. The product allocation information provided by the LLM in the previous section significantly reduces the trial-and-error time and costs of the Duel-DQN algorithm, allowing it to focus solely on the efficient assignment of tasks to workstations.
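The distinguishing step of the Dueling network is the aggregation of a state-value stream V(s) with an action-advantage stream A(s, a) into Q-values. A framework-independent sketch of that aggregation (the mean-centered form commonly used with Dueling DQNs):

```python
def dueling_q(value, advantages):
    """Combine the value stream V(s) and advantage stream A(s, a):
    Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

print(dueling_q(2.0, [1.0, -1.0, 0.0]))  # [3.0, 1.0, 2.0]
```

Subtracting the mean advantage makes the V/A decomposition identifiable, which is what stabilizes learning relative to a plain DQN head.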

3.3.1. State Design

To ensure the reusability of prior experience following environmental modifications, we designed the state to encompass all potential scenarios on the disassembly line. The state, denoted as X , is a three-dimensional array with dimensions representing the number of disassembly lines, products, and tasks. The values within this array indicate the allocation of specific tasks from particular products to designated workstations.

3.3.2. Action Design

In this paper, action A represents the assignment of the current task to a specified workstation or the choice not to disassemble. Given that U-shaped disassembly lines have twice the available workstations compared to linear lines, action A ranges from 0 to twice the maximum number of workstations. For linear disassembly lines, actions are filtered to range from 0 to the maximum number of workstations, ensuring that no illegal operations occur.

3.3.3. Reward Design

The algorithm’s reward is structured around three key aspects: (1) A slight penalty is applied if the agent does not assign a workstation to the current task. (2) When the agent assigns a valid workstation, the reciprocal of the associated cost serves as a reward. (3) A significant penalty is imposed if the agent assigns an invalid workstation, signaling that such actions are not permissible.
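A sketch of this three-part reward scheme; the numeric constants and the sentinel encoding of the “do not disassemble” action are illustrative assumptions, not the paper's tuned values.

```python
def reward(action, n_stations, posture_ok, cost):
    """Reward shaping per the three rules above (constants illustrative)."""
    if action == n_stations:   # sentinel action: do not disassemble
        return -0.1            # (1) slight penalty
    if posture_ok:
        return 1.0 / cost      # (2) reciprocal of the incurred cost
    return -10.0               # (3) extreme penalty for an invalid station

print(reward(2, 5, True, 4.0))  # 0.25
```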

3.3.4. Algorithm Process

Following the explanation above, the training algorithm for this method is as follows:
In Algorithm 3, disassembly sequences D and l are sequences received from the LLM. Before executing the algorithm, the environment must be modified using these two parameters.
Algorithm 3 Train algorithm
  • Require: set of disassembly sequences D, action A, selected disassembly line l, set of line workstations W_l, state X, reward Reward
  • Ensure: X, Reward, Cost
  •       Initialization: the cost lost in the previous iteration, Cost ← 0
  •       if l is a linear disassembly line then
  •          check the legality of A; if it is not legitimate, filter it into the valid action range
  •       else
  •          continue
  •       end if
  •       for p in D do
  •          for j in D_p do
  •            if A == len(W_l) then
  •                X_{lpj} ← A
  •                Reward ← impose a slight penalty
  •               return X, Reward, Cost
  •            else if a_{pj} == b_{lA} then
  •                X_{lpj} ← A
  •                Cost ← calculate the cost obtained from the distribution
  •                Reward ← Reward + 1/Cost
  •            else
  •                Reward ← impose an extreme penalty
  •               return X, Reward, Cost
  •            end if
  •          end for
  •       end for
  •       send Cost to the LLM
  •       return X, Reward, Cost

4. Experiment and Results

This section explores the HDLBP, incorporating human postures, comparing different solutions, and validating the performance of the LLM+Duel-DQN method. In this study, we use LM Studio to build the LLM. The algorithm is implemented in PyCharm Community Edition 2023.1 with Python 3.9. The operating system is Windows 10, running on an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, with an NVIDIA GeForce RTX 2070 8 GB GPU, and CUDA version V11.7.

4.1. Test Instances

The PC [25], washing machine [26], radio [27] and mechanical module [28] serve as the four test instances. The AND/OR graph for the PC is provided in this text, while the graphs for the washing machine and radio are not further described. The washing machine contains 15 subcomponents and 13 disassembly operations, and the radio has 29 subcomponents and 30 disassembly operations. In this study, experiments begin with the PC and the washing machine, while the radio and the mechanical module are introduced later for data validation. Table 1 outlines the four types of test cases.
This study deploys Zephyr-7B- β  [29], Airboros-13b-gpt4-1.4, and Nous-Capybara-7B locally, while utilizing GPT-3.5 Turbo and ChatGLM3 [30] online. These specific LLMs were selected for their different architectures, functionalities, and performance benchmarks, within the constraints of available resources. Given the significant differences among various LLMs, the paper aims to identify a solution that generalizes across LLMs as far as possible.
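The locally deployed models can be queried through LM Studio's OpenAI-compatible REST endpoint. The sketch below builds and sends a chat request; the port, model identifier, and default temperature are assumptions that depend on the local setup.

```python
# Sketch: querying a model served by LM Studio, which exposes an
# OpenAI-compatible endpoint (commonly http://localhost:1234/v1).
# The model name "zephyr-7b-beta" is an assumption; use whatever is loaded.
import json
import urllib.request

def build_chat_request(prompt, model="zephyr-7b-beta", temperature=0.3):
    """Build the JSON payload for a /chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask_llm(prompt, base_url="http://localhost:1234/v1"):
    """POST the payload and return the model's reply text."""
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The online models (GPT-3.5 Turbo, ChatGLM3) are reached the same way through their respective hosted endpoints.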

4.2. Hyperparameter Tuning

In this paper, our primary focus is on the hyperparameter “temperature”. A larger temperature value increases the randomness of the answers generated by the large language model (LLM). Since the problem involves a finite set of solutions, the optimal temperature is expected to lie within the range of 0.2 to 0.5. For this experiment, Zephyr-7B- β is employed with the PC and washing machine products, and the LLM is required to generate a possible disassembly sequence and allocation plan.
As shown in Table 2, we ensure that the LLM generates at least 10 outputs for evaluation in each iteration. Ultimately, the parameter range from 0.3 to 0.5 is found to be suitable. Similar experiments are conducted with other LLMs, and a range of 0.2–0.5 consistently produces the desired results without requiring additional generations. In the subsequent experiments in this paper, the temperature is fixed at 0.3.
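A minimal sketch of this screening procedure: for each candidate temperature, collect at least ten generations and measure how many follow the required fixed answer format. The `generate` callable and the format regular expression are illustrative assumptions, not the paper's exact criteria.

```python
# Hypothetical temperature screening: rank temperatures by how often the
# LLM's generations stay in the agreed "I choose [...] for <product>" format.
import re

ANSWER_RE = re.compile(r"I choose \[[\d, ]+\] for \w+")

def format_hit_rate(outputs):
    """Fraction of generations matching the expected answer format."""
    hits = sum(1 for o in outputs if ANSWER_RE.search(o))
    return hits / len(outputs)

def screen_temperatures(generate, temps=(0.1, 0.2, 0.3, 0.4, 0.5, 0.7, 1.0),
                        n_samples=10):
    """Return {temperature: hit rate}, sampling n_samples outputs per
    temperature, as in the Table 2 experiment."""
    return {t: format_hit_rate([generate(t) for _ in range(n_samples)])
            for t in temps}
```

Temperatures whose hit rate collapses (here, 0.7 and above) are discarded, matching the qualitative findings in Table 2.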

4.3. Adjustment of Prompt Words

After multiple rounds of experiments, we observed that the accuracy of solutions provided by the LLM is not very high. To address this, we manually supplemented and adjusted the generated prompts based on each model’s requirements. These adjustments include modifying the description of AND/OR graphs, emphasizing AND/OR relationships, redefining goals, and reducing the number of questions. The following outlines our improvement strategies for the deployed models and the expected outcomes.
Nous-Capybara-7B, which focuses primarily on multi-turn dialogues and summarizing descriptions, was utilized first. In practical experiments, we conducted multi-turn dialogues in advance to help the model understand the problem environment. Each conversation focused on describing one product, provided relevant disassembly line information at the end, and instructed the LLM to generate a disassembly sequence with the lowest possible disassembly cost. The model ultimately produced the desired results and was capable of engaging in further dialogue to recommend additional disassembly sequences. An example of the prompting words is shown in the dialogue below.
For Nous-Capybara-7B:
Q: What are the disassembly steps for the products PC and washing machine?
Q: Can you find a disassembly sequence for the product PC and a disassembly sequence for the product washing machine?
A: I apologize …Product PC: 1. Initiate the disassembly process. 2. …From Task 9, choose between Task 10 and Task 12 …
Q: What are the disassembly steps for the product PC?
Q: What are the disassembly steps for the product washing machine?
Q: Provide the disassembly line information.
Q: Please choose one for me …
A: I choose [1, 3, 7, 11] for PC and choose [1, 3, 8, 9] for washing machine.
Q: OK, please make another choice.
A: I choose [1, 2, 5, 10, 11] for the washing machine and [1, 4, 9, 12] for the PC.
Zephyr-7B- β is a model that focuses on writing and chatting but is less effective at tasks involving reasoning and mathematical calculations. In experiments, Zephyr-7B- β typically provided only one disassembly sequence. Therefore, in conversations, it is crucial to explicitly emphasize this objective. Although the model generated a considerable amount of text, it ultimately produced the desired results. An example of the prompting words is shown in the dialogue below.
For Zephyr-7B- β :
Q: I need you to provide …
A: For product PC …plan could be [1, 4, 9] …For product washing machine, a possible disassembly plan could be [1, 2, 5] …
Q: Now I have 2 disassembly lines, please choose one for me, just say “I choose x for PC in line 1 and choose y for washing machine in line 2”. For example, you can say “I choose [1, 3, 7, 11] for PC in line 1 and choose [1, 3, 8, 9] for washing machine in line 2”. You need to choose tasks until you can’t continue executing them. For example …
A: …choose Line 1 with the sequence [1, 3, 7, 11] for product PC. On the other hand …choose Line 2 with the sequences [1, 3, 8, 9] for product washing machine …
Airboros-13b-gpt4-1.4 is an extended version similar to GPT. However, the memory demands of this model exceed the capacity of the devices used in this study, preventing the full utilization of its capabilities in the experimental environment. As a result, this experiment simulated how to adjust parameters when the model’s performance is limited. In this setup, each paragraph was split into multiple sequential inputs, guiding the model to understand the content step by step, ultimately leading it to output the target disassembly sequence. An example of the prompting words is shown in the dialogue below.
For Airboros-13b-gpt4-1.4:
Q: Task 1 initiates the disassembly process.
A: According to the information provided …
Q: After Task 1, you have three options: proceed to Task 2, Task 3, or Task 4.
Q: Please generate a sequence of tasks in the format [x] with any number of tasks. x represents the sequence of PC.
A: [1, 3, 7, 13].
Q: I have disassembly line 1 and disassembly line 2, please choose one for PC. You can say “disassembly line x”
A: disassembly line 1.
As for GPT-3.5 Turbo, the model quickly understood the prompts and provided the expected results without requiring additional adjustments. We anticipated that the responses from the three locally deployed LLMs mentioned earlier would closely align with those from the online models. We conducted 1000 simulations to evaluate the accuracy of the LLM. During these simulations, the LLM was instructed to recommend disassembly sequences aimed at maximizing profit. If a decision resulted in a higher cost compared to the previous one, it was labeled as a ‘decision error’; otherwise, it was considered a correct decision. We defined Nous as the state of each simulation, represented as nous = [a_1, a_2, …, a_n], a_i ∈ {0, 1}, where 1 represents a correct decision and 0 represents a decision error.
For each simulation, we calculate its correctness rate M:
M = (1/n) Σ_{i=1}^{n} a_i,  n = 1, 2, …, 1000
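The error-labeling rule and the rate M can be computed directly; the cost sequence used in the test below is illustrative.

```python
# Compute the correctness rate M = (1/n) * sum(a_i) for one simulation.
def correctness_rate(nous):
    """nous is the 0/1 decision vector of a single simulation."""
    return sum(nous) / len(nous)

def label_decisions(costs):
    """Build the 0/1 vector from a sequence of decision costs: a decision
    is an error (0) if its cost exceeds the previous decision's cost."""
    return [1 if cur <= prev else 0
            for prev, cur in zip(costs, costs[1:])]
```

For example, the cost trace [120, 110, 130] yields the vector [1, 0]: the second decision lowered the cost (correct), the third raised it (error).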
Since the results generated by the LLM alone were not very accurate, we applied a simple correction algorithm to evaluate whether the disassembly sequence was correct. In this paper, we used the AND/OR diagram of the product to verify the disassembly sequence. A brief description of the algorithm is provided in Algorithm 4. Figure 5 demonstrates that using the algorithm significantly improved accuracy. In the subsequent experiments, we primarily relied on Nous-Capybara-7B.
Algorithm 4 Correction Algorithm
  • Require: set of disassembly sequences D
  • Ensure: Check for continuity of the disassembly sequence
  •       Initialize: list ← disassembly sequence within D
  •       for i in list do
  •          if successor nodes of list[i] match nodes in list[i+1] in the AND/OR graph then
  •            Continue
  •          else
  •            Remind the large language model to regenerate the sequence
  •            return False
  •          end if
  •       end for
  •       Send list to the second stage
  •       return True
  •       return True
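A runnable sketch of Algorithm 4, with the AND/OR graph reduced to a plain successor map (an assumption that abstracts away the AND/OR semantics of the full graph):

```python
# Correction check: every consecutive pair in the LLM's sequence must be
# connected in the successor map derived from the product's AND/OR graph.
def check_sequence(seq, successors):
    """Return True if the sequence is continuous in the graph; False
    signals that the LLM should regenerate the sequence."""
    for cur, nxt in zip(seq, seq[1:]):
        if nxt not in successors.get(cur, set()):
            return False      # remind the LLM to regenerate
    return True               # send the sequence to the second stage
```

With a toy map such as {1: {2, 3, 4}, 2: {6}}, the sequence [1, 2, 6] passes while [1, 6] is rejected.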

4.4. Solving Process

In this section, we use the PC product to illustrate the data exchange process of the solution. Before the experiment begins, it is essential to ensure that the LLM understands the relationships between each disassembly task of the PC product. The dialogue below demonstrates how this is communicated to the LLM.
For LLM
Q: environment description:…
Q: In the format “I Choose [x] for [product name] in [disassembly line name]”, where [x] is a disassembly sequence representing your choices. For example, [x] is [1, 2, 3, 4, 5].
A: I choose [1, 2, 6] for PC in linear disassembly.
The string “I choose [1, 2, 6] for PC in linear disassembly” needs to be converted into a variable before being passed to Duel-DQN. In this case study, the processed variable becomes the disassembly sequence D = [1, 2, 6]. After validation through Algorithm 4, the disassembly sequence D and the disassembly line status are passed to Duel-DQN. Following training using Algorithm 3, a dataset is produced. In this experiment, the algorithm allocates the LLM-provided disassembly sequence to workstations 1, 3, and 5, resulting in a calculated cost of 120 and a reward of 100, expressed as X = [[1, 0, 2, 0, 5]], Reward = 100, Cost = 120. Finally, the algorithm converts this dataset into a string: “The sequence [1, 2, 6]’s cost is 120, please choose another one”, prompting the LLM to generate further. The subsequent dialogue proceeds as follows:
For LLM
Q: The sequence in linear disassembly line [1, 2, 6]’s cost is 120, please choose another one.
A: I choose [1, 3, 7, 13] for PC in linear disassembly.
In the next round of dialogue, we obtain the disassembly sequence D = [1, 3, 7, 13]. After the calculation, the results are returned to the LLM for it to provide a new solution.
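The string-to-variable conversion and the feedback prompt described above can be sketched as follows; the regular expression is an assumption about the agreed reply format.

```python
# Parse the LLM's reply into (sequence, product, line), and format the
# Duel-DQN result back into a feedback prompt for the next round.
import re

CHOICE_RE = re.compile(r"I choose \[([\d,\s]+)\] for (\w+) in (.+)")

def parse_choice(reply):
    """Extract D, the product name, and the line name from the reply."""
    m = CHOICE_RE.search(reply)
    if m is None:
        raise ValueError("reply does not follow the agreed format")
    seq = [int(x) for x in m.group(1).split(",")]
    return seq, m.group(2), m.group(3).strip()

def feedback_prompt(line_name, seq, cost):
    """Turn the evaluated cost into the prompt sent back to the LLM."""
    return (f"The sequence in {line_name} {seq}'s cost is {cost}, "
            f"please choose another one.")
```

Alternating `parse_choice` and `feedback_prompt` reproduces the dialogue loop shown in the case study.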

4.5. Efficiency Comparison

This section evaluates the efficiency of the large language model (LLM) by comparing its performance with that of the Duel-DQN algorithm in completing similar tasks. Using the product PC as an example, we assess the total number of disassembly sequences generated by each method when searching for suboptimal solutions. This is achieved by expanding the possible disassembly sequences for the product PC through the addition of AND/OR graph branches.
The specific results are shown in Table 3. It can be observed that due to the powerful reasoning ability of the LLM, there is a significant advantage in recommending disassembly sequences compared to Duel-DQN.

4.6. Feasibility Analysis

We increased the disassembly paths of the product by duplicating the existing nodes in the PC and informing the LLM that the newly added paths are identical to the original ones. Subsequently, both the Duel-DQN algorithm and the LLM traversed all disassembly sequences to gather statistical data.
Table 4 illustrates the costs generated by three different methods in various cases, while Figure 6 illustrates the rewards obtained by the model under Case 1, where the posture ratio of standing to sitting is 0.5. The graph shows that while both approaches yield very similar final values, the LLM achieves a faster convergence rate. This is due to the LLM’s strong reasoning capabilities, which significantly reduce trial-and-error time and increase iteration efficiency.

5. Conclusions and Future Research

This paper presents an HDLBP that considers employee posture constraints with the aim of minimizing cost, and proposes a mathematical model for it. To solve this problem, we adopt the Duel-DQN algorithm from reinforcement learning and use an LLM to assist it. We decompose the problem into two subproblems, DSP and DLBP, let the LLM and Duel-DQN solve them respectively, and have the two interact. This approach significantly reduces the number of iterations required by Duel-DQN, enhancing the overall iteration efficiency. By comparing our approach with traditional methods that use only reinforcement learning or solvers, we found that the results of the three methods are very similar. This indicates that solution quality is preserved while iteration efficiency increases.
It is worth noting that although the method described in this paper reduces the number of iterations, its effectiveness is significantly constrained by the foundational capabilities of the large model, and the overall iteration time is also limited by the model’s performance. Additionally, limited computing resources constrained the scope of our experiments. However, we believe that these limitations will be addressed with advancements in technology.
Future research directions include (1) further developing LLMs to reduce their reasoning costs on such problems and enhance their overall problem-solving capabilities; (2) allowing LLMs to intervene directly in reinforcement learning algorithms to further improve agent learning efficiency; (3) training large models using relevant data, which could enhance their ability to process data in specific domains, thereby improving the overall performance of the algorithms; and (4) applying this method to a wider range of optimization problems, such as [31,32,33,34]. This interdisciplinary approach is expected to bring innovative solutions to many optimization problems in future manufacturing processes.

Author Contributions

Conceptualization, X.G. and S.Q.; Formal analysis, J.W.; validation, C.J. and P.J.; Writing—original draft, C.J. and P.J.; writing—review and editing, L.Q. and B.H.; supervision, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the ‘National Local Joint Engineering Laboratory for Optimization of Petrochemical Process Operation and Energy Saving Technology’ (grant number LJ232410148002) and ‘the Innovation Team Project of the Educational Department of Liaoning Province’ (grant number LJ222410148036).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
LLM	Large Language Model
DQN	Deep Q-Network
Duel-DQN	Dueling Deep Q-Network
DSP	Disassembly Sequence Problem
DLBP	Disassembly Line Balancing Problem
HDLBP	Hybrid Disassembly Line Balancing Problem

References

  1. Guo, X.W.; Zhou, M.C.; Abusorrah, A.; Alsokhiry, F.; Sedraoui, K. Disassembly sequence planning: A survey. IEEE/CAA J. Autom. Sinica 2021, 8, 1308–1324. [Google Scholar] [CrossRef]
  2. Güler, E.; Kalayci, C.B.; Ilgin, M.A.; Özceylan, E.; Güngör, A. Advances in partial disassembly line balancing: A state-of-the-art review. Comput. Ind. Eng. 2024, 188, 109898. [Google Scholar] [CrossRef]
  3. Wang, K.; Li, X.; Gao, L.; Garg, A. Partial disassembly line balancing for energy consumption and profit under uncertainty. Robot. Comput.-Integr. Manuf. 2019, 59, 235–251. [Google Scholar] [CrossRef]
  4. Li, Z.; Janardhanan, M.N. Modelling and solving profit-oriented u-shaped partial disassembly line balancing problem. Expert Syst. Appl. 2021, 183, 115431. [Google Scholar] [CrossRef]
  5. Tang, Y. Learning-based disassembly process planner for uncertainty management. IEEE Trans. Syst. Man-Cybern.-Part Syst. Humans 2008, 39, 134–143. [Google Scholar] [CrossRef]
  6. Zheng, F.; He, J.; Chu, F.; Liu, M. A new distribution-free model for disassembly line balancing problem with stochastic task processing times. Int. J. Prod. Res. 2018, 56, 7341–7353. [Google Scholar] [CrossRef]
  7. Tuncel, E.; Zeid, A.; Kamarthi, S. Solving large scale disassembly line balancing problem with uncertainty using reinforcement learning. J. Intell. Manuf. 2014, 25, 647–659. [Google Scholar] [CrossRef]
  8. Liu, Q.; Liu, Z.; Xu, W.; Tang, Q.; Pham, D.T. Human-robot collaboration in disassembly for sustainable manufacturing. Int. J. Prod. Res. 2019, 57, 4027–4044. [Google Scholar]
  9. Tang, Y.; Zhou, M.; Gao, M. Fuzzy-petri-net-based disassembly planning considering human factors. IEEE Trans. Syst. Man-Cybern.-Part Syst. Hum. 2006, 36, 718–726. [Google Scholar] [CrossRef]
  10. Li, K.; Liu, Q.; Xu, W.; Liu, J.; Zhou, Z.; Feng, H. Sequence planning considering human fatigue for human-robot collaboration in disassembly. Procedia CIRP 2019, 83, 95–104. [Google Scholar] [CrossRef]
  11. Guo, X.; Wei, T.; Wang, J.; Liu, S.; Qin, S.; Qi, L. Multiobjective u-shaped disassembly line balancing problem considering human fatigue index and an efficient solution. IEEE Trans.-Comput. Soc. Syst. 2023, 10, 2061–2073. [Google Scholar] [CrossRef]
  12. Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
  13. Braberman, V.A.; Bonomo-Braberman, F.; Charalambous, Y.; Colonna, J.G.; Cordeiro, L.C.; de Freitas, R. Tasks People Prompt: A Taxonomy of LLM Downstream Tasks in Software Verification and Falsification Approaches. arXiv 2024, arXiv:2404.09384. [Google Scholar]
  14. Liu, S.; Chen, C.; Qu, X.; Tang, K.; Ong, Y.-S. Large language models as evolutionary optimizers. arXiv 2023, arXiv:2310.19046. [Google Scholar]
  15. Guo, X.; Bi, Z.; Wang, J.; Qin, S.; Liu, S.; Qi, L. Reinforcement learning for disassembly system optimization problems: A survey. Int. J. Netw. Dyn. Intell. 2023, 2, 1–14. [Google Scholar] [CrossRef]
  16. Mete, S.; Serin, F. A reinforcement learning approach for disassembly line balancing problem. In Proceedings of the 2021 International Conference on Information Technology (ICIT), Amman, Jordan, 14–15 July 2021; pp. 424–427. [Google Scholar] [CrossRef]
  17. Zhang, R.; Lv, Q.; Li, J.; Bao, J.; Liu, T.; Liu, S. A reinforcement learning method for human-robot collaboration in assembly tasks. Robot.-Comput.-Integr. Manuf. 2022, 73, 102227. [Google Scholar] [CrossRef]
  18. Mei, K.; Fang, Y. Multi-robotic disassembly line balancing using deep reinforcement learning. In Proceedings of the International Manufacturing Science and Engineering Conference, Virtual, 21–25 June 2021; American Society of Mechanical Engineers (ASME): New York, NY, USA, 2021; Volume 85079. [Google Scholar]
  19. Liu, Z.; Liu, Q.; Wang, L.; Xu, W.; Zhou, Z. Task-level decision-making for dynamic and stochastic human-robot collaboration based on dual agents deep reinforcement learning. Int. J. Adv. Manuf. Technol. 2021, 115, 3533–3552. [Google Scholar] [CrossRef]
  20. Zhao, X.; Li, C.; Tang, Y.; Cui, J. Reinforcement learning-based selective disassembly sequence planning for the end-of-life products with structure uncertainty. IEEE Robot. Autom. Lett. 2021, 6, 7807–7814. [Google Scholar] [CrossRef]
  21. Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling network architectures for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 19–24 June 2016; pp. 1995–2003. [Google Scholar]
  22. Tian, G.D.; Ren, Y.P.; Feng, Y.X.; Zhou, M.C.; Zhang, H.H.; Tan, J.R. Modeling and planning for dual-objective selective disassembly using and/or graph and discrete artificial bee colony. IEEE Trans. Ind. Inform. 2018, 15, 2456–2468. [Google Scholar] [CrossRef]
  23. Wang, J.; Liu, Z.; Zhao, L.; Wu, Z.; Ma, C.; Yu, S.; Dai, H.; Yang, Q.; Liu, Y.; Zhang, S.; et al. Review of large vision models and visual prompt engineering. Meta-Radiology 2023, 1, 100047. [Google Scholar] [CrossRef]
  24. Li, Y. A practical survey on zero-shot prompt design for in-context learning. arXiv 2023, arXiv:2309.13205. [Google Scholar]
  25. Tang, Y.; Zhou, M.C. A systematic approach to design and operation of disassembly lines. IEEE Trans. Autom. Sci. Eng. 2006, 3, 324–329. [Google Scholar] [CrossRef]
  26. Ilgin, M.A.; Gupta, S.M. Recovery of sensor embedded washing machines using a multi-kanban controlled disassembly line. Robot.-Comput.-Integr. Manuf. 2011, 27, 318–334. [Google Scholar] [CrossRef]
  27. Wang, K.P.; Li, X.Y.; Gao, L. A multi-objective discrete flower pollination algorithm for stochastic two-sided partial disassembly line balancing problem. Comput. Ind. Eng. 2019, 130, 634–649. [Google Scholar] [CrossRef]
  28. Wang, K.; Li, X.; Gao, L.; Li, P.; Sutherland, J.W. A discrete artificial bee colony algorithm for multiobjective disassembly line balancing of end-of-life products. IEEE Trans. Cybern. 2022, 52, 7415–7426. [Google Scholar] [CrossRef] [PubMed]
  29. Tunstall, L.; Beeching, E.; Lambert, N.; Rajani, N.; Rasul, K.; Belkada, Y.; Huang, S.; von Werra, L.; Fourrier, C.; Habib, N.; et al. Zephyr: Direct distillation of lm alignment. arXiv 2023, arXiv:2310.16944. [Google Scholar]
  30. Du, Z.; Qian, Y.; Liu, X.; Ding, M.; Qiu, J.; Yang, Z.; Tang, J. GLM: General language model pretraining with autoregressive blank infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; Volume 1, pp. 320–335. [Google Scholar]
  31. Fu, Y.P.; Ma, X.M.; Gao, K.Z.; Li, Z.W.; Dong, H.Y. Multi-objective home health care routing and scheduling with sharing service via a problem-specific knowledge based artificial bee colony algorithm. IEEE Trans. Intell. Transp. Syst. 2024, 25, 1706–1719. [Google Scholar] [CrossRef]
  32. Zhao, Z.; Bian, Z.; Liang, J.; Liu, S.; Zhou, M. Scheduling and Logistics Optimization for Batch Manufacturing Processes With Temperature Constraints and Alternative Thermal Devices. IEEE Trans. Ind. Inform. 2024, 20, 11930–11939. [Google Scholar] [CrossRef]
  33. Zhao, Z.; Li, X.; Liu, S.; Zhou, M.; Yang, X. Multi-Mobile-Robot Transport and Production Integrated System Optimization. IEEE Trans. Autom. Sci. Eng. 2024, 1–12. [Google Scholar] [CrossRef]
  34. Zhao, Z.; Li, S.; Liu, S.; Zhou, M.; Li, X.; Yang, X. Lexicographic Dual-Objective Path Finding in Multi-Agent Systems. IEEE Trans. Autom. Sci. Eng. 2024, 1–11. [Google Scholar] [CrossRef]
Figure 1. An example of a multi-product hybrid disassembly line balancing problem.
Figure 2. Algorithm execution process.
Figure 3. A PC AND/OR graph.
Figure 4. The interaction process between LLM and Duel-DQN.
Figure 5. Model accuracy.
Figure 6. Algorithm comparison.
Table 1. Case description (number of products per case).

Case ID	PC	Washing Machine	Radio	Mechanical Module
1	1	1	1	1
2	2	2	3	3
3	5	5	6	5
4	10	10	10	10
Table 2. Adjustment results for ‘temperature’.

Temperature	Does It Meet the Expected Answer?	Notes
0.1	Generally Yes	Always execute Task 13 after executing Task 12
0.2	No	Sometimes LLM forgets ‘or’ relationships
0.3	Yes	Need to inform the end of the task
0.4	Yes	Need to further emphasize task relationships
0.5	Yes	Need multiple rounds of dialogue
0.7	No	Unable to generate fixed format answer
1.0	No	Excessive randomness generated
>1.0	No	Excessive randomness generated
Table 3. Efficiency comparison.

Total Complete Disassembly Seq.	Seq. Used (Duel-DQN)	Seq. Used (LLM)
6	6	6
60	42	60
360	141	104
600	279	127
Table 4. The reward after training.

Case ID	Cost (CPLEX)	Cost (Duel-DQN)	Cost (Duel-DQN with LLM)
1	292	366	315
2	853	1030	926
3	1304	1316	1308
4	1948	1840	1663
