Article

Multiple Container Terminal Berth Allocation and Joint Operation Based on Dueling Double Deep Q-Network

1 School of Mechanical & Automotive Engineering, Fujian University of Technology, Fuzhou 350118, China
2 School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
3 Donghai Academy, Ningbo University, Ningbo 315211, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(12), 2240; https://doi.org/10.3390/jmse11122240
Submission received: 30 October 2023 / Revised: 21 November 2023 / Accepted: 22 November 2023 / Published: 27 November 2023

Abstract: In response to the evolving challenges of integrating and combining multiple container terminal operations under berth water depth constraints, the multi-terminal dynamic and continuous berth allocation problem (MDC-BAP) emerges as a critical issue. Based on computational logistics, the MDC-BAP is formulated as a unique variant of the classical resource-constrained project scheduling problem and modeled as a mixed-integer programming model. The modeling objective is to minimize the total dwelling time of liner ships in port. To address this, a Dueling Double DQN-based reinforcement learning algorithm is designed for the MDC-BAP. A series of computational experiments is executed to validate the algorithm’s effectiveness and its aptitude for multi-terminal joint operation. Specifically, the Dueling Double DQN algorithm boosts the average solution quality by nearly 3.7% when benchmarked against the commercial solver CPLEX, and it also outperforms classical reinforcement learning algorithms such as Proximal Policy Optimization, Deep Q-Network, and Dueling Deep Q-Network in terms of solution quality. Moreover, the performance advantage escalates as the number of ships increases. In addition, the approach enhances the service level at the terminals and slashes operation costs. On the whole, the Dueling Double DQN algorithm shows marked superiority in tackling complicated, large-scale scheduling problems and provides an efficient, practical solution to the MDC-BAP for port operators.

1. Introduction

With the steady expansion of global trade scales and the widespread adoption of container transportation [1,2,3,4], the rise and subsequent evolution of container technology have profoundly impacted the overarching framework of global logistics. The Production Plan, Task Scheduling, and Resource Allocation (PPTSRA) within Container Terminal Handling Systems (CTHS) have emerged as focal points and challenges globally. To address these challenges, a myriad of algorithms have been extensively applied to the PPTSRA issues in CTHS, especially the Berth Allocation Problem (BAP) [5,6,7] and related dock operational space allocation and scheduling. In the entire collaborative body of the global supply chain and logistics, the BAP occupies a prominent position at the tactical decision-making and implementation levels [8,9]. The implications of BAP are not limited to the ship’s port dwelling time and operational costs but also involve the terminal’s total throughput capacity, loading and unloading efficiency, equipment scheduling strategies, and its synergistic effects with the broader supply chain [10]. Hence, from various perspectives, such as improving port operational efficiency, reducing port dwelling time, and operational costs, BAP holds profound academic and practical values [11,12].
However, BAP presents a combinatorial optimization challenge accompanied by multiple constraints and numerous objectives, involving various interconnected decision elements. These include ship arrival and departure times, dynamic berth resource allocation, and various other operational constraints [13]. Notably, BAP has been proven to be a Non-deterministic Polynomial Complete (NPC) problem [14], making its related planning, scheduling, and decision-making strategies a continued research focus and challenge within the academic community.
For the single-terminal berth allocation problem, Kim et al. [15] developed an integer programming model to schedule quay berths and quayside cranes, taking into account a multitude of constraints. This approach is divided into two stages: the first stage employs subgradient optimization to determine vessel berth positions, timing, and crane allocations, achieving near-optimal solutions; the second stage, building upon the results of the first, uses dynamic programming to devise detailed crane scheduling. The performance of the algorithm was validated through numerical experiments. Yang et al. investigated the continuous berth allocation challenge under dynamic ship arrival scenarios and formulated an integer linear programming model aiming at minimizing the total dwelling time of ships at the port [16]. Lin et al. took into account both the dwelling time and associated penalty costs, delving into optimization studies for berth allocation strategies at container terminals [17]. Sheikholeslami et al. [18] aimed to minimize vessels’ departure delay times and proposed a model that achieves this goal by considering tide effects at terminals with discrete berths. Additionally, metaheuristic-based approaches have garnered attention. For instance, Park et al. employed a modified particle swarm optimization algorithm to analyze a two-stage stochastic planning model, thereby achieving robust berth allocation strategies [7]. Zeng and his team conducted an in-depth discussion on integrated operational strategies at container terminals against the backdrop of time-of-use electricity pricing, employing a tailored genetic algorithm for solutions [19]. Song et al. [20] systematically studied the berth allocation problem under varying water depth conditions. They adjusted the priority of each ship using a weighting strategy, basing their research on a mathematical model that sought to minimize the total weighted service time.
However, with the continuous flourishing development of the global shipping industry and the increasing mergers and acquisitions of terminals by large shipping companies or groups, the berth allocation mechanism of a single terminal evidently struggles to adapt to the increasingly complex and highly integrated logistics industry landscape [21,22,23,24]. Consequently, the significance of research on the Multi-terminal Dynamic and Continuous Berth Allocation Problem (MDC-BAP) is becoming increasingly prominent. Compared to the traditional single-terminal BAP, MDC-BAP exhibits a noticeable escalation in structural complexity and difficulty of resolution. Given the distinct strong coupling characteristics and intrinsic high complexity of MDC-BAP, research on this topic remains relatively nascent. Hendriks [25] and Xu et al. [26] delved into joint berth allocation within container hub ports, offering invaluable theoretical insights and practical strategies for this domain. Li et al. [27], from a novel perspective, transformed the MDC-BAP into a heterogeneous multi-knapsack problem for optimization modeling. They subsequently constructed a mixed-integer programming model aimed at minimizing the total costs for both ports and shipping entities. Following this, they introduced a two-stage imperialist competitive algorithm, fusing computational logistics with swarm intelligence characteristics, to address the problem.
Although in recent years many scholars have employed various intelligent optimization algorithms for specific BAP problems and achieved notable algorithmic performance, when facing large-scale, highly dynamic, and highly uncertain actual production environments, their optimization efficiency and accuracy often face significant challenges. Especially in the PPTSRA scenarios of the CTHS, the application of computational intelligence faces evident bottlenecks in terms of universality, robustness, agility, portability, and scalability (GRAPE). This limitation is even more pronounced in Heterogeneous Container Terminal Cluster Logistics Generalized Computation Systems (HCTC-LGCS) [27]. However, with the rapid development of artificial intelligence and machine learning, these techniques have been widely applied to the PPTSRA of container terminals [28,29,30,31]. Notably, Reinforcement Learning (RL) provides a novel computational approach to NPC and resource-constrained scheduling dilemmas [32,33,34,35,36]. Compared to traditional optimization algorithms and heuristic strategies, reinforcement learning does not rely on rules specific to a problem or predefined objective functions. Instead, it learns decision policies adaptively through continuous interaction with the environment, achieving optimal or near-optimal solutions. This characteristic endows RL with clear superiority when dealing with dynamism, uncertainty, and numerous complex constraints [37,38,39]. For instance, when confronted with resource-constrained scheduling issues, reinforcement learning possesses online learning and adaptive capabilities, allowing it to flexibly balance potentially conflicting objectives and formulate efficient scheduling strategies while adhering to operational constraints.
Moreover, concerning MDC-BAP, its computational complexity notably rises with the increase in the number of terminals and vessels. Specifically, for a single terminal with $n$ vessels, $p$ berth points, and a time scale of $t$, the computational complexity is $O(npt)$. When extended to $m$ quays, the complexity becomes $O(mnpt)$, and the space of candidate allocation plans grows exponentially as the number of ships and terminals increases. Traditional heuristic algorithms struggle to find quality solutions for such high-dimensional problems. Additionally, benefiting from its ability to learn from accumulated experience in historical data, RL exhibits excellent generalization performance when dealing with unseen but structurally similar problems [40,41,42,43]. Therefore, integrating reinforcement learning into the research on Continuous, Linked, and Optimized Berth Allocation under the HCTC-LGCS not only offers efficient optimization strategies for MDC-BAP but also presents a potential pathway to address the challenges of universality, robustness, agility, portability, and scalability in computational intelligence applied to PPTSRA issues.
In this paper, we conducted an in-depth modeling of the MDC-BAP problem, taking into account the water depth constraints of the terminal. To precisely address its inherent high coupling, dynamic characteristics, and complexity, we chose the D3QN (Dueling Double DQN) reinforcement learning algorithm as our solution strategy. After a series of numerical experiments, the effectiveness and feasibility of this algorithm in dealing with this problem were verified. The remaining structure of the article is as follows: Section 2 elaborates on the operations research abstraction and mathematical modeling for multi-dock berth allocation; Section 3 delves into the D3QN algorithm and its network structure; Section 4 empirically demonstrates the advantages of the D3QN algorithm through comparative analysis of experimental data; Section 5 inductively summarizes the entire article through in-depth discussions and looks forward to future research directions.

2. Problem Statement and Model Formulation

2.1. Multi-Container Terminal Berth Allocation for Computational Logistics

Under the current computational logistics theoretical framework, multiple container terminals located along the coast and belonging to the same organizational entity can be considered as a high-level computational node cluster in the global supply-chain environment, referred to as Heterogeneous Container Terminal Cluster Logistics Generalized Computation Systems (HCTC-LGCS). This concept further interprets terminal operations as a generalized decision-making and scheduling challenge involving Computation, Memory, and Switching (abbreviated as CMS) [17]. Within this framework, MDC-BAP is a pivotal component. Specifically, from the perspective of computational logistics, MDC-BAP can be seen as a specific variant of the Resource-Constrained Project Scheduling Problem (RCPSP). This problem possesses pronounced non-linear characteristics and deep coupling, leading to its extremely high computational complexity.
The Resource-Constrained Project Scheduling Problem (RCPSP) is a classical optimization problem in project management, which focuses on how to allocate resources to project activities under limited resource constraints in order to achieve specific objectives, such as the shortest project completion time [44,45]. At any given point in time in RCPSP, the amount of each resource available is limited, and each activity has a predetermined duration and a need for one or more resources. These resources may include manpower, machinery, materials, etc. The heart of the problem is how to schedule the start of activities to ensure that, for each resource, the demand does not exceed the amount available throughout the duration of the project, while respecting the precedence relationships between activities. This means that some activities must be completed before others can begin. RCPSP is a typical NPC problem, which means that as the number of activities and resources increases, the search for an optimal solution in the problem space becomes exponentially more difficult [46,47,48,49].
In order to describe RCPSP, researchers often use Activity-On-Arrow (AOA) directed graphs to represent it. Figure 1 shows a simple example describing the whole scheduling process. In this case, consider a renewable resource with availability 4. Each activity is represented by a circle, where the upper number indicates the most probable duration of the activity and the lower number indicates the activity’s demand for the resource. The arrows indicate the precedence relationships between activities. For example, Activity 1 most likely has a duration of 3 and a resource requirement of 2; its immediately preceding activity is Activity 0, and its immediately following activity is Activity 6. There are eight activities in the project, of which Activity 0 and Activity 7 are virtual activities representing the beginning and end of the project, respectively; they require no resources and consume no time. With this representation, the sequence of execution between tasks or activities becomes intuitive.
Further research found that MDC-BAP and RCPSP share similarities in several aspects. Specifically, every vessel awaiting assignment can be considered as an activity, and every available berth point at a container terminal can be regarded as a limited resource. The operation time of a vessel at a container terminal equates to the execution duration of an activity, and the waiting relationships between vessels imposed by the constraints form the dependencies between tasks.
Based on computational logistics, MDC-BAP can be abstracted as a special kind of RCPSP, called the Heterogeneous and Reconfigurable Resource Pool Constrained Project Scheduling Problem (HRRP-CPSP). Compared with the classical RCPSP, MDC-BAP not only needs to consider the arrival order of vessels and the quantity constraints of container terminal resources, but also needs to synthesize multi-dimensional factors such as vessel size, berthing requirements, and operation duration. Vessels in MDC-BAP arrive dynamically, so berth allocation decisions must be made dynamically. This presents a stark contrast to the static task scheduling in RCPSP. Meanwhile, MDC-BAP involves not only scheduling along the time axis but also spatial interaction between vessels and container terminals, which goes beyond the single resource type considered in RCPSP. Furthermore, no two terminals in the world are identical, and thus multiple terminal frontal shorelines can be abstracted as a distributed pool of heterogeneous and reconfigurable resources.
The MDC-BAP studied in this paper is defined as follows:
  • In the considered model, there are multiple terminals managed by the same operating entity. Each vessel can berth at only one of these terminals. Within each terminal, a continuous berthing allocation strategy is implemented based on its unique shoreline. Once a container vessel’s berthing position has been determined, its berthing operations are continuous and cannot be interrupted until it has completed its operations and departed from its current berth.
  • In the dynamic port arrival mechanism of vessels, each container-carrying vessel has a predetermined expected docking terminal before the berth allocation algorithm is executed. At the same time, the containers reserved for that vessel are pre-positioned in the yard. If there is a discrepancy between the vessel’s actual docking terminal and its expected one, its containers need to undergo a transfer operation.
  • The transfer of containers must be completed before the vessel arrives at the port, so that no additional time cost is added to the vessel’s berthing, loading, and unloading operations. The cost of container transfer is reflected in the transportation cost generated during the transfer process; other incurred costs are not taken into account.
  • Each vessel is given a latest departure time. The berth allocation strategy must ensure that the actual departure time of each vessel does not exceed this latest departure time.
  • Ships are assigned differentiated priorities. Berth allocations are sorted according to these priorities to ensure that ships are allocated in their order of priority.
  • For the same terminal shoreline, container vessels operating at the same time point must not overlap in either the temporal or the spatial dimension. Adjacent vessels need to maintain a prescribed safety interval, determined as fifteen percent of the vessel’s length. For calculation purposes, this safety distance is already included in the ship length data used in this study.
  • Regarding physical constraints, the water depth at the terminal where each vessel berths must exceed the vessel’s draft, and the vessel’s length must not exceed the physical length of the dock.
  • The potential impact of force majeure and contingencies on the efficiency of port operations is not considered.

2.2. Notation

All notation used in this paper is described below in alphabetical order.
Sets
  • $Q$: set of all quays, $Q = \{1, \ldots, |Q|\}$, where $|Q|$ is the total number of quays;
  • $V$: set of vessels arriving during the planning period, where $|V|$ is the total number of vessels.
Model-Related Notations
  • $A_i$: arrival time of vessel $i$, $i \in V$;
  • $B_q$: number of available berths at quay $q$, $q \in Q$;
  • $D_i$: latest departure time of vessel $i$, $i \in V$;
  • $L_q$: length of the shoreline of quay $q$, $q \in Q$;
  • $N_q$: number of berthing points at quay $q$, $q \in Q$;
  • $M$: a sufficiently large positive integer;
  • $Pr_i$: time required for vessel $i$ to pre-prepare containers, $i \in V$;
  • $P_{iq}$: $P_{iq} = 1$ if vessel $i$ is berthed at its predetermined quay $q$, and $P_{iq} = 0$ otherwise, $i \in V$, $q \in Q$;
  • $Q_i^0$: set of quays that can accommodate the draft of vessel $i$, $Q_i^0 \subseteq Q$, $i \in V$;
  • $W_i$: priority weight of vessel $i$, $i \in V$;
  • $b_i$: loading and unloading time of vessel $i$, $i \in V$;
  • $c_q^0$: number of gantry cranes at quay $q$, $q \in Q$;
  • $c_i^{min}$: minimum number of gantry cranes that can be allocated to vessel $i$, $i \in V$;
  • $c_i^{max}$: maximum number of gantry cranes that can be allocated to vessel $i$, $i \in V$;
  • $d_q^0$: water depth of quay $q$, $q \in Q$;
  • $d_i^1$: draft of vessel $i$, $i \in V$;
  • $l_i$: length of vessel $i$, $i \in V$;
  • $p_i$: quay where vessel $i$ plans to berth, $i \in V$;
  • $x_i$: time taken by vessel $i$ from anchorage to berth, $i \in V$;
  • $x_i^0$: time taken by vessel $i$ from berth to anchorage, $i \in V$;
  • $y_i$: preparation operation time required by vessel $i$, $i \in V$;
  • $y_i^0$: time required by vessel $i$ to clean up after loading and unloading, $i \in V$.
Decision variables
  • $A_i^0$: time at which vessel $i$ reaches its berthing point, $i \in V$;
  • $C_i$: time at which vessel $i$ departs its berthing point, $i \in V$;
  • $E_i$: time at which vessel $i$ departs the port, $i \in V$;
  • $S_i$: time at which vessel $i$ starts berthing, $i \in V$;
  • $T_i^w$: waiting time of vessel $i$, $i \in V$;
  • $T_{qq'}$: duration needed to move from the expected quay $q$ to the actual quay $q'$; $T_{qq'} = 0$ if $q = q'$, $q, q' \in Q$;
  • $Z_{ij}^q$: $Z_{ij}^q = 1$ if vessels $i$ and $j$ dock at the same quay, and $Z_{ij}^q = 0$ otherwise, $i, j \in V$, $i \neq j$, $q \in Q$;
  • $Z_{ij}^0$: $Z_{ij}^0 = 1$ if vessel $j$ starts berthing after vessel $i$ departs its berthing point, and $Z_{ij}^0 = 0$ otherwise, $i, j \in V$, $i \neq j$;
  • $b_{iq}$: $b_{iq} = 1$ if vessel $i$ docks at quay $q$, and $b_{iq} = 0$ otherwise, $i \in V$, $q \in Q$;
  • $c_i^t$: number of quay cranes serving vessel $i$ at time $t$;
  • $e_{ij}$: $e_{ij} = 1$ if vessels $i$ and $j$ dock at the same quay and vessel $j$ is berthed to the right of vessel $i$, and $e_{ij} = 0$ otherwise, $i, j \in V$, $i \neq j$;
  • $\sigma_{ij}$: $\sigma_{ij} = 1$ if vessel $j$ starts berthing after vessel $i$ departs the berth and both dock at the same quay, and $\sigma_{ij} = 0$ otherwise, $i, j \in V$, $i \neq j$.

2.3. Mathematical Model

The objective function (1) aims to minimize the time cost of all ships in port, which is the sum of their port stay duration, transit time and pre-storage time. It is noteworthy that in previous research [27], to achieve valid solutions, researchers often allowed vessels to exceed their latest departure time and only computed corresponding extension costs. However, in this study, it was explicitly stipulated that the departure time of vessels must not exceed their predetermined latest departure time, undoubtedly increasing the complexity of solving the problem.
$\min \sum_{i \in V} W_i \left( E_i - A_i + T_{qq'} + Pr_i \right)$ (1)
where $q$ and $q'$ denote the expected and actual berthing quays of vessel $i$, respectively.
In the modeling study of the vessel berthing process, a crucial premise is that the start time of a ship’s berthing must strictly follow its arrival at the port. Based on this premise, Constraint (2) is established to ensure that this condition is adhered to. Furthermore, Constraint (3) is introduced as a quantitative expression of the waiting time for ships. According to Constraint (3), the waiting time of a ship is defined as the time when the ship begins berthing minus the time of the ship’s arrival at the port.
$S_i \geq A_i, \quad \forall i \in V$ (2)
$T_i^w = S_i - A_i, \quad \forall i \in V$ (3)
Constraint (4) is utilized to define the time of vessel $i$’s arrival at the berth point. According to this constraint, the arrival time at the berth point is calculated as the sum of the time when the ship begins berthing, the time taken to travel from the anchorage to the berth, and the time required for preparatory operations. This expression ensures a comprehensive consideration of the ship’s transit time. Furthermore, Constraint (5) stipulates the uniqueness of ship berthing. This constraint guarantees that each ship can be docked at only one designated dock at any given time. This not only reflects the physical limitations in actual operations but also ensures the logical consistency of the model and its feasibility in practical application.
$A_i^0 = S_i + x_i + y_i, \quad \forall i \in V$ (4)
$\sum_{q \in Q} b_{iq} = 1, \quad \forall i \in V$ (5)
In the mathematical model for vessel berth allocation, the allocation of quay crane resources is a key factor. Each dock possesses a specific number of quay cranes, and this resource limitation is precisely reflected in Constraints (6)–(8). These constraints bound the number of quay cranes that can be allocated to a ship and ensure that, at any given time, the total number of quay cranes allocated to the vessels berthed at a dock does not exceed the number available at that dock.
$c_i^t \geq c_i^{min}, \quad \forall i \in V$ (6)
$c_i^t \leq c_i^{max}, \quad \forall i \in V$ (7)
$\sum_{i \in V} c_i^t \, b_{iq} \leq c_q^0, \quad \forall q \in Q$ (8)
In the multi-dock berth allocation problem, ensuring that the allocation between two vessels adheres to physical constraints is crucial. Specifically, Constraints (9)–(17) are established to ensure that two vessels berthed at the same quay do not overlap in time and space. This means that when allocating a berth point to any vessel, it must be ensured that the berth point is not occupied by another vessel during the allocated time period. The implementation of these constraints in the model ensures the logical and practical feasibility of berth allocation, providing a robust and efficient decision-support tool for port berth management.
$S_j \geq C_i - M(1 - Z_{ij}^0), \quad \forall i, j \in V, i \neq j$ (9)
$Z_{ij}^q \geq (b_{iq} - 1)M + (b_{jq} - 1)M + 1, \quad \forall i, j \in V, i \neq j, q \in Q$ (10)
$\sigma_{ij} \geq (Z_{ij}^q - 1)M + (Z_{ij}^0 - 1)M + 1, \quad \forall i, j \in V, i \neq j, q \in Q$ (11)
$S_j \geq C_i - M(1 - \sigma_{ij}), \quad \forall i, j \in V, i \neq j, Q_i^0 \cap Q_j^0 \neq \emptyset$ (12)
$e_{ij} \leq 1 + (Z_{ij}^q - 1)M, \quad \forall i, j \in V, i \neq j, q \in Q$ (13)
$p_i + l_i e_{ij} \leq p_j + M(1 - e_{ij}), \quad \forall i, j \in V, i \neq j, Q_i^0 \cap Q_j^0 \neq \emptyset$ (14)
$\sigma_{ij} + \sigma_{ji} + e_{ij} + e_{ji} \geq Z_{ij}^q, \quad \forall i, j \in V, i \neq j, q \in Q$ (15)
$\sigma_{ij} + \sigma_{ji} \leq Z_{ij}^q, \quad \forall i, j \in V, i \neq j, q \in Q$ (16)
$e_{ij} + e_{ji} \leq Z_{ij}^q, \quad \forall i, j \in V, i \neq j, q \in Q$ (17)
Constraint (18) explicitly stipulates that the docking point of a vessel must be within the quayside length of the dock. This provision ensures that the berthing position of the vessel does not exceed the actual physical boundaries of the dock, thereby maintaining the safety and effectiveness of port operations. Simultaneously, Constraint (19) emphasizes that the dock where a vessel docks must have sufficient water depth to meet the vessel’s draft requirements. The existence of this constraint is to ensure the safe berthing of the vessel at the dock, and also to avoid safety incidents that may be caused by insufficient water depth at the dock.
$L_q \geq p_i b_{iq} + l_i, \quad \forall i \in V, q \in Q$ (18)
$d_q^0 \geq d_i^1 - M(1 - b_{iq}), \quad \forall i \in V, q \in Q$ (19)
In the study of vessel berth allocation and subsequent operations, Constraints (20)–(22) define the time frame for vessel operations at the berth. Specifically, Constraint (20) dictates that the time a vessel leaves the berth point is the sum of its arrival time at the berth point plus the required time for cleaning and loading or unloading containers. Furthermore, Constraint (21) defines the departure time of the vessel, which is the time it leaves the berth point plus the time taken to travel from the berth point to the anchorage. This definition reflects the entire time process of a vessel from completing loading/unloading operations to actually leaving the port. Finally, Constraint (22) imposes a limit on the departure time of the vessel, ensuring it does not exceed the agreed latest departure time. This constraint is set to maintain the orderliness of port operations and the reliability of vessel operational schedules, while also considering the contractual obligations between shipping companies and port administrators.
$C_i = A_i^0 + y_i^0 + b_i, \quad \forall i \in V$ (20)
$E_i = C_i + x_i^0, \quad \forall i \in V$ (21)
$E_i \leq D_i, \quad \forall i \in V$ (22)
Finally, Constraints (23)–(26) represent the domain definition of the decision variables.
$\sigma_{ij}, e_{ij} \in \{0, 1\}, \quad \forall i, j \in V, i \neq j, Q_i^0 \cap Q_j^0 \neq \emptyset$ (23)
$b_{iq} \in \{0, 1\}, \quad \forall i \in V, q \in Q_i^0$ (24)
$Z_{ij}^q \in \{0, 1\}, \quad \forall i, j \in V, i \neq j, q \in Q$ (25)
$Z_{ij}^0 \in \{0, 1\}, \quad \forall i, j \in V, i \neq j$ (26)
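To make the timing structure of the model concrete, the following minimal sketch encodes a handful of the constraints above, namely (2), (4), (5), and (20)–(22), in PuLP with two fictitious vessels and one quay. All data values are made up, the transfer and pre-storage terms of objective (1) are dropped, and the big-M non-overlap constraints (9)–(17) are omitted for brevity; the paper itself solves the complete model with CPLEX.
```python
# A minimal sketch of the timing core of the model, with assumed data.
import pulp

V = [0, 1]                      # vessels
Q = [0]                         # quays
A  = {0: 2, 1: 4}               # arrival times A_i
D  = {0: 40, 1: 45}             # latest departure times D_i
x  = {0: 1, 1: 1}               # anchorage-to-berth travel times x_i
x0 = {0: 1, 1: 1}               # berth-to-anchorage travel times x_i^0
y  = {0: 1, 1: 1}               # preparation times y_i
y0 = {0: 1, 1: 1}               # clean-up times y_i^0
b  = {0: 10, 1: 12}             # handling times b_i
W  = {0: 1, 1: 1}               # priority weights W_i

m = pulp.LpProblem("MDC_BAP_sketch", pulp.LpMinimize)
S  = pulp.LpVariable.dicts("S",  V, lowBound=0)   # berthing start times
A0 = pulp.LpVariable.dicts("A0", V, lowBound=0)   # arrival at berth point
C  = pulp.LpVariable.dicts("C",  V, lowBound=0)   # departure from berth point
E  = pulp.LpVariable.dicts("E",  V, lowBound=0)   # departure from port
bq = pulp.LpVariable.dicts("bq", [(i, q) for i in V for q in Q], cat="Binary")

m += pulp.lpSum(W[i] * (E[i] - A[i]) for i in V)  # objective (1), transfer terms dropped
for i in V:
    m += S[i] >= A[i]                             # (2)
    m += A0[i] == S[i] + x[i] + y[i]              # (4)
    m += pulp.lpSum(bq[(i, q)] for q in Q) == 1   # (5)
    m += C[i] == A0[i] + y0[i] + b[i]             # (20)
    m += E[i] == C[i] + x0[i]                     # (21)
    m += E[i] <= D[i]                             # (22)

m.solve(pulp.PULP_CBC_CMD(msg=False))
print({i: (pulp.value(S[i]), pulp.value(E[i])) for i in V})
```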

3. Solutions of the Formulated Problem

In this section, the MDC-BAP is discussed from the perspective of reinforcement learning, and subsequently we introduce a solution method based on D3QN. To evaluate this method, we also list four other solution strategies as references, including CPLEX, PPO, DQN, and Dueling DQN.

3.1. Reinforcement Learning

Reinforcement learning focuses on strategies whereby an agent interacts with its environment to optimize long-term rewards; its theoretical framework is the Markov Decision Process (MDP). Unlike supervised learning, which relies on input–output pairings, reinforcement learning determines the best strategy through exploration and exploitation in unknown environments. Key components include states $S$, actions $A$, and rewards $R$. The goal is to determine an action for each state that maximizes cumulative rewards while balancing immediate and long-term rewards. The state value function $V(s_t)$ describes the expected cumulative reward obtained from state $s_t$ onward under a particular policy, and is defined as follows:
$V(s_t) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t R_t \,\middle|\, S_t = s_t \right]$ (27)
The action value function $Q(s_t, a_t)$ indicates the expected reward for executing action $a_t$ in state $s_t$ and thereafter following the policy; it is defined as follows:
$Q(s_t, a_t) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t R_t \,\middle|\, S_t = s_t, A_t = a_t \right]$ (28)
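In practice, the expectations in Equations (27) and (28) are estimated from sampled trajectories rather than computed in closed form. The following sketch, with made-up rewards, shows how the discounted return of a single episode is accumulated; in the algorithms below, neural networks approximate these quantities instead.
```python
# A sketch of the discounted return underlying Eqs. (27)-(28); rewards are made up.
def discounted_return(rewards, gamma=0.99):
    """Compute sum_t gamma^t * R_t for one episode."""
    g = 0.0
    for r in reversed(rewards):   # backward pass avoids explicit powers of gamma
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 2.0]))  # 1 + 0.99^2 * 2 = 2.9602
```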

3.2. Construction of Partially Observable Markov Decision Models

To determine the optimal berthing strategy, this study formulates the MDC-BAP as a Markov Decision Process (MDP). Since the agent observes only partial information about the vessels awaiting berthing and the status of the quays, this decision-making problem can be viewed as a Partially Observable Markov Decision Process (POMDP). In this construction, the three core elements are the state space, action space, and reward function. When making berth assignments, the agent receives the current state $s_t$ and produces the corresponding berthing decision $a_t$. Subsequently, the environment provides feedback in the form of a reward $r_t$. By continuously iterating this process and accumulating a large amount of experiential data, a data-driven approach is used to update the model, leading to gradual optimization of the strategy.

3.3. Construction of the State

In the application of deep reinforcement learning, constructing the state space is crucial. An accurate and efficient state space can enhance the performance of the model. When formulating state features, the two key factors are the representativeness of the features and their correlation. To ensure the effectiveness of decision making, using the latest sequence of $N$ continuous observations is crucial and helps neural networks learn strategies more accurately. Although increasing the number of observations in the observation sequence can provide more comprehensive environmental information, excessive information might adversely affect computation speed, thereby affecting the timeliness of decision making. For MDC-BAP, we choose $N$ to be 10, meaning that the latest 10 consecutive observations form the observation sequence; the observed information at time $t$ can be divided into three main parts, as shown in Table 1, and the observation sequence can be represented as $S_t = (B_q, E, V_s)$.
These three feature groups capture the core information of the state: the berth status of the quays, $B_q$, presents the usage status of each dock, providing a foundation for berth allocation decisions; the vessel’s dynamic variables $E$, as the key dynamic factor in the decision-making process, occupy a central position in the berth allocation strategy; and the vessel’s static information variables $V_s$ provide the basic constraints for berth allocation decisions, ensuring the feasibility and rationality of the allocation.
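A possible way to assemble the observation sequence $S_t = (B_q, E, V_s)$ is sketched below. The field contents and dimensions are illustrative assumptions and do not reproduce the exact feature layout of Table 1.
```python
# A sketch of observation assembly; dimensions and field meanings are assumed.
import numpy as np

N = 10  # length of the observation window (the last N observations)

def build_observation(berth_status_hist, vessel_dyn_hist, vessel_static):
    """Stack the last N berth-status and dynamic snapshots; static info stays flat.

    berth_status_hist: list of arrays, each of shape (num_berth_points,),
                       encoding occupancy of every berthing point (B_q)
    vessel_dyn_hist:   list of arrays, each of shape (dyn_dim,),
                       e.g. waiting time so far, queue position (E)
    vessel_static:     array of shape (static_dim,),
                       e.g. length, draft, handling time, priority (V_s)
    """
    B_q = np.stack(berth_status_hist[-N:])   # (N, num_berth_points)
    E   = np.stack(vessel_dyn_hist[-N:])     # (N, dyn_dim)
    V_s = np.asarray(vessel_static)          # (static_dim,)
    return B_q, E, V_s
```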

3.4. Construction of the Action

The design of the action space in reinforcement learning is crucial. It directly affects the efficiency and effectiveness of the learning algorithm; designing an appropriate action space can not only accelerate learning but also ensure that the final strategy is more reasonable and practical. For complex decision-making problems, especially those involving multiple independent but interrelated choices, the design strategy of combined actions becomes a direction worth exploring. For example, let “Time $t$, Quay 1, Berth Point 1” be represented as $a_t = 1$. With this notation, each action corresponds to a specific dock and berth point combination, which ensures that the strategy considers both the dock and the berth point when making decisions. The specific definition is as follows:
(1) Berth Decision 1: choose berth point $i$ at quay $q$:
$a = \sum_{k=1}^{q-1} N_k + i, \quad 0 < q \leq |Q|,\ 0 < i \leq N_q$ (29)
(2) Berth Decision 2: wait:
$a = \sum_{q \in Q} N_q + 1$ (30)
The action space in berth allocation problems encompasses all potential actions a vessel might take, such as choosing a specific available berth for docking or deciding not to perform a particular action for the moment. When constructing the action space, it is essential to deeply consider environmental constraints, including the berth capacity of each dock, water depth, and other factors, while also taking into account attributes of the vessel, such as its draft and length. An inappropriate design of the action space might hinder the agent from learning effective strategies. Therefore, the meticulous design of the action space should be based on the actual characteristics of the problem and ensure its compatibility with the state space. If the chosen berth does not meet certain constraints, the vessel needs to adopt a waiting behavior.
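The combined action encoding of Equations (29) and (30) reduces to simple index arithmetic, as the following sketch shows; the per-quay berth point counts $N_k$ are assumed values.
```python
# A sketch of the action index mapping of Eqs. (29)-(30); N values are assumed.
N = [150, 280]           # berthing points per quay (illustrative)
WAIT = sum(N) + 1        # action index of "wait", Eq. (30)

def encode(q, i):
    """Action index for berth point i (1-based) at quay q (1-based), Eq. (29)."""
    return sum(N[:q - 1]) + i

def decode(a):
    """Map an action index back to (quay, berth point), or None for 'wait'."""
    if a == WAIT:
        return None
    for q, n_q in enumerate(N, start=1):
        if a <= n_q:
            return q, a
        a -= n_q

assert encode(2, 3) == 153 and decode(153) == (2, 3)
```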

3.5. Construction of the Reward

The reward function plays a crucial role in evaluating policy performance and guiding policy optimization in reinforcement learning. This study aims to minimize the time vessels spend in the port through this function. Although the design of the reward function can integrate various evaluation criteria, such as economic losses due to waiting times, dock usage efficiency, and transportation costs, it was observed that the objective function is mainly influenced by waiting times and vessel transfer times, while other time losses can be treated as constants. In this study, we selected the negative sum of the waiting time and the transfer time caused by berthing as the immediate reward for each berth allocation. To guarantee that the single-step reward is non-negative, a constant $G$ was added. The specific definition is as follows:
$r_t = G - (T_i^w + T_{qq'}), \quad i \in V,\ q, q' \in Q$ (31)
$R = \sum_{t \in T} r_t$ (32)
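The immediate reward of Equation (31) reduces to a one-line function. The offset $G$ below is an assumed value, since this section fixes only its role (keeping single-step rewards non-negative), not its magnitude.
```python
# A sketch of the step reward of Eq. (31); the value of G is an assumption.
G = 100.0  # assumed offset, large enough to dominate typical time losses

def step_reward(waiting_time, transfer_time):
    """r_t = G - (T_i^w + T_qq'), Eq. (31)."""
    return G - (waiting_time + transfer_time)

# Episode return R of Eq. (32), with made-up per-step times:
episode_return = sum(step_reward(w, tr) for w, tr in [(4.0, 0.0), (2.5, 6.0)])
```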

3.6. Dueling Double DQN

The Dueling Double DQN (D3QN) algorithm integrates both the Double DQN and Dueling DQN methods. Considering that both Double DQN and Dueling DQN target the intrinsic limitations of DQN to provide enhancements, it is necessary to first briefly review the basic principles of the DQN algorithm to gain a deeper understanding of D3QN’s characteristics and contributions.
The DQN algorithm adopts an off-policy learning mechanism and approximates the Q-value function of Q-learning through neural networks [50,51,52]. This strategy falls under the category of value-based reinforcement learning. It has shown significant advantages in dealing with high-dimensional computation and decision-making problems. Notably, DQN comprises an evaluation Q-network and a target Q-network. The target Q-values of the target Q-network can be calculated using the following formula:
$Q_{target} = r + \gamma \, (1 - done) \, \max_{a'} Q(s', a'; \theta')$ (33)
Additionally, the loss function is defined as
$Loss(\theta) = \left( Q_{target} - Q(s, a; \theta) \right)^2$ (34)
where $\gamma$ denotes the discount factor, $\theta$ the parameters of the evaluation Q-network, and $\theta'$ the parameters of the target Q-network; $done$ indicates whether the successor state $s'$ is terminal, being set to 1 if it is and 0 otherwise. While DQN excels in various aspects, it still faces the problem of overestimation during the Q-value learning process. To address this dilemma, Double DQN (DDQN) was introduced. Its core concept utilizes two Q-networks: one dedicated to selecting the next action, while the other is responsible for estimating the Q-value of that action. This separation significantly reduces the phenomenon of Q-value overestimation. The DDQN target Q-values can be calculated using the following formula:
$Q_{target} = r + \gamma \, (1 - done) \, Q\left(s', \arg\max_{a'} Q(s', a'; \theta); \theta'\right)$ (35)
To more accurately estimate Q-values in deep network structures, Dueling DQN was proposed. The design philosophy of this strategy is based on differentiated considerations of state values and action advantage values. The state–action function can be represented as:
$Q(s, a; \theta) = V(s; \theta) + \left[ A(s, a; \theta) - \frac{1}{|A|} \sum_{a'} A(s, a'; \theta) \right]$ (36)
The state value function $V$ is used to estimate the expected return in a given state, while the action advantage function $A$ measures the relative advantage of taking a certain action in a specific state compared to the average action. The term $\frac{1}{|A|} \sum_{a'} A(s, a'; \theta)$ is the mean of the action advantage function $A$, which provides a benchmark for assessing the relative utility of each action.
Meanwhile, the Dueling DQN algorithm uses the same method as DQN to calculate the target Q-value:
$Q_{target} = r + \gamma \, (1 - done) \, \max_{a'} Q(s', a'; \theta')$ (37)
The D3QN algorithm integrates the previously mentioned techniques, drawing on their essence to enhance algorithmic performance. With the robustness of DQN, the overestimation correction strategy of DDQN, and the precision of Dueling DQN, D3QN constructs an efficient and stable solution framework for complex reinforcement learning tasks. The basic framework of D3QN is shown in Figure 2 and the iteration and target Q-values of the D3QN algorithm can be expressed as
$Q(s, a; \theta) = V(s; \theta) + \left[ A(s, a; \theta) - \frac{1}{|A|} \sum_{a'} A(s, a'; \theta) \right]$ (38)
$Q_{target} = r + \gamma \, (1 - done) \, Q\left(s', \arg\max_{a'} Q(s', a'; \theta); \theta'\right)$ (39)
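The two ingredients that distinguish D3QN, the dueling recombination of Equation (38) and the decoupled action selection/evaluation of Equation (39), can be sketched in a few lines of PyTorch. Layer sizes here are illustrative assumptions; the actual network used in this paper is described in Section 3.7.
```python
# A minimal sketch of the D3QN ingredients; layer sizes are assumed.
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Recombines state value V and advantages A as in Eq. (38)."""
    def __init__(self, feat_dim, n_actions):
        super().__init__()
        self.value = nn.Linear(feat_dim, 1)              # V(s)
        self.advantage = nn.Linear(feat_dim, n_actions)  # A(s, a)

    def forward(self, feats):
        v = self.value(feats)
        a = self.advantage(feats)
        return v + a - a.mean(dim=1, keepdim=True)       # Eq. (38)

@torch.no_grad()
def d3qn_target(online, target, r, s_next, done, gamma=0.99):
    """Eq. (39): the online net selects the action, the target net evaluates it."""
    best_a = online(s_next).argmax(dim=1, keepdim=True)   # selection: online net
    q_next = target(s_next).gather(1, best_a).squeeze(1)  # evaluation: target net
    return r + gamma * (1.0 - done) * q_next
```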

3.7. Network Infrastructure

Data from the current observation time and the preceding nine time points were chosen in this research, constituting the observation sequence used as the model’s input. The observation sequence at each moment can be divided into three main parts: (1) the vessel’s static information variables $V_s$; (2) the berth status of the quays $B_q$; and (3) the vessel’s dynamic variables $E$. Considering that both $B_q$ and $E$ are time-series data, an LSTM (Long Short-Term Memory) network was selected for extracting their features, while $V_s$ is transformed through a fully connected layer. After individual processing, the three sets of features are integrated and fed into a Transformer network for further feature fusion. The output part of the network is a fully connected layer that decides on the appropriate action. The specific parameters and settings of the model are detailed in Table 2, and the structural details of the model are illustrated in Figure 3.
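The following PyTorch sketch mirrors the structure just described: two LSTMs for the time-series inputs $B_q$ and $E$, a fully connected layer for $V_s$, and a Transformer encoder fusing the three before the output layer. All dimensions and layer counts are assumptions; the actual settings are those of Table 2, and in the full model the final linear layer would be replaced by the dueling head of Section 3.6.
```python
# A structural sketch of the feature network; all dimensions are assumed.
import torch
import torch.nn as nn

class BerthNet(nn.Module):
    def __init__(self, berth_dim, dyn_dim, static_dim, d_model=64, n_actions=431):
        super().__init__()
        self.berth_lstm = nn.LSTM(berth_dim, d_model, batch_first=True)  # B_q series
        self.dyn_lstm = nn.LSTM(dyn_dim, d_model, batch_first=True)      # E series
        self.static_fc = nn.Linear(static_dim, d_model)                  # V_s
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, n_actions)  # dueling head in the full model

    def forward(self, b_q, e, v_s):
        # b_q: (batch, N, berth_dim); e: (batch, N, dyn_dim); v_s: (batch, static_dim)
        _, (h_b, _) = self.berth_lstm(b_q)        # last hidden state of the B_q series
        _, (h_e, _) = self.dyn_lstm(e)            # last hidden state of the E series
        tokens = torch.stack([h_b[-1], h_e[-1], self.static_fc(v_s)], dim=1)
        fused = self.fusion(tokens)               # (batch, 3, d_model)
        return self.out(fused.mean(dim=1))        # Q-values for all actions
```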

3.8. D3QN Algorithm for MDC-BAP

In the study of the Multi-Terminal Dynamic and Continuous Berth Allocation Problem (MDC-BAP), this research incorporates more stringent and realistic constraints than previous works. Unlike earlier studies that often overlooked the requirement for vessels to depart within a specified time, or only involved scenarios with a limited number of vessels, this research focuses on simulating actual port operations in which many vessels need to depart within an agreed timeframe. In real-world port operations, the large number of vessels and the strict requirements on departure times add complexity to the problem. Under such conditions of high vessel numbers and stringent constraints, traditional heuristic algorithms often struggle to generate effective solutions.
To address this issue, the D3QN algorithm is designed to solve MDC-BAP. This algorithm is aimed at tackling the complexity of high-dimensional vessel problems and the dynamism of vessel arrivals, which are often challenging to manage effectively with traditional methods. The core strength of the D3QN algorithm lies in its ability to process and analyze large-scale datasets through deep learning techniques, thereby adapting to changing conditions and emerging patterns. In the context of high-dimensional vessel numbers and strong constraints, the D3QN algorithm not only improves the efficiency of berth allocation but also optimizes berth utilization and minimizes waiting times through learning from historical data. Additionally, the algorithm takes into account the port’s capacity for rapid response to dynamic vessel arrival patterns. Through continuous learning and strategy adjustment, the D3QN algorithm is better equipped to handle the uncertainties and complexities in port operations, providing a flexible and efficient solution for vessel berth allocation.
The core steps of the multi-dock berth allocation method based on the D3QN algorithm are as follows (Algorithm 1).
Algorithm 1: D3QN algorithm for MDC-BAP
Input:
Training iteration count $K$, training sample length $T$, vessel number $V_n$;
Initialize the current Q-network parameters $\theta$ and the target Q-network parameters $\theta'$;
Copy the Q-network parameters to the target network, $\theta' \leftarrow \theta$;
Initialize the observation sequence $S_t$, decay factor $\gamma$, exploration rate $\epsilon$, target Q-network update frequency $c$, replay buffer $R_0$ with capacity $N$, mini-batch size $M_{batch}$, and training step counter total_steps = 0;
Initialize the arrived vessel list $V_{arrive}$ and the berthing vessel list $V_{berth}$.
Output: Final vessel berth allocation table
1. For $k \leftarrow 0$ to $K-1$ do
2.  Reset the environment and get the observation sequence $S_0$;
3.  Initialize Reward = 0, ep_reward = 0, t = 0, is_terminal = False, step_redo = False;
4.  For $t \leftarrow 0$ to $T-1$ do
5.   Compute $V_{arrive}$ and $V_{berth}$;
6.   For $i \leftarrow 1$ to length($V_{berth}$) do
7.    If vessel $i$ has finished berthing then
8.     Update $S_t$; vessel $i$ leaves its berth point;
9.    End if
10.    End for
11.    Sort $V_{arrive}$ by priority $W_i$;
12.    For $i \leftarrow 1$ to length($V_{arrive}$) do
13.     Input $S_t$ into the current Q-network and calculate the Q-value of each action in $A_t$;
14.     Use $\epsilon$-greedy to select vessel $i$’s berth action $a_t$ from $A_t$;
15.     Update the next observation sequence $S_{t+1}$; get the reward $R_t$ and the completion flag is_terminal;
16.     Store <$S_t$, $a_t$, $S_{t+1}$, $R_t$, is_terminal> in the replay buffer $R_0$;
17.     If total_steps > 200 then
18.      Sample a mini-batch of size $M_{batch}$ from the replay buffer $R_0$ for learning;
19.      Update $Q_{target}$ by Formula (39);
20.      Use Formula (34) to calculate the squared-error loss, then perform backward gradient propagation to update the parameters $\theta$;
21.     End if
22.    End for
23.   End for
24.  End for
The algorithm parameter details are as follows (Table 3).
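Lines 17–20 of Algorithm 1 condense to the following learning step, in which a mini-batch is sampled from the replay buffer and the squared-error loss of Formula (34) is minimized against the target of Formula (39). The state is flattened to a single tensor here for brevity, and all hyperparameter values other than the warm-up threshold of 200 steps are assumptions.
```python
# A condensed sketch of the learning step; hyperparameters are assumed.
import random
from collections import deque
import torch

buffer = deque(maxlen=100_000)  # replay buffer R_0; each entry is a tuple of
                                # same-shaped tensors (s, a, s_next, r, done)

def learn_step(online, target, optimizer, batch_size=64, gamma=0.99):
    batch = random.sample(list(buffer), batch_size)
    s, a, s_next, r, done = (torch.stack(x) for x in zip(*batch))
    q = online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)   # Q(s, a; theta)
    with torch.no_grad():                                       # target, Formula (39)
        best = online(s_next).argmax(dim=1, keepdim=True)       # online net selects
        y = r + gamma * (1 - done) * target(s_next).gather(1, best).squeeze(1)
    loss = torch.nn.functional.mse_loss(q, y)                   # Formula (34)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```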

4. Simulation Results and Discussion

In this section, we aim to provide a detailed description of the test dataset constructed for MDC-BAP, and further explore a comparative experimental analysis between the D3QN algorithm and other mainstream algorithms applicable to this problem. All experiments were coded and tested in the PyCharm integrated development environment. The running environment consisted of an AMD Ryzen 5 3600 6-core CPU @ 3.60 GHz (AMD, Santa Clara, CA, USA), 128 GB RAM, and a 64-bit Windows 10 operating system. The programming language used was Python, specifically Python 3.8.

4.1. Introduction to Port Production Services

The research context of this paper is based on the joint operations of Quay A and Quay B, managed and operated by a certain port group limited company on the southeast coast (hereinafter referred to as the port group). Test instances are generated based on the actual arrival of vessels to analyze the performance of the models and algorithms designed in this paper. The port group centrally manages and coordinates both Quay A and Quay B. Quay B has a shoreline length of 2800 m with a maximum natural water depth of 17 m, while Quay A has a shoreline length of 1500 m with a water depth of 14 m. Containers can be transshipped between these two terminals. Based on the water depth of the berth, the terminals are classified into two types: those with a depth of 14 m or less are classified as Type 1 terminals, and those with a depth of more than 14 m as Type 2 terminals. Hence, Quay A belongs to Type 1, while Quay B belongs to Type 2. Table 4 displays the relevant data for the terminals, where the container loading and unloading efficiency of a single gantry crane is uniformly set at 72 standard containers (Twenty-foot Equivalent Units, TEU) per hour.

4.2. Introduction to Container Vessel Information

Quay A and Quay B mainly serve medium to large container mainline and feeder vessels. According to reference [53] on the core attributes of container vessels, information for 12 types of container vessels calling at the port was randomly generated. The related information for these vessels includes the vessel’s length, draft, cargo load, container loading and unloading ratio, vessel contribution at the port, operation time, anchorage-to-berth time (berthing time), vessel operation preparation time, vessel clearance operation time, berth-to-channel time (departure time), the maximum time a vessel can stay in port, minimum quay crane allocation, maximum quay crane allocation, and priority.
These container vessels arrive at the port according to the container liner schedule set by the shipping company, and the estimated arrival time matches the actual arrival time of the vessel. The vessel’s berthing operation time is directly related to the import and export container volume and the current number of quay cranes allocated to the vessel. According to the principle of quay crane operations, unloading is performed before loading. Each vessel has a desired berthing terminal, and the containers waiting for loading are stored in the container yard of the desired terminal, while containers needing transshipment are quickly transported to the actual berthing terminal after the vessel’s berthing plan is determined. According to the terminals’ depth requirements, the first six types of vessels can berth at both Terminal A and Terminal B and are transferable-type container vessels with a maximum port stay of 45~48 h. The latter six types of vessels can only berth at Terminal B, with a maximum stay of 48~50 h. The latest departure time for a vessel is its expected arrival time plus its maximum port stay time. The maximum safety distance required along a container terminal’s shoreline is 5% of the vessel’s length. Thus, the berthing distance between two vessels along the same quay should be no less than the maximum safety distance required by both vessels (for calculation convenience, the vessel’s length in this study already includes the safety distance).
Based on the actual conditions of the dock and the definitions of the berthing vessels, this paper generated test cases according to the following rules: the dock’s front shoreline was divided into units of 10 m; a planning cycle of one week (a total of 168 h) was selected, with a time scale of 15 min; and within the planning period, based on different scale cases (LE1~LE4), four instances were produced, and each instance led to the creation of five test cases, thus resulting in a total of 20 test cases. The detailed description of the cases is shown in Table 5.
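For illustration, the generation rules above translate into roughly the following sketch. The arrival distribution and per-type data are placeholders, not the paper’s actual figures; real instances follow the vessel attributes of Section 4.2 and the case definitions of Table 5.
```python
# A sketch of instance generation under the stated rules; data are placeholders.
import random

HORIZON = 168 * 4          # one week in 15-minute steps
UNITS_A = 1500 // 10       # Quay A shoreline in 10 m units
UNITS_B = 2800 // 10       # Quay B shoreline in 10 m units

def generate_instance(n_vessels, seed=0):
    rng = random.Random(seed)
    vessels = []
    for i in range(n_vessels):
        vtype = rng.randrange(12)                 # one of the 12 vessel types
        arrival = rng.randrange(HORIZON)          # ETA equals actual arrival time
        max_stay = rng.randrange(45 * 4, 50 * 4)  # 45-50 h; type-dependent in the paper
        vessels.append({"id": i, "type": vtype, "arrival": arrival,
                        "latest_departure": arrival + max_stay})
    return vessels
```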

4.3. Experimental Verification and Result Analysis

To validate the efficacy of the D3QN algorithm on MDC-BAP, experiments were conducted on all instances presented in Table 5. Table 6 and Figure 4, Figure 5, Figure 6 and Figure 7 compare the CPLEX, Proximal Policy Optimization (PPO), DQN, Dueling DQN, and D3QN algorithms on MDC-BAP across the different instances. In the current experiment, the time cost of vessels in the port was used as the evaluation metric. For CPLEX, the maximum solution time was set to 54,000 s (equivalent to 15 h), with a minimum gap value of 5%.
Through a meticulous analysis of Table 6, we systematically evaluated the performance of the D3QN algorithm in multi-dock berth allocation tasks against other mainstream methods. As a commonly used deterministic optimization method, CPLEX is often regarded as the gold standard for such problems, capable of outputting optimal or near-optimal solutions. Hence, this paper used the CPLEX solution as the benchmark for evaluation.
When the number of vessels was 85, the D3QN algorithm improved on CPLEX by approximately 0.45%. When the vessel count reached 90, the improvement of the D3QN algorithm over CPLEX rose to 2.98%. This suggests that the D3QN algorithm has a significant performance advantage for medium-sized problems. Upon examining the scenarios with 95 and 100 vessels, the optimization benefits of D3QN over CPLEX became even more pronounced, reaching 5.63% and 5.89%, respectively. This further validates the exceptional performance of D3QN in handling large-scale problems. From a micro-perspective, D3QN surpasses the other deep learning methods in most scenarios. Notably, as the number of vessels increased, the improvement of the D3QN algorithm over the DQN and Dueling DQN algorithms became even more apparent. This further indicates that the D3QN algorithm can compensate for the shortcomings of DQN and Dueling DQN, laying a solid theoretical foundation for the practical application of the D3QN algorithm in multi-container terminal berth allocation tasks. Figure 8, Figure 9, Figure 10 and Figure 11 illustrate the berth allocation scheduling charts generated by the D3QN algorithm when solving the four instances with different vessel quantities, which intuitively present the final berth states of the MDC-BAP.
As shown in Figure 12, when evaluating the average computational time cost, it was evident that the computation time of the reinforcement learning algorithms was significantly less than that of CPLEX. Specifically, the difference in computation time among the D3QN, DQN, and Dueling DQN algorithms was not significant. The PPO algorithm stood out due to its unique online learning update strategy: it updates the model only after all vessels have been berthed. However, this online learning strategy has shown certain limitations in the multi-terminal berth allocation problem. Specifically, it might lead to vessels exceeding the maximum allowed time in port, which in turn may result in an inability to find feasible solutions that satisfy all constraints in certain scenarios.
To delve deeper into the convergence properties of the D3QN algorithm on a specific model, this study conducted deep learning experiments on a representative set of instances from the LE4 case. The corresponding convergence curves are shown in Figure 13. It can be observed that, as the agent continues to learn and iterate, the berth allocation strategy approaches a stable state. While slight fluctuations in results can be observed in the later stages of learning, most of these fluctuations are due to the randomness in task selection. This further validates the algorithm’s adaptability and robustness in addressing complex problems.

4.4. Generalization Experiments

The core objective of reinforcement learning is to train an agent capable of making efficient decisions in diverse environments. Complementary to this, rigorously evaluating the model’s generalization ability is a crucial step in assessing its true value. This is distinctly different from traditional supervised learning: generalization in reinforcement learning implies that the agent should not only excel in environments it has been trained in but also maintain its decision-making efficiency in unseen, perhaps slightly altered or more complex environments. To validate this core capability, this study specifically designed 10 different test cases with the same number of berths, aiming to delve into the generalization performance of the D3QN algorithm in varying environments. Detailed experimental results and data can be found in Table 7.
When applying the trained network model to test instances, we recorded the performance metrics in detail and made an in-depth comparison with the CPLEX method to assess the generalization ability of the algorithm. Table 7 reveals that when the number of ships is relatively limited (e.g., 85 ships), the performance of the algorithm is not very satisfactory, with only two sets of solutions surpassing the output of CPLEX, and one set failing to obtain a valid solution. However, as the number of ships increased, the reinforcement learning algorithm demonstrated excellent berth allocation capabilities across multiple test instances. Notably, since the D3QN algorithm does not require retraining of the model, its computational cost is significantly lower than CPLEX. In contrast, CPLEX requires a complete recomputation process when faced with new cases.

5. Conclusions

This study conducted a comprehensive and systematic investigation into the Multi-Terminal Dynamic and Continuous Berth Allocation Problem (MDC-BAP), with the objective of optimizing berth allocation strategies by minimizing the dwell time of vessels at the port. Given the limitations of traditional heuristic algorithms in handling high-dimensional vessel issues under stringent constraints, this research adopted a deep reinforcement learning approach—the Dueling Double DQN (D3QN) algorithm. The adoption of this algorithm primarily aimed to address the high-dimensional challenges posed by the dynamic arrival of vessels, particularly in complex scenarios where traditional methods struggle. Through comparative experiments with the commercial optimization tool CPLEX and other reinforcement learning algorithms (such as DQN and Dueling DQN), the D3QN algorithm demonstrated significant advantages in handling MDC-BAP. Compared to CPLEX, the D3QN algorithm achieved notable success in reducing the dwell time of vessels in port and also displayed superiority in terms of computational time costs. This finding holds significant practical implications for enhancing port operational efficiency and alleviating port congestion. Compared to the DQN and Dueling DQN algorithms, the D3QN algorithm exhibited superior performance, especially in addressing issues of overestimated Q-values and insufficient precision in Q-value estimation. Through its unique dual learning structure and optimized strategies, the D3QN algorithm effectively avoided overestimation of Q-values, thereby enhancing the accuracy and reliability of decision making, which also confirms the effectiveness of the D3QN algorithm in solving high-dimensional challenges associated with the MDC-BAP.
Regarding future research directions, these include further optimization of the reward function, exploration of more strategic vessel scheduling schemes, and implementation of joint optimization of multiple dock berth allocations and scheduling. These measures will provide more comprehensive and integrated solutions for MDC-BAP, further enhancing the efficiency and effectiveness of port operations.

Author Contributions

Conceptualization, B.L. and Z.Y.; Methodology, B.L. and C.Y.; Software, B.L. and C.Y.; Validation, B.L. and Z.Y.; Formal analysis, B.L., C.Y. and Z.Y.; Investigation, B.L. and Z.Y.; Resources, B.L. and Z.Y.; Data curation, B.L. and C.Y.; Writing – original draft, B.L., C.Y. and Z.Y.; Writing – review & editing, B.L. and Z.Y.; Visualization, B.L. and C.Y.; Supervision, B.L. and Z.Y.; Project administration, B.L. and Z.Y.; Funding acquisition, B.L. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanities and Social Science Programming Foundation of Ministry of Education in China (19YJA630031) and the National Natural Science Foundation of China (72072097).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lee, P.T.-W.; Zhang, Q.; Suthiwartnarueput, K.; Zhang, D.; Yang, Z. Research Trends in Belt and Road Initiative Studies on Logistics, Supply Chains, and Transportation Sector. Int. J. Logist. Res. Appl. 2020, 23, 525–543.
  2. Notteboom, T.; Rodrigue, J.P. Containerisation, Box Logistics and Global Supply Chains: The Integration of Ports and Liner Shipping Networks. Marit. Econ. Logist. 2008, 10, 152–174.
  3. Kim, K.H.; Haralambides, H. Smart Operations Planning in Container Terminals: Integrating Algorithms with Our Practical Knowledge Base. Marit. Econ. Logist. 2021, 23, 1–3.
  4. Ishii, M.; Lee, P.T.-W.; Tezuka, K.; Chang, Y.-T. A Game Theoretical Analysis of Port Competition. Transp. Res. E Logist. Transp. Rev. 2013, 49, 92–106.
  5. Ji, B.; Yuan, X.; Yuan, Y. Modified NSGA-II for solving continuous berth allocation problem: Using multiobjective constraint-handling strategy. IEEE Trans. Cybern. 2017, 47, 2885–2895.
  6. Park, H.J.; Cho, S.W.; Lee, C. Particle swarm optimization algorithm with time buffer insertion for robust berth scheduling. Comput. Ind. Eng. 2021, 160, 107585.
  7. Wang, R.; Ji, F.; Jiang, Y. An adaptive ant colony system based on variable range receding horizon control for berth allocation problem. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21675–21686.
  8. Dulebenets, M.A. A Diffused Memetic Optimizer for reactive berth allocation and scheduling at marine container terminals in response to disruptions. Swarm Evol. Comput. 2023, 80, 101334.
  9. Chang, Y.M.; Xiao-Ning, Z.; Wang, L. Review on integrated scheduling of container terminals. J. Traffic Transp. Eng. 2019, 19, 136–146.
  10. Meng, Q.; Weng, J.; Suyi, L. Impact analysis of mega vessels on container terminal operations. Transp. Res. Procedia 2017, 25, 187–204.
  11. Bierwirth, C.; Meisel, F. A follow-up survey of berth allocation and quay crane scheduling problems in container terminals. Eur. J. Oper. Res. 2015, 244, 675–689.
  12. Xiang, X.; Liu, C. An expanded robust optimisation approach for the berth allocation problem considering uncertain operation time. Omega 2021, 103, 102444.
  13. Park, Y.M.; Kim, K.H. A Scheduling Method for Berth and Quay Cranes. OR Spectr. 2003, 25, 1–23.
  14. Guo, L.; Wang, J.; Zheng, J. Berth allocation problem with uncertain vessel handling times considering weather conditions. Comput. Ind. Eng. 2021, 158, 107417.
  15. Kim, K.H.; Park, K.T. A Note on a Dynamic Space-Allocation Method for Outbound Containers. Eur. J. Oper. Res. 2003, 148, 92–101.
  16. Yang, J.M.; Hu, Z.H.; Ding, X.Q.; Luo, J.X. An integer linear programming model for continuous berth allocation problem. In Proceedings of the 2009 International Conference on Information Management, Innovation Management and Industrial Engineering, Xi’an, China, 26–27 December 2009; Volume 4, pp. 74–77.
  17. Lin, S.W.; Ting, C.J.; Wu, K.C. Simulated annealing with different vessel assignment strategies for the continuous berth allocation problem. Flex. Serv. Manuf. J. 2018, 30, 740–763.
  18. Sheikholeslami, A.; Mardani, M.; Ayazi, E.; Arefkhani, H. A dynamic and discrete berth allocation problem in container terminals considering tide effects. Iran. J. Sci. Technol. Trans. Civ. Eng. 2020, 44, 369–376.
  19. Chen, S.; Zeng, Q.; Li, Y. Integrated operations planning in highly electrified container terminals considering time-of-use tariffs. Transp. Res. E Logist. Transp. Rev. 2023, 171, 103034.
  20. Song, Y.; Zhang, J.; Liu, M.; Chu, C. The berth allocation optimisation with the consideration of time-varying water depths. Int. J. Prod. Res. 2019, 57, 488–516.
  21. Lee, P.T.W.; Lin, C.W.; Shin, S.H. A Comparative Study on Financial Positions of Shipping Companies in Taiwan and Korea Using Entropy and Grey Relation Analysis. Expert Syst. Appl. 2012, 39, 5649–5657.
  21. Lee, P.T.W.; Lin, C.W.; Shin, S.H. A Comparative Study on Financial Positions of Shipping Companies in Taiwan and Korea Using Entropy and Grey Relation Analysis. Expert Syst. Appl. 2012, 39, 5649–5657. [Google Scholar] [CrossRef]
  22. Yang, Z.; Ng, A.K.Y.; Lee, P.T.-W.; Wang, T.; Qu, Z.; Rodrigues, V.S.; Pettit, S.; Harris, I.; Zhang, D.; Lau, Y.-Y. Risk and Cost Evaluation of Port Adaptation Measures to Climate Change Impacts. Transp. Res. D Transp. Environ. 2018, 61, 444–458. [Google Scholar] [CrossRef]
  23. Feng, X.; He, Y.; Kim, K.H. Space Planning Considering Congestion in Container Terminal Yards. Transp. Res. B Methodol. 2022, 158, 52–77. [Google Scholar] [CrossRef]
  24. Ducruet, C.; Notteboom, T. The Worldwide Maritime Network of Container Shipping: Spatial Structure and Regional Dynamics. Glob. Netw. 2012, 12, 395–423. [Google Scholar] [CrossRef]
  25. Hendriks, M.P.M.; Armbruster, D.; Laumanns, M.; Lefeber, E.; Udding, J.T. Strategic allocation of cyclically calling vessels for multi-terminal container operators. Flex. Serv. Manuf. J. 2012, 24, 248–273. [Google Scholar] [CrossRef]
  26. Xu, Y.; Du, Y.; Long, L. Berth Scheduling Model and Algorithm for Coordinated Operation of Multiple Container Terminals in a Port. Syst. Eng. 2015, 33, 128–138. [Google Scholar]
  27. Li, B.; Tang, Z. Multi-Container Terminal Berth Collaborative Allocation Based on Computational Logistics and Swarm Intelligence. J. Comput. Eng. Appl. 2023, 59, 262–284. [Google Scholar]
  28. Li, B.; Sun, B.; Yao, W.; He, Y.; Song, G. Container Terminal Oriented Logistics Generalized Computational Complexity. IEEE Access 2019, 7, 94737–94756. [Google Scholar] [CrossRef]
  29. Raeesi, R.; Sahebjamnia, N.; Mansouri, S.A. The synergistic effect of operational research and big data analytics in greening container terminal operations: A review and future directions. Eur. J. Oper. Res. 2023, 310, 943–973. [Google Scholar] [CrossRef]
  30. Filom, S.; Amiri, A.M.; Razavi, S. Applications of machine learning methods in port operations—A systematic literature review. Transp. Res. E Logist. Transp. Rev. 2022, 161, 102722. [Google Scholar] [CrossRef]
  31. Lamii, N.; Fri, M.; Mabrouki, C.; Semma, E. Using Artificial Neural Network Model for Berth Congestion Risk Prediction. IFAC-PapersOnLine 2022, 55, 592–597. [Google Scholar] [CrossRef]
  32. Jin, J.; Ma, M.; Jin, H. Container terminal daily gate in and gate out forecasting using machine learning methods. Transp. Policy 2023, 132, 163–174. [Google Scholar] [CrossRef]
  33. Xiao, Y.; Liu, J.; Wu, J.; Ansari, N. Leveraging deep reinforcement learning for traffic engineering: A survey. IEEE Commun. Surv. Tutor. 2021, 23, 2064–2097. [Google Scholar] [CrossRef]
  34. Zhu, Q.; Wu, X.; Lin, Q.; Ma, L.; Li, J.; Ming, Z.; Chen, J. A survey on evolutionary reinforcement learning algorithms. Neurocomputing 2023, 556, 126628. [Google Scholar] [CrossRef]
  35. Wang, Q.; Tang, C. Deep reinforcement learning for transportation network combinatorial optimization: A survey. Knowl.-Based Syst. 2021, 233, 107526. [Google Scholar] [CrossRef]
  36. Mazyavkina, N.; Sviridov, S.; Ivanov, S.; Burnaev, E. Reinforcement learning for combinatorial optimization: A survey. Comput. Oper. Res. 2021, 134, 105400. [Google Scholar] [CrossRef]
  37. Zhang, Y.; Bai, R.; Qu, R.; Tu, C.; Jin, J. A deep reinforcement learning based hyper-heuristic for combinatorial optimisation with uncertainties. Eur. J. Oper. Res. 2022, 300, 418–427. [Google Scholar] [CrossRef]
  38. Guo, S.; Zhang, X.; Du, Y.; Zheng, Y.; Cao, Z. Path planning of coastal ships based on optimized DQN reward function. J. Mar. Sci. Eng. 2021, 9, 210. [Google Scholar] [CrossRef]
  39. Chen, C.; Ma, F.; Xu, X.; Chen, Y.; Wang, J. A novel ship collision avoidance awareness approach for cooperating ships using multi-agent deep reinforcement learning. J. Mar. Sci. Eng. 2021, 9, 1056. [Google Scholar] [CrossRef]
  40. Zhu, Z.; Hu, C.; Zhu, C.; Zhu, Y.; Sheng, Y. An improved dueling deep double-q network based on prioritized experience replay for path planning of unmanned surface vehicles. J. Mar. Sci. Eng. 2021, 9, 1267. [Google Scholar] [CrossRef]
  41. Jędrzejowicz, P.; Ratajczak-Ropel, E. Reinforcement Learning strategies for A-Team solving the Resource-Constrained Project Scheduling Problem. Neurocomputing 2014, 146, 301–307. [Google Scholar] [CrossRef]
  42. Wang, T.; Cheng, W.; Zhang, Y.; Hu, X. Dynamic Selection of Priority Rules Based on Deep Reinforcement Learning for Rescheduling of RCPSP. IFAC-PapersOnLine 2022, 55, 2144–2149. [Google Scholar] [CrossRef]
  43. Peng, W.; Lin, X.; Li, H. Critical chain based Proactive-Reactive scheduling for Resource-Constrained project scheduling under uncertainty. Expert Syst. Appl. 2023, 214, 119188. [Google Scholar] [CrossRef]
  44. Cai, H.; Bian, Y.; Liu, L. Deep reinforcement learning for solving resource constrained project scheduling problems with resource disruptions. Robot. Comput.-Integr. Manuf. 2024, 85, 102628. [Google Scholar] [CrossRef]
  45. Ma, Z.; He, Z.; Wang, N. A genetic algorithm for the proactive resource-constrained project scheduling problem with activity splitting. IEEE Trans. Eng. Manag. 2019, 66, 459–474. [Google Scholar] [CrossRef]
  46. Snauwaert, J.; Vanhoucke, M. A classification and new benchmark instances for the multi-skilled resource-constrained project scheduling problem. Eur. J. Oper. Res. 2023, 307, 1–19. [Google Scholar] [CrossRef]
  47. Pellerin, R.; Perrier, N.; Berthaut, F. A survey of hybrid metaheuristics for the resource-constrained project scheduling problem. Eur. J. Oper. Res. 2020, 280, 395–416. [Google Scholar] [CrossRef]
  48. Hartmann, S.; Briskorn, D. An updated survey of variants and extensions of the resource-constrained project scheduling problem. Eur. J. Oper. Res. 2022, 297, 1–14. [Google Scholar] [CrossRef]
  49. Sánchez, M.G.; Lalla-Ruiz, E.; Gil, A.F.; Castro, C.; Voß, S. Resource-constrained multi-project scheduling problem: A survey. Eur. J. Oper. Res. 2023, 309, 958–976. [Google Scholar] [CrossRef]
  50. Ding, H.; Zhuang, C.; Liu, J. Extensions of the resource-constrained project scheduling problem. Autom. Constr. 2023, 153, 104958. [Google Scholar] [CrossRef]
  51. Farazi, N.P.; Zou, B.; Tulabandhula, T. Dynamic On-Demand Crowdshipping Using Constrained and Heuristics-Embedded Double Dueling Deep Q-Network. Transp. Res. E Logist. Transp. Rev. 2022, 166, 102890. [Google Scholar] [CrossRef]
  52. Tan, L.; Kuang, Z.; Gao, J.; Zhao, L. Energy-Efficient Collaborative Multi-Access Edge Computing via Deep Reinforcement Learning. IEEE Trans. Ind. Inform. 2023, 19, 7689–7699. [Google Scholar] [CrossRef]
  53. Li, B. Hierarchical, parallel, heterogeneous and reconfigurable computation model of container terminal handling system. J. Traffic Transp. Eng. 2019, 19, 136–155. [Google Scholar]
Figure 1. Instance-directed graph of RCPSP.
Figure 2. The structure of DQN.
Figure 3. Structural networks.
Figure 4. Instance LE1 Bar Chart.
Figure 5. Instance LE2 Bar Chart.
Figure 6. Instance LE3 Bar Chart.
Figure 7. Instance LE4 Bar Chart.
Figure 8. Instance LE1 berth allocation scheme of D3QN algorithm.
Figure 9. Instance LE2 berth allocation scheme of D3QN algorithm.
Figure 10. Instance LE3 berth allocation scheme of D3QN algorithm.
Figure 11. Instance LE4 berth allocation scheme of D3QN algorithm.
Figure 12. The average computational time cost of all algorithms.
Figure 13. Corresponding convergence curves of D3QN algorithm.
Table 1. Status information table.

| State Information | Container | Explanation |
|---|---|---|
| Berth status of the quay $B_q$ | $B_q$ | Berth status table of dock $q$ at the current and past nine moments, $q \in Q$. |
| Vessel's dynamic variables $E$ | $P_0$ | Berth status of all vessels at the current and past nine moments, $P_0 \in \{0, 1, 2, 3\}$: 0 represents arrival, 1 waiting, 2 allocated, 3 departure. |
| | $T_w$ | Waiting time table of all vessels at the current and past nine moments. |
| | $R_0$ | Remaining work time table for all vessels at the current and past nine moments. |
| Vessel's static information variables $V_s$ | $length$ | Length table for all vessels. |
| | $depth$ | Draft table for all vessels. |
| | $loading$ | Loading and unloading duration table for all vessels. |
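As a concrete illustration of how the rolling ten-moment window of Table 1 might be laid out in memory, the following NumPy sketch mirrors the table's rows; the array names and the segment and fleet sizes are hypothetical placeholders, not values taken from the experiments.

```python
import numpy as np

# Hypothetical sizes: berth segments across the quays, fleet size, and a
# window covering the current moment plus the past nine moments.
WINDOW, N_SEGMENTS, N_VESSELS = 10, 45, 85

state = {
    # B_q: berth occupancy of each quay over the ten-moment window
    "berth_status": np.zeros((WINDOW, N_SEGMENTS), dtype=np.int8),
    # P_0: vessel phase (0 arrival, 1 waiting, 2 allocated, 3 departure)
    "vessel_phase": np.zeros((WINDOW, N_VESSELS), dtype=np.int8),
    # T_w: accumulated waiting time of every vessel over the window
    "waiting_time": np.zeros((WINDOW, N_VESSELS), dtype=np.float32),
    # R_0: remaining handling time of every vessel over the window
    "remaining_work": np.zeros((WINDOW, N_VESSELS), dtype=np.float32),
    # V_s: static per-vessel attributes -- length, draft, handling duration
    "static_info": np.zeros((N_VESSELS, 3), dtype=np.float32),
}
```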
Table 2. Network structure parameters.

| Network Infrastructure | Dimension | Activation Function | Descriptions |
|---|---|---|---|
| Input $V_s$ | $\lvert V \rvert$ | - | - |
| Input $B_q$ | $N_q$ | - | - |
| Input $E$ | $\lvert V \rvert$ | - | - |
| $Lstm_1$ | 256 | - | Connects to Input $B_q$ |
| $Lstm_2$ | 256 | - | Connects to Input $E$ |
| Fully connected layer $FC_1$ | 256 | Relu | Connects to Input $V_s$ |
| Fully connected layer $FC_2$ | 512 | Relu | Connects to the merged data of ($Lstm_1$, $Lstm_2$, $FC_1$) |
| $Transformer$ | 1024 | - | Connects to $FC_2$ |
| Output $a_t$ | $\sum_{q \in Q} N_q + 1$ | Relu | Connects to $Transformer$; outputs $a_t$ |
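A sketch of how the Table 2 topology could be wired in PyTorch follows. The attention head count, the mapping of the 1024 dimension onto the Transformer's feed-forward width, the single encoder layer, and the last-step pooling of the LSTM branches are all assumptions made to produce a runnable example; the plain linear output layer stands in for the dueling head sketched earlier.

```python
import torch
import torch.nn as nn

class D3QNNetwork(nn.Module):
    """Approximate rendering of the Table 2 topology (assumptions noted above)."""
    def __init__(self, v_dim: int, nq_dim: int, n_actions: int):
        super().__init__()
        self.lstm1 = nn.LSTM(nq_dim, 256, batch_first=True)  # reads the B_q window
        self.lstm2 = nn.LSTM(v_dim, 256, batch_first=True)   # reads the E window
        self.fc1 = nn.Sequential(nn.Linear(v_dim, 256), nn.ReLU())  # reads V_s
        self.fc2 = nn.Sequential(nn.Linear(256 * 3, 512), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                           dim_feedforward=1024, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.out = nn.Linear(512, n_actions)  # a_t over sum(N_q) + 1 actions

    def forward(self, b_q, e, v_s):
        h1 = self.lstm1(b_q)[0][:, -1]   # last hidden step of each LSTM branch
        h2 = self.lstm2(e)[0][:, -1]
        h3 = self.fc1(v_s)
        merged = self.fc2(torch.cat([h1, h2, h3], dim=1))
        z = self.transformer(merged.unsqueeze(1)).squeeze(1)
        return self.out(z)
```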
Table 3. Algorithm parameters.

| Parameter | Parameter Value |
|---|---|
| Reward discount factor $\gamma$ | 0.9 |
| Learning rate $\alpha$ | 0.0001 |
| Network update step $c$ | 100 |
| Random sampling probability $\varepsilon$ | 0.01 |
| Replay buffer capacity $R_0$ | 10,000 |
| Mini-batch size $M_{batch}$ | 128 |
| Number of training rounds $K$ | 300 |
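Two of these parameters are easiest to interpret in situ. Assuming, as is standard for this algorithm family, that $\varepsilon$ governs epsilon-greedy exploration and that $c$ triggers a hard target-network synchronization, their roles in a training loop would look roughly as follows (function names are illustrative):

```python
import random
import torch

EPSILON = 0.01    # random sampling probability from Table 3
SYNC_STEP = 100   # network update step c from Table 3

def select_action(online_net, state, n_actions):
    """Epsilon-greedy: explore with probability EPSILON, else act greedily."""
    if random.random() < EPSILON:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(online_net(state).argmax().item())

def maybe_sync(step, online_net, target_net):
    """Copy online weights into the target network every SYNC_STEP steps."""
    if step % SYNC_STEP == 0:
        target_net.load_state_dict(online_net.state_dict())
```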
Table 4. Basic data of joint operational terminals.

| Quay Name | Quay Number | Shoreline Length (m) | Quay Crane Number | Quay Depth (m) | Quay Type |
|---|---|---|---|---|---|
| A | 1 | 1500 | 16 | 14 | 1 |
| B | 2 | 2800 | 29 | 17 | 2 |
Table 5. Instance description.

| Instance | Planning Period | Vessel Number |
|---|---|---|
| LE1 | One week | 85 |
| LE2 | One week | 90 |
| LE3 | One week | 95 |
| LE4 | One week | 100 |
Table 6. Experiment results.

| Vessel Number | Algorithm | Instance 1 | Instance 2 | Instance 3 | Instance 4 | Instance 5 |
|---|---|---|---|---|---|---|
| 85 | D3QN | 7309 | 7601 | 7393 | 7163 | 7500 |
| 85 | CPLEX | 7241 | 7334 | 7860 | 7192 | 7538 |
| 85 | PPO | 7473 | 7717 | 7705 | 7483 | 7720 |
| 85 | DQN | 7274 | 7614 | 7421 | 7154 | 7500 |
| 85 | Dueling DQN | 7456 | 7623 | 7441 | 7199 | 7511 |
| 90 | D3QN | 8890 | 8150 | 7972 | 8209 | 7900 |
| 90 | CPLEX | 9130 | 8665 | 7789 | 8557 | 8280 |
| 90 | PPO | 9087 | 8285 | 8112 | 8391 | 8262 |
| 90 | DQN | 8901 | 8152 | 7978 | 8245 | 7904 |
| 90 | Dueling DQN | 8953 | 8186 | 8000 | 8251 | 7927 |
| 95 | D3QN | 8808 | 9011 | 8396 | 8204 | 8327 |
| 95 | CPLEX | 9697 | 9416 | 8845 | 8682 | 8682 |
| 95 | PPO | 8841 | 9096 | 8518 | 8343 | 8756 |
| 95 | DQN | 8924 | 9097 | 8463 | 8204 | 8440 |
| 95 | Dueling DQN | 8907 | 9089 | 8459 | 8204 | 8401 |
| 100 | D3QN | 8905 | 8757 | 9132 | 8654 | 8678 |
| 100 | CPLEX | 9343 | 9086 | 10,505 | 8829 | 9242 |
| 100 | PPO | 9353 | 9092 | 9328 | 8808 | 8834 |
| 100 | DQN | 9082 | 8920 | 9218 | 8770 | 8800 |
| 100 | Dueling DQN | 9021 | 8874 | 9198 | 8720 | 8733 |
Table 7. Generalization experiments.

| Instance | Vessel Number | CPLEX Algorithm Total Time Cost | D3QN Algorithm Total Time Cost | Relative Gap |
|---|---|---|---|---|
| 1 | 85 | 7241 | 7687 | −6.16% |
| 2 | 85 | 7334 | 7844 | −7.10% |
| 3 | 85 | 7860 | 7856 | 0.05% |
| 4 | 85 | 7192 | 7455 | −3.66% |
| 5 | 85 | 7538 | 7889 | −4.67% |
| 6 | 85 | 7341 | 7988 | −8.81% |
| 7 | 85 | 8204 | - | - |
| 8 | 85 | 7735 | 7567 | 2.16% |
| 9 | 85 | 7311 | 7730 | −5.73% |
| 10 | 85 | 7241 | 7859 | −8.44% |
| 11 | 90 | 9130 | 9099 | 0.34% |
| 12 | 90 | 8665 | 8530 | 1.56% |
| 13 | 90 | 7789 | 8297 | −6.52% |
| 14 | 90 | 8557 | 8711 | −2.24% |
| 15 | 90 | 8280 | 8257 | 0.28% |
| 16 | 90 | 7894 | 8077 | −2.32% |
| 17 | 90 | 8649 | 8298 | 4.06% |
| 18 | 90 | 8313 | 8307 | 0.07% |
| 19 | 90 | 7706 | 7621 | 1.09% |
| 20 | 90 | 8199 | 8532 | −4.06% |
| 21 | 95 | 9697 | 9197 | 5.16% |
| 22 | 95 | 9416 | 9393 | 0.21% |
| 23 | 95 | 8845 | 8694 | 1.67% |
| 24 | 95 | 8682 | 8262 | 4.66% |
| 25 | 95 | 8961 | 8818 | 1.58% |
| 26 | 95 | 9041 | 8965 | 0.70% |
| 27 | 95 | 8912 | 8974 | −0.73% |
| 28 | 95 | 8672 | 8884 | −2.46% |
| 29 | 95 | 9566 | 9745 | −1.90% |
| 30 | 95 | 9257 | 8925 | 3.58% |
| 31 | 100 | 9343 | 9479 | −1.46% |
| 32 | 100 | 9086 | 9164 | −0.87% |
| 33 | 100 | 10,505 | 9514 | 9.42% |
| 34 | 100 | 8829 | 8969 | −1.59% |
| 35 | 100 | 9242 | 8946 | 3.16% |
| 36 | 100 | 9497 | 9199 | 3.13% |
| 37 | 100 | 9331 | 9063 | 2.87% |
| 38 | 100 | 10,535 | 9807 | 6.91% |
| 39 | 100 | 9722 | 9383 | 3.49% |
| 40 | 100 | 9954 | 9989 | −0.35% |
| Average | | 8631 | 8796 | 0.01% |
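For readers reproducing Table 7, the Relative Gap column is consistent, up to rounding in a few rows, with taking the CPLEX result as the baseline:

```latex
\text{Relative Gap} = \frac{T_{\mathrm{CPLEX}} - T_{\mathrm{D3QN}}}{T_{\mathrm{CPLEX}}} \times 100\%,
\qquad \text{e.g., instance 1: } \frac{7241 - 7687}{7241} \times 100\% \approx -6.16\%.
```

Positive gaps therefore mark instances where D3QN achieved the shorter total time cost, and negative gaps mark instances where CPLEX did.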
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
