Article

Adaptive Control of Ships’ Oil-Fired Boilers Using Flame Image-Based IMC-PID and Deep Reinforcement Learning

Division of Marine System Engineering, Korea Maritime and Ocean University, 727, Taejong-ro, Yeongdo-gu, Busan 49112, Republic of Korea
*
Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(9), 1603; https://doi.org/10.3390/jmse12091603
Submission received: 28 July 2024 / Revised: 23 August 2024 / Accepted: 5 September 2024 / Published: 10 September 2024
(This article belongs to the Section Ocean Engineering)

Abstract

The control system of oil-fired boiler units on ships plays a crucial role in reducing the emissions of atmospheric pollutants such as nitrogen oxides ($NO_x$), sulfur dioxide ($SO_2$), and carbon dioxide ($CO_2$). Traditional control methods using conventional measurement sensors face limitations in real-time control due to response delays, which has led to growing interest in combustion control methods based on flame images. To ensure the precision of such combustion control systems, the system model must be thoroughly considered during controller design. However, finding the optimal tuning point is challenging because environmental variations change the system model and introduce nonlinearity. This study proposes a controller that integrates an internal model control (IMC)-based PID controller with the deep deterministic policy gradient (DDPG) algorithm of deep reinforcement learning to enhance the adaptability of image-based combustion control systems to environmental changes. The proposed controller adjusts the PID parameter values in real time by learning the tuning constant lambda ($\lambda$) of the IMC internal model. This approach reduces computational resources by shrinking the learning dimensions of the DDPG agent and limits transient responses through constrained learning of the control parameters. Experimental results show that the proposed controller exhibited rapid adaptive performance in the learning process for the target oxygen concentration, achieving a reward value of −0.05 within just 105 episodes. Furthermore, when compared with traditional PID tuning methods, the proposed controller demonstrated superior performance, achieving a target value error of 0.0032 and a low overshoot range of 0.0498 to 0.0631, providing the fastest response speed and minimal oscillation. Additionally, experiments conducted on an actual operating ship verified the practical feasibility of this system, highlighting its potential for real-time control and pollutant reduction in marine applications.

1. Introduction

Combustion boilers are widely used in the maritime industry for preheating, hot water, and steam supply and have shown continuous growth in the context of atmospheric pollutant emission restrictions [1]. During the combustion process, these boilers produce exhaust gases that contain atmospheric pollutants such as $NO_x$, $SO_x$, and $CO_2$. These pollutants contribute to greenhouse gas effects and accelerate global warming, underscoring the necessity to reduce their emissions during the combustion process [2,3].
To reduce atmospheric pollutants, it is necessary to appropriately regulate the air and fuel supplied to the combustion process. Accordingly, ongoing research focuses on directly controlling the flow rates of fuel and air supplied to combustion systems to mitigate atmospheric pollutants [4,5].
However, a significant challenge with these combustion control systems, which rely on direct measurement devices, is the inherent delay between control outputs and the resulting change in the oxygen concentration of the exhaust gases. Additionally, disturbances such as variations in intake air temperature, fuel properties, and combustion efficiency can affect the emission of atmospheric pollutants from the combustion system in real time [6,7].
This issue can be addressed by utilizing flame images generated during the combustion process. Since flame images reflect the combustion state, they can reduce the delay in assessing the current state of exhaust gases. By analyzing the radiative emissions and color space of the flame, it is possible to monitor the production of atmospheric pollutants in real-time [8,9].
Previous studies developed a system for real-time monitoring of air pollutants and oxygen concentration by analyzing two-dimensional HSV images collected using accessible webcams, which identified spectral characteristic differences across various fuel-air ratios [10]. In subsequent research, this monitoring system was utilized as a control input to propose an oxygen concentration control system that could be easily applied to marine boilers. The proposed system models the correlation between oxygen and combustion based on operational data and uses an IMC-PI closed-loop control structure, effectively controlling exhaust gas emissions and reducing the production of air pollutants [11].
However, systems with complex combustion mechanisms, such as boilers, exhibit variable internal models due to numerous factors. Therefore, it is crucial to employ control algorithms that can adapt to a wide range of environmental changes. In the field of control engineering, extensive research has been conducted on various adaptive control methods [12,13,14]. Notably, PID controllers account for roughly 90% of control loops in industrial applications, and adaptive tuning of their parameters has been widely studied to optimize performance under varying conditions [15,16].
Recently, neural network-based supervised learning techniques for tuning PID parameters have gained attention due to their ability to map high-dimensional relationships between inputs and outputs. These techniques have demonstrated superior performance compared to other intelligent methods in the context of adaptive tuning [17,18,19]. However, such supervised learning approaches require extensive data sources to cover a wide range of environmental changes. Multivariable systems, like oil-fired boilers, can demand significant time and human resources, making them challenging to implement in real-world engineering applications.
Unlike supervised learning methods, deep reinforcement learning (DRL), an unsupervised learning approach, does not require labeled data, thus overcoming some of these challenges [20,21]. Consequently, DRL methods have been widely applied in the field of PID parameter tuning. In Lee’s study, an adaptive PID controller was developed to adjust PID gains in real-time while adapting to environmental changes in a dynamic positioning system (DPS) [22]. Carlucho’s research addressed the issue of simultaneously outputting multiple parameters from a PID controller based on reinforcement learning (RL) [23]. Additionally, Siraskar proposed an adaptive PID tuning method that features auto-tuning capabilities and high-frequency noise suppression [24].
Nevertheless, these studies have shown that excessive PID parameter outputs and integral windup can occur during the exploration process of DRL learning, potentially destabilizing the system.
To address this issue, Lawrence’s study aimed to improve stability by representing the PID controller with a shallow neural network in the actor network [25]. Furthermore, Lakhani proposed an RL-based stability-preserving PID adaptive tuning framework to ensure controller stability [26]. In Ding’s research, the actions of the agent were constrained during the multi-stage focusing process, enabling stable PID tuning even with limited prior knowledge [27].
However, these methods required 1500, 3000, and 4000 episodes, respectively, for the system to stabilize. Attempts have been made to apply such DRL-based PID frameworks, which require many episodes, to an oxygen concentration control system based on flame images. However, excessive responses of the control parameters during the exploration process led to issues such as flame extinction, causing system shutdown, or accelerated contamination of the heat transfer surfaces of the boiler due to unstable combustion.
Therefore, the objective of this study is to develop an RL-based PID adaptive tuning framework that ensures improved tuning performance while minimizing the impact of episodes during the exploration process on the system. To achieve this, the concept of the internal model in IMC-based PID control is utilized [28]. The internal model of the system is leveraged to constrain excessive control parameter outputs. When gradual variations occur in the system due to changes in the combustion environment, such as variations in fuel and air quality or fouling of heat transfer surfaces, the proposed controller adjusts the IMC tuning constant, lambda ( λ ), based on the experimentally obtained internal model to ensure that the system adapts to these altered conditions. This approach ensures that each control parameter is connected by the internal model and changes within a limited range, thereby restricting excessive system responses. Moreover, reducing the control parameters to be tuned from three to one (lambda) simplifies the system’s dimensionality and decreases the number of learning episodes required.
The innovative contributions of this paper are as follows:
  • Real-Time Image-Based Combustion Control: This study replaces the traditional oxygen concentration measurement methods by utilizing a predictor based on flame images that reflect the combustion state in real-time. This significantly reduces the delay in exhaust gas control and enables real-time control.
  • Proposed Adaptive Controller for Boiler Combustion Control: The study proposes a control system that integrates an internal model control (IMC)-based PID controller with the deep deterministic policy gradient (DDPG) deep reinforcement learning algorithm, designed to effectively adapt to changes in the combustion environment.
  • Validation of Practical Applicability: The proposed control system has been validated for its practicality through experiments conducted in actual ship operation environments, demonstrating its potential for easy integration into existing systems.

2. Image-Based Boiler Combustion Control System

The existing ship oil-fired boiler control system, $S_1$, is a proportional combustion control system that simultaneously controls the airflow and fuel amount to maintain a constant steam pressure. This system focuses on combustion stability and follows the ratio set by the manufacturer during the commissioning process. This ratio helps maintain flame stability within a limited range of variations in the environmental conditions of the supplied air and fuel characteristics. However, during this process, changes in the combustion process may lead to varying levels of air pollutant emissions [29].
This study examines the control performance of the $S_2$ system, which is implemented on top of the original $S_1$ system. The schematic diagrams of both the $S_1$ and $S_2$ systems are shown in Figure 1.
The $S_2$ system is an image-based combustion control (ICC) system that uses flame images as real-time input to predict oxygen concentration through an SEF + SVM predictor. The predicted oxygen concentration is used by the controller to adjust the damper servo motor, compensating for deviations from the target value. The controlled damper changes the amount of air supplied to the combustor, thereby controlling the air pollutants generated during the combustion process.
The saturation extraction filter (SEF) is a preprocessing method that converts flame images into features linearly related to the combustion state. This process involves converting the RGB flame image into HSV format and then extracting the saturation component. The extracted saturation data are transformed into a histogram, removing noise and unnecessary redundant data from the original image. This process generates a feature set that more effectively represents the combustion state.
The support vector machine (SVM), a supervised learning-based classification model, is trained to predict the oxygen concentration in exhaust gases using features extracted from the SEF. This approach builds a robust predictive model capable of handling non-linearity and complexity. The training process leverages flame image data collected under various combustion conditions to enhance the model’s accuracy and reliability.
By integrating the SEF and SVM methods, the system can predict the oxygen concentration in real time from flame images, and this predicted value is used as input for the control system. The oxygen concentration predictor used in $S_2$ is a model trained on data collected from a gas analyzer that offers higher accuracy than the traditional lambda probe method. This model uses flame images of quasi-instantaneous combustion states as input, ensuring high accuracy while reducing latency. According to the study, this method demonstrated its effectiveness and practicality, achieving an $R^2$ value of 0.97 in oxygen concentration prediction through experiments. In addition, given that the input process for flame images may vary over time, it is important to retrain the model periodically to ensure it maintains a high level of accuracy.
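For illustration, a minimal Python sketch of the SEF + SVM pipeline described above is given below. It assumes an OpenCV-style BGR frame and uses scikit-learn's SVR as the regression form of the SVM; the histogram bin count, kernel, and regularization settings are illustrative assumptions rather than the values used in this study.

```python
import cv2
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def sef_features(frame_bgr: np.ndarray, bins: int = 64) -> np.ndarray:
    """Saturation extraction filter (SEF): convert the flame image to HSV,
    keep the saturation channel, and summarize it as a normalized histogram."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    saturation = hsv[:, :, 1]
    hist, _ = np.histogram(saturation, bins=bins, range=(0, 255), density=True)
    return hist.astype(np.float32)

def train_o2_predictor(frames, o2_measured, bins: int = 64):
    """Fit an SVM regressor mapping SEF features to the oxygen concentration (%)
    logged by a gas analyzer under various combustion conditions."""
    X = np.stack([sef_features(f, bins) for f in frames])
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
    model.fit(X, np.asarray(o2_measured, dtype=float))
    return model

# Real-time use: o2_now = model.predict(sef_features(frame).reshape(1, -1))[0]
```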
Therefore, the $S_2$ system can establish a real-time control system for regulating the oxygen concentration in oil-fired boiler (OFB) exhaust gases. By controlling the predicted oxygen concentration, the system can effectively manage the air pollutants generated during the combustion process. Additionally, the $S_2$ system enhances the existing proportional control system, $S_1$, by adding a function to adjust the limited air supply for air pollutant control. This allows for efficient exhaust gas management while maintaining combustion stability. The $S_2$ system also has the advantage of high accessibility, as it can be applied at low cost not only to newly constructed ship OFBs but also to existing ships in operation, by utilizing the existing flame observation ports and making minor adjustments to the existing air control dampers.
The target oxygen concentration for the $S_2$ control system is set at 4%. According to related studies, pollutants such as $CO_2$, $NO_x$, and $SO_2$ are inversely proportional to the oxygen concentration. Research by J. Chen et al. identified a correlation between $NO_x$ emissions and oxygen concentration in the range of 2% to 5% during flame image prediction; specifically, they found that at an oxygen concentration of 4.02%, the formation of soot and graphite is minimized, resulting in the least noise during flame image recognition. Additionally, G. Xiao et al. found that at 80% load, an oxygen concentration near 3.5% achieves an optimal balance between heat release and $NO_x$ formation, and that reducing $NO_x$ emissions by a factor of two requires increasing the oxygen concentration by 1.14 times.
Based on a comprehensive review of these findings, setting the target oxygen concentration at 4% is considered suitable for optimizing boiler combustion through flame image analysis [30,31].

2.1. Experimental Setup for Image-Based Combustion Control (ICC) System

The experiment is conducted on a 9200 t ship, and the details of the boiler and combustor of the test subject OFB are shown in Table 1.
To implement the ICC system on the aforementioned OFB, the experimental environment is configured as shown in Figure 2.
The burner of the cylindrical water-tube boiler is located at the bottom of the cylinder. The burner initiates the combustion reaction between fuel and air, producing flames and exhaust gases, which contain information about the oxygen concentration within the exhaust. The flame generated by the burner is captured in high definition by a 1920 × 1080 pixel CMOS webcam in real time. The camera is placed in the flame observation port on the side of the boiler in accordance with SOLAS regulations. The collected images are transmitted to a computer via a USB 3.0 interface and analyzed with the SEF + SVM predictor to estimate the oxygen concentration. The predicted oxygen concentration serves as a crucial input variable for combustion control: the controller converts it into a control signal for the air regulation damper. This control signal is converted through an A/D converter drive into an analog control output corresponding to a damper angle of 0 to 90 degrees, and a servo motor attached to the damper end provides real-time actuation.
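A short sketch of this acquisition-and-actuation path is given below, assuming the predictor sketched in Section 2; the `controller` control law and the `send_damper_angle` converter/servo drive interface are hypothetical placeholders, not the hardware drivers used on the test ship.

```python
import cv2

def control_step(model, sef_features, controller, send_damper_angle):
    """One acquisition-and-actuation cycle (structural sketch).

    model / sef_features : the SEF + SVM predictor sketched in Section 2.
    controller           : hypothetical control law mapping predicted O2 to a damper command.
    send_damper_angle    : hypothetical converter/servo drive interface.
    """
    cap = cv2.VideoCapture(0)                    # USB webcam at the flame observation port
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
    ok, frame = cap.read()                       # one BGR flame image
    cap.release()
    if ok:
        o2_pred = float(model.predict(sef_features(frame).reshape(1, -1))[0])
        angle = max(0.0, min(90.0, controller(o2_pred)))   # clamp to the 0-90 degree damper range
        send_damper_angle(angle)
```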

2.2. Data-Driven System Modeling

The ICC system, which receives flame images as input and outputs the oxygen concentration, is difficult to model analytically due to numerous variables such as air properties, changes in fuel characteristics, and changes in heat transfer efficiency caused by contamination. Therefore, the transfer function is estimated using MATLAB's System Identification Toolbox, version R2024a. This method leverages machine-learning-based estimation algorithms that learn the transfer function from input and output data; it is particularly suitable for irregular and nonlinear systems and has the advantage of being applicable to models with many system variables [32]. Based on the system identification results from response analysis of the input–output data, the estimated transfer function model of the ICC system, denoted as $\bar{G}_{S_2}$, is given in Equation (1). The estimated model showed an accuracy of 99.28% and an MSE of 0.0001833.
$$\bar{G}_{S_2}(s) = \frac{0.2187\,s + 0.5960}{s^{2} + 2847.82\,s + 1508.78} \tag{1}$$
To further understand the characteristics of the system, it can be represented in pole-zero form as shown in Equation (2).
$$\bar{G}_{S_2}(s) = \frac{0.2187\,(s + 2.728)}{(s + 0.523)(s + 2847.29)} \tag{2}$$
The system is in SOPZ form. Examining the poles and zeros of the transfer function $\bar{G}_{S_2}$, the poles are located at $s_1 \approx -0.523$ and $s_2 \approx -2847.29$. Since the real parts of both poles are negative, they are located in the left half of the complex plane, indicating that the system is stable and controllable. The zero is also real and negative, confirming that it does not affect the system's stability.
Converting Equation (2) to the system’s time constant form results in Equation (3).
$$\bar{G}_{S_2}(s) = \frac{0.21866\,(2.728\,s + 1)}{(1.887649\,s + 1)(0.000351\,s + 1)} \tag{3}$$
Accordingly, the time constants of this system are found to be $\tau_a = 3.51 \times 10^{-4}$ and $\tau_b = 1.887649$. Since $\tau_b$ is much larger than $\tau_a$, the impact of $\tau_a$ on the system is negligible. Therefore, it suggests that variables other than changes in the air supply do not significantly affect the system.
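As a quick cross-check of the identified model, the sketch below reproduces Equation (1) with SciPy and inspects its poles, zeros, and step response; this is only an illustration of the stated model, not part of the identification procedure itself.

```python
import numpy as np
from scipy import signal

# Identified ICC model from Equation (1).
G = signal.TransferFunction([0.2187, 0.5960], [1.0, 2847.82, 1508.78])

# Poles and zeros (cf. Equation (2)): both poles lie in the left half-plane.
print("poles:", np.roots(G.den))
print("zeros:", np.roots(G.num))

# Step response: the dominant (slow) time constant tau_b from Equation (3)
# governs how quickly the predicted oxygen concentration settles.
t, y = signal.step(G, T=np.linspace(0.0, 12.0, 600))
```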

3. Preliminaries

3.1. Internal Model Control-Based PID Control

Internal model control (IMC) is a control system design methodology that enhances the performance of the controller by using a process model. The basic idea of IMC is that the control system should include an internal model of the process being controlled. The underlying PID controller has three adjustable parameters combined in parallel form, as shown in Equation (4).
$$u(t) = K_p\left(e(t) + \frac{1}{\tau_i}\int_{0}^{t} e(t)\,dt + \tau_d \frac{de(t)}{dt}\right) \tag{4}$$
In Equation (4), $u(t)$ is the manipulated variable at time $t$; $K_p$, $\tau_i$, and $\tau_d$ are the proportional, integral, and derivative parameters, respectively; and the error $e(t)$ is the difference between the controlled variable $y(t)$ and the setpoint at time $t$. IMC-based PID control improves control performance by tuning the PID parameters using the internal model of the control system. This method ensures effective control by directly utilizing and compensating for the dynamic characteristics of the system during controller design. The structure of the IMC controller $Q(s)$ for controlling the target system model $G(s)$ is shown in Equation (5).
$$Q(s) = G(s)^{-1} \cdot f(s) \tag{5}$$
$G(s)$ is the transfer function of the process being controlled, and $f(s)$ is the IMC filter function. When $G(s)$ is unstable, it is difficult to use the inverse model directly, so an appropriate filter must be applied. Therefore, an IMC controller is designed by applying a suitable filter $f(s)$ to the system function. The IMC filter function $f(s)$ for the system $G(s)$ is given in Equation (6).
$$f(s) = \frac{(\eta s + 1)^{m}}{(\lambda s + 1)^{n}} \tag{6}$$
Here, $\lambda$ and $\eta$ are the IMC filter parameters, primarily used for ensuring system stability and noise reduction. The orders $m$ and $n$ are determined based on the system's stability and performance requirements. Generally, $m$ is chosen according to the number of zeros of the system, while $n$ is set to match the total number of poles of the system so that the resulting controller remains proper.
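As a simple illustration of this structure (a standard first-order textbook case, not the boiler model, which is treated in Section 4.1), consider $G(s) = k/(\tau s + 1)$ with the filter $f(s) = 1/(\lambda s + 1)$:

$$Q(s) = G(s)^{-1} f(s) = \frac{\tau s + 1}{k(\lambda s + 1)}, \qquad K(s) = \frac{Q(s)}{1 - G(s)\,Q(s)} = \frac{\tau s + 1}{k \lambda s} = \frac{\tau}{k\lambda}\left(1 + \frac{1}{\tau s}\right),$$

which is a PI controller with $K_p = \tau/(k\lambda)$ and $\tau_i = \tau$; the single filter constant $\lambda$ then determines how aggressive the closed loop is.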

3.2. Reinforcement Learning–Deep Deterministic Policy Gradient

The deep deterministic policy gradient (DDPG) algorithm is a model-free, policy-based, and off-policy reinforcement learning algorithm designed to solve control problems in continuous action spaces. DDPG uses the actor–critic methodology to learn and optimize policies, and it was developed specifically to overcome the limitations of deep Q-networks (DQN). DDPG consists of two neural networks: an actor and a critic. Figure 3 demonstrates the principle of the DDPG algorithm [33,34].
The actor network receives the current state $s_t$ as input and outputs continuous action values $a_t$. The critic network $\theta^Q$ evaluates the $Q$-values for the given state and action. The weights of the actor network $\theta^\mu$ are updated using the deterministic policy gradient algorithm, and the weights of $\theta^Q$ are updated using the gradients derived from the temporal difference (TD) error signal.
The DDPG algorithm operates in the following steps. First, the critic $Q(s, a \mid \theta^Q)$ and actor $\mu(s \mid \theta^\mu)$ networks are initialized arbitrarily. Along with this, the target networks $\theta^{Q'}$ and $\theta^{\mu'}$ for the critic and actor are initialized with the values of $\theta^Q$ and $\theta^\mu$, respectively. An experience replay buffer of appropriate size is then set up to store the transitions generated by the environment.
At the start of an episode, the initial state $s_t$ is observed. The actor $\theta^\mu$ receives the current state $s_t$ as input and outputs the action $a_t = \mu(s_t \mid \theta^\mu) + \varepsilon_t$, which is applied to the environment. $\varepsilon_t$ is the exploration noise, generated by an Ornstein–Uhlenbeck process; this process is smooth and prevents abnormal system responses due to exploration. The environment responds with the next state $s_{t+1}$ and reward $r_t$, and the tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in the experience replay buffer.
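The Ornstein–Uhlenbeck exploration noise mentioned above can be sketched as follows; the drift and volatility parameters shown are common defaults and are assumptions, not the values used in this study.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated noise that drifts back
    toward `mu`, giving smooth perturbations of the actor's action."""
    def __init__(self, size=1, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.size, self.mu, self.theta, self.sigma, self.dt = size, mu, theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x = np.full(self.size, self.mu, dtype=float)

    def sample(self):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.size)
        self.x = self.x + dx
        return self.x

# epsilon_t = OUNoise(size=1).sample()   # added to the actor output mu(s_t | theta_mu)
```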
A mini-batch is randomly sampled from the experience replay buffer to update $\theta^Q$ such that the loss function $L(\theta^Q)$ is minimized. The $Q$-function is updated as shown in Equation (7).
$$Q^{\mu}(s_t, a_t) = \mathbb{E}_{r_t,\, s_{t+1}}\!\left[\, r(s_t, a_t) + \gamma\, Q^{\mu}\!\left(s_{t+1}, \mu(s_{t+1})\right) \right] \tag{7}$$
$\theta^Q$ evaluates the $Q$-values for the given state and action, reducing the difference between the actual reward and the predicted $Q$-value.
$\theta^\mu$ is updated using the policy gradient, as shown in Equation (8).
$$\nabla_{\theta^{\mu}} J \approx \mathbb{E}_{s_t}\!\left[\, \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{a=\mu(s \mid \theta^{\mu})}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \right] \tag{8}$$
Equation (8) calculates the gradient for the current policy $\mu(s \mid \theta^\mu)$, optimizing the policy network parameters $\theta^\mu$. This allows the agent to learn actions that yield higher expected rewards. The policy gradient $\nabla_{\theta^\mu} J$ is used to update the policy network parameters in a direction that maximizes the expected reward $J$. The critic network $Q(s, a \mid \theta^Q)$ evaluates the $Q$-value for state $s$ and action $a$, and through the gradient of this $Q$-value, it assesses the effectiveness of the current policy $\mu(s \mid \theta^\mu)$. Based on this assessment, the policy network is updated: the parameters $\theta^\mu$ of the actor network are updated by reflecting the gradient $\nabla_a Q(s, a \mid \theta^Q)$ of the critic network. This adjustment enables the policy network to output better actions, thereby allowing the agent to receive higher rewards.
Finally, the target networks are updated using the target soft update method. Through this iterative process, the actor and critic networks gradually learn the optimal policy and Q -values.
DDPG, with its actor–critic architecture, enables stable and efficient learning. It is a powerful reinforcement learning algorithm specifically designed to solve continuous action control problems.
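A condensed PyTorch sketch of the update cycle described by Equations (7) and (8) is given below. The network sizes, learning rates, and soft-update rate are placeholders rather than the values listed in Table 3; only the discount factor of 0.9 follows the settings reported later in the text.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, a_dim), nn.Tanh())  # bounded action in [-1, 1]
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=0.005):
    """Target soft update: theta' <- tau * theta + (1 - tau) * theta'."""
    for tp, p in zip(target.parameters(), source.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)

def ddpg_update(batch, actor, critic, actor_t, critic_t, opt_a, opt_c, gamma=0.9):
    s, a, r, s_next = batch                       # mini-batch from the replay buffer
    # Critic: minimize the TD error against the target networks (Equation (7)).
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    # Actor: deterministic policy gradient, ascend Q(s, mu(s)) (Equation (8)).
    actor_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
    # Target networks: soft update toward the learned networks.
    soft_update(critic_t, critic); soft_update(actor_t, actor)

# Usage sketch: actor_t / critic_t start as copies of actor / critic
# (copy.deepcopy), and opt_a / opt_c are torch.optim.Adam optimizers.
```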
In this paper, to effectively control the ICC system, an IMC-DPGA controller, which integrates the DDPG deep reinforcement learning algorithm with an IMC-based PID controller, is proposed, and its effectiveness is verified.

4. Deep Deterministic Policy Gradient-Based Internal Model Control-PID Control

4.1. IMC-Based PID Controller for Image-Based Combustion Control System

To effectively control the image-based combustion system of the ICC system, it is important to use an appropriate controller. One method is to use an IMC-based PID controller. Previous research applied an IMC-based PI controller to the ICC system and obtained a significant result with an ISE value of 10.1159. However, since flame images are used as input signals, including the derivative component of the PID controller can help predict and respond to rapid changes in the combustion process, thereby improving stability and responsiveness. Therefore, a PID controller is more suitable.
In this process, high-frequency noise due to intermittent prediction errors may occur, but it can be mitigated by applying appropriate filtering techniques. The derivative component enhances the system’s ability to respond to dynamic changes, reducing overshoot and settling time. Furthermore, despite the increased complexity of tuning the IMC controller, the advantages of achieving more precise and robust control using a PID controller outweigh these difficulties [35].
First, to design an IMC-based PID controller, the internal model is analyzed. The internal model transfer function estimated from the data in Equation (3) is in SOPZ form and is expressed as shown in Equation (9).
$$\bar{G}_{ICC}(s) = \frac{k_p(\beta s + 1)}{(\tau_a s + 1)(\tau_b s + 1)}, \qquad \tau_a < \tau_b \tag{9}$$
where $\tau_a$ and $\tau_b$ are the time constants of the system, $k_p$ is the proportional gain, and $\beta$ is the constant associated with the zero. Consequently, the IMC controller can be expressed as shown in Equation (10), where $f_i(s)$ represents the IMC filter for the ICC system.
$$q(s) = \bar{G}_{ICC}^{-1}\, f_i(s) \tag{10}$$
Since the order of the filter must be high enough for $q(s)$ to be realizable, the order of the filter function is set to match the order of the internal model $\bar{G}_{ICC}$, as shown in Equation (11).
$$f_i(s) = \frac{\eta s + 1}{(\lambda s + 1)^{2}} \tag{11}$$
λ and η are the time constants of the filter, and they need to be adjusted according to the required performance of the controller. They are parameters that regulate control performance and robustness. In this context, η is set to be equal to λ for the design of the PID controller.
By combining the IMC controller $q(s)$ with the internal model $\bar{G}_{ICC}$, an equivalent classical controller $K_{ICC}(s)$ can be formed. This can be expanded using Equations (9) and (10) and expressed in the forms shown in Equations (12a) and (12b).
$$K_{ICC}(s) = \frac{q(s)}{1 - \bar{G}_{ICC}\, q(s)} = \frac{\bar{G}_{ICC}^{-1} f_i(s)}{1 - \bar{G}_{ICC}\, \bar{G}_{ICC}^{-1} f_i(s)} \tag{12a}$$
$$K_{ICC}(s) = \frac{1}{k_p \lambda (\tau_b + \tau_a)}\left(1 + \frac{1}{(\tau_b + \tau_a)s} + \frac{\tau_b \tau_a}{\tau_b + \tau_a}\,s\right)\frac{1}{\beta s + 1} = \frac{1}{k_p \lambda (\tau_b + \tau_a)}\left(1 + \frac{1}{(\tau_b + \tau_a)s} + \frac{\tau_b \tau_a}{\tau_b + \tau_a}\,s\right) f_l(s) \tag{12b}$$
The expanded Equation (12b) shows that $K_{ICC}(s)$ takes the form of a PID controller. Here, the term $\frac{1}{\beta s + 1}$ can be considered a low-pass filter, denoted as $f_l(s)$. The cutoff frequency of this filter is calculated as $f_c = \beta/2\pi$. When $\beta = 2.728423 \times 10^{3}$, the cutoff frequency is approximately 434 Hz.
In continuous-time systems, it is important to compare the primary operating frequency range of the system with the cutoff frequency of the filter. If the system primarily operates in the low-frequency range, a filter with a cutoff frequency of 434 Hz will have little to no impact on the system’s main operating frequency range. Since the filter’s cutoff frequency is much higher than the system’s main frequency range, the effect of the filter can be ignored.
Therefore, the impact of the low-pass filter $\frac{1}{\beta s + 1}$ on the system's frequency response is negligible because its cutoff frequency is much higher than the system's main operating frequency range. Consequently, the term $\frac{1}{\beta s + 1}$ can be disregarded in the analysis and design of the continuous-time controller $K_{ICC}(s)$.
$$K_{ICC}(s) = \frac{1}{k_p \lambda (\tau_b + \tau_a)}\left(1 + \frac{1}{(\tau_b + \tau_a)s} + \frac{\tau_b \tau_a}{\tau_b + \tau_a}\,s\right) = K_p + \frac{K_i}{s} + K_d\, s \tag{13}$$
Comparing Equation (12b) with Equation (13), the control parameters can be considered as Equation (14).
$$K_p = \frac{1}{k_p \lambda (\tau_b + \tau_a)}, \qquad K_i = \frac{1}{k_p \lambda (\tau_b + \tau_a)^{2}}, \qquad K_d = \frac{\tau_b \tau_a}{k_p \lambda (\tau_b + \tau_a)^{2}} \tag{14}$$
The control elements for the ICC system are summarized in Table 2.
Therefore, by adjusting the IMC filter constant $\lambda$, the values of $K_p$, $K_i$, and $K_d$ can be set to optimize control performance.
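As a minimal sketch of this mapping, the function below evaluates Equation (14) with the time constants of Equation (3) ($k_p$ here is the internal-model gain from Equation (9), not the proportional gain $K_p$); it is an illustration of the relationship, not the controller implementation used in the experiments.

```python
def imc_pid_gains(lam: float, k_p: float = 0.21866,
                  tau_a: float = 3.51e-4, tau_b: float = 1.887649):
    """Map the IMC filter constant lambda to (Kp, Ki, Kd) via Equation (14)."""
    tau_sum = tau_a + tau_b
    Kp = 1.0 / (k_p * lam * tau_sum)
    Ki = 1.0 / (k_p * lam * tau_sum ** 2)
    Kd = (tau_a * tau_b) / (k_p * lam * tau_sum ** 2)
    return Kp, Ki, Kd

# All three gains shrink together as lambda grows (slower, more conservative control):
for lam in (0.1, 1.0, 2.0):       # the lambda range used for the agent in Section 5.1
    print(lam, imc_pid_gains(lam))
```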

4.2. Proposal of IMC-DPGA (Deep Policy Gradient Adaptive) Controller

Previous related studies have conducted empirical learning using deep reinforcement learning algorithms to select the PID parameters $K_p$, $K_i$, and $K_d$ for optimal control performance. However, when using deep reinforcement learning to directly learn $K_p$, $K_i$, and $K_d$, excessive parameter fluctuations due to initial exploration and exploration noise can cause overshoot in the control output, negatively affecting the plant. In flame combustion-based systems like the ICC system, changes in air supply during the exploration phase can lead to incomplete combustion of the flame, contaminating the heat exchange surface and altering the system. Additionally, excessive overshoot poses the risk of flame extinction, which can prevent further learning stages and trap the system in a non-progressive learning loop.
Moreover, if the PID parameters are learned independently of one another, the range of the action variables can widen, potentially leading to the curse of dimensionality. In internal model control, however, the values of $K_p$, $K_i$, and $K_d$ are determined by the internal model $\bar{G}_{ICC}$, and they vary organically within the range set by $\lambda$ [36]. By learning $\lambda$ to achieve optimal control performance, the number of action variables can be reduced from three to one, and their range can be limited. This makes the learning of the deep reinforcement learning agent more stable and faster. Additionally, since the control parameters are dynamically connected by the internal model, it is possible to prevent control instability caused by sporadic parameter values, thereby ensuring stable control performance even during the learning process.
However, finding the optimal $\lambda$ offline would require re-tuning and evaluating the control system at each candidate value, which consumes a significant amount of time. Additionally, it is practically difficult to verify control performance down to small increments (below 0.1), and the system must continuously respond to changes in external environmental conditions. Therefore, to apply the optimal $\lambda$ to the control system in real time in response to system changes, this paper proposes an IMC-DPGA (deep policy gradient adaptive) controller using the DDPG algorithm.
The structure of the proposed IMC-DPGA controller is shown in Figure 4.
The IMC-DPGA control system shown in Figure 4 illustrates the structure for dynamically updating the $\lambda$ of the IMC-based PID controller using the DDPG algorithm. This system aims to achieve optimal performance of the PID controller under changing environmental conditions. The image-based combustion control system handles a continuous process in real time, and the DDPG algorithm efficiently learns the optimal policy within a continuous action space; this distinguishes it from reinforcement learning algorithms that focus on discrete action spaces. The agent receives information from the ICC system, observes the state $s_k$, receives a reward $r_k$, and repeatedly determines and updates the action $a_k$, leading to a new state $s_{k+1}$. These updates allow the agent to adapt to the environment and estimate the appropriate value of $\lambda$ to improve control performance.
The value of $\lambda$ updated by the agent is applied to the control parameters of the IMC-PID controller at regular intervals of $N$ steps. The optimal $N$ for learning may vary depending on the control environment, so it should be determined through additional parameter selection experiments. Based on this structure, the IMC-DPGA controller, which combines the DDPG agent and the IMC-PID controller, effectively adapts to dynamic changes in the process environment and can stably control the flame. By periodically updating $\lambda$ through reinforcement learning, the system maintains optimal control performance, ensuring stability and efficiency. This approach provides an adaptive and intelligent control solution, overcoming the limitations of direct parameter adjustment.

4.3. Agent Environment Configuration

4.3.1. State and Action of Agent

The state vector of the IMC-DPGA for the ICC system consists of the oxygen concentration error, the rate of change of the error, the current oxygen concentration, and the current $\lambda$, as shown in Equation (15).
$$s_k = \begin{bmatrix} e_k & \Delta e_k & O_k & \lambda_k \end{bmatrix}^{T}, \qquad a_k = \lambda_{k+1} \tag{15}$$
The oxygen concentration error $e_k$ is defined as the difference between the target oxygen concentration and the current measured oxygen concentration, and it is used to evaluate the need for adjusting $\lambda$. The rate of change of the error, $\Delta e_k$, represents the rate at which the oxygen concentration error changes over time, reflecting the system's dynamic response to $\lambda_k$. The current oxygen concentration $O_k$ directly reflects the current state of the system, helping to construct the state vector, and including $\lambda_k$ reflects the current adjustment level of the control input. This allows for more precise prediction and control.
By using this state vector, the dynamic characteristics of the system can be understood, and accurate control can be performed through predictive and adaptive control.

4.3.2. Reward

To ensure effective control, the reward function of the IMC-DPGA must be designed to have a positive correlation with performance. Specifically, the amplitude of the system output $y(t)$ should be minimized, and the output should quickly converge to the target value. Therefore, the reward function should include both the time steps of the entire closed-loop trajectory and the error $e(t)$.
The reward function for controlling the oxygen concentration of the boiler combustion system can be designed to minimize the error, defined as $error = O_{target} - O_{current}$, and to reduce the system's instability through the change in error, $\Delta error = error_{current} - error_{previous}$. The reward function reflecting this is given in Equation (16).
$$r_k = -\frac{1}{N}\sum_{k=1}^{N}\left(\left|error_k\right| + \left|\Delta error_k\right|\right) = -\frac{1}{N}\sum_{k=1}^{N}\left(\left|e_k\right| + \left|\Delta e_k\right|\right) \tag{16}$$
In the above equation, N represents the number of steps per episode. The reward function’s adjustment of N can optimize the overall performance of the OFB system. Lowering the N value has the advantage of quick adaptation and immediate response, but it also increases computational complexity due to frequent updates and may lead to system instability due to excessive parameter fluctuations. Conversely, increasing the N value reduces the computational load and maintains a certain level of stability between updates, but if the update interval is too long, the accuracy may decrease due to overfitting.
To review the stability of the reward function’s learning, experiments will be conducted with N set to 1, 50, 100, and 200. Through these experiments, the system’s response and stability for each value of N will be evaluated, and the optimal N will be determined.
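Putting Sections 4.1-4.3 together, the sketch below outlines one way the λ-update loop could be organized: the agent proposes a new λ every N control steps, the PID gains follow from the internal model, and the reward of Equation (16) is accumulated over the episode. The `env` and `agent` interfaces (reset, target_o2, step_pid, act, observe) are hypothetical placeholders, and `gains_fn` stands for a mapping such as the Equation (14) sketch in Section 4.1; this is a structural illustration, not the authors' implementation.

```python
import numpy as np

def run_imc_dpga(env, agent, gains_fn, episodes=200, N=100,
                 lam_bounds=(0.1, 2.0), lam0=1.0):
    """Outer loop of the IMC-DPGA controller (structural sketch)."""
    lam = lam0
    for ep in range(episodes):
        o2, e_prev, de = env.reset(), 0.0, 0.0
        abs_e, abs_de = [], []
        for _ in range(N):                          # N PID steps per lambda update
            e = env.target_o2 - o2                  # oxygen concentration error
            de = e - e_prev                         # change of the error
            Kp, Ki, Kd = gains_fn(lam)              # gains tied to lambda by the internal model
            o2 = env.step_pid(Kp, Ki, Kd)           # one damper control step
            abs_e.append(abs(e)); abs_de.append(abs(de)); e_prev = e
        reward = -(np.mean(abs_e) + np.mean(abs_de))     # Equation (16)
        state = np.array([e, de, o2, lam])               # Equation (15)
        a = float(np.asarray(agent.act(state)).ravel()[0])   # tanh actor output in [-1, 1]
        lam = lam_bounds[0] + (a + 1.0) / 2.0 * (lam_bounds[1] - lam_bounds[0])
        agent.observe(state, a, reward)             # store the transition and learn
```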

5. Training and Experiments

5.1. Experimental Setup

The reinforcement learning algorithm parameters set as initial conditions are presented in Table 3.
The network structure, mini-batch size, and learning rate are selected based on preliminary experiments that showed optimal performance. Table 3 lists the key parameters adopted for training. When determining parameters such as learning rate and batch size for experience replay, DeepMind’s DPG model was referenced [37], and slight adjustments were made based on benchmark values. These parameters were gradually refined through a process of trial and error. Through this process, it was found that the learning outcomes were sensitive to certain parameters, such as learning rate and network structure, but not to others, such as the experience replay buffer size. Ultimately, parameters were selected that did not lead to overfitting and did not place excessive demands on computational resources.
The actor network uses the Tanh activation function to limit the output range to −1 and 1, providing stability, while the critic network uses the ReLU activation function to introduce nonlinearity and increase learning speed. The Adam optimization algorithm was chosen because it provides fast and stable convergence by automatically adjusting the learning rate. Early stopping patience is set to 10 to prevent overfitting and allow early termination of the training process. The discount factor is set to 0.9 to balance considering future rewards while not neglecting present rewards.
The range of $\lambda$ updated by the DDPG agent is set to $\lambda \in [0.1,\ 2]$, with the initial value of $\lambda$ set to 1. This range is chosen considering the system's performance and stability. According to Equation (14), the corresponding ranges of the PID parameters determined by the IMC internal model are $K_p \in [1.21,\ 22.22]$, $K_i \in [0.64,\ 12.83]$, and $K_d \in [0.000425,\ 0.0085]$.

5.2. Threshold Analysis

In this section, experiments are conducted to select the optimal value of N , the number of steps per episode for the DDPG agent, by varying N . The experimental learning results for different values of N are shown in Figure 5.
Figure 5 shows the learning performance of the DDPG agent when the number of steps per episode $N$ is 1, 50, 100, and 200, respectively. The candidate step counts were determined experimentally, starting from 1 and increasing in multiples up to the point where overfitting begins to occur. The summarized results for the graph are presented in Table 4.
As can be seen from the graph and table, when $N = 1$ (blue curve), the agent's reward fluctuates significantly during the learning process and reaches the lowest final reward value (−0.205) after the highest number of episodes (289). This can be interpreted as the negative impact on the reward of the agent not having sufficient opportunities to explore because of the low number of steps per episode. This variability indicates instability and inefficiency in learning.
When $N = 50$ (green curve), the fluctuations decrease compared to $N = 1$, and the final reward value (−0.135) is higher and is reached after fewer episodes (194), showing improved learning stability. This indicates that as the number of steps per episode increases, the agent has more opportunities to interact with the environment, leading to more effective exploration and increased learning efficiency.
For $N = 100$ (red curve), the agent achieves the highest final reward value (−0.05) in the fewest learning episodes (105). This suggests that exploration is more effective for the same reason as in the $N = 50$ case, indicating optimal learning efficiency and performance.
In the case of $N = 200$ (pink curve), the number of episodes increases (158) and the final reward value decreases (−0.67). This indicates that excessive exploration leads the agent to miss the optimal actions and spend unnecessary time, suggesting that the exploration–exploitation balance is disrupted and that this is not an appropriate value.
These results demonstrate the importance of appropriately selecting the number of steps per episode to optimize the learning performance of the DDPG agent. As N increases, the learning process tends to become smoother and more stable; however, an N value that is too large can lead to performance degradation. N = 100 is shown to be the most efficient and high-performing number of steps, as it avoids the excessive exploration that prevents the agent from finding optimal actions and leads to unnecessary time consumption. Therefore, selecting the appropriate value of N = 100 can maximize the performance of the DDPG controller.

5.3. Experiment and Result Analysis

In this section, experiments are conducted to apply the proposed IMC-DPGA controller to the ICC system for adaptive tuning. Figure 6 shows the learning process of $\lambda$, $K_p$, $K_i$, and $K_d$ per episode, with the number of steps per episode $N$ set to 100.
The graph is a 3D representation of the kernel density estimation for the values output at each step of each episode. Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function of data. It smoothly represents the distribution of given data, making it easier to identify patterns, and is often used to understand or visualize the underlying distribution of the data. This allows for a visual understanding of the data learned at each step of each episode.
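A brief sketch of how such a per-episode density surface can be computed with SciPy's Gaussian KDE is shown below; the array name and grid are illustrative assumptions, and each episode is assumed to contain some spread in the logged values.

```python
import numpy as np
from scipy.stats import gaussian_kde

def lambda_density_per_episode(lambda_history, grid=np.linspace(0.1, 2.0, 200)):
    """Kernel density estimate of lambda for each episode.

    lambda_history: hypothetical array of shape (episodes, steps) holding the
    lambda value output at every step of every episode during training.
    Returns an (episodes, len(grid)) surface such as the one drawn in Figure 6.
    """
    densities = [gaussian_kde(ep_values)(grid) for ep_values in lambda_history]
    return np.array(densities)

# The per-episode mode (highest-density value), as used for Table 5:
# modes = grid[np.argmax(lambda_density_per_episode(lambda_history), axis=1)]
```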
When examining the overall learning trend of the control parameters, the initial episodes show high volatility and a wide density distribution, indicating exploration of various values. As learning progresses, the control parameter values tend to concentrate within a specific range, as indicated by the points with the highest density values. The increase in density value indicates convergence towards a stable optimal value.
From Table 5, the final values of $\lambda$, $K_p$, $K_i$, and $K_d$ can be determined based on the mode value corresponding to the highest density in the graph distribution.
As a result of the learning process over the episodes, the value of $\lambda$ fluctuates within the range of 0.435 to 1; accordingly, $K_p$ varies from 2.41 to 5.59, $K_i$ ranges from 1.28 to 2.98, and $K_d$ changes from 0.00085 to 0.002 in a similar trend. The detailed learning process of these changes can be observed in Figure 7, which represents a cross-sectional view of the KDE graph for $\lambda$ in Figure 6.
The learning process can be divided into two phases. Phase 1 is the period of rapid change starting from the initial value of 1 to the 37th episode. Phase 2 is from the 38th episode to the end of the learning process, converging to 0.44, during which the volatility is very small, ranging from 0.456 to the final value of 0.44.
In Phase 1, $\lambda$ changes rapidly within the range of 1 to 0.456. During this time, $K_p$ changes from 2.41 to 5.4, $K_i$ changes from 1.28 to 2.88, and $K_d$ changes within the small range of 0.00189 to 0.002. This demonstrates that the IMC-DPGA can adapt effectively and stably to changes arising from exploration through the internal model, especially for plants like the ICC system, where the transient response significantly impacts stability.
Subsequently, in Phase 2, these parameters converge more stably, ensuring the final control performance of the system. As a result, the optimal $\lambda$ converges to 0.44, and the corresponding values of $K_p$, $K_i$, and $K_d$ converge to 5.48, 2.9, and 0.0019, respectively.

6. Compare Performance of Different Controllers

In this section, the real-time control performance of the proposed IMC-DPGA controller is evaluated by comparing it with several major PID control algorithms. The experiments are conducted on the $S_2$ system to maintain a consistent oxygen concentration while the OFB system's $S_1$ is operating. The experimental process involves maintaining an initial 4% oxygen concentration in the $S_2$ ICC system for 200 s and then comparing the results to verify control performance. At the 100 s mark, the setpoint for the oxygen concentration is changed from 4% to 5%, and control output data are collected. The collected data consist of the values predicted from flame images, with a sampling time of 1 s. The transient response period and steady-state performance of each controller are compared. The algorithms selected for comparison are the Ziegler–Nichols tuning method, the Lambda tuning method, and the IMC-Maclaurin (IMC-MAC) closed-loop tuning method.
The Ziegler–Nichols tuning method is a classical approach that sets the PID parameters using the critical gain and critical period, allowing simple and quick initial settings. The Lambda tuning method sets the PID parameters based on the system's time constant, making it practical and easy to use. The IMC-MAC closed-loop tuning method combines IMC tuning with a Maclaurin series approximation of the ideal controller, providing high precision for complex systems. This comparison allows for the evaluation of the real-time control performance and suitability of various PID tuning methods. The results are shown in Figure 8.
The two graphs compare the performance of the various control algorithms for regulating the oxygen concentration. The graph on the right provides a detailed view of the transient response period from 85 to 125 s, allowing for an evaluation of the proposed IMC-DPGA controller's performance in comparison with the other control algorithms. Table 6 compares the maximum overshoot ($M_p$) and integral of squared error (ISE) to evaluate the step response performance of each controller.
Analysis of the results in Table 6 reveals differences in the step response performance of each controller. For the Z-N tuning method, $M_p$ is 0.1114, indicating a significantly large transient response. Additionally, the ISE is high at 11.1966, which suggests considerable residual oscillations and error in the system's response. This implies that the response of the Z-N controller is unstable and prone to oscillations.
In the case of the λ-T tuning method, $M_p$ is 0.0819, showing an improvement in the transient response compared to Z-N. However, the ISE remains high at 10.0912, indicating that residual oscillations have not been eliminated. This suggests that while the transient response has been reduced, the overall quality of the response is still lacking.
The IMC-MAC tuning method shows a significant improvement with an ISE of 8.1189, indicating a substantial reduction in error. However, its $M_p$ of 0.1250 is the highest among the methods, suggesting that the initial stability of the response is lacking due to the large transient response. In other words, while the error has decreased, the method exhibits a considerable transient phenomenon during the initial response.
Finally, the proposed IMC-DPGA tuning method demonstrates substantial improvements, with $M_p$ and ISE values of 0.0631 and 7.7278, respectively. This indicates that both the overall error and the transient response have been greatly improved. Notably, $M_p$ is the lowest, meaning the transient response is minimized, which signifies that the system is highly stable and converges to the target value rapidly.
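For reference, the step response metrics compared in Table 6 can be computed from a logged trace as sketched below; the exact normalization of $M_p$ used in the study is not restated here, so the peak error above the setpoint is reported and a step-normalized variant is noted in the comments.

```python
import numpy as np

def step_response_metrics(t, y, y_sp=5.0, y0=4.0):
    """Maximum overshoot Mp and integral of squared error (ISE) for a setpoint
    step from y0 to y_sp (oxygen concentration in %), sampled uniformly."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    dt = float(t[1] - t[0])                   # about 1 s in the experiments
    e = y_sp - y
    ise = float(np.sum(e ** 2) * dt)
    mp = float(np.max(y) - y_sp)              # peak error above the setpoint
    # Alternative convention: overshoot as a fraction of the step size,
    # mp_rel = (np.max(y) - y_sp) / (y_sp - y0)
    return mp, ise
```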
Additionally, Figure 9 represents the steady-state response for oxygen concentration targets of 4% and 5%.
The graph is a boxplot of the output data in the steady-state regions at the control targets of 4% and 5%. This allows for the assessment of the stability of each controller in the steady-state.
Table 7 quantifies the data from the graph, showing the median, upper adjacent (U.A), and lower adjacent (L.A) of the output data for each controller.
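The boxplot statistics reported in Table 7 (median and Tukey-style upper/lower adjacent values) can be obtained from the steady-state samples as follows; this is a generic sketch of the standard definitions, not the authors' analysis script.

```python
import numpy as np

def boxplot_stats(samples):
    """Median and upper/lower adjacent values (the whisker ends of a boxplot)."""
    x = np.asarray(samples, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    upper_adjacent = x[x <= q3 + 1.5 * iqr].max()   # largest point within 1.5*IQR above Q3
    lower_adjacent = x[x >= q1 - 1.5 * iqr].min()   # smallest point within 1.5*IQR below Q1
    return median, upper_adjacent, lower_adjacent
```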
The Z-N controller shows similar medians and data distributions at both control targets of 4% and 5%. The λ -T controller also exhibits a similar distribution, indicating stability comparable to that of the Z-N controller. The IMC-MAC controller, however, shows a significantly larger data distribution, suggesting lower stability. This implies that while IMC-MAC demonstrates a fast response speed during the transient response period, it experiences significant oscillations in the steady-state, resulting in lower stability. The IMC-DPGA, compared to the other controllers, shows the lowest data distribution and closely follows the control target with its median, indicating the highest control stability in the steady-state. This confirms that IMC-DPGA ensures faster response speeds while providing superior stability compared to other controllers. Specifically, the superior performance of the IMC-DPGA compared to the IMC-MAC demonstrates the effectiveness of the tuning method, which allows for the adaptive real-time adjustment of the value of λ according to the internal model by combining the DDPG algorithm with the IMC structure.

7. Conclusions

The tightening of atmospheric pollutant emission regulations in the maritime sector has spurred efforts to reduce emissions from combustion boilers. Understanding the correlation between control variables and atmospheric pollutants, and controlling the process on the basis of a calculated model, can reduce these emissions. However, existing boiler combustion measurement-control systems have high time constants and struggle to achieve appropriate control in the face of dynamic changes in the model caused by various variables.
Thus, using flame images as a means to measure oxygen concentration and employing an image-based combustion control system that can additionally control the air volume in existing combustion systems can reduce measurement delay times and excessive combustion state changes, enabling stable real-time control.
In this paper, the IMC-DPGA (internal model control–deep policy gradient adaptive) controller is proposed, which combines the IMC-PID controller, known for its excellent model-based control, with the DDPG algorithm, which allows continuous exploration-based learning; the combined controller is applied to an image-based combustion control system. Because the PID control parameters are linked by the internal model of the IMC, the controller can prevent transient responses caused by sporadic changes in each parameter during the learning phase. Additionally, unlike traditional RL-based PID parameter tuning methods, the action variable is reduced from three dimensions to one by using the IMC filter constant lambda ($\lambda$), saving computational resources and enabling stable and fast learning.
By setting and controlling the PID parameters based on the threshold value of 100 steps ( N ) per episode established through experimentation, a reward value of −0.05 was achieved in just 105 episodes. Furthermore, comparison experiments in step response with other controllers showed that the IMC-DPGA controller demonstrated the fastest response speed, lowest overshoot, and minimal oscillation compared to existing PID controllers, proving its stability and effectiveness.
The experiments in this study were conducted on an actual operating ship, verifying the system's practicality. Additionally, the image-based combustion control system can be easily integrated into existing ships at low cost, providing an immediate reduction in atmospheric pollutants.
However, increasing the target oxygen concentration can suppress atmospheric pollutants through excess air but decrease boiler performance efficiency. Therefore, future research must develop optimal control strategies that balance pollutant reduction and boiler performance efficiency. To achieve this, combining multi-objective optimization techniques with the IMC-DPGA control algorithm will be essential to respond to real-time changes in combustion conditions and simultaneously optimize pollutant emissions and energy efficiency. Furthermore, since the improvement in the learning agent’s performance directly translates to enhanced controller performance, further research on improving the agent model’s performance through transfer learning is necessary.

Author Contributions

Conceptualization, C.-M.L. and B.-G.J.; methodology, C.-M.L.; formal analysis, C.-M.L.; writing—original draft preparation, C.-M.L.; writing—review and editing, B.-G.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Research promotion program through the National Korea Maritime and Ocean University Research Fund in 2023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

First and foremost, we extend our deepest gratitude to everyone who has played a role in the successful completion of this journal. We also wish to express our sincere thanks to the esteemed reviewers for their meticulous evaluation, insightful feedback, and expert guidance throughout the peer review process. Additionally, we would like to extend our heartfelt appreciation to Jung Byung-Gun for his invaluable mentorship, unwavering support, and profound insights that greatly contributed to this work. Lastly, we are immensely thankful to the editors for their dedication, hard work, and commitment to advancing knowledge in our field.

Conflicts of Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Nomenclature

$CO_2$: Carbon dioxide
$NO_x$: Nitrogen oxides
$SO_2$: Sulfur dioxide
$SO_x$: Sulfur oxides
$\beta$: Constant associated with the zero
$a_t$: Action of the actor at time $t$
$e_k$: Target oxygen concentration error
$f(s)$: IMC filter
$f_c$: Cutoff frequency
$f_i(s)$: IMC filter for the ICC system
$f_l(s)$: Low-pass filter
$G(s)$: Transfer function of the controlled process
$G_{S_2}$: Actual system transfer function
$\bar{G}_{ICC}$: Internal model transfer function of the ICC system
$\bar{G}_{S_2}$: ICC system transfer function
$J$: Expected reward
$K_{ICC}(s)$: Classical controller of the ICC system
$K_d$: Derivative gain
$K_i$: Integral gain
$K_p$: Proportional gain
$L(\theta^Q)$: Loss function
$\lambda,\ \eta$: Time constants of the IMC filter
$M_p$: Maximum peak error
$N$: Number of steps per episode
$O_k$: Current oxygen concentration
$r_t$: Reward at time $t$
$s_{t+1}$: Next state
$u(t)$: Control input
$y(t)$: Amplitude of the system output
$\theta^Q$: Critic network
$\theta^{Q'}$: Target network for the critic
$\theta^\mu$: Actor network
$\theta^{\mu'}$: Target network for the actor
$\tau_a,\ \tau_b$: Time constants of the system
$\tau_d$: Derivative parameter
$\tau_i$: Integral parameter
$Q(s)$: IMC controller
$s_t$: State at time $t$
$\varepsilon_t$: Exploration noise at time $t$
$\nabla_{\theta^\mu} J$: Policy gradient of the expected reward
Index
A/D: Analog-to-digital
CMOS: Complementary metal-oxide-semiconductor
DDPG: Deep deterministic policy gradient
DRL: Deep reinforcement learning
DPS: Dynamic positioning system
DQN: Deep Q-network
HSV: Hue, saturation, and value
ICC: Image-based combustion control
ISE: Integral of squared error
IMC: Internal model control
MAC: Maclaurin
MSE: Mean squared error
KDE: Kernel density estimation
L.A: Lower adjacent
OFB: Oil-fired boiler
PID: Proportional–integral–derivative
PI: Proportional–integral
$R^2$: R-squared
SEF: Saturation extraction filter
SOLAS: The International Convention for the Safety of Life at Sea
SOPZ: Second-order plus zero-pole
SVM: Support vector machine
TD: Temporal difference
USB: Universal serial bus
U.A: Upper adjacent
Z-N: Ziegler–Nichols

References

  1. MarkWide Research. Global Marine Boilers Market: Analysis, Industry Size, Share, Research Report, Insights, COVID-19 Impact, Statistics, Trends, Growth, and Forecast 2024–2032; MarkWide Research: Torrance, CA, USA, 2024.
  2. Shelyapina, M.G.; Rodríguez-Iznaga, I.; Petranovskii, V. Materials for CO2, SOx, and NOx Emission Reduction. In Handbook of Nanomaterials and Nanocomposites for Energy and Environmental Applications; Springer: Cham, Switzerland, 2020; pp. 2429–2458.
  3. Tadros, M.; Ventura, M.; Soares, C.G. Review of current regulations, available technologies, and future trends in the green shipping industry. Ocean Eng. 2023, 280, 114670.
  4. Zhao, J.; Wei, Q.; Wang, S.; Ren, X. Progress of ship exhaust gas control technology. Sci. Total Environ. 2021, 799, 149437.
  5. Nemitallah, M.A.; Nabhan, M.A.; Alowaifeer, M.; Haeruman, A.; Alzahrani, F.; Habib, M.A.; Elshafei, M.; Abouheaf, M.I.; Aliyu, M.; Alfarraj, M. Artificial intelligence for control and optimization of boilers' performance and emissions: A review. J. Clean. Prod. 2023, 417, 138109.
  6. Chen, J.; Chang, Y.; Cheng, Y.; Hsu, C. Design of image-based control loops for industrial combustion processes. Appl. Energy 2012, 94, 13–21.
  7. Krishnamoorthi, M.; Agarwal, A.K. Combustion instabilities and control in compression ignition, low-temperature combustion, and gasoline compression ignition engines. In Gasoline Compression Ignition Technology: Future Prospects; Springer: Berlin/Heidelberg, Germany, 2022; pp. 183–216.
  8. Sujatha, K.; Venmathi, M.; Pappa, N. Flame Monitoring in power station boilers using image processing. Ictact J. Image Video Process. 2012, 2, 427–434.
  9. Omiotek, Z.; Kotyra, A. Flame image processing and classification using a pre-trained VGG16 model in combustion diagnosis. Sensors 2021, 21, 500.
  10. Lee, C.; Jung, B.; Choi, J. Experimental Study on Prediction for Combustion Optimal Control of Oil-Fired Boilers of Ships Using Color Space Image Feature Analysis and Support Vector Machine. J. Mar. Sci. Eng. 2023, 11, 1993.
  11. Lee, C. Combustion Control of Ship's Oil-Fired Boilers based on Prediction of Flame Images. J. Mar. Sci. Eng. 2024, 12, 1474.
  12. Noye, S.; Martinez, R.M.; Carnieletto, L.; De Carli, M.; Aguirre, A.C. A review of advanced ground source heat pump control: Artificial intelligence for autonomous and adaptive control. Renew. Sustain. Energy Rev. 2022, 153, 111685.
  13. Qi, R.; Tao, G.; Jiang, B. Fuzzy System Identification and Adaptive Control; Springer: Cham, Switzerland, 2019.
  14. Yaseen, H.M.S.; Siffat, S.A.; Ahmad, I.; Malik, A.S. Nonlinear adaptive control of magnetic levitation system using terminal sliding mode and integral backstepping sliding mode controllers. ISA Trans. 2022, 126, 121–133.
  15. Mahmud, M.; Motakabber, S.; Alam, A.Z.; Nordin, A.N. Adaptive PID controller using for speed control of the BLDC motor. In Proceedings of the 2020 IEEE International Conference on Semiconductor Electronics (ICSE), Kuala Lumpur, Malaysia, 28–29 July 2020; pp. 168–171.
  16. Nohooji, H.R. Constrained neural adaptive PID control for robot manipulators. J. Frankl. Inst. 2020, 357, 3907–3923.
  17. Wang, J.; Zhu, Y.; Qi, R.; Zheng, X.; Li, W. Adaptive PID control of multi-DOF industrial robot based on neural network. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 6249–6260.
  18. Dubey, V.; Goud, H.; Sharma, P.C. Role of PID control techniques in process control system: A review. In Data Engineering for Smart Systems: Proceedings of SSIC 2021; Springer: Singapore, 2022; pp. 659–670.
  19. Kanungo, A.; Choubey, C.; Gupta, V.; Kumar, P.; Kumar, N. Design of an intelligent wavelet-based fuzzy adaptive PID control for brushless motor. Multimed. Tools Appl. 2023, 82, 33203–33223.
  20. Chen, S. Review on supervised and unsupervised learning techniques for electrical power systems: Algorithms and applications. IEEJ Trans. Electr. Electron. Eng. 2021, 16, 1487–1499.
  21. Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274.
  22. Lee, D.; Lee, S.J.; Yim, S.C. Reinforcement learning-based adaptive PID controller for DPS. Ocean Eng. 2020, 216, 108053.
  23. Carlucho, I.; De Paula, M.; Acosta, G.G. An adaptive deep reinforcement learning approach for MIMO PID control of mobile robots. ISA Trans. 2020, 102, 280–294.
  24. Siraskar, R. Reinforcement learning for control of valves. Mach. Learn. Appl. 2021, 4, 100030.
  25. Lawrence, N.P.; Stewart, G.E.; Loewen, P.D.; Forbes, M.G.; Backstrom, J.U.; Gopaluni, R.B. Optimal PID and antiwindup control design as a reinforcement learning problem. IFAC-PapersOnLine 2020, 53, 236–241.
  26. Lakhani, A.I.; Chowdhury, M.A.; Lu, Q. Stability-preserving automatic tuning of PID control with reinforcement learning. arXiv 2021, arXiv:2112.15187. [Google Scholar] [CrossRef]
  27. Ding, Y.; Ren, X.; Zhang, X.; Liu, X.; Wang, X. Multi-phase focused PID adaptive tuning with reinforcement learning. Electronics 2023, 12, 3925. [Google Scholar] [CrossRef]
  28. Datta, A. Adaptive Internal Model Control; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  29. Zaporozhets, A.O.; Zaporozhets, A.O. Research of the process of fuel combustion in boilers. In Control of Fuel Combustion in Boilers; Springer: Cham, Switzerland, 2020; pp. 35–60. [Google Scholar]
  30. Chen, J.; Chang, Y.; Cheng, Y. Performance design of image-oxygen based cascade control loops for boiler combustion processes. Ind. Eng. Chem. Res. 2013, 52, 2368–2378. [Google Scholar] [CrossRef]
  31. Xiao, G.; Gao, X.; Lu, W.; Liu, X.; Asghar, A.B.; Jiang, L.; Jing, W. A physically based air proportioning methodology for optimized combustion in gas-fired boilers considering both heat release and NOx emissions. Appl. Energy 2023, 350, 121800. [Google Scholar] [CrossRef]
  32. Li, Y.; Zhang, T.; Das, S.; Shamma, J.; Li, N. Non-asymptotic system identification for linear systems with nonlinear policies. IFAC-PapersOnLine 2023, 56, 1672–1679. [Google Scholar] [CrossRef]
  33. Tan, H. Reinforcement learning with deep deterministic policy gradient. In Proceedings of the 2021 International Conference on Artificial Intelligence, Big Data and Algorithms (CAIBDA), Xi’an, China, 28–30 May 2021; pp. 82–85. [Google Scholar]
  34. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China, 21–26 June 2014; pp. 387–395. [Google Scholar]
  35. Nise, N.S. Control Systems Engineering; John Wiley & Sons: Hoboken, NJ, USA, 2020. [Google Scholar]
  36. Rivera, D.E. Internal Model Control: A Comprehensive View; Arizona State University: Tempe, AZ, USA, 1999. [Google Scholar]
  37. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
Figure 1. Overview of the boiler control system with the image-based combustion control system.
Figure 2. Equipment configuration for boiler ICC system experiment.
Figure 3. Architecture of actor–critic reinforcement learning with experience replay in DDPG.
Figure 4. DDPG-based architecture for image-based combustion control system with IMC-PID integration.
Figure 5. Training results for IMC-DPGA according to step number per episode, N.
Figure 6. Parameter-wise KDE of IMC-DPGA training process ((A) λ, (B) K_p, (C) K_i, (D) K_d) at N = 100.
Figure 7. Cross-sectional KDE for detailed analysis of IMC-DPGA training process.
Figure 8. Comparison of control strategies for oxygen concentration step change.
Figure 9. Comparison of 4% and 5% steady-state responses for various controllers.
Table 1. Specifications of boiler and burner for OFB.
Boiler: drum type, cylindrical water tube; steam production, 3000 kg/h; working steam pressure, 5.5~7 kg/cm².
Burner: fuel type, LSMGO (DMA, 0.1% sulfur); fuel oil consumption, min/max 68.5/205.5 kg/h; air supply volume, min/max 1650~3700 m³/h.
Table 2. Internal model and control elements in the ICC system.
G̅_ICC(s) = k_p(βs + 1) / [(τ_a s + 1)(τ_b s + 1)]
f(s) = (ηs + 1) / (λs + 1)²
K_ICC(s) = K_p(1 + K_i/s + K_d s)
K_p = 1 / [k_p λ (τ_b + τ_s)], K_i = 1 / [k_p λ (τ_b + τ_s)²], K_d = τ_b τ_s / [k_p λ (τ_b + τ_s)²]
where k_p = 2.1865704 × 10^4, β = 2.728423 × 10^3, τ_a = 3.51 × 10^4, τ_b = 1.887649.
Table 3. Training parameters used for the DDPG agent.
Parameter | Actor | Critic
Network structure | [50 25 1] | [50 25 25 1]
Learning rate | 10^−4 | 10^−3
Activation function | Tanh | ReLU
Optimization function | Adam | Adam
Early stopping patience | 10
Mini-batch size | 64
Discount factor | 0.9
Replay buffer size | 10^4
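For illustration only, the hyperparameters in Table 3 map onto a standard DDPG actor–critic pair as in the sketch below. The sketch assumes PyTorch and placeholder state/action dimensions; it is not the implementation used in the study.

```python
# Illustrative DDPG actor/critic definitions matching the hyperparameters in
# Table 3 (hidden sizes, activations, optimizers, learning rates). The
# state/action dimensions are placeholders assumed for the ICC setting.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 3, 1   # assumed dimensions, for illustration only

actor = nn.Sequential(                      # network structure [50 25 1], Tanh
    nn.Linear(STATE_DIM, 50), nn.Tanh(),
    nn.Linear(50, 25), nn.Tanh(),
    nn.Linear(25, ACTION_DIM), nn.Tanh(),   # bounded action output
)

critic = nn.Sequential(                     # network structure [50 25 25 1], ReLU
    nn.Linear(STATE_DIM + ACTION_DIM, 50), nn.ReLU(),
    nn.Linear(50, 25), nn.ReLU(),
    nn.Linear(25, 25), nn.ReLU(),
    nn.Linear(25, 1),                       # scalar Q-value
)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)    # actor learning rate
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)  # critic learning rate

GAMMA = 0.9           # discount factor
BATCH_SIZE = 64       # mini-batch size
BUFFER_SIZE = 10_000  # replay buffer size
```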
Table 4. Training termination episodes and rewards for different N in IMC-DPGA training.
Step count per episode, N | Termination episode | Last reward
1 | 289 | −0.205
50 | 194 | −0.135
100 | 105 | −0.05
200 | 158 | −0.67
Table 5. Parameter-wise KDE result details from the IMC-DPGA training.
Control parameter | Range of mode | Density | Mode
λ | 0.435~1 | 63.27 | 0.44
K_p | 2.41~5.59 | 5.16 | 5.48
K_i | 1.28~2.98 | 9.73 | 2.90
K_d | 0.00085~0.002 | 14,229 | 0.0019
Table 6. Response of the tuning method according to changes in the oxygen concentration target value.
Tuning method | M_p | ISE
Z-N | 0.1114 | 11.1966
λ-T | 0.0819 | 10.0912
IMC-MAC | 0.1250 | 8.1189
IMC-DPGA | 0.0631 | 7.7278
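For reference, the two metrics reported in Table 6 are assumed here to follow their standard definitions: the maximum peak error about the set point and the integral of squared error over the evaluation window.

```latex
% Assumed standard definitions for the metrics in Table 6 (the paper may
% normalise them differently): maximum peak error and integral of squared error.
M_p = \max_{t}\left| y(t) - y_{\mathrm{sp}} \right|, \qquad
\mathrm{ISE} = \int_{0}^{T} e(t)^{2}\,\mathrm{d}t, \qquad e(t) = y_{\mathrm{sp}} - y(t)
```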
Table 7. Steady-state analysis of oxygen concentration at 4% and 5% for various controllers.
Tuning method | 4% steady-state response (median / U.A / L.A) | 5% steady-state response (median / U.A / L.A)
Z-N | 4.0269 / 4.1069 / 3.932 | 5.022 / 5.112 / 4.9304
λ-T | 4.0331 / 4.136 / 3.9056 | 4.9863 / 5.0819 / 4.8698
IMC-MAC | 4.0054 / 4.1029 / 3.8696 | 5.024 / 5.125 / 4.9024
IMC-DPGA | 3.9968 / 4.0498 / 3.9444 | 5.0188 / 5.0631 / 4.974
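The median and adjacent values in Table 7 are assumed to follow the usual box-plot convention, in which the upper and lower adjacent values are the most extreme observations within 1.5 × IQR of the quartiles. A minimal NumPy sketch under that assumption:

```python
# Minimal sketch (assumption: U.A/L.A are the box-plot upper/lower adjacent
# values, i.e., the most extreme samples within 1.5*IQR of the quartiles).
import numpy as np

def steady_state_summary(samples):
    """Return (median, upper adjacent, lower adjacent) of a steady-state record."""
    x = np.asarray(samples, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    upper_adjacent = x[x <= q3 + 1.5 * iqr].max()
    lower_adjacent = x[x >= q1 - 1.5 * iqr].min()
    return np.median(x), upper_adjacent, lower_adjacent

# Example with a synthetic 4% oxygen-concentration record (illustrative only).
record = 4.0 + 0.03 * np.random.default_rng(0).standard_normal(500)
print(steady_state_summary(record))
```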
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
