Article

Aortic Pressure Control Based on Deep Reinforcement Learning for Ex Vivo Heart Perfusion

Department of Instrument Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8735; https://doi.org/10.3390/app14198735
Submission received: 25 August 2024 / Revised: 20 September 2024 / Accepted: 23 September 2024 / Published: 27 September 2024
(This article belongs to the Section Biomedical Engineering)

Abstract

In ex vivo heart perfusion (EVHP), the control of aortic pressure (AoP) is critical for maintaining the heart’s physiologic aerobic metabolism. However, the complexity of and variability in cardiac parameters present a challenge in achieving the rapid and accurate regulation of AoP. In this paper, we propose a method of AoP control based on deep reinforcement learning for EVHP in Langendorff mode, which can adapt to the variations in cardiac parameters. Firstly, a mathematical model is developed by coupling the coronary artery and the pulsatile blood pump models. Subsequently, an aortic pressure control method based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed. This method enables the regulation of the blood pump and the realization of closed-loop control. The control performance of the proposed DDPG method, the traditional proportional–integral–derivative (PID) method, and the fuzzy PID method are compared by simulating single and mixed changes in mean aortic pressure target values and coronary resistance. The proposed method exhibits superior performance compared to the PID and fuzzy PID methods under mixed factors, with 68.6% and 66.4% lower settling times and 70.3% and 54.1% lower overshoot values, respectively. This study demonstrates that the proposed DDPG-based method can respond more rapidly and accurately to different cardiac conditions than the conventional PID controllers.

1. Introduction

Heart transplantation remains the gold-standard treatment for end-stage heart failure. However, the issue of donor heart preservation has seriously affected the development of heart transplantation [1,2]. Ex vivo heart perfusion (EVHP) has received considerable attention in the field of heart transplantation due to its potential for minimizing ischemic injury [3]. Furthermore, it is difficult to replicate real hemodynamic conditions in vitro with conventional mock circulatory systems [4]. As a standard technique for EVHP, Langendorff perfusion mimics the physiology of the heart by connecting a heart to the EVHP system and perfusing it retrogradely along the aorta to the coronary arteries, thereby controlling the physiological activity of the heart [5]. In EVHP, maintaining optimal aortic perfusion pressure is crucial for sustaining the physiological aerobic metabolism of the heart [6]. However, the cardiac parameters can vary significantly due to changes in temperature and perfusion solution. The complexity of and variability in cardiac parameters make it challenging to achieve rapid and accurate aortic pressure control.
Becker et al. [7] employed feedback control to regulate flow changes during the perfusion of isolated livers, maintaining the perfusion pressure at a constant level to avoid the side effects of liver autoregulation. In the study of Langendorff perfusion of porcine hearts, Duignan et al. [8] performed servo regulation, setting a target mean aortic pressure (MAP) to ensure that the range of pressure variations remains within reasonable limits. These studies optimize the physiological environment of the organs and reduce organ damage by controlling constant pressure. Campos-Delgado et al. [9] designed a closed-loop servo control system using a PD controller to regulate renal perfusion pressure. However, these PID controllers require the fine-tuning of multiple parameters to adapt to changes in perfusion conditions. Huang et al. [10] achieved physiological perfusion by maintaining mean aortic pressure values using a fuzzy logic control method. However, the fuzzy control is highly dependent on expert experience and knowledge. As a living organ, the parameters of the organ are real time, variable, and complex. It is difficult for these control methods to adaptively control the pressure according to different perfusion conditions. Recently, Xin et al. [11] proposed the model reference adaptive control method for the regulation of aortic pressure in Langendorff perfusion, while Yao et al. [12] further proposed the semi-parametric adaptive control method for the regulation of aortic pressure, with the aim of maintaining the aerobic metabolism of the heart. This method reduces the time required for model updates while ensuring real-time control. It should be noted that a centrifugal pump is used in simulations and experiments. However, a pulsatile pump such as the TransMedics Organ Care System (TransMedics, Inc, Andover, MA, USA) is currently employed in the field of EVHP to deliver blood via pulsatile pressure [13,14]. 
Pulsatile blood flow mimics the physiological cardiac pattern of coronary perfusion, which is thought to reduce vascular resistance and thus be more conducive to microcirculation and organ perfusion [15]. However, the results of research on intelligent aortic pressure control strategies for pulsatile pumps with variable cardiac parameters in EVHP have yet to be published.
Intelligent strategies are capable of discerning the dynamic relationships and interactions within a specific organism, thereby regulating the level of the corresponding variables in a biological system [16]. With the rapid development of artificial intelligence, deep reinforcement learning has emerged as a powerful tool for dynamic control systems, such as blood glucose control, due to its strong learning capabilities [17]. It is necessary to explore the potential of intelligent control methods for aortic pressure in the perfusion process of an isolated heart, with the aim of improving the quality of heart preservation. In this paper, a closed-loop control method is developed based on the Deep Deterministic Policy Gradient (DDPG) algorithm, which is designed to regulate the MAP value within reasonable limits, preventing excessive fluctuations. This control method aims to meet the perfusion requirements in EVHP.
To describe the hemodynamic behavior in Langendorff mode, a lumped parameter model is developed that couples the coronary artery with the pulsatile blood pump. An interaction model based on the perfusion conditions is then constructed so that the variation in the model parameters of the isolated heart, in particular the coronary resistance, can be learned in real time during perfusion. Finally, a closed-loop control method based on deep reinforcement learning is proposed to achieve a smooth and rapid response in terms of aortic pressure and flow regulation.

2. Materials and Methods

2.1. Lumped Parameter Model of the EVHP System in Langendorff Mode

In Langendorff perfusion, oxygenated perfusate is pumped into the aorta, the aortic valve closes, and the perfusate enters the coronary arteries retrogradely, providing adjustable coronary blood flow [18]. Langendorff perfusion is widely used because it is easy to use and works with a wide range of species [19].
A lumped parameter model can characterize hemodynamics over time, and hemodynamic characteristics can provide an understanding of the state of the cardiovascular system, which is very important for clinical diagnosis [20]. The EVHP model is simulated using MATLAB/SIMULINK version R2022b (The MathWorks, Natick, MA, USA) in order to replicate real physiological conditions. The EVHP model used in this study, which neglects the effect of the oxygenator, consists of a pulsatile pump model and a coronary perfusion model, as shown in Figure 1. The coronary perfusion model, which is based on our previous work [21], consists of systemic impedance, the coronary artery, and the myocardial vessels. The blood flow in the coronary part mainly originates from the aortic root and eventually converges in the vein. The intramyocardial pressure and the internal pressure of the pulsatile pump are represented by external pressure elements. In Langendorff perfusion, the heart is empty-beating, the left ventricular volume is essentially constant, and the aortic valve is closed [22], so for simplicity, we set $P_{IMC}$ to 0. The parameters of the coronary perfusion model are shown in Table 1.
During the perfusion process, the pulsatile pump delivers oxygenated blood into a cannula that is connected to the aorta, thereby achieving coronary perfusion. As illustrated in Figure 2, the pump flow $Q_{int}$ is based on a sine wave with a pulsating flow rate, which is expressed by Equation (1).
$$Q_{int} = v(t) \cdot \sin\left(2\pi f\,(t - t_{delay})\right) + v_0(t) \tag{1}$$
where $f = HR/60$, $HR$ denotes the heart rate, $v(t)$ denotes the fluctuating gain of the pulsatile blood pump flow control, $v_0(t)$ denotes the constant part of the pulsatile blood pump flow that determines the average flow rate of the output, and $t_{delay}$ denotes the time difference between the heart and the pump.
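As a rough illustration of Equation (1), the following sketch evaluates the pump flow and averages it over one cardiac cycle. The function names and all numeric values used with it are hypothetical and are not taken from the paper; the gain and offset are assumed constant within a cycle.

```python
import math


def pump_flow(t, hr_bpm, v_gain, v_offset, t_delay=0.0):
    """Pump flow of Equation (1): sinusoidal part plus a constant offset."""
    f = hr_bpm / 60.0  # pulsation frequency in Hz, f = HR / 60
    return v_gain * math.sin(2.0 * math.pi * f * (t - t_delay)) + v_offset


def mean_flow_over_cycle(hr_bpm, v_gain, v_offset, t_delay=0.0, n=1000):
    """Average the flow over one cardiac cycle using n evenly spaced samples."""
    period = 60.0 / hr_bpm
    samples = [
        pump_flow(i * period / n, hr_bpm, v_gain, v_offset, t_delay)
        for i in range(n)
    ]
    return sum(samples) / n
```

With a constant gain and offset, the sinusoidal term averages out over a full cycle, which is consistent with the statement that $v_0(t)$ determines the average output flow.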
In this paper, the lumped parameter model of a pulsatile blood pump is employed to simulate the dynamics of the blood flow rate; $Q_{int}$ can also be represented as follows:
$$Q_{int} = C_{sac}\,\frac{dP_O(t)}{dt} \tag{2}$$
where $C_{sac}$ is the blood lumen compliance inside the blood pump, and $P_O(t)$ is the blood pressure at the blood pump outlet valve.
Since heart valves work in a similar way to diodes, each valve is represented in the lumped parameter model by a diode with forward conduction and infinite reverse resistance, as shown in Figure 1, where $D_I$ and $D_O$ denote the inlet and outlet valves. When $P_I(t) > P_O(t)$, the inlet valve opens and $Q_I(t)$ is given by Equation (3); when $P_O(t) > P_A(t)$, the outlet valve opens and $Q_A(t)$ is given by Equation (4).
$$L_I\,\frac{dQ_I(t)}{dt} = P_I(t) - P_O(t) - Q_I(t)\,R_I \tag{3}$$
$$L_O\,\frac{dQ_A(t)}{dt} = P_O(t) - P_A(t) - Q_A(t)\,R_O \tag{4}$$
where $L_I$ is the inlet inertia of the blood pump, $Q_I$ is the flow after the inlet valve of the blood pump, $R_I$ is the inlet resistance of the blood pump, $L_O$ is the outlet inertia of the blood pump, and $R_O$ is the outlet resistance of the blood pump.
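Equations (3) and (4) share the same form, so one time step of either valve branch can be sketched with a single helper. This is a minimal explicit-Euler sketch, not the Simulink implementation used in the paper; the function name, the step size, and the rule that residual forward flow is allowed to decay to zero after the pressure gradient reverses are assumptions made for illustration.

```python
def valve_flow_step(q, p_up, p_down, inertance, resistance, dt):
    """One explicit-Euler step of L * dQ/dt = P_up - P_down - Q * R
    through an ideal diode (no reverse flow)."""
    # Valve closed: no forward flow and no forward pressure gradient.
    if q <= 0.0 and p_up <= p_down:
        return 0.0
    dq_dt = (p_up - p_down - q * resistance) / inertance
    # Infinite reverse resistance: the flow is never allowed to go negative.
    return max(q + dt * dq_dt, 0.0)
```

For the inlet valve the upstream and downstream pressures are $P_I$ and $P_O$; for the outlet valve they are $P_O$ and $P_A$. Under a constant forward pressure difference the flow settles at the pressure difference divided by the resistance, as the equations predict.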

2.2. Reinforcement Learning for Pulsatile Pump Control

The control of aortic pressure during Langendorff perfusion is highly species-dependent, as different species require different coronary flow rates. In the heart, coronary blood flow is closely related to myocardial energy delivery: an increase in coronary resistance limits coronary blood flow and oxygen delivery, whereas a decrease in resistance increases blood flow and oxygen delivery to the heart [23]. Coronary vascular resistance tends to change with perfusion time and perfusate temperature; White et al. [24] found a large difference in the coronary artery resistance of isolated hearts perfused at three different temperatures (5 °C, 25 °C, and 35 °C), with resistance decreasing rapidly as temperature increased. Coronary artery resistance also varies between hearts [25]. Therefore, different perfusion strategies are being developed for different coronary resistances in order to obtain better perfusion decision protocols. Deep reinforcement learning combines the advantages of deep neural networks and reinforcement learning and is being explored in the medical field to learn the inherent laws of medical control problems and improve control effectiveness. The aim of this paper is to achieve control of AoP during perfusion based on the adaptive capability of deep reinforcement learning. Deep reinforcement learning can learn the coronary resistance of different hearts online, thus reducing the damage to the heart during the perfusion process. During heart perfusion, the demand for coronary blood flow must be met without providing excessive blood flow. We therefore add a saturation module to detect and limit peaks in pump output to avoid overperfusion of the heart; the resulting algorithm is abbreviated as PD-DDPG in this paper.
In this paper, we use the output pump speed of the DDPG agent to control the output pressure and the flow rate of the heart perfusion model, simulate the sensors to detect its AoP and flow variations, and preprocess the signals to feed the computed state and reward values back to the agent. In clinical applications, noise may be present in the aortic pressure measurement part, leading to errors in the control process, which requires noise reduction in the measured pressure values in the pre-processing stage of MAP estimation in order to improve the adaptability and robustness of the control method to the measurement noise. To simulate the different states of the heart, we stochastically initialize the parameters of the heart model. Figure 3 shows the schematic diagram of the control algorithm used in this paper. The state, action, and reward are as follows.
1. State
During the perfusion process, it is important to maintain an adequate MAP to minimize the delivery of maintenance solutions. In addition to the MAP value, the error value (ΔMAP) and the sum of the MAP error values are selected as state parameters, so that they can be kept within a reasonable range. The state parameters are as follows:
$$s_t = \left[\, MAP[k],\ \Delta MAP[k],\ \sum \Delta MAP[k] \,\right] \tag{5}$$
where $t$ is the sampling time and $k$ is the number of sampling cycles. The parameters can be obtained from the following equations:
$$MAP[k] = \frac{1}{T_k} \int_{t}^{t+\Delta t} P_A(t)\,dt \tag{6}$$
$$\Delta MAP[k] = MAP[k] - MAP_{ref}[k] \tag{7}$$
where $\Delta t$ is the duration of the entire cardiac cycle ($\Delta t = 60/HR$), $T_k$ denotes the number of AoP sampling values throughout the cardiac cycle, and $MAP_{ref}$ is the reference value of the MAP, which is usually set according to the metabolic state of the heart during the perfusion period.
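The state of Equations (5)–(7) can be sketched in discrete form as follows. The sketch assumes that the integral in Equation (6) is approximated by the arithmetic mean of the AoP samples collected over one cardiac cycle and that the error sum is accumulated cycle by cycle; the function name and the sample values are hypothetical.

```python
def map_state(aop_samples, map_ref, error_sum_prev=0.0):
    """Return (MAP[k], dMAP[k], running sum of dMAP) for one cardiac cycle.

    aop_samples: AoP samples (mmHg) recorded during the cycle.
    map_ref: reference MAP (mmHg) for this cycle.
    error_sum_prev: accumulated error from the previous cycles.
    """
    map_k = sum(aop_samples) / len(aop_samples)   # discrete form of Eq. (6)
    delta_map = map_k - map_ref                   # Eq. (7)
    error_sum = error_sum_prev + delta_map        # third state component
    return (map_k, delta_map, error_sum)
```

The returned error sum would be passed back in as `error_sum_prev` on the next cycle.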
2. Action
Blood pump flow is related to aortic pressure and coronary flow. The aortic pressure value is varied by changing the flow rate of the pulsatile blood pump to achieve control under different perfusion conditions. The action parameters are as follows:
$$a_t = \left[\, v^*(t),\ v_0^*(t) \,\right] \tag{8}$$
where $v^*(t)$ and $v_0^*(t)$ denote the normalized $v(t)$ and $v_0(t)$, which are the output values of the agent.
The pump action affects the aortic blood flow rate, and thus, if the flow rate is too high, myocardial edema can easily occur. Conversely, if the flow rate is too low, insufficient oxygenation can easily occur. To avoid either too high or too low a flow rate, a saturation function $\sigma_s$ is introduced to limit the pump output and ensure the safety of the control system.
$$\sigma_s(t) = \left[\, sat(v_1),\ sat(v_2),\ \ldots,\ sat(v_n) \,\right] \tag{9}$$
$$sat(v_i) = \begin{cases} s_{max}, & \text{if } v_i \ge s_{max} \\ v_i, & \text{if } s_{min} < v_i < s_{max} \\ s_{min}, & \text{if } v_i \le s_{min} \end{cases} \tag{10}$$
where $n$ is the total number of changes in perfusion conditions, $i$ is the index of the change in perfusion conditions, and $s_{max}$ and $s_{min}$ denote the upper and lower saturation limits.
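The saturation function of Equations (9) and (10) is a simple clamp. A minimal sketch is given below; the function names and the limits used in the usage example are hypothetical, since the actual values of $s_{min}$ and $s_{max}$ are not stated in this section.

```python
def sat(v, s_min, s_max):
    """Equation (10): clamp a single pump command to [s_min, s_max]."""
    if v >= s_max:
        return s_max
    if v <= s_min:
        return s_min
    return v


def saturate_all(values, s_min, s_max):
    """Equation (9): apply the clamp element-wise to a sequence of commands."""
    return [sat(v, s_min, s_max) for v in values]
```

For instance, `saturate_all([-5, 3, 12], 0, 10)` leaves the in-range command unchanged and pins the other two to the limits.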
3. Reward
In order to evaluate the efficiency of the MAP value control, the MAP value is included in the reward as a reference for a comprehensive assessment of the perfusion impact. The immediate reward is shown in Equation (11), while the value of β is shown in Equation (12).
$$r(s_t, a_t) = \beta(\Delta MAP[k]) \tag{11}$$
$$\beta(\Delta MAP[k]) = \begin{cases} 5, & \text{if } |\Delta MAP[k]| \le 2 \\ 1, & \text{else} \end{cases} \tag{12}$$
The cumulative discounted reward per episode is given in Equation (13), in which γ represents the discount factor, taking values between 0 and 1.
$$R = \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \tag{13}$$
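Equation (13) can be evaluated with a short backward recursion. The sketch below takes an arbitrary list of per-step rewards; the function name and the example values are illustrative only.

```python
def discounted_return(rewards, gamma):
    """Equation (13): sum over t of gamma**t * r_t for one episode."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For three rewards of 1 and a discount factor of 0.5, the terms are 1, 0.5, and 0.25.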
The Deep Deterministic Policy Gradient (DDPG) algorithm is a model-free, off-policy actor–critic algorithm for continuous action spaces. It combines the Deep Q Network (DQN) and the Deterministic Policy Gradient (DPG) algorithm [26]. The optimal action in a given state is selected by combining a policy function with a value function, with the aim of obtaining the maximum reward.
The critic network comprises two input paths. The first takes the state values as input and is followed by a fully connected layer with 256 nodes and a fully connected layer with 128 nodes; the second takes the action values as input and is followed by a fully connected layer with 128 hidden nodes. The two paths are connected to the final fully connected layer, which serves as the output. The output of the critic network is the estimated value of the state–action pair. In order to ensure that this output is an action value for evaluating the output action of the actor network, the final action value Q is output using the tanh activation function. The loss function is shown in Equation (14).
$$L(\theta^{Q}) = \frac{1}{N} \sum_{t} \left( Q(s_t, a_t \mid \theta^{Q}) - y_t \right)^2 \tag{14}$$
where $N$ denotes the number of experiences in the memory cache, $Q$ denotes the critic network, and $\theta^{Q}$ represents the weights of the critic network. The critic network $Q(s_t, a_t \mid \theta^{Q})$ estimates the return obtained by the actor network for a specific input $(s_t, a_t)$ and is trained to approximate the target value $y_t$, which is given by Equation (15), where $Q'$ denotes the target critic network and $\mu'$ the target actor network. Equation (16) is the update equation for the target parameter $\theta^{Q'}$, with τ ≪ 1.
$$y_t = r(s_t, a_t) + \gamma\, Q'\left( s_{t+1},\ \mu'(s_{t+1}) \mid \theta^{Q'} \right) \tag{15}$$
$$\theta^{Q'} = \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'} \tag{16}$$
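The three critic-side quantities in Equations (14)–(16) reduce to a few lines of arithmetic once the network outputs are available. The sketch below treats the network outputs and weights as plain Python numbers and lists; the function names are hypothetical, and any values of γ and τ used with it are illustrative rather than the settings of Table 2.

```python
def td_target(reward, gamma, q_target_next):
    """Equation (15): y_t = r + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return reward + gamma * q_target_next


def critic_loss(q_values, targets):
    """Equation (14): mean squared error between Q(s_t, a_t) and y_t."""
    n = len(q_values)
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / n


def soft_update(target_params, online_params, tau):
    """Equation (16): theta' = tau * theta + (1 - tau) * theta'."""
    return [
        tau * w + (1.0 - tau) * w_target
        for w, w_target in zip(online_params, target_params)
    ]
```

Because τ ≪ 1, each call to `soft_update` moves the target weights only a small fraction of the way toward the online weights, which keeps the target $y_t$ slowly varying during training.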
The actor network is a fully connected network with two hidden layers, which is used to learn the optimal policy efficiently. The hidden layers comprise 256 neurons and use the rectified linear unit (ReLU) activation function. The output layer contains one node per action of the agent, i.e., 2 nodes, and uses the tanh activation function. The parameters of the actor network $\mu(s_t \mid \theta^{\mu})$ are updated along the gradient that maximizes the objective function $J$, as illustrated in Equation (17):
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{t} \left[ \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_t,\ a = \mu(s_t \mid \theta^{\mu})}\ \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_t} \right] \tag{17}$$
where $\mu(s_t \mid \theta^{\mu})$ is a parameterized policy function and $\theta^{\mu}$ denotes the parameters of the actor network; the parameters $\theta^{\mu'}$ of the target actor network are updated as shown in Equation (18):
$$\theta^{\mu'} = \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'} \tag{18}$$
In the last layer of the network, the tanh activation function is used to constrain the values of the network’s output action to the range of −1 to 1. The network output actions are then converted to actions within the range of continuous actions. The DDPG agent parameter settings are shown in Table 2.
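One common way to convert a tanh output in [−1, 1] to a physical actuator range is a linear rescaling. The paper does not state which mapping is used, so the sketch below is an assumption, and the function name and the range in the usage note are hypothetical.

```python
def denormalize_action(a_norm, low, high):
    """Map a normalized action in [-1, 1] linearly onto [low, high]."""
    # Guard against small numerical excursions outside [-1, 1].
    a = max(-1.0, min(1.0, a_norm))
    return low + 0.5 * (a + 1.0) * (high - low)
```

With a range of 0 to 10, the normalized actions −1, 0, and 1 map to 0, 5, and 10, respectively.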
During the initialization of each training round, the values of HR, $MAP_{ref}$, and $R_{CA}$ are drawn at random from uniform distributions within the ranges specified in Table 3. In this paper, step changes are employed to cope with extreme variations in the perfusion state parameters. The target mean aortic perfusion pressure in the experimental setting is mainly 20–80 mmHg [6]; thus, the range of the random $MAP_{ref}$ is constrained accordingly. Following the training phase, the strategy can be validated for its ability to adapt to variations under different perfusion conditions. The network of this method can converge in less than 400 timesteps, and the simulations are run on an Intel Xeon W-2245 3.9 GHz processor with 32 GB of RAM, which requires approximately 2 h of training to complete the interaction with the simulated environment. In the test, the trained model is used to predict the control behavior for a variety of situations.

3. Results

3.1. Model Verification

Hatami et al. [27] mention that an AoP value of at least 30 mmHg should be maintained in order to achieve coronary perfusion in Langendorff mode. To validate the model, the $MAP_{ref}$ value is set at 50 mmHg, which aligns with the perfusion conditions in the Langendorff-mode perfusion experiment [18,28]. In this paper, we analyze the AoP and aortic flow (AoF) waveforms under the control of the deep reinforcement learning method with the $Q_{int}$ value being adjusted. The results are shown in Figure 4, which demonstrates that the model can accurately describe the blood flow during perfusion within the normal physiological range. The mean AoF value is 10.5 mL/s. The mean values of AoP and AoF are within the range of the results of the experimental studies by Xin et al. [18]. The model demonstrates the ability to regulate the expected aortic pressure value, which may contribute to the avoidance of underperfusion or overperfusion.

3.2. Heartbeat Change Conditions

It has been demonstrated that an increase in HR causes inadequate perfusion of the endocardium, making it susceptible to ischemia, which in turn affects the myocardial perfusion status [29]. In order to examine the impact of HR on the performance of the AoP control method, we simulate AoP tracking with $R_{CA}$ set to 1.2 mmHg·s/mL and $MAP_{ref}$ set to 50 mmHg. Two HR changes are simulated: a 20 bpm increase at 25 s, followed by a 10 bpm decrease at 50 s. The AoP tracking results are illustrated in Figure 5, and the settling times of the proposed control method are 1.25 s and 1.33 s, respectively. This indicates that the proposed control method responds rapidly and settles stably.

3.3. AoP Change Conditions

In order to assess the ability of the proposed controller to respond effectively to an extreme situation, namely a sudden change in target aortic pressure, we set $R_{CA}$ to 1.2 mmHg·s/mL and HR to 100 bpm. Moreover, we define step sizes of 10 mmHg, 20 mmHg, and 30 mmHg. In order to provide a basis for comparison, we conduct a series of experiments with the proposed method, as well as two commonly used methods from the control field: the PID and the fuzzy PID control methods. We modify parts (c) and (d) of Figure 3 to apply a PID controller and a fuzzy PID controller, respectively, while maintaining the configuration of parts (a) and (b). In the case of the PID method, the proportional gain $K_p$ is set to 0.94, the integral gain $K_i$ is set to 0.30, and the derivative gain $K_d$ is set to −0.10, following a process of parameter tuning. In the case of the fuzzy PID method, the parameters are optimized in accordance with specific fuzzy rules through a process of several iterations, resulting in the selection of $K_p$ = 0.83, $K_i$ = 0.10, and $K_d$ = 0.10 as coefficients for a step size of 30 mmHg. In the PD-DDPG method, the network can converge in less than 400 timesteps.
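For orientation, a textbook discrete PID law with the gains quoted above can be sketched as follows. This is not the Simulink controller used in the paper: the class name, the sample time passed to `update`, the rectangular integration, the zero derivative on the first call, and the error convention (reference minus measurement) are all assumptions made for illustration.

```python
class DiscretePID:
    """Textbook discrete PID: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first call

    def update(self, error, dt):
        self.integral += error * dt  # rectangular integration
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Gains reported in the text for the PID baseline.
pid = DiscretePID(kp=0.94, ki=0.30, kd=-0.10)
```

Because the gains are fixed after tuning, the same law is applied regardless of the coronary resistance, which is the limitation the comparison is designed to expose.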
The results of the simulation, in terms of the AoP and mean AoF, are presented in Figure 6. The results demonstrate that the PID control method requires a control time of 26.9 s and exhibits a pressure overshoot of 15.6 mmHg when the $MAP_{ref}$ undergoes a sudden change of 30 mmHg. This is mainly because the parameters of the PID method cannot be modified according to the varying perfusion conditions. In comparison, the fuzzy PID method exhibits superior performance: its settling time is 19.3 s and its overshoot is 10.8 mmHg, both lower than those of the PID method. In contrast, the PD-DDPG method exhibits only a slight overshoot of 1.3 mmHg when controlling the aortic pressure change. Moreover, the AoP settles within 4.2 s, and the mean AoF overshoot is less than 1.1 mL/s. This indicates that the PD-DDPG method is capable of maintaining smooth pressure and flow changes within a relatively short period of time. This is primarily due to the advantages of the agent in learning the optimal control strategy.

3.4. RCA Change Conditions

During EVHP experiments, the coronary arteries exhibit a loss of autoregulatory function, accompanied by a reduction in coronary vascular resistance of approximately 50% within a six-hour period [30]. Moreover, Xin et al. [18] found a similar trend toward a decline in coronary resistance during Langendorff perfusion, with functional evaluation conducted within the initial hour. The ability of coronary resistance vessels to dilate in response to elevated myocardial oxygen demand is of critical importance for maintaining adequate myocardial oxygenation. Accordingly, simulations incorporating varying reductions in coronary resistance are conducted to assess the performance of this control method. The HR is set at 100 bpm, the $MAP_{ref}$ value is maintained at 60 mmHg, and the model regulates flow through the aorta in accordance with the level of coronary resistance. As illustrated in Figure 7, when the $R_{CA}$ value varies between 1.2 and 0.6 mmHg·s/mL, the regulation of MAP by the PD-DDPG method is completed within 4.3 s, and the flow exhibits a smooth and rapid transition, with a flow overshoot of 0.42 mL/s. In contrast, all other control methods require 9.5 s to regulate the aortic pressure. It can be observed that a reduction in $R_{CA}$ results in an increase in AoF, which maintains the AoP value at a stable level. The mean flow fluctuation time of the PD-DDPG method is significantly lower than that of the other methods.

3.5. Mixed Conditions

Figure 8 illustrates the comparison of the tracking results of AoP between the PD-DDPG method proposed in this paper and the control effects of the PID method and the fuzzy PID methods in mixed-condition experiments. The stability of the method proposed in this paper in mixed factors is validated through the selection of diverse combinations of conditions, including stepwise increases and decreases in the target MAP value and increases and decreases in the coronary resistance. The proposed method in this paper is observed to achieve a rapid and stable pressure tracking effect, with an insignificant overshoot value of MAP within a range of 3.5 mmHg, in comparison to the traditional PID method and the fuzzy PID method. Furthermore, when the target MAP value and the trend in the change in coronary artery resistance are consistent, the tracking results of the PID and fuzzy PID methods demonstrate a significant overshoot of approximately 10 mmHg, which may potentially lead to damage to the myocardium and impact the perfusion results. In contrast, the pressure tracking results of the PD-DDPG method are relatively optimal.
In order to facilitate a comparative analysis of the control performance of the three control methods, namely PD-DDPG, fuzzy PID, and PID, under mixed conditions, four performance indicators are selected for evaluation. The following parameters of rise time, settling time, overshoot, and mean absolute percentage error (MAPE) are considered over the 20 s following the step change. The evaluation results are presented in Figure 9. With the exception of the rise time, the values of the remaining three indicators exhibit considerable variation. The settling time for the PD-DDPG method is 2.56 s, while the PID and fuzzy PID methods require 8.16 s and 7.62 s, respectively. The overshoot of the PD-DDPG control of aortic pressure is 6.57%, while the other two methods exhibit values of 22.09% and 14.31%, respectively. The proposed method also exhibits the lowest MAPE value of 2.46%, indicating superior accuracy in control outcomes compared to the other two methods. Therefore, the PD-DDPG method not only demonstrates precise control of the aortic pressure but also facilitates rapid and smooth pressure changes.
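The relative improvements quoted in the abstract follow directly from the settling times and overshoot values above. The short check below recomputes them; only the helper name is introduced here, and all numbers are taken from the text.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Settling time (s): PD-DDPG 2.56 vs. PID 8.16 and fuzzy PID 7.62.
settling_vs_pid = round(relative_reduction(8.16, 2.56), 1)
settling_vs_fuzzy = round(relative_reduction(7.62, 2.56), 1)

# Overshoot (%): PD-DDPG 6.57 vs. PID 22.09 and fuzzy PID 14.31.
overshoot_vs_pid = round(relative_reduction(22.09, 6.57), 1)
overshoot_vs_fuzzy = round(relative_reduction(14.31, 6.57), 1)
```

The four values come out as 68.6%, 66.4%, 70.3%, and 54.1%, matching the figures stated in the abstract.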

4. Discussion

Some of the current papers on the perfusion of isolated hearts in Langendorff mode regulate aortic pressure manually or by servo regulation [8,31]. However, it should be noted that individual heart conditions may vary in heart perfusion, which may affect the tuning parameters. Accordingly, we propose a reinforcement learning-based method to learn the law of the perfusion pressure under the influence of the pumping conditions and the heart’s conditions and to optimize the perfusion conditions with the objective of prolonging heart preservation. In light of the limitations of manual and PID control in adapting to uncertain time-varying cardiac parameters, Xin et al. [11] developed a model reference adaptive control method for AoP regulation, which proved to be an effective solution. The method is simulated for two different cardiac parameter conditions and heart rate irregularities. Furthermore, Yao et al. [12] present a semi-parametric adaptive control method for regulating AoP in a set of cardiac parameters and vascular resistance increase conditions. The robustness of the DDPG method proposed in this paper is validated by simulating and testing single and mixed conditions of heart rate variability, target aortic pressure step size, and coronary flow resistance variability. In order to validate the effectiveness of the control method in this experiment, the DDPG algorithm is compared with conventional PID and fuzzy PID methods to test the algorithm’s adaptability to different perfusion conditions. Moreover, the algorithm’s performance is evaluated using step change signals, which allow its behavior to be assessed at the different extremes that can arise in EVHP. The simulation results demonstrate that the proposed method can effectively achieve rapid and stable aortic pressure control.
In this paper, the adaptability of the algorithm under different perfusion conditions is improved by using the DDPG agent to learn the AoP control strategy under different initial conditions, and then realizing real-time control by changing its output gain. Reinforcement learning enables more flexible and precise control of aortic pressure in Langendorff perfusion, which plays a key role in maintaining cardiac function and optimizing cardiac metabolism. There is evidence that excessive pressure can damage endothelial cells, so smaller overshoots may help prolong cardiac preservation [30]. The method proposed in this paper produces a small pressure overshoot, within 4 mmHg, in both single and mixed conditions. The simulation results are also compared with the pressure and flow ranges of the TransMedics system; the aortic flow values are found to be slightly lower at the corresponding pressures, but the main objective of this paper is to preliminarily validate the feasibility of the algorithm based on the mathematical model of heart perfusion. When there is an abrupt increase in the $MAP_{ref}$ value, the mean AoF value also increases, which is consistent with the previous simulation results [11]. Changes in coronary resistance are influenced by temperature, pH, and partial pressure of the perfusate, which can lead to changes in aortic pressure. Aortic pressure control is essential to avoid overperfusion or myocardial ischemia, especially in cardiac perfusion studies. During the control process, only one pressure transducer is required to measure the aortic pressure, thus reducing the damage to the donor heart. Moreover, the method can be extended to different types of blood pumps, such as the Impella (Abiomed, Danvers, MA, USA), which incorporates an optical pressure sensor to measure the proxy aortic pressure [32]. The network of the method in this paper converges in less than 400 timesteps and is suitable for deployment in a simulation environment.
The samples obtained from the EVHP experiments are very valuable and can therefore be used in the evaluation phase of the method. The perfusion model in this paper focuses on perfusion conditions in the Langendorff mode. In the pump-supported working mode, a proportion of the blood is pumped to the left atrium and aortic pressure is maintained [18]. Therefore, by adding models such as the left atrium and the left ventricle, and by adjusting the direction of blood flow from the blood pump, the method can be extended to the conditions of the working mode.
This paper employs mathematical modeling to examine the feasibility of deep reinforcement learning control methods as a preliminary investigation into improved pressure control methods for EVHP. The simulation results demonstrate the effectiveness and robustness of the proposed method for Langendorff perfusion. However, the model does not fully reflect the actual conditions of EVHP, including variations in blood biochemical parameters, markers of cardiac injury, neurohumoral regulation, and other factors that may reduce the reliability of control. The control of AoP should ideally take cardiac metabolism into account, which is not considered in this paper due to the limitations of the model. In clinical applications, metabolism-related parameters can be input into the model as state values and the reward function can be optimized by weighted averaging. However, this approach increases the number of states, which increases computational cost and time. Alternatively, the control strategy for aortic pressure could be optimized by evaluating the relationship between a metabolic parameter and the target aortic pressure, so that metabolism is still taken into account. The next step is to apply the algorithm to a mechanical circulatory system to verify its reliability. Although the maximum output flow rate of the pump is limited to ensure the safety of the control system, if a system failure occurs during clinical trials of the EVHP system, the system can be reinitialized or switched to manual control to keep its output safe.
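As a rough illustration of the two ideas above (a reward formed as a weighted average of pressure and metabolic terms, and a hard limit on pump output flow), consider the sketch below. It is not the reward function or safety logic of this paper: the function names, the weights, the use of normalized absolute errors, and the flow limit are all hypothetical.

```python
def weighted_reward(map_error, metabolic_error, w_pressure=0.8, w_metabolic=0.2):
    """Negative weighted average of two normalized absolute errors.

    map_error: normalized MAP tracking error (assumed dimensionless).
    metabolic_error: normalized deviation of a metabolic marker from its
                     target (hypothetical; no such state exists in the
                     current model).
    Larger (closer to zero) is better; the weights are illustrative only.
    """
    total = w_pressure + w_metabolic
    penalty = w_pressure * abs(map_error) + w_metabolic * abs(metabolic_error)
    return -penalty / total


def saturate_flow(q_cmd, q_max):
    """Clamp the commanded pump flow to the range [0, q_max].

    q_max stands in for the maximum output flow limit mentioned in the
    text; its value is not given there and must be chosen by the user.
    """
    return min(max(q_cmd, 0.0), q_max)


# Example: a 10% pressure error with a 30% metabolic deviation,
# and a flow command that exceeds an assumed limit of 100 mL/s.
r = weighted_reward(0.10, 0.30)
q = saturate_flow(150.0, q_max=100.0)
print(r, q)
```

Adding the metabolic term enlarges the state and reward design space, which is the computational cost the text refers to; the clamp is independent of the learned policy and would apply regardless of how the reward is shaped.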

5. Conclusions

The method proposed in this paper provides a potential strategy for solving the problem of AoP control during Langendorff-mode perfusion. Deep reinforcement learning can learn online from varying heart rate and coronary artery resistance, thus reducing the damage suffered during heart perfusion. The flexibility and stability of the PD-DDPG algorithm are verified by simulating a variety of conditions, including changes in HR, increases in $MAP_{ref}$, and decreases in $R_{CA}$, and by comparing the reinforcement learning method with the PID method. The aortic pressure control method presented in this paper adapts to variations in cardiac parameters, thereby achieving a smooth and rapid response in aortic pressure and blood flow. This may provide an optimal perfusion pressure control strategy that could prove beneficial in reducing myocardial damage caused by the perfusion process. Future studies should investigate how to enhance the compatibility and safety of the control system and apply it to the EVHP system.

Author Contributions

Conceptualization, S.W., M.Y., Y.L. and J.Y.; methodology, S.W., M.Y., Y.L. and J.Y.; software, S.W., Y.L. and J.Y.; validation, S.W., M.Y. and Y.L.; formal analysis, S.W., M.Y., Y.L. and J.Y.; investigation, S.W., Y.L. and J.Y.; data curation, S.W., Y.L. and J.Y.; writing—original draft preparation, S.W.; writing—review and editing, S.W., M.Y., Y.L. and J.Y.; project administration, M.Y.; funding acquisition, M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant 2022YFC2402601.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Moonsamy, P.; Axtell, A.L.; Ibrahim, N.E.; Funamoto, M.; Tolis, G.; Lewis, G.D.; D’Alessandro, D.A.; Villavicencio, M.A. Survival after Heart Transplantation in Patients Bridged with Mechanical Circulatory Support. J. Am. Coll. Cardiol. 2020, 75, 2892–2905.
  2. Jawitz, O.K.; Raman, V.; DeVore, A.D.; Mentz, R.J.; Patel, C.B.; Rogers, J.; Milano, C. Increasing the United States heart transplant donor pool with donation after circulatory death. J. Thorac. Cardiovasc. Surg. 2020, 159, e307–e309.
  3. Wang, L.; MacGowan, G.A.; Ali, S.; Dark, J.H. Ex situ heart perfusion: The past, the present, and the future. J. Heart Lung Transplant. 2021, 40, 69–86.
  4. Rocchi, M.; Ingram, M.; Claus, P.; D’hooge, J.; Meyns, B.; Fresiello, L. Use of 3D anatomical models in mock circulatory loops for cardiac medical device testing. Artif. Organs 2023, 47, 260–272.
  5. Messer, S.; Ardehali, A.; Tsui, S. Normothermic donor heart perfusion: Current clinical experience and the future. Transpl. Int. 2015, 28, 634–642.
  6. Hatami, S.; Freed, D.H. Machine Perfusion of Donor Heart: State of the Art. Curr. Transplant. Rep. 2019, 6, 242–250.
  7. Becker, D.; Hefti, M.; Schuler, M.J.; Borrego, L.B.; Hagedorn, C.; Muller, X.; Graf, R.; Dutkowski, P.; Tibbitt, M.W.; Onder, C.; et al. Model Assisted Analysis of the Hepatic Arterial Buffer Response during Ex Vivo Porcine Liver Perfusion. IEEE Trans. Biomed. Eng. 2020, 67, 667–678.
  8. Duignan, T.; Guariento, A.; Doulamis, I.P.; Kido, T.; Regan, W.L.; Saeed, M.; Hoganson, D.M.; Emani, S.M.; Del Nido, P.J.; McCully, J.D.; et al. A Multi-Mode System for Myocardial Functional and Physiological Assessment during Ex Situ Heart Perfusion. J. Extra Corpor. Technol. 2020, 52, 303–313.
  9. Campos-Delgado, D.U.; Bonilla, I.; Rodríguez-Martínez, M.; Sánchez-Briones, M.E.; Ruiz-Hernández, E. Closed-Loop Control of Renal Perfusion Pressure in Physiological Experiments. IEEE Trans. Biomed. Eng. 2013, 60, 1776–1784.
  10. Huang, F.; Ruan, X.; Fu, X. Pulse-Pressure–Enhancing Controller for Better Physiologic Perfusion of Rotary Blood Pumps Based on Speed Modulation. ASAIO J. 2014, 60, 269–279.
  11. Xin, L.; Yao, W.; Peng, Y.; Qi, N.; Xie, S.; Ru, C.; Badiwala, M.; Sun, Y. Model Reference Adaptive Control for Aortic Pressure Regulation in Ex Vivo Heart Perfusion. IEEE Trans. Control Syst. Technol. 2021, 29, 884–892.
  12. Yao, W.; Xin, L.; Du, D.; Song, H.; Badiwala, M.; Sun, Y. Semiparametric Model-Based Adaptive Control for Aortic Pressure Regulation in Ex Situ Heart Perfusion. IEEE Trans. Ind. Electron. 2023, 70, 6131–6140.
  13. Truby, L.K.; Casalinova, S.; Patel, C.B.; Agarwal, R.; Holley, C.L.; Mentz, R.J.; Milano, C.; Bryner, B.; Schroder, J.N.; Devore, A.D. Donation after Circulatory Death in Heart Transplantation: History, Outcomes, Clinical Challenges, and Opportunities to Expand the Donor Pool. J. Card. Fail. 2022, 28, 1456–1463.
  14. Pahuja, M.; Case, B.C.; Molina, E.J.; Waksman, R. Overview of the FDA’s Circulatory System Devices Panel virtual meeting on the TransMedics Organ Care System (OCS) Heart—Portable extracorporeal heart perfusion and monitoring system. Am. Heart J. 2022, 247, 90–99.
  15. Gómez-Hospital, J.A.; Ferreiro, J.L. Percutaneous circulatory support in high-risk PCI: Pulsatile or continuous flow devices? Int. J. Cardiol. 2022, 366, 80–81.
  16. Ahmed, Z.; Mohamed, K.; Zeeshan, S.; Dong, X. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database 2020, 2020, baaa010.
  17. Zhu, T.; Li, K.; Herrero, P.; Georgiou, P. Basal Glucose Control in Type 1 Diabetes Using Deep Reinforcement Learning: An In Silico Validation. IEEE J. Biomed. Health Inform. 2021, 25, 1223–1232.
  18. Xin, L.; Gellner, B.; Ribeiro, R.V.P.; Ruggeri, G.M.; Banner, D.; Meineri, M.; Rao, V.; Zu, J.; Badiwala, M.V. A New Multi-Mode Perfusion System for Ex Vivo Heart Perfusion Study. J. Med. Syst. 2017, 42, 25.
  19. Bell, R.M.; Mocanu, M.M.; Yellon, D.M. Retrograde heart perfusion: The Langendorff technique of isolated heart perfusion. J. Mol. Cell. Cardiol. 2011, 50, 940–950.
  20. Garber, L.; Khodaei, S.; Keshavarz-Motamed, Z. The Critical Role of Lumped Parameter Models in Patient-Specific Cardiovascular Simulations. Arch. Comput. Methods Eng. 2022, 29, 2977–3000.
  21. Zhu, Y.; Yang, M.; Zhang, Y.; Meng, F.; Yang, T.; Fang, Z. Effects of Pulsatile Frequency of Left Ventricular Assist Device (LVAD) on Coronary Perfusion: A Numerical Simulation Study. Med. Sci. Monit. 2020, 26, e925367.
  22. Xin, L.; Yao, W.; Peng, Y.; Lu, P.; Ribeiro, R.; Wei, B.; Gellner, B.; Simmons, C.; Zu, J.; Sun, Y.; et al. Primed Left Ventricle Heart Perfusion Creates Physiological Aortic Pressure in Porcine Hearts. ASAIO J. 2020, 66, 55–63.
  23. Goodwill, A.G.; Baker, H.E.; Dick, G.M.; McCallinhart, P.E.; Bailey, C.A.; Brown, S.M.; Man, J.J.; Tharp, D.L.; Clark, H.E.; Blaettner, B.S.; et al. Mineralocorticoid receptor blockade normalizes coronary resistance in obese swine independent of functional alterations in Kv channels. Basic Res. Cardiol. 2021, 116, 35.
  24. White, C.W.; Avery, E.; Müller, A.; Li, Y.; Le, H.; Thliveris, J.; Arora, R.C.; Lee, T.W.; Dixon, I.M.C.; Tian, G.; et al. Avoidance of Profound Hypothermia during Initial Reperfusion Improves the Functional Recovery of Hearts Donated after Circulatory Death. Am. J. Transplant. 2016, 16, 773–782.
  25. Konst, R.E.; Guzik, T.J.; Kaski, J.-C.; Maas, A.H.E.M.; Elias-Smale, S.E. The pathogenic role of coronary microvascular dysfunction in the setting of other cardiac or systemic conditions. Cardiovasc. Res. 2020, 116, 817–828.
  26. Stevens, T.S.W.; Tigrek, R.F.; Tammam, E.S.; Sloun, R.J.G.v. Automated Gain Control Through Deep Reinforcement Learning for Downstream Radar Object Detection. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August 2021; pp. 1780–1784.
  27. Hatami, S.; White, C.W.; Ondrus, M.; Qi, X.; Buchko, M.; Himmat, S.; Lin, L.; Cameron, K.; Nobes, D.; Chung, H.-J.; et al. Normothermic Ex Situ Heart Perfusion in Working Mode: Assessment of Cardiac Function and Metabolism. J. Vis. Exp. 2019, 143, e58430.
  28. Aupperle, H.; Garbade, J.; Ullmann, C.; Schneider, K.; Krautz, C.; Dhein, S.; Gummert, J.F.; Schoon, H.-A. Comparing the ultrastructural effects of two different cardiac preparation- and perfusion-techniques in a porcine model of extracorporal long-term preservation. Eur. J. Cardio-Thorac. Surg. 2007, 31, 214–221.
  29. Ge, X.; Simakov, S.; Liu, Y.; Liang, F. Impact of Arrhythmia on Myocardial Perfusion: A Computational Model-Based Study. Mathematics 2021, 9, 2128.
  30. Qi, X.; Hatami, S.; Bozso, S.; Buchko, M.; Forgie, K.A.; Olafson, C.; Khan, M.; Himmat, S.; Wang, X.; Nobes, D.S.; et al. The evaluation of constant coronary artery flow versus constant coronary perfusion pressure during normothermic ex situ heart perfusion. J. Heart Lung Transplant. 2022, 41, 1738–1750.
  31. Kaffka genaamd Dengler, S.E.; Mishra, M.; van Tuijl, S.; de Jager, S.C.; Sluijter, J.P.; Doevendans, P.A.; van der Kaaij, N.P. Validation of the slaughterhouse porcine heart model for ex-situ heart perfusion studies. Perfusion 2024, 39, 555–563.
  32. Chang, B.Y.; Moyer, C.; Katerji, A.E.; Keller, S.P.; Edelman, E.R. A Scalable Approach to Determine Intracardiac Pressure from Mechanical Circulatory Support Device Signals. IEEE Trans. Biomed. Eng. 2021, 68, 905–913.
Figure 1. A lumped parameter model for an EVHP system in Langendorff mode. $R$: vascular resistance; $C$: vascular compliance; $L$: vascular inertia; $D$: valve; $I$: pump inlet; $O$: pump outlet; $A$: aorta; $CA$: coronary artery; $IMC$: intramyocardial vessels; $Q_{in}$: pump flow; $P$: pressure; $Q_A$: aortic flow (AoF); $R_s$: system resistance; $C_s$: aortic compliance; $L_s$: system flow inertia; $R_{MP}$, $R_{M1}$, and $R_{M2}$: myocardial resistances.
Figure 2. The blood flow curve of the pulsatile pump.
Figure 3. Schematic diagram of the novel EVHP coupled reinforcement learning control system.
Figure 4. Hemodynamics under the physiological control method in this paper. (a) AoP; (b) AoF.
Figure 5. Tracking result during step changes in HR. (a) MAP tracking results; (b) HR: 80 bpm to 100 bpm; (c) HR: 100 bpm to 90 bpm.
Figure 6. Tracking results with a step change in $MAP_{ref}$. (a,c,e) MAP tracking results for $MAP_{ref}$ steps from 30 to 40, 50, and 60 mmHg; (b,d,f) mean AoF tracking results for $MAP_{ref}$ steps from 30 to 40, 50, and 60 mmHg.
Figure 7. Tracking results for step changes in $R_{CA}$. (a,c,e) MAP tracking results for $R_{CA}$ steps from 1.2 to 1.0, 0.8, and 0.6 mmHg·s/mL; (b,d,f) mean AoF tracking results for $R_{CA}$ steps from 1.2 to 1.0, 0.8, and 0.6 mmHg·s/mL.
Figure 8. Tracking results with step changes in $MAP_{ref}$ and $R_{CA}$. (a) $MAP_{ref}$: 30 to 50 mmHg, $R_m$: 0.5 to 0.9 mmHg·s/mL; (b) $MAP_{ref}$: 30 to 50 mmHg, $R_{CA}$: 0.9 to 0.5 mmHg·s/mL; (c) $MAP_{ref}$: 50 to 30 mmHg, $R_{CA}$: 0.5 to 0.9 mmHg·s/mL; (d) $MAP_{ref}$: 50 to 30 mmHg, $R_{CA}$: 0.9 to 0.5 mmHg·s/mL.
Figure 9. The comparative results of PID, fuzzy PID, and PD-DDPG methods.
Table 1. The parameters of the coronary perfusion model.
| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| $R_S$ | 0.6 mmHg·s/mL | $L_{CA}$ | 0.007 mmHg·s²/mL |
| $C_S$ | 0.5 mL/mmHg | $R_{MP}$ | 37.5 mmHg·s/mL |
| $L_S$ | 0.12 mmHg·s²/mL | $R_{M1}$ | 9.75 mmHg·s/mL |
| $C_{CA}$ | 0.6 mL/mmHg | $C_{IMC}$ | 0.001 mL/mmHg |
Table 2. Hyperparameters of PD-DDPG.
| Parameter | Value |
| --- | --- |
| Max episodes | 1000 |
| Minibatch size | 64 |
| Learning rate of actor | 0.01 |
| Learning rate of critic | 0.01 |
| Stop training value | 500 |
| Discount factor | 0.995 |
Table 3. The parameter ranges for initialization.
| Parameter | Range of values | Unit |
| --- | --- | --- |
| HR | [50, 120] | bpm |
| $MAP_{ref}$ | [20, 80] | mmHg |
| $R_{CA}$ | [0.5, 1.2] | mmHg·s/mL |
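One way the ranges in Table 3 could be used to randomize the initial conditions of each training episode is sketched below. The sampling distribution is not specified with the table, so independent uniform sampling is an assumption, and the names `INIT_RANGES` and `sample_initial_conditions` are hypothetical.

```python
import random

# Ranges copied from Table 3; the dictionary keys are illustrative names.
INIT_RANGES = {
    "HR": (50.0, 120.0),      # heart rate, bpm
    "MAP_ref": (20.0, 80.0),  # target mean aortic pressure, mmHg
    "R_CA": (0.5, 1.2),       # coronary resistance, mmHg*s/mL
}


def sample_initial_conditions(rng):
    """Draw one set of episode initial conditions.

    Each parameter is drawn independently and uniformly from its range;
    this is an assumed scheme, not one stated in the paper.
    """
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in INIT_RANGES.items()}


rng = random.Random(0)  # fixed seed so a run can be reproduced
episode_init = sample_initial_conditions(rng)
print(episode_init)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible and separate from any other random number use in the training loop.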
