1. Introduction
Cancer is a broad term for a group of malignant tumors comprising more than 100 distinct types. Malignant tumors are prone to recurrence and metastasis, and many progress quickly, causing serious illness or death by spreading throughout the body. Cancer was the second leading cause of death worldwide in 2018, with an estimated 18.1 million new cases and 9.6 million deaths [1]. Every year, this complex and widespread disease places a tremendous economic burden on nations around the world, and many patients struggle to pay for costly treatment. In China, for example, roughly one in two cancer patients borrowed money for treatment: 52% of surveyed patients reported financial difficulties as a result of the disease, of whom 44% only needed to borrow money for treatment and 8% both borrowed money and had to discontinue treatment for lack of funds; one in every 5.5 patients who borrowed for medical care borrowed more than 50,000 RMB [2]. The economic burden of malignancies cost China $221.4 billion in 2015, accounting for 17.7% of government public health care spending [3].
The majority of cancers detected early can be cured with treatments such as surgery [4], radiotherapy [5], or chemotherapy [6]. These considerations make individualized dosing essential for both patients and health systems.
Designing drug dosing regimens with mathematical models of cancer is one way to address the problem of personalized dosing. Because mathematical modeling helps us understand the dynamics of many complex processes, scholars have long strived for accurate mathematical models of tumors. Such models propose specific indicators to evaluate disease progression, such as the quantity of tumor cells [7] or the size of the tumor [8], and they also model additional biological indicators that affect the tumor's progression or are pertinent to the patient. Model predictions have frequently proven congruent with the results of clinical trials. We therefore use these validated and economically valuable mathematical models to create optimal medicine dosing regimens. Patients may be offered various doses of chemotherapy drugs depending on the equipment and expertise of the local medical institution. Cimen et al. [9] proposed an optimization method based on the Riccati equation for optimizing the medicine dose. Johanna et al. [10] proposed a controller based on linear state feedback, under which the tumor volume continually decreases toward a final stable state. Batmani et al. [11] introduced a patient state feedback controller that determines the ideal dosage of chemotherapy medications by choosing appropriate states and weights in the cost function. Valle et al. [12] devised individual-specific treatment protocols based on LaSalle's invariance principle, using the localization method of compact invariant sets to determine upper and lower bounds on cell populations. Sharifi et al. [13] proposed a composite adaptive controller to stop tumor growth. Shindi et al. [14] combined swarm intelligence algorithms with optimal control to eliminate tumors. Singha [15] applied the Adomian decomposition method to solve the optimal control problem and numerically analyzed the solution of the mathematical model. Das et al. [16] established a theory based on quadratic control to optimize medicine doses in a way that minimizes toxic damage. Dhanalakshmi et al. [17] used the Gronwall inequality and Lyapunov stability to develop an optimal cancer controller that copes effectively with fuzzy mathematical models.
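Many of the optimal-control approaches above share a common structure: a cost functional that penalizes both tumor burden and drug usage is minimized subject to the tumor dynamics. A generic quadratic form, given here only to illustrate that structure (the weights and horizon are hypothetical, not taken from any cited work), is

J(u) = \int_0^{t_f} \left( w_1\, N(t)^2 + w_2\, u(t)^2 \right) \mathrm{d}t,

where N(t) is the tumor-cell population, u(t) is the drug dose rate, t_f is the treatment horizon, and the weights w_1, w_2 > 0 encode the trade-off between tumor suppression and drug toxicity.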
However, the open-loop control methods frequently used in treatment protocols derived from mathematical theory demand exact mathematical models to produce good outcomes [18,19], and the mathematical models must satisfy a number of theorems before such protocols can be generated. These constraints diminish the effectiveness of these methods in clinical situations [8]. Moreover, many treatment regimens require patients to take chemotherapy medications daily without allowing for rest periods, i.e., without intermittent dosing. Finally, the minimal deliverable dose of chemotherapy drugs depends on the equipment of the local health facility, so some dosing regimens may not achieve the best possible results. We therefore need a method that can generate a treatment plan with acceptable efficacy from a set of discrete candidate doses.
Artificial intelligence is becoming part of everyday life in the information era. Among its techniques, reinforcement learning (RL) is widely applied across industries, including game playing [20], autonomous vehicles [21], and network deployment optimization [22]. RL refers to a family of methodological frameworks for learning, predicting, and making decisions. If a problem can be formulated as, or transformed into, a sequential decision problem, RL can solve it or offer a suitable solution. Because it continually seeks better feedback and reduces negative feedback, RL is well suited to sequential decision challenges [23]; eventually, it yields a pattern of actions that maximizes the cumulative reward. Other machine learning algorithms tend to evaluate only the current best solution, whereas RL considers the long-term reward and is not limited to the current optimum, making it better suited to many real-world applications. Personalized dosing is a sequential decision problem: RL chooses actions depending on the environment to treat patients with various physical states and biological indicators so as to maximize the long-term benefit.
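To make the sequential-decision framing concrete, the following is a minimal, self-contained sketch of tabular Q-learning applied to dose selection. Everything here is an illustrative toy: the discretized tumor-burden states, the candidate doses, and the stochastic transition/reward model are placeholder assumptions, not the tumor model or algorithm used later in this paper.

```python
# Toy sketch: dosing as a sequential decision problem solved with
# tabular Q-learning. All dynamics are illustrative placeholders.
import random

random.seed(0)

DOSES = [0.0, 0.5, 1.0]           # candidate daily doses (normalized)
N_STATES = 5                      # discretized tumor-burden levels
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.2

Q = [[0.0] * len(DOSES) for _ in range(N_STATES)]

def step(state, dose):
    """Toy transition: a higher dose shrinks the tumor with higher
    probability, but incurs a toxicity penalty in the reward."""
    shrink = 1 if random.random() < 0.4 + 0.4 * dose else 0
    next_state = max(state - shrink, 0)
    reward = (state - next_state) - 0.3 * dose   # benefit minus toxicity
    return next_state, reward

for _ in range(2000):             # training episodes
    s = N_STATES - 1              # start at the highest tumor burden
    for _ in range(30):           # at most 30 treatment days
        a = random.randrange(len(DOSES)) if random.random() < EPS \
            else max(range(len(DOSES)), key=lambda i: Q[s][i])
        s2, r = step(s, DOSES[a])
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
        if s == 0:                # tumor eliminated; episode ends
            break

# Greedy dose per tumor-burden level after training
policy = [DOSES[max(range(len(DOSES)), key=lambda i: Q[s][i])]
          for s in range(N_STATES)]
print(policy)
```

The learned policy maps each discretized patient state to a dose, which is the shape of output the dosing methods discussed below also produce.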
Using this characteristic, dynamic treatment regimens can be created that precisely adjust drug dosages in response to the patient's biological indicators, underlying diseases, and changing cancer status. RL has been extensively studied for individualized medicine dosing. Zhao et al. [24] first applied RL to chemotherapeutic drug dose optimization based on mathematical models of tumors, placing restrictions on drug toxicity. Padmanabhan et al. [25] defined the state of the tumor based on the number of tumor cells, developed an appropriate treatment plan for each patient using the Q-learning algorithm, and reduced drug-induced toxicity by adjusting the maximum dose. To optimize the use of temozolomide, Yauney et al. [26] added an extra reward to the RL framework to constrain the toxicity of the treatment protocol. Yazdjerdi et al. [8] used the Q-learning algorithm to administer daily doses of anti-vascular treatment medication, lowering both the tumor volume and the tumor's endothelial volume. Zade et al. [27] used the Q-learning algorithm to optimize the timing and dose of temozolomide and validated the high efficacy of the resulting regimen against state-of-the-art conventional treatments. Ebrahimi et al. [28] used the RL framework to design an optimal strategy for radiation therapy and validated the controller's effectiveness on a case of non-small cell lung cancer. Adeyiola et al. [29] proposed modeling tumors as a Markov decision process to optimize drug doses; this modeling approach has subjective elements and may not evaluate the patient's status objectively. Chamani et al. [30] proposed conservative Q-learning (CQL) for determining the initial drug dose, with the final dosage evaluated using the clinical experience of professionals.
However, while RL is a popular approach to sequential problems and is particularly good at optimizing a single specific goal, optimizing a medicine dosage is a multi-objective problem. If tumor suppression or elimination is the only goal, the drug concentration in the patient's body will be high, depleting the normal cell population and compromising the immune system, which may lead to fungal or viral infections. High drug concentrations also risk severe damage to the patient's organs. To balance multiple optimization objectives, we should not focus solely on eliminating the tumor, but also consider the variables that affect the patient's quality of life during treatment [31,32]. Patients should therefore receive a multi-objective optimized treatment plan [33].
In this paper, we propose a Multi-Objective Deep Q-Network based on Multi-Indicator Experience Replay (MIER-MO-DQN) to address the problem of precise medicine dose regulation. First, we provide a mixed evaluation model based on the integration principle which, by adding two different nonlinear decision techniques to conventional multi-objective decision methods, enables multi-objective deep reinforcement learning to make more reasonable decisions. Second, in contrast to conventional experience replay, we design a novel experience replay for multi-objective deep reinforcement learning that takes more indicators into account and adopts a multi-objective format, increasing the network's stability and convergence speed. Third, while preserving a level of health that the patient considers acceptable, our algorithm can choose the most effective treatment plan from a variety of drug doses for patients in various physical conditions. Our method performs better than existing multi-objective deep reinforcement learning methods.
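To illustrate the multi-objective setting the proposed method addresses, the sketch below shows the conventional baseline it improves upon: each candidate dose has a *vector* of estimated returns (tumor reduction, toxicity, immune preservation), and a linear weighted sum reduces the vector to a scalar preference. This is the standard scalarization decision rule, not the paper's mixed evaluation model, and all numbers are illustrative placeholders.

```python
# Conventional linear scalarization over a vector of Q-values,
# shown as the baseline decision rule for multi-objective dosing.
# All Q-vectors and weights are illustrative placeholders.
import numpy as np

# Rows: 3 candidate doses; columns: tumor reduction (maximize),
# toxicity (negative = harmful, so larger is better), immune health.
q_vectors = np.array([
    [0.2, -0.05, 0.30],   # low dose
    [0.6, -0.40, 0.10],   # medium dose
    [0.9, -0.90, -0.20],  # high dose
])

def linear_scalarize(q, weights):
    """Weighted-sum decision rule over the objective dimensions."""
    return q @ weights

# Different preference profiles weight the objectives differently.
aggressive = np.array([0.8, 0.1, 0.1])   # prioritize tumor reduction
cautious = np.array([0.4, 0.4, 0.2])     # weigh toxicity heavily

best_aggressive = int(np.argmax(linear_scalarize(q_vectors, aggressive)))
best_cautious = int(np.argmax(linear_scalarize(q_vectors, cautious)))
print(best_aggressive, best_cautious)    # prints: 2 0
```

Under the aggressive profile the high dose (index 2) wins; under the cautious profile the low dose (index 0) wins, showing why a fixed linear weighting alone cannot serve every patient and motivating richer decision rules.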
This paper is organized as follows. Section 2 details the mathematical model of cancer chemotherapy. Section 3 presents the detailed MIER-MO-DQN implementation and the modeling of the problem. Section 4 presents the experimental results, along with a discussion of the proposed model's robustness. Section 5 concludes our study.
2. Mathematical Model of the Tumor
A mathematical model of cancer was put forth by Pillis et al. [7]. Tumor cells, effector-immune cells, circulating lymphocytes, and the concentration of the chemotherapeutic drug are the primary elements of this mathematical model. Once the initial value of each element has been established, the model uses a series of ordinary differential equations to simulate the changes that each modeled element undergoes over time, as shown in (1)–(4):

dN/dt = aN(1 − bN) − c_1 NI − k_1(1 − e^{−M})N,  (1)

dI/dt = d + ρNI/(α + N) − c_2 NI − d_1 I − k_2(1 − e^{−M})I,  (2)

dC/dt = k − βC − k_3(1 − e^{−M})C,  (3)

dM/dt = −γM + u(t),  (4)

where N, I, C, and M stand for the quantity of tumor cells, the quantity of effector-immune cells, the quantity of circulating lymphocytes, and the chemotherapeutic drug concentration, respectively, and t stands for time. In (1), the tumor population grows at the rate aN(1 − bN), while the effector-immune cells and the drug concentration reduce N through the two terms c_1 NI and k_1(1 − e^{−M})N. In (2), I increases at the constant rate d and dies naturally at the rate d_1 I, while being created and eliminated by the terms ρNI/(α + N) and c_2 NI, respectively; the drug concentration reduces I at the rate k_2(1 − e^{−M})I. The variations in the circulating lymphocyte population are described by (3), where C is produced at the constant rate k, dies naturally at the rate βC, and is damaged by the medication at the rate k_3(1 − e^{−M})C. In (4), γ denotes the rate at which the medication degrades in the body, and u(t) is the dose of the drug administered at time t. The initial values N(0), I(0), C(0), and M(0) specify the patient's state at the start of treatment.
Table 1 displays the values of the parameters in (1)–(4) and their associated descriptions [7,12,15,16,34].