Article

Advancing Patient Care with an Intelligent and Personalized Medication Engagement System

Ahsan Ismail, Muddasar Naeem, Madiha Haider Syed, Musarat Abbas and Antonio Coronato

1 Department of Electronics, Quaid-i-Azam University Islamabad, Islamabad 44000, Pakistan
2 Research Center on ICT Technologies for Healthcare and Wellbeing, Università Telematica Giustino Fortunato, 82100 Benevento, Italy
3 Institute of Information Technology, Quaid-i-Azam University Islamabad, Islamabad 44000, Pakistan
* Author to whom correspondence should be addressed.
Information 2024, 15(10), 609; https://doi.org/10.3390/info15100609
Submission received: 27 August 2024 / Revised: 15 September 2024 / Accepted: 2 October 2024 / Published: 4 October 2024
(This article belongs to the Special Issue Computer Vision, Pattern Recognition and Machine Learning in Italy)

Abstract

Therapeutic efficacy is undermined by adherence failure: WHO clinical studies show that only 50–70% of patients follow a treatment plan properly. Patients' failure to take prescribed drugs is a major cause of morbidity, mortality, and increased healthcare costs. Adherence to medication can be improved with patient engagement systems, which can incorporate a patient's preferences and beliefs into the treatment plan, resulting in more responsive and customized treatment. However, one key limitation of existing engagement systems is their generic application. We propose a personalized framework for patient medication engagement using AI methods, namely Reinforcement Learning (RL) and Deep Learning (DL). The proposed Personalized Medication Engagement System (PMES) has two major components. The first is an RL agent, which is trained on adherence reports and later used to engage a patient; after training, the agent can identify each patient's patterns of responsiveness by observing and learning their responses to reminders, and then optimize its engagement for each individual. The second component is based on DL and is used to monitor the medication process. An additional feature of the PMES is that it is cloud-based and can be used remotely from anywhere. Moreover, the system is personalized: the RL component can be trained for each patient separately, while the DL component can be trained for a given medication plan. The advantage of the proposed work is thus two-fold: the RL component improves adherence to medication, while the DL component minimizes medication errors.

1. Introduction

Patient medication engagement has recently come to be considered an integral aspect of health care and an important part of safe, people-centered services. Engaged patients can make better decisions about their healthcare options [1]. Furthermore, resources are better utilized if they are allocated according to patients' priorities, which is critical for the sustainability of global health systems [2]. Users of health services increasingly demand more responsive, intelligent, and user-friendly healthcare systems. Patients expect doctors to engage them in the decision-making process, even though they may vary substantially in their preferences for such participation [3].
Moreover, a failure to adhere to medications is associated with a considerable increase in the risk of heart failure, hospitalization for heart attack, cardiovascular death, or premature death from any cause [4]. For example, a recent observational study in Italy using a medication-adherence report scale indicates that 71.43% of patients aged 65 years or older were always adherent [5]. It is evident from studies that patients who are more adherent to their antihypertensive medications are 30% to 45% more likely to achieve blood pressure control than nonadherent patients [1].
Several factors have been investigated that support or deter people from being willing and able to become actively involved in improving patient safety. These factors include the healthcare infrastructure (e.g., primary or secondary care), tasks (e.g., whether a required patient safety behavior challenges practitioners' abilities), healthcare staff (e.g., knowledge and attitudes), health conditions (e.g., illness severity), and patients themselves (e.g., demographic characteristics) [6]. In addition, a major factor hindering patient engagement is the patient's perception of their status and role as subordinate to medical professionals. For example, a person may fear being perceived as "difficult", or they may consider a passive role a means of actively protecting their personal safety. Such challenges may be mitigated by better communication and by educating both patients and healthcare providers to understand healthcare as a partnership between provider and patient [7].
Therefore, the use of patient engagement systems not only improves adherence to treatment plans but also results in fewer adverse events and better outcomes. Such active patient engagement during the medication period can increase patient satisfaction, foster trust, and help enhance patient-provider relationships [8]. Moreover, the deployment of patient engagement systems can reduce healthcare costs by minimizing hospital readmissions and optimizing resource utilization [9]. The resulting patient participation in the medication process thus supports the long-term viability of global health systems through effective resource utilization and informed treatment decisions [10]. The use of computer vision and other recent technologies could improve various aspects of the healthcare sector [11].
Artificial intelligence (AI) techniques have been progressively employed within the healthcare domain for several purposes, including assisting patients in adhering to medicine regimens [12]. AI-based machine learning techniques have shown useful effects in medicine [13]. AI-powered applications that aim to enhance communication between patients and doctors, monitor pill consumption, empower patients, and ultimately improve adherence to medications could result in better medical outcomes and consequently improve the quality of life of many patients [14]. This work utilizes multiple AI techniques, namely Reinforcement Learning (RL) and Deep Learning (DL), to realize an intelligent healthcare infrastructure that engages patients in the medication process.
The contribution of this work is the development of a novel Patient Medication Engagement System (PMES), which consists of multiple intelligent components. The main contributions of the proposed work are summarized below:
  • Modeling the patient medication engagement problem into a Markov Decision Process (MDP).
  • Use of RL algorithms to solve the formulated MDP problem. The RL agent of the PMES observes each patient's patterns of responsiveness, learns the patient's responses to reminders, and then optimizes its engagement for each individual.
  • The role of the RL agent is then to engage the patient during the medication period and improve medication adherence.
  • The second component of PMES is the DL Verification Agent (DLVA), which is built using a CNN model. The salient feature of the DLVA is its dynamic nature: it can be used anywhere and, more importantly, can be trained for a given number of medicines.
  • The function of personalized DLVA is to assist in monitoring a patient’s drug-taking activity and minimize the chance of taking the wrong drugs.
After the introduction in Section 1, we present an overview of related research in Section 2. As the proposed framework utilizes different AI technologies, Section 3 provides the necessary technical background. The architecture of the proposed PMES and the working of its components are then given in Section 4. Finally, we present results and analysis in Section 5, a discussion in Section 6, and the conclusion in Section 7.

2. Related Work

In this section, we present related works, highlighting their limitations with respect to our work. At the end of the section, we briefly summarize our novel contribution.
A novel hybrid model for epileptic seizure detection is proposed in [15] by utilizing a combination of particle swarm optimization and genetic algorithm for fine-tuning the parameters of support vector machines for the classification of Electroencephalogram (EEG) data. Although the combination of different ML techniques improves the classification accuracy, the proposed system does not assist the patients according to their personal needs and does not provide a verification mechanism. Similar work is carried out in [16] using the combination of EEG and the Internet of Things (IoT) for epileptic episode detection at early stages. The proposed work also uses the Convolutional Neural Network (CNN) model within an IoT-powered EEG monitoring setup for EEG seizure identification. The framework demonstrates an average accuracy rate of 98.48%, but it cannot be utilized as a patient medication system.
The Bayesian Upper Confidence Bound approach is used in [17] to provide medication reminders to patients based on a patient's disability. The proposed system is useful for reminding the patient of the medication schedule by generating personalized messages, but it cannot engage a patient during medication and does not provide a facility to monitor whether a patient completes the drug-taking process. Similarly, the architecture proposed in [18], which consists of multiple technologies and sensors, provides medication-reminder messages and subsequent monitoring mechanisms, but this work also fails to engage a patient in the medication activity.
A wearable device embedded with an acceleration sensor is employed in [19] to monitor patients within residential centers. The authors use federated learning to detect epileptic seizures by amalgamating trained models from several residential centers. The proposed system, which boasts over 60 h of autonomy, shows good detection rates in preliminary simulations. Similar work is carried out in [20] for seizure detection using the random forest method. The system can predict seizures and send alerts to caregivers, nurses, and hospital management before a seizure occurs, thus assisting doctors in making treatment strategies. However, both frameworks are far from serving as a personalized medication setup.
A self-management drug-taking mechanism using a smartphone is introduced in [21]. The proposed system can remind the patient of the scheduled time of medicine and later monitor the drug-taking activity in real time. However, this system is not personalized and provides reminder services to all patients in the same way. A smartphone is also used for drug reminders in [22], where a mobile text-message reminder app was developed to assist patients' medication process. Again, this system is not able to account for the personalized behavior of a patient. Moreover, the authors in [23] examined several security frameworks for mHealth and eHealth to develop a new reference model using mechanisms such as service management systems, secure remote procedure calls, and capabilities, but the framework does not serve to improve medication adherence.
A framework for processing medical images is developed in [24] using DL techniques. The number of hidden layers in the neural network is controlled by employing the highest effect factor, and the extracted features are classified in a new way to diagnose diseases. A lion mating optimization-inspired routing protocol is proposed in [25] for body area sensor networks and IoT healthcare applications. The proposed routing scheme helps improve health services by minimizing local search issues and searching for optimal dynamic cluster-based routing solutions.
We found a few systems that remind patients to take their medication. However, reminders alone are not enough; we also need to understand why some people forget or do not take their medicine as prescribed. Such an important health-related issue demands an infrastructure for patient medication engagement. The contribution of our proposed medication engagement system is therefore multi-objective. Our solution first considers the personalized behavior of a patient, learns that behavior, and is then able to engage the patient throughout the medication process in a personalized manner. The PMES also provides a monitoring and verification facility, which not only improves medication adherence but also minimizes the risk of taking the wrong medicines. The third important feature of the PMES is its dynamic nature: it can be used remotely from anywhere, and its verification component can be trained for a given set of medicines.

3. Technical Background

This section provides a brief introduction to different AI techniques that have been used in the development of PMES.

3.1. Neural Networks

Neural networks (NNs) are computational models inspired by the human brain’s ability to process information. They comprise interconnected units known as neurons, organized into layers: input, hidden, and output. These neurons engage in a learning process where they adjust their behavior based on the data they receive [26]. Inputs are represented by numerical values; these are fed into neurons to convey features of the data. Each input is assigned a weight, signifying its impact on the neuron’s output. These weights are adjusted during training to optimize the network’s performance. A constant value known as bias is added to the weighted sum of inputs, allowing the neuron to learn even when inputs are zero. It helps in avoiding scenarios where the network would be overly influenced by certain features [27]. The activation function is applied as a non-linear transformation to the weighted sum and enables neurons to learn intricate patterns in the data. This non-linearity is crucial for the network to capture complex relationships and avoid being limited to linear mappings.
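For illustration, a minimal sketch (ours, not from the paper) of this computation in Python with NumPy; the weights, bias, and choice of ReLU are arbitrary examples:

```python
import numpy as np

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Single neuron: weighted sum of inputs plus bias, passed through a ReLU."""
    z = np.dot(w, x) + b      # weighted sum plus bias
    return max(0.0, z)        # non-linear activation (ReLU)

x = np.array([0.5, -1.2, 3.0])  # input features
w = np.array([0.8, 0.1, -0.4])  # weights, adjusted during training
b = 0.2                         # bias lets the neuron respond even to zero input
print(neuron(x, w, b))          # weighted sum = -0.72, so the ReLU outputs 0.0
```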

3.2. Convolutional Neural Networks (CNNs)

CNNs are a type of NN tailored for grid-like data, such as images or audio. They use filters or convolutions to extract meaningful features from the input. A typical CNN includes convolutional layers, pooling layers, and fully connected layers. Convolutional Layers: These layers apply filters to extract features from the input data, like edges or textures [28]. Pooling layers are used to reduce the dimensionality of the data, preventing overfitting and simplifying the information. Fully connected layers use the features extracted by convolutional layers to make final predictions.

3.3. Transfer Learning

Transfer learning is a technique where a pre-trained model is used as a starting point for a new task. The idea is to leverage the knowledge the model gained from a previous task to improve performance on the new one. The pre-trained model, often a deep neural network trained on a large dataset like ImageNet, is modified for the new task by adding or replacing layers, and only the weights of the new layers are trained on the new dataset [29]. Transfer learning is useful when the new dataset is too small to train a model from scratch and is different from, but shares some similarities with, the original dataset. It has proven effective for tasks like image classification, object detection, and natural language processing, as it reduces the time and resources needed for training while enhancing performance on new tasks.
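The pattern described here can be sketched in a few lines of Keras; the MobileNetV2 backbone, head sizes, and class count below are illustrative assumptions rather than choices made in this paper:

```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, models

# Pre-trained ImageNet backbone; its weights are frozen.
base = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# New task-specific head; only these layers are trained on the new dataset.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),  # e.g., seven target classes
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```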

3.4. Reinforcement Learning

Reinforcement Learning (RL) is a type of machine learning where a computer program learns by taking actions in a given environment. It is similar to playing a game where you try different moves to obtain the highest score. In RL, the program explores actions, learns from the results, and aims to make smart decisions to maximize its overall success.
The key components of the RL framework are agent, environment, states, actions, and reward. An RL agent is the program or “player” that makes decisions in the virtual world. The environment is the virtual world where the agent takes action. Each environment contains different situations or states, and the choices the agent can make in those situations are known as actions. The environment returns a reward against each action taken in a state.
The agent starts by randomly trying actions and learning from the outcomes. At first, it does not know much about the environment, but with each attempt, it figures out what works and what does not. Over time, it becomes better at choosing the best actions in different situations. The ultimate goal is to have a strategy (policy) for making the best decisions in any state of the environment. For example, consider the Multi-Arm Bandit (MAB) problem where we have many slot machines. The agent’s goal is to learn which machine gives the best reward. It is a simplified RL scenario with only one state and the agent balances between trying new machines (exploration) and sticking to the best one (exploitation).

3.5. Contextual Bandit

Contextual Bandits (CB) extend the idea of the MAB by considering additional information (the context $X_i$) that influences the outcomes. The agent must consider both the situation and the potential actions, as demonstrated in Figure 1. In contrast to a traditional RL setup, where the environment can have many states and actions, bandit problems contain one state with multiple actions and no terminal states. These kinds of environments can be solved effectively using Bayesian RL techniques such as Epsilon-Greedy, Thompson Sampling (TS), and Upper Confidence Bound (UCB), which are explained next.

3.5.1. Epsilon-Greedy

Epsilon-greedy is a simple yet powerful bandit algorithm used in RL to balance exploration with exploitation. We consider the problem of multiple slot machines (bandits) where an agent needs to maximize the reward by choosing the right slot. The agent needs to decide whether to play the machine that has the highest payout or try a different one to potentially discover a better option. The working of Epsilon-Greedy is explained as follows:

Action Selection

  • With probability $1 - \epsilon$, select the action with the highest estimated reward.
  • With probability $\epsilon$, randomly select an action.

$$P(\text{select action } i) = \begin{cases} 1 - \epsilon, & \text{if } i \text{ has the highest estimated reward} \\ \epsilon, & \text{otherwise} \end{cases}$$
Classifiers or regressors, acting as oracles, are trained to estimate the rewards associated with each action based on the contextual information ($X_i$). We incorporate oracles represented by Logistic Regression models to enhance the Epsilon-Greedy agent's decision-making: an individual logistic regression oracle is initialized for each action. Context and reward histories are also set up initially. For binary rewards, the reward history is populated with −1 to indicate that a specific action was not utilized in a particular round; this use of −1 makes it possible to distinguish which action received a reward in each round. This approach is applied to train the Epsilon-Greedy algorithm.

Classifier Training

Logistic Regression Estimate for action i:
$$\text{Estimated probability of success} = \frac{1}{1 + e^{-\theta_i^{T} x}}$$
Update classifier using observed rewards.
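Putting the action-selection rule and the per-action oracles together, a minimal sketch of such an Epsilon-Greedy agent follows. The class and method names are ours, and per-action context/reward histories stand in for the −1 reward-history convention described above; untrained oracles fall back to a Beta(1, 1) draw:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class EpsilonGreedyAgent:
    """Epsilon-Greedy contextual bandit with one Logistic Regression oracle per action."""

    def __init__(self, n_actions: int, epsilon: float = 0.2):
        self.epsilon = epsilon
        self.oracles = [LogisticRegression() for _ in range(n_actions)]
        self.contexts = [[] for _ in range(n_actions)]  # context history per action
        self.rewards = [[] for _ in range(n_actions)]   # binary reward history per action
        self.fitted = [False] * n_actions

    def select(self, context: np.ndarray) -> int:
        if np.random.rand() < self.epsilon:              # explore with probability epsilon
            return np.random.randint(len(self.oracles))
        # Exploit: score each action with its oracle; untrained oracles
        # fall back to a Beta(1, 1) draw.
        scores = [o.predict_proba([context])[0, 1] if fit else np.random.beta(1, 1)
                  for o, fit in zip(self.oracles, self.fitted)]
        return int(np.argmax(scores))

    def update(self, action: int, context: np.ndarray, reward: int) -> None:
        self.contexts[action].append(context)
        self.rewards[action].append(reward)
        if len(set(self.rewards[action])) > 1:           # need both classes to fit
            self.oracles[action].fit(self.contexts[action], self.rewards[action])
            self.fitted[action] = True
```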

3.5.2. Thompson Sampling

Thompson Sampling is a powerful Bayesian approach for making decisions in contextual bandit problems. In this setting, the "one-armed bandit" metaphor expands to multiple options whose rewards depend not only on the choice made but also on additional information, the context. The algorithm maintains parameters (alpha and beta) of a Beta distribution for each action. After observing rewards, the parameters are updated based on the number of successes and failures. Here is how Thompson Sampling works in a contextual bandit setting:

Beta Distribution Parameters Update

Parameters for action $i$ after observing $S_i$ successes and $F_i$ failures:

$$\alpha_i \leftarrow \alpha_i + S_i, \qquad \beta_i \leftarrow \beta_i + F_i$$

Action Selection

Sample from the Beta distribution for each action i:
$$\text{Sampled reward} \sim \text{Beta}(\alpha_i, \beta_i)$$
Select action with the highest sampled reward.
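A minimal sketch of this procedure (our illustration; names are ours), with the Beta parameters initialized to (1, 1) as in the hyperparameters reported in Section 5.1:

```python
import numpy as np

class ThompsonSamplingAgent:
    """Thompson Sampling with a Beta(alpha, beta) posterior per action."""

    def __init__(self, n_actions: int):
        self.alpha = np.ones(n_actions)  # initialized to (1, 1) per action arm
        self.beta = np.ones(n_actions)

    def select(self) -> int:
        samples = np.random.beta(self.alpha, self.beta)  # one posterior draw per action
        return int(np.argmax(samples))                   # act greedily on the samples

    def update(self, action: int, reward: int) -> None:
        # Binary reward: 1 counts as a success, 0 as a failure.
        self.alpha[action] += reward
        self.beta[action] += 1 - reward
```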

3.5.3. Upper Confidence Bounds-UCB

In the context of contextual bandit problems, the Upper Confidence Bound (UCB) algorithm is used to balance exploration and exploitation. The key idea is to estimate the uncertainty or confidence associated with each action’s expected reward and prioritize actions with higher uncertainty. The algorithm calculates upper confidence bounds for each action based on the empirical mean and exploration bonus. The empirical mean and the number of selections for each action are updated based on the observed rewards. Here is how UCB works in contextual bandit problems:

Upper Confidence Bounds (UCB)

Exploration bonus for action $i$ at time $t$:

$$\text{ExplorationBonus}(i, t) = c \sqrt{\frac{\log(t + 1)}{N_i + \epsilon}}$$

Upper Confidence Bound for action $i$ at time $t$:

$$\text{UCB}(i, t) = \text{EmpiricalMean}_i + \text{ExplorationBonus}(i, t)$$

Action Selection

Select the action with the highest Upper Confidence Bound:
$$\text{Action} = \arg\max_i \, \text{UCB}(i, t)$$

Empirical Mean and Number of Selections Update

After observing reward r for action i:
$$\text{EmpiricalMean}_i \leftarrow \frac{N_i}{N_i + 1} \cdot \text{EmpiricalMean}_i + \frac{r}{N_i + 1}, \qquad N_i \leftarrow N_i + 1$$
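These update and selection rules translate directly into code; the sketch below is our illustration, using the confidence parameter c = 0.8 from Section 5.1 and a small ε to guard against division by zero:

```python
import numpy as np

class UCBAgent:
    """UCB agent: empirical mean plus an exploration bonus per action."""

    def __init__(self, n_actions: int, c: float = 0.8, eps: float = 1e-8):
        self.c = c                            # confidence parameter (0.8 in Section 5.1)
        self.eps = eps                        # keeps the bonus finite for unselected arms
        self.means = np.zeros(n_actions)      # empirical mean reward per action
        self.counts = np.zeros(n_actions)     # number of selections per action

    def select(self, t: int) -> int:
        bonus = self.c * np.sqrt(np.log(t + 1) / (self.counts + self.eps))
        return int(np.argmax(self.means + bonus))

    def update(self, action: int, reward: float) -> None:
        n = self.counts[action]
        self.means[action] = (n / (n + 1)) * self.means[action] + reward / (n + 1)
        self.counts[action] = n + 1
```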

4. Patient Medication Engagement System

4.1. RL and DL for Adaptive Patient Engagement

The proposed Patient Medication Engagement System (PMES) has two components, i.e., an RL-based patient engagement agent and a DL-based drug verification agent. The working of these AI-empowered agents is shown in Figure 2. These intelligent components work together to assist the patients. We will present the introduction and working of each component of the PMES in the next subsections.

4.2. RL Engagement Agent

The role of this intelligent component of the PMES is to engage patients in the medication process. The working of the RL agent is demonstrated in Figure 3. The RL agent first learns the behavior of each patient during the training phase, taking into account various parameters such as the time of day, adherence, and the patient's response to previous reminders. After training, the agent can send engagement messages, which are personalized and based on each patient's behavior and medication timing. These engagement messages develop and encourage a positive interaction between the PMES and the patient. The ultimate objective of this personalized assistance is thus to improve adherence to medication. Next, we demonstrate the formulation of this patient medication scenario as an RL setup. Contextual Bandits (CB) can be considered a simplified form of RL or an extension of multi-armed bandits. Different terms are used for CB, such as one-step RL, associative RL, bandits with side information, learning with bandit (or partial) feedback, associative bandits, and multi-world testing. We consider an environment where the RL agent can select different actions, and each action returns a reward. The reward depends on the current context and is either 0 or 1, depending on certain conditions of the given environment.
The context for our RL agent in this environment is the behavior of the patients during their medication schedule. The RL agent considers the patient's age, the time of day, adherence, and the patient's response to previous reminders during decision-making; this combination of information forms a state for the RL agent. The actions (engagement messages) are then generated and communicated to the patients by the PMES before the scheduled drug time. The RL agent measures the reward using the "Medication-Adherence Score", which indicates the proportion of prescribed medicines taken correctly. The output of the DL verification component serves as input to the RL agent for reward calculation. For instance, the adherence score would be 70% if a patient takes 7 out of 10 prescribed drugs correctly. Currently, the system performs weekly updates to incorporate new patient data and refresh the decision-making process.
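As a concrete illustration of this reward computation, the hypothetical helper below converts the DLVA's per-dose verification outputs into the Medication-Adherence Score; it is a sketch, not the authors' implementation:

```python
def medication_adherence_score(verifications: list[bool]) -> float:
    """Proportion of scheduled doses the DLVA verified as taken correctly.

    `verifications` holds one boolean per scheduled dose in the period;
    the returned score serves as the RL agent's reward signal.
    """
    return sum(verifications) / len(verifications) if verifications else 0.0

# Example from the text: 7 of 10 prescribed doses taken correctly -> 0.7.
assert medication_adherence_score([True] * 7 + [False] * 3) == 0.7
```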
We used 100 synthetic patient profiles, each containing 128 records, to train our RL agent. A unique ID is assigned to each patient profile, and the dataset contains diverse information such as the day, the time of day, the adherence score, and age. A sequence of preprocessing steps is carried out on the raw medication data to improve its quality before analysis. For example, the 'DayType' and 'PatientID' columns were dropped from the DataFrame. The categorical variable 'TimeOfDay' was one-hot encoded to transform it into numerical features suitable for ML models. Similarly, the 'Age' and 'PriorAdherenceScore' columns were normalized using the StandardScaler from scikit-learn to ensure uniform scales across the continuous features.
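A minimal sketch of these preprocessing steps with pandas and scikit-learn; the file name is hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the synthetic adherence records (file name is hypothetical).
df = pd.read_csv("patient_profiles.csv")

# Drop identifier and redundant columns.
df = df.drop(columns=["DayType", "PatientID"])

# One-hot encode the categorical time-of-day feature for use in ML models.
df = pd.get_dummies(df, columns=["TimeOfDay"])

# Standardize the continuous features to a uniform scale.
scaler = StandardScaler()
df[["Age", "PriorAdherenceScore"]] = scaler.fit_transform(
    df[["Age", "PriorAdherenceScore"]])
```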
The training is performed over many trials. In each trial, a context $X_i$ is given to the RL agent. The agent receives feedback after an action is taken, and after many trials it can learn the optimal strategy, as demonstrated in Figure 1. We also used oracles represented by Logistic Regression models along with the Epsilon-Greedy RL agent to improve the decision-making process. An interface was created for communication, which makes training and evaluation more efficient and smoother.
We experimented with the RL part of the PMES using Bayesian algorithms, namely Upper Confidence Bound (UCB), Epsilon-Greedy, and Thompson Sampling (TS). UCB updates the empirical means and the selection counts based on its observations and does not employ conventional training of classifiers. TS updates the parameters of the Beta distribution for every action during the training process. The Epsilon-Greedy agent consults the Logistic Regression oracles based on the context; if a classifier is not yet trained, the agent estimates the score by sampling from a Beta distribution, and it selects the action with the maximum estimated reward.

4.3. DL Verification Agent

The second component of PMES is the Deep Learning (DL) enabled monitoring framework. The role of this component is to monitor the medication process once the RL agent sends an engagement message to the patient. The DL Verification Agent (DLVA) is designed as a dynamic component and can assist patients remotely. A web application interface is designed using the Flask framework for patient interaction with the PMES. The working of the DLVA is summarized as:
  • The user first needs to upload 20 or more images of every drug that the user must consume during a treatment process.
  • Once the images are uploaded, a personalized model for the given data is trained as the DLVA of the PMES.
  • Once the personalized model is trained, the DLVA can be used as the verification component of the PMES along with the engagement component.
The unique and dynamic feature of the DLVA is that it can be trained separately for each patient, on the given set of medicines, and can thus assist patients in a personalized manner.
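To illustrate the upload step of this workflow, a minimal Flask route is sketched below; the endpoint path, storage layout, and response fields are our assumptions rather than the actual PMES implementation:

```python
from pathlib import Path
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = Path("uploads")  # hypothetical storage location

@app.route("/upload/<medicine_name>", methods=["POST"])
def upload_medicine_images(medicine_name):
    """Store a patient's images for one medicine; training can start once enough arrive."""
    target = UPLOAD_DIR / secure_filename(medicine_name)
    target.mkdir(parents=True, exist_ok=True)
    for image in request.files.getlist("images"):
        image.save(target / secure_filename(image.filename))
    count = len(list(target.iterdir()))
    status = "ready for training" if count >= 20 else "need more images"
    return {"medicine": medicine_name, "images": count, "status": status}
```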
We need to utilize preprocessing techniques because of the small number of images of each medicine uploaded by a patient/user. We use resizing, label collection, and standard normalization on the uploaded dataset to adjust the dimensions of the images, organize and categorize them, and scale the pixel values to a standard range, respectively. After preprocessing the patient's uploaded data, we apply augmentation to improve the size and quality of the dataset. The augmentation operations include brightness adjustments, horizontal flips, zoom range (changing the scale of the image), shear range (altering the shape of the image), width and height shifts, rotation, and fill mode. Preprocessing and augmentation together yield a more robust dataset for training the DLVA.
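These augmentation operations map directly onto Keras's ImageDataGenerator; the sketch below is our illustration, and the parameter values (ranges, batch size) are assumptions except where stated in the text:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,            # scale pixel values to [0, 1]
    brightness_range=(0.8, 1.2),  # brightness adjustments
    horizontal_flip=True,
    zoom_range=0.2,               # change the scale of the image
    shear_range=0.2,              # alter the shape of the image
    width_shift_range=0.1,
    height_shift_range=0.1,
    rotation_range=20,
    fill_mode="nearest",          # how pixels exposed by shifts are filled
)

# Stream augmented batches from the patient's uploaded folders,
# one subfolder per medicine.
train_flow = augmenter.flow_from_directory(
    "uploads", target_size=(64, 64), batch_size=16, class_mode="categorical")
```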
After the data organization, the next step is the choice of a DL model. Among available well-known frameworks such as PyTorch, Keras, and TensorFlow, we selected the Keras framework to design a Convolutional Neural Network (CNN) model for DLVA. Keras has features like open-source availability, extensibility, user-friendliness, rapid experimentation with DNN, and sequential API that simplifies the process of building a CNN model.

4.4. Model Summary

The architecture of the personalized (custom) model is demonstrated in Figure 4. The model consists of three convolution layers with 32, 64, and 128 filters, each followed by a max-pooling layer. The Rectified Linear Unit (ReLU) activation function is applied to every convolution layer. A Flatten layer follows the convolution layers, and then two dense layers with ReLU activation are employed. A dropout of 0.5 is used after the first dense layer to address overfitting. The last dense layer uses SoftMax activation to generate probabilities for every category (medicine). The input size for the first convolution layer is set to (64, 64, 3).
The dataset uploaded by the user is divided into 85% for training and 15% for validation. Categorical cross-entropy serves as the loss function, and the Adam optimizer is used with a learning rate of 0.001 and a momentum of 0.9. Finally, the model is trained using both the training and validation datasets to improve accuracy.
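A sketch of this architecture in Keras follows; the dense-layer widths and 3 × 3 kernel sizes are our assumptions, as the text does not specify them, and Adam's beta_1 parameter stands in for the stated momentum of 0.9:

```python
from tensorflow.keras import layers, models, optimizers

def build_dlva_model(n_medicines: int) -> models.Sequential:
    """CNN following the architecture described in this section."""
    model = models.Sequential([
        layers.Input(shape=(64, 64, 3)),                 # input size from the text
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),                             # addresses overfitting
        layers.Dense(64, activation="relu"),
        layers.Dense(n_medicines, activation="softmax"), # one probability per medicine
    ])
    # Adam exposes no plain momentum term; beta_1 = 0.9 is the closest analogue.
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001, beta_1=0.9),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```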

4.5. Integration of RL and DL Components

In our proposed system, the DL component functions as a feedback mechanism for the RL agent. Specifically, the DL system is designed to monitor patient drug-taking activity. The output of DLVA is used as an input to the RL agent, which leverages this information to predict the optimal times to send engagement messages to users.
The feedback provided by the DLVA is essential for refining the RL model’s decision-making process, enabling the system to adapt in real time to patient behavior. This approach allows the system to dynamically adjust its engagement strategies, providing a more personalized and effective solution for improving patient adherence to prescribed treatments. The integration of DL for drug verification and RL for behavior adaptation creates a robust and adaptive framework, particularly useful in healthcare applications where patient engagement is critical for treatment outcomes.

5. Results and Analysis

This section demonstrates the simulation results and their interpretation. As the PMES consists of an RL engagement agent and the DLVA, we first present the hyperparameters used for training and then the results for each component separately, for the reader's convenience.

5.1. Hyperparameters for Training

DL Hyperparameters

The hyperparameters used during the training of the Deep Learning (DL) agent are as follows:
  • Learning Rate: 0.001
    The learning rate controls how much to adjust the weights of the model with respect to the loss gradient. A learning rate of 0.001 is commonly used as it strikes a balance between fast convergence and stability. A rate too high might lead to overshooting the optimal weights, while a rate too low can result in slow convergence.
  • Momentum: 0.9
    Momentum helps accelerate the optimizer in the relevant direction and dampens oscillations. A momentum of 0.9 is used to smooth the optimization process by accumulating past gradients, which helps in navigating through local minima and speeds up convergence.
  • Batch Size: 16
    Batch size refers to the number of training examples utilized in one iteration of model training. A batch size of 16 provides a balance between the computational efficiency and the stability of the gradient estimates. Smaller batch sizes can offer a more accurate estimate of the gradient but may lead to noisier updates.
  • Epochs: 40
    The number of epochs represents the number of times the entire training dataset is passed through the model. Training for 40 epochs allows the model sufficient iterations to learn from the data, but this should be monitored to avoid overfitting. The chosen number is based on empirical testing to achieve a good balance between underfitting and overfitting.
  • Loss Function: Categorical Cross-Entropy
    Categorical cross-entropy is used for multi-class classification problems. It measures the performance of a classification model whose output is a probability value between 0 and 1. This loss function helps in quantifying how well the model’s predicted probabilities match the true class labels, and its optimization helps in improving the accuracy of the model.

RL Hyperparameters

Thompson Sampling:
  • Beta Distribution Parameters ($\alpha$, $\beta$): Initialized to (1, 1) for each action arm. Updated based on observed rewards, reflecting empirical success rates.
Epsilon-Greedy:
  • Epsilon ($\epsilon$): Set to 0.2. Balances exploration (20%) and exploitation (80%) of the best-known action.
Upper Confidence Bound (UCB):
  • Confidence Parameter (c): Set to 0.8. Adjusts exploration bonus added to empirical mean rewards to balance exploration and exploitation.

5.2. Metrics for RL Engagement Agent

The two metrics that we consider for evaluating the performance of the RL engagement agents are cumulative mean reward and regret. The Bayesian RL approaches that we use in the experiments are Upper Confidence Bound (UCB), Epsilon-Greedy, and Thompson Sampling (TS). We used synthetic data for the experiments and evaluation.
We first compare the three approaches using cumulative mean reward, as shown in Figure 5. It is evident that UCB is superior to Epsilon-Greedy and TS in terms of cumulative mean reward. This better performance indicates that the agent learns the behavior of the patients faster and can thus assist a patient with suitable and correct engagement communication. Similarly, Epsilon-Greedy falls behind UCB but performs better than TS.
The second metric, which measures the performance of an RL agent in the opposite sense, is the regret function. This metric, shown for the three RL approaches in Figure 6, quantifies the difference between the chosen action and the best possible action in a given state. As Figure 6 shows, UCB again achieves lower regret than the other two methods, indicating that it makes better decisions and that there are fewer occasions on which the action chosen by the UCB agent differs from the best action. The performance of Epsilon-Greedy is again below UCB but better than TS.

5.3. Results of DLVA

We performed experiments as a user of the PMES and demonstrated the usability of the DLVA using seven different medicines. The two well-known metrics for evaluating a CNN model are prediction/classification accuracy and the loss function, on both training and validation data. As explained in Section 4.3, a user of the PMES needs to upload images of the prescribed drugs so that a personalized CNN model can be trained for those drugs; we therefore uploaded seven drugs through the GUI of the PMES.
The demonstration of the DLVA is shown in Figure 7. Its working at each phase is explained as follows:
(1) After accessing the PMES, a user must upload a certain number of images of each drug that he or she must take during the medication period, as illustrated in Figure 7a. For example, if a patient has a prescription for 4 medicines, then the patient must upload images of those 4 medicines to the DLVA.
(2) The next step is to indicate a medicine schedule, which includes the name of the uploaded medicine, the scheduled time of consumption, and the number of pills per consumption. This phase is highlighted in Figure 7b.
(3) The training of the CNN model on the user's uploaded data is then started on the administrative side of the PMES. Once training completes, the personalized CNN model is ready and available for patient medication verification. Figure 7c demonstrates that the DLVA is ready for use and starts sending reminder alerts at the scheduled time.
(4) Figure 7d shows the verification step of the DLVA, which assists the patient in taking the correct drug. The performance of the DLVA can be evaluated using accuracy and the loss function. For instance, the accuracy and loss graphs for the model trained on seven medicines are shown in Figure 8 and Figure 9, respectively. The CNN model achieves an accuracy of 98% and a loss of 0.08 on the validation data.

5.4. Comparison with the State of the Art

In this subsection, we present the limitations of the state of the art and compare our solution with it in terms of services. The comparison is summarized in Table 1.
Lack of Personalization: Many previous systems, such as those in [17,21,22], offer medication reminders but lack personalized engagement. They provide generalized alerts that do not adapt to individual patient behavior, thus failing to address variations in adherence patterns.
No Active Engagement: While some systems send reminders, they do not actively engage the patient throughout the medication process. For example, Ref. [18] offers monitoring but does not incorporate mechanisms to motivate or support patients in adhering to their regimens.
Absence of Monitoring for Medication Adherence: Several related works [17,18,21] focus on reminding patients but lack verification mechanisms to ensure patients actually take their medications correctly. This leaves room for errors and non-adherence.
Limited Scope: Systems like those proposed in [19,20] are aimed at seizure detection and alerting caregivers, but they are not designed to manage chronic medication adherence, especially for conditions like hypertension, cardiovascular diseases, etc.
Lack of Intelligent Adaptation: The systems in [16,24] employ sophisticated machine learning techniques for other medical problems (e.g., seizure detection, image processing) but do not adapt to the specific medication needs of individual patients or offer dynamic adjustments based on patient feedback or behaviors.
Inadequate Monitoring and Verification: While some frameworks provide monitoring (e.g., [18,24]), they lack advanced AI-driven verification mechanisms to ensure patients take the correct dosage of their medication, leaving room for potential health risks.

6. Discussion

This section presents some of the challenges and their possible solutions.

6.1. Challenges for Real-World Deployment

The proposed solution shows promising results, but some challenges remain. We highlight the main challenges that we observed during our activities below:
  • Personalized Model Training for Each Patient
    Complexity of Data Collection: A significant challenge is the difficulty of gathering sufficient and relevant data for each individual patient. This process is often time-consuming and complex, complicating the training of accurate, personalized models.
    High Computational Requirements: Another major challenge is the substantial computational resources needed to train separate models for each patient. This demand can strain available resources in healthcare settings.
  • Medication Verification
    Handling Data Variability: Ensuring the system can accurately verify medications despite variations in packaging, lighting, and image quality presents a considerable challenge. Managing these inconsistencies is crucial for reliable medication verification.
    Real-Time Processing Demands: Implementing a system that performs medication verification in real time poses a challenge due to the need for efficient processing and integration with existing healthcare systems.
  • Adherence Score Collection
    Privacy Management: One of the major challenges is safeguarding sensitive patient data while collecting adherence scores. Ensuring privacy and data protection throughout the adherence monitoring process is a critical concern.
    Maintaining Accuracy: Collecting precise adherence scores is challenging due to potential issues such as inconsistent data collection times and patient non-compliance. Addressing these factors to ensure accuracy in adherence measurements is a complex task.
  • Scalability
    Handling Large-Scale Deployment: Scaling the cloud-based system to support deployment across multiple healthcare facilities presents challenges. The system must be capable of managing increased data volume and user load while maintaining performance and reliability. Efficiently scaling resources and ensuring consistent service quality across various facilities requires robust infrastructure planning and dynamic resource allocation strategies.
  • Security and Privacy
    Ensuring Data Security: Protecting patient data in a cloud-based system involves addressing security concerns such as unauthorized access and data breaches. Implementing strong encryption protocols, secure authentication methods, and regular security audits are essential measures to safeguard patient information.
    Compliance with Regulations: The system must adhere to regulatory requirements such as HIPAA (Health Insurance Portability and Accountability Act) or GDPR (General Data Protection Regulation) for data privacy and security. Ensuring compliance with these regulations adds a layer of complexity to the system’s design and operation.

6.2. Addressing Scalability and Security

To address challenges highlighted in the previous subsection, the following strategies are proposed:
  • Scalable Infrastructure
    Cloud Solutions: Leveraging scalable cloud infrastructure such as AWS, Azure, or Google Cloud to handle varying loads and facilitate seamless expansion. Utilizing cloud services that offer auto-scaling capabilities ensures the system can adapt to increasing demands across multiple healthcare facilities.
    Load Balancing: Implementing load balancing techniques to distribute traffic evenly across servers, therefore improving system performance and reliability.
  • Robust Security Measures
    Data Encryption: Employing advanced encryption techniques both at rest and in transit to ensure the confidentiality and integrity of patient data.
    Access Controls: Implementing stringent access controls and authentication mechanisms to prevent unauthorized access to sensitive information.
    Regulatory Compliance: Ensuring the system complies with relevant data protection regulations, including conducting regular security audits and updates to meet evolving compliance requirements.
  • Continuous Monitoring and Improvement
    Performance Monitoring: Implementing continuous monitoring tools to track system performance, detect potential issues, and optimize resource usage.
    Feedback Mechanisms: Establishing feedback channels with healthcare facilities to gather insights and make iterative improvements to the system based on real-world usage and needs.

7. Conclusions and Future Work

This research introduced the Personalized Medication Engagement System (PMES), an AI-based infrastructure designed to improve patient adherence to medications, minimize medication errors, and potentially reduce healthcare costs. The PMES leverages Reinforcement Learning (RL) and Deep Learning (DL) to create a dynamic, personalized, and remotely accessible system for patient engagement and medication monitoring. Our experiments, conducted using 100 synthetic patient profiles for the RL engagement agent and 7 medicines for the DL Verification Agent (DLVA), demonstrated promising results in terms of reliability and efficiency. These outcomes underscore the potential of AI-powered solutions in healthcare, particularly in the domain of medication management. However, it is important to acknowledge the current limitations of the PMES and outline potential solutions and future enhancements:
  • Weekly Update Limitation: The current system's capability to update patient records on a weekly basis may be insufficient for long-term treatments or conditions requiring more frequent monitoring. To address this limitation, we propose: (a) implementing a more flexible update schedule that allows daily or even real-time updates, which could involve developing a more efficient data processing pipeline to handle the increased data flow, designing a user-friendly interface for patients to input data more frequently, and integrating with wearable devices or smart pill dispensers for automated, real-time data collection; and (b) creating an adaptive update frequency based on individual patient needs and treatment complexity, for example by implementing an algorithm that adjusts the update frequency based on the patient's adherence history and medication criticality, and by allowing healthcare providers to set custom update schedules for each patient.
  • Expansion to Complex Treatment Plans: To enhance the PMES's applicability to more complex treatment scenarios, we propose: (a) extending the RL agent's capabilities to handle multi-medication regimens, by developing a more sophisticated reward function that accounts for interactions between multiple medications and by implementing a hierarchical RL structure to manage different aspects of complex treatment plans; (b) enhancing the DL component to monitor and predict potential drug interactions, by incorporating a comprehensive drug interaction database into the DLVA and developing predictive models for potential side effects or adverse reactions in complex medication combinations; and (c) integrating a decision support system for healthcare providers, with visualization tools for complex treatment plans and patient adherence patterns, and AI-driven recommendations for treatment plan adjustments based on patient response and adherence data.
  • Long-term Efficacy Assessment: To evaluate and improve the system's effectiveness for chronic conditions, we suggest: (a) conducting longitudinal studies with diverse patient populations to assess the sustained impact of the PMES on medication adherence and health outcomes; and (b) implementing a continuous learning mechanism in the RL agent to adapt to changes in patient behavior and treatment requirements over extended periods.
  • Integration with Existing Healthcare Systems: To maximize the potential of the PMES in real-world healthcare settings, future work should focus on: (a) developing standardized APIs for seamless integration with various Electronic Health Record (EHR) systems; (b) creating secure data-sharing protocols that comply with healthcare data protection regulations (e.g., HIPAA) to facilitate collaboration between patients, caregivers, and healthcare providers; and (c) designing modules for integration with pharmacy systems to automate medication refills based on adherence data.
By addressing these limitations and pursuing these enhancements, we aim to evolve the PMES into a more robust, versatile, and widely applicable tool for improving medication adherence across diverse patient populations and treatment scenarios. Our future research will prioritize implementing these improvements and conducting comprehensive clinical trials to validate the system's effectiveness in real-world healthcare settings. The enhanced PMES has the potential to significantly impact patient care by providing personalized, AI-driven support for medication management, ultimately leading to improved health outcomes and reduced healthcare costs.

Author Contributions

Conceptualization, M.N. and A.C.; Data curation, M.A.; Formal analysis, M.H.S.; Investigation, M.H.S. and M.A.; Methodology, A.I. and M.N.; Project administration, M.A. and A.C.; Resources, A.C.; Software, A.I.; Supervision, M.A.; Validation, M.H.S.; Writing—original draft, A.I. and M.N.; Writing—review and editing, M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially supported by the Project of National Relevance “Innovative mathematical modeling for cell mechanics: global approach from micro-scale models to experimental validation integrated by reinforcement learning”, financed by European Union-Next-GenerationEU-National Recovery and Resilience Plan-NRRP-M4C1-I 1.1, CALL PRIN 2022 PNRR D.D. 1409 14-09-2022—(Project code P2022MXCJ2, CUP D53D23018940001) granted by the Italian MUR.

Data Availability Statement

Simulated data are available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Piercefield, E.W.; Howard, M.E.; Robinson, M.H.; Kirk, C.E.; Ragan, A.P.; Reese, S.D. Antihypertensive medication adherence and blood pressure control among central Alabama veterans. J. Clin. Hypertens. 2017, 19, 543–549. [Google Scholar] [CrossRef] [PubMed]
  2. Barello, S.; Graffigna, G.; Vegni, E. Patient engagement as an emerging challenge for healthcare services: Mapping the literature. Nurs. Res. Pract. 2012, 2012, 905934. [Google Scholar] [CrossRef] [PubMed]
  3. Bizimana, P.C.; Zhang, Z.; Asim, M.; El-Latif, A.A.A.; Hammad, M. Learning-based techniques for heart disease prediction: A survey of models and performance metrics. Multimed. Tools Appl. 2024, 83, 39867–39921. [Google Scholar] [CrossRef]
  4. Ho, P.M.; Magid, D.J.; Shetterly, S.M.; Olson, K.L.; Maddox, T.M.; Peterson, P.N.; Masoudi, F.A.; Rumsfeld, J.S. Medication nonadherence is associated with a broad range of adverse outcomes in patients with coronary artery disease. Am. Heart J. 2008, 155, 772–779. [Google Scholar] [CrossRef] [PubMed]
  5. Liquori, G.; De Leo, A.; Di Simone, E.; Dionisi, S.; Giannetta, N.; Ganci, E.; Trainito, S.P.; Orsi, G.B.; Di Muzio, M.; Napoli, C. Medication adherence in chronic older patients: An Italian observational study using Medication Adherence Report Scale (MARS-5I). Int. J. Environ. Res. Public Health 2022, 19, 5190. [Google Scholar] [CrossRef]
  6. Davis, R.E.; Jacklin, R.; Sevdalis, N.; Vincent, C.A. Patient involvement in patient safety: What factors influence patient participation and engagement? Health Expect. 2007, 10, 259–267. [Google Scholar] [CrossRef]
  7. Doherty, C.; Stavropoulou, C. Patients’ willingness and ability to participate actively in the reduction of clinical errors: A systematic literature review. Soc. Sci. Med. 2012, 75, 257–263. [Google Scholar] [CrossRef]
  8. Izonin, I.; Ribino, P.; Ebrahimnejad, A.; Quinde, M. Smart technologies and its application for medical/healthcare services. J. Reliab. Intell. Environ. 2023, 9, 1–3. [Google Scholar] [CrossRef]
  9. Paragliola, G.; Naeem, M. Risk management for nuclear medical department using reinforcement learning algorithms. J. Reliab. Intell. Environ. 2019, 5, 105–113. [Google Scholar] [CrossRef]
  10. Krist, A.H.; Tong, S.T.; Aycock, R.A.; Longo, D.R. Engaging patients in decision-making and behavior change to promote prevention. Inf. Serv. Use 2017, 37, 105–122. [Google Scholar] [CrossRef]
  11. Germanese, D.; Colantonio, S.; Del Coco, M.; Carcagnì, P.; Leo, M. Computer Vision Tasks for Ambient Intelligence in Children’s Health. Information 2023, 14, 548. [Google Scholar] [CrossRef]
  12. Shah, S.I.H.; Coronato, A.; Naeem, M.; De Pietro, G. Learning and assessing optimal dynamic treatment regimes through cooperative imitation learning. IEEE Access 2022, 10, 78148–78158. [Google Scholar] [CrossRef]
  13. Bottrighi, A.; Pennisi, M. Exploring the State of Machine Learning and Deep Learning in Medicine: A Survey of the Italian Research Community. Information 2023, 14, 513. [Google Scholar] [CrossRef]
  14. Fiorino, M.; Naeem, M.; Ciampi, M.; Coronato, A. Defining a Metric-Driven Approach for Learning Hazardous Situations. Technologies 2024, 12, 103. [Google Scholar] [CrossRef]
  15. Subasi, A.; Kevric, J.; Abdullah Canbaz, M. Epileptic seizure detection using hybrid machine learning methods. Neural Comput. Appl. 2019, 31, 317–325. [Google Scholar] [CrossRef]
  16. Yedurkar, D.P.; Metkar, S.; Al-Turjman, F.; Yardi, N.; Stephan, T. An IoT Based Novel Hybrid Seizure Detection Approach for Epileptic Monitoring. IEEE Trans. Ind. Inform. 2024, 20, 1420–1431. [Google Scholar] [CrossRef]
  17. Naeem, M.; Coronato, A.; Paragliola, G. Adaptive treatment assisting system for patients using machine learning. In Proceedings of the 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 22–25 October 2019; pp. 460–465. [Google Scholar]
  18. Coronato, A.; Naeem, M. Ambient Intelligence for Home Medical Treatment Error Prevention. In Proceedings of the 2021 17th International Conference on Intelligent Environments (IE), Dubai, United Arab Emirates, 21–24 June 2021; pp. 1–8. [Google Scholar]
  19. Lupión, M.; Sanjuan, J.F.; Medina-Quero, J.; Ortigosa, P.M. Epilepsy Seizure Detection Using Low-Cost IoT Devices and a Federated Machine Learning Algorithm. In Ambient Intelligence—Software and Applications—13th International Symposium on Ambient Intelligence; Springer: Cham, Switzerland, 2022; pp. 229–238. [Google Scholar]
  20. Sugumar, D.; Suriya, K.; Suraj, T.A.M.; Jose, J.A.V.; Kavitha, K. Seizure Detection using Machine Learning and Monitoring through IoT Devices. In Proceedings of the 2023 10th International Conference on Wireless Networks and Mobile Communications (WINCOM), Istanbul, Turkey, 26–28 October 2023; pp. 1–6. [Google Scholar]
  21. Hayakawa, M.; Uchimura, Y.; Omae, K.; Waki, K.; Fujita, H.; Ohe, K. A Smartphone-based Medication Self-management System with Real-time Medication Monitoring. Appl. Clin. Inform. 2013, 4, 37–52. [Google Scholar] [CrossRef]
  22. He, Y.; Tan, E.H.; Wong, A.L.A.; Tan, C.C.; Wong, P.; Lee, S.C.; Tai, B.C. Improving medication adherence with adjuvant aromatase inhibitor in women with breast cancer: Study protocol of a randomised controlled trial to evaluate the effect of short message service (SMS) reminder. BMC Cancer 2018, 18, 727. [Google Scholar] [CrossRef]
  23. Vithanwattana, N.; Karthick, G.; Mapp, G.; George, C.; Samuels, A. Securing future healthcare environments in a post-COVID-19 world: Moving from frameworks to prototypes. J. Reliab. Intell. Environ. 2022, 8, 299–315. [Google Scholar] [CrossRef]
  24. Alarood, A.A.; Faheem, M.; Al-Khasawneh, M.A.; Alzahrani, A.I.; Alshdadi, A.A. Secure medical image transmission using deep neural network in e-health applications. Healthc. Technol. Lett. 2023, 10, 87–98. [Google Scholar] [CrossRef]
  25. Faheem, M.; Butt, R.A.; Raza, B.; Alquhayz, H.; Abbas, M.Z.; Ngadi, M.A.; Gungor, V.C. A multiobjective, lion mating optimization inspired routing protocol for wireless body area sensor network based healthcare applications. Sensors 2019, 19, 5072. [Google Scholar] [CrossRef] [PubMed]
  26. Yu, J.; Guo, L.; Zhang, J.; Wang, G. A survey on graph neural network-based next POI recommendation for smart cities. J. Reliab. Intell. Environ. 2024, 10, 299–318. [Google Scholar] [CrossRef]
  27. Chen, L.; Wang, D. Cost Estimation and Prediction for Residential Projects Based on Grey Relational Analysis–Lasso Regression–Backpropagation Neural Network. Information 2024, 15, 502. [Google Scholar] [CrossRef]
  28. Gąsienica-Józkowy, J.; Cyganek, B.; Knapik, M.; Głogowski, S.; Przebinda, L. Deep Learning-Based Monocular Estimation of Distance and Height for Edge Devices. Information 2024, 15, 474. [Google Scholar] [CrossRef]
  29. Song, J.; Dong, H.; Chen, Y.; Zhang, X.; Zhan, G.; Jain, R.K.; Chen, Y.W. Early Recurrence Prediction of Hepatocellular Carcinoma Using Deep Learning Frameworks with Multi-Task Pre-Training. Information 2024, 15, 493. [Google Scholar] [CrossRef]
Figure 1. RL agent in a contextual bandit environment.
Figure 2. Proposed Patient Medication Engagement System.
Figure 3. RL-based medication engagement component of the PMES.
Figure 4. Model summary.
Figure 5. Mean reward at each round.
Figure 6. Regret.
Figure 7. The GUI of the PMES, live demonstration. (a) Upload the images to train the model; (b) add the medicine schedule times; (c) reminder displayed at medication time; and (d) pillbox image predictions by the trained model.
Figure 8. Performance in terms of accuracy.
Figure 9. Performance in terms of the loss function.
Table 1. Comparison of PMES with existing studies (✓ = provided, ✗ = not provided).

| Study | Approach | Personalization | Remote Monitoring | AI Methods | Medication Error Prevention | Patient Engagement |
|---|---|---|---|---|---|---|
| Subasi et al. [15] | EEG-based classification | ✗ | ✗ | PSO, GA, SVM | ✗ | ✗ |
| Yedurkar et al. [16] | IoT-powered EEG system | ✗ | ✓ | CNN, IoT | ✗ | ✗ |
| Naeem et al. [17] | Medication reminders | ✓ | ✓ | Bayesian UCB | ✗ | ✗ |
| Coronato et al. [18] | Ambient sensors | ✓ | ✓ | Sensor-based, IoT | ✗ | ✗ |
| Lupión et al. [19] | Wearable device | ✗ | ✓ | Federated learning | ✗ | ✗ |
| Sugumar et al. [20] | Seizure detection | ✗ | ✓ | Random forest | ✗ | ✗ |
| Hayakawa et al. [21] | Smartphone-based | ✓ | ✓ | Rule-based | ✗ | ✗ |
| He et al. [22] | Mobile text reminders | ✗ | ✓ | Rule-based | ✗ | ✗ |
| PMES (our work) | RL + DL | ✓ | ✓ | RL, DL | ✓ | ✓ |
