Improved Apriori Method for Safety Signal Detection Using Post-Marketing Clinical Data

Sarkar, Reetika; Sun, Jianping

doi:10.3390/math12172705


by Reetika Sarkar and Jianping Sun *

Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27412, USA

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(17), 2705; https://doi.org/10.3390/math12172705

Submission received: 20 July 2024 / Revised: 22 August 2024 / Accepted: 28 August 2024 / Published: 30 August 2024

(This article belongs to the Special Issue New Developments in Statistical Design and Analysis of Clinical Trials)


Abstract:

Safety signal detection is an integral component of Pharmacovigilance (PhV), which is defined by the World Health Organization as “science and activities relating to the detection, assessment, understanding, and prevention of adverse effects or any other possible drug related problems”. The purpose of safety signal detection is to identify new or known adverse events (AEs) resulting from the use of pharmacotherapeutic products. While post-marketing spontaneous reports from different sources are commonly utilized as a data source for detecting these signals, there are underlying challenges arising from data complexity. This paper investigates the implementation of the Apriori algorithm, a popular method in association rule mining, to identify frequently co-occurring drugs and AEs within safety data. We discuss previous applications of the Apriori algorithm for safety signal detection and conduct a detailed study of an improved method specifically tailored for this purpose. This enhanced approach refines the classical Apriori method to effectively reveal potential associations between drugs/vaccines and AEs from post-marketing safety monitoring datasets, especially when AEs are rare. Detailed comparative simulation studies across varied settings coupled with the application of the method to vaccine safety data from the Vaccine Adverse Event Reporting System (VAERS) demonstrate the efficacy of the improved approach. In conclusion, the improved Apriori algorithm is shown to be a useful screening tool for detecting rarely occurring potential safety signals from the use of drugs/vaccines using post-marketing safety data.

Keywords:

drug safety; data mining; association rule mining; Apriori; safety data; disproportionality measures; pharmacovigilance; co-occurrence analysis

MSC:

62P10

1. Introduction

1.1. Safety Data Analysis

Drug safety, also known as Pharmacovigilance, is a field in pharmaceutical science. Its primary objective is to collect, detect, assess, monitor, and prevent adverse events (AEs, i.e., side effects) caused by medicines, vaccines, and other types of pharmaceutical products [1]. The history of drug safety studies in the US dates back to the Elixir Sulfanilamide disaster in 1937, in which one hundred and five patients died from the therapeutic use of sulfanilamide. As a consequence, the US Congress passed the 1938 Federal Food, Drug, and Cosmetic Act, which required proof of safety before the release of a new drug [2]. The US Kefauver–Harris Amendment, i.e., the Drug Efficacy Amendment, was passed by Congress in 1962 as a response to the Thalidomide tragedy, in which over ten thousand children were born with defects due to their mothers using thalidomide to treat morning sickness during pregnancy [3]. Drug safety is a critical component spanning the entire life cycle of drug development, from the preclinical and early phases to the post-market stage. It has received significant attention from both governments and pharmaceutical companies globally due to the potential for substantial losses arising from drug safety concerns. For example, the nonsteroidal anti-inflammatory drug Rofecoxib (Vioxx) was withdrawn from the market in 2004 due to increased cardiovascular risks, leading to significant financial losses for its manufacturer Merck, largely due to legal expenses [4,5].

However, the analysis of post-market drug safety data is often challenging because of their intrinsic complexity, specifically:

High dimensionality: The number of AEs can be large (hundreds or even thousands), especially during the post-approval marketing phase when the medicine becomes available for broad populations. However, only a few of these AEs are significant for new discoveries about the product’s clinical safety.
Sparsity: Most types of AEs are rare, especially in the stage of post-market surveillance, due to factors such as selective participant profiles in clinical trials, rare events in large populations, long-term effects and drug interactions, and more.
Weak signal: Certain AEs may exhibit a low signal strength related to the drug or vaccine under investigation, which could potentially impact the efficacy of the methodologies employed to detect drug–AE associations.
Complex correlation: AEs may demonstrate complex correlation structure among themselves, either positive or negative, which poses significant challenges in identifying AE signals associated with a drug or vaccine.

Considering the crucial nature of early detection of drug/vaccine-associated AEs, it has been an ongoing endeavor to establish methods that can enhance the analysis of post-market safety data. The purpose of safety signal detection from post-marketing surveillance data is to identify AEs that are associated with particular drugs or vaccines. Notable approaches that have been proposed include the Double False Discovery Rate (DFDR) approach by Mehrotra and Heyse (2012) [6], the two-stage hierarchical testing approach by Tan et al. (2020) [7], the hierarchical Bayesian mixture model for binary outcomes by Berry and Berry (2004) [8], the Poisson likelihood-based approach by Xia et al. (2011) [9], the Multivariate Bayesian Logistic Regression (MBLR) by DuMouchel (2012) [10], the Empirical Bayesian Geometric Mean (EBGM) or Multi-item Gamma Poisson Shrinker (MGPS) approach by DuMouchel [11,12], and the Bayesian Confidence Propagation Neural Network (BCPNN) by Bate et al. (1998) [13].

1.2. Apriori-Based Methods for Safety Signal Detection

The inherent attributes of post-market safety data render data mining tools highly advantageous in the context of safety signal detection. Among various data mining tools, the Apriori method exhibits substantial utility in detecting drug/vaccine-AE associations.

The Apriori method, proposed by Agrawal and Srikant (1994) [14], is a foundational algorithm in the field of data mining. It is known for its wide application in frequent itemset mining and association rule learning, especially in the field of marketing analysis. The Apriori approach is an efficient screening tool, as it prunes the search space of associations in a dataset based on the “a priori” property of frequency, i.e., if certain combinations of items in a dataset are infrequent, then any larger combination built upon those items will also be infrequent [15].

In the classical Apriori algorithm, two parameters are employed to measure the strength of the association rule. The first parameter is called “Support,” which measures how often data items in a rule occur together in a transaction. The second parameter is “Confidence”, which measures the reliability of the inference made by the rule.

Kuo et al. (2009) [16] applied the classical Apriori method to detect Adverse Drug Reactions (ADRs) from a dataset containing reports on thirteen patients, five drugs, and thirteen AEs. They also proposed that Apriori could be used to perform association analysis on characteristics of patients, drugs consumed, primary diagnosis, co-morbid conditions, and ADRs experienced, which could in turn be leveraged to study combinations of medications and patient characteristics that could lead to ADRs.

Although the study by Kuo et al. shows the ability of the traditional Apriori method to detect drug/vaccine-associated AEs through efficient searching, it has certain limitations. Their study only focused on a small dataset, in contrast to the expansive nature of actual post-market safety databases, which often contain thousands of reports on many drugs and hundreds of AEs. Furthermore, the authors employed the methodology merely as an illustrative example, without performing a comprehensive and systematic evaluation of the validity and efficiency of the Apriori approach, especially when applied to large-scale datasets.

According to Harpaz et al. (2010) [17], Confidence is not an appropriate parameter for the application of Apriori in surveillance data, as frequent AEs will have larger Confidence values while infrequent AEs will have smaller Confidence even when strongly associated with a drug. There have been other studies aimed at leveraging the efficiency of Apriori in the context of extensive spontaneous reports databases by using modified Apriori-based approaches involving the use of disproportionality measures, such as the Proportional Reporting Ratio (PRR) [18,19,20,21], Relative Reporting Ratio (RR) [17,22,23], and Reporting Odds Ratio (ROR) [23,24] as alternatives to Confidence.

It has been observed that the use of Confidence as the second parameter in the Apriori algorithm can be unreliable for identifying rare AEs in safety signal detection. Furthermore, there has been a lack of comprehensive comparative analyses between Confidence and disproportionality measures in this context. In response, this paper presents a detailed comparative study to evaluate the performance of Confidence and disproportionality measures as secondary parameters in the implementation of Apriori for safety signal detection. This study is particularly focused on the challenges associated with post-market drug safety data, as outlined in Section 1.1.

To clarify the methodology, in this paper we refer to the traditional use of Confidence in Apriori as ‘Classical Apriori’. In contrast, the modified approach incorporating disproportionality measures as the secondary parameter is termed ‘Improved Apriori’.

The rest of this article is organized as follows. In Section 2, we first discuss the classical Apriori method and the disproportionality measures proposed as an improvement to classical Apriori. We then present the numerical studies designed to compare the performance of classical Apriori with different improved Apriori methods. The corresponding results from our numerical studies, including both simulations and real data analysis, are summarized in Section 3. This is followed by a discussion of our inferences and observations from our studies in Section 4. Finally, we present our concluding remarks and potential future work in Section 5 of the article.

2. Method

2.1. Classical Apriori Method

The classical Apriori algorithm employs an iterative approach, known as “bottom-up” search, in which frequent subsets are extended one item at a time and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.

Consider a transaction/reports dataset with $n$ total transactions, of which $n_{A}$ transactions contain item A, $n_{B}$ transactions contain item B, and $n_{AB}$ transactions contain both items A and B. The Support of the association rule $A \to B$ is the proportion of transactions that contain both A and B [25], i.e.,

$$\mathrm{Support}(A \to B) = P(A \cap B) = p_{AB} = \frac{n_{AB}}{n}. \tag{1}$$

For a Boolean depiction of a transactions dataset, $\mathrm{Support}(A \to B) = P(\{A = 1\} \cap \{B = 1\})$.

The Confidence of the rule is the conditional probability of B given A, i.e., the proportion of occurrences of B among the reports containing occurrences of A in the dataset:

$$\mathrm{Confidence}(A \to B) = \frac{\mathrm{Support}(A \to B)}{\mathrm{Support}(A)} = p_{B \mid A} = \frac{n_{AB}}{n_{A}}. \tag{2}$$

Itemsets/association rules are extracted from the dataset according to user-defined thresholds for Support and Confidence, $\mathit{minsup}$ and $\mathit{minconf}$, respectively, both of which are probability values between 0 and 1. The Support can also be expressed as the number of transactions that include both items A and B, i.e., $n_{AB}$. In the remainder of this paper, we use the count of reports or transactions to measure the Support. The algorithm first searches for rules with Support exceeding $\mathit{minsup}$ to form the candidate itemsets. Then, among the candidate itemsets, only those rules with Confidence larger than $\mathit{minconf}$ are selected.
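The two-stage selection described above can be sketched in a few lines of Python. This is only an illustration on a toy Boolean report matrix, not the implementation used in the paper: the function names, the toy data, and the choice of non-strict comparisons (`>=`) at the thresholds are assumptions made here, and Support is measured as a report count.

```python
import numpy as np

def support_count(reports, a, b):
    """Number of reports in which items a and b both occur (n_AB)."""
    return int(np.sum((reports[:, a] == 1) & (reports[:, b] == 1)))

def confidence(reports, a, b):
    """Confidence(A -> B) = n_AB / n_A; returns 0.0 if A never occurs."""
    n_a = int(np.sum(reports[:, a] == 1))
    return support_count(reports, a, b) / n_a if n_a > 0 else 0.0

def classical_apriori_pairs(reports, drug_cols, ae_cols, minsup, minconf):
    """Keep (drug, AE) column pairs that pass the Support filter first,
    then the Confidence filter (thresholds treated as inclusive here)."""
    selected = []
    for d in drug_cols:
        for e in ae_cols:
            if support_count(reports, d, e) < minsup:
                continue  # pruned by Support; Confidence is never computed
            if confidence(reports, d, e) >= minconf:
                selected.append((d, e))
    return selected

# Toy data (hypothetical): 6 reports; column 0 is a drug, columns 1-2 are AEs.
toy = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
])
print(classical_apriori_pairs(toy, drug_cols=[0], ae_cols=[1, 2],
                              minsup=2, minconf=0.5))
```

In the toy data the drug co-occurs with the first AE in three reports (Confidence 3/4) and with the second AE in only one report, so only the first pair survives both filters.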

2.2. Improved Apriori Method

As pointed out by Harpaz et al. [17] and other studies mentioned in Section 1.2, substituting Confidence with disproportionality measures can be useful for identifying safety signals for rare adverse event occurrences.

Disproportionality measures are used to detect safety signals from drugs in spontaneous reporting databases. For any drug–AE combination, the disproportionality measures quantify the extent of disproportionality between the observed and expected number of reports of an AE to a drug compared to the generality of the database. In particular, PRR, RR, and ROR can be computed from a $2 \times 2$ contingency table, shown in Table 1. In Table 1, $a$ represents the number of reports containing both the drug and the AE of interest, $b$ signifies the number of reports containing the drug of interest and all other AEs, $c$ denotes the number of reports including the AE of interest with all other drugs, and $d$ is the number of reports containing all other AEs and drugs in the database.

The Proportional Reporting Ratio (PRR), defined in Equation (3), is the ratio between the proportion of a suspected AE (for example, fever) among individuals who consume a particular drug (for example, Drug A) and the proportion of this AE among subjects that consume medications other than Drug A. The PRR can have values between 0 and ∞; a PRR value exceeding 1 suggests an elevated prevalence of reports featuring the suspected AE (e.g., fever) in association with the specified drug (e.g., Drug A) relative to reports of this AE occurring with alternative medications [24,26].

$$\mathrm{PRR} = \frac{a / (a + b)}{c / (c + d)} = \frac{a (c + d)}{c (a + b)} \tag{3}$$

The Relative Reporting Ratio (RR) represents the ratio between a rule’s observed frequency and a baseline expected frequency for all reports in the database [17], and can be used to assess the strength and significance of that rule. It is defined as

$$\mathrm{RR} = \frac{\Pr(\mathrm{AE} \cap \mathrm{Drug})}{\Pr(\mathrm{AE}) \cdot \Pr(\mathrm{Drug})} = N \cdot \frac{a}{(a + c)(a + b)}, \tag{4}$$

where $N = a + b + c + d$ is the total number of observed records. Values of RR are between 0 and ∞. To illustrate the association between the AE of fever and Drug A, for example, RR quantifies the extent to which occurrences of Drug A and fever together are more frequent than would be expected under the assumption of independence between Drug A and fever. Consequently, an RR value exceeding 1 indicates that the co-occurrence of Drug A and fever is more frequently reported than would be expected if they were independent of each other. Therefore, elevated RR offers compelling evidence of a significant association between the AE and the drug.

The Reporting Odds Ratio (ROR), defined in Equation (5), is the ratio between the reporting odds of a suspected AE among individuals exposed to a specific drug and the reporting odds of this AE among subjects that are not exposed to the specific drug. It measures the likelihood of an AE being reported in the presence of a certain drug compared to the likelihood of that AE being reported in the absence of that drug.

$$\mathrm{ROR} = \frac{a / b}{c / d} = \frac{a d}{b c} \tag{5}$$

According to Rothman et al. [24], the ROR is analogous to the relative risk in a case-control study, and can be used for signal detection even with rare AEs. Again considering the example of association between fever and Drug A, the ROR offers a metric for assessing the likelihood of fever being reported in association with Drug A compared to its being reported with other drugs in the database. ROR can take values between 0 and ∞, with a value of ROR larger than 1 providing a stronger indication of an association between Drug A and the occurrence of fever.

Moreover, different levels of thresholds have been suggested for each of these disproportionality measures; in [23,26,27], the authors suggest that there should be at least three reports (threshold count for Support) for a suspected drug–AE combination, while an RR threshold of 2 was used in [17,22,23] and the PRR threshold was set to 2 in [13,20,28,29]. All of these threshold values have been considered in our comparative studies of the improved Apriori method.
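As a minimal sketch of Equations (3)–(5), the following Python function computes Confidence, PRR, RR, and ROR from the four cell counts of the 2 × 2 table. The function name and the example counts are hypothetical and chosen only for illustration; the sketch also assumes that every cell is positive, so that no denominator is zero.

```python
def disproportionality(a, b, c, d):
    """Confidence, PRR, RR, and ROR from the 2x2 counts of Table 1.

    a: drug and AE of interest      b: drug of interest, other AEs
    c: other drugs, AE of interest  d: other drugs, other AEs
    Assumes all four counts are positive (no zero denominators).
    """
    n = a + b + c + d
    return {
        "confidence": a / (a + b),             # Confidence(Drug -> AE)
        "prr": (a / (a + b)) / (c / (c + d)),  # Equation (3)
        "rr": n * a / ((a + c) * (a + b)),     # Equation (4)
        "ror": (a * d) / (b * c),              # Equation (5)
    }

# Hypothetical counts for a rare AE: only 20 of 1000 reports mention it.
measures = disproportionality(a=8, b=92, c=12, d=888)
print({name: round(value, 3) for name, value in measures.items()})
```

With these made-up counts the Confidence is only 0.08, far below any of the Confidence thresholds considered in this paper, whereas PRR = 6, RR = 4, and ROR ≈ 6.43 all exceed a threshold of 2. This is the kind of rare-AE situation in which Confidence and the disproportionality measures disagree.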

2.3. Numerical Studies

To examine and compare the performance of the improved Apriori approach to the classical approach, i.e., Apriori using Confidence as the second parameter, we conducted extensive simulation studies under various parameter settings and simulation designs. Our simulation studies, along with an additional analysis of real-world vaccine safety data, aimed to assess the best choice of secondary parameter for implementation of the Apriori approach in screening important drug–AE associations for complex data scenarios.

2.3.1. Simulation Studies

In our simulation studies, we conducted comparative studies to evaluate the performance of Apriori across different levels of AE frequencies, potential exhibition of correlations among AEs, and strength of signals for drug–AE associations.

In particular, AEs were generated in the simulation studies as binary random variables from a Bernoulli distribution, with a value of 1 denoting a reported AE and 0 otherwise. The success probability of the Bernoulli distribution was set to $p = 0.1$, $0.3$, and $0.5$, reflecting different AE frequencies (rare, moderate, and common, respectively). Moreover, to account for potential correlations among AEs, we generated correlated binary AEs based on the approach described in Lunn and Davies [30]. Specifically, the levels of correlation between AEs were set to $\rho = 0$ for uncorrelated AEs and to $\rho = \pm 0.3$, $\pm 0.5$, and $\pm 0.7$ for weak, moderate, and strong correlations, respectively. A binary indicator variable $D$, denoting whether a particular drug or vaccine was taken ($D = 1$) or not ($D = 0$), was generated according to the logistic regression model in Equation (6), where the coefficients $\beta$ vary across simulation scenarios to mimic the strength of association between a drug and an AE.

$$\operatorname{logit}(D) = \beta_{1} AE_{1} + \dots + \beta_{k} AE_{k} \tag{6}$$
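A minimal sketch of this data-generating step is given below, restricted to independent AEs. It reads Equation (6) as a model for the probability that $D = 1$ with no intercept term, which is an interpretation made here rather than something stated explicitly in the text; the function name, the seed argument, and the use of NumPy are likewise assumptions. The correlated-AE generator of Lunn and Davies [30] is not reproduced.

```python
import numpy as np

def simulate_reports(n_reports, p, betas, seed=0):
    """Simulate independent binary AEs and one drug indicator.

    AEs are Bernoulli(p); the drug indicator D is Bernoulli with
    P(D = 1) = expit(beta_1 * AE_1 + ... + beta_k * AE_k), i.e. the
    logistic model of Equation (6) with no intercept (an assumption).
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    ae = rng.binomial(1, p, size=(n_reports, betas.size))
    linear_predictor = ae @ betas
    prob_drug = 1.0 / (1.0 + np.exp(-linear_predictor))
    drug = rng.binomial(1, prob_drug)
    return ae, drug

# One drug and three rare AEs, with only the first AE associated (beta = 10).
ae, drug = simulate_reports(n_reports=500, p=0.1, betas=[10.0, 0.0, 0.0])
print(ae.shape, drug.shape)
```

Under this reading, a report without any associated AE receives the drug with probability 0.5, and larger values of $\beta$ push the probability toward 1 for reports that contain the associated AE.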

In addition, in our simulation studies we considered three different scenarios.

In the first scenario, we considered a situation with one drug and three independent AEs, among which only the first AE is truly associated with drug 1; that is, the indicator variable for the drug is generated as $\operatorname{logit}(D_{1}) = \beta AE_{1}$, where $\beta$ is set to 1, 10, 50, and 100. The purpose of this setting is to study the effect of signal strength and AE frequency on the ability of each of the four second parameters (Confidence, PRR, RR, and ROR) to correctly identify the associated drug–AE pairs.

The second setting comprised three drugs and five AEs, with drug 1 associated with both AE 1 and AE 2; that is, $\operatorname{logit}(D_{1}) = \beta_{1} AE_{1} + \beta_{2} AE_{2}$, where $\beta_{1} = \beta_{2} = 1$, $10$, $50$, and $100$. The indicator variables $D_{2}$ and $D_{3}$ for drugs 2 and 3 were generated from a Bernoulli distribution with a success probability of $p = 0.1$, $0.3$, and $0.5$. In this setting, AE 1 and AE 2 are positively correlated with correlation $\rho$, as described above. This setting is used to examine the performance of the method for each of the four second parameters in the presence of AE correlations.

Finally, in scenario 3 the setup includes five drugs and ten AEs, where drug 1 is associated with AEs 1–5, drug 2 is associated with AEs 1–3, and drug 3 is associated with AE 1; that is, the indicator variables for drugs 1–3 are generated from the models $\operatorname{logit}(D_{1}) = \beta AE_{1} + \beta AE_{2} + \beta AE_{3} + \beta AE_{4} + \beta AE_{5}$, $\operatorname{logit}(D_{2}) = \beta AE_{1} + \beta AE_{2} + \beta AE_{3}$, and $\operatorname{logit}(D_{3}) = \beta AE_{1}$, respectively, while the other two drug variables are generated from a Bernoulli distribution with $p = 0.1$, $0.3$, and $0.5$. In setting 3, multiple AE correlation scenarios are considered. We first set AEs 1–5 to be correlated at the positive $\rho$ values described above, referred to as the ‘positive correlation’ setting in the results section. For the scenario referred to as the ‘negative correlation’ setting in the results section, we let AEs 1–3 be positively correlated and AEs 4 and 5 be negatively correlated, with both having absolute correlation values $|\rho|$ as before. In both cases, the remaining AEs 6–10 were set to be independent. The negative correlation scenario takes into account both the direction and the magnitude of correlation.

In the Apriori searching process, we first searched for drug–AE pairs that passed the minimum Support threshold, and then retained those pairs from the previous step that also passed the threshold of the second parameter. The selected “frequent 2-itemsets”, i.e., drug–AE pairs, are the ones believed to be associated with each other. Because the focus of this paper is on detecting two-item association rules that include one drug and one AE, we do not search all two-item rules when applying the Apriori method; as a result, in the simulation studies we only search 3 out of 6, 15 out of 28, and 50 out of 105 two-item rules in the three simulation scenarios, which corresponds to pruning rates of 50%, 46.43%, and 52.38%, respectively.
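The rule counts and pruning rates quoted above can be reproduced with a short calculation, assuming that the total number of two-item rules is counted as the number of unordered pairs among all drugs and AEs combined. The function name below is hypothetical.

```python
from math import comb

def pruning_rate(n_drugs, n_aes):
    """Return (searched rules, all two-item rules, pruning rate in %)."""
    total = comb(n_drugs + n_aes, 2)  # all unordered pairs of items
    searched = n_drugs * n_aes        # pairs made of one drug and one AE
    return searched, total, 100.0 * (1 - searched / total)

for n_drugs, n_aes in [(1, 3), (3, 5), (5, 10)]:
    searched, total, rate = pruning_rate(n_drugs, n_aes)
    print(f"{searched} of {total} rules searched, pruning rate {rate:.2f}%")
```

Running the loop gives 3 of 6 (50.00%), 15 of 28 (46.43%), and 50 of 105 (52.38%), matching the three simulation scenarios.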

Moreover, when applying the Apriori method to the simulated datasets, the threshold values for the first parameter (Support) were set to 0.05 for rare AEs, 0.15 for moderately frequent AEs, and 0.30 for common AEs. We also compared the disproportionality measures (PRR, RR, and ROR) along with Confidence for detecting frequent itemsets. The threshold values defined for these secondary parameters were 0.4, 0.5, 0.6, and 0.7 for Confidence and 1, 1.2, 1.5, and 2 for PRR, RR, and ROR.

Finally, for each of the above three simulation scenarios, each simulated dataset contained $N = 500$ reports, and the simulation was replicated 1000 times.

2.3.2. VAERS Data

To further illustrate the implementation of the improved Apriori algorithm, we also applied it to real spontaneous reports data, namely, the Vaccine Adverse Event Reporting System (VAERS) [31]. VAERS, jointly overseen by the Centers for Disease Control and Prevention (CDC) and the US Food and Drug Administration (FDA), is designed to identify potential safety issues related to vaccines licensed in the United States. It operates by receiving and examining reports detailing adverse events that occur after a person receives a vaccination.

As an example, we used the reports from 2010 to 2019 in the VAERS database, which include 746 patient reports on 1094 AEs for a total of 47 vaccines. In addition to the information on vaccines administered and AEs reported, the VAERS data also include other variables, such as the sex and age of the patient. Table 2 displays a subset of data from the VAERS dataset for January 2010 as an example.

3. Results

In this section, we report the results from our numerical study, separated into simulations and VAERS data analysis.

3.1. Simulation Study

To evaluate and compare the performance of the proposed improved Apriori and classical Apriori approaches, we used Sensitivity ($S_{n}$), Specificity ($S_{p}$), and Overall Accuracy (OA) [32,33] as the criteria. These measures can be calculated based on the $2 \times 2$ contingency table shown in Table 3, in which the columns represent the truly associated and non-associated pairs and the rows indicate the pairs selected and not selected by the method. In the table, TP signifies true positives, FP stands for false positives, FN represents false negatives, and TN denotes true negatives.

Sensitivity, also known as the true positive rate, is defined as

$$S_{n} = \frac{a}{a + c} = \frac{TP}{TP + FN}.$$

Specificity, that is, the true negative rate, is calculated as

$$S_{p} = \frac{d}{b + d} = \frac{TN}{FP + TN}.$$

The overall accuracy is then defined as

$$\mathrm{OA} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + FP + FN + TN}.$$
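The three criteria above can be computed for a single simulated dataset as in the following sketch. The function name and the example pairs are hypothetical; the sketch assumes that at least one pair is truly associated and at least one is not, and it does not show the averaging over replications.

```python
def screening_metrics(selected, truly_associated, all_pairs):
    """Sensitivity, specificity, and overall accuracy for one dataset.

    selected: pairs flagged by the method; truly_associated: ground truth;
    all_pairs: every drug-AE pair that was searched.
    """
    selected = set(selected)
    truth = set(truly_associated)
    universe = set(all_pairs)
    tp = len(selected & truth)             # flagged and truly associated
    fp = len(selected - truth)             # flagged but not associated
    fn = len(truth - selected)             # associated but missed
    tn = len(universe - selected - truth)  # correctly left unflagged
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "overall_accuracy": (tp + tn) / len(universe),
    }

# Layout of Setting 2: three drugs, five AEs, drug 1 associated with AEs 1-2.
all_pairs = [(drug, ae) for drug in range(1, 4) for ae in range(1, 6)]
truth = [(1, 1), (1, 2)]
selected = [(1, 1), (2, 3)]  # hypothetical output of one Apriori run
print(screening_metrics(selected, truth, all_pairs))
```

Here one true pair is found, one is missed, and one unassociated pair is flagged, giving a sensitivity of 1/2, a specificity of 12/13, and an overall accuracy of 13/15.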

3.1.1. Setting 1: One Drug and Three AEs

Setting 1 involves a scenario with one drug and three AEs in which only one AE is associated with the drug. The purpose of this setting is to examine the impact of the AE frequency $p$ and signal strength $\beta$.

Figure 1 shows the comparison of sensitivity among the four choices of second parameter when $\beta$ is fixed at 1. Overall, the ROR shows the most robust sensitivity results, while the sensitivity for the other three parameter choices decreases sharply when the threshold increases. In addition, when considering rare AEs, i.e., AE frequency $p = 0.1$, Confidence provides very low sensitivity, which is consistent with the findings in Harpaz et al. [17,22], while PRR and ROR perform much better in this situation. PRR shows the ability to provide higher sensitivity when the AE is rare and the threshold is high. Although the sensitivity for the ROR decreases when the AE is rare and the threshold is high, the decline is slow enough that the ROR can still provide high sensitivity for rare AE scenarios.

Regarding specificity, Figure 2 shows that all four choices of second parameter have higher specificity when the thresholds increase. This is as expected; when the threshold increases, fewer drug–AE pairs are detected as associated; hence, $S_{n}$ decreases while $S_{p}$ increases. In addition, rare AEs tend to have lower specificity when compared to the scenarios with more frequent AEs.

The performance in terms of overall accuracy is illustrated in Figure 3, showing the combined performance of sensitivity and specificity; both PRR and ROR demonstrate better overall accuracy compared with Confidence and RR.

It is not surprising to see that both sensitivity and specificity increase for all four choices when the signal strength $\beta$ increases. Because the simulations show similar trends at other $\beta$ values, the corresponding figures for the remaining $\beta$ values are provided in the Supplementary Materials in order to save space.

3.1.2. Setting 2: Three Drugs and Five AEs

The second simulation setting includes three drugs and five AEs, of which Drug 1 is associated with AEs 1 and 2. To explore the impact of correlations among AEs, in this setting we also set AEs 1 and 2 to be correlated with a correlation of $\rho = 0.3$, $0.5$, and $0.7$. The results of the simulations when the signal strength $\beta$ is fixed at 1 are shown below.

Figure 4 and Figure 5 show the sensitivity under Setting 2 when $\beta = 1$ and the AE frequency is $p = 0.1$ and $p = 0.5$, respectively. Both figures illustrate that sensitivity improves when the correlation between AEs increases. This is expected, as a stronger positive correlation leads to increased co-occurrences of the correlated AEs, and as a result more drug–AE pairs are identified as being associated. Additionally, similar to Setting 1, the PRR and ROR perform better, showing stable high sensitivities. However, the sensitivities for both Confidence and RR decrease quickly when the thresholds increase. Moreover, comparing Figure 4 and Figure 5, it can be seen that when the AE frequency $p$ increases, the sensitivity for PRR decreases more than that for ROR at high threshold levels, which demonstrates that ROR performs the best in terms of sensitivity under this setting.

Figure 6 shows that, similar to Setting 1, higher threshold values result in better specificity. In addition, the specificity decreases when the correlation increases. As the magnitude of correlation increases, the co-occurrences of positively correlated AEs increase, thereby increasing the chance of false positive findings compared with uncorrelated AEs.

As expected, the overall accuracy shows the dual effects of sensitivity and specificity, similar to Setting 1; the corresponding figures can be found in the Supplementary Materials. Similar trends are also observed for other combinations of $\beta$ and $p$ values; the corresponding simulation results are provided in the Supplementary Materials as well.

3.1.3. Setting 3: Five Drugs and Ten AEs

Setting 3 includes five drugs and ten AEs, of which AEs 1–5 are associated with one or more of drugs 1–3, as described in Section 2.3.1. To evaluate the effect of the direction (positive or negative) of AE correlations as well as the magnitude of $\rho$, we consider both positive and negative correlations among the associated drug–AE pairs.

Figure 7 shows the sensitivity results when $\beta = 1$, the AE frequency $p = 0.5$, and the absolute correlation among AEs $|\rho| = 0.3$; we use the notations −, +, and 0 to indicate negative $\rho$, positive $\rho$, and $\rho = 0$, respectively. Again, both PRR and ROR show better performance than Confidence and RR, and ROR is the most robust due to its slowly declining sensitivity when the threshold value increases. In addition, it can be seen from Figure 7 that both positive and negative correlations increase sensitivity for all three disproportionality measures compared with when AEs are uncorrelated, and positive correlation increases sensitivity more than negative correlation. As discussed before, when AEs are positively correlated, the co-occurrence counts of correlated AEs increase, which results in more associated drug–AE pairs being selected; however, in the presence of negatively correlated AEs the co-occurrence counts of correlated AEs decrease, meaning that fewer associated drug–AE pairs are detected than under positive correlation.

Figure 8 shows that, similar to the other two settings, all four choices of the second parameter have increased specificity when the threshold values increase. In addition, both positive and negative correlations among AEs result in a decline in specificity compared with uncorrelated AEs. Because negative AE correlations do not have the incremental effect on AE co-occurrence counts that positive correlations have, the chance of false positives (i.e., 1 − specificity) is lower in the presence of negatively correlated AEs.

The remaining combinations of AE frequencies/signal strengths/AE correlations for settings 1, 2, and 3 typically demonstrate trends similar to those already shown above. The corresponding plots are provided in the Supplementary Materials.

3.2. VAERS Data

In a total of 746 reports between 2010 and 2019 retrieved from VAERS database, there were 1094 AEs reported for 47 vaccines. The most frequently reported AE is Death, with a prevalence rate [34] of 0.764 (570 reports). Among the least frequently reported AEs are Eyelid Function Disorder, Blood Fibrinogen Increased, and Biopsy Soft Tissue, with a prevalence rate of 0.0013 (1 report). This is reflective of the over-reporting of severe AEs compared to AEs that are more moderate in severity in surveillance monitoring datasets. In addition, the most frequently reported vaccine administered is Pneumococcal, 13-Valent Vaccine (PREVNAR13), with 280 cases reported (prevalence rate of 0.375), while the most infrequently reported vaccines include BCG, Meningococcal B, and Japanese Encephalitis Virus Vaccine, Inactivated, Adsorbed with only one reported case. Several of the AEs reported in the VAERS dataset have strong positive correlations, such as (Ammonia Decreased, Alpha 1 Foetoprotein Normal) and (Lymphocyte Percentage Decreased, Blood Calcium Decreased), while some AEs have weak negative correlations, such as (Encephalopathy, Death).

To implement the Apriori approach, for each unique report we used binary code (1 and 0) to represent whether a particular vaccine was administered and whether a specific AE was reported. The threshold for the first parameter, Support, was set to be minsup = 3 (0.4% of all reports), as the overall mean of all AE frequencies in the data was 3. The threshold levels for the four second parameters of Confidence, PRR, RR, and ROR were the same as in the simulation studies (Section 2.3.1).
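
The encoding and Support filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the report IDs `R1` to `R3`, the function names, and the toy records are all hypothetical, and the rows are only shaped like those in Table 2. Set membership stands in for the 1/0 code, and a pair is kept when its co-occurrence count is at least `minsup`.

```python
from collections import defaultdict

# Hypothetical long-format records shaped like Table 2:
# (report_id, vaccine_code, ae_name). The IDs are made up for illustration.
records = [
    ("R1", "DTAPIPVHIB", "DEATH"),
    ("R1", "DTAPIPVHIB", "UNRESPONSIVE TO STIMULI"),
    ("R2", "FLU(H1N1)", "DEATH"),
    ("R2", "FLU(H1N1)", "COAGULOPATHY"),
    ("R3", "FLU(H1N1)", "DEATH"),
]

def encode_reports(records):
    """Return {report_id: (vaccines, aes)}; membership in a set is the 1/0 code."""
    reports = defaultdict(lambda: (set(), set()))
    for report_id, vaccine, ae in records:
        vaccines, aes = reports[report_id]
        vaccines.add(vaccine)
        aes.add(ae)
    return dict(reports)

def pair_support(reports):
    """Count the reports in which each (vaccine, AE) pair co-occurs."""
    counts = defaultdict(int)
    for vaccines, aes in reports.values():
        for vaccine in vaccines:
            for ae in aes:
                counts[(vaccine, ae)] += 1
    return dict(counts)

def frequent_pairs(counts, minsup):
    """Keep pairs whose co-occurrence count meets the Support threshold."""
    return {pair: n for pair, n in counts.items() if n >= minsup}

reports = encode_reports(records)
support = pair_support(reports)
print(frequent_pairs(support, minsup=2))
```

With these toy records only the pair (FLU(H1N1), DEATH) reaches a count of 2; in the VAERS analysis the same filter would be applied with minsup = 3.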

In this analysis, 51,418 of the 650,370 possible two-item rules contain one vaccine and one AE and are retained for further investigation, which corresponds to a pruning rate of 92.09%.
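
The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes one particular reading of the counts, namely that 650,370 is the number of unordered pairs among all 47 + 1094 = 1141 items and that 51,418 is the number of pairs made of exactly one vaccine and one AE; the variable names are illustrative.

```python
from math import comb

n_vaccines, n_aes = 47, 1094             # counts reported in the text
n_items = n_vaccines + n_aes             # 1141 items in total

all_two_item_sets = comb(n_items, 2)     # every unordered pair of items
vaccine_ae_pairs = n_vaccines * n_aes    # pairs with one vaccine and one AE

# Share of two-item sets discarded before the second parameter is evaluated.
pruning_rate = 1 - vaccine_ae_pairs / all_two_item_sets

print(all_two_item_sets, vaccine_ae_pairs, f"{pruning_rate:.2%}")
# prints: 650370 51418 92.09%
```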

Table 4 lists the number of associated vaccine–AE pairs identified using the Apriori approach under four different choices of the second parameter across varying thresholds. Notably, when using Confidence as the second parameter, the classical Apriori approach detects a much smaller number of associated vaccine–AE pairs than with the other choices of second parameter, even at a low threshold of 0.4. This underscores the limitation highlighted by Harpaz et al. [17,22] in terms of a propensity to overlook rare yet strongly associated AEs when employing Confidence as the second parameter in the Apriori algorithm.

When PRR, RR, or ROR is used as the second parameter, the improved Apriori approach identifies the same 341 vaccine–AE pairs at a threshold of 1; however, as the threshold value increases, the number of associated pairs detected by RR declines the most steeply, followed by PRR and then ROR.
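
One way to see why the three disproportionality measures agree at a threshold of 1 is to write them in the notation of Table 1. The following is a sketch that assumes the standard definitions of PRR, RR (relative reporting ratio), and ROR and assumes that all four cell counts are positive; it is not taken from the paper's own derivations.

```latex
\begin{align*}
\mathrm{PRR} &= \frac{a/(a+b)}{c/(c+d)}, \qquad
\mathrm{RR} = \frac{a\,(a+b+c+d)}{(a+b)(a+c)}, \qquad
\mathrm{ROR} = \frac{ad}{bc},\\[4pt]
\mathrm{PRR} > 1 &\iff a(c+d) > c(a+b) \iff ad > bc,\\
\mathrm{RR} > 1 &\iff a(a+b+c+d) > (a+b)(a+c) \iff ad > bc,\\
\mathrm{ROR} > 1 &\iff ad > bc,\\[4pt]
\frac{\mathrm{ROR}}{\mathrm{PRR}} &= \frac{ad+bd}{bc+bd}, \qquad
\frac{\mathrm{PRR}}{\mathrm{RR}} = \frac{(a+c)(c+d)}{c\,(a+b+c+d)}
  = \frac{ad + c(a+c+d)}{bc + c(a+c+d)}.
\end{align*}
```

Both ratios in the last line exceed 1 exactly when ad > bc, so under these assumptions RR ≤ PRR ≤ ROR for any positively associated pair. This would be consistent with Table 4, where all three measures select 341 pairs at a threshold of 1 and RR loses pairs fastest as the threshold rises.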

Table 5 displays 10 out of 232 vaccine–AE pairs that were identified as associated using ROR at a threshold of 2, which involve seven unique vaccines and nine unique AEs. The rows in the table denote the first ten pairs from the list of all 232 pairs identified as associated, sorted in increasing order of their respective ROR values. The corresponding values of PRR, RR, and Confidence for each of these vaccine–AE pairs are also shown in the table. The values in parentheses in the column names denote the boundary values for each parameter.

From Table 5, it can be seen that some of the pairs identified as associated, such as Haemophilus B Conjugate Vaccine (HIBV) → Pallor, have Support as low as the minimum threshold count of three reports. In addition, PRR shows values close to ROR for these pairs, and the RR values are not far from ROR and PRR. This further illustrates the benefits of using disproportionality measures as the second parameter in the Apriori approach, especially when the association is rare or infrequent. On the contrary, 29 out of 33 pairs selected as associated using Confidence as the second parameter with a threshold of 0.4 contain Death as the AE, which is the most frequently reported AE in this dataset. This again confirms the limitation of using Confidence as the second parameter. Within these 33 identified pairs, 20 are also identified using PRR or ROR at a threshold value of 1 (see Table S1 in the Supplementary Materials for more detail). We also noticed that the rules DTAPIPVHIB → Unresponsive to Stimuli, FLU3 (SEASONAL) → Nausea, DTAPHEPBIP → Respiratory Arrest, RV5 → Resuscitation, HIBV → Pallor, and HIBV → Apnoea have been studied before [35,36,37,38,39,40,41,42,43,44,45,46], which confirms the validity of the improved Apriori approach.

4. Discussion

This paper involves a detailed study of an improved Apriori approach wherein the disproportionality measures of PRR, RR, and ROR are substituted for Confidence as the second parameter when implementing the Apriori algorithm for drug safety data. This method incorporates the advantages of disproportionality measures to detect rare safety signals that may be associated with a drug or vaccine. Our comprehensive studies have also taken into account challenges that are specific to drug safety data, such as rarely occurring AEs, underlying AE correlations, and varying levels of safety signal strength. In addition, we have investigated the performance of each parameter choice at different thresholds in order to understand how it affects the accuracy of association identification.

Our numerical studies show that using Confidence in the Apriori implementation for drug safety data results in infrequent AEs having smaller Confidence even when they are strongly associated with a drug. The Relative Reporting Ratio (RR) has high sensitivity at a threshold of 1, but demonstrates decreased sensitivity at higher thresholds. Our simulation studies extensively compared PRR and ROR in different settings, revealing that PRR tends to have a sharply declining sensitivity at higher thresholds compared to ROR when the signal strength between an AE and a drug or vaccine is weak, even when AE frequencies are high. The stable performance of ROR in identifying drug/vaccine–AE pairs is particularly useful in identifying rare associations that are not observed during preclinical studies but are reported during the post-market stage, when more data are available because the drug or vaccine is used by a larger population.
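
The contrast between Confidence and the disproportionality measures for an infrequent AE can be illustrated numerically. The sketch below assumes the standard definitions of the four measures in the notation of Table 1; the function name `measures` and the cell counts are hypothetical and are not drawn from the VAERS data.

```python
def measures(a, b, c, d):
    """Confidence, RR, PRR, and ROR for a 2x2 table laid out as in Table 1.

    a: reports with the suspected drug and the suspected AE
    b: reports with the suspected drug and other AEs
    c: reports with other drugs and the suspected AE
    d: reports with other drugs and other AEs
    Assumes b, c, and d are positive so that no ratio divides by zero.
    """
    n = a + b + c + d
    confidence = a / (a + b)
    rr = (a * n) / ((a + b) * (a + c))      # observed count over expected count
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    return {"Confidence": confidence, "RR": rr, "PRR": prr, "ROR": ror}

# Hypothetical rare AE: only 10 of 1000 reports mention it, 4 of them with the drug.
example = measures(a=4, b=96, c=6, d=894)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

For these made-up counts, Confidence is only 0.04, so the pair would fail a Confidence threshold of 0.4, while RR is 4, PRR is 6, and ROR is about 6.21, all well above a threshold of 2.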

Furthermore, from the real data studies it is observed that using Confidence as a parameter for screening associated vaccine–AE pairs leads to the selection of pairs mostly containing Death as the AE. Using disproportionality measures identifies only some of the rules that have Death as the AE; however, we recommend that in practice severe adverse events such as death should always be added manually for further investigation regardless of the screening analysis results.

Conclusions based on disproportionality measures implemented in post-marketing surveillance reports can only be drawn about the risk of AE reporting, not about the risk of the AE itself. Thus, results obtained therefrom should be considered strictly exploratory as a screening tool to find potentially important drug–AE associations, with further investigation required to establish causation.

5. Conclusions

The Apriori method is a widely used data mining tool for finding important associations within market basket datasets. However, because the method primarily focuses on frequent co-occurrences of transactions within the data, the classical Apriori approach is not suitable for safety signal analysis from post-marketing clinical data, which may exhibit safety signals arising from rarely occurring adverse events in relation to a drug or a vaccine.

Our comprehensive comparative study shows the benefit of using disproportionality measures as the second parameter in the Apriori method for screening of important associations between drugs/vaccines and AEs in post-marketing safety data, particularly when the goal is to identify safety signals from AEs that rarely co-occur with a drug or a vaccine. However, because the purpose of the improved Apriori method is to screen safety signals from rare AEs, it should be used in conjunction with conventional Apriori (using Confidence as second parameter) and/or manual investigation to ensure that potential signals from relatively more frequent AEs are not missed. It is also important to note that the generated rules are indicative of an underlying association and not a causal relationship; further studies and clinical assessment are needed to ascertain whether causality exists between a drug or vaccine and an AE.

The future scope of this study can be elucidated in the context of the methods used as well as the datasets used for application. In terms of method, the statistical properties of the disproportionality methods can be leveraged for statistical hypothesis testing to gauge associations between screened rules. The method used in this study can also be extended to higher-order itemsets to account for interactions between drugs or AEs. Furthermore, a wider application of the method with data from different post-marketing databases, such as the FDA Adverse Event Reporting System (FAERS) [47] or EudraVigilance Medicine Safety Database [48], as well as preclinical data, can be explored in the future.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math12172705/s1.
Figure S1: Sensitivity plots for Setting 1: 1 drug and 3 AEs when β = 10 and AE frequency p = 0.1, 0.3, 0.5.
Figure S2: Specificity plots for Setting 1: 1 drug and 3 AEs when β = 10 and AE frequency p = 0.1, 0.3, 0.5.
Figure S3: Overall Accuracy plots for Setting 1: 1 drug and 3 AEs when β = 10 and AE frequency p = 0.1, 0.3, 0.5.
Figure S4: Sensitivity plots for Setting 1: 1 drug and 3 AEs when β = 50 and AE frequency p = 0.1, 0.3, 0.5.
Figure S5: Specificity plots for Setting 1: 1 drug and 3 AEs when β = 50 and AE frequency p = 0.1, 0.3, 0.5.
Figure S6: Overall Accuracy plots for Setting 1: 1 drug and 3 AEs when β = 50 and AE frequency p = 0.1, 0.3, 0.5.
Figure S7: Sensitivity plots for Setting 1: 1 drug and 3 AEs when β = 100 and AE frequency p = 0.1, 0.3, 0.5.
Figure S8: Specificity plots for Setting 1: 1 drug and 3 AEs when β = 100 and AE frequency p = 0.1, 0.3, 0.5.
Figure S9: Overall Accuracy plots for Setting 1: 1 drug and 3 AEs when β = 100 and AE frequency p = 0.1, 0.3, 0.5.
Figure S10: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 1 and AE frequency p = 0.3.
Figure S11: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 1 and AE frequency p = 0.3.
Figure S12: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 1 and AE frequency p = 0.3.
Figure S13: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 1 and AE frequency p = 0.5.
Figure S14: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 1 and AE frequency p = 0.5.
Figure S15: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 1 and AE frequency p = 0.5.
Figure S16: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.1.
Figure S17: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.1.
Figure S18: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.1.
Figure S19: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.3.
Figure S20: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.3.
Figure S21: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.3.
Figure S22: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.5.
Figure S23: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.5.
Figure S24: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 10 and AE frequency p = 0.5.
Figure S25: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.1.
Figure S26: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.1.
Figure S27: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.1.
Figure S28: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.3.
Figure S29: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.3.
Figure S30: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.3.
Figure S31: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.5.
Figure S32: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.5.
Figure S33: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 50 and AE frequency p = 0.5.
Figure S34: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.1.
Figure S35: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.1.
Figure S36: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.1.
Figure S37: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.3.
Figure S38: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.3.
Figure S39: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.3.
Figure S40: Sensitivity plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.5.
Figure S41: Specificity plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.5.
Figure S42: Overall Accuracy plots for Setting 2: 3 drugs and 5 AEs when β = 100 and AE frequency p = 0.5.
Figure S43: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 1 and p = 0.1.
Figure S44: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 1 and p = 0.1.
Figure S45: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 1 and p = 0.3.
Figure S46: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 1 and p = 0.3.
Figure S47: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 10 and p = 0.1.
Figure S48: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 10 and p = 0.1.
Figure S49: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 10 and p = 0.3.
Figure S50: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 10 and p = 0.3.
Figure S51: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 10 and p = 0.5.
Figure S52: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 10 and p = 0.5.
Figure S53: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 50 and p = 0.1.
Figure S54: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 50 and p = 0.1.
Figure S55: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 50 and p = 0.3.
Figure S56: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 50 and p = 0.3.
Figure S57: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 50 and p = 0.5.
Figure S58: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 50 and p = 0.5.
Figure S59: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 100 and p = 0.1.
Figure S60: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 100 and p = 0.1.
Figure S61: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 100 and p = 0.3.
Figure S62: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 100 and p = 0.3.
Figure S63: Sensitivity plots for Setting 3: 5 drugs and 10 AEs when β = 100 and p = 0.5.
Figure S64: Specificity plots for Setting 3: 5 drugs and 10 AEs when β = 100 and p = 0.5.
Table S1: 33 Vaccine-AE pairs selected using Apriori with Confidence at the threshold of 0.4, along with their corresponding values of PRR, RR, and ROR.

Author Contributions

Methodology, J.S. and R.S.; study design, J.S. and R.S.; computation, R.S.; data analysis, R.S.; investigation, R.S.; resources, R.S.; data curation, R.S.; manuscript writing—original draft preparation, R.S.; manuscript writing—review and editing, J.S. and R.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. World Health Organization (WHO). The Importance of Pharmacovigilance; WHO Quality Assurance and Safety Team: Athens, Greece, 2022; Retrieved 18 December 2023; ISBN 9241590157.
2. Wax, P.M. Elixirs, diluents, and the passage of the 1938 Federal Food, Drug and Cosmetic Act. Ann. Intern. Med. 1995, 122, 456–461.
3. Vargesson, N. Thalidomide-induced teratogenesis: History and mechanisms. Birth Defects Res. Part C Embryo Today Rev. 2015, 105, 140–156.
4. Sibbald, B. Rofecoxib (Vioxx) voluntarily withdrawn from market. CMAJ Can. Med. Assoc. J. 2004, 171, 1027–1028.
5. Krumholz, H.M.; Ross, J.S.; Presler, A.H.; Egilman, D.S. What have we learnt from Vioxx? BMJ Br. Med. J. 2007, 334, 120–123.
6. Mehrotra, D.V.; Adewale, A.J. Flagging clinical adverse experiences: Reducing false discoveries without materially compromising power for detecting true signals. Stat. Med. 2012, 31, 1918–1930.
7. Tan, X.; Chen, B.E.; Sun, J.; Patel, T.; Ibrahim, J.G. A hierarchical testing approach for detecting safety signals in clinical trials. Stat. Med. 2020, 39, 1541–1557.
8. Berry, S.M.; Berry, D.A. Accounting for multiplicities in assessing drug safety: A three-level hierarchical mixture model. Biometrics 2004, 60, 418–426.
9. Amy Xia, H.; Ma, H.; Carlin, B.P. Bayesian hierarchical modeling for detecting safety signals in clinical trials. J. Biopharm. Stat. 2011, 21, 1006–1029.
10. DuMouchel, W. Multivariate Bayesian Logistic Regression for Analysis of Clinical Study Safety Issues. Statist. Sci. 2012, 27, 319–339.
11. DuMouchel, W.; Pregibon, D. Empirical bayes screening for multi-item associations. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 26–29 August 2001; pp. 67–76.
12. DuMouchel, W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am. Stat. 1999, 53, 177–190.
13. Bate, A.; Lindquist, M.; Edwards, I.R.; Olsson, S.; Orre, R.; Lansner, A.; De Freitas, R.M. A Bayesian neural network method for adverse drug reaction signal generation. Eur. J. Clin. Pharmacol. 1998, 54, 315–321.
14. Agrawal, R.; Srikant, R. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, Santiago, Chile, 12–15 September 1994; Volume 1215, pp. 487–499.
15. Srikant, R.; Agrawal, R. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the Advances in Database Technology—EDBT ’96; Apers, P., Bouzeghoub, M., Gardarin, G., Eds.; Springer: Berlin/Heidelberg, Germany, 1996; pp. 1–17.
16. Kuo, M.H.; Kushniruk, A.W.; Borycki, E.M.; Greig, D. Application of the Apriori Algorithm for Adverse Drug Reaction Detection. In Detection and Prevention of Adverse Drug Events; IOS Press: Amsterdam, The Netherlands, 2009; pp. 95–101.
17. Harpaz, R.; Chase, H.S.; Friedman, C. Mining multi-item drug adverse effect associations in spontaneous reporting systems. BMC Bioinform. 2010, 11, S7.
18. van Puijenbroek, E.P.; Bate, A.; Leufkens, H.G.M.; Lindquist, M.; Orre, R.; Egberts, A.C.G. A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions. Pharmacoepidemiol. Drug Saf. 2002, 11, 3–10.
19. Evans, S.J.W.; Waller, P.; Davis, S. Proportional reporting ratios: The uses of epidemiological methods for signal generation. Pharmacoepidemiol. Drug Saf. 1998, 7, S102.
20. Szarfman, A.; Machado, S.G.; O’Neill, R.T. Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA’s spontaneous reports database. Drug Saf. 2002, 25, 381–392.
21. Meyboom, R.H.; Hekster, Y.A.; Egberts, A.C.; Gribnau, F.W.; Edwards, I.R. Causal or casual? The role of causality assessment in pharmacovigilance. Drug Saf. 1997, 17, 374–389.
22. Harpaz, R.; Haerian, K.; Chase, H.S.; Friedman, C. Statistical Mining of Potential Drug Interaction Adverse Effects in FDA’s Spontaneous Reporting System. AMIA Annu. Symp. Proc. 2010, 2010, 281–285.
23. Wang, C.; Guo, X.J.; Xu, J.F.; Wu, C.; Sun, Y.L.; Ye, X.F.; Qian, W.; Ma, X.Q.; Du, W.M.; He, J. Exploration of the Association Rules Mining Technique for the Signal Detection of Adverse Drug Events in Spontaneous Reporting Systems. PLoS ONE 2012, 7, e40561.
24. Rothman, K.J.; Lanes, S.; Sacks, S.T. The reporting odds ratio and its advantages over the proportional reporting ratio. Pharmacoepidemiol. Drug Saf. 2004, 13, 519–523.
25. Lallich, S.; Teytaud, O.; Prudhomme, E. Association Rule Interestingness: Measure and Statistical Validation; Springer: Berlin/Heidelberg, Germany, 2007; Volume 43, pp. 251–275.
26. Evans, S.J.W.; Waller, P.C.; Davis, S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol. Drug Saf. 2001, 10, 483–486.
27. Avery, A.; Anderson, C.; Bond, C.; Fortnum, H.; Gifford, A.; Hannaford, P.; Hazell, L.; Krska, J.; Lee, A.; McLernon, D.; et al. Evaluation of patient reporting of adverse drug reactions to the UK ‘Yellow Card Scheme’: Literature review, descriptive and qualitative analyses, and questionnaire surveys. Health Technol. Assess. 2011, 15, 1–234.
28. Sindhu, M.S.; Kannan, B. Detecting signals of drug-drug interactions using association rule mining methodology. IJCSIT Int. J. Comput. Sci. Inf. Technol. 2013, 4, 590–594.
29. Bate, A.; Evans, S.J.W. Quantitative signal detection using spontaneous ADR reporting. Pharmacoepidemiol. Drug Saf. 2009, 18, 427–436.
30. Lunn, A.D.; Davies, S.J. A Note on Generating Correlated Binary Variables. Biometrika 1998, 85, 487–490.
31. Shimabukuro, T.T.; Nguyen, M.; Martin, D.; DeStefano, F. Safety monitoring in the vaccine adverse event reporting system (VAERS). Vaccine 2015, 33, 4398–4405.
32. Kohl, M. Performance Measures in Binary Classification. Int. J. Stat. Med. Res. 2012, 1, 79–81.
33. Alberg, A.J.; Park, J.W.; Hager, B.W.; Brock, M.V.; Diener-West, M. The Use of “Overall Accuracy” to Evaluate the Validity of Screening or Diagnostic Tests. J. Gen. Intern. Med. 2004, 19, 460–465.
34. Marshall, S.W. Prevalence and Incidence. In Encyclopedia of Social Measurement; Kempf-Leonard, K., Ed.; Elsevier: New York, NY, USA, 2005; pp. 141–147.
35. Hansen, J.; Decker, M.D.; Lewis, E.; Fireman, B.; Pool, V.; Greenberg, D.P.; Johnson, D.R.; Black, S.; Klein, N.P. Hypotonic-hyporesponsive Episodes After Diphtheria, Tetanus and Acellular Pertussis Vaccination. Pediatr. Infect. Dis. J. 2021, 40, 1122–1126.
36. Baxter, R.; Patriarca, P.; Ensor, K.; Izikson, R.; Goldenthal, K.; Cox, M. Evaluation of the safety, reactogenicity and immunogenicity of FluBlok® trivalent recombinant baculovirus-expressed hemagglutinin influenza vaccine administered intramuscularly to healthy adults 50–64 years of age. Vaccine 2011, 29, 2272–2278.
37. Yang, L.P. Recombinant trivalent influenza vaccine (Flublok®): A review of its use in the prevention of seasonal influenza in adults. Drugs 2013, 73, 1357–1366.
38. Halperin, S.A.; Smith, B.; Clarke, K.; Treanor, J.; Mabrouk, T.; Germain, M. A Phase I, Randomized, Controlled Trial to Study the Reactogenicity and Immunogenicity of a Nasal, Inactivated Trivalent Influenza Virus Vaccine in Healthy Adults. Hum. Vaccines 2005, 1, 37–42.
39. Baldo, V.; Bonanni, P.; Castro, M.; Gabutti, G.; Franco, E.; Marchetti, F.; Prato, R.; Vitale, F. Combined hexavalent diphtheria-tetanus-acellular pertussis-hepatitis B-inactivated poliovirus-Haemophilus influenzae type b vaccine; Infanrix™ hexa: Twelve years of experience in Italy. Hum. Vaccines Immunother. 2014, 10, 129–137.
40. Zangwill, K.M.; Eriksen, E.; Lee, M.; Lee, J.; Marcy, S.M.; Friedland, L.R.; Weston, W.; Howe, B.; Ward, J.I. A population-based, postlicensure evaluation of the safety of a combination diphtheria, tetanus, acellular pertussis, hepatitis B, and inactivated poliovirus vaccine in a large managed care organization. Pediatrics 2008, 122, e1179–e1185.
41. Omeñaca, F.; Vázquez, L.; Garcia-Corbeira, P.; Mesaros, N.; Hanssens, L.; Dolhain, J.; Gómez, I.P.; Liese, J.; Knuf, M. Immunization of preterm infants with GSK’s hexavalent combined diphtheria-tetanus-acellular pertussis-hepatitis B-inactivated poliovirus-Haemophilus influenzae type b conjugate vaccine: A review of safety and immunogenicity. Vaccine 2018, 36, 986–996.
42. Loughlin, J.; Mast, T.C.; Doherty, M.C.; Wang, F.T.; Wong, J.; Seeger, J.D. Postmarketing evaluation of the short-term safety of the pentavalent rotavirus vaccine. Pediatr. Infect. Dis. J. 2012, 31, 292–296.
43. McGrath, E.J.; Thomas, R.; Duggan, C.; Asmar, B.I. Pentavalent rotavirus vaccine in infants with surgical gastrointestinal disease. J. Pediatr. Gastroenterol. Nutr. 2014, 59, 44–48.
44. Black, S.; Greenberg, D.P. A combined diphtheria, tetanus, five-component acellular pertussis, poliovirus and Haemophilus influenzae type b vaccine. Expert Rev. Vaccines 2005, 4, 793–805.
45. Silfverdal, S.A.; Coremans, V.; François, N.; Borys, D.; Cleerbout, J. Safety profile of the 10-valent pneumococcal non-typeable Haemophilus influenzae protein D conjugate vaccine (PHiD-CV). Expert Rev. Vaccines 2017, 16, 109–121.
46. Botham, S.; Isaacs, D.; Henderson-Smart, D. Incidence of apnoea and bradycardia in preterm infants following DTPw and Hib immunization: A prospective study. J. Paediatr. Child Health 1997, 33, 418–421.
47. Alatawi, Y.M.; Hansen, R.A. Empirical estimation of under-reporting in the US Food and Drug Administration adverse event reporting system (FAERS). Expert Opin. Drug Saf. 2017, 16, 761–767.
48. Postigo, R.; Brosch, S.; Slattery, J.; van Haren, A.; Dogné, J.M.; Kurz, X.; Candore, G.; Domergue, F.; Arlett, P. EudraVigilance medicines safety database: Publicly accessible data for research and public health protection. Drug Saf. 2018, 41, 665–675.

Figure 1. Sensitivity plots for Setting 1: One drug and three AEs with β = 1 and AE frequency p = 0.1, 0.3, 0.5.

Figure 2. Specificity plots for Setting 1: One drug and three AEs with β = 1 and AE frequency p = 0.1, 0.3, 0.5.

Figure 3. Overall accuracy plots for Setting 1: One drug and three AEs with β = 1 and AE frequency p = 0.1, 0.3, 0.5.

Figure 4. Sensitivity plots for Setting 2: Three drugs and five AEs with β = 1 and p = 0.1.

Figure 5. Sensitivity plots for Setting 2: Three drugs and five AEs with β = 1 and p = 0.5.

Figure 6. Specificity plots for Setting 2: Three drugs and five AEs with β = 1 and p = 0.1.

Figure 7. Sensitivity plots for Setting 3: Five drugs and ten AEs with β = 1, p = 0.5, and |ρ| = 0.3.

Figure 8. Specificity plots for Setting 3: Five drugs and ten AEs with β = 1, p = 0.5, and |ρ| = 0.3.

Table 1. Contingency table for disproportionality measures.

	Suspected AE	All Other AEs	Total
Suspected drug	a	b	a + b
All other drugs	c	d	c + d
Total	a + c	b + d	a + b + c + d

Table 2. Subset of VAERS data from January 2010.

VAERS ID	VAERS ID Code	Vaccine Type	Vaccine Type Code	Symptoms	Symptoms Code
0376710-1	0376710-1	DIPHTHERIA AND TETANUS TOXOIDS AND ACELLULAR PERTUSSIS VACCINE + INACTIVATED POLIOVIRUS VACCINE + HAEMOPHILUS B CONJUGATE VACCINE	DTAPIPVHIB	DEATH	10011906
0376710-1	0376710-1	DIPHTHERIA AND TETANUS TOXOIDS AND ACELLULAR PERTUSSIS VACCINE + INACTIVATED POLIOVIRUS VACCINE + HAEMOPHILUS B CONJUGATE VACCINE	DTAPIPVHIB	UNRESPONSIVE TO STIMULI	10045555
0376710-1	0376710-1	INFLUENZA VIRUS VACCINE, TRIVALENT (INJECTED)	FLU3(SEASONAL)	DEATH	10011906
0376710-1	0376710-1	INFLUENZA VIRUS VACCINE, TRIVALENT (INJECTED)	FLU3(SEASONAL)	UNRESPONSIVE TO STIMULI	10045555
0376710-1	0376710-1	PNEUMOCOCCAL, 7-VALENT VACCINE (PREVNAR)	PNC	DEATH	10011906
0376710-1	0376710-1	PNEUMOCOCCAL, 7-VALENT VACCINE (PREVNAR)	PNC	UNRESPONSIVE TO STIMULI	10045555
0376969-1	0376969-1	INFLUENZA (H1N1) MONOVALENT (INJECTED)	FLU(H1N1)	COAGULOPATHY	10009802
0376969-1	0376969-1	INFLUENZA (H1N1) MONOVALENT (INJECTED)	FLU(H1N1)	DEATH	10011906
0376969-1	0376969-1	INFLUENZA (H1N1) MONOVALENT (INJECTED)	FLU(H1N1)	DRUG INTERACTION	10013710

Table 3. Contingency table for validity measurement of binary classification. TP: true positive; FP: false positive; FN: false negative; TN: true negative.

	Associated Pairs	Non-Associated Pairs	Total
Selected	a (TP)	b (FP)	a + b
Not selected	c (FN)	d (TN)	c + d
Total	a + c	b + d	a + b + c + d
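
The validity measures built on Table 3 can be computed directly from its four cells. The sketch below assumes the usual definitions of sensitivity, specificity, and overall accuracy; the function name `validity` and the example counts are hypothetical and do not come from the simulation results.

```python
def validity(tp, fp, fn, tn):
    """Sensitivity, specificity, and overall accuracy from the cells of Table 3."""
    sensitivity = tp / (tp + fn)                 # associated pairs that were selected
    specificity = tn / (tn + fp)                 # non-associated pairs left unselected
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # all correct decisions over all pairs
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }

# Hypothetical screening result: 10 truly associated pairs, 40 non-associated pairs.
result = validity(tp=8, fp=5, fn=2, tn=35)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For these made-up counts the sensitivity is 0.8, the specificity is 0.875, and the overall accuracy is 0.86; the false positive rate discussed in the simulation results is 1 minus the specificity, here 0.125.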

Table 4. Counts of selected vaccine–AE association pairs from VAERS data.

Parameter	Boundary Values	Threshold	Selected Pairs
Confidence	(0,1)	0.4	33
		0.5	30
		0.6	28
		0.7	26
PRR	(0,∞)	1	341
		1.2	308
		1.5	272
		2	214
RR	(0,∞)	1	341
		1.2	300
		1.5	246
		2	153
ROR	(0,∞)	1	341
		1.2	321
		1.5	282
		2	232

Table 5. Ten associated vaccine–AE pairs selected using Apriori with ROR at a threshold of 2, along with the corresponding values of PRR, RR, and Confidence.

Vaccine Code	Vaccine Name	AE/Symptom Name	Support	ROR (0,∞)	PRR (0,∞)	RR (0,∞)	Confidence (0,1)
DTAPIPVHIB	DIPHTHERIA AND TETANUS TOXOIDS AND ACELLULAR PERTUSSIS VACCINE + INACTIVATED POLIOVIRUS VACCINE + HAEMOPHILUS B CONJUGATE VACCINE	UNRESPONSIVE TO STIMULI	25	2.059	1.867	1.609	0.181
FLU3 (SEASONAL)	INFLUENZA VIRUS VACCINE, TRIVALENT (INJECTED)	NAUSEA	8	2.061	2.002	1.681	0.056
DTAPHEPBIP	DIPHTHERIA AND TETANUS TOXOIDS AND ACELLULAR PERTUSSIS VACCINE + HEPATITIS B + INACTIVATED POLIOVIRUS VACCINE	RESPIRATORY ARREST	15	2.063	1.939	1.668	0.116
RV5	ROTAVIRUS VACCINE, LIVE, ORAL, PENTAVALENT	RESUSCITATION	35	2.074	1.853	1.551	0.206
HEPA	HEPATITIS A	INTENSIVE CARE	4	2.074	1.972	1.870	0.095
HIBV	HAEMOPHILUS B CONJUGATE VACCINE	PALLOR	3	2.077	2.055	1.703	0.020
HIBV	HAEMOPHILUS B CONJUGATE VACCINE	DEHYDRATION	3	2.077	2.055	1.703	0.020
HIBV	HAEMOPHILUS B CONJUGATE VACCINE	RHINORRHOEA	3	2.077	2.055	1.703	0.020
PPV	PNEUMOCOCCAL VACCINE, POLYVALENT	INTENSIVE CARE	3	2.082	1.977	1.900	0.097
HIBV	HAEMOPHILUS B CONJUGATE VACCINE	APNOEA	4	2.084	2.055	1.703	0.027

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

