1. Introduction
Bioequivalence (BE) between two drug products implies that both, by showing a sufficiently similar rate and extent of absorption, should present comparable in vivo performance in terms of safety and efficacy. BE studies are usually conducted in healthy subjects, with the plasma concentration-time curve used to assess the rate and extent of absorption [1]. The assessment of BE is based on the 90% confidence interval (CI) for the test-to-reference population geometric mean ratio (GMR) of the parameters under consideration, using an average bioequivalence (ABE) approach. These parameters are usually the area under the concentration-time curve (AUC), which reflects the extent of exposure, and the maximum plasma concentration (Cmax), or peak exposure, which is influenced by the absorption rate [2].
BE is usually concluded if the 90% CI for the GMR falls within the regulatory acceptance limits. A ±20% difference between the test and reference Cmax and AUC is generally considered not to result in clinically relevant differences. On this basis, the limits are fixed and symmetrical on a logarithmic scale, giving an acceptance range of 80.00–125.00% on the original scale [3]. However, for a narrow therapeutic index (NTI) drug, where a small difference in the administered dose may result in either serious therapeutic failure or adverse drug reactions [4], a more conservative approach is usually followed by reducing the acceptable difference to ±10% [5]. In these situations, a tighter acceptance interval of 90.00–111.11% (hereafter referred to as ‘tighter limits’) is currently applied arbitrarily by the European Medicines Agency (EMA) [6], as well as by several other regulatory agencies [7,8], which makes BE more difficult to demonstrate because of the larger sample sizes required. Narrowing the acceptance range beyond ±10% is considered unnecessary because the content of the reference product can vary between 95% and 105% in the European Union, so that batches of the reference product may themselves differ by up to 10%, although it is acknowledged that differences in content and differences in bioavailability are additive.
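The symmetry of both acceptance ranges on the logarithmic scale can be checked with a short calculation. The block below uses only the limits quoted above; the symbols \theta_L and \theta_U for the lower and upper limits are introduced here purely for illustration.

```latex
\begin{align*}
\theta_U &= 1/\theta_L \quad\Longleftrightarrow\quad \ln\theta_U = -\ln\theta_L,\\
\text{conventional limits:}\quad \theta_L &= 0.8000,\quad \theta_U = 1/0.8000 = 1.2500,\quad \ln\theta_U = -\ln\theta_L \approx 0.2231,\\
\text{tighter limits:}\quad \theta_L &= 0.9000,\quad \theta_U = 1/0.9000 \approx 1.1111,\quad \ln\theta_U = -\ln\theta_L \approx 0.1054.
\end{align*}
```

This is why a lower limit of 80.00% pairs with an upper limit of 125.00%, and 90.00% with 111.11%: the ranges are symmetrical around a ratio of 1 only after log transformation.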
Recently, an alternative acceptance criterion has been proposed for products containing NTI drugs that would lower the burden in terms of the number of subjects required to show BE [9]. This approach, which is intended as a voluntary alternative to the current European NTI acceptance criteria, consists of an ABE with narrowed limits based on the intra-subject (or within-subject) variability of the reference product (NLIVR), similar to the approach used for widening the acceptance range of Cmax in the case of highly variable drug products (HVDP) [6]. If the applicant decides to make use of this approach, the following five conditions have been proposed:
- (1) The within-subject standard deviation (sWR) is calculated from the reference formulation in the same replicate cross-over study in which the acceptance range is to be narrowed;
- (2) If the estimated reference within-subject coefficient of variation (WSCV) does not exceed 13.93% (corresponding to sWR ≤ 0.1386), the 90.00–111.11% acceptance range is applied;
- (3) If the estimated WSCV exceeds 30% (corresponding to sWR ≥ 0.29356), the 80.00–125.00% acceptance range is applied;
- (4) If the estimated WSCV lies between 13.93% and 30%, the acceptance range is defined by [U, L] = exp[±k·sWR];
- (5) The regulatory “proportionality” constant k is set to 0.760, as for HVDP.
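The five conditions above can be sketched as a small function. This is a minimal illustration, not the authors' implementation: the function and constant names are hypothetical, and the conversion between sWR and WSCV assumes the usual log-normal relation CV = sqrt(exp(sWR²) − 1), which reproduces the cut-off values quoted in conditions (2) and (3).

```python
import math
from typing import Tuple

K = 0.760            # regulatory constant, condition (5)
S_WR_LOW = 0.1386    # sWR corresponding to a WSCV of 13.93%, condition (2)
S_WR_HIGH = 0.29356  # sWR corresponding to a WSCV of 30%, condition (3)


def wscv_from_swr(s_wr: float) -> float:
    """Within-subject CV from the log-scale SD (assumed log-normal relation)."""
    return math.sqrt(math.exp(s_wr ** 2) - 1.0)


def nlivr_limits(s_wr: float) -> Tuple[float, float]:
    """Return (lower, upper) acceptance limits as ratios for an estimated sWR."""
    if s_wr <= S_WR_LOW:
        # Condition (2): fixed tighter limits, 90.00-111.11%
        return 0.90, 1.0 / 0.90
    if s_wr >= S_WR_HIGH:
        # Condition (3): conventional limits, 80.00-125.00%
        return 0.80, 1.25
    # Condition (4): limits scaled by the reference variability
    upper = math.exp(K * s_wr)
    return 1.0 / upper, upper


if __name__ == "__main__":
    for s in (0.10, 0.20, 0.35):
        lo, hi = nlivr_limits(s)
        print(f"sWR={s:.3f}  WSCV={wscv_from_swr(s):.2%}  limits={lo:.4f}-{hi:.4f}")
```

With k = 0.760 the scaled limits meet the fixed limits at both cut-offs (exp(0.760 × 0.1386) ≈ 1.1111 and exp(0.760 × 0.29356) ≈ 1.25), so the acceptance range changes continuously with sWR.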
A graphical representation of the proposed strategy is shown in Figure 1. In theory, because the acceptable differences between test and reference are similar to the differences that may exist within the reference product itself, this approach is not expected to increase the clinical risk. In fact, some NTI drugs were approved in the past with a 20% acceptance range in the European Union (e.g., carbamazepine, levothyroxine), and some drugs currently considered NTI drugs by the US FDA are not considered NTI drugs in the European Union (e.g., dabigatran, flecainide). Additionally, the advantages of the method in terms of reducing the number of subjects have been discussed previously [9]. However, as noted for the HVDP ABE with expanding limits criterion [10], the two one-sided tests (TOST) procedure cannot be directly applied to the proposed NLIVR method, since the BE limits themselves become random variables and the method is not correct in the strict sense. In this work, the authors therefore explore the performance of the procedures to be applied for the determination of BE for products containing NTI drugs using the NLIVR criteria, by means of power curves simulated under various assumptions, conditions and sample sizes. Since, theoretically, the NLIVR criteria will allow products with larger differences to be considered bioequivalent, an additional requirement (a further condition 6) was also tested, namely a constraint that the GMR be contained in the 90.00–111.11% acceptance range.
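The combined criterion, with and without the additional condition 6, can be illustrated as a simple decision rule. The sketch below is hypothetical: the function names are invented for illustration, the 90% CI and GMR are assumed to be already estimated and supplied as ratios, and treating the limits as inclusive without regulatory rounding is an assumption.

```python
import math
from typing import Tuple


def nlivr_limits(s_wr: float, k: float = 0.760) -> Tuple[float, float]:
    """Acceptance limits (ratios) from the estimated reference sWR."""
    if s_wr <= 0.1386:       # WSCV <= 13.93%: tighter limits
        return 0.90, 1.0 / 0.90
    if s_wr >= 0.29356:      # WSCV >= 30%: conventional limits
        return 0.80, 1.25
    upper = math.exp(k * s_wr)
    return 1.0 / upper, upper


def conclude_be(ci_lower: float, ci_upper: float, gmr: float,
                s_wr: float, gmr_constraint: bool = False) -> bool:
    """True if BE would be concluded under the NLIVR rule.

    ci_lower, ci_upper: bounds of the 90% CI for the GMR (as ratios).
    gmr_constraint: if True, also require the point estimate to lie
    within 90.00-111.11% (the additional condition 6).
    """
    lower, upper = nlivr_limits(s_wr)
    ci_ok = lower <= ci_lower and ci_upper <= upper
    if not gmr_constraint:
        return ci_ok
    gmr_ok = 0.90 <= gmr <= 1.0 / 0.90
    return ci_ok and gmr_ok


if __name__ == "__main__":
    # Illustrative numbers only, not study data
    args = dict(ci_lower=1.05, ci_upper=1.19, gmr=1.12, s_wr=0.25)
    print(conclude_be(**args))                       # without condition 6
    print(conclude_be(**args, gmr_constraint=True))  # with condition 6
```

In this illustrative case the CI fits inside the scaled limits (about 0.827 to 1.209 for sWR = 0.25), so BE would be concluded without the constraint, but the point estimate of 1.12 lies outside 90.00–111.11%, so the constrained rule would reject it.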
4. Discussion
We have recently proposed an alternative criterion for assessing BE between two products containing an NTI drug in the European Union, consisting of narrowed limits based on the within-subject variability of the reference product [9], to be used voluntarily by the sponsors if the intra-subject CV is expected to be lower than 13.9%. One of the problems with NTI drugs, especially if they present a low WSCV, is the risk of “generic drift”: an oversized study, with an unnecessarily high number of subjects, could allow the approval of a drug product close to the extremes of the acceptance interval. For this reason, the approach taken by the EMA (and other regulatory agencies) was to reduce the acceptable difference to 10% and tighten the acceptance interval to 90.00–111.11%. This resulted in generic products that are not only safe and efficacious, but can also be considered bioequivalent to each other [5]. However, the definition of NTI drugs in Europe is made on a case-by-case basis and relies on clinical considerations [6]. Although it is taken into consideration, a very low within-subject variability is not a mandatory condition for a drug to be regarded as NTI, and examples exist in this regard [9]. For such drug products, higher variability may make it very difficult to conclude BE unless a very large number of subjects is recruited. By considering acceptance limits that are narrowed based on the intra-subject CV of the reference product, instead of the currently used tighter acceptance interval of 90.00–111.11%, this burden is substantially reduced. On the other hand, showing BE when the products differ by more than 10%, a difference expected to be clinically irrelevant, becomes possible if the reference product’s WSCV is moderate to high.
The general performance of the proposed regulatory criterion can be seen in Figure 2, either with (right side) or without (left side) the additional GMR constraint, under which the GMR has to lie inside the 90.00–111.11% acceptance limits. On the one hand, as theoretically expected, these plots show that the probability of concluding BE between products is always very low when the difference between them is higher than the reference product’s within-subject variability. On the other hand, they also show that, as variability increases, it is possible to approve products with theoretical differences equal to or higher than 10%. For example, if a BE trial with 90 subjects is performed with products differing by 12.5% (Figure 2C), the probability of concluding BE is around 70% if the reference product presents a within-subject variability higher than 30%. Based on this, it could be argued that a generic product containing an NTI drug could still show BE in spite of its different bioavailability/lower biopharmaceutical quality just by increasing the number of subjects to an acceptable value. However, since the reference product itself shows high variability (>30%), and patients are typically exposed to notable exposure differences between administrations without clinical relevance, a difference of more than ±10% between the test and reference products should not be clinically relevant. In fact, the differences in bioavailability experienced by patients under chronic administration are the sum of the differences between units of the same batch, the differences between batches of the same product, the differences due to intra-subject CV and the difference in bioavailability between formulations. The last of these only comes into play when switching between reference and generic products. For this reason, we may contrast the variability of within-patient exposure with a change in the population mean exposure, since both affect the exposure differences that are observed in a given patient.
If the variability is low, the difference has to be low for the distributions of the test and the reference to overlap sufficiently. Consequently, the comparison of differences in ABE can be improved if the differences are assessed under standardization, i.e., when the differences are scaled by the observed variability [16]. However, this approach could nevertheless result in general public concern and loss of confidence in generics [17], because higher theoretical differences between drug products may, in fact, be observed. The same concern was raised when the widening of the acceptance range was established for assessing BE of highly variable drug products, fostering the inclusion of an additional point estimate constraint mainly for “political” reasons [18]. In this line of reasoning, for HVDP under the EMA guideline, the GMR for Cmax must be inside the 80.00–125.00% limits in order to conclude BE within the widened acceptance limits [6]. Additionally, although in a different context, under the Canadian guideline, BE for Cmax of “typical” drugs is concluded if the T/R GMR (as a point estimate) for this parameter is inside the 80.00–125.00% limits [7].
As shown in Figure 2, the inclusion of this constraint changes the approval surface profile significantly. Being a point estimate-based criterion, this additional condition limits the probability of accepting BE between products differing by 10% to a maximum of around 50% (Figure 2B), whereas it was more than 95% without the point estimate constraint (Figure 2A) for higher variabilities and sample sizes. For greater differences between the products, it limits the probability of concluding BE even for higher variabilities and, most importantly, an increase in sample size seems to reduce the probability of showing BE (Figure 2D,F,H). This effect is somewhat expected because, as the differences increase, the point estimate constraint progressively becomes predominant in the overall combined regulatory criterion, as previously demonstrated [19].
An increase in the type I error (T1E) associated with BE acceptance limits that are scaled to the size of the within-subject variability has been expected since the earliest presentation of these methods [20]. It arises from the fact that the two one-sided tests (TOST) procedure cannot be directly applied to the proposed method, since the BE limits themselves become random variables and the method is not correct in the strict sense [3]. This inflation was later seen to be influenced by many factors, such as the value of the regulatory constant, the cut-off point between the unscaled and scaled ABE, and the existence of continuity at the cut-off point. Depending on the balance between these factors, T1E inflations of up to 16% have been described [13,19]. In the present case, as shown in Figure 3A, the T1E reaches a maximum of 7% around a WSCV of 13.93% for sample sizes of more than 40 subjects. This is of the same order of magnitude as already described for the EMA widening approach for HVDP [14]. It can also be seen that the region of T1E inflation extends from a WSCV of 12% to 27%, relatively independently of sample size above 40 subjects. For sample sizes below 40, this T1E inflation is not seen, possibly because the demonstration of BE is always very difficult with so few subjects and the power obtained is intrinsically low. In most of the inflation area, T1Es were less than 6%. The inclusion of the point estimate constraint did not reduce the maximum T1E inflation around the WSCV of 13.93% (Figure 3B). This is similar to the conclusions reached by Endrenyi et al. [19], where no effect of the point estimate constraint was observed on consumer risk in the vicinity of the cut-off point. However, the inclusion of this constraint significantly reduces the area of T1E inflation, confining it to WSCVs between 12% and 18%, especially for higher sample sizes. The inflation of the T1E has obvious consequences for the quality of the statistical methods employed, and several approaches have been put forward to solve this issue [14,21]. However, as for the EMA criteria for HVDP, it is fair to say that in practice the consumer risk is close to 5% and, since the proposed regulatory conditions are continuous around a WSCV of 13.93%, the probabilities of acceptance and rejection are only slightly different on the two sides of the cut-off WSCV [19].
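The way a T1E can be estimated by simulation when the limits depend on the estimated sWR can be sketched as follows. This is a deliberately simplified Monte Carlo illustration and not the simulation procedure used in this work: it assumes a balanced full-replicate design, equal within-subject variances for test and reference, no subject-by-formulation interaction, n − 2 degrees of freedom for both variance estimates, and an approximate t-quantile. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

K, S_LOW, S_HIGH = 0.760, 0.1386, 0.29356
LOG_TIGHT = math.log(1.0 / 0.90)   # log of the 111.11% limit


def t_quantile_95(df: int) -> float:
    """Approximate 95th percentile of Student's t (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(0.95)
    return (z + (z ** 3 + z) / (4 * df)
            + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * df ** 2))


def log_limit(s_wr: float) -> float:
    """Upper NLIVR limit on the log scale for a given sWR."""
    if s_wr <= S_LOW:
        return LOG_TIGHT
    if s_wr >= S_HIGH:
        return math.log(1.25)
    return K * s_wr


def pass_rate(true_gmr: float, sigma_wr: float, n: int, n_sim: int = 20000,
              gmr_constraint: bool = False, seed: int = 1) -> float:
    """Fraction of simulated studies in which BE is concluded."""
    rng = random.Random(seed)
    df = n - 2
    t95 = t_quantile_95(df)
    delta = math.log(true_gmr)
    passed = 0
    for _ in range(n_sim):
        # Variance estimates as sigma^2 * chi2(df) / df; a chi-square
        # variate is drawn as Gamma(shape=df/2, scale=2).
        s2_wr = sigma_wr ** 2 * rng.gammavariate(df / 2, 2) / df
        s2_d = sigma_wr ** 2 * rng.gammavariate(df / 2, 2) / df
        # Estimated log-GMR; under the stated assumptions the per-subject
        # T-R difference has variance sigma_wr^2.
        est = rng.gauss(delta, sigma_wr / math.sqrt(n))
        half = t95 * math.sqrt(s2_d / n)
        lim = log_limit(math.sqrt(s2_wr))   # limits from the ESTIMATED sWR
        ok = (-lim <= est - half) and (est + half <= lim)
        if gmr_constraint:
            ok = ok and abs(est) <= LOG_TIGHT
        passed += ok
    return passed / n_sim


if __name__ == "__main__":
    sigma = 0.1386                              # true sWR at the cut-off
    gmr_at_limit = math.exp(log_limit(sigma))   # true GMR on the true limit
    print("T1E estimate:", pass_rate(gmr_at_limit, sigma, n=48))
```

Here the T1E is taken as the pass rate when the true GMR sits exactly on the limit implied by the true sWR; because the limit applied in each simulated study comes from the estimated sWR, the rejection region varies from study to study, which is the mechanism behind the inflation discussed above. Setting `gmr_constraint=True` applies the additional point estimate condition to the same simulated studies.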
We have previously shown that the proposed NLIVR criterion, when compared to the current EMA NTI acceptance criterion, greatly reduces the sample sizes required to demonstrate BE for a WSCV above 15% when the two products are equal [9]. This sample size reduction is also observed, as expected, when small acceptable differences in the GMR (from 2.5% to 7.5%) are considered, which is a plausible starting assumption for applicants when determining the sample sizes required for BE clinical trials. Figure 4 shows that the sample sizes required for 80% power in a BE study with a WSCV above 15% are, again, greatly reduced when comparing the NLIVR to the current EMA NTI criterion in all the scenarios considered. This is, in fact, one of the major advantages expected for this type of “fixed multiple-of-CV” method [20]. In addition, the simulations show that the inclusion of the GMR constraint does not significantly change the sample size requirement if the expected differences are low. Figure 4 and Table 1 show that if the differences are lower than 5%, the sample size required (for either 80% or 90% power) is essentially the same with or without the GMR constraint. This picture, however, changes dramatically if the expected difference between products increases above 5%. For the simulated difference of 7.5% and a WSCV above 20%, the inclusion of the GMR constraint greatly increases the sample size required to have at least 80% power. This is understandable since, at large variabilities, the probability of observing a very high (or very low) GMR is higher, as already described [3]. In addition, the inclusion of this constraint should also discourage artificially increasing the clinical assay variability in order to conclude BE because, as can be seen in Figure 2, it reduces the power to conclude BE in a non-linear way in those cases.
It is acknowledged that the proposed method does not include a comparison of the intra-subject CV of the test and reference products, since this has never been required in the European Union. For this paradigm to change, evidence that products may exhibit a >2-fold difference in intra-subject variability would be essential. Such a difference seems rather unlikely, except in exceptional circumstances with notable differences in manufacturing technology, because intra-subject variability is mostly due to the bioanalytical method, especially if the CV is low, and to the physiological processes involved in absorption and the first-pass effect, which affect both products similarly. Therefore, imposing a comparison of intra-subject CV within a 2-fold acceptance limit was considered unnecessary and an increased burden for something that is not presently considered a clinical concern [22]. Furthermore, large differences in manufacturing technology are not expected between generics and the innovator product, since generics tend to copy the reference product as closely as possible to have greater success in ABE (e.g., solid dispersions for tacrolimus and everolimus, but simple manufacturing for BCS class I drugs like warfarin). In addition, Tothfalusi and Endrenyi further noted that “the additional regulatory criterion modifies the statistical features of the primary criterion that of comparing the means” [22].