Article

Advances in Clinical Trial Design: Employing Adaptive Multiple Testing and Neyman Allocation for Unequal Samples

1
Department of Mathematics and Statistics, Jordan University of Science and Technology, Irbid 22110, Jordan
2
Department of Mathematical Sciences, New Mexico State University, Las Cruces, NM 88001, USA
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(8), 1273; https://doi.org/10.3390/math13081273
Submission received: 8 February 2025 / Revised: 6 April 2025 / Accepted: 9 April 2025 / Published: 12 April 2025
(This article belongs to the Special Issue Data Modeling and Analysis in Epidemiology and Biostatistics)

Abstract

This study introduces a new method that combines three distinct approaches for comparing two treatments: Neyman allocation, the O’Brien and Fleming multiple testing procedure, and a system of different sample weights at different stages. The new approach is called the Neyman Weighted Multiple Testing Procedure (NWMP). Each of these adaptive designs has individually proven beneficial for clinical research by removing constraints that can limit clinical trials; in this work, their advantages are merged into a single, innovative approach with increased efficiency. The multiple testing procedure allows trials to be stopped before their planned end when one treatment proves more effective. Neyman allocation is a statistically sound method designed to enhance the efficiency and precision of estimates: it strategically allocates sample sizes to maximize the quality of statistical inference under practical constraints. Additionally, using different weights provides greater flexibility, allowing sample sizes to be distributed effectively across the stages of the study. This study demonstrates that the new method maintains efficiency similar to that of the O’Brien and Fleming test in terms of the Type I error rate and statistical power while offering additional flexibility. Furthermore, the paper includes both real and hypothetical examples to illustrate the developed procedure.

1. Introduction

Adaptive designs have significant advantages in clinical research, as they address limitations that have hindered the application of a static pre-established protocol. Interim analyses (sequential testing procedures), especially, enable the early termination of trials when one treatment demonstrates clear superiority over the other. These interim analyses are used to evaluate the early evidence of an intervention’s effectiveness and safety, which helps researchers make informed decisions about whether to continue the trial or adjust the study design [1,2,3,4]. Interim analyses can be planned or unplanned, and they can be comparative or noncomparative. A significant challenge with comparative interim analyses is the increased risk of committing a Type I error from repeated testing. A Type I error is the incorrect rejection of a true null hypothesis, falsely indicating a treatment effect. The Type I error rate, known as alpha (α), is predetermined and typically set at 0.05 to control this risk. This setting ensures that the likelihood of mistakenly declaring a treatment effect significant does not exceed 5%. Robust statistical techniques, predetermined stopping rules, and transparent decision-making rules and regulations should be applied to address this challenge. Group sequential models incorporate a strategy of predefined comparative interim analyses, including guidelines that may terminate a trial based on significant or insignificant results. A larger sample size is usually needed with a trial design that includes stops for interim analyses. In addition, there are many ways to define the stopping boundaries in terms of α choices, and many of the critical values adopted in clinical trials have been suggested [5,6,7].
Interim analyses have been applied successfully in many trials. For instance, numerous researchers have utilized the critical values proposed by O’Brien and Fleming to determine the significance of their studies [8]. For example, Hammond et al. (2022) [9] demonstrated that the combination of nirmatrelvir and ritonavir, when given to COVID-19 patients at the onset of the illness, significantly reduces the likelihood of the patients developing severe symptoms. This treatment has also been shown to rapidly decrease the SARS-CoV-2 viral load in patients. These essential findings emphasize the importance of taking proactive measures and represent a positive step forward in the battle against COVID-19. Likewise, Goldberg et al. (2004) [10] conducted a study on chemotherapy for metastatic colorectal cancer using three different treatments, applying the critical values from O’Brien and Fleming. The research established that while there were no disparities in the overall survival of patients across the three treatments, two of the treatments, FOLFOX and FOLFIRINOX, had better response rates and longer progression-free survival (PFS) than the third treatment, FOLFIRI. In addition, Marcus et al. (2017) [11] employed O’Brien and Fleming’s methodology in their study and decided to conclude the Gallium trials prematurely for treatment-naive follicular lymphoma patients. These studies indicate that the group sequential testing procedure developed by O’Brien and Fleming, when applied to various types of clinical trials, demonstrates the method’s value and usefulness in diverse circumstances. It facilitates more precise and timely decisions regarding experiments and evaluations, which can save patients from receiving ineffective and redundant treatments.
O’Brien and Fleming used an approximate distribution and resorted to simulation to demonstrate that a fixed one-stage chi-square test shares the same Type I error rate and power as their method, a strength that has encouraged researchers to follow and modify their approach. The O’Brien and Fleming group sequential testing procedure has undergone significant refinements to address practical limitations and expand its applicability. Kung-Jong Lui (1993, 1994) extended the procedure by developing methods to incorporate intraclass correlation. This allows the model to be used in cluster-randomized trials, which often present unique challenges due to correlation within clusters. Lui’s work improved the flexibility and accuracy of the procedure in settings where clusters or groups of participants are allocated together [12,13]. Tang et al. (1989) [14] extended the O’Brien and Fleming method to accommodate multiple correlated endpoints. Their work demonstrated how critical boundaries could be adapted to account for correlations among endpoints, making the procedure more applicable to complex trial designs where multiple outcomes are evaluated simultaneously. Additionally, Weigl et al. (2020) [15] explored the use of the O’Brien and Fleming approach in the context of longitudinal studies, where data are collected repeatedly for each participant. The continued refinement of the O’Brien and Fleming procedure has expanded its utility across a variety of research domains. The ability to incorporate intraclass correlations, handle longitudinal data, and account for multiple endpoints has enhanced the procedure’s relevance and applicability in both clinical and behavioral research settings.
Lan and DeMets introduced an alpha-spending approach that generalizes the O’Brien and Fleming and Pocock procedures. This method allows for flexibility in the timing of interim analyses, enabling researchers to conduct analyses at non-predefined intervals while maintaining overall control of the Type I error rate. The alpha-spending function dynamically adjusts critical values based on the number and timing of interim analyses, ensuring statistical rigor and practicality in real-world scenarios.
Moreover, the O’Brien and Fleming procedure has been compared to other group sequential designs, such as the Pocock and Haybittle–Peto methods. These comparisons have shown that the O’Brien–Fleming approach is generally more conservative, with a lower risk of early termination and higher power at later trial stages [16]. In 2013, Hammouri addressed a crucial issue regarding the stopping bounds of the O’Brien and Fleming procedure. Specifically, Hammouri noted an inconsistency in the critical values of the O’Brien and Fleming multiple testing methods, namely their non-monotonic behavior. This challenge was effectively resolved by conducting additional simulations to generate critical values with a monotonic pattern. Adhering to this monotonic pattern improves control of the Type I error rate.
Additionally, Hammouri focused on further changes to the O’Brien and Fleming procedure by applying three distinct implementations, which helped make the method more flexible and adaptable. Since each implementation was performed individually, there were three different procedures. In this sense, optimal, Neyman, and weighted allocations (the weights decrease sequentially) are three implementations with extra advantages within clinical trial design [17]. Then, the original O’Brien and Fleming procedure was modified by combining it with the weighted samples and optimal random allocation simultaneously; the optimal allocation aims to maximize resource utilization efficiency by assigning a larger proportion of subjects to the superior treatment. On the other hand, the implementation of weighted allocation allows for applying different sample weights in various trial phases. This allocation procedure assumes that only certain phases or time points in clinical trials contribute to additional data gathering or provide more informative data. Hence, assigning higher weights to the samples collected during such critical phases enables investigators to plan optimal resource deployment and focus on obtaining well-organized data at the most sensitive phase of the trial [18]. The last integration used the Urn allocation method with the O’Brien and Fleming multiple testing procedure. This innovative approach offers a refined framework combining the dynamic allocation properties of the Urn method with the rigorous control of Type I error provided by the O’Brien and Fleming procedure [19].
A significant benefit of these implementations is that they allow trials to end early when one treatment demonstrates superiority. This capability not only boosts the ethical dimensions of clinical trials but also minimizes patient exposure to ineffective treatments while enhancing resource efficiency. Importantly, these new approaches maintain the statistical power of the original multiple testing procedure, ensuring robust and reliable results. The efficacy and robustness of all these methods have been demonstrated using experimental results and simulated case studies. These studies highlight their broad applicability across diverse clinical scenarios, making them versatile tools for modern clinical research. The methods’ ability to adapt to varying trial conditions while preserving the integrity of statistical outcomes underscores their potential to become standard in adaptive clinical trial designs. Previous work sets a strong foundation for future research and practical applications in optimizing treatment comparisons in clinical settings.
On the other hand, randomization techniques are critical in clinical trials to ensure unbiased treatment allocation and comparability between groups. Various methods serve unique purposes in enhancing the integrity of trial results. Simple randomization provides a straightforward approach where each participant has an equal chance of being assigned to any treatment group, minimizing selection bias. In contrast, block randomization helps maintain balance across treatment groups by dividing participants into blocks and randomizing within those blocks, ensuring that each group is evenly represented throughout the trial. Stratified randomization involves categorizing participants based on specific characteristics, such as age or gender, before randomization, which ensures that these characteristics are evenly distributed across treatment groups. Additionally, response-adaptive randomization adjusts treatment probabilities based on participant responses, allowing for more efficient resource allocation by favoring treatments with better outcomes [20]. The foundational work of researchers like Bai et al. (2002) [21] provides insights into the asymptotic properties of adaptive designs, highlighting their relevance in trials with delayed responses. Furthermore, Rosenberger and Lachin (2016) [22] explored the theoretical and practical applications of randomization in clinical trials after Rosenberger and Sverdlov (2008) [23] examined how to handle covariates effectively in trial design. Atkinson et al. (2023) [24] discussed innovative methodologies in randomization, emphasizing their importance in modern clinical trials.
Over the years, researchers have implemented various adaptation procedures to reduce bias in clinical trials, including random allocation or randomization. Random allocation is a strategy designed to unbiasedly assign participants to different treatment groups, which enhances statistical power and ensures that any observed differences in outcomes between treatments can be attributed to the specific interventions being evaluated rather than confounding factors. Random allocations can eliminate biases from various sources, including participant behavior, investigator preferences, and other external factors. All of these can alter results and undermine the validity of the treatment under analysis [25,26,27,28].
One example of random allocation is optimal allocation, which leverages success rates observed in earlier stages of interim analyses. This method dynamically adjusts allocation probabilities to maximize the expected number of successes or minimize failures, making it a commonly used approach in clinical trials [29]. Another notable method is Neyman allocation, a widely recognized approach in sample allocation. This method considers group variances and sampling costs to achieve optimal precision in estimating population means within a fixed total sample size. By accounting for intra-group variation and associated costs, Neyman allocation ensures efficient resource distribution and enhances the statistical accuracy of clinical trials [30].
Randomization based on Neyman allocation is a sophisticated method designed to enhance the efficiency and balance of treatment assignments in clinical trials. This approach aims to minimize the variance of treatment effect estimates by allocating participants proportionately to the expected treatment effects. Specifically, the Neyman allocation involves assigning more participants to treatments that are anticipated to be more effective, thereby optimizing the trial’s statistical power. One study discusses the theoretical foundations of Neyman allocation and its application in real-world scenarios, highlighting its advantages over more straightforward randomization methods. Another analysis provides a comprehensive look at Neyman allocation’s practical applications, emphasizing its role in improving the precision of treatment effect estimates while maintaining ethical considerations in participant allocation. Further research explores the implications of this method in adaptive trial designs, demonstrating how it can lead to more informed decision making as data accumulate. Together, these studies underscore the significance of Neyman allocation in advancing clinical trial methodologies [31,32,33].
The Neyman allocation method continues to demonstrate its value across various domains, from clinical trials to market surveys and political representation. For instance, Sverdlov and Rosenberger (2013) emphasize the importance of evaluating the performance of competing allocation rules under varying experimental scenarios—such as low, medium, and high treatment success probabilities—to optimize patient assignment in clinical trials [34]. Their work highlights how adaptive designs incorporating Neyman allocation can lead to more efficient and ethical trials. In the context of market research, Olayiwola et al. (2013) compared three different allocation procedures for estimating both the average and the variance of Peak Milk (Nigerian made) prices in local markets. Their findings identified Neyman (optimum) allocation as the most effective method, offering superior efficiency compared to the alternatives [35]. This reinforces the practical advantages of Neyman allocation in reducing variability within stratified sampling frameworks. Extending this principle beyond empirical studies, Wright (2012) demonstrates that Neyman’s optimal allocation is equivalent to the method of equal proportions used in apportioning seats in the U.S. House of Representatives—bridging classical sampling theory with political representation [36]. Taken together, these studies demonstrate that the Neyman allocation procedure is not only statistically optimal for reducing variance but also versatile and effective across diverse real-world applications.
Integrating ethical considerations in clinical trials is essential for ensuring participant protection and maintaining research integrity, especially when utilizing Neyman allocation. This allocation method, which optimizes statistical efficiency by assigning more participants to treatments expected to be more effective, must be implemented within a strong ethical framework. Antognini and Giovagnoli (2010) [33] underscore the importance of addressing the treatment allocation problem when employing Neyman allocation. They proposed two steps: first identifying an appropriate target allocation and then applying a sequential approach to implement it. It is crucial to continuously monitor any ethical concerns, such as disparities in treatment assignments, as noted by May and Flournoy (2009) [37]. Furthermore, equity in participant selection is crucial; researchers should stratify participants based on relevant characteristics to prevent bias, as highlighted by Duarte et al. (2024a) [38]. Ethics committees should be involved in designing and overseeing trials using Neyman allocation to assess its ethical implications, ensuring alignment with ethical guidelines. Ultimately, by embedding Neyman allocation within a comprehensive ethical framework, researchers can achieve a balance between statistical consistency and participant well-being and develop a responsible research environment that prioritizes both effective treatment evaluation and ethical integrity, as discussed by Duarte et al. (2024b) and Metelkina et al. (2017) [39,40].
Rosenberger (1993) [41] further supported the application of adaptive designs, illustrating their potential to enhance statistical power and ethical considerations by ensuring that more participants receive potentially beneficial treatments. Hu et al. (2006) [42] emphasized the advantages of adaptive designs in clinical trials, highlighting their ability to respond to accumulating data and make informed decisions about treatment efficacy and safety. The use of adaptive designs for Neyman allocation enhances the flexibility and efficiency of clinical trials by allowing modifications based on interim results. Duarte and Atkinson (2024b) [39] discussed how adaptive designs can optimize treatment assignments in personalized medicine, particularly when response variance is treatment dependent. This approach allows researchers to adjust the allocation of participants dynamically, improving the likelihood of identifying effective treatments. Together, these works underscore the importance of integrating adaptive designs with Neyman allocation to improve trial outcomes while maintaining ethical standards and participant welfare.
This paper presents the implementation of Neyman allocation with distinct weights assigned to sample sizes within the O’Brien and Fleming testing outline. A novel methodological approach is introduced, named the Neyman Weighted Multiple Testing Procedure (NWMP). This approach provides a robust framework for treatment comparison and enhances the efficiency of study design in detecting treatment differences. An evaluation of the Type I error rate and statistical power is conducted using Monte Carlo simulations. Monte Carlo simulations are used because they offer a flexible and competent way to investigate the Type I error and the power of statistical procedures. While a theoretical approach can offer more rigorous and generalizable insights, it often requires significant effort to be executed. In contrast, Monte Carlo simulations allow researchers to quickly assess the performance of various testing methods under different scenarios. The use of Monte Carlo simulations is a valuable complement to theoretical analysis, as they can provide insights that are more readily applicable to real-world situations. Furthermore, Monte Carlo simulations can be easily adapted to integrate a wide range of factors, such as different sample sizes, effect sizes, and distributions, enabling a more comprehensive evaluation of a procedure’s properties [25,43,44,45,46,47,48]. Moreover, this study includes practical examples to illustrate the effective application of the proposed procedure in realistic scenarios. The NWMP enhances the quality of statistical inference under practical constraints by carefully managing sample allocation and weighting during the trial. The NWMP preserves statistical power and Type I error rates similar to the well-known O’Brien and Fleming test, while adding greater flexibility to accommodate different stages of data collection and varying treatment effect scales. Flexibility stands out as the main benefit.

2. Methodologies

In the Methodologies section, the foundational methods for clinical trials are described: the O’Brien and Fleming procedure and the Neyman allocation method. The new approach, NWMP, is also presented, integrating diverse sample weights with the Neyman allocation technique to improve the precision of treatment effect evaluations while controlling the overall Type I error rate. Finally, the methods used to evaluate Type I error and statistical power are described.

2.1. The Original O’Brien and Fleming Procedure

The data are reviewed and tested periodically, with $n_1$ and $n_2$ subjects receiving treatment 1 and treatment 2, respectively, at each of the $K$ stages ($K$ is the number of interim analyses planned during the trial). The total sample size is $N = K(n_1 + n_2)$, the maximum number of subjects utilized in the trial. After $\alpha$ is determined and the O’Brien and Fleming critical value $P(K, \alpha)$ is chosen, the usual Pearson chi-square statistic $\chi^2_{(i)}$ is calculated at each stage $i$.
For stage $i$, if $(i/K)\,\chi^2_{(i)} \geq P(K, \alpha)$, the study is terminated and the null hypothesis is rejected. Otherwise, if $(i/K)\,\chi^2_{(i)} < P(K, \alpha)$, the next group of subjects is randomized, their measurements are taken, and the process is repeated at the subsequent stage. If, after $K$ stages, $(K/K)\,\chi^2_{(K)} = \chi^2_{(K)} < P(K, \alpha)$, the study is terminated and the conclusion is that the null hypothesis cannot be rejected at significance level $\alpha$.
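As a concrete illustration, the stopping rule above can be sketched in a few lines (a minimal Python sketch, not the study’s code, which was written in SAS; the per-stage 2×2 counts and the critical value passed in are hypothetical inputs, and the actual critical values P(K, α) are those tabulated in Table 1):

```python
def obf_sequential_test(stage_tables, K, crit):
    """Sketch of the O'Brien-Fleming group sequential rule.

    stage_tables : list of per-stage counts (s1, f1, s2, f2),
                   i.e., successes/failures under each treatment.
    K            : planned number of stages.
    crit         : critical value P(K, alpha) (illustrative here).
    Returns (stage_stopped, rejected), pooling counts across stages
    before each chi-square evaluation.
    """
    s1 = f1 = s2 = f2 = 0
    for i, (a, b, c, d) in enumerate(stage_tables, start=1):
        s1, f1, s2, f2 = s1 + a, f1 + b, s2 + c, f2 + d
        n1, n2 = s1 + f1, s2 + f2
        n = n1 + n2
        # Pearson chi-square for the pooled 2x2 table
        den = n1 * n2 * (s1 + s2) * (f1 + f2)
        chi2 = n * (s1 * f2 - s2 * f1) ** 2 / den if den else 0.0
        # Reject and stop when (i/K) * chi2 >= P(K, alpha)
        if (i / K) * chi2 >= crit:
            return i, True
    return K, False
```

For example, two stages of strongly separated counts trigger rejection at the first look, while identical arms run through the final stage without rejecting.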
Table 1 lists the critical values, as originally stated by O’Brien and Fleming, alongside the corrected values provided by Hammouri (2013) [17].
For more information on calculating and assessing the original and adjusted critical points, refer to [8,17].

2.2. Neyman Allocation Procedure

The Neyman allocation procedure is a statistical method aimed at effectively distributing a finite sample across various segments of a population. This approach minimizes the overall variance of population estimates while adhering to any constraints on the total allocation. Because the Neyman allocation depends on unknown binomial parameters, those parameters are estimated from the accumulating data.
Let $X_1, \ldots, X_n$ be binary responses (success = 1, failure = 0), where $n$ is the total sample size, and let $T_1, \ldots, T_n$ be treatment assignment indicators taking the value one for treatment 1 and zero for treatment 2.
Then $N_{1,n} = \sum_{i=1}^{n} T_i$ and $N_{2,n} = n - N_{1,n}$, where $N_{1,n}$ and $N_{2,n}$ are the total numbers of patients assigned to treatments 1 and 2, respectively.
$\hat{p}_{1,n} = \sum_{i=1}^{n} T_i X_i / N_{1,n}$, $\hat{p}_{2,n} = \sum_{i=1}^{n} (1 - T_i) X_i / N_{2,n}$, $\hat{q}_{1,n} = 1 - \hat{p}_{1,n}$, and $\hat{q}_{2,n} = 1 - \hat{p}_{2,n}$.
Denoting $\mathcal{F}_i = \{X_1, \ldots, X_i, T_1, \ldots, T_i\}$ and the conditional expectation $E_i(\cdot) = E(\cdot \mid \mathcal{F}_i)$, $i = 1, 2, \ldots, n$, the following allocation rule is obtained:
$$E_{i-1}(T_i) = \frac{\sqrt{\hat{p}_{1,i-1}\,\hat{q}_{1,i-1}}}{\sqrt{\hat{p}_{1,i-1}\,\hat{q}_{1,i-1}} + \sqrt{\hat{p}_{2,i-1}\,\hat{q}_{2,i-1}}}.$$
Therefore, this rule simply substitutes the unknown success and failure probabilities $p_1$, $p_2$, $q_1$, and $q_2$ in the Neyman allocation rule with the current sample estimates $\hat{p}_{1,n}$, $\hat{p}_{2,n}$, $\hat{q}_{1,n}$, and $\hat{q}_{2,n}$.
When $p_1, p_2, q_1, q_2 \in (0, 1)$,
$$\frac{N_{1,n}}{n} \rightarrow \frac{\sqrt{p_1 q_1}}{\sqrt{p_1 q_1} + \sqrt{p_2 q_2}} \quad \text{as } n \rightarrow \infty.$$
Here, $\sqrt{p_1 q_1} \,/\, \big(\sqrt{p_1 q_1} + \sqrt{p_2 q_2}\big)$ is the Neyman allocation proportion, which is used as a weight for the new subsamples [27].
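The estimated allocation proportion can be computed directly from the current success-rate estimates (a minimal sketch; the function and argument names are illustrative, and the equal-allocation fallback for degenerate estimates is an assumption for robustness, mirroring the fallback used later in the NWMP steps):

```python
import math

def neyman_fraction(p1_hat, p2_hat):
    """Estimated Neyman allocation fraction for treatment 1.

    Substitutes the current success-rate estimates for the unknown
    p1, p2 in  sqrt(p1*q1) / (sqrt(p1*q1) + sqrt(p2*q2)).
    Falls back to equal allocation when both estimated standard
    deviations are zero.
    """
    s1 = math.sqrt(p1_hat * (1 - p1_hat))
    s2 = math.sqrt(p2_hat * (1 - p2_hat))
    if s1 + s2 == 0:
        return 0.5
    return s1 / (s1 + s2)
```

With equal variances the fraction is 1/2; the arm whose response rate is closer to 0.5 (higher binomial variance) receives the larger share.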

2.3. Using Different Subsample Sizes

Different weights are intricately linked to the resource availability and the size of the sample pool. Adjustments to the weights are made based on these factors, ensuring that data collection is not only strategic but also resource-efficient. When higher weights are employed earlier in the trial, significant benefits can be realized, particularly in facilitating early stopping decisions. By prioritizing more informative data points through increased weighting, the robustness of early analyses is enhanced, allowing trends and treatment effects to be detected more quickly. Consequently, if efficacy or safety concerns meet predefined criteria, the trial can be stopped early. This ensures that decisions to halt the study are made based on compelling evidence, thereby potentially reducing exposure to ineffective or harmful treatments. Ultimately, the strategic application of weights early in the process supports more efficient and ethically responsible study conduct, allowing each data point to contribute optimally to the overall analysis and enhancing the reliability of conclusions under varying resource conditions [42].

2.4. The Proposed Method: Neyman Weighted Multiple Testing Procedure (NWMP)

2.4.1. The New Methodology

Our framework combines the O’Brien and Fleming method with Neyman allocation and weighted subsamples, allowing researchers to track and evaluate data gathered at each interim analysis stage. It is applicable in clinical trials where both the input and outcome variables are binary. This configuration, which assesses proportions between two groups based on binary outcomes, works especially well with this procedure. Since NWMP aims to manage the Type I error rate while testing various hypothesis endpoints, it is suited for chi-square-based test statistics. Examples of binary outcomes include symptom resolution, the occurrence of adverse events, and treatment success; any of these can be analyzed within a binary treatment allocation (e.g., drug versus placebo). This makes NWMP particularly useful in scenarios involving multiple binary endpoints assessed with chi-square tests.
The O’Brien and Fleming procedure is designed to compare two treatments when the treatment response is binary and occurs immediately. Our methodology follows the O’Brien–Fleming framework. This method allows for adjusting subsample sizes within each stage using Neyman allocation and allows different variable weights to be applied to subsamples between stages. Informed decisions are made based on predefined stopping rules and efficacy boundaries, resulting in a comprehensive and sophisticated methodology that enhances flexibility while maintaining the reliability of studies. By leveraging the strengths of these combined procedures, our procedure leads to more meaningful and accurate results, ultimately providing a valuable contribution to the field.
The new procedure involves periodically reviewing and testing the collected data in $K$ stages, $K = 1, \ldots, 5$:
  • Choose $K$, $\alpha$, $\{w_1, \ldots, w_K\}$, $P(K, \alpha)$, and $N$. Here, $K$ is the total number of stages, $\{w_1, \ldots, w_K\}$ are the predefined weights for the subsamples, and $N$ is the total sample size (which should be even).
  • The subsample sizes across the stages are determined by the predefined weights $\{w_1, \ldots, w_K\}$, where $0 < w_k \leq 1$, $k = 1, \ldots, K$, and $\sum_{k=1}^{K} w_k = 1$.
  • Then $\{w_1, \ldots, w_K\}$ is used to find each stage sample size: set $n_k = \mathrm{round}(w_k N)$ if this value is even and $n_k = \mathrm{round}(w_k N) + 1$ otherwise, for $k = 1, \ldots, K - 1$, and $n_K = N - \sum_{k=1}^{K-1} n_k$ (each $n_k$ should be even so that equal allocation is possible).
  • Split the stage sample size $n_i$ into two portions, $n_{i1}$ and $n_{i2}$, assigned to treatment 1 and treatment 2, respectively.
    • If $i = 1$, the first-stage sample size $n_1$ is split equally: $n_{11} = n_{12} = n_1/2$.
    • If $i = 2, \ldots, K$, the subsamples are given by $n_{i1} = \mathrm{round}(y_i n_i)$ with $y_i = \sqrt{p_{i-1,1}\, q_{i-1,1}} \,/\, \big(\sqrt{p_{i-1,1}\, q_{i-1,1}} + \sqrt{p_{i-1,2}\, q_{i-1,2}}\big)$, where $p_{i-1,1}$ and $p_{i-1,2}$ are the cumulative success rates from the previous stages for treatment 1 and treatment 2, respectively, and $n_{i2} = n_i - n_{i1}$. If $n_{i1}$ or $n_{i2}$ equals zero, equal allocation is used instead.
  • Subjects are randomized starting from the initial stage, and their measurements are recorded. When no rejection occurs, each new subsample is appended to the prior subsamples for its treatment.
  • $(i/K)\,\chi^2_{(i)}$ is calculated and compared to $P(K, \alpha)$:
    • If $(i/K)\,\chi^2_{(i)} \geq P(K, \alpha)$, the study ends and the null hypothesis is rejected.
    • If $(i/K)\,\chi^2_{(i)} < P(K, \alpha)$ and $i < K$, the procedure proceeds to the next stage, returning to the randomization step.
    • If $(i/K)\,\chi^2_{(i)} < P(K, \alpha)$ and $i = K$, the study is terminated and the null hypothesis fails to be rejected.
The algorithm for the new procedure is graphically illustrated in the following flow chart in Figure 1.
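The weight-to-stage-size and Neyman-splitting steps can be sketched as follows (illustrative Python, not the study’s SAS code; all function names are hypothetical, and the rounding convention follows the steps listed above):

```python
import math

def stage_sizes(N, weights):
    """Turn predefined stage weights {w_1, ..., w_K} into per-stage sizes.

    round(w_k * N) is bumped to the next integer when odd (stages
    1..K-1); the last stage absorbs the remainder, which the design
    assumes is even for a suitably chosen N.
    """
    sizes = []
    for w in weights[:-1]:
        n_k = round(w * N)
        if n_k % 2 == 1:
            n_k += 1
        sizes.append(n_k)
    sizes.append(N - sum(sizes))
    return sizes

def split_stage(n_i, p1_prev, p2_prev, first_stage):
    """Split one stage's size between the two treatment arms.

    Stage 1 uses equal allocation; later stages use the Neyman
    fraction built from the cumulative success rates, falling back
    to an equal split when the split would leave an arm empty.
    """
    if first_stage:
        return n_i // 2, n_i - n_i // 2
    s1 = math.sqrt(p1_prev * (1 - p1_prev))
    s2 = math.sqrt(p2_prev * (1 - p2_prev))
    y = s1 / (s1 + s2) if s1 + s2 > 0 else 0.5
    n_i1 = round(y * n_i)
    if n_i1 == 0 or n_i1 == n_i:  # degenerate split -> equal allocation
        n_i1 = n_i // 2
    return n_i1, n_i - n_i1
```

For instance, with N = 100 and weights (0.4, 0.3, 0.3) the stages receive 40, 30, and 30 subjects, and a stage of 40 subjects with cumulative success rates 0.5 and 0.9 is split 25/15 in favor of the higher-variance arm.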

2.4.2. Participant Preferences and Informed Consent for the NWMP

When implementing the NWMP, clinical trial participants need to understand the randomization process. Although the technical details of Neyman allocation and the O’Brien and Fleming procedure might not be extensively explained in simple terms, participants must recognize that they will be randomly assigned to various treatment groups for both ethical and practical reasons. To aid their comprehension, the procedure should be explained in a simpler manner, allowing participants to grasp its importance in preserving the integrity and validity of the trial results. This strategy not only adheres to ethical guidelines but also fosters participant trust and cooperation, which are essential for the successful conduct of a clinical trial.

2.5. Type I Error and Power Testing Methods

This section is designed to demonstrate the accuracy of the procedure by examining Type I error and power. Monte Carlo simulations are employed to assess the effectiveness of a procedure in scenarios where theoretical analysis presents challenges. The Monte Carlo simulation method facilitates the examination of statistical power and the Type I error rate associated with the NWMP, followed by a comparison to the original procedure. Monte Carlo simulations illustrate how effectively the NWMP can detect significant effects while maintaining an equivalent Type I error rate and power, thereby upholding statistical standards.

2.5.1. Type I Error Testing Method

SAS (Statistical Analysis System, version 9.4, SAS Institute Inc., Cary, NC, USA) code was employed to run simulations to calculate Type I errors for the NWMP. The simulations address a range of success probabilities (P = 0.1, 0.2, 0.3, 0.4, and 0.5), with α values of 0.01 and 0.05, for all critical values corresponding to the selected α, and for values from 1 to 5 for K.
Running these simulations evaluates the NWMP’s performance across numerous scenarios. The differing success probabilities facilitate an examination of its robustness and adaptability under diverse conditions, with various test sizes and critical values considered.
To evaluate how effectively the new procedure retains the null hypothesis when suitable, subsamples will be simulated from a single binomial distribution with equal success rates, ensuring that the null hypothesis holds. The process will be conducted 500,000 times, and the percentage where the null hypothesis is rejected will be computed to assess the Type I error rate.
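The simulation logic just described can be sketched at reduced scale (10,000 repetitions instead of 500,000, and the one-stage K = 1 case with the χ²(1) critical value 3.841). This is an illustrative Python sketch, not the authors' SAS code; the function names are our assumptions.

```python
import random

def chisq_2x2(s1, n1, s2, n2):
    """Pearson chi-square statistic for two independent binomial samples."""
    f = n1 + n2
    s, q = s1 + s2, (n1 + n2) - (s1 + s2)
    if s == 0 or q == 0:
        return 0.0
    return f * (s1 * (n2 - s2) - s2 * (n1 - s1)) ** 2 / (n1 * n2 * s * q)

def estimate_type1(p, n_per_arm, crit, reps, seed=0):
    """Simulate two arms with the SAME success probability (H0 true) and
    return the fraction of runs whose statistic exceeds the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s1 = sum(rng.random() < p for _ in range(n_per_arm))
        s2 = sum(rng.random() < p for _ in range(n_per_arm))
        if chisq_2x2(s1, n_per_arm, s2, n_per_arm) >= crit:
            hits += 1
    return hits / reps

# Reduced-scale run: 10,000 reps (the paper uses 500,000)
rate = estimate_type1(p=0.3, n_per_arm=180, crit=3.841, reps=10_000)
```

With both arms drawn from the same binomial distribution, the estimated rejection proportion should settle near the nominal α.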

2.5.2. Power Testing Method

First, the sample sizes required to achieve a power value of 0.8 for specific scenarios of the conventional chi-square test are determined for comparison purposes. The estimated power values for each NWMP sample are then compared to this 0.8 target.
The success probability p 1 is set at 0.1, while the success probabilities p 2 take one of the following values: 0.15, 0.2, 0.25, and 0.3. The program will run 500,000 iterations with significance levels α = 0.01 and 0.05.
The analysis includes the corrected O’Brien and Fleming critical values, denoted as P(K, α). After applying the conventional chi-square power calculations, sample sizes were determined for each combination of p1 = 0.1 and p2 = 0.15, 0.2, 0.25, or 0.3 to achieve the desired power value of 0.8. For α = 0.05, the sample sizes were 1366, 396, 200, and 120, respectively. For α = 0.01, the sizes were 2032, 588, 292, and 182, respectively. To ensure a significant difference, the two subsamples are generated for each case of K = 1, …, 5 from distinct binomial distributions with different means (i.e., different success probabilities). The NWMP is then conducted to test whether the null hypothesis (H0) of no difference between the two groups can be rejected in favor of the alternative hypothesis (Ha). The entire process is repeated 500,000 times to calculate the proportion of rejections of H0, which represents the power, given that Ha is guaranteed to be true.
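The per-group sample sizes quoted above can be approximately reproduced with a common unpooled normal-approximation formula for comparing two proportions, n = (z_{α/2} + z_β)²(p1q1 + p2q2)/(p1 − p2)². This is a sketch under that assumption: the paper does not state which power routine was used, so the totals it yields (1366, 394, 194, 118 at α = 0.05) match the first reported value exactly and land within a few subjects of the others (396, 200, 120).

```python
from math import ceil
from statistics import NormalDist

def two_prop_n(p1, p2, alpha, power):
    """Per-group n for a two-sided two-proportion z-test (unpooled variance):
    n = (z_{alpha/2} + z_beta)^2 * (p1*q1 + p2*q2) / (p1 - p2)^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z * z * var / (p1 - p2) ** 2)

# Total sizes (both groups combined) for p1 = 0.1, alpha = 0.05, power = 0.8
totals_05 = [2 * two_prop_n(0.1, p2, 0.05, 0.8) for p2 in (0.15, 0.2, 0.25, 0.3)]
```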

2.5.3. Type I Error and Power Estimation Algorithm

Monte Carlo simulations are conducted to evaluate the proposed method’s Type I error rate and power.
  • These simulations assess the probability of incorrectly rejecting a true null hypothesis across different experimental settings or the probability of correctly rejecting a false null hypothesis.
    (a) The Type I error analysis considers equal probabilities p1 = p2 ∈ {0.1, 0.2, 0.3, 0.4, 0.5}.
    (b) The power analysis considers an initial probability p1 = 0.1 and p2 ∈ {0.15, 0.2, 0.25, 0.3}.
  • The simulations are performed for a range of success probabilities ( p 1 and p 2 ): equal for Type I error and unequal for power.
  • Then, random samples are generated from binomial distributions with the specified success probabilities p1 and p2 and sample size N.
  • Each simulation is repeated 500,000 times to ensure robust statistical inference, and the NWMP is applied for all values of K.
  • The test statistics are computed according to the proposed NWMP methodology.
  • The computed statistics are compared to the predefined critical values P(K, α).
    Decision Rule:
    • If the test statistic exceeds the critical value in any stage, H 0 is rejected (false positive for Type I error and true positive for power).
    • Otherwise, H 0 fails to be rejected.
  • Computation of Type I Error Rate or Power: The proportion of rejected H 0 cases across all iterations is calculated.
    • Type I Error Rate = (number of rejections of H0 when H0 is true)/(total simulations).
    • Power = (number of rejections of H0 when Ha is true)/(total simulations).
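The algorithm above estimates either quantity with the same Monte Carlo loop, depending on whether p1 equals p2. The following Python sketch (ours, not the authors' SAS program) shows the one-stage K = 1 case for brevity, with the χ²(1) critical value 3.841 standing in for P(1, 0.05).

```python
import random

def chisq_2x2(s1, n1, s2, n2):
    """Pearson chi-square statistic for two independent binomial samples."""
    f = n1 + n2
    s, q = s1 + s2, (n1 + n2) - (s1 + s2)
    if s == 0 or q == 0:
        return 0.0
    return f * (s1 * (n2 - s2) - s2 * (n1 - s1)) ** 2 / (n1 * n2 * s * q)

def rejection_rate(p1, p2, n_per_arm, crit, reps, seed=0):
    """Fraction of simulated trials that reject H0.  With p1 == p2 this is a
    Type I error estimate; with p1 != p2 it is a power estimate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s1 = sum(rng.random() < p1 for _ in range(n_per_arm))
        s2 = sum(rng.random() < p2 for _ in range(n_per_arm))
        hits += chisq_2x2(s1, n_per_arm, s2, n_per_arm) >= crit
    return hits / reps

# 60 subjects per arm for p1 = 0.1 vs. p2 = 0.3 (the alpha = 0.05 total of 120)
power = rejection_rate(0.1, 0.3, 60, crit=3.841, reps=5_000)
```

With these settings the estimate should land near the 0.8 power target quoted in the text.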

3. Results

This section presents the results of the analysis of Type I error and power levels for the proposed procedure. It was found that the Type I error rate was acceptable, indicating that the null hypothesis was not rejected when it was true. Additionally, a higher power level of the test was observed, suggesting that the alternative hypothesis was correctly accepted when it was true. These results support the validity of the proposed procedure, instilling confidence in the accuracy of the findings.

3.1. Type I Error Results

After conducting an extensive analysis of various sample sizes, it was observed that the results remained consistent irrespective of the sample size. Furthermore, when comparing these results to the conventional chi-square test, it was found that the modified procedure reduced Type I error values. This reduction became more pronounced as the value of K increased.
The rationale behind this trend lies in the fact that as K increases, the critical value P(K, α) grows while the interim statistics are scaled down by i/K, making rejection more conservative. Raising K for interim analyses helps prevent premature conclusions about a treatment’s effectiveness, maintaining scientific integrity. This approach not only upholds the accuracy of the trial results but also enhances ethical standards by responding dynamically to new data. On the other hand, adaptive allocations promote a more efficient trial process by potentially reducing the number of participants needed to reach conclusive results. By focusing on treatments that show promise and adjusting away from those that do not, trials can achieve their objectives faster and with fewer resources. This accelerates the development process of new medical interventions and reduces participants’ exposure to less effective treatments.
Adjusting the critical value, K, and employing adaptive allocations ensures that clinical trials are flexible and precise.
Table 2 summarizes Type I error values across two distinct sample sizes and significance levels: a sample size of 360 at α = 0.05 and a size of 300 at α = 0.01. For the sample size of 360 at α = 0.05, the Type I error ranged from 0.0487 to 0.0502 at K = 1 and decreased consistently as K increased, reaching 0.0421 to 0.0428 at K = 5, well below the significance level of 0.05. Compared to the standard chi-square procedure, this enhanced control of errors suggests that the proposed method is effective.
When using a significance level of α = 0.01 and a sample size of 300, the Type I error values maintain a monotonic behavior across the stages, ranging between 0.0090 and 0.0099 in the second stage. As K increased, the Type I error values decreased, ranging between 0.0081 and 0.0091 in the last stage. Importantly, these values remain below 0.0105, which is considered satisfactory.
The Type I error also showed similar patterns when considering the same significance level α and varying sample sizes of 90 and 720. For the sample size of 90, the Type I error ranges from 0.0374 to 0.0511. For the sample size 720, the Type I error ranges from 0.0428 to 0.0494.
Furthermore, the Type I error values were calculated for sample sizes of 90 and 720, using a significance level of α = 0.01 . The results indicated that the new procedure performs appropriately regarding Type I errors. Specifically, for a sample size of 90, all Type I error values fall within the range of 0.0061 to 0.0104. Similarly, for a sample size of 720, the Type I error values range from 0.0086 to 0.0103. The remaining Type I errors are shown in Table 3, and Figure 2.

3.2. Impact of Random Seed Selection on Type I Error Rates

ANOVA analysis was conducted to assess the impact of random seed selection on Type I error rates in Monte Carlo simulations. Each simulation was treated as a sample, with the outcome of each iteration coded as 1 or 0. Five simulations were run under identical parameter settings, with only the random seed varying. The results of the ANOVA indicate that no significant difference was found in the mean Type I error rates across the five simulations. The calculated mean Type I error rates were very close, ranging from 0.0498 to 0.0510. Based on these findings, it can be concluded that the choice of random seed does not materially affect the estimation of Type I error in this Monte Carlo simulation context.

3.3. Power Analysis and Performance Trends

Statistical power is crucial for evaluating a testing procedure’s effectiveness. Power refers to the likelihood of correctly rejecting a false null hypothesis, thereby significantly decreasing the risk of Type II errors. This section presents power results from Monte Carlo simulations conducted under several scenarios. The fluctuations in power values associated with the NWMP were analyzed in depth. The influence of modifications to the O’Brien and Fleming procedure on its power performance was examined. The findings revealed significant trends in power estimates, offering essential insights into the robustness of the suggested approach. After testing several scenarios, it was observed that the power values with K = 1 ranged between 0.80195 and 0.80717. Additionally, these values exhibited a decreasing trend as K increased. For α = 0.05 at K = 5, the power values ranged from 0.7723 to 0.7819, with a margin of error less than 0.0277 compared to the target power value of 0.8.
Next, power values for α = 0.01 and a K value of 1 range from 0.80109 to 0.8038. Further analysis confirmed that power values decline as K values rise. When K equals 5, the power values range from 0.7793 to 0.7873. The other power values are shown in Table 4 and illustrated in Figure 3a,b.

3.4. Rejection Rates: Calculations for Each Stage

This section aimed to identify the stage at which rejection occurs by calculating rejection rates and determining the sample size needed to conclude that the null hypothesis was rejected. The objective was to assess whether the proposed methodology requires a smaller sample size to achieve statistical significance, which is the desired outcome.

3.4.1. Calculating Rejection Rates for Each Stage When the Difference Is Present

With a standard power of 0.8 and reflecting different probabilities of success (0.1 and 0.15) with an α = 0.01 , a total sample size of 2032 was determined. Using 500,000 iterations at each value of K, the results are provided in Table 5 and exemplified in Figure 4.
For K = 2, 43% of the rejections of H0 (acceptances of Ha), amounting to 170,396 iterations, occurred in the second stage. In this scenario, most rejections (57%) took place in the first stage, meaning the entire sample size was necessary to reject the null hypothesis in only 43% of instances.
For K = 3, 236,537 iterations were necessary to reject the null hypothesis in the second stage, resulting in a rejection rate of 59%, the highest among the three stages. For K = 4, the second stage had the highest rejection rate at 41%, requiring 162,778 iterations to reject the null hypothesis. Finally, for K = 5, the third stage had the highest rejection rate of 38%, with 150,267 iterations.
Based on the findings, the proposed procedure effectively minimized the necessary sample size for statistical significance, significantly decreasing expenses and effort. This outcome indicated the efficiency and practicality of the suggested procedure.

3.4.2. Calculating Rejection Rates for Each Stage When the Difference Is Not Present

Rejections were computed using α = 0.01, based on a total sample size of 360. From 500,000 iterations, the following table and figure illustrate the necessary sample size for reaching rejection, along with the number of rejections at stage K.
Additionally, it is important to recognize that these percentages are derived from the 5% rejection rate. The decision rules applied in this multiple testing procedure closely resemble those of the standard chi-square one-stage procedure, provided there is no early termination when H 0 is true. We notice that most of the rejections happened in the last stage. Results are presented in Table 6 and Figure 5.

4. Examples

4.1. Example 1: Computational Example

We simulated data for a trial of 600 subjects divided into two groups, each across five stages. The two groups had success rates of 0.3 and 0.5 during the initial phase. We analyzed the data and present the results for K = 5 as part of our findings. For K = 5, the NWMP was used with the following subsample weights: w1 = 35%, w2 = 25%, w3 = 20%, w4 = 15%, and w5 = 5%. This yielded subsample sizes of n1 = 210, n2 = 150, n3 = 120, n4 = 90, and n5 = 30.
In the first stage, where n 1 equals 210, the subsamples were evenly distributed, resulting in n 11 and n 12 both being 105. The chi-square statistic calculated was 7.704. When this value is multiplied by one-fifth, it equals 1.54, which does not exceed the critical value, leading us to fail to reject the null hypothesis.
For the second stage, with n2 = 150, the cumulative subsamples were distributed using Neyman allocation as follows:
Since p1,1 = 37/105 ≈ 0.35 and p1,2 = 57/105 ≈ 0.54, we obtain y2 = √(0.35 × 0.65)/(√(0.35 × 0.65) + √(0.54 × 0.46)) = 0.489, with n21 = n11 + y2 · n2 = 105 + 73 = 178, and n22 = 182. The successes were counted, and the chi-square statistic was calculated.
The chi-square statistic stood at 9.04. After multiplying this value by two-fifths, we obtained 3.616, which does not exceed the critical value. Therefore, we did not reject the null hypothesis once more.
For the third stage, with n3 = 120, the cumulative subsamples were distributed using Neyman allocation, resulting in n31 = 237 and n32 = 243. The chi-square statistic was 7.3. After multiplying this value by three-fifths, the result equals 4.382, which is greater than the critical value. Thus, we reject H0.
In total, for K = 5, only three stages with 480 of the 600 participants were needed to terminate the experiment and conclude a significant difference between the two treatments. See Table 7 for further details.
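The stage-2 Neyman split in this example can be checked directly from the counts given above (a small verification sketch; the variable names are ours):

```python
from math import sqrt

# Stage-1 success estimates from Example 1: 37/105 and 57/105
p11, p12 = 37 / 105, 57 / 105          # ~0.35 and ~0.54
sd1 = sqrt(p11 * (1 - p11))            # estimated standard deviations
sd2 = sqrt(p12 * (1 - p12))
y2 = sd1 / (sd1 + sd2)                 # Neyman proportion for treatment 1

n2 = 150                               # stage-2 subsample size
add1 = round(y2 * n2)                  # new stage-2 subjects for treatment 1
n21 = 105 + add1                       # cumulative sizes after stage 2
n22 = 105 + (n2 - add1)
```

The computed proportion agrees with the 0.489 in the text (to rounding), and the cumulative sizes reproduce n21 = 178 and n22 = 182.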

4.2. Example 2: Real-Life Example

We collected data for illustrative purposes on the smoking habits of 300 individuals and how their parents’ smoking habits affected them. Participants were chosen based on their smoking status, including 150 smokers and 150 non-smokers, and their parents’ smoking status was recorded.
The question is whether parents’ smoking habits affect their children’s smoking behavior. The order in which participants responded was considered essential and used to organize the data for analysis. This sequence acted as a ‘bank’, allowing the data to be arranged in various ways for different values of K and for Neyman allocation based on the estimates of successes.
Concurrently, the NWMP was applied to analyze these sequences. This statistical method may include adjustments or modifications to cater to different subsets of the data based on the predefined value of K.
After assigning data for each value from 1 to 5, the NWMP was applied. The necessary sample size was also noted when the hypothesis was rejected. To demonstrate the procedure, we included details of case four of the trial below:
For K = 4, the trial was concluded after two stages with 195 participants out of 300.
In detail, for the first stage with K = 4, n1 = 106. The subsamples were split equally, with n11 = n12 = 53. The chi-square statistic was calculated to be 0.427. Multiplying it by one-fourth gives 0.1068, which does not exceed the critical value, so H0 is not rejected.
For the second stage, n21 and n22 were recalculated using Neyman allocation. The cumulative subsamples were n21 = 100 and n22 = 95. The chi-square statistic, after being multiplied by two-fourths, is 10.82, which is greater than the critical value. Thus, H0 is rejected.
A total of only 195 participants was needed in the analysis to identify a significant difference between the two groups. For K = 5, the analysis also reached significance with just 165 participants out of 300. See Table 8 for additional information.

4.3. Example 3: Real-Life Example

The example assesses two binary outcomes: whether the participant is diagnosed with cancer (yes/no) and whether the participant is a smoker (yes/no). Participants are grouped by a cancer diagnosis, and then the smoking status is evaluated. The analysis was based on publicly available lung cancer data from Data World [49].
I.
For stage 1 (i = 1), with a weight of w1 = 0.40, the sample size is n1 = 56, distributed evenly:
  • n1,1 = 28 participants with cancer (23 smokers and 5 non-smokers)
  • n1,2 = 28 participants without cancer (15 smokers and 13 non-smokers)
Next, one-third of the chi-square statistic is calculated: 5.24/3 ≈ 1.747, which is less than 4.0191. Since the statistic does not reach the critical value, we proceed to Stage 2.
II.
In stage 2 (i = 2), with a weight of w2 = 0.35, the sample size is n2 = 50. The allocation of the subsample is determined adaptively:
  • Given p1,1 = 0.4643, q1,1 = 0.5357, and y2 = 0.5657.
  • Then, n2,1 = round(0.5657 × 50) = 28, and n2,2 = 22.
  • n2,1 = 28 participants with cancer (9 smokers, 19 non-smokers)
  • n2,2 = 22 participants without cancer (0 smokers, 22 non-smokers)
Cumulatively (after combining with Stage 1), two-thirds of the chi-square statistic equals approximately 7.953 (calculated as 11.93 × 2/3), which is greater than 4.0191.
Since the adjusted statistic exceeds the boundary, the null hypothesis is rejected at Stage 2, and we conclude that there is a statistically significant association between smoking and lung cancer.
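The Stage 1 boundary check can be reproduced from the 2×2 counts listed above (a verification sketch; the critical value 4.0191 is taken from the text):

```python
def chisq_2x2_counts(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Stage 1: 28 with cancer (23 smokers, 5 non-smokers),
#          28 without cancer (15 smokers, 13 non-smokers)
chi1 = chisq_2x2_counts(23, 5, 15, 13)   # ~5.24
adjusted = chi1 * 1 / 3                  # (i/K) scaling with i = 1, K = 3
proceed = adjusted < 4.0191              # boundary not crossed -> Stage 2
```

The unscaled statistic matches the 5.24 quoted in the text, and the scaled value falls below the boundary, so the trial continues.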

5. Discussion

Specific adaptive designs in clinical trials are structured to boost efficiency by incorporating sequential processes that focus on specific critical endpoints. This approach allows for multiple testing, enabling a comprehensive assessment of the treatment under review.
This study introduces an innovative adaptation of the O’Brien and Fleming procedure, first proposed in 1979, by incorporating Neyman allocation and unequal weighted allocation methods into its framework. This enhancement is designed to optimize the procedure’s efficiency and effectiveness across multiple stages of clinical trials.
Neyman’s allocation method, a key component of the revised procedure, strategically assigns subjects in a manner that maximizes the likelihood of detecting effective treatments early. This approach is particularly beneficial as it provides more patients with potentially superior treatment. Unlike traditional methods that apply uniform subsample weights across all phases, this innovative procedure utilizes unequal weighted allocations, enabling a more responsive and effective trial design.
The introduction of weighted allocations complements Neyman’s strategy by ensuring that more subjects receive what preliminary results suggest might be the better treatment option. This methodological enhancement not only streamlines the process but also improves the overall ethical and scientific quality of clinical trials. By focusing on these advanced allocation strategies, the study positions the adapted O’Brien and Fleming procedure as a powerful tool for contemporary clinical research, aiming to deliver timely and reliable healthcare solutions.
The modified procedure was evaluated under several scenarios with varying sample sizes. The findings showed that the innovative approach, combining Neyman’s allocation with weighted allocation, significantly reduced Type I error while ensuring adequate Power. Lowering Type I error is vital since it minimizes false positive results, thereby improving the reliability of the outcomes. Additionally, preserving power is crucial to guarantee that the study remains sensitive enough to identify genuine treatment effects.
For Type I error, a simulation study was conducted considering various α values and sample sizes. For example, with an α value of 0.05 and a sample size of 360, the Type I error values ranged between 0.0487 and 0.0502 for K = 1 but decreased monotonically as K increased, reaching 0.0421 to 0.0428 when K = 5. These values were below the 0.05 threshold, indicating acceptable errors that were better than those obtained for the original procedures.
Similarly, when considering an α level of 0.01, the Type I error values demonstrated a monotonic trend, ranging from 0.0091 to 0.0099 for K = 1, and decreasing further as K increased, falling between 0.0081 and 0.0088. All these values remained below the specified α level of 0.01, indicating successful control of Type I error.
The remaining Type I error values demonstrated improved performance compared to the usual chi-square test, with the observed Type I error values falling within acceptable limits.
Conversely, although there was a minor decline in values, the new implementation effectively preserved acceptable power. Because the interim chi-square statistics are scaled down by i/K, their values shrink, making rejection of the null hypothesis harder at early stages.
However, these values remained acceptable: the power values for an α level of 0.05 ranged between 0.8020 and 0.8072 for K = 1 and between 0.7723 and 0.7819 for K = 5. The margin of error between these values was less than 0.0277 for an α level of 0.05.
For an α level of 0.01, Power values ranged from 0.8011 to 0.8038 for K = 1, while they varied between 0.7793 and 0.7873 for K = 5. The maximum difference of less than 0.0124 indicates sustained adequate Power levels.
The results show that, for both α levels, despite slight reductions, the proposed implementation successfully maintains satisfactory power values. The differences observed between these values are within reasonable margins, highlighting the robustness of the modified procedures.
These characteristics make the new procedure a promising tool for clinical trials and other medical research contexts where efficient and reliable analysis is important [14,15,43].
In conclusion, the findings of this study indicate that the NWMP constitutes a more adaptable procedure compared to the multiple tests utilized by O’Brien and Fleming, as well as the single sample method. The NWMP provides enhanced control over Type I error rates while preserving reasonable levels of statistical power.
A comparative analysis of three multiple testing procedures—the Neyman Weighted Multiple Testing Procedure (NWMP), the Optimal Weighted Multiple Testing Procedure (OWMP), and the Adaptive Multiple Testing Procedure with Urn Allocation (UMP)—reveals key differences in their ability to control Type I error rates and maintain strong statistical power under various significance thresholds. Each method applies a unique allocation mechanism tailored to specific trial dynamics, with the shared objective of enhancing efficiency in statistical decision making.
The UMP exhibits Type I error rates ranging from 0.0428 to 0.0545 at α = 0.05 and from 0.0082 to 0.0113 at α = 0.01, with corresponding power values between 0.7807 and 0.8158 at α = 0.05 and 0.7820 to 0.8131 at α = 0.01. While UMP generally maintains acceptable error control, its slightly elevated upper bounds at α = 0.05 indicate a modest risk of Type I error inflation in some scenarios.
The OWMP demonstrates tighter Type I error control—ranging from 0.0415 to 0.0507 at α = 0.05 and 0.0084 to 0.0104 at α = 0.01—alongside robust statistical power ranging from 0.7726 to 0.8164 at α = 0.05 and 0.7841 to 0.8113 at α = 0.01. This suggests that OWMP delivers a favorable balance between precision in error control and sensitivity in detecting true effects.
The NWMP also performs competitively, with Type I error rates between 0.0421 and 0.0539 at α = 0.05 and 0.0081 to 0.0104 at α = 0.01 and power levels ranging from 0.7723 to 0.8072 at α = 0.05 and 0.7793 to 0.8011 at α = 0.01. Although NWMP provides strong overall performance, its Type I error control is marginally less strict than OWMP’s, particularly at the 5% level.
Overall, all three procedures demonstrate effectiveness in maintaining statistical validity, but none consistently dominates across all criteria. The OWMP procedure offers a slightly superior balance between stringent Type I error control and high statistical power. To further clarify these distinctions, future research should implement the same simulation frameworks that vary sample size, treatment effect variability, and interim analysis frequency to enable direct and equitable comparisons of each method’s performance under controlled conditions [18,19].
Furthermore, we plan to extend these comparisons to include multi-arm trials, reflecting the significant contributions of Dr. Lui’s 1993 [12] expansion of the O’Brien and Fleming group sequential test to multiple treatment groups. This expansion will involve integrating Neyman allocation and weighted sample size techniques, enhancing methodological robustness, and tailoring approaches to the specific challenges of multi-arm clinical trials. Additionally, the comparison will be expanded to include modern adaptive designs that employ Bayesian methods, which are becoming increasingly prevalent due to their flexibility in incorporating prior knowledge and updating trial parameters in real-time based on accumulating data. This comprehensive evaluation will not only clarify the strengths and weaknesses of each procedure but also refine statistical methods in clinical trials, ultimately leading to significant advancements in medical research. Such efforts are aligned with our overarching goal to improve the statistical framework of clinical trials, ensuring that they are optimally designed to meet both scientific and ethical standards.

Author Contributions

Conceptualization, H.H.; methodology, H.H. and M.S.; software, H.H., M.S. and R.A.M.; validation, H.H., M.S. and R.A.M.; formal analysis, H.H. and M.S.; investigation, H.H. and M.S.; writing—original draft preparation, H.H., M.S. and R.A.M.; writing—review and editing, H.H., M.S., M.A. and R.A.M.; supervision, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funds from the Deanship of Scientific Research at Jordan University of Science and Technology, research grant no. 20240478.

Data Availability Statement

The SAS codes used for the analyses in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hamasaki, T.; Evans, S.R.; Asakura, K. Design, Data Monitoring, and Analysis of Clinical Trials with Co-Primary Endpoints: A Review. J. Biopharm. Stat. 2018, 28, 28–51. [Google Scholar] [CrossRef] [PubMed]
  2. Guidance, D. Adaptive Designs for Clinical Trials of Drugs and Biologics; Center for Biologics Evaluation and Research (CBER): Silver Spring, MD, USA, 2018.
  3. Chow, S.C.; Liu, J.P. Design and Analysis of Clinical Trials: Concepts and Methodologies; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
  4. O’Brien, P.C. Data and Safety Monitoring. In Encyclopedia of Biostatistics; Armitage, P., Colton, T., Eds.; NIH: Bethesda, MD, USA, 2005; Volume 2, pp. 1362–1371. [Google Scholar]
  5. Jennison, C.; Turnbull, B.W. Group Sequential Methods with Applications to Clinical Trials; CRC Press: Boca Raton, FL, USA, 1999. [Google Scholar]
  6. Mazumdar, M.; Liu, A. Group Sequential Design for Comparative Diagnostic Accuracy Studies. Stat. Med. 2003, 22, 727–739. [Google Scholar] [CrossRef] [PubMed]
  7. Thompson, W.R. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika 1933, 25, 285–294. [Google Scholar] [CrossRef]
  8. O’Brien, P.C.; Fleming, T.R. A Multiple Testing Procedure for Clinical Trials. Biometrics 1979, 35, 549–556. [Google Scholar] [CrossRef]
  9. Hammond, J.; Leister-Tebbe, H.; Gardner, A.; Abreu, P.; Bao, W.; Wisemandle, W.; Baniecki, M.; Hendrick, V.M.; Damle, B.; Simón-Campos, A.; et al. Oral Nirmatrelvir for High-Risk, Nonhospitalized Adults with COVID-19. N. Engl. J. Med. 2022, 386, 1397–1408. [Google Scholar] [CrossRef] [PubMed]
  10. Goldberg, R.M.; Sargent, D.J.; Morton, R.F.; Fuchs, C.S.; Ramanathan, R.K.; Williamson, S.K.; Findlay, B.P.; Pitot, H.C.; Alberts, S.R. A Randomized Controlled Trial of Fluorouracil Plus Leucovorin, Irinotecan, and Oxaliplatin Combinations in Patients with Previously Untreated Metastatic Colorectal Cancer. J. Clin. Oncol. 2004, 22, 23–30. [Google Scholar] [CrossRef] [PubMed]
  11. Marcus, R.; Davies, A.; Ando, K.; Klapper, W.; Opat, S.; Owen, C.; Phillips, E.; Sangha, R.; Schlag, R.; Seymour, J.F.; et al. Obinutuzumab for the First-Line Treatment of Follicular Lymphoma. N. Engl. J. Med. 2017, 377, 1331–1344. [Google Scholar] [CrossRef]
  12. Lui, K. A Simple Generalization of the O’Brien and Fleming Group Sequential Test Procedure to More Than Two Treatment Groups. Biometrics 1993, 49, 1216. [Google Scholar] [CrossRef]
  13. Lui, K. The Performance of the O’Brien-Fleming Multiple Testing Procedure in the Presence of Intraclass Correlation. Biometrics 1994, 50, 232. [Google Scholar] [CrossRef]
  14. Tang, D.; Gnecco, C.; Geller, N.L. Design of Group Sequential Clinical Trials with Multiple Endpoints. J. Am. Stat. Assoc. 1989, 84, 775. [Google Scholar] [CrossRef]
  15. Weigl, K.; Ponocny, I. Group Sequential Designs Applied in Psychological Research. Methodology 2020, 16, 75. [Google Scholar] [CrossRef]
  16. Frequentist Methods: O’Brien-Fleming, Pocock, Haybittle-Peto. 2024. Available online: https://online.stat.psu.edu/stat509/lesson/9/9.5 (accessed on 29 November 2024).
  17. Hammouri, H. Review and Implementation for the O’Brien Fleming Multiple Testing Procedure; Virginia Commonwealth University: Richmond, VA, USA, 2013. [Google Scholar]
  18. Hammouri, H.; Alquran, M.; Abdel Muhsen, R.; Altahat, J. Optimal Weighted Multiple-Testing Procedure for Clinical Trials. Mathematics 2022, 10, 1996. [Google Scholar] [CrossRef]
  19. Hammouri, H.; Ali, M.; Alquran, M.; Alquran, A.; Abdel Muhsen, R.; Alomari, B. Adaptive Multiple Testing Procedure for Clinical Trials with Urn Allocation. Mathematics 2023, 11, 3965. [Google Scholar] [CrossRef]
  20. Robertson, D.S.; Lee, K.M.; López-Kolkovska, B.C.; Villar, S.S. Response-Adaptive Randomization in Clinical Trials: From Myths to Practical Considerations. arXiv 2020, arXiv:2005.00564. [Google Scholar] [CrossRef]
  21. Bai, Z.D.; Hu, F.; Rosenberger, W.F. Asymptotic Properties of Adaptive Designs for Clinical Trials with Delayed Response. Ann. Stat. 2002, 30, 122–139. [Google Scholar] [CrossRef]
  22. Rosenberger, W.F.; Lachin, J.L. Randomization in Clinical Trials: Theory and Practice, 2nd ed.; Wiley: New York, NY, USA, 2016. [Google Scholar]
  23. Rosenberger, W.F.; Sverdlov, O. Handling Covariates in the Design of Clinical Trials. Stat. Sci. 2008, 23, 404–419. [Google Scholar] [CrossRef]
  24. Atkinson, A.C.; Duarte, B.P.M.; Pedrosa, D.J.; van Munster, M. Randomizing a Clinical Trial in Neurodegenerative Disease. Contemp. Clin. Trials Commun. 2023, 33, 101140. [Google Scholar] [CrossRef]
  25. Sverdlov, O.; Rosenberger, W.F. Randomization in Clinical Trials: Can We Eliminate Bias? Clin. Investig. 2013, 3, 37–47. [Google Scholar] [CrossRef]
  26. Ioannidis, J.P. Why most clinical research is not useful. PLoS Med. 2016, 13, e1002049. [Google Scholar] [CrossRef]
  27. Schulz, K.F.; Grimes, D.A. Sample Size Slippages in Randomized Trials: Exclusions and the Lost and Wayward. Lancet 2002, 359, 781–785. [Google Scholar] [CrossRef]
  28. Novosel, L.M. Understanding the Evidence: Quantitative Research Designs. Urol. Nurs. 2022, 42, 303. [Google Scholar] [CrossRef]
  29. Rosenberger, W.F.; Stallard, N.; Ivanova, A.; Harper, C.N.; Ricks, M.L. Optimal adaptive designs for binary response trials. Biometrics 2001, 57, 909–913. [Google Scholar] [CrossRef]
  30. Atkinson, A.C. Optimum Designs for Two Treatments with Unequal Variances in the Presence of Covariates. Biometrika 2015, 102, 494–499. [Google Scholar] [CrossRef]
  31. Wang, Y.; Ai, M. Optimal Designs for Multiple Treatments with Unequal Variances. J. Stat. Plan. Inference 2016, 171, 175–183. [Google Scholar] [CrossRef]
  32. Sverdlov, O.; Ryeznik, Y.; Wong, W.K. On Optimal Designs for Clinical Trials: An Updated Review. J. Stat. Theory Pract. 2020, 14, 10. [Google Scholar] [CrossRef]
  33. Antognini, A.B.; Giovagnoli, A. Compound Optimal Allocation for Individual and Collective Ethics in Binary Clinical Trials. Biometrika 2010, 97, 935–946. [Google Scholar] [CrossRef]
  34. Sverdlov, O.; Rosenberger, W.F. On Recent Advances in Optimal Allocation Designs in Clinical Trials. J. Stat. Theory Pract. 2013, 7, 753–773. [Google Scholar] [CrossRef]
  35. Olayiwola, O.M.; Apantaku, F.S.; Bisira, H.O.; Adewara, A.A. Efficiency of Neyman Allocation Procedure over Other Allocation Procedures in Stratified Random Sampling. Am. J. Theor. Appl. Stat. 2013, 2, 122–127. [Google Scholar] [CrossRef]
  36. Wright, T. The Equivalence of Neyman Optimum Allocation for Sampling and Equal Proportions for Apportioning the US House of Representatives. Am. Stat. 2012, 66, 217–224. [Google Scholar] [CrossRef]
  37. May, C.; Flournoy, N. Asymptotics in Response-Adaptive Designs Generated by a Two-Color, Randomly Reinforced Urn. Ann. Stat. 2009, 37, 1058–1078. [Google Scholar] [CrossRef]
  38. Duarte, B.P.M.; Atkinson, A.C. Optimum Designs for Clinical Trials in Personalized Medicine When Response Variance Depends on Treatment. J. Biopharm. Stat. 2024, 34, 1054–1071. [Google Scholar] [CrossRef] [PubMed]
  39. Duarte, B.P.M.; Atkinson, A.C.; Pedrosa, D.J.; van Munster, M. Compound Optimum Designs for Clinical Trials in Personalized Medicine. Mathematics 2024, 12, 3007. [Google Scholar] [CrossRef]
  40. Metelkina, A.; Pronzato, L. Information-Regret Compromise in Covariate-Adaptive Treatment Allocation. Ann. Stat. 2017, 45, 2046–2073. [Google Scholar] [CrossRef]
  41. Rosenberger, W.F. Asymptotic Inference with Response-Adaptive Treatment Allocation Designs. Ann. Stat. 1993, 21, 2098–2107. [Google Scholar] [CrossRef]
  42. Hu, F.; Rosenberger, W.F.; Zhang, L.X. Asymptotically Best Response-Adaptive Randomization Procedures. J. Stat. Plan. Inference 2006, 136, 1911–1922. [Google Scholar] [CrossRef]
  43. Dupont, W.D.; Plummer, W.D. Power and Sample Size Calculations. Control. Clin. Trials 1990, 11, 116–128. [Google Scholar] [CrossRef]
  44. Malsburg, T.V.D.; Angele, B. False Positives and Other Statistical Errors in Standard Analyses of Eye Movements in Reading. J. Mem. Lang. 2016, 94, 119–133. [Google Scholar] [CrossRef]
  45. Pearce, G.; Frisbie, D.D. Statistical Evaluation of Biomedical Studies. Osteoarthr. Cartil. 2010, 18, S117–S122. [Google Scholar] [CrossRef]
  46. Morris, T.P.; White, I.R.; Crowther, M.J. Using Simulation Studies to Evaluate Statistical Methods. Stat. Med. 2019, 38, 2074–2102. [Google Scholar] [CrossRef]
  47. Mehta, C.R.; Pocock, S.J. Adaptive Increase in Sample Size When Interim Results are Promising: A Practical Guide with Examples. Stat. Med. 2011, 30, 3267–3284. [Google Scholar] [CrossRef]
  48. Al Garni, H.Z.; Awasthi, A. A Monte Carlo Approach Applied to Sensitivity Analysis of Criteria Impacts on Solar PV Site Selection. In Handbook of Probabilistic Models, 1st ed.; Samui, P., Bui, D.T., Chakraborty, S., Deo, R.C., Eds.; Butterworth-Heinemann: Oxford, UK, 2020. [Google Scholar]
  49. Data World. LCDAS12319. Available online: https://data.world/project-029/lcdas12319 (accessed on 5 April 2025).
Figure 1. The flow chart for the NWMP procedure.
Figure 2. NWMP results: (a) type I error values for α = 0.01 and sample size 300, (b) type I error values for α = 0.05 and sample size 360, (c) type I error value ranges for α = 0.01 and sample sizes 90 and 720, and (d) type I error value ranges for α = 0.05 and sample sizes 90 and 720.
Figure 3. Power values of NWMP with (a) α = 0.05, p1 = 0.1, and p2 = 0.15, 0.2, 0.25, 0.3. (b) α = 0.01, p1 = 0.1, and p2 = 0.15, 0.2, 0.25, 0.3.
Figure 4. Percent of rejections of H0 occurring at stage i with 500,000 iterations when the difference is present.
Figure 5. Percent of rejections of H0 occurring at stage i with 500,000 iterations when the difference is not present.
Table 1. The O'Brien and Fleming original and corrected critical values.

| α | Values | K = 1 | K = 2 | K = 3 | K = 4 | K = 5 |
|---|---|---|---|---|---|---|
| 0.5 | Original | 0.462 | 0.656 | 0.75 | 0.785 | 0.819 |
| | Corrected | 0.4547 | 0.6546 | 0.7439 | 0.8013 | 0.8431 |
| 0.1 | Original | 2.67 | 2.859 | 2.907 | 2.979 | 3.087 |
| | Corrected | 2.7042 | 2.8195 | 2.9247 | 3.0047 | 3.0650 |
| 0.09 | Original | 2.866 | 3.031 | 3.073 | 3.147 | 3.283 |
| | Corrected | 2.8730 | 2.9817 | 3.0877 | 3.1668 | 3.2275 |
| 0.08 | Original | 3.077 | 3.197 | 3.24 | 3.338 | 3.467 |
| | Corrected | 3.0633 | 3.1646 | 3.2700 | 3.3498 | 3.4114 |
| 0.07 | Original | 3.294 | 3.363 | 3.437 | 3.546 | 3.663 |
| | Corrected | 3.2814 | 3.3754 | 3.4799 | 3.5594 | 3.6220 |
| 0.06 | Original | 3.576 | 3.652 | 3.683 | 3.853 | 3.889 |
| | Corrected | 3.5348 | 3.6207 | 3.7251 | 3.8034 | 3.8669 |
| 0.05 | Original | 3.869 | 3.928 | 3.94 | 4.17 | 4.149 |
| | Corrected | 3.8399 | 3.9152 | 4.0191 | 4.0961 | 4.1602 |
| 0.04 | Original | 4.289 | 4.231 | 4.264 | 4.477 | 4.584 |
| | Corrected | 4.2177 | 4.2809 | 4.3836 | 4.4599 | 4.5243 |
| 0.03 | Original | 4.8 | 4.722 | 4.7 | 4.964 | 5.045 |
| | Corrected | 4.7099 | 4.7622 | 4.8587 | 4.9341 | 5.0008 |
| 0.02 | Original | 5.49 | 5.392 | 5.462 | 5.555 | 5.789 |
| | Corrected | 5.4106 | 5.4537 | 5.5396 | 5.6148 | 5.6827 |
| 0.01 | Original | 6.667 | 6.574 | 6.503 | 6.864 | 6.838 |
| | Corrected | 6.6393 | 6.6618 | 6.7353 | 6.8021 | 6.8764 |
| 0.005 | Original | 7.885 | 7.818 | 7.442 | 7.89 | 8.037 |
| | Corrected | 7.8863 | 7.9019 | 7.9529 | 8.0094 | 8.0803 |
| 0.001 | Original | 10.062 | 10.24 | 10.202 | 11.06 | 10.6 |
| | Corrected | 10.8280 | 10.8527 | 10.8618 | 10.9263 | 10.9820 |
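These critical values feed a single stopping rule: at interim stage i of K, the trial stops and rejects H0 as soon as (i/K)·χᵢ² exceeds the tabulated value P(K, α). The following is a minimal sketch of that rule (the function name and structure are ours, not the authors' code):

```python
# Sketch of the O'Brien-Fleming group-sequential stopping rule.
# chi_sq_stats holds the interim chi-square statistics chi2_1..chi2_K.
def obf_reject(chi_sq_stats, critical_value, K):
    """Return the first stage i (1-based) at which (i/K) * chi2_i
    exceeds the single critical value P(K, alpha), or None if the
    null hypothesis is never rejected."""
    for i, chi2 in enumerate(chi_sq_stats, start=1):
        if (i / K) * chi2 > critical_value:
            return i  # trial stops here with rejection of H0
    return None

# With the corrected K = 1, alpha = 0.05 value 3.8399, a single-stage
# chi-square of 18.5 clearly rejects at stage 1.
print(obf_reject([18.5], 3.8399, K=1))  # -> 1
```

The same helper with K = 2 and the corrected value 3.9152 shows how small interim statistics let the trial run to completion without rejection.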
Table 2. Type I values obtained from Monte Carlo simulations for sample sizes of 360 with α = 0.05 and 300 with α = 0.01 using the NWMP.

| P | α | K = 1 | K = 2 | K = 3 | K = 4 | K = 5 |
|---|---|---|---|---|---|---|
| 0.1 | 0.05 | 0.0495 | 0.0476 | 0.0459 | 0.0438 | 0.0421 |
| | 0.01 | 0.0095 | 0.0091 | 0.0088 | 0.0086 | 0.0081 |
| 0.2 | 0.05 | 0.0505 | 0.0484 | 0.0456 | 0.0440 | 0.0421 |
| | 0.01 | 0.0103 | 0.0097 | 0.0094 | 0.0092 | 0.0088 |
| 0.3 | 0.05 | 0.0508 | 0.0480 | 0.0456 | 0.0444 | 0.0423 |
| | 0.01 | 0.0101 | 0.0099 | 0.0097 | 0.0093 | 0.0090 |
| 0.4 | 0.05 | 0.0487 | 0.0481 | 0.0467 | 0.0446 | 0.0422 |
| | 0.01 | 0.0104 | 0.0099 | 0.0097 | 0.0096 | 0.0091 |
| 0.5 | 0.05 | 0.0510 | 0.0502 | 0.0453 | 0.0442 | 0.0428 |
| | 0.01 | 0.0095 | 0.0090 | 0.0088 | 0.0086 | 0.0088 |
Table 3. Type I value ranges obtained from Monte Carlo simulations for α = 0.05 and α = 0.01 and sample sizes 90 and 720 with the NWMP.

| K | α | N = 90: Type I Value (From, To) | N = 720: Type I Value (From, To) |
|---|---|---|---|
| 1 | 0.05 | (0.0450, 0.0539) | (0.0483, 0.0519) |
| | 0.01 | (0.0066, 0.0104) | (0.0099, 0.0109) |
| 2 | 0.05 | (0.0428, 0.0511) | (0.0483, 0.0494) |
| | 0.01 | (0.0064, 0.0104) | (0.0098, 0.0101) |
| 3 | 0.05 | (0.0418, 0.0476) | (0.0463, 0.0494) |
| | 0.01 | (0.0062, 0.0096) | (0.0092, 0.0102) |
| 4 | 0.05 | (0.0379, 0.0461) | (0.0431, 0.0454) |
| | 0.01 | (0.0061, 0.0099) | (0.0091, 0.0103) |
| 5 | 0.05 | (0.0374, 0.0458) | (0.0428, 0.0441) |
| | 0.01 | (0.0064, 0.0094) | (0.0086, 0.0091) |
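Estimates like these come from Monte Carlo simulation: draw both arms from the same success probability, run the sequential test, and count false rejections. The sketch below uses equal per-stage weights and equal allocation for simplicity (names and defaults are ours; the paper's NWMP additionally applies Neyman allocation and unequal stage weights):

```python
import random

def simulate_type1(p, n_per_arm, crit, K, iters=20000, seed=1):
    """Monte Carlo estimate of the Type I error of the sequential
    chi-square rule under H0: both arms share success probability p.
    Simplified stand-in with equal stage sizes and equal allocation."""
    rng = random.Random(seed)
    stage = n_per_arm // K  # patients added per arm at each stage
    rejections = 0
    for _ in range(iters):
        x1 = x2 = n = 0
        for i in range(1, K + 1):
            x1 += sum(rng.random() < p for _ in range(stage))
            x2 += sum(rng.random() < p for _ in range(stage))
            n += stage
            pooled = (x1 + x2) / (2 * n)
            if 0 < pooled < 1:
                diff = x1 / n - x2 / n
                chi2 = diff * diff / (pooled * (1 - pooled) * (2 / n))
                if (i / K) * chi2 > crit:  # O'Brien-Fleming rule
                    rejections += 1
                    break
    return rejections / iters

# K = 1, 60 patients per arm, corrected critical value 3.8399 (alpha = 0.05);
# the estimate should land near the nominal 0.05.
print(simulate_type1(p=0.3, n_per_arm=60, crit=3.8399, K=1))
```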
Table 4. Power estimates for different cases implementing Neyman allocation and decreasing stage weights in the O'Brien-Fleming procedure, with α = 0.05 and α = 0.01.

| P | α | Total Sample Size | K = 1 | K = 2 | K = 3 | K = 4 | K = 5 |
|---|---|---|---|---|---|---|---|
| 0.15 | 0.05 | 1366 | 0.80195 | 0.7965 | 0.78966 | 0.78357 | 0.77882 |
| | 0.01 | 2032 | 0.80109 | 0.79942 | 0.79555 | 0.79171 | 0.78726 |
| 0.2 | 0.05 | 396 | 0.80459 | 0.79619 | 0.78956 | 0.78525 | 0.7803 |
| | 0.01 | 586 | 0.80187 | 0.79992 | 0.79606 | 0.79011 | 0.7857 |
| 0.25 | 0.05 | 200 | 0.80717 | 0.79827 | 0.79231 | 0.78666 | 0.78187 |
| | 0.01 | 292 | 0.8038 | 0.80251 | 0.79388 | 0.7924 | 0.78596 |
| 0.3 | 0.05 | 120 | 0.80555 | 0.79701 | 0.78917 | 0.7749 | 0.7723 |
| | 0.01 | 182 | 0.80185 | 0.7976 | 0.79582 | 0.78848 | 0.77932 |
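Neyman allocation assigns patients to each arm in proportion to that arm's standard deviation, which for a binary endpoint is sqrt(p_j(1 − p_j)). A small illustrative helper (our own naming; rounds to whole patients):

```python
import math

def neyman_allocation(p1, p2, n_total):
    """Split n_total between two arms in proportion to the binomial
    standard deviations sqrt(p * (1 - p)) (Neyman allocation sketch)."""
    s1 = math.sqrt(p1 * (1 - p1))
    s2 = math.sqrt(p2 * (1 - p2))
    n1 = round(n_total * s1 / (s1 + s2))
    return n1, n_total - n1

# Equal response rates give a 50/50 split; unequal variances shift it
# toward the more variable arm.
print(neyman_allocation(0.5, 0.5, 300))  # -> (150, 150)
print(neyman_allocation(0.1, 0.4, 300))  # -> (114, 186)
```

In practice the response rates are unknown, so adaptive designs plug in the interim estimates of p1 and p2 when sizing the next stage.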
Table 5. Iterations and percentages of acceptance of Ha occurring at stage k out of 500,000 iterations.

| Stage | K = 2 | K = 3 | K = 4 | K = 5 |
|---|---|---|---|---|
| First | 170,396 (43%) | 43,222 (11%) | 7936 (2%) | 1133 (0%) |
| Second | 229,315 (57%) | 236,537 (59%) | 162,778 (41%) | 89,734 (23%) |
| Third | | 118,018 (30%) | 149,650 (38%) | 150,267 (38%) |
| Fourth | | | 75,325 (19%) | 97,035 (25%) |
| Fifth | | | | 55,462 (14%) |
Table 6. Iterations and percentages of the rejection of H0 occurring at stage k out of 500,000 iterations.

| Stage | K = 2 | K = 3 | K = 4 | K = 5 |
|---|---|---|---|---|
| First | 2421 (10%) | 224 (1%) | 16 (0%) | 0 (0%) |
| Second | 21,776 (90%) | 6726 (29%) | 2022 (9%) | 578 (3%) |
| Third | | 15,860 (70%) | 7865 (35%) | 3673 (16%) |
| Fourth | | | 12,499 (56%) | 7512 (35%) |
| Fifth | | | | 9990 (46%) |
Table 7. The NWMP results for the computational example.

| Case | i | Critical Value | n1 | n2 | x1 | x2 | Total Sample Size | χᵢ² | (i/K)·χᵢ² |
|---|---|---|---|---|---|---|---|---|---|
| K = 1 (w1 = 100%) | 1 | 3.84 | 300 | 300 | 100 | 152 | 600 | 18.5 | 18.5 |
| K = 2 (w1 = 65%, w2 = 35%) | 1 | 3.92 | 195 | 195 | 67 | 102 | 390 | 12.792 | 6.396 |
| K = 3 (w1 = 40%, w2 = 35%, w3 = 25%) | 1 | 3.94 | 120 | 120 | 42 | 64 | 240 | 8.178 | 2.726 |
| | 2 | 3.94 | 223 | 227 | 74 | 117 | 450 | 15.519 | 10.346 |
| K = 4 (w1 = 35%, w2 = 30%, w3 = 25%, w4 = 10%) | 1 | 4.10 | 105 | 105 | 37 | 57 | 210 | 7.704 | 1.926 |
| | 2 | 4.10 | 193 | 197 | 67 | 103 | 390 | 12.238 | 6.119 |
| K = 5 (w1 = 35%, w2 = 25%, w3 = 20%, w4 = 15%, w5 = 5%) | 1 | 4.16 | 105 | 105 | 37 | 57 | 210 | 7.704 | 1.540 |
| | 2 | 4.16 | 178 | 182 | 63 | 93 | 360 | 9.04 | 3.616 |
| | 3 | 4.16 | 413 | 67 | 156 | 37 | 480 | 7.3 | 4.382 |

Note: stages reported up to significance.
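The per-stage statistic in the computational example is a pooled two-proportion chi-square. The sketch below (the helper name is ours) reproduces the K = 1 case, where 100 of 300 patients respond on treatment 1 and 152 of 300 on treatment 2:

```python
def two_prop_chi_square(x1, n1, x2, n2):
    """Pooled two-proportion chi-square statistic for comparing
    x1/n1 successes against x2/n2 successes."""
    pooled = (x1 + x2) / (n1 + n2)
    diff = x1 / n1 - x2 / n2
    return diff * diff / (pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# K = 1 case: 100/300 vs 152/300 successes.
chi2 = two_prop_chi_square(100, 300, 152, 300)
print(round(chi2, 1))            # -> 18.5
print(round((1 / 1) * chi2, 1))  # O'Brien-Fleming statistic (i/K)·chi², -> 18.5
```

For K = 1 the O'Brien-Fleming statistic equals the raw chi-square, and 18.5 far exceeds the critical value 3.84, so H0 is rejected at the only stage.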
Table 8. The NWMP results for the real-life example.

| Case | i | Critical Value | n1 | n2 | x1 | x2 | Total Sample Size | χᵢ² | (i/K)·χᵢ² |
|---|---|---|---|---|---|---|---|---|---|
| K = 1 (w1 = 100%) | 1 | 3.84 | 150 | 150 | 91 | 14 | 300 | 86.91 | 29.12 |
| K = 2 (w1 = 60%, w2 = 40%) | 1 | 3.92 | 90 | 90 | 37 | 13 | 180 | 15.95 | 7.975 |
| K = 3 (w1 = 40%, w2 = 35%, w3 = 25%) | 1 | 3.94 | 60 | 60 | 19 | 13 | 120 | 1.534 | 0.5113 |
| | 2 | 3.94 | 133 | 92 | 77 | 13 | 225 | 43.4 | 28.93 |
| K = 4 (w1 = 35%, w2 = 30%, w3 = 20%, w4 = 15%) | 1 | 4.02 | 53 | 53 | 16 | 13 | 106 | 0.427 | 0.1068 |
| | 2 | 4.02 | 100 | 95 | 44 | 13 | 195 | 21.644 | 10.82 |
| K = 5 (w1 = 30%, w2 = 25%, w3 = 20%, w4 = 15%, w5 = 10%) | 1 | 4.16 | 45 | 45 | 13 | 11 | 90 | 0.227 | 0.0454 |
| | 2 | 4.16 | 85 | 80 | 33 | 13 | 165 | 10.4 | 4.16 |

Note: stages reported up to significance.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hammouri, H.; Salman, M.; Ali, M.; Abdel Muhsen, R. Advances in Clinical Trial Design: Employing Adaptive Multiple Testing and Neyman Allocation for Unequal Samples. Mathematics 2025, 13, 1273. https://doi.org/10.3390/math13081273
