Article
Peer-Review Record

Weighted Log-Rank Statistics for Accelerated Failure Time Model

Stats 2021, 4(2), 348-358; https://doi.org/10.3390/stats4020023
by Seung-Hwan Lee
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 23 March 2021 / Revised: 28 April 2021 / Accepted: 30 April 2021 / Published: 3 May 2021
(This article belongs to the Special Issue Survival Analysis: Models and Applications)

Round 1

Reviewer 1 Report

This article provides an interesting extension of the Buyske et al. (2000) approach for improving the power of weighted log-rank statistics in studies with rare events. The methodology is sound. However, the presentation can be improved, and I suggest considering an alternative data analysis example (see the specific recommendation below).

Suggestions and comments are the following:

(1) The simulations are interesting; however, they seem to illustrate the advantages of weighted log-rank tests under non-proportional hazards scenarios rather than under rare events. It seems that the rare-event modification (the extension of Buyske et al.) is not really a factor in these simulations? Or perhaps the rare-event scenario is evaluated by the heavy censoring configurations?

(2) Possibly related to the above question: in the simulations, the W_{2}^{*} test seems to improve only marginally on the G^{2} test. The power gains in the text seem to be incorrectly stated. For example, on page 15, it is stated "For moderate censoring (40%), they increase up to 15% (20.5% to 22.6%)". But isn't (0.226-0.205)/0.205 only a 10% increase? Also, for the heavy censoring case, 21.7% to 37.5% seems to be a 72% improvement, not 27%. I suggest checking these calculations.

(3) In Table 1, last row of Case 1, the coverage is reported as "9.60" for W_{2}^{*}.   Should this be 96?

(4) I strongly suggest providing simulations under the null hypothesis (theta=1) to check control of the type-1 error in Cases 1-3. Do the new procedures adequately control the size of the tests?

(5) On page 12, the text refers to the Table 1 results as "under the null hypothesis". But it is previously stated that these results are under theta=2, and thus not under the null (theta=1)? It seems that "null hypothesis" refers to the use of the weighted log-rank test for estimating the scale parameter. However, in the simulations, this is confusing in the context of comparing treatments, where the *null* would be that the treatments are identical.

(6) The data analysis example does not seem to highlight the advantage of the methods. The results are practically the same for the weighted estimators and the log-rank. I suggest providing the Kaplan-Meier curves. Is the event rate rare, or do the curves suggest non-proportional hazards (NPH)?

(7) I strongly suggest commenting on the insignificance of the treatment differences. I believe the text overstates the analysis findings. The text reads "The weighted log-rank statistics .... implies the 10% difference approximately in effectiveness". The 95% CIs all contain 1. Personally, I would suggest that the findings indicate no significant difference in effectiveness between the pretreatment regimens. And by the way, what is the pre-treatment regimen? Does that mean the treatment regimen? Please describe the treatments being compared.

(8) On page 16, the text reads "Results show that the vaginal cancer data can be reasonably fit by a two-sample scale model". Why do the results suggest this? Agreement between the different estimators does not seem to constitute a goodness-of-fit test for the model.

(9) I suggest considering the ACTG175 dataset, which is available in the R package speff2trial (R manual attached). This is an HIV study of several treatment regimens in which the event rate is quite low. It may be an interesting example for the proposed methods. Please see the KM curves in the attached NEJM publication (Hammer et al., 1996).

(9 continued) For the data analysis example, I suggest including a plot of U_{rho}(theta) versus theta along with J(theta) to illustrate the calculation of the estimate and CIs.

(10)  In the discussion section, can you comment on whether the procedures can be extended to include baseline covariates?   

(11) I suggest providing computing code (R or SAS?) to encourage the use of the methods in practice.
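
The arithmetic questioned in comment (2) can be checked in a few lines. This is a minimal sketch using only the power values quoted in that comment; the function name relative_increase is illustrative and does not come from the manuscript.

```python
def relative_increase(old, new):
    """Relative change from old to new, expressed as a fraction of old."""
    return (new - old) / old

# Power values as quoted in comment (2).
moderate = relative_increase(0.205, 0.226)  # moderate censoring (40%)
heavy = relative_increase(0.217, 0.375)     # heavy censoring

print(f"moderate censoring: {moderate:.1%}")  # prints "moderate censoring: 10.2%"
print(f"heavy censoring: {heavy:.1%}")        # prints "heavy censoring: 72.8%"
```

The absolute gains are 2.1 and 15.8 percentage points, respectively, so neither the quoted 15% nor the quoted 27% matches a relative or an absolute reading of these numbers.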

Minor comments:

(a) Several grammatical edits would improve the article. For example, on page 1, "For this kind of studies": I suggest replacing "this" with "these". Also, in "Cox proportional hazard model", "hazard" should be "hazards".

(b) The notation W_{n}^{*} for the definition of the weight and the notation W_{1}^{*} and W_{2}^{*} in the simulations seem to conflict. Maybe include the dependence on rho in the definition of W_{n}^{*}, or just clarify that W_{1}^{*} denotes W_{n}^{*} with rho=1.
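
As a rough illustration of the size check requested in comment (4), the sketch below computes a standardized two-sample G^rho weighted log-rank statistic and estimates its rejection rate when both groups share the same distribution. It assumes the standard G^rho weight (the pooled Kaplan-Meier estimate just before each event time, raised to the power rho), not the manuscript's W_{n}^{*} weight, and it assumes exponential event and censoring times. The function names and all simulation settings are hypothetical.

```python
import math
import random

def g_rho_statistic(times, events, groups, rho=0.0):
    """Standardized G^rho weighted log-rank statistic for groups coded 0/1.

    The weight at each distinct time with events is the pooled Kaplan-Meier
    estimate just before that time, raised to rho (rho=0 is the log-rank test).
    """
    data = sorted(zip(times, events, groups))
    n_at_risk = len(data)
    n1_at_risk = sum(g for _, _, g in data)
    surv = 1.0  # pooled Kaplan-Meier estimate just before the current time
    u = 0.0     # weighted sum of observed minus expected events in group 1
    var = 0.0   # hypergeometric variance estimate of u
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < len(data) and data[i][0] == t:  # collect ties at time t
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d > 0:
            w = surv ** rho
            p1 = n1_at_risk / n_at_risk
            u += w * (d1 - d * p1)
            if n_at_risk > 1:
                var += w * w * d * p1 * (1 - p1) * (n_at_risk - d) / (n_at_risk - 1)
            surv *= 1 - d / n_at_risk
        n_at_risk -= removed
        n1_at_risk -= removed1
    return u / math.sqrt(var) if var > 0 else 0.0

def empirical_size(n_per_group=50, n_rep=500, rho=1.0, censor_rate=1.0, seed=1):
    """Monte Carlo rejection rate of the two-sided 5% test under the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        times, events, groups = [], [], []
        for g in (0, 1):
            for _ in range(n_per_group):
                t = rng.expovariate(1.0)          # same law in both groups
                c = rng.expovariate(censor_rate)  # independent censoring time
                times.append(min(t, c))
                events.append(1 if t <= c else 0)
                groups.append(g)
        if abs(g_rho_statistic(times, events, groups, rho)) > 1.959964:
            rejections += 1
    return rejections / n_rep

print(empirical_size())
```

A rejection rate close to 0.05 would indicate adequate size control for this configuration; the same loop could be repeated for each censoring level and each weight considered in the manuscript.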

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

In Section 3 (Numerical studies), it is not entirely clear with which variant of the G^rho family of tests the proposed variant of the log-rank test is compared. Is it the original weight function proposed by Fleming and Harrington (1981), or the modification of Buyske et al. (2000) that is also mentioned in the paper? This needs to be better described and justified.

It would also be interesting to compare the proposed method with the supremum variants of weighted log-rank tests (sometimes also called Renyi tests) proposed by Gill (1980). At a minimum, this type of test should be mentioned in the Introduction as one of the possibilities in the case of crossing hazard (survival) functions.
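
For reference, a supremum (Renyi-type) version of a weighted log-rank test is usually written as follows. This is a sketch in generic textbook notation, which is an assumption here and may differ from the manuscript's: t_j are the distinct event times, d_j and Y_j are the pooled numbers of events and subjects at risk at t_j, d_{1j} and Y_{1j} are the same quantities in group 1, W is the chosen weight function, and tau is the largest time at which both groups still have subjects at risk. The variance shown omits the tie correction.

```latex
Z(t) = \sum_{t_j \le t} W(t_j)\left( d_{1j} - \frac{Y_{1j}}{Y_j}\, d_j \right),
\qquad
\sigma^2(\tau) = \sum_{t_j \le \tau} W(t_j)^2 \, \frac{Y_{1j}}{Y_j}\left( 1 - \frac{Y_{1j}}{Y_j} \right) d_j,
\qquad
Q = \frac{\sup_{t \le \tau} \lvert Z(t) \rvert}{\sigma(\tau)}
```

Under the null hypothesis, Q is approximately distributed as the supremum of the absolute value of a standard Brownian motion on [0, 1]. Because Q tracks the largest excursion of Z(t) rather than its final value, early and late differences of opposite sign do not cancel, which is why this type of test is suggested for crossing hazards.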

Furthermore, in my opinion, the real data on vaginal cancer used to demonstrate the proposed method are not appropriately selected. The method is presented as an improvement on existing methods for situations with a high proportion of censored cases (for example, more than 50%), yet the proportion of censored cases is around 10% in both groups.

The Conclusions section (Concluding Remarks) should also be extended, and the essential results of the paper should be better presented.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
