Review
Peer-Review Record

Resampling under Complex Sampling Designs: Roots, Development and the Way Forward

Stats 2022, 5(1), 258-269; https://doi.org/10.3390/stats5010016
by Pier Luigi Conti 1,* and Fulvia Mecatti 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 January 2022 / Revised: 28 February 2022 / Accepted: 1 March 2022 / Published: 8 March 2022
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)

Round 1

Reviewer 1 Report

see my review

"Resampling under complex sampling designs: roots, development, and the way forward"

P.L. Conti and F. Mecatti

This is an enjoyable paper that clearly discusses the difficulties and possibilities for bootstrapping samples obtained from finite populations. It is a nice addition to the special issue on resampling: the problems with applying traditional bootstrapping methods to finite sampling schemes are made clear, and the pseudo-population correction scheme, among others, is well motivated. The writing is generally good but suffers a bit from a split personality: the ideas are mostly presented with an appealing heuristic informality, but this is mixed with occasional theoretical probabilistic jargon that will scare off less mathematical readers. Most of page 4, in particular, doesn't really add to the subsequent message. I hope the paper will be accepted after the authors have a go at simplifying the exposition.

Author Response

Reply to Reviewer 1: Please see Attachments

Author Response File: Author Response.pdf

Reviewer 2 Report

It was a pleasure for me to go through the submitted paper and I enjoyed reading it.

Starting from the classical Efron’s bootstrap, and in order to account for the sampling design in resampling from finite populations, the paper provides a reasoned review of the pseudo-population approach to resampling which mimics the plug-in principle in Efron’s bootstrapping.

The paper is clear, well-written and well-motivated. The pros and cons of the approach are fairly discussed and open problems highlighted.

I do not have specific suggestions on how the paper could be improved, and I am in favor of accepting it in its present form.

Author Response

Many thanks are due to the Reviewer for his/her comments.

Author Response File: Author Response.pdf

Reviewer 3 Report

This is an interesting and well written review paper that contains a valuable contribution to the field. I have only a few minor comments below aiming to clarify certain points in the paper.

1- I agree with the authors that the pseudo-population bootstrap is attractive and the natural extension of Efron's bootstrap. I also see no issue with the authors' choice of focusing on this bootstrap approach. However, I find that some statements in the paper appear too strong, and perhaps that was not the intention of the authors. Here are a few examples:

  • Middle of page 2: The authors wrote: "...they are virtually the only methods that possess a solid theoretical justification...". Are they? What about the paper of Rao and Wu (1988), for instance?
  • Bottom of page 9: The authors wrote: "It is the only resampling procedure fully justified by asymptotic arguments...". Similar comment as above. Perhaps the authors meant that this is the only procedure justified by arguments similar to those of Bickel and Freedman (1981).
  • The authors wrote that all the other bootstrap procedures are ad hoc. This seems to imply that they are not theoretically sound and are applicable only in specific scenarios. However, some of these methods have been used in cases where the pseudo-population bootstrap has yet to be developed (e.g., the Rao, Wu and Yue (1992, Survey Methodology) bootstrap that is applicable to stratified multistage sampling).
  • It is true that, in the "ad hoc" approach, the resampling mechanism is determined to ensure that the bootstrap variance is equal to the textbook variance estimator in the linear case. Indeed, matching must hold for the first two moments (not only the second). The authors then say that it is far from the arguments used to justify the classical bootstrap. Still, this argument was made in the well-known book by Efron and Tibshirani (1993). I think matching the first two moments, at least asymptotically, is a property that must hold for all bootstrap methods, including the pseudo-population bootstrap.

2- The authors say in section 1.1 that the ad hoc approach consists in rescaling data. This is true for some of them but note that Beaumont and Patak (2012) and Antal and Tillé (2011) do not rescale data. They rescale weights, and Antal and Tillé restrict to rescaling factors that are integers.

3- First centered equation on page 9: I think N should be replaced with n and n should be replaced with N. Also, E_P(I_i | X_N) is pi_i not 1/pi_i.

4- Something appears wrong in equation (10). Should pi_i^(-1) I_i be replaced with pi_i? Also, in the middle of page 14, it is said that N_i^* asymptotically behaves as pi_i^(-1) I_i. Should it be that the expectation of N_i^* asymptotically behaves as pi_i^(-1)?

5- Step 2* on page 14 is exactly the same as Step 2 on page 13. Step 2* needs to be updated appropriately.

6- Second centered equation on page 15: What is F_H? It is undefined. Should it be F*(N*) instead?

7- Bottom of page 15: There is a range of options for descriptive inference. However, the authors point out that for analytic inference only the multinomial pseudo-population and unconditional approach is valid. Therefore, does this mean that this choice would not be suitable for descriptive inference (particularly if the sampling fraction is large)? Are there restrictions for descriptive inference? Would the conditional approach be better for descriptive inference with large sampling fractions?

8- In section 5, I think it would add value to the paper if some discussion were added on the implementation of the pseudo-population approach. A difficulty of this approach is that real populations typically contain millions of people. Methods that avoid physically creating pseudo-populations are thus attractive. Perhaps a couple of references could be added on this topic.

Another drawback of the pseudo-population approach is the apparent necessity to generate and save a large number of bootstrap sample files. However, the authors may want to point out that it is not necessary to save all the bootstrap sample files. Only the original sample file must be saved, along with two additional variables for each bootstrap replicate: one variable that contains the number of times each sample unit is used to create the pseudo-population, and another that contains the number of times each sample unit has been selected in the bootstrap sample. In other words, it can be implemented similarly to methods that rescale the sampling weights.
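
The bookkeeping described in this comment can be illustrated with a minimal sketch. It is not taken from the paper or from the reviewer's report, and every name in it (`bootstrap_replicate`, `bootstrap_totals`, `pi`, `y`) is hypothetical. It assumes, purely for illustration, that the number of pseudo-population copies of unit i is obtained by randomized rounding of 1/pi_i, and that the bootstrap sample is drawn by simple random sampling without replacement from the pseudo-population; a design with unequal inclusion probabilities would need a resampling step that mimics the original design. The pseudo-population is never listed explicitly: only the two per-unit count variables are produced for each replicate.

```python
import bisect
import itertools
import random


def bootstrap_replicate(pi, rng):
    """Return (copies, hits) for one pseudo-population bootstrap replicate.

    copies[i]: number of times sample unit i appears in the pseudo-population.
    hits[i]:   number of times sample unit i is selected in the bootstrap sample.
    """
    n = len(pi)
    copies = []
    for p in pi:
        w = 1.0 / p
        base = int(w)
        # Assumption of this sketch: randomized rounding of the weight 1/pi_i.
        copies.append(base + (1 if rng.random() < w - base else 0))
    # Cumulative counts map a position in the (virtual) pseudo-population
    # back to a sample unit, so the pseudo-population is never materialized.
    cum = list(itertools.accumulate(copies))
    n_star = cum[-1]
    hits = [0] * n
    # Assumed resampling design: n positions drawn without replacement.
    for pos in rng.sample(range(n_star), n):
        hits[bisect.bisect_right(cum, pos)] += 1
    return copies, hits


def bootstrap_totals(y, pi, replicates, seed=0):
    """Bootstrap estimates of the population total, one per replicate."""
    rng = random.Random(seed)
    totals = []
    for _ in range(replicates):
        copies, hits = bootstrap_replicate(pi, rng)
        n_star = sum(copies)
        # Expansion estimator under the assumed equal-probability resampling.
        totals.append(n_star / len(y) * sum(h * v for h, v in zip(hits, y)))
    return totals
```

Under these assumptions, the empirical variance of the values returned by `bootstrap_totals` would serve as the bootstrap variance estimate, and only the original sample plus the `copies` and `hits` columns for each replicate would need to be stored.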

Minor comments:

1- Bottom of page 2: "direct boottstrap" should be replaced with "direct bootstrap".

2- First sentence of section 3: What is meant by "r.d.s"? I think it should be "r.v.s". However, for clarity, I think it would be better to write "random variables" and avoid all those shortcuts (e.g., "f.p.d.f." in section 4.2, which must mean "finite population distribution function").

3- First sentence of section 4: "technique" should be replaced with "techniques".

4- Middle of page 13: I would suggest replacing "they tend to coincide" with "their distributions tend to coincide".

5- Page 15: At two places, the authors wrote: Y_Ns. I think it would be better to remove the "s".

6- Section 5: replace "notgenerally" with "not generally".

Author Response

Replies to Reviewer 3: see attached file

Author Response File: Author Response.pdf
