Conti, P.L.; Mecatti, F. Resampling under Complex Sampling Designs. Encyclopedia. Available online: https://encyclopedia.pub/entry/21191 (accessed on 17 May 2024).
Resampling under Complex Sampling Designs

In principle, survey data are an ideal setting for resampling methods that approximate the (unknown) sampling distribution of statistics, owing to both the usually large sample size and the controlled quality of the data. However, survey data cannot generally be assumed independent and identically distributed (i.i.d.), so any resampling methodology used in sampling from finite populations must be adapted to account for the effect of the sampling design. A principled appraisal is given and discussed here.

Keywords: resampling; bootstrap; pseudo-population; design effect

1. Introduction

Resampling methods have a long and honorable history, going back at least to Efron’s seminal paper of the late 1970s [1]. Virtually all resampling methodologies used in sampling from finite populations are based on the idea of accounting for the effect of the sampling design. Indeed, the main effect of the sampling design is that the data cannot generally be assumed independent and identically distributed (i.i.d.).

There are essentially two main approaches: the ad hoc approach and the plug-in approach. The basic idea of the ad hoc approach is to retain Efron’s bootstrap as the resampling procedure but to rescale the data properly in order to account for the dependence among units. This approach is used, among others, in [2][3], where the resampled data produced by the “usual” i.i.d. bootstrap are properly rescaled, as well as in [4][5]; cf. also the review in [6]. In [7] a “rescaled bootstrap process” based on asymptotic arguments is proposed. Among the ad hoc approaches, we also classify [8] (based on a rescaling of weights) and the “direct bootstrap” of [9]. Almost all ad hoc resampling techniques rest on the same justification: in the case of linear statistics, the first two moments of the resampled statistic should match (at least approximately) the corresponding estimators; cf., among others, [9]. Cf. also [8], where an analysis in terms of the first three moments is performed for Poisson sampling.
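The moment-matching justification can be illustrated with a minimal sketch in the spirit of the rescaling idea. It is not presented as the exact procedure of any cited paper: it assumes simple random sampling, ignores the finite population correction, and uses a resample size of n − 1 so that the bootstrap variance of the mean agrees with the usual estimator s²/n. The function names are hypothetical.

```python
import random
from collections import Counter

def rescaled_bootstrap_means(y, n_boot=2000, seed=0):
    """Rescaling bootstrap for the sample mean (assumption: srs, no finite population correction)."""
    rng = random.Random(seed)
    n = len(y)
    m = n - 1                  # assumed resample size, chosen so that the variances match
    base_weight = 1.0 / n      # weight of each unit in the ordinary sample mean
    replicates = []
    for _ in range(n_boot):
        # Usual i.i.d. bootstrap step: m draws with replacement from the sample indices.
        hits = Counter(rng.choices(range(n), k=m))
        # Rescaling step: unit i gets weight base_weight * (n / m) * (times drawn).
        weights = [base_weight * (n / m) * hits[i] for i in range(n)]
        replicates.append(sum(w * yi for w, yi in zip(weights, y)))
    return replicates

def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

y = [3.0, 7.0, 8.0, 12.0, 15.0]   # illustrative data only
reps = rescaled_bootstrap_means(y)
print("bootstrap variance of the mean:", variance(reps))
print("usual estimator s^2 / n:       ", variance(y) / len(y))
```

With enough replicates the two printed values should be close, which is the sense in which the first two moments of the resampled statistic are made to match their usual estimators.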

Here the second approach, based on pseudo-populations, is considered. The reasons behind this choice are: (i) resampling based on pseudo-populations actually parallels Efron’s bootstrap for i.i.d. observations; (ii) the basic ideas are relatively simple to understand and to apply once the problem is approached in terms of an appropriate estimator of the finite population distribution function (f.p.d.f.); and (iii) the main theoretical justification for resampling based on pseudo-populations is asymptotic in nature and similar, in many respects, to the well-known Bickel-Freedman results [10] for Efron’s bootstrap.

2. Accounting for the Sampling Design in Resampling: The Pseudo-Population Approach

The idea of a pseudo-population goes back at least to [11], in the case of median estimation, essentially under simple random sampling (srs) when the population size is a multiple of the sample size. Rather similar ideas are in [12] for srs, again under the condition that the ratio between population size and sample size is an integer, and in [13] for stratified random sampling. A major step forward is [14], where the construction of a pseudo-population is studied under a general πps sampling design, with general first-order inclusion probabilities. In [15], a different and in many respects very interesting approach to the construction of a pseudo-population is considered.
The pseudo-population approach to resampling can be considered a two-phase procedure. In the first phase, a pseudo-population (roughly speaking, a prediction of the population) is constructed. In the second phase, a (bootstrap) sample is drawn from the pseudo-population according to a sampling design that mimics the original one. Broadly speaking, this approach parallels Efron’s plug-in principle: the pseudo-population is plugged into the sampling process and used as a “surrogate” for the actual finite population. In this view, the pseudo-population mimics the real population, and the (re)sampling process from the pseudo-population mimics the (original) sampling process from the real population.
A class of resampling techniques for finite populations under a general (complex) sampling design, asymptotically correct under mild assumptions, is thoroughly illustrated in [16]. Practical recommendations for finite sample sizes and on how to construct the pseudo-population are also given there.
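The two phases can be sketched as follows. This is only an illustration under stated assumptions, not the construction recommended in [16]: the original design is taken to be Poisson sampling with known first-order inclusion probabilities, the number of copies of each sampled unit is a randomized rounding of the inverse inclusion probability, and the pseudo-population is rebuilt for every replicate. All function names are hypothetical.

```python
import math
import random

def build_pseudo_population(pi, rng):
    """Phase 1: number of copies N_i of each sampled unit (assumed: randomized rounding of 1/pi_i)."""
    counts = []
    for p in pi:
        w = 1.0 / p
        base = math.floor(w)
        # Round up with probability equal to the fractional part, so E[N_i] = 1/pi_i.
        counts.append(base + (1 if rng.random() < w - base else 0))
    return counts

def draw_bootstrap_sample(counts, pi, rng):
    """Phase 2: Poisson sampling from the pseudo-population (assumed to mimic the original design).

    Each copy of unit i is selected independently with probability pi_i;
    the result is the number of selected copies of each original unit.
    """
    return [sum(1 for _ in range(c) if rng.random() < p) for c, p in zip(counts, pi)]

def ht_total(y, pi, multiplicity):
    """Horvitz-Thompson estimator of the total on a (bootstrap) sample."""
    return sum(m * yi / p for yi, p, m in zip(y, pi, multiplicity))

def pseudo_population_bootstrap(y, pi, n_boot=1000, seed=0):
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        counts = build_pseudo_population(pi, rng)
        picks = draw_bootstrap_sample(counts, pi, rng)
        replicates.append(ht_total(y, pi, picks))
    return replicates

# Illustrative sample: observed values and their inclusion probabilities.
y = [10.0, 20.0, 30.0, 40.0]
pi = [0.5, 0.25, 0.5, 0.2]
reps = pseudo_population_bootstrap(y, pi)
print("mean of bootstrap totals:", sum(reps) / len(reps))
```

The spread of the replicates around the pseudo-population total then serves as an approximation to the sampling distribution of the original estimator around the true total.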

3. Computational Issues

Application of the pseudo-population approach, despite its many theoretical merits, can be limited in practice by its computational burden. Real populations could contain millions of units, and thus the actual construction of a pseudo-population could be computationally cumbersome. For this reason, it is of primary interest to develop shortcuts that, while possessing the fundamental theoretical properties, are computationally simple to implement because they avoid the physical construction of the pseudo-population.
In [17] the problem of resampling for finite populations is addressed as a problem of sampling with replacement directly from the sample data with different drawing probabilities. An attempt to avoid complications related to integer-valued bootstrap weights, i.e., the number N_i of replications of each sampled unit i in the pseudo-population, is offered in [18], where non-integer bootstrap weights are allowed via the Horvitz–Thompson-based bootstrap (HTB) method. However, unless the sampling fraction tends to 0 as both population size and sample size increase, HTB does not generally possess the good asymptotic properties outlined in [16]. An interesting computational shortcut is in [19], where the pseudo-population (again with possibly non-integer N_i) is used only implicitly, and a computational scheme based on draws with replacement from the original sample is proposed. Unfortunately, although the main idea behind that paper is interesting, the proposed bootstrap method fails to possess good asymptotic properties. Computational shortcuts based on ideas similar to those in [19], but relying on correct approximations of first-order inclusion probabilities, were developed in [20] for descriptive, design-based inference. In particular, that paper proposes methodologies based on draws with replacement from the original sample and studies their merits from both a theoretical and a computational point of view.
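The mechanics of using a pseudo-population only implicitly can be sketched as follows. The sketch assumes drawing probabilities proportional to the inverse inclusion probabilities, which amounts to i.i.d. draws from a pseudo-population with (possibly non-integer) N_i = 1/π_i copies of unit i. It does not reproduce the corrected schemes of [20], and plain with-replacement draws of this kind do not in general mimic the original design. The function names are hypothetical.

```python
import random

def weighted_resample(y, pi, rng):
    """Draw n values with replacement from the sample, unit i having probability proportional to 1/pi_i."""
    weights = [1.0 / p for p in pi]
    return rng.choices(y, weights=weights, k=len(y))

def hajek_mean(y, pi):
    """Weighted mean of the sample, i.e. the mean of the implicit pseudo-population."""
    weights = [1.0 / p for p in pi]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

def bootstrap_means(y, pi, n_boot=2000, seed=0):
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = weighted_resample(y, pi, rng)
        means.append(sum(resample) / len(resample))
    return means

# Illustrative sample: observed values and their inclusion probabilities.
y = [10.0, 20.0, 30.0, 40.0]
pi = [0.5, 0.25, 0.5, 0.2]
means = bootstrap_means(y, pi)
print("average bootstrap mean:", sum(means) / len(means))
print("weighted sample mean:  ", hajek_mean(y, pi))
```

No pseudo-population is ever stored: memory use depends on the sample size only, which is the point of such shortcuts when the real population contains millions of units.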

Another practical drawback of the pseudo-population approach is the apparent need to generate and store a large number of bootstrap sample files. However, it is not necessary to save all the bootstrap sample files. Only the original sample file needs to be saved, along with two additional variables for each bootstrap replicate: one containing the number of times each sample unit is used to create the pseudo-population, and another containing the number of times each sample unit has been selected in the bootstrap sample. In other words, the approach can be implemented similarly to methods that rescale the sampling weights.
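A minimal sketch of this storage scheme follows. The record layout, the function names and the counts are hypothetical, and the resampling inclusion probabilities are assumed equal to the original ones, so that a bootstrap weight is simply the selection count divided by the inclusion probability.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Replicate:
    pseudo_counts: List[int]   # times each sample unit enters the pseudo-population
    sample_counts: List[int]   # times each sample unit is selected in the bootstrap sample

def bootstrap_weights(pi, rep):
    """Replicate weights, analogous to rescaled sampling weights (assumed: same pi_i as the original design)."""
    return [m / p for m, p in zip(rep.sample_counts, pi)]

def replicate_total(y, pi, rep):
    """Estimated total for one replicate, recomputed from the original sample file."""
    return sum(w * yi for w, yi in zip(bootstrap_weights(pi, rep), y))

def pseudo_population_total(y, rep):
    """Total of the pseudo-population, the quantity the replicate estimates."""
    return sum(c * yi for c, yi in zip(rep.pseudo_counts, y))

# Original sample file: observed values and inclusion probabilities (illustrative).
y = [10.0, 20.0, 30.0, 40.0]
pi = [0.5, 0.25, 0.5, 0.2]

# Two stored replicates; the counts are made up for illustration.
replicates = [
    Replicate(pseudo_counts=[2, 4, 2, 5], sample_counts=[1, 1, 0, 2]),
    Replicate(pseudo_counts=[2, 4, 2, 5], sample_counts=[0, 2, 1, 1]),
]
for rep in replicates:
    print(replicate_total(y, pi, rep), pseudo_population_total(y, rep))
```

Each replicate therefore costs two integer columns of the sample's length, rather than a full copy of a bootstrap sample file.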

4. Open Problems and Final Considerations

The pseudo-population approach, despite its merits, requires further development from both the theoretical and computational perspectives.

From a theoretical perspective, the results obtained thus far refer only to non-informative single-stage designs. The development of theoretically sound resampling methodologies for informative sampling designs is a major issue calling for more research. The main difficulty is that, with the exception of adaptive designs (cf. [21] and the references therein), first-order inclusion probabilities can rarely be computed, as they might depend on unobserved quantities. This is what happens, for instance, with most of the network sampling designs actually used for hidden populations, where the inclusion probabilities are unknown and depend on unobserved or unknown network links (cf. [21][22] and the references therein). Again from the theoretical point of view, the treatment of multistage designs, as well as of non-respondent units, appears to be a further necessary development.

From a computational point of view, the shortcuts developed thus far apply only to the case of descriptive inference. The development of theoretically well-founded computational schemes valid for analytic inference is an important issue that deserves further attention.

References

  1. Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26.
  2. McCarthy, P.J.; Snowden, C.B. The bootstrap and finite population sampling. In Vital and Health Statistics; Public Health Service Publication, U.S. Government Printing: Washington, DC, USA, 1985; Volume 95, pp. 1–23.
  3. Rao, J.N.K.; Wu, C.F.J. Resampling inference with complex survey data. J. Am. Stat. Assoc. 1988, 83, 231–241.
  4. Sitter, R.R. A resampling procedure for complex data. J. Am. Stat. Assoc. 1992, 87, 755–765.
  5. Chatterjee, A. Asymptotic properties of sample quantiles from a finite population. Ann. Inst. Stat. Math. 2011, 63, 157–179.
  6. Rao, J.N.K.; Wu, C.F.J.; Yue, K. Some recent work on resampling methods for complex surveys. Surv. Methodol. 1992, 18, 209–217.
  7. Conti, P.L.; Marella, D. Inference for quantiles of a finite population: Asymptotic vs. resampling results. Scand. J. Stat. 2015, 42, 545–561.
  8. Beaumont, J.F.; Patak, Z. On the Generalized Bootstrap for Sample Surveys with Special Attention to Poisson Sampling. Int. Stat. Rev. 2012, 80, 127–148.
  9. Antal, E.; Tillé, Y. A direct bootstrap method for complex sampling designs from a finite population. J. Am. Stat. Assoc. 2011, 106, 534–543.
  10. Bickel, P.J.; Freedman, D. Some asymptotic theory for the bootstrap. Ann. Stat. 1981, 9, 1196–1216.
  11. Gross, S.T. Median estimation in sample surveys. In Proceedings of the Section on Survey Research Methods, American Statistical Association, Houston, TX, USA, 11–14 August 1980; pp. 181–184.
  12. Chao, M.T.; Lo, S.H. A bootstrap method for finite population. Sankhya 1985, 47, 399–405.
  13. Booth, J.G.; Butler, R.W.; Hall, P. Bootstrap methods for finite populations. J. Am. Stat. Assoc. 1994, 89, 1282–1289.
  14. Holmberg, A. A bootstrap approach to probability proportional-to-size sampling. In Proceedings of the ASA Section on Survey Research Methods, Alexandria, VA, USA, 1998; pp. 378–383.
  15. Pfeffermann, D.; Sverchkov, M. Parametric and semi-parametric estimation of regression models fitted to survey data. Sankhya B 1999, 61, 166–186.
  16. Conti, P.L.; Marella, D.; Mecatti, F.; Andreis, F. A unified principled framework for resampling based on pseudo-populations: Asymptotic theory. Bernoulli 2020, 26, 1044–1069.
  17. Ranalli, M.G.; Mecatti, F. Comparing Recent Approaches for Bootstrapping Sample Survey Data: A First Step Towards a Unified Approach. In Proceedings of the ASA Section on Survey Research Methods, Alexandria, VA, USA, 2012; pp. 4088–4099.
  18. Quatember, A. Pseudo-Populations—A Basic Concept in Statistical Surveys; Springer: New York, NY, USA, 2015.
  19. Quatember, A. The Finite Population Bootstrap—From the Maximum Likelihood to the Horvitz-Thompson Approach. Austrian J. Stat. 2014, 43, 93–102.
  20. Conti, P.L.; Mecatti, F.; Nicolussi, F. Efficient unequal probability resampling from finite populations. Comput. Stat. Data Anal. 2022, 167, 107366.
  21. Thompson, S.K. Sampling, 3rd ed.; Wiley: New York, NY, USA, 2012.
  22. Thompson, S.K. Adaptive and Network Sampling for Inference and Interventions in Changing Populations. J. Surv. Stat. Methodol. 2017, 5, 1–21.
Update Date: 31 Mar 2022