USP Electronic Research Repository

stratifyR: An R Package for optimal stratification and sample allocation for univariate populations

Reddy, Karuna and Khan, Mohammad G.M. (2020) stratifyR: An R Package for optimal stratification and sample allocation for univariate populations. Australian and New Zealand Journal of Statistics, 62 (3). pp. 383-405. ISSN 1369-1473

PDF - Published Version (674kB)
Restricted to Repository staff only


Summary

This R package determines optimal stratification of univariate populations under stratified sampling designs using a parametric-based method. It determines the optimum strata boundaries (OSB), optimum sample sizes (OSS) and several other quantities for the study variable, y, using the best-fit probability density function of the study variable estimated from available survey data. The method requires the parameters and other characteristics of the distribution of the study variable to be known, either from available data or from a hypothetical distribution if the data are not available. In the implementation, the problem of determining the OSB is formulated as a mathematical programming problem and solved using a dynamic programming technique. If data on the study variable are available to the surveyor, the method estimates the best-fit distribution and determines the OSB and OSS under Neyman allocation directly. When no dataset is available, stratification is based on the assumption that the values of the study variable, y, can be treated as hypothetical realisations of proxy values of y from past or recent surveys; this approach therefore requires certain distributional assumptions about the study variable. At present, the package handles stratification for populations in which the study variable follows one of these continuous distributions: Pareto, triangular, right-triangular, Weibull, gamma, exponential, uniform, normal, lognormal and Cauchy. In this paper, applications of the package's major functions are illustrated with a number of real, simulated and hypothetical populations.
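To make the general idea in the summary concrete (OSB found by dynamic programming, then OSS under Neyman allocation), here is a minimal Python sketch. It is not the stratifyR implementation and does not use its API: the package works with a fitted parametric density, whereas this sketch searches cut points directly over sorted sample values. The sketch assumes that, under Neyman allocation and ignoring the finite-population correction, minimising the variance of the stratified mean reduces to minimising the sum over strata of W_h * S_h. The names optimal_strata and neyman_allocation, the min_size parameter, the midpoint convention for reporting boundaries and the largest-remainder rounding are all hypothetical choices made for illustration.

```python
import math


def optimal_strata(y, H, min_size=2):
    """Hypothetical helper: split data y into H contiguous strata that minimise
    sum_h W_h * S_h (the Neyman-allocation objective, ignoring the fpc)."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    y = sorted(y)
    N = len(y)
    if H * min_size > N:
        raise ValueError("not enough units for H strata of min_size each")
    # Prefix sums give O(1) mean and variance for any run y[i:j].
    p = [0.0] * (N + 1)
    q = [0.0] * (N + 1)
    for k, v in enumerate(y):
        p[k + 1] = p[k] + v
        q[k + 1] = q[k] + v * v

    def cost(i, j):
        # W_h * S_h for stratum y[i:j]; S_h uses divisor N_h for simplicity.
        m = j - i
        mean = (p[j] - p[i]) / m
        var = max((q[j] - q[i]) / m - mean * mean, 0.0)
        return (m / N) * math.sqrt(var)

    INF = float("inf")
    # best[h][j] = smallest objective for placing y[:j] into h strata.
    best = [[INF] * (N + 1) for _ in range(H + 1)]
    back = [[-1] * (N + 1) for _ in range(H + 1)]
    best[0][0] = 0.0
    for h in range(1, H + 1):
        for j in range(h * min_size, N + 1):
            for i in range((h - 1) * min_size, j - min_size + 1):
                if best[h - 1][i] == INF:
                    continue
                # A boundary may only fall between two distinct values.
                if i > 0 and y[i - 1] == y[i]:
                    continue
                c = best[h - 1][i] + cost(i, j)
                if c < best[h][j]:
                    best[h][j], back[h][j] = c, i
    if best[H][N] == INF:
        raise ValueError("no feasible stratification")
    # Follow back-pointers from the last stratum to the first.
    cuts, j = [], N
    for h in range(H, 0, -1):
        i = back[h][j]
        cuts.append((i, j))
        j = i
    cuts.reverse()
    # Report each boundary as the midpoint between adjacent strata.
    osb = [(y[j - 1] + y[j]) / 2 for _, j in cuts[:-1]]
    strata = [y[i:j] for i, j in cuts]
    return osb, strata, best[H][N]


def neyman_allocation(strata, n):
    """Neyman allocation n_h proportional to N_h * S_h, rounded by largest
    remainder so that the n_h sum to n. No cap n_h <= N_h is applied."""
    weights = []
    for s in strata:
        m = len(s)
        mean = sum(s) / m
        weights.append(m * math.sqrt(sum((v - mean) ** 2 for v in s) / m))
    total = sum(weights)
    if total == 0:  # all strata constant: fall back to proportional allocation
        sizes = [len(s) for s in strata]
        raw = [n * m / sum(sizes) for m in sizes]
    else:
        raw = [n * w / total for w in weights]
    alloc = [math.floor(r) for r in raw]
    order = sorted(range(len(raw)), key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in order[: n - sum(alloc)]:
        alloc[k] += 1
    return alloc


if __name__ == "__main__":
    import random

    random.seed(1)
    population = [random.expovariate(1.0) for _ in range(200)]
    osb, strata, _ = optimal_strata(population, H=3)
    print("OSB:", [round(b, 3) for b in osb])
    print("OSS:", neyman_allocation(strata, n=30))
```

The exhaustive search over cut points costs O(H N^2) time, which is fine for a few hundred units but is the reason a density-based formulation, as used in the paper, scales better to large populations.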

Item Type: Journal Article
Uncontrolled Keywords: dynamic programming, mathematical programming problem, optimum sample sizes, optimum strata boundaries, R project for statistical computing
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science, Technology and Environment (FSTE) > School of Computing, Information and Mathematical Sciences
Depositing User: Fulori Nainoca - Waqairagata
Date Deposited: 20 Oct 2020 02:56
Last Modified: 20 Oct 2020 02:56