Sampling terminology is incorrect #99

I am a sampling statistician. Your terminology is not compatible with any of the existing sampling books. 

What you call [complete random sampling](https://declaredesign.org/r/randomizr/reference/complete_rs.html) is hard to name, but the term you came up with is not used in sampling work. What you call [simple random sampling](https://declaredesign.org/r/randomizr/reference/simple_rs.html) is called [Poisson sampling](https://en.wikipedia.org/wiki/Poisson_sampling). The definition of simple random sampling (SRS) is that all samples of size n from the finite population of size N have equal probabilities of selection. Equal probabilities of selection for individual units are not a sign of SRS. When I taught sampling to grad students, an exercise I used in the sampling section of the survey statistics class was to come up with multiple designs that are equal probability but not SRS (one design for a C, since you need to know at least one to understand what SRS is vs. what it isn't; three designs for a B; five different designs for an A).
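To make the distinction concrete, here is a minimal sketch in base R of one equal-probability design that is not SRS (systematic sampling), next to independent unit-by-unit selection, which yields a random sample size. The population size, sample size, and all object names are made up for illustration; nothing here comes from `randomizr`.

```r
# Illustrative population and sample sizes (assumptions, not from randomizr)
N <- 12
n <- 3
pi_target <- n / N  # desired inclusion probability of every unit

# Systematic sampling: pick a random start in 1..k, then take every k-th unit.
# Enumerate all k possible samples, each selected with probability 1/k.
k <- N / n
systematic_samples <- lapply(seq_len(k), function(start) seq(start, N, by = k))

# Each unit appears in exactly one of the k samples, so its inclusion
# probability is 1/k = n/N: an equal-probability design.
incl_sys <- tabulate(unlist(systematic_samples), nbins = N) / k

# But only k samples are possible, whereas SRS gives equal probability
# to every one of the choose(N, n) subsets of size n.
n_sys <- length(systematic_samples)
n_srs <- choose(N, n)

# Independent selection of each unit with probability pi_target is also
# equal probability, but the realized sample size is random, so it is
# not a fixed-size design at all.
set.seed(1)
bern_sizes <- replicate(1000, sum(runif(N) < pi_target))

print(incl_sys)
print(c(systematic = n_sys, srs = n_srs))
print(table(bern_sizes))
```

Both designs give every unit the same inclusion probability, yet neither assigns equal probability to all samples of size n, which is the defining property of SRS.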

I understand that you are deep in development with a lot of parts that have already been documented, exposed, put on the website, published, etc. Still... this is fundamentals.

A standard introductory reference on sampling is [Lohr (3rd edition, 2022)](https://www.routledge.com/Sampling-Design-and-Analysis/Lohr/p/book/9780367279509). The next level is probably [Kish (1965)](https://www.amazon.com/Survey-Sampling-Leslie-Kish/dp/0471109495). The graduate-level textbooks are [Valliant, Dever, and Kreuter (2018)](https://link.springer.com/book/10.1007/978-3-319-93632-1), accompanied by `library(PracTools)`, and [Tillé (2006)](https://www.amazon.com/Sampling-Algorithms/dp/B00A2MO070), accompanied by `library(sampling)` written by his student.

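A technical problem that you have is that the unequal probability sampling you use is not a proper probability proportional to size (PPS) sample: if you simulate, you will see that the probabilities of selection are not maintained. The problem is with the underlying `base::sample()`, which, when sampling without replacement, applies the weights sequentially to the units that remain after each draw, so the resulting inclusion probabilities do not match the requested ones. Proper procedures are implemented in `library(sampling)` by Alina Matei and Yves Tillé, e.g. `sampling::UPmaxentropy()`.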

```
# Example from Tillé (2006)
set.seed(25)
# Target inclusion probabilities; their sum is the expected sample size (3)
uneq_p <- c(0.07, 0.17, 0.41, 0.61, 0.83, 0.91)
n_sims <- 20000
# round() guards against floating-point error in the sum being truncated to 2
n <- round(sum(uneq_p))

# base R: weighted draws without replacement
sim_uneq_p_base <- matrix(0, nrow = n_sims, ncol = length(uneq_p))
for (k in seq_len(n_sims)) {
  this <- sample.int(n = length(uneq_p), size = n, prob = uneq_p)
  sim_uneq_p_base[k, this] <- 1
}
# Empirical inclusion frequencies vs. the targets
colMeans(sim_uneq_p_base)
uneq_p

# Done right
sim_uneq_p_done_right <- matrix(0, nrow = n_sims, ncol = length(uneq_p))
for (k in seq_len(n_sims)) {
  sim_uneq_p_done_right[k, ] <- sampling::UPmaxentropy(uneq_p)
}
# Empirical inclusion frequencies vs. the targets
colMeans(sim_uneq_p_done_right)
uneq_p
```
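The same discrepancy can be checked without simulation. The sketch below computes exact inclusion probabilities for successive weighted draws without replacement, which is how the R documentation for `sample()` describes the `prob` argument when `replace = FALSE`. The helper `seq_incl_prob()` is a hypothetical name written for this illustration; it enumerates all ordered samples, so it is only practical for small populations.

```r
# Hypothetical helper: exact inclusion probabilities when n units are drawn
# one at a time without replacement, each draw proportional to the weights
# of the units still remaining.
seq_incl_prob <- function(w, n) {
  N <- length(w)
  incl <- numeric(N)
  # Walk the tree of ordered samples, carrying the probability of each path
  recurse <- function(chosen, prob) {
    if (length(chosen) == n) {
      incl[chosen] <<- incl[chosen] + prob
      return(invisible(NULL))
    }
    remaining <- setdiff(seq_len(N), chosen)
    w_rem <- w[remaining] / sum(w[remaining])
    for (i in seq_along(remaining)) {
      recurse(c(chosen, remaining[i]), prob * w_rem[i])
    }
  }
  recurse(integer(0), 1)
  incl
}

uneq_p <- c(0.07, 0.17, 0.41, 0.61, 0.83, 0.91)
exact_seq <- seq_incl_prob(uneq_p, n = 3)

# Side-by-side comparison of requested and realized inclusion probabilities
print(round(cbind(target = uneq_p, sequential = exact_seq), 4))
```

Under this scheme the units with small weights end up included more often than requested and the units with large weights less often, which is the pattern the simulation above shows for `sample.int()`.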

