Created attachment 1933 [details] probability vector p Although it does not say explicitly in the documentation, the following code will (generally) return true: a <- b <- rep(NA_integer_, 1e4) set.seed(123) for (i in 1:1e4) { this.p <- runif(6) this.seed <- .Random.seed a[i] <- sample(1:6, 1, p=this.p) .Random.seed <<- this.seed b[i] <- sample(c(rep(0, i), 1:6), 1, p=c(rep(0, i), this.p)) } all(a==b) i.e. the result of sample does not change if we add elements to x which have a probability weight of 0. However, I came across one particular probability vector p = (0.17942180, 0.12002511, 0.45462425, 0.07368815, 0.05221558, 0.12002511) [see attached probs.Rdata] for which this is not (always) true: load("probs.Rdata") a <- b <- rep(NA_integer_, 1e4) set.seed(123) for (i in 1:1e4) { this.seed <- .Random.seed a[i] <- sample(1:6, 1, p=p) .Random.seed <<- this.seed b[i] <- sample(c(rep(0, i), 1:6), 1, p=c(rep(0, i), p)) } all(a==b) which returns false. If we inspect the logical vector a==b more closely, we find huge blocks of TRUE: all((a==b)[8192:1e4]) all((a==b)[2048:4078]) but we also find blocks with a lot of FALSE.

It appears that this happens because a sort operation performed by sample() is not stable, i.e. does not preserve the order of equal values, in this case equal probabilities. If we modify the R (development version) source code as follows, <diff> --- src/main/random.c (revision 69623) +++ src/main/random.c (working copy) @@ -409,6 +409,11 @@ /* Sort probabilities into descending order */ /* Order element identities in parallel */ revsort(p, perm, n); + for (i = 0; i < n; i++) { + if (p[i] > 0) { + Rprintf("%d: %f\n", perm[i], p[i]); + } + } /* Compute the sample */ totalmass = 1; </diff> , we can examine how the probabilities are ordered. Then, by manually executing the first two rounds of the example loop, we see that on the second round, item 6 comes before item 2 when there are no zero probabilities, but item 8 (previously 6) comes _after_ item 4 (previously 2) when the two zero probabilities are included. This is just an observation. I'm not saying that the sort should be stable. > set.seed(123) > this.seed <- .Random.seed > sample(1:6, 1, p=p) 3: 0.454624 1: 0.179422 6: 0.120025 2: 0.120025 4: 0.073688 5: 0.052216 [1] 3 > .Random.seed <<- this.seed > i <- 1 > sample(c(rep(0, i), 1:6), 1, p=c(rep(0, i), p)) 4: 0.454624 2: 0.179422 7: 0.120025 3: 0.120025 5: 0.073688 6: 0.052216 [1] 3 > i <- 2 > this.seed <- .Random.seed > sample(1:6, 1, p=p) 3: 0.454624 1: 0.179422 6: 0.120025 2: 0.120025 4: 0.073688 5: 0.052216 [1] 2 > .Random.seed <<- this.seed > sample(c(rep(0, i), 1:6), 1, p=c(rep(0, i), p)) 5: 0.454624 3: 0.179422 4: 0.120025 8: 0.120025 6: 0.073688 7: 0.052216 [1] 6