Mirsad Todorovac wrote:
Here's another way to get independent bits, each with probability p=k/2^m, with smaller overhead.
But this one relies on the internal representation again. My point was to get rid of it.
Why I still want to have something "internal" generating random sets of numbers is that I'd like a common user that might lack your knowledge on the matter to be able to use them, for example, for statistical purposes, or as a source of random data for program testing ...
I think they might deserve a function or two in RTS, or at least function or two in a unit.
Feel free to write one. (I suggest a non-RTS unit where you can also put various other stuff related to sets or random that you find useful. Since we have GPCUtils, StringUtils, FileUtils units, a SetUtils or RandomUtils unit would fit nicely.)
Emil Jerabek wrote:
IMHO, the
for i:= 0 to MaxN-1 do if Random (MaxN) <= n then Include (A, i);
(I think Frank suggested it) is better, even though it's perhaps slower. It has guaranteed O(n), at least, and it's very easy to parametrize set with p=n/MaxN probability of each bit being set.
I'm afraid we are confusing two distinct problems. In one mail you asked for a method to generate sets of size n, and that's what the "repeat r:=Random(MaxN) until..." stuff does. This "if Random(MaxN) < n ..." generates a different distribution, namely it gives MaxN independent bits, each being set with probability p (in other words, P(A=X) = p^j * (1-p)^{MaxN-j} for any set X of size j, j=0,..,MaxN). I agree that this distribution is more natural. It just doesn't make much sense to compare those two algorithms, because they serve different tasks. Choose what you want to get, and then pick an appropriate algorithm.
I think so far we have the following useful algorithms:
- independent bits, arbitrary p: as above (for p = n / MaxN) or `Random < p' (for really arbitrary p)
- independent bits, p = 0.5, faster: get 32 bit random word, set 32 elements from it (variants for p = 0.25, 0.75, ...)
- exactly n bits set: Emil's suggestion ("randomized O(n)") or card deck shuffle algorithm (non-randomized O(n), a little more work to implement).
Frank