On Tue, May 20, 2003 at 12:40:51PM +0200, Mirsad Todorovac wrote:
On Tue, 20 May 2003, Frank Heckenbach wrote:
Emil Jerabek wrote:
On Fri, May 16, 2003 at 07:14:46PM +0200, Frank Heckenbach wrote:
Emil Jerabek wrote:
For n small:
A := []; for j := 1 to n do begin repeat r := Random(MaxN) until not (r in A); Include (A, r) end;
I agree. Also, on a second look, the first algorithm ("for n small") gives O(n) as well, even for n as big as MaxN/2 (the expected number of calls to Random is 1.386 n).
Yet, Frank, Emil, I'm uncomfortable with this repeat r := Random (MaxN) until not (r in A) ...
I get a feeling that this cripples linearity of generated set, and I suspect p(n in A), n in [0..MaxN-1] is no longer constant and equal to n/MaxN ...
It is equal to n/MaxN, and in fact, P(A=X) = 1/\binom{MaxN,n} for any set X with n elements. (Proof: the distribution is obviously invariant under permutations.)
Besides, consider the overhead if the set is near-full, and desired n/MaxN is for example 0.95 ... At the end of filling set we'd end to 20 calls to Random (MaxN) for each next bit.
That's why I suggested to use it only for n/MaxN <= 0.5. To get bigger n, generate A for p = 1-n/MaxN < 0.5, and take [0..MaxN-1]-A.
IMHO, the
for i:= 0 to MaxN-1 do if Random (MaxN) <= n then Include (A, i);
(I think Frank suggested it) is better, even though it's perhaps slower. It has guaranteed O(n), at least, and it's very easy to parametrize set with p=n/MaxN probability of each bit being set.
I'm afraid we are confusing two distinct problems. In one mail you asked for a method to generate sets of size n, and that's what the "repeat r:=Random(MaxN) until..." stuff does. This "if Random(MaxN) < n ..." generates a different distribution, namely it gives MaxN independent bits, each being set with probability p (in other words, P(A=X) = p^j * (1-p)^{MaxN-j} for any set X of size j, j=0,..,MaxN). I agree that this distribution is more natural. It just doesn't make much sense to compare those two algorithms, because they serve different tasks. Choose what you want to get, and then pick an appropriate algorithm.
Emil