On Thu, May 15, 2003 at 03:56:42PM +0200, Mirsad Todorovac wrote:
On Thu, 15 May 2003, Emil Jerabek wrote:
On Tue, May 13, 2003 at 12:47:23PM +0200, Mirsad Todorovac wrote:
Hi, Frank, others!
Here is the first test I wrote using the GPC manual's instructions for writing randomized tests.
It concerns sets, since they have been a place where bugs have appeared in the past -- even though Frank's implementation seems invariant to the number of bits the ``MedCard'' type might have at some time in the future.
BTW, there seems to be a need for something called a "random set" of data, perhaps even with a given ``Monte Carlo'' function, to get a distribution different from the uniform one.
The plainest random set could be the one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set.
Now I looked into the implementation, and it does something quite different:
for n := 1 to MaxN do
begin
  A := [];
  for j := 0 to n do
    Include (A, Random (MaxN));
This makes the probability of a bit being set 1/e=0.3678..., not 0.5 (moreover the bits are far from being independent).
Thank you, Emil, for pointing this out -- the probability is not 0.5, except perhaps in the case where n = MaxN/2. It's approximately p = n/MaxN.
The point here is that it is _less_ than n/MaxN (except when n is much smaller than MaxN). For example, for n = MaxN/2, the probability is 1 - ((MaxN-1)/MaxN)^n = 1 - (1 - 1/MaxN)^(MaxN/2), which is approximately 1 - Sqrt(1/e) = 0.393... (The number 1/e I gave in the first mail is just the average of the probabilities over all n = 1..MaxN.) I.e.: the fact that the same number may be set more than once has a _significant_ effect on the resulting distribution.
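One can check this empirically. Here is a minimal sketch (my own; MaxN, Trials, and the watched bit are arbitrary, and it uses n draws rather than the n+1 of the quoted loop) that estimates the probability of a fixed bit being set and compares it with 1 - (1 - 1/MaxN)^n:

program CheckBitProb;

const
  MaxN = 256;      { arbitrary universe size for this sketch }
  Trials = 10000;  { number of random sets to generate }

var
  A: set of 0 .. MaxN - 1;
  n, j, t, Hits: Integer;

begin
  Randomize;
  n := MaxN div 2;
  Hits := 0;
  for t := 1 to Trials do
  begin
    A := [];
    for j := 1 to n do
      Include (A, Random (MaxN));
    if 0 in A then  { watch one fixed bit }
      Inc (Hits)
  end;
  WriteLn ('observed : ', Hits / Trials : 0 : 4);
  WriteLn ('predicted: ', 1 - Exp (n * Ln (1 - 1 / MaxN)) : 0 : 4)
end.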
The bits are as independent as the results from Random(MaxN) are, IMHO: if Random(MaxN) always gives a result in the range 0..MaxN-1, the probability of each bit being set should be slightly lower than n/MaxN.
This doesn't have much to do with independence. The bits are not even pairwise independent: the probability of one bit being set is p=1-(1-1/MaxN)^n, and the probability of two (distinct) bits being set simultaneously is 1-2(1-1/MaxN)^n+(1-2/MaxN)^n, which is different from p^2. This difference is small, but it gets bigger if you take more than 2 bits into account.
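To see a concrete instance of the difference (a sketch with arbitrary values MaxN = 256, n = MaxN/2), the two quantities can be computed directly:

program CheckPairwise;

const
  MaxN = 256;  { arbitrary for this sketch }

var
  n: Integer;
  q1, q2, p: Real;

begin
  n := MaxN div 2;
  q1 := Exp (n * Ln (1 - 1 / MaxN));  { (1 - 1/MaxN)^n }
  q2 := Exp (n * Ln (1 - 2 / MaxN));  { (1 - 2/MaxN)^n }
  p := 1 - q1;
  WriteLn ('p^2   = ', Sqr (p) : 0 : 6);
  WriteLn ('joint = ', 1 - 2 * q1 + q2 : 0 : 6)
end.

For these values the pairwise probability comes out at about 0.1546, slightly below p^2 = 0.1553.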
Lower, because a bit can be set twice or more times due to the randomized nature of the test, which results in fewer than n bits being set out of the total MaxN.
Other than this effect (the probability of overlap rises with set density), the distribution of set elements is expected to be rather uniform.
Also, the diagnostics on failure won't work:
if Failed then
begin
  WriteLn ('failed (', RandomSeed, ') ',
    'After Include (setA, ', j1, ') ', j1, ' NOT IN!!!');
  WriteLn ('dumping set [0..', MaxN, ']:');
  for j := 0 to MaxN div SetWordSize do
    WriteLn (j);
  Halt
end
This just prints the same sequence of numbers every time; something like `Bar[j]' is missing here, I guess.
You're absolutely right, Emil -- Bar[j] is exactly what was supposed to be here.
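For the record, the intended dump loop would then read:

for j := 0 to MaxN div SetWordSize do
  WriteLn (Bar [j]);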
Considering the use of Bar[j], here is a method of getting a fair 0.5 probability of each bit being set:
m := MaxN div SetWordSize - 1;
for j := 0 to m do
  Bar[j] := Random (High (MedCard));
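This presupposes that Bar is a word-array view of SetA's storage; one possible declaration -- a sketch, not necessarily what the actual test does, assuming MaxN is a multiple of SetWordSize and that GPC lays a set out as an array of words -- would be:

type
  TSetA = set of 0 .. MaxN - 1;
  TBar = array [0 .. MaxN div SetWordSize - 1] of MedCard;

var
  SetA: TSetA;
  Bar: TBar absolute SetA;  { BP-style overlay of the set's storage }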
Yes, this is a good way to get the uniform distribution.
However, this also relies on Random () providing pseudo-random numbers of good quality, with 0.5 probability of each bit being set. True, it's also much faster than initializing bit by bit via Include (SetA, Random(MaxN)).
We could read from some real random source if we needed truly random numbers, such as those required for security, but I think the Random() function suffices here, wouldn't you agree?
Yes, GPC's PRNG is quite good, and we are not talking about any mission-critical cryptological application.
The problem, however, is how to efficiently generate a uniform RandomSet() with p <> 0.5 via Bar[j in 0..m] := Random(High(MedCard)) -- we would need a completely new type of pseudo-random function here!
What do you mean by uniform? By definition, there is only one uniform distribution once you fix the universe. It gives p=0.5.
So I assume you meant the binomial distribution (i.e., choose MaxN independent bits, each with probability p).
If p is an integral multiple of a power of 2, say p=k/2^m, you may generate one bit from m Random() bits by
for j := 0 to MaxN - 1 do
  if Random (1 shl m) < k then
    Include (A, j);
With some extra shuffling, 32 bits per Random() call may be used, which will make it reasonably fast. For example, if m divides 32:
q1 := (1 shl m) - 1;
q2 := (BitSizeOf (MedCard) div m) - 1;
for j := 0 to MaxN - 1 do
begin
  if (j and q2) = 0 then
    rnd := Random (High (MedCard));
  if (rnd and q1) < k then
    Include (A, j);
  Shr (rnd, m)
end;
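To put concrete numbers on this (my choice of values, not from the thread): p = 3/16 gives k = 3 and m = 4, hence q1 = 15; with a 64-bit MedCard, q2 = (64 div 4) - 1 = 15 as well, so each Random() call supplies 16 four-bit slices.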
For general (real) p, this works as well, but is less efficient:
for j := 0 to MaxN - 1 do
  if Random < p then
    Include (A, j);
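Wrapped up as a self-contained helper, this might look as follows (my packaging, with the hypothetical name RandomSetP and the TSetA type sketched above; the loop is exactly the one given):

{ Hypothetical helper; assumes the test's MaxN and TSetA. }
procedure RandomSetP (var A: TSetA; p: Real);
var
  j: Integer;
begin
  A := [];
  for j := 0 to MaxN - 1 do
    if Random < p then  { argumentless Random: uniform real in [0, 1) }
      Include (A, j)
end;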
Another problem with the `Include (SetA, Random (MaxN))' for i in 1..n method is that the denser SetA gets, the more of the Include()-ed bits will overlap with bits already set.
The probability that we bump into an already set bit is i/MaxN, which can grow close to 1.0 -- which means, as you can see, that the total probability may still be uniform, but it will drop below the desired p = n/MaxN.
Still not sure what you mean by uniform here. Anyway, it is possible to fix the `Include (SetA, Random (MaxN))' method so that it always produces a set of size n.
For n small:
A := [];
for j := 1 to n do
begin
  repeat
    r := Random (MaxN)
  until not (r in A);
  Include (A, r)
end;
For bigger n (where the repeat loop above would waste more and more draws on collisions):
A := [];
for j := 0 to n - 1 do
begin
  r := Random (MaxN - j);
  { find the r-th element of A's complement: }
  k := 0;
  repeat
    if not (k in A) then
      Dec (r);
    if r < 0 then
      break;
    Inc (k)
  until False;
  Include (A, k)
end;
For n > MaxN/2, generate a set of size MaxN-n, and take its complement.
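Putting the cases together might look like this (a sketch; RandomExactSet is a hypothetical name standing for either of the two fragments above):

{ Hypothetical dispatcher: always generate the smaller side. }
if n <= MaxN div 2 then
  RandomExactSet (A, n)
else
begin
  RandomExactSet (A, MaxN - n);
  A := [0 .. MaxN - 1] - A  { and take the complement }
end;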
Emil