On Thu, May 15, 2003 at 03:56:42PM +0200, Mirsad Todorovac wrote:
On Thu, 15 May 2003, Emil Jerabek wrote:
On Tue, May 13, 2003 at 12:47:23PM +0200, Mirsad Todorovac wrote:
Hi, Frank, others!
Here is the first test I wrote using the GPC manual's instructions for writing randomized tests.
It concerns sets, since they have been a place where bugs have appeared in the past -- even though Frank's implementation seems invariant to the number of bits the ``MedCard'' type might have at some time in the future.
BTW, there seems to be a need for something called a "random set" of data, perhaps even with a given ``Monte Carlo'' function, to get a distribution different from the uniform one.
The plainest random set could be the one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set.
Now I looked into the implementation, and it does something quite different:
for n := 1 to MaxN do
begin
  A := [];
  for j := 0 to n do
    Include (A, Random (MaxN));
This makes the probability of a bit being set 1/e=0.3678..., not 0.5 (moreover the bits are far from being independent).
Thank you, Emil, for pointing this out -- the probability is not 0.5, except perhaps in the case where n = MaxN/2. It's approximately p = n/MaxN.
The point here is that it is _less_ than n/MaxN (except when n is much smaller than MaxN). For example, for n = MaxN/2, the probability is 1 - ((MaxN-1)/MaxN)^n = 1 - (1 - 1/MaxN)^(MaxN/2), which is approximately 1 - Sqrt(1/e) = 0.393... (The number 1/e I gave in the first mail is just the average of the probabilities over all n = 1..MaxN.) I.e.: the fact that the same number may be set more than once has a _significant_ effect on the resulting distribution.
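One can check this empirically. Here is a minimal sketch (my own; MaxN, Trials, and the watched bit are arbitrary, and it uses n draws rather than the n+1 of the quoted loop) that estimates the probability of a fixed bit being set and compares it with 1 - (1 - 1/MaxN)^n:

program CheckBitProb;

const
  MaxN = 256;      { arbitrary universe size for this sketch }
  Trials = 10000;  { number of random sets to generate }

var
  A: set of 0 .. MaxN - 1;
  n, j, t, Hits: Integer;

begin
  Randomize;
  n := MaxN div 2;
  Hits := 0;
  for t := 1 to Trials do
  begin
    A := [];
    for j := 1 to n do
      Include (A, Random (MaxN));
    if 0 in A then  { watch one fixed bit }
      Inc (Hits)
  end;
  WriteLn ('observed : ', Hits / Trials : 0 : 4);
  WriteLn ('predicted: ', 1 - Exp (n * Ln (1 - 1 / MaxN)) : 0 : 4)
end.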
The bits are as independent as the results from Random(MaxN) are, IMHO: if Random(MaxN) always gives a result in the range 0..MaxN-1, the probability of each bit being set should be slightly lower than n/MaxN.
This doesn't have much to do with independence. The bits are not even pairwise independent: the probability of one bit being set is p=1-(1-1/MaxN)^n, and the probability of two (distinct) bits being set simultaneously is 1-2(1-1/MaxN)^n+(1-2/MaxN)^n, which is different from p^2. This difference is small, but it gets bigger if you take more than 2 bits into account.
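To see a concrete instance of the difference (a sketch with arbitrary values MaxN = 256, n = MaxN/2), the two quantities can be computed directly:

program CheckPairwise;

const
  MaxN = 256;  { arbitrary for this sketch }

var
  n: Integer;
  q1, q2, p: Real;

begin
  n := MaxN div 2;
  q1 := Exp (n * Ln (1 - 1 / MaxN));  { (1 - 1/MaxN)^n }
  q2 := Exp (n * Ln (1 - 2 / MaxN));  { (1 - 2/MaxN)^n }
  p := 1 - q1;
  WriteLn ('p^2   = ', Sqr (p) : 0 : 6);
  WriteLn ('joint = ', 1 - 2 * q1 + q2 : 0 : 6)
end.

For these values the pairwise probability comes out at about 0.1546, slightly below p^2 = 0.1553.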
Lower, because a bit can be set twice or more times due to the randomized nature of the test, which results in fewer than n bits being set out of the total MaxN.
Other than this effect (the probability of overlap rises with set density), the distribution of set elements is expected to be rather uniform.
Also, the diagnostics on failure won't work:
if Failed then
begin
  WriteLn ('failed (', RandomSeed, ') ',
    'After Include (setA, ', j1, ') ', j1, ' NOT IN!!!');
  WriteLn ('dumping set [0..', MaxN, ']:');
  for j := 0 to MaxN div SetWordSize do
    WriteLn (j);
  Halt
end
This just prints the same sequence of numbers every time; something like `Bar[j]' is missing here, I guess.
You're absolutely right, Emil -- Bar[j] is exactly what was supposed to be here.
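For the record, the intended dump loop would then read:

for j := 0 to MaxN div SetWordSize do
  WriteLn (Bar [j]);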
Considering the use of Bar[j], here is a method of getting a fair 0.5 probability of each bit being set:
m := MaxN div SetWordSize - 1;
for j := 0 to m do
  Bar[j] := Random (High (MedCard));
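This presupposes that Bar is a word-array view of SetA's storage; one possible declaration -- a sketch, not necessarily what the actual test does, assuming MaxN is a multiple of SetWordSize and that GPC lays a set out as an array of words -- would be:

type
  TSetA = set of 0 .. MaxN - 1;
  TBar = array [0 .. MaxN div SetWordSize - 1] of MedCard;

var
  SetA: TSetA;
  Bar: TBar absolute SetA;  { BP-style overlay of the set's storage }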
Yes, this is a good way to get the uniform distribution.
However, this also relies on Random () providing pseudo-random numbers of good quality, with 0.5 probability of each bit being set. True, it's also much faster than initializing bit by bit via Include (SetA, Random(MaxN)).
We could read from some real random source if we needed truly random numbers, such as those required for security, but I think the Random() function suffices here, wouldn't you agree?
Yes, GPC's PRNG is quite good, and we are not talking about any mission-critical cryptological application.
The problem, however, is how to efficiently generate a uniform RandomSet() with p <> 0.5 via Bar[j in 0..m] := Random(High(MedCard)) -- we would need a completely new type of pseudo-random function here!
What do you mean by uniform? By definition, there is only one uniform distribution once you fix the universe. It gives p=0.5.
So I assume you meant the binomial distribution (i.e., choose MaxN independent bits, each with probability p).
If p is an integral multiple of a power of 2, say p=k/2^m, you may generate one bit from m Random() bits by
for j := 0 to MaxN - 1 do
  if Random (1 shl m) < k then
    Include (A, j);
With some extra shuffling, 32 bits per Random() call may be used, which will make it reasonably fast. For example, if m divides 32:
q1 := (1 shl m) - 1;
q2 := (BitSizeOf (MedCard) div m) - 1;
for j := 0 to MaxN - 1 do
begin
  if (j and q2) = 0 then
    rnd := Random (High (MedCard));
  if (rnd and q1) < k then
    Include (A, j);
  Shr (rnd, m)
end;
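To put concrete numbers on this (my choice of values, not from the thread): p = 3/16 gives k = 3 and m = 4, hence q1 = 15; with a 64-bit MedCard, q2 = (64 div 4) - 1 = 15 as well, so each Random() call supplies 16 four-bit slices.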
For general (real) p, this works as well, but is less efficient:
for j := 0 to MaxN - 1 do
  if Random < p then
    Include (A, j);
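Wrapped up as a self-contained helper, this might look as follows (my packaging, with the hypothetical name RandomSetP and the TSetA type sketched above; the loop is exactly the one given):

{ Hypothetical helper; assumes the test's MaxN and TSetA. }
procedure RandomSetP (var A: TSetA; p: Real);
var
  j: Integer;
begin
  A := [];
  for j := 0 to MaxN - 1 do
    if Random < p then  { argumentless Random: uniform real in [0, 1) }
      Include (A, j)
end;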
Another problem with the `Include (SetA, Random (MaxN))' for i in 1..n method is that the denser SetA gets, the more of the Include()-ed bits will overlap with bits already set.
The probability that we bump into an already set bit is i/MaxN, which can grow close to 1.0 -- which means, as you can see, that the total probability may still be uniform, but it will drop below the desired p = n/MaxN.
Still not sure what you mean by uniform here. Anyway, it is possible to fix the `Include (SetA, Random (MaxN))' method so that it always produces a set of size n.
For n small:
A := [];
for j := 1 to n do
begin
  repeat
    r := Random (MaxN)
  until not (r in A);
  Include (A, r)
end;
For bigger n (where the repeat loop above would waste more and more draws on collisions):
A := [];
for j := 0 to n - 1 do
begin
  r := Random (MaxN - j);
  { find the r-th element of A's complement: }
  k := 0;
  repeat
    if not (k in A) then
      Dec (r);
    if r < 0 then
      break;
    Inc (k)
  until False;
  Include (A, k)
end;
For n > MaxN/2, generate a set of size MaxN-n, and take its complement.
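Putting the cases together might look like this (a sketch; RandomExactSet is a hypothetical name standing for either of the two fragments above):

{ Hypothetical dispatcher: always generate the smaller side. }
if n <= MaxN div 2 then
  RandomExactSet (A, n)
else
begin
  RandomExactSet (A, MaxN - n);
  A := [0 .. MaxN - 1] - A  { and take the complement }
end;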
Emil