Hi, Frank, others!
Here is the first test I wrote using the GPC manual's instructions for writing randomized tests.

It concerns sets, since they have been a place where bugs have appeared in the past - even though Frank's implementation seems invariant to the number of bits the ``MedCard'' type may have sometime in the future.

BTW, there seems to be a need for something called a "random set" of data, perhaps even with a given ``Monte Carlo'' function, to get a distribution other than the uniform one.

The plainest random set could be one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set. Initializing them element by element is inefficient, and relying on an internal implementation that could change without prior notice can backfire - hence IMHO it would be great to have a function like e.g.
RandomSet{,Uniform}(LBound, UBound)
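One possible shape for it could be this (just a sketch with an illustrative fixed set type -- a real RTS version would have to work for arbitrary set types):

  type
    TElem = 0 .. 1023;  { illustrative element range }
    TElemSet = set of TElem;

  { Include each element of LBound .. UBound independently
    with probability 0.5. }
  procedure RandomSetUniform (var A: TElemSet; LBound, UBound: TElem);
  var
    i: TElem;
  begin
    A := [];
    for i := LBound to UBound do
      if Random (2) = 0 then
        Include (A, i)
  end;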
Please consider this; IMHO right now it could be used for testing purposes, yet probably other uses could be found ...
All the best, Mirsad
On Tue, May 13, 2003 at 12:47:23PM +0200, Mirsad Todorovac wrote:
BTW, there seems to be a need for something called a "random set" of data, perhaps even with a given ``Monte Carlo'' function, to get a distribution other than the uniform one.

The plainest random set could be one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set.
Just an idea. GMP provides the following function:
 - Function: void mpz_rrandomb (mpz_t ROP, gmp_randstate_t STATE,
   unsigned long int N)
     Generate a random integer with long strings of zeros and ones in
     the binary representation. Useful for testing functions and
     algorithms, since this kind of random numbers have proven to be
     more likely to trigger corner-case bugs. The random number will
     be in the range 0 to 2^N-1, inclusive.
A similar distribution might be useful for testing of GPC sets as well
Emil
Mirsad Todorovac wrote:
The plainest random set could be one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set. Initializing them element by element is inefficient,
You mean by calling Random (2) for each element? But you could, e.g. get one `Cardinal attribute (Size = 32)' with one Random call and set 32 elements from it. (That's the type internally used in the PRNG, so getting 64 bits at once probably doesn't gain anything.)
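For instance (a sketch; MaxN and Word32 are illustrative names, and note that Random (N) yields 0 .. N - 1, so strictly Random (High (Word32)) never produces the top word value -- the off-by-one is discussed later in this thread):

  const
    MaxN = 256;  { illustrative }
  type
    Word32 = Cardinal attribute (Size = 32);
  var
    A: set of 0 .. MaxN - 1;
    w: Word32;
    i: Integer;
  begin
    A := [];
    for i := 0 to MaxN - 1 do
      begin
        if i mod 32 = 0 then
          w := Random (High (Word32));  { one Random call per 32 elements }
        if Odd (w shr (i mod 32)) then
          Include (A, i)
      end
  end;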
Emil Jerabek wrote:
Just an idea. GMP provides the following function:
 - Function: void mpz_rrandomb (mpz_t ROP, gmp_randstate_t STATE,
   unsigned long int N)
     Generate a random integer with long strings of zeros and ones in
     the binary representation. Useful for testing functions and
     algorithms, since this kind of random numbers have proven to be
     more likely to trigger corner-case bugs. The random number will
     be in the range 0 to 2^N-1, inclusive.
A similar distribution might be useful for testing of GPC sets as well
This should be possible by using this procedure via the GMP unit and setting the set elements from it -- or by copying the algorithm used in GMP to work directly on the set (I don't know this algorithm, but you can find it in the GMP sources, of course).
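A rough sketch of working directly on the set in the spirit of mpz_rrandomb -- alternating runs of included and excluded elements with random run lengths (the run-length distribution here is only illustrative, not GMP's actual algorithm):

  var
    A: set of 0 .. MaxN - 1;
    i, j, RunLen: Integer;
    Ones: Boolean;
  begin
    A := [];
    i := 0;
    Ones := Random (2) = 0;         { random starting phase }
    while i < MaxN do
      begin
        RunLen := 1 + Random (16);  { illustrative run lengths 1 .. 16 }
        if i + RunLen > MaxN then
          RunLen := MaxN - i;       { clip the final run }
        if Ones then
          for j := i to i + RunLen - 1 do
            Include (A, j);
        Inc (i, RunLen);
        Ones := not Ones            { alternate runs of ones and zeros }
      end
  end;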
Frank
On Tue, May 13, 2003 at 12:47:23PM +0200, Mirsad Todorovac wrote:
The plainest random set could be one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set.
Now I looked into the implementation, and it does something quite different:
  for n := 1 to MaxN do
    begin
      A := [];
      for j := 0 to n do
        Include (A, Random(MaxN));
This makes the probability of a bit being set 1/e=0.3678..., not 0.5 (moreover the bits are far from being independent).
Also, the diagnostics on failure won't work:
  if Failed then
    begin
      WriteLn ('failed (', RandomSeed, ') ',
        'After Include (setA, ', j1, ') ', j1, ' NOT IN!!!');
      WriteLn ('dumping set [0..', MaxN, ']:');
      for j := 0 to MaxN div SetWordSize do
        WriteLn (j);
      Halt
    end
This just prints the same sequence of numbers every time, something like `Bar[j]' is missing here, I guess.
Emil
On Thu, 15 May 2003, Emil Jerabek wrote:
On Tue, May 13, 2003 at 12:47:23PM +0200, Mirsad Todorovac wrote:
The plainest random set could be one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set.
Now I looked into the implementation, and it does something quite different:
  for n := 1 to MaxN do
    begin
      A := [];
      for j := 0 to n do
        Include (A, Random(MaxN));
This makes the probability of a bit being set 1/e=0.3678..., not 0.5 (moreover the bits are far from being independent).
Thank you, Emil, for pointing this out -- the probability is not 0.5, except perhaps in the case where n = MaxN/2. It's approximately p = n/MaxN.

Bits are as independent as the results of Random(MaxN) are, IMHO: if Random(MaxN) always gives a result in the range 0..MaxN-1, the probability of each bit being set should be slightly lower than n/MaxN.

Lower because a bit can be set twice or more, due to the randomized nature of the test, which results in fewer than n bits set out of the total MaxN.

Other than this effect (the probability of overlap rises with set density), the distribution of set elements is expected to be a rather uniform one.
Also, the diagnostics on failure won't work:
  if Failed then
    begin
      WriteLn ('failed (', RandomSeed, ') ',
        'After Include (setA, ', j1, ') ', j1, ' NOT IN!!!');
      WriteLn ('dumping set [0..', MaxN, ']:');
      for j := 0 to MaxN div SetWordSize do
        WriteLn (j);
      Halt
    end
This just prints the same sequence of numbers every time, something like `Bar[j]' is missing here, I guess.
You're absolutely right, Emil -- Bar[j] is exactly what was supposed to be here.
Considering the use of Bar[j], here is a method of getting a fair 0.5 probability for each bit being set:
  m := MaxN div SetWordSize - 1;
  for j := 0 to m do
    Bar[j] := Random (High (MedCard));
However, this also relies on Random () providing pseudo-random numbers of good quality, with 0.5 probability of each bit being set. True, it's also much faster than initializing bit by bit via Include (SetA, Random(MaxN)).
We could read from some real random source if we need truly random numbers, such as required in security, but I think the Random() function suffices here, wouldn't you agree?
-----
1. The problem, however, is how to efficiently generate a uniform RandomSet() with p <> 0.5 using Bar[j in 0..m] := Random(High(MedCard))!

-- we need a completely new type of pseudo-random function here!

2. Another problem with the Include (SetA, Random(MaxN)), i in 1..n method is that the denser SetA is, the more the Include()-ed set bits will overlap.

The probability that we bump into an already set bit is i/MaxN, which can grow close to 1.0 -- which means, as you see, that the total probability may still be uniform, but it will drop below the desired p = n/MaxN.

If you have a suggestion how to generate these (uniform random sets with p (i in SetA) = const), please come forth.
Mirsad
P.S. thank you for the set dump fix.
Mirsad Todorovac wrote:
Considering the use of Bar[j], here is a method of getting a fair 0.5 probability for each bit being set:
  m := MaxN div SetWordSize - 1;
  for j := 0 to m do
    Bar[j] := Random (High (MedCard));
However, this also relies on Random () providing pseudo-random numbers of good quality, with 0.5 probability of each bit being set. True, it's also much faster than initializing bit by bit via Include (SetA, Random(MaxN)).
As I said, you can achieve this also without relying on the internal representation.
Loop over the set range, and for each element, if Random (2) = 0 (e.g.), include it. This can be adapted easily for other probabilities.
For efficiency, you can get a 32 bit random number and use it to set 32 elements. At least for p = 0.5 (for other probabilities, it can perhaps be adjusted, e.g., using 2 bits per element for p = 0.25 or p = 0.75).
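E.g. for p = 0.25, a sketch (16 random bits give 8 bit-pairs per Random call; the names are illustrative):

  for i := 0 to MaxN - 1 do
    begin
      if i mod 8 = 0 then
        w := Random (65536);  { 16 fresh random bits }
      { an element is included iff both of its two bits are zero,
        i.e. with probability 1/4: }
      if ((w shr (2 * (i mod 8))) and 3) = 0 then
        Include (A, i)
    end;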
We could read from some real random source if we need truly random numbers, such as required in security, but I think the Random() function suffices here, wouldn't you agree?
Surely. You need real randomness for cryptography etc., not for these things. The additional complication (since not every system has a readily available source of real randomness, to start with) is not worth it.
Frank
On Thu, May 15, 2003 at 03:56:42PM +0200, Mirsad Todorovac wrote:
On Thu, 15 May 2003, Emil Jerabek wrote:
On Tue, May 13, 2003 at 12:47:23PM +0200, Mirsad Todorovac wrote:
The plainest random set could be one with its MedCards initialized with random numbers, with probability 0.5 for each bit being set.
Now I looked into the implementation, and it does something quite different:
  for n := 1 to MaxN do
    begin
      A := [];
      for j := 0 to n do
        Include (A, Random(MaxN));
This makes the probability of a bit being set 1/e=0.3678..., not 0.5 (moreover the bits are far from being independent).
Thank you, Emil, for pointing this out -- the probability is not 0.5, except perhaps in the case where n = MaxN/2. It's approximately p = n/MaxN.
The point here is that it is _less_ than n/MaxN (except when n is much smaller than MaxN). For example, for n = MaxN/2, the probability is 1 - ((MaxN-1)/MaxN)^n = 1 - (1 - 1/MaxN)^{MaxN/2}, which is approximately 1 - Sqrt(1/e) = 0.393... (The number 1/e I gave in the first mail is just the average of the probabilities over all n = 1..MaxN.) I.e.: the fact that the same number may be set more than once has a _significant_ effect on the resulting distribution.
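(Concretely: with MaxN = 256 and n = 128, 1 - (1 - 1/256)^128 = 0.394..., noticeably below n/MaxN = 0.5.)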
Bits are as independent as the results of Random(MaxN) are, IMHO: if Random(MaxN) always gives a result in the range 0..MaxN-1, the probability of each bit being set should be slightly lower than n/MaxN.
This doesn't have much to do with independence. The bits are not even pairwise independent: the probability of one bit being set is p=1-(1-1/MaxN)^n, and the probability of two (distinct) bits being set simultaneously is 1-2(1-1/MaxN)^n+(1-2/MaxN)^n, which is different from p^2. This difference is small, but it gets bigger if you take more than 2 bits into account.
Lower because a bit can be set twice or more, due to the randomized nature of the test, which results in fewer than n bits set out of the total MaxN.

Other than this effect (the probability of overlap rises with set density), the distribution of set elements is expected to be a rather uniform one.
Also, the diagnostics on failure won't work:
  if Failed then
    begin
      WriteLn ('failed (', RandomSeed, ') ',
        'After Include (setA, ', j1, ') ', j1, ' NOT IN!!!');
      WriteLn ('dumping set [0..', MaxN, ']:');
      for j := 0 to MaxN div SetWordSize do
        WriteLn (j);
      Halt
    end
This just prints the same sequence of numbers every time, something like `Bar[j]' is missing here, I guess.
You're absolutely right, Emil -- Bar[j] is exactly what was supposed to be here.
Considering the use of Bar[j], here is a method of getting a fair 0.5 probability for each bit being set:
  m := MaxN div SetWordSize - 1;
  for j := 0 to m do
    Bar[j] := Random (High (MedCard));
Yes, this is a good way to get the uniform distribution.
However, this also relies on Random () providing pseudo-random numbers of good quality, with 0.5 probability of each bit being set. True, it's also much faster than initializing bit by bit via Include (SetA, Random(MaxN)).
We could read from some real random source if we need truly random numbers, such as required in security, but I think the Random() function suffices here, wouldn't you agree?
Yes, GPC's PRNG is quite good, and we are not talking about any mission-critical cryptological application.
The problem, however, is how to efficiently generate a uniform RandomSet() with p <> 0.5 using Bar[j in 0..m] := Random(High(MedCard))!

-- we need a completely new type of pseudo-random function here!
What do you mean by uniform? By definition, there is only one uniform distribution once you fix the universe. It gives p=0.5.
So I assume you meant the binomial distribution (i.e., choose MaxN independent bits, each with probability p).
If p is an integral multiple of a power of 2, say p=k/2^m, you may generate one bit from m Random() bits by
  for j := 0 to MaxN-1 do
    if Random(1 shl m) < k then
      Include(A, j);
With some extra shuffling, 32 bits per Random() call may be used, which will make it reasonably fast. For example, if m divides 32:
  q1 := (1 shl m) - 1;
  q2 := (BitSizeOf (MedCard) div m) - 1;
  for j := 0 to MaxN-1 do
    begin
      if (j and q2) = 0 then
        rnd := Random (High (MedCard));
      if (rnd and q1) < k then
        Include (A, j);
      Shr (rnd, m)
    end;
For general (real) p, this works as well, but is less efficient:
  for j := 0 to MaxN-1 do
    if Random < p then
      Include (A, j);
Another problem with the Include (SetA, Random(MaxN)), i in 1..n method is that the denser SetA is, the more the Include()-ed set bits will overlap.

The probability that we bump into an already set bit is i/MaxN, which can grow close to 1.0 -- which means, as you see, that the total probability may still be uniform, but it will drop below the desired p = n/MaxN.
Still not sure what you mean by uniform here. Anyway, it is possible to fix the `Include (SetA, Random (MaxN))' method so that it always produces a set of size n.
For n small:
  A := [];
  for j := 1 to n do
    begin
      repeat
        r := Random(MaxN)
      until not (r in A);
      Include (A, r)
    end;
For n bigger:
  A := [];
  for j := 0 to n-1 do
    begin
      r := Random (MaxN - j);
      { find the r-th element of A's complement: }
      k := 0;
      repeat
        if not (k in A) then
          Dec (r);
        if r < 0 then
          break;
        Inc (k)
      until False;
      Include (A, k)
    end;
For n > MaxN/2, generate a set of size MaxN-n, and take its complement.
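(Taking the complement is just a set difference in Pascal, e.g. A := [0 .. MaxN - 1] - A.)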
Emil
Emil Jerabek wrote:
Still not sure what you mean by uniform here. Anyway, it is possible to fix the `Include (SetA, Random (MaxN))' method so that it always produces a set of size n.
For n small:
  A := [];
  for j := 1 to n do
    begin
      repeat
        r := Random(MaxN)
      until not (r in A);
      Include (A, r)
    end;
For n bigger:
  A := [];
  for j := 0 to n-1 do
    begin
      r := Random (MaxN - j);
      { find the r-th element of A's complement: }
      k := 0;
      repeat
        if not (k in A) then
          Dec (r);
        if r < 0 then
          break;
        Inc (k)
      until False;
      Include (A, k)
    end;
For n > MaxN/2, generate a set of size MaxN-n, and take its complement.
That's O(n^2). O(n) is possible by using the "card deck shuffling" algorithm. (If you really want exactly n elements -- I think it's sufficient and easier to treat the elements independently.)
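A minimal sketch of the card deck idea (assuming MaxN and n as above; initializing the deck is O(MaxN), the drawing itself is O(n)):

  var
    Deck: array [0 .. MaxN - 1] of Integer;
    i, j, t: Integer;
  begin
    for i := 0 to MaxN - 1 do
      Deck[i] := i;                  { the full `card deck' }
    A := [];
    for i := 0 to n - 1 do
      begin
        j := i + Random (MaxN - i);  { pick one of the remaining cards }
        t := Deck[i];                { ... and swap it to position i }
        Deck[i] := Deck[j];
        Deck[j] := t;
        Include (A, Deck[i])
      end
  end;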
Frank
On Fri, May 16, 2003 at 07:14:46PM +0200, Frank Heckenbach wrote:
Emil Jerabek wrote:
Still not sure what you mean by uniform here. Anyway, it is possible to fix the `Include (SetA, Random (MaxN))' method so that it always produces a set of size n.
For n small:
  A := [];
  for j := 1 to n do
    begin
      repeat
        r := Random(MaxN)
      until not (r in A);
      Include (A, r)
    end;
For n bigger:
  A := [];
  for j := 0 to n-1 do
    begin
      r := Random (MaxN - j);
      { find the r-th element of A's complement: }
      k := 0;
      repeat
        if not (k in A) then
          Dec (r);
        if r < 0 then
          break;
        Inc (k)
      until False;
      Include (A, k)
    end;
For n > MaxN/2, generate a set of size MaxN-n, and take its complement.
That's O(n^2). O(n) is possible by using the "card deck shuffling" algorithm. (If you really want exactly n elements -- I think it's sufficient and easier to treat the elements independently.)
I agree. Also, on a second look, the first algorithm ("for n small") gives O(n) as well, even for n as big as MaxN/2 (the expected number of calls to Random is 1.386 n).
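(The j-th draw succeeds with probability (MaxN - j)/MaxN, so the expected number of calls is the sum of MaxN/(MaxN - j) over j = 0 .. n-1, approximately MaxN * Ln (MaxN/(MaxN - n)); for n = MaxN/2 this is MaxN * Ln (2) = 1.386 n.)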
I wrote:
If p is an integral multiple of a power of 2, say p=k/2^m, you may generate one bit from m Random() bits by
  for j := 0 to MaxN-1 do
    if Random(1 shl m) < k then
      Include(A, j);
With some extra shuffling, 32 bits per Random() call may be used, which will make it reasonably fast. For example, if m divides 32:
  q1 := (1 shl m) - 1;
  q2 := (BitSizeOf (MedCard) div m) - 1;
  for j := 0 to MaxN-1 do
    begin
      if (j and q2) = 0 then
        rnd := Random (High (MedCard));
      if (rnd and q1) < k then
        Include (A, j);
      Shr (rnd, m)
    end;
The loops should go from 0 to MaxN, not MaxN - 1. Random (High (MedCard)) is also off by one, both here and in Mirsad's original example. Random (High (MedCard) + 1) is likely to fail miserably on 64-bit machines with MedCard=LongestCard, but I'll continue to use it here for simplicity.
Here's another way to get independent bits, each with probability p=k/2^m, with smaller overhead.
  { assume 0 < k < 2^m }
  while not Odd (k) do
    begin
      Shr (k, 1);
      Dec (m)
    end;
  A := [];
  for i := 1 to m do
    begin
      if Odd (k) then
        for j := 0 to (MaxN + 1) div SetWordSize - 1 do
          Or (Bar[j], Random (High (MedCard) + 1))
      else
        for j := 0 to (MaxN + 1) div SetWordSize - 1 do
          And (Bar[j], Random (High (MedCard) + 1));
      Shr (k, 1)
    end;
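(Why this works: each Or step maps the per-bit probability p to 1/2 + p/2, and each And step maps it to p/2; processing the bits of k from the least significant upwards thus builds up p = k/2^m by Horner's rule, starting from the empty set, p = 0.)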
Emil
Emil Jerabek wrote:
On Fri, May 16, 2003 at 07:14:46PM +0200, Frank Heckenbach wrote:
Emil Jerabek wrote:
For n small:
  A := [];
  for j := 1 to n do
    begin
      repeat
        r := Random(MaxN)
      until not (r in A);
      Include (A, r)
    end;
I agree. Also, on a second look, the first algorithm ("for n small") gives O(n) as well, even for n as big as MaxN/2 (the expected number of calls to Random is 1.386 n).
Yes, it's "randomized O(n)".
Here's another way to get independent bits, each with probability p=k/2^m, with smaller overhead.
But this one relies on the internal representation again. My point was to get rid of it.
Frank
On Tue, 20 May 2003, Frank Heckenbach wrote:
Emil Jerabek wrote:
On Fri, May 16, 2003 at 07:14:46PM +0200, Frank Heckenbach wrote:
Emil Jerabek wrote:
For n small:
  A := [];
  for j := 1 to n do
    begin
      repeat
        r := Random(MaxN)
      until not (r in A);
      Include (A, r)
    end;
I agree. Also, on a second look, the first algorithm ("for n small") gives O(n) as well, even for n as big as MaxN/2 (the expected number of calls to Random is 1.386 n).
Yet, Frank, Emil, I'm uncomfortable with this repeat r := Random (MaxN) until not (r in A) ...
I get a feeling that this cripples the linearity of the generated set, and I suspect p(n in A), n in [0..MaxN-1], is no longer constant and equal to n/MaxN ...

Besides, consider the overhead if the set is near-full and the desired n/MaxN is, for example, 0.95 ... At the end of filling the set we'd end up with 20 calls to Random (MaxN) for each next bit.
IMHO, the
  for i := 0 to MaxN-1 do
    if Random (MaxN) <= n then
      Include (A, i);
(I think Frank suggested it) is better, even though it's perhaps slower. It has guaranteed O(n), at least, and it's very easy to parametrize the set with probability p = n/MaxN of each bit being set.

We can also have a Monte-Carlo-ized
  for i := 0 to MaxN-1 do
    if Random (MaxN)/MaxN <= F(i) then
      Include (A, i);
to get a non-uniform p distribution.
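(For example, F(i) = i/MaxN would make larger elements proportionally more likely to be included.)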
Frank wrote:
Here's another way to get independent bits, each with probability p=k/2^m, with smaller overhead.
But this one relies on the internal representation again. My point was to get rid of it.
The reason I still want something "internal" for generating random sets of numbers is that I'd like a common user, who might lack your knowledge of the matter, to be able to use them, for example, for statistical purposes, or as a source of random data for program testing ...

I think they might deserve a function or two in the RTS, or at least a function or two in a unit.
Mirsad
On Tue, May 20, 2003 at 12:40:51PM +0200, Mirsad Todorovac wrote:
Yet, Frank, Emil, I'm uncomfortable with this repeat r := Random (MaxN) until not (r in A) ...
I get a feeling that this cripples the linearity of the generated set, and I suspect p(n in A), n in [0..MaxN-1], is no longer constant and equal to n/MaxN ...
It is equal to n/MaxN, and in fact, P(A=X) = 1/\binom{MaxN}{n} for any set X with n elements. (Proof: the distribution is obviously invariant under permutations.)
Besides, consider the overhead if the set is near-full and the desired n/MaxN is, for example, 0.95 ... At the end of filling the set we'd end up with 20 calls to Random (MaxN) for each next bit.
That's why I suggested to use it only for n/MaxN <= 0.5. To get bigger n, generate A for p = 1-n/MaxN < 0.5, and take [0..MaxN-1]-A.
IMHO, the
  for i := 0 to MaxN-1 do
    if Random (MaxN) <= n then
      Include (A, i);
(I think Frank suggested it) is better, even though it's perhaps slower. It has guaranteed O(n), at least, and it's very easy to parametrize the set with probability p = n/MaxN of each bit being set.
I'm afraid we are confusing two distinct problems. In one mail you asked for a method to generate sets of size n, and that's what the "repeat r:=Random(MaxN) until..." stuff does. This "if Random(MaxN) < n ..." generates a different distribution, namely it gives MaxN independent bits, each being set with probability p (in other words, P(A=X) = p^j * (1-p)^{MaxN-j} for any set X of size j, j=0,..,MaxN). I agree that this distribution is more natural. It just doesn't make much sense to compare those two algorithms, because they serve different tasks. Choose what you want to get, and then pick an appropriate algorithm.
Emil
Mirsad Todorovac wrote:
Here's another way to get independent bits, each with probability p=k/2^m, with smaller overhead.
But this one relies on the internal representation again. My point was to get rid of it.
The reason I still want something "internal" for generating random sets of numbers is that I'd like a common user, who might lack your knowledge of the matter, to be able to use them, for example, for statistical purposes, or as a source of random data for program testing ...

I think they might deserve a function or two in the RTS, or at least a function or two in a unit.
Feel free to write one. (I suggest a non-RTS unit where you can also put various other stuff related to sets or random that you find useful. Since we have GPCUtils, StringUtils, FileUtils units, a SetUtils or RandomUtils unit would fit nicely.)
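A possible skeleton (all names illustrative, just a sketch to start from):

  unit SetUtils;

  interface

  const
    MaxN = 1023;  { illustrative }

  type
    TElem = 0 .. MaxN;
    TElemSet = set of TElem;

  { Include each element independently with probability 0.5. }
  procedure RandomSetUniform (var A: TElemSet);

  implementation

  procedure RandomSetUniform (var A: TElemSet);
  var
    i: TElem;
  begin
    A := [];
    for i := Low (TElem) to High (TElem) do
      if Random (2) = 0 then
        Include (A, i)
  end;

  end.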
Emil Jerabek wrote:
[...] Choose what you want to get, and then pick an appropriate algorithm.
I think so far we have the following useful algorithms:
- independent bits, arbitrary p: as above (for p = n / MaxN) or `Random < p' (for really arbitrary p)
- independent bits, p = 0.5, faster: get 32 bit random word, set 32 elements from it (variants for p = 0.25, 0.75, ...)
- exactly n bits set: Emil's suggestion ("randomized O(n)") or card deck shuffle algorithm (non-randomized O(n), a little more work to implement).
Frank