Some time ago, Adriaan van Os wrote:
Frank Heckenbach wrote:
Waldek Hebisch wrote:
Adriaan van Os wrote:
- is there a difference between $FF.. and $0FF.. ?
- is $FF.. signed or not ?
Any comments ?
<snip>
Yes -- so the answers to the questions above are clearly `$FF = $0FF' (BP agrees), and `$FF.. > 0' (it can be stored in a signed or unsigned type, of course, if range permits, but it should never represent a negative value).
Here are the results for MetroWerks CodeWarrior Pascal (where Integer is 16 bits, Longint 32 bits, and a 64-bit integer type is missing).
Const {positive unless marked negative}
  k1 = $F;
  k2 = $FF;
  k3 = $FFF;
  k4 = $FFFF;       {negative}
  k5 = $FFFFF;
  k6 = $FFFFFF;
  k7 = $FFFFFFF;
  k8 = $FFFFFFFF;   {negative}
  h1 = $0F;
  h2 = $0FF;
  h3 = $0FFF;
  h4 = $0FFFF;
  h5 = $0FFFFF;
  h6 = $0FFFFFF;
  h7 = $0FFFFFFF;
  h8 = $0FFFFFFFF;  {negative}
I think the motivation is: if "0" defaults to an integer type, why shouldn't $FFFF or $FFFFFFFF?
Perhaps I'm too much a mathematician to ever understand such motivations. There's nothing magical about hexadecimal numbers. They're just written in another base than decimal numbers (i.e., with a multiplier of 16 instead of 10, and with 6 more digits which happen to be written as letters). Hexadecimal FFFF is no more a negative number than decimal 65535 or 99999. If the negative number represented in a 16 bit signed type with the same bit pattern as FFFF is meant, that's simply -1, or in hexadecimal -$1, or -$0001. And a leading 0 doesn't change the value of a number. That's a well-known fact for decimal numbers (except in C, of course, where a leading "0" makes a number octal, shudder), so why should it be different in another base?
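To illustrate the point, here is a quick sketch in Python, whose integer literals and arithmetic follow exactly this mathematical reading of numbers:

```python
# Hexadecimal is just another base: neither the base nor a
# leading zero changes the value or the sign of a literal.
assert 0xFF == 0x0FF == 255     # a leading 0 changes nothing
assert 0xFFFF == 65535          # positive, just like decimal 65535
assert -0x1 == -1               # "negative FFFF" is simply -$1

# The 16-bit two's-complement bit pattern FFFF represents the value -1:
assert -1 % 2**16 == 0xFFFF
```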
I suppose all this confusion comes from the fact that hexadecimal numbers were originally used in languages such as C and assembler, where numbers are more like bit patterns (in various respects) than like numbers in a mathematical sense. And apparently it was later assumed that this confusion was inherent to hexadecimal notation itself, rather than to the environment where it was initially used.
Personally I believe that, whatever the behaviour, it should be well documented, as this kind of thing could lead to very surprising bugs.
My suggestion would be:
- Numbers are interpreted in the mathematical sense (i.e., positive, as long as they don't contain a `-' sign etc.).
- `not', `and', `or' and `xor' are conceptually defined as functions on the integer numbers (defined such that they correspond to the bit operations given a sufficiently large representation), see appendix.
- The actual implementation can use any type size as long as the result matches those conceptual definitions (as is the usual practice in Pascal -- define the result, not the implementation).
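As it happens, Python's arbitrary-precision integers already implement exactly this value-centered model (its bitwise operators behave as if every integer had an unbounded two's-complement representation), so they can serve as a quick reference for the proposed semantics:

```python
# Bitwise operators defined on values, independent of any type size,
# matching the conceptual definitions in the appendix below.
assert ~0 == -1                 # not a = -1 - a
assert ~100 == -101
assert (0x0F | 0xF0) == 0xFF
assert (-1 & 0xFFFF) == 0xFFFF  # result determined by value, not by a width
```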
This would imply that Adriaan's workaround `word value not word (0)' would not work anymore. Of course there are other ways to get this value, depending on the real intention behind it, such as `High (Word)' (if the maximum value of a variable is meant, this is the recommended form anyway, since it's independent of any representation issues), or just writing out the number, possibly in hex, if the actual numeric value is meant.
Yes, the "not 0" expression came from (mysql 4.1.5) C headers ......
Well, C is different from Pascal.
Given the definitions below, `not 0' is simply -1 which corresponds to the C meaning on signed types.
On unsigned types, we do not have anything equivalent, because this value is dependent on the type size (e.g. `not 0' is 255 or 65535 for an unsigned 8 or 16 bit type, respectively), so it doesn't match well with Pascal's value-centered integers.
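In a value-centered model, such a width-dependent "all ones" value can only be obtained by masking with an explicit width. A sketch in Python (the helper `unsigned_not` is introduced here purely for illustration, not part of any proposal):

```python
def unsigned_not(a, bits):
    """'not a' as a C unsigned type of the given bit width would see it."""
    return ~a & (2**bits - 1)

# The result of "not 0" depends entirely on the assumed type size:
assert unsigned_not(0, 8) == 255
assert unsigned_not(0, 16) == 65535
assert unsigned_not(0, 32) == 4294967295
```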
An alternative would be to define a "bit-pattern" type distinct from the integers, but I think the implementation difficulties as well as practical disadvantages and incompatibilities with most other compilers that support "bit" functions don't really make this an attractive choice.
Appendix: Mathematical definition of "bit" functions
Note: These definitions are probably not written in an optimal way. Some parts are just written this way for brevity of notation.
They're not meant to be implemented like this, usually; they can just provide a definition to compare an implementation against.
But at least they should be well-defined (WRT termination of recursion and independence of the choice of N), if I didn't overlook anything. The proof is left as an exercise to the reader. ;-)
Also note that these definitions are independent of the representation of numbers, because they're ultimately defined on `+' and `-' and integer comparisons only, so any integer implementation with sufficient range can in principle implement them. However, they express the effects that bitwise functions have when implemented on binary numbers with two's complement (which shows, e.g., in the use of powers of 2 and the definition of `not').
not a := -1 - a
a xor b := (a or b) and not (a and b)
Where N shall be a power of 2 larger than the absolute values of a and b:
a or b :=  not (not a and not b)        if a, b < 0
           (a - N) or b                 if a >= 0 > b
           b or a                       if b >= 0 > a
           ((a - N) or (b - N)) + N     if a, b >= 0
a and b := not (not a or not b)         if a, b < 0
           a and (b + N)                if a >= 0 > b
           b and a                      if b >= 0 > a
           0                            if a = 0 or b = 0
For a, b > 0, where M shall be the largest power of 2 not larger than the larger of a and b (i.e., the largest power of 2 with M <= max (a, b)):
a and b := M + ((a - M) and (b - M))    if a, b >= M
           (a - M) and b                if a >= M > b
           a and (b - M)                if b >= M > a
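The definitions above can be transcribed almost literally and checked against a language whose built-in bitwise operators already follow these semantics; Python's arbitrary-precision integers do. A sketch (`npow` is a helper introduced here to pick a suitable N, not part of the definitions):

```python
def npow(a, b):
    """Some power of 2 larger than the absolute values of a and b."""
    n = 1
    while n <= abs(a) or n <= abs(b):
        n *= 2
    return n

def NOT(a):
    return -1 - a

def OR(a, b):
    if a < 0 and b < 0:
        return NOT(AND(NOT(a), NOT(b)))
    if a >= 0 > b:
        return OR(a - npow(a, b), b)
    if b >= 0 > a:
        return OR(b, a)
    # a, b >= 0
    n = npow(a, b)
    return OR(a - n, b - n) + n

def AND(a, b):
    if a < 0 and b < 0:
        return NOT(OR(NOT(a), NOT(b)))
    if a >= 0 > b:
        return AND(a, b + npow(a, b))
    if b >= 0 > a:
        return AND(b, a)
    if a == 0 or b == 0:
        return 0
    # a, b > 0: m = largest power of 2 not exceeding max(a, b)
    m = 1
    while m * 2 <= max(a, b):
        m *= 2
    if a >= m and b >= m:
        return m + AND(a - m, b - m)
    if a >= m > b:
        return AND(a - m, b)
    return AND(a, b - m)

def XOR(a, b):
    return AND(OR(a, b), NOT(AND(a, b)))

# Spot-check against Python's built-in operators, which implement
# the same value-centered two's-complement semantics:
for a in range(-9, 10):
    for b in range(-9, 10):
        assert OR(a, b) == a | b
        assert AND(a, b) == a & b
        assert XOR(a, b) == a ^ b
```

The recursion terminates because the mixed-sign and all-negative cases reduce to nonnegative arguments, and the positive `AND` case strictly decreases its arguments toward the 0 base case; likewise the result doesn't depend on which valid N `npow` picks.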
Frank