GPC accepts:
var i: integer value not 0;
but it doesn't accept:
var w: word value not 0;
not.pas:4: error: constant out of range not.pas:4: error: initial value is of wrong type
so, the workaround is:
var w: word value not word( 0);
At first sight, this looks a bit clumsy (but at second sight it may not). It gives rise to general questions about the type of constants in Pascal, e.g.
* is there a difference between $FF.. and $0FF.. ? * is $FF.. signed or not ?
Any comments ?
Regards,
Adriaan van Os
Adriaan van Os wrote:
GPC accepts:
var i: integer value not 0;
but it doesn't accept:
var w: word value not 0;
not.pas:4: error: constant out of range not.pas:4: error: initial value is of wrong type
so, the workaround is:
var w: word value not word( 0);
At first sight, this looks a bit clumsy (but at second sight it may not). It gives rise to general questions about the type of constants in Pascal, e.g.
- is there a difference between $FF.. and $0FF.. ?
- is $FF.. signed or not ?
Any comments ?
Really, we already had long discussions about integers (but can we access mailing list archive while http://www.gnu-pascal.de/ is down?)
In Pascal constants are just integers (without further type restrictions). One may think that constants are really mathematical integers (but unlike Ada, in Pascal compiler is allowed to restrict range of constants which appear in intermediate expressions).
GPC has many builtin types, some of then larger then 'integer'. Currently GPC treats constants that are small enough as 'integer' and performs all intermediate calculations with limited precision. That may produce (un)expected truncation/overflow. AFAICS `not 0' is the same as `-1': the bit pattern is truncated to integer size and treated as signed number.
The problem with `not 0' IMHO is that C type bit operations really do not fit Pascal rules. One can imagine various approaches to Pascal bit operations: -- have bitsting type and conversion functions from/to integers -- have a modular types (again + conversions) -- have three (or four?) argument bit operations, the third argument would give desired precision. Additionaly signed/unsigned result could be choosen (either using extra argumment or having differently named operations) -- compute using single largest type and explicitly truncate to smaller types
In GPC we are bound by backward (and BP) compatibility, so alternate approaches are less atractive. So your workaround is probably the best what can be done.
Waldek Hebisch wrote:
Adriaan van Os wrote:
GPC accepts:
var i: integer value not 0;
There is a use for this extension; it is not expressible in Pascal.
but it doesn't accept:
var w: word value not 0;
But not for this, as it is expressable as 1 .. maxint;
CBFalconer wrote:
Waldek Hebisch wrote:
Adriaan van Os wrote:
GPC accepts:
var i: integer value not 0;
There is a use for this extension; it is not expressible in Pascal.
but it doesn't accept:
var w: word value not 0;
But not for this, as it is expressable as 1 .. maxint;
I think there's a misunderstanding here. `value' (Extended Pascal) means the initial value, not a range restriction. And `not' (in this context, from Borland Pascal) means bitwise not, rather than set complement or whatever you might have thought of.
Apart from that, it should be something like `MaxWord' isntead of `MaxInt' (GPC doesn't actually have `MaxWord', but the more general `High (TypeName)', BTW).
Or did you really mean MaxInt only? If so, this might explain some of our differences WRT unsigned types recently. The point of using unsigned types is usually not to disallow negative values (where a proper integer subrange such as `0 .. MaxInt' would do well), but to have a bigger positive range.
Frank
Frank Heckenbach wrote:
... snip ...
Apart from that, it should be something like `MaxWord' isntead of `MaxInt' (GPC doesn't actually have `MaxWord', but the more general `High (TypeName)', BTW).
Or did you really mean MaxInt only? If so, this might explain some of our differences WRT unsigned types recently. The point of using unsigned types is usually not to disallow negative values (where a proper integer subrange such as `0 .. MaxInt' would do well), but to have a bigger positive range.
IMO for most purposes having only maxint will do very well. There are the exceptions, which can be handled by maxword (if there is an unsigned type). The other things that are missing from the original Pascal are ord(maxchar), and maxset, together with the ability to convert an integral value into the corresponding enumerated value, i.e. the inverse of ord. chr only functions for chars. The char type could have been defined as a predefined enumeration, which would have had advantages and disadvantages. Minint is another sore point.
Unfortunately once extensions, whether good or bad, get into a language it is very hard, if not impossible, to weed them out. OTOH extensions implemented as procedures are relatively easy to manipulate.
End of daydreams, for now.
On Fri, Oct 15, 2004 at 02:52:33AM -0400, CBFalconer wrote:
Waldek Hebisch wrote:
Adriaan van Os wrote:
GPC accepts:
var i: integer value not 0;
There is a use for this extension; it is not expressible in Pascal.
but it doesn't accept:
var w: word value not 0;
But not for this, as it is expressable as 1 .. maxint;
You seem to have misunderstood the construction. Extended Pascal "<type> value <constant>" denotes a type which is the same as <type>, except that variables of the new type are initialized to <constant>. Here, the constant expression is "not 0", which is a BP extension denoting the bitwise complement of 0. It has nothing to do with excluding 0 from the range of the type.
Emil
Waldek Hebisch wrote:
Adriaan van Os wrote:
GPC accepts:
var i: integer value not 0;
but it doesn't accept:
var w: word value not 0;
not.pas:4: error: constant out of range not.pas:4: error: initial value is of wrong type
so, the workaround is:
var w: word value not word( 0);
At first sight, this looks a bit clumsy (but at second sight it may not). It gives rise to general questions about the type of constants in Pascal, e.g.
- is there a difference between $FF.. and $0FF.. ?
- is $FF.. signed or not ?
Any comments ?
Really, we already had long discussions about integers (but can we access mailing list archive while http://www.gnu-pascal.de/ is down?)
Unfortunately not.
In Pascal constants are just integers (without further type restrictions). One may think that constants are really mathematical integers (but unlike Ada, in Pascal compiler is allowed to restrict range of constants which appear in intermediate expressions).
Yes -- so the answers to the questions above are clearly `$FF = $0FF' (BP agrees), and `$FF.. > 0' (it can be stored in a signed or unsigned type, of course, if range permits, but it should never represent a negative value).
The problem with `not 0' IMHO is that C type bit operations really do not fit Pascal rules.
Indeed. While `and' and `or' can be defined mathematically in an abstract sense, independent of type sizes (even for negative numbers, if you assume two's complement -- other systems could still emulate it to provide the same semantics), `not' is problematic. As long as it's used in the form `a and not b', which is a common case, it's unique, but in general it's not.
Interestingly, though, `not' is unique on signed types (again, if assuming two's completement), where `not a' is equivalent to `-1 - a'. So perhaps we should declare this our official definition of integer `not' (i.e., adjust GPC where it does not bevahve like this)? BTW, BP seems to agree as well, since `not Word (0)' gives -1 there.
This would imply that Adriaan's workaround `word value not word (0)' would not work anymore. Of course there are other ways to get this value, depending on what's the real intention behind it, such as `High (Word)' (if the maximum value of a variable is meant, this is the recommended form, anyway, since it's independent of any representation issues) or just writing out the number, possibly in hex, if the actual numeric value is meant.
Frank
Frank Heckenbach wrote:
Waldek Hebisch wrote:
Adriaan van Os wrote:
- is there a difference between $FF.. and $0FF.. ?
- is $FF.. signed or not ?
Any comments ?
<snip>
Yes -- so the answers to the questions above are clearly `$FF = $0FF' (BP agrees), and `$FF.. > 0' (it can be stored in a signed or unsigned type, of course, if range permits, but it should never represent a negative value).
Here are the results for MetroWerks CodeWarrior Pascal (where Integer is 16-bits, longint 32-bits and an 64-bits integer type is missing).
Const {positive unless marked negative} k1= $F; k2= $FF; k3= $FFF; k4= $FFFF; {negative} k5= $FFFFF; k6= $FFFFFF; k7= $FFFFFFF; k8= $FFFFFFFF; {negative} h1= $0F; h2= $0FF; h3= $0FFF; h4= $0FFFF; h5= $0FFFFF; h6= $0FFFFFF; h7= $0FFFFFFF; h8= $0FFFFFFFF; {negative}
I think, the motivation is, if "0" defaults to an integer type, why shouldn't $FFFF or $FFFFFFFF ? Personally I believe that, whatever the behaviour, it should be well documented, as this kind of thing could lead to very surprising bugs.
This would imply that Adriaan's workaround `word value not word (0)' would not work anymore. Of course there are other ways to get this value, depending on what's the real intention behind it, such as `High (Word)' (if the maximum value of a variable is meant, this is the recommended form, anyway, since it's independent of any representation issues) or just writing out the number, possibly in hex, if the actual numeric value is meant.
Yes, the "not 0" expression came from (mysql 4.1.5) C headers ......
Regards,
Adriaan van Os
Some time ago, Adriaan van Os wrote:
Frank Heckenbach wrote:
Waldek Hebisch wrote:
Adriaan van Os wrote:
- is there a difference between $FF.. and $0FF.. ?
- is $FF.. signed or not ?
Any comments ?
<snip>
Yes -- so the answers to the questions above are clearly `$FF = $0FF' (BP agrees), and `$FF.. > 0' (it can be stored in a signed or unsigned type, of course, if range permits, but it should never represent a negative value).
Here are the results for MetroWerks CodeWarrior Pascal (where Integer is 16-bits, longint 32-bits and an 64-bits integer type is missing).
Const {positive unless marked negative} k1= $F; k2= $FF; k3= $FFF; k4= $FFFF; {negative} k5= $FFFFF; k6= $FFFFFF; k7= $FFFFFFF; k8= $FFFFFFFF; {negative} h1= $0F; h2= $0FF; h3= $0FFF; h4= $0FFFF; h5= $0FFFFF; h6= $0FFFFFF; h7= $0FFFFFFF; h8= $0FFFFFFFF; {negative}
I think, the motivation is, if "0" defaults to an integer type, why shouldn't $FFFF or $FFFFFFFF ?
Perhaps I'm too much a mathematician to ever understand such motivations. There's nothing magical about hexadecimal numbers. They're just written in another base than decimal numbers (i.e., with a multiplier of 16 instead of 10, and with 6 more digits which happen to be written as letters). Hexadecimal FFFF is no more a negative number than decimal 65535 or 99999. If the negative number represented in a 16 bit signed type with the same bit pattern as FFFF is meant, that's simply -1, or in hexadecimal -$1, or -$0001. And a leading 0 doesn't change the value of a number. That's a well-known fact for decimal numbers (except in C, of course, where a leading "0" makes a number octal, shudder), so why should be different in another base?
I suppose all this confusion comes from the fact that hexadecimal numbers were originally used in languages such as C and assembler where numbers are more like bit patterns (in various respects), not like numbers in a mathematical sense. And somehow it was later assumed, apparently, that this confusion had any inherent connection to the hexadecimal notation, rather than the environment where they were initially used.
Personally I believe that, whatever the behaviour, it should be well documented, as this kind of thing could lead to very surprising bugs.
My suggestion would be:
- Numbers are interpreted in the mathematical sense (i.e., positive, as long as they don't contain a `-' sign etc.).
- `not', `and', `or' and `xor' are conceptually defined as functions on the integer numbers (defined such that they correspond to the bit operations given a sufficiently large representation), see appendix.
- The actual implementation can use any type size as long as the result matches those conceptual definitions (as is the usual practice in Pascal -- define the result, not the implementation).
This would imply that Adriaan's workaround `word value not word (0)' would not work anymore. Of course there are other ways to get this value, depending on what's the real intention behind it, such as `High (Word)' (if the maximum value of a variable is meant, this is the recommended form, anyway, since it's independent of any representation issues) or just writing out the number, possibly in hex, if the actual numeric value is meant.
Yes, the "not 0" expression came from (mysql 4.1.5) C headers ......
Well, C is different from Pascal.
Given the definitions below, `not 0' is simply -1 which corresponds to the C meaning on signed types.
On unsigned types, we do not have anything equivalent, because this value is dependent on the type size (e.g. `not 0' is 255 or 65535 for an unsigned 8 or 16 bit type, respectively), so it doesn't match well with Pascal's value-centered integers.
An alternative would be to define a "bit-pattern" type distinct from the integers, but I think the implementation difficulties as well as practical disadvantages and incompatibilities with most other compilers that support "bit" functions don't really make this an attractive choice.
Appendix: Mathematical definition of "bit" functions
Note: These definitions are probably not written in an optimal way. Some parts are just written this way for brevity of notation.
They're not meant to be implemented like this, usually; they can just provide a definition to compare an implementation against.
But at least they should be well-defined (WRT termination of recursion and independence of the choice of N), if I didn't overlook anything. The proof it is left as an exercise to the reader. ;-)
Also note that these definitions are independent of the representation of numbers, because they're ultimatelty defined on `+' and `-' and integer comparisons only, so any integer implementation with sufficient range can in principle implement them. However, they express the effects that bitwise functions have when implemented on binary numbers with two's complement (which shows, e.g., in the use of powers of 2 and the defition of `not').
not a := -1 - a
a xor b := (a or b) and not (a and b)
Where N shall be a power of 2 larger than the absolute values of a and b:
a or b := not (not a and not b) if a, b < 0 (a - N) or b if a >= 0 > b b or a if b >= 0 > a ((a - N) or (b - N)) + N if a, b >= 0
a and b := not (not a or not b) if a, b < 0 a and (b + N) if a >= 0 > b b and a if b >= 0 > a 0 if a = 0 or b = 0
For a, b > 0, where M shall be the largest power of 2 not larger than both a and b:
a and b := M + ((a - M) and (b - M)) if a, b >= M (a - M) and b if a >= M > b a and (b - M) if b >= M > a
Frank
Frank Heckenbach wrote:
Perhaps I'm too much a mathematician to ever understand such motivations.
<snip>
My suggestion would be:
Numbers are interpreted in the mathematical sense (i.e., positive, as long as they don't contain a `-' sign etc.).
`not', `and', `or' and `xor' are conceptually defined as functions on the integer numbers (defined such that they correspond to the bit operations given a sufficiently large representation), see appendix.
<snip>
Appendix: Mathematical definition of "bit" functions
And I am too much a programmer to understand mathematicians ...
Looking at the mathematical definitions some more, aren't we transforming back bitwise operations and a specific bitwise notational convention (two's complement) back into mathematics ? If so, isn't it true that:
- the mathematical definitions aren't universal mathematical definitions, independent of bit conventions - the bit-level definitions are simpler
In other words, aren't the mathematical definitions artificial, too complex and valid only for a specfic notational bit convention (two's complement) ?
Regards,
Adriaan van Os
Adriaan van Os wrote:
Frank Heckenbach wrote:
Perhaps I'm too much a mathematician to ever understand such motivations.
<snip>
My suggestion would be:
Numbers are interpreted in the mathematical sense (i.e., positive, as long as they don't contain a `-' sign etc.).
`not', `and', `or' and `xor' are conceptually defined as functions on the integer numbers (defined such that they correspond to the bit operations given a sufficiently large representation), see appendix.
<snip>
Appendix: Mathematical definition of "bit" functions
And I am too much a programmer to understand mathematicians ...
Looking at the mathematical definitions some more, aren't we transforming back bitwise operations and a specific bitwise notational convention (two's complement) back into mathematics ?
Yes.
If so, isn't it true that:
- the mathematical definitions aren't universal mathematical
definitions, independent of bit conventions
No, that's not true. They're defined as integer functions, using nothing but simple operations such as `+', `-' and comparisons. Indeed, you could perform those operations with decimal numbers if you like. It's a bit more complicated to compute, but in the end you'll get the same results.
- the bit-level definitions are simpler
Indeed, and a real implementation (on a machine using binary two's complement numbers, as virtually any machine today does) would usually use them. (Unless, perhaps, you want to provide those operations on a pure standard Pascal system, without escaping to the underlying hardware. If you don't care about efficiency, you could indeed implement them this way.)
As I tried to explain, the definitions are there so we compare the behaviour of an implementation against a well-defined result. E.g., if there are corner-cases or cases where different implementations behave differently, we can use it to decide which one to accept. If we achieve this (we yet have to check and possibly correct it in GPC), a programmer can use those operations and rest assured that they behave the same, whatever the underlying hardware does.
This principle is similar with `mod' semantics. With negative operands, different CPUs behave differently (there was a thread on this list about this issue long time ago, you might find it in the archive if you care). Yet ISO Pascal defines the semantics uniquely (and so does BP -- unfortunately in a different way), so any implementation can do what's necessary to map the CPU result to the desired result (which may be nothing if it matches or some extra code, such as GPC uses, since e.g. on i386, it doesn't match).
Frank
Frank Heckenbach wrote:
not a := -1 - a
a xor b := (a or b) and not (a and b)
Where N shall be a power of 2 larger than the absolute values of a and b:
a or b := not (not a and not b) if a, b < 0 (a - N) or b if a >= 0 > b b or a if b >= 0 > a ((a - N) or (b - N)) + N if a, b >= 0
a and b := not (not a or not b) if a, b < 0 a and (b + N) if a >= 0 > b b and a if b >= 0 > a 0 if a = 0 or b = 0
For a, b > 0, where M shall be the largest power of 2 not larger than both a and b:
a and b := M + ((a - M) and (b - M)) if a, b >= M (a - M) and b if a >= M > b a and (b - M) if b >= M > a
We also have shl and shr and I suggest adding rol and ror (rotate left and rotate right).
Regards,
Adriaan van Os
Adriaan van Os wrote:
... snip ...
We also have shl and shr and I suggest adding rol and ror (rotate left and rotate right).
If you are going to do that you should have a complete spectrum:
asl asr arithmetic shift. (asl = lsl but overflows) lsl lsr logical shift (unsigned) rol ror rotation. (length dependant)
The trouble is that many of these require a binary representation of integer operands, and Pascal in general does not require that. Also Pascal does not have the equivalent of a C unsigned type. How do you interpret them on a decimal machine, a 1's complement machine, a sign-magnitude machine? Also with differing values for maxint (which need not be a binary power of 2 in Pascal).
You can define the arithmetic shifts in terms of multiply/divide.
CBFalconer wrote:
Adriaan van Os wrote:
... snip ...
We also have shl and shr and I suggest adding rol and ror (rotate left and rotate right).
If you are going to do that you should have a complete spectrum:
asl asr arithmetic shift. (asl = lsl but overflows) lsl lsr logical shift (unsigned) rol ror rotation. (length dependant)
I think the difference between "arithmetic" and "logical" shifts is only whether they operate on signed and unsigned types. This distinction is needed on the assembler level (compiler output), but not on the laguage level I think.
The trouble is that many of these require a binary representation of integer operands, and Pascal in general does not require that.
So we should provide abstract definitions, just like I did for `and' etc.
Also Pascal does not have the equivalent of a C unsigned type.
We should define them on values, not on types. I.e., the result of a positive value shifted by n should be the same whether it's stored in a signed or unsigned type -- barring overflows, which would appear as runtime errors, just as can happen with any other operations.
How do you interpret them on a decimal machine, a 1's complement machine, a sign-magnitude machine?
Again, they'd have to be implemented in such a way as to produce the same results. This might require (internally) different cases for positive and negative numbers on such machines, but so be it. (Just like we have to do for `mod', e.g.)
Also with differing values for maxint (which need not be a binary power of 2 in Pascal).
Conceptually, compute the result in arbitrary precision, compare this against MaxInt and if greater, it's a runtime error. Actual implementations can usually do it easier, but in case of doubt, one can always check if one gets the same result as the conceptual definition.
You can define the arithmetic shifts in terms of multiply/divide.
Yes, with `pow' standing for integer powers as defined in EP (if you don't like that, you can insert the usual recursive definition of powers, of course), I think we can define:
a shl b := a * 2 pow b
Right shifts are a bit more tricky because they're like division rounding towards negative infinity unlike `div' which round towards 0. So we could write:
a shr b := a div 2 pow b if a >= 0 (a - 2 pow b + 1) div 2 pow b if a < 0
One potential problem with shl is that it can easily produce overflows. Currently it's not a practical problem, as GPC doesn't do overflow checks, but when it does, we'll have to consider this case. We could define `shl' to "ignore overflows", but this would again make the semantics type-size-dependent. Of course, a programmer will be able to turn off overflow checking locally with compiler directives, as a last resort. One could also mask the result using `and'. A good implementation can optimize away the `and' (as GPC seems to do with the following program on i386 with optimizations). Furthermore, even a completely different integer-representation would then yield the same results, if `and' is defined as I wrote before.
program Foo;
var a, b: Cardinal attribute (Size = 32);
begin a := (b shl 4) and 16#ffffffff end.
Rotates are problematic since they indeed depend on the type size. I see no way to define them independently. I've had a discussion with Mirsad Todorovad about these, and we came to the conclusion that it's probably more trouble than it helps. E.g., if a is a constant and we use the term `a ror 4', suddenly the type of the constant becomes very important (not just for whether there's an overflow, but it would change the result completely) which is contrary to Pascal's rules.
OTOH, rotates don't seem to be needed very often (at least an order of magnitude less than shifts); in fact the only place I could need them in my code is in MD5. So it's probably not worth spending too much effort here.
One way out of the dilemma would be to define specifically "sized" rotates, e.g. `a ror32 b' would mean the values of a rotated by b bits on a 32 bit type. This then could be defined implementation-independent again (though we might have to think about signed and unsigned types, possibly making two versions; I haven't thought this through ATM), and again any integer implementation could be used to implement this on.
Such semantics are indeed what's usually wanted -- e.g., MD5 needs a 32 bit rotation. If the type used to store the values for some reason happened to be 64 bit on any machine (though this shouldn't happen with GPC since we use `attribute Size'), we don't want 64 bit rotation as that would produce wrong results. Instead, we want to do 32 bit rotation on the "lower 32 bits" (and get a runtime error if any of the "higher 32 bits" are set, i.e. if the value is too large). A `rol32' operation would (conceptually) do just that.
But again, rotates aren't very important to me, so I won't spend my time implementing them either way; but I wouldn't object if someone else does. ;-)
Frank
Frank Heckenbach wrote:
CBFalconer wrote:
Adriaan van Os wrote:
... snip ...
We also have shl and shr and I suggest adding rol and ror (rotate left and rotate right).
If you are going to do that you should have a complete spectrum:
asl asr arithmetic shift. (asl = lsl but overflows) lsl lsr logical shift (unsigned) rol ror rotation. (length dependant)
I think the difference between "arithmetic" and "logical" shifts is only whether they operate on signed and unsigned types. This distinction is needed on the assembler level (compiler output), but not on the laguage level I think.
The trouble is that many of these require a binary representation of integer operands, and Pascal in general does not require that.
So we should provide abstract definitions, just like I did for `and' etc.
Also Pascal does not have the equivalent of a C unsigned type.
We should define them on values, not on types. I.e., the result of a positive value shifted by n should be the same whether it's stored in a signed or unsigned type -- barring overflows, which would appear as runtime errors, just as can happen with any other operations.
How do you interpret them on a decimal machine, a 1's complement machine, a sign-magnitude machine?
Again, they'd have to be implemented in such a way as to produce the same results. This might require (internally) different cases for positive and negative numbers on such machines, but so be it. (Just like we have to do for `mod', e.g.)
Also with differing values for maxint (which need not be a binary power of 2 in Pascal).
Conceptually, compute the result in arbitrary precision, compare this against MaxInt and if greater, it's a runtime error. Actual implementations can usually do it easier, but in case of doubt, one can always check if one gets the same result as the conceptual definition.
You can define the arithmetic shifts in terms of multiply/divide.
Yes, with `pow' standing for integer powers as defined in EP (if you don't like that, you can insert the usual recursive definition of powers, of course), I think we can define:
a shl b := a * 2 pow b
Your definition is clearer if you use some parentheses.
asl(a, b) := a * (2 pow b)
and I strongly favor making these syntactical extensions functions (so they can be easily replaced in purely standard systems) and naming them asl, asr when they consider the sign. That allows clear distinction from lsl, lsr which insert 0 bits at one end or the other, and can be defined to never create an overflow by using modular arithmetic as in C unsigned.
Right shifts are a bit more tricky because they're like division rounding towards negative infinity unlike `div' which round towards 0. So we could write:
a shr b := a div 2 pow b if a >= 0 (a - 2 pow b + 1) div 2 pow b if a < 0
One potential problem with shl is that it can easily produce overflows. Currently it's not a practical problem, as GPC doesn't do overflow checks, but when it does, we'll have to consider this case. We could define `shl' to "ignore overflows", but this would again make the semantics type-size-dependent. Of course, a
This is why the arithmetic and logical shift operations should be firmly separated. The logicals will be defined in terms of modulo arithmetic, just as C does.
programmer will be able to turn off overflow checking locally with compiler directives, as a last resort. One could also mask the result using `and'. A good implementation can optimize away the `and' (as GPC seems to do with the following program on i386 with optimizations). Furthermore, even a completely different integer-representation would then yield the same results, if `and' is defined as I wrote before.
program Foo;
var a, b: Cardinal attribute (Size = 32);
begin a := (b shl 4) and 16#ffffffff end.
Rotates are problematic since they indeed depend on the type size. I see no way to define them independently. I've had a discussion with Mirsad Todorovad about these, and we came to the conclusion that it's probably more trouble than it helps. E.g., if a is a constant and we use the term `a ror 4', suddenly the type of the constant becomes very important (not just for whether there's an overflow, but it would change the result completely) which is contrary to Pascal's rules.
Again, define them as functions, and let the function parameters include the size in terms of binary bit count:
FUNCTION ror(in : T, places, bits : int) : T;
and compile time checking ensures that places and bits are compatible with T. Constants can be checked at compile time, so runtime checks only appear for variables there. It could be defined in terms of modulo arithmetic on an unsigned value range 0..2**bits-1.
When all these things are standard functions defined at level 0, they do not interfere with any older code, yet are available. When used in newer code that code remains portable for the effort of defining those functions. No syntactical compiler effort is needed.
OTOH, rotates don't seem to be needed very often (at least an order of magnitude less than shifts); in fact the only place I could need them in my code is in MD5. So it's probably not worth spending too much effort here.
But when they are needed they often need to be efficient. That's why building them into the run-time library can pay off. Incidentally, in PascalP I had two extension warning levels. One warned about syntactic extensions (such as defining byte values by special numeric operations) and another about use of non-standard standard procedures. These gave instant knowledge about portability problems.
CBFalconer wrote:
Frank Heckenbach wrote:
Yes, with `pow' standing for integer powers as defined in EP (if you don't like that, you can insert the usual recursive definition of powers, of course), I think we can define:
a shl b := a * 2 pow b
Your definition is clearer if you use some parentheses.
asl(a, b) := a * (2 pow b)
and I strongly favor making these syntactical extensions functions (so they can be easily replaced in purely standard systems)
Sorry, but this discussion is not really about syntax. We want to be compatible with other compilers where that's easily possible, and several other compilers use such operators. This is just about the semantics which, in other compilers, after often only defined poorly (or accidentally, or for special circumstances).
and naming them asl, asr when they consider the sign. That allows clear distinction from lsl, lsr which insert 0 bits at one end or the other, and can be defined to never create an overflow by using modular arithmetic as in C unsigned.
As I wrote, this makes the operations dependent on the type size. That's why C needs to distinguish between a "short" and a "long" number 1 (and various other sizes), even for constants (`0L' etc.). I definitely don't want such semantics in GPC and prefer value-based identity (in accordance with the standards as far as arithmetics are concerned).
That's my main grief with C's "modular arithmetics"(*). While something like 3 * MaxInt would be an error in Pascal (though GPC doesn't currently detect it), the equivalent expression (on usual types) in C is *defined* to be equal to MaxInt - 2.
(*) BTW, it's not really modular arithmetics, as e.g. `/' is defined quite differently (e.g., modulo 16: 3 / 7 = 5, while in C arithmetics 3 / 7 = 0). Modular arithmetics is often claimed by C supporters who want to argue that C has much resemblance to mathematics, but it doesn't really apply.
One potential problem with shl is that it can easily produce overflows. Currently it's not a practical problem, as GPC doesn't do overflow checks, but when it does, we'll have to consider this case. We could define `shl' to "ignore overflows", but this would again make the semantics type-size-dependent. Of course, a
This is why the arithmetic and logical shift operations should be firmly separated. The logicals will be defined in terms of modulo arithmetic, just as C does.
Overflow can happen with signed and unsigned types just as well. I fail to see your point here.
Rotates are problematic since they indeed depend on the type size. I see no way to define them independently. I've had a discussion with Mirsad Todorovad about these, and we came to the conclusion that it's probably more trouble than it helps. E.g., if a is a constant and we use the term `a ror 4', suddenly the type of the constant becomes very important (not just for whether there's an overflow, but it would change the result completely) which is contrary to Pascal's rules.
Again, define them as functions, and let the function parameters include the size in terms of binary bit count:
FUNCTION ror(in : T, places, bits : int) : T;
This would be possible, but could easiy lead to less efficient code when places or bits is variable. I'm writing "or" because I'm not really sure which means what -- to me "places" and "bits" means basically the same in this context. Which one is the rotation count and which one corresponds to the "type size"?
Frank
Frank Heckenbach wrote:
CBFalconer wrote:
Frank Heckenbach wrote:
... snip ...
and I strongly favor making these syntactical extensions functions (so they can be easily replaced in purely standard systems)
Sorry, but this discussion is not really about syntax. We want to be compatible with other compilers where that's easily possible, and several other compilers use such operators. This is just about the semantics which, in other compilers, after often only defined poorly (or accidentally, or for special circumstances).
If you make them available as standard procedures, you can then add code to call them with other syntax when emulating those other systems. Yet you preserve the ability to write new portable code using the operations.
and naming them asl, asr when they consider the sign. That allows clear distinction from lsl, lsr which insert 0 bits at one end or the other, and can be defined to never create an overflow by using modular arithmetic as in C unsigned.
As I wrote, this makes the operations dependent on the type size.
Not for asl, asr. They simply do whatever is needed for the multiply or divide by 2. Only asl can overflow. lsl does the same thing, but ignores overflow and works as if the representation were unsigned binary. lsr inputs a 0 bit, and is a special case.
That's why C needs to distinguish between a "short" and a "long" number 1 (and various other sizes), even for constants (`0L' etc.). I definitely don't want such semantics in GPC and prefer value-based identity (in accordance with the standards as far as arithmetics are concerned).
That's my main grief with C's "modular arithmetics"(*). While something like 3 * MaxInt would be an error in Pascal (though GPC doesn't currently detect it), the equivalent expression (on usual types) in C is *defined* to be equal to MaxInt - 2.
(*) BTW, it's not really modular arithmetics, as e.g. `/' is defined quite differently (e.g., modulo 16: 3 / 7 = 5, while in C arithmetics 3 / 7 = 0). Modular arithmetics is often claimed by C supporters who want to argue that C has much resemblance to mathematics, but it doesn't really apply.
One potential problem with shl is that it can easily produce overflows. Currently it's not a practical problem, as GPC doesn't do overflow checks, but when it does, we'll have to consider this case. We could define `shl' to "ignore overflows", but this would again make the semantics type-size-dependent. Of course, a
This is why the arithmetic and logical shift operations should be firmly separated. The logicals will be defined in terms of modulo arithmetic, just as C does.
Overflow can happen with signed and unsigned types just as well. I fail to see your point here.
Not when you say you will resolve an unsigned overflow by adding or subtracting the maximum + 1 of that unsigned entity until it fits. Remember this only applies to resolving the action of lsl and lsr, and making their operation overflow free. The C technique works. Range checks can still exist for storing the result.
Rotates are problematic since they indeed depend on the type size. I see no way to define them independently. I've had a discussion with Mirsad Todorovad about these, and we came to the conclusion that it's probably more trouble than it helps. E.g., if a is a constant and we use the term `a ror 4', suddenly the type of the constant becomes very important (not just for whether there's an overflow, but it would change the result completely) which is contrary to Pascal's rules.
Again, define them as functions, and let the function parameters include the size in terms of binary bit count:
FUNCTION ror(in : T, places, bits : int) : T;
This would be possible, but could easiy lead to less efficient code when places or bits is variable. I'm writing "or" because I'm not really sure which means what -- to me "places" and "bits" means basically the same in this context. Which one is the rotation count and which one corresponds to the "type size"?
Pick it. Maybe size is a better name than bits. There has to be a limit on the value such that the result can be represented in an integer. You really want still another, rorc and rolc (rotate through carry) but that may be awkward to implement and describe.
CBFalconer wrote:
Frank Heckenbach wrote:
CBFalconer wrote:
Frank Heckenbach wrote:
... snip ...
and I strongly favor making these syntactical extensions functions (so they can be easily replaced in purely standard systems)
Sorry, but this discussion is not really about syntax. We want to be compatible with other compilers where that's easily possible, and several other compilers use such operators. This is just about the semantics which, in other compilers, after often only defined poorly (or accidentally, or for special circumstances).
If you make them available as standard procedures, you can then add code to call them with other syntax when emulating those other systems. Yet you preserve the ability to write new portable code using the operations.
It's a point. OTOH, it also works the other way around, i.e. have them built-in as operators, then code that wants to use them in a "portable" way can declare procedures (or functions) that uses the operators, possibly in the same module that would provide those routines with different implementations for other compilers.
and naming them asl, asr when they consider the sign. That allows clear distinction from lsl, lsr which insert 0 bits at one end or the other, and can be defined to never create an overflow by using modular arithmetic as in C unsigned.
As I wrote, this makes the operations dependent on the type size.
Not for asl, asr. They simply do whatever is needed for the multiply or divide by 2. Only asl can overflow.
Again, "ignoring overflow" means being dependent on the size. E.g., 1 shl 40 is ok if 1 is represented in 64 bits and overflows in 32 bits. This means that the operation is not well-defined when talking of integer *values*, as Pascal normally does.
lsl does the same thing, but ignores overflow and works as if the representation were unsigned binary.
The point is that the compiler and/or executable know whether the value is positive or negative and thus can (and IMHO should) select the appropriate kind of shift themselves (unlike assembler where the programmer can treat data as signed or unsigned at will in each operation, and therefore can, and has to, select the kind of shift intended).
This holds for operators (and is IMHO an advantage of them). For functions, as you suggested, (and barring overloading, either explicit which is non-standard, or built-it which would also require non-standard features to replace it on another system), one would indeed need different versions for signed and unsigned types (jsut to declare the parameters accordingly) and, possibly, different sizes. (Though `Integer' and its unsigned counterpart, if any, could catch all smaller types, if we provide (non-standard) larger types, as we do, we might need different versions for them, since doing everything in the large types would be inefficient.) So we'd have at least 4 times as many routines than operators. For someone writing a compatibility module (see above) this might be ok (and they'd only need to write those routines they actually need, probably not all that are conceivable); for built-it identifiers I'm reluctant to add so many instead of a single operator.
lsr inputs a 0 bit, and is a special case.
That is probably useful on the assembler level. On the Pascal level, it would seem easier to add in this new bit after the operation than adding a new parameter just for this special case.
One potential problem with shl is that it can easily produce overflows. Currently it's not a practical problem, as GPC doesn't do overflow checks, but when it does, we'll have to consider this case. We could define `shl' to "ignore overflows", but this would again make the semantics type-size-dependent. Of course, a
This is why the arithmetic and logical shift operations should be firmly separated. The logicals will be defined in terms of modulo arithmetic, just as C does.
Overflow can happen with signed and unsigned types just as well. I fail to see your point here.
Not when you say you will resolve an unsigned overflow by adding or subtracting the maximum + 1 of that unsigned entity until it fits.
And for signed over-/underflow, I'd subtract/add the maximum - minimum + 1 until it fits, so it's just the same (since the minimum of an unsigned type is 0).
Rotates are problematic since they indeed depend on the type size. I see no way to define them independently. I've had a discussion with Mirsad Todorovad about these, and we came to the conclusion that it's probably more trouble than it helps. E.g., if a is a constant and we use the term `a ror 4', suddenly the type of the constant becomes very important (not just for whether there's an overflow, but it would change the result completely) which is contrary to Pascal's rules.
Again, define them as functions, and let the function parameters include the size in terms of binary bit count:
FUNCTION ror(in : T, places, bits : int) : T;
This would be possible, but could easiy lead to less efficient code when places or bits is variable. I'm writing "or" because I'm not really sure which means what -- to me "places" and "bits" means basically the same in this context. Which one is the rotation count and which one corresponds to the "type size"?
Pick it. Maybe size is a better name than bits. There has to be a limit on the value such that the result can be represented in an integer. You really want still another, rorc and rolc (rotate through carry) but that may be awkward to implement and describe.
I know these exist in assembler, but do we really want/need them in Pascal? Again, how often are rotates used at all (as I said, only once in all my code, and that's a 32 bit rotation, not 33 bit (32 bit + carry)). So, while there are certainly a few more interesting assembler instructions, the question is which ones do we need built-into GPC.
Frank
Frank Heckenbach wrote:
CBFalconer wrote:
... snip ...
Pick it. Maybe size is a better name than bits. There has to be a limit on the value such that the result can be represented in an integer. You really want still another, rorc and rolc (rotate through carry) but that may be awkward to implement and describe.
I know these exist in assembler, but do we really want/need them in Pascal? Again, how often are rotates used at all (as I said, only once in all my code, and that's a 32 bit rotation, not 33 bit (32 bit + carry)). So, while there are certainly a few more interesting assembler instructions, the question is which ones do we need built-into GPC.
rrc and rlc come in very handy when implementing such things as bignum packages, or when ever extending ranges. Certainly their use is rare.