Peter Gerwinski wrote:
I noticed a lot of confusion about Integer types in GNU Pascal.
So did I (read: I was confused in the beginning ;-). I was going to post something about it sometime anyway...
  bits   signed     unsigned    C equivalent

   32    Integer    Word        [unsigned] int == long
   64    LongInt    LongWord    [unsigned] long long
(BTW: The number of bits is the same on all platforms, isn't it?)
Even if it is now, it probably won't be forever, I guess...
(What about the 64 bit platforms (Alpha)? Isn't Integer 64 bits there? Should it be? What about LongInt there?)
Is it always (on every platform) guaranteed that "Integer" is strictly bigger than "ShortInt", and "LongInt" is strictly bigger than "Integer"?
BTW: Is 64 bits the biggest possible size (at least on 32 bit platforms), or would it be possible to make 128 bit types? (If not, this would probably decide how big the ClassID for objects will be, BTW...) If so, such a type would probably be called "LongLongInt", and then I'd also suggest something like "LongestInt" to get the actually biggest possible type on every system.
I was told more than once that `Word' should have 16 bits (like in Borland Pascal). I made it 32 bits because this is the "natural" size on a 32-bit system (like GPC), and it has the same size as `Integer' (like in Borland Pascal;-).
I think this is a good idea. However, for BP compatibility one might define Word as 16 bits in a bpcompat unit or somewhere similar, but this should not be the gpc default, IMHO.
Christian Wendt wrote:
Couldn't it depend on compiler switches? (There's a --borland-pascal switch?) [...] Advantage: BP programs can be compiled more faithfully. (Maybe someone uses a specific type size to flip the sign by adding... %-)
Argh! But there are cases where the size really matters (e.g. data files that have to be read). I'd favour an explicit declaration instead of such a compiler switch, because in many places, even in BP programs, you don't necessarily want the BP sizes (e.g. one often uses "Integer" without much thought about the exact size, just to have "an integer data type", so it wouldn't hurt to use gpc's 32 bit Integer instead).
For a quick BP->gpc conversion, one could use the BP identifiers declared in system. Then, to improve the program, one could find the places where the size really matters and put types like Int8 (an 8 bit integer) there. Those can be declared in both gpc and BP.
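Just to illustrate what I mean, a quick sketch (Int8/Ord16 are of course not existing gpc types, only the suggested names declared as subranges, and whether they really get the indicated sizes outside packed structures is exactly the open question below; BP-style Assign/Reset is assumed to be available):

  program DataFileDemo;

  type
    Int8  = -$80..$7F;     { 8 bit signed }
    Ord16 = 0..$FFFF;      { 16 bit unsigned, what BP's Word usually means }

    { a record read from a binary data file, where exact sizes matter }
    DataRec = packed record
      Version : Ord16;     { was "Word" in the BP source }
      Flags   : Int8;      { was "ShortInt" }
    end;

  var
    f : file of DataRec;
    r : DataRec;

  begin
    Assign (f, 'data.dat');   { BP-style file handling }
    Reset (f);
    Read (f, r);
    Close (f);
    WriteLn ('Version: ', r.Version)
  end.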
Peter Gerwinski wrote:
AFAIK, a "word" is defined to be the "natural" unit of the computer, but maybe this definition has changed since I have learned it.
I think this is correct. However, the word "Word" seems inappropriate to me. In natural language, this word has a different meaning (as this sentence shows), and in mathematics and computer science it usually has yet another meaning (a finite sequence of elements of a given "alphabet"). I think "Word" in this sense is only common in assembler, and it crept into Pascal through BP (correct me if I'm wrong).
So, what else? In mathematics, one says "natural numbers", but I don't think "natural" would be a natural name for this data type... ;-). "unsigned integer" as in C doesn't seem so good, either, simply because it's two words.
What about a name like "Ordinal" (or "Cardinal" -- which one would be better?)? In the following I'll write "Ordinal", and, in analogy to "LongInt" and "ShortInt", also "LongOrd" and "ShortOrd". (The similarity in name to the Ord function is, of course, not accidental, since Ord (x) will always fit into "Ordinal", except when x is a signed integer or one of the "strange" enum values we discussed recently.)
To me, such a name would seem more "high level" (Pascalish) than "word" (low level, assembler) or "unsigned" (medium level, C).
It would not be difficult to implement a compiler switch (*$16-bit *) (or a command-line option `--16-bit') to make `Integer' and `Word' 16 bits etc., but I am not so sure that this is a good idea since GPC is a true 32-bit compiler. If we want to look out for a different "natural" data size for GPC, it should not be 16 bits, but 64 bits!
I don't think such a switch would be a good idea, or even necessary. If someone wants Word to be 16 bits, one can just redeclare it (e.g. in system) -- after all, "Word" is not a reserved word...
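For illustration, such a redeclaration could be as simple as this (whether a plain subrange really occupies 16 bits is, again, a separate question, see below):

  program SixteenBitWord;

  type
    Word = 0..$FFFF;   { shadows the built-in 32 bit Word within this program }

  var
    w : Word;

  begin
    w := $FFFF;
    WriteLn (w)
  end.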
Orlando Llanes wrote:
I have an idea: perhaps there could be platform-dependent types, for example: VAR MyVar : PCByte; This way it would be treated as 8 bits on all platforms. The natural types can stay; all one has to do is use the platform type to compensate where necessary. This way, 1) the data works consistently across all platforms because (for example) they know that a PCByte is 8 bits, and 2) if a byte is a different size on another platform, it doesn't matter, because a PCByte is the same no matter where it's compiled. I guess this could be accomplished by making the PCByte a natural Byte, but not using anything above the 8 bits in the code.
I agree (I also don't like things like "__byte__ integer", because it reminds me of C ;-), and I find such identifiers with many underscores "ugly" -- IMHO, if they're necessary at all, they should be there only in a few low level modules/units). But I wouldn't like the name "PCByte". I don't think "PCByte" will be easy to remember...
What about the following (the meanings are obvious):
  Int8     Ord8
  Int16    Ord16
  Int32    Ord32
  Int64    Ord64
  Int128   Ord128   (?)
  [...]
If subrange types automatically had the correct size (currently they don't -- unless within a packed array or record), and "subranges" were also allowed for ranges that exceed Integer (like "Ordinal" or "LongInt"), one could just declare all of these types in the normal way, like "Int8 = -$80..$7F" ... "Ord128 = 0..$FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF". :-)
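Spelled out up to 64 bits (as said, the bigger ones would need subranges beyond Integer, so this is a sketch of what the declarations would look like, not something that necessarily compiles today):

  program SizedTypes;

  type
    Int8   = -$80..$7F;
    Ord8   = 0..$FF;
    Int16  = -$8000..$7FFF;
    Ord16  = 0..$FFFF;
    Int32  = -$80000000..$7FFFFFFF;
    Ord32  = 0..$FFFFFFFF;                           { already exceeds a 32 bit Integer }
    Int64  = -$8000000000000000..$7FFFFFFFFFFFFFFF;
    Ord64  = 0..$FFFFFFFFFFFFFFFF;                   { needs an unsigned 64 bit base type }

  begin
  end.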
For BP->gpc, as I said above, the types to choose depend on what's really intended. I think one could use roughly the following rules (please check them carefully; I made several assumptions about the types which I'm not sure are valid on all platforms):

BP: ShortInt    gpc: Int8

BP: Byte        gpc: Ord8

BP: Integer     gpc:
  - If exactly a "16 bit signed integer" is wanted: Int16
  - If an integer of at least 16 bits is wanted: ShortInt
  - If just "an integer" is wanted (maybe implying that it can be handled most efficiently): Integer

BP: Word        gpc:
  - If exactly a "16 bit unsigned integer" is wanted: Ord16
  - If an unsigned integer of at least 16 bits is wanted: ShortOrd
  - If just "an unsigned integer" is wanted (maybe implying that it can be handled most efficiently): Ordinal
  - If the biggest possible unsigned integer is wanted: LongOrd (though this might be rare, since in BP one would probably rather use LongInt then, even though that's signed)

BP: LongInt     gpc:
  - If exactly a "32 bit signed integer" is wanted: Int32
  - If an integer of at least 32 bits is wanted: Integer
  - If an integer type is wanted that's bigger than Integer (e.g. to hold the result of a multiplication of two Integers -- see the sketch below): LongInt
  - If the biggest possible [signed] integer type is wanted: LongInt (LongLongInt? LongestInt?)
  - If an unsigned (!) integer of at least 31 bits is wanted: Ordinal
  - If the biggest possible unsigned (!) integer is wanted: LongOrd
(Did I miss any possible intentions? ;-)
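To illustrate the "bigger than Integer" case from the LongInt rules (just a sketch, assuming that mixed LongInt/Integer arithmetic is carried out in LongInt, as in BP):

  program ProductDemo;

  var
    a, b : Integer;
    p    : LongInt;   { must be able to hold the full product }

  begin
    a := 100000;
    b := 100000;
    p := a;           { promote to LongInt first ... }
    p := p * b;       { ... so the multiplication is done in LongInt }
    WriteLn (p)       { 10000000000 -- would not fit into a 32 bit Integer }
  end.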
Another thing, only partly related to the above: is there any support for endianness (i.e. the byte order of integers bigger than 1 byte) in gpc yet? (This would be necessary e.g. to read binary files on different platforms.) If not, I'd suggest functions like "SmallEndian" and "BigEndian", defined for all integer types, which convert a value, given in the machine's native byte order, to a little endian or big endian value, respectively (or vice versa, which is the same conversion). On any given platform, one half of the functions would do nothing, and the other half would reverse the byte order. If there's a conditional define concerning endianness, such functions could be implemented in Pascal with {$IFDEF}'s; otherwise they'd have to be provided by the compiler.
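A minimal sketch of such functions for one type only (32 bit unsigned, here simply gpc's Word as discussed above). The conditional symbol BYTES_BIG_ENDIAN is made up; it just stands for whatever define would tell us the machine's byte order. BP-style bit operators (and/or/shl/shr) and {$ifdef} are assumed:

  program EndianDemo;

  type
    Ord32 = Word;   { gpc's 32 bit unsigned type, per the discussion above }

  { reverse the byte order of a 32 bit value }
  function SwapBytes32 (x : Ord32) : Ord32;
  var
    b0, b1, b2, b3 : Ord32;
  begin
    b0 :=  x          and $FF;
    b1 := (x shr 8)   and $FF;
    b2 := (x shr 16)  and $FF;
    b3 := (x shr 24)  and $FF;
    SwapBytes32 := (b0 shl 24) or (b1 shl 16) or (b2 shl 8) or b3
  end;

  { value given in machine order -> little ("small") endian, and vice versa }
  function SmallEndian (x : Ord32) : Ord32;
  begin
  {$ifdef BYTES_BIG_ENDIAN}
    SmallEndian := SwapBytes32 (x)
  {$else}
    SmallEndian := x        { machine is already little endian: nothing to do }
  {$endif}
  end;

  { value given in machine order -> big endian, and vice versa }
  function BigEndian (x : Ord32) : Ord32;
  begin
  {$ifdef BYTES_BIG_ENDIAN}
    BigEndian := x
  {$else}
    BigEndian := SwapBytes32 (x)
  {$endif}
  end;

  begin
    WriteLn (BigEndian ($12345678))   { = $78563412 on a little endian machine }
  end.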
Next topic:-)
While we are talking about integer sizes, what about boolean sizes? Usually, a boolean is 1 bit, of course, but in some cases (e.g. system calls) one needs booleans of 1 byte (this would only be a difference in a packed array or record) or 2 bytes (e.g. Windoze API, cf. BP's "bool" type), perhaps even more. So it might be useful to have "bool8", "bool16", ... . The internal value 0 would be false, and any other value would be interpreted as true, while the standard true value is 1 (or -1?). Syntactically they would be treated just like booleans (i.e. assignment compatibility between each other, but not with integers; can be used in if/while/until conditions; work with and/or/xor/not/...).
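Until such types exist, one can of course emulate e.g. a 2 byte boolean for API records by hand, with an ordinary 16 bit unsigned field and two trivial conversion helpers. A sketch only; Bool16, Ord16, ToBool16 and FromBool16 are made-up names, and being a plain subrange, this emulation does not give the type safety described above:

  program Bool16Demo;

  type
    Ord16  = 0..$FFFF;
    Bool16 = Ord16;      { emulated: 0 = false, anything else = true }

  function ToBool16 (b : Boolean) : Bool16;
  begin
    if b then
      ToBool16 := 1      { the "standard" true value, as suggested above }
    else
      ToBool16 := 0
  end;

  function FromBool16 (b : Bool16) : Boolean;
  begin
    FromBool16 := b <> 0   { any nonzero value counts as true }
  end;

  var
    flag : Bool16;

  begin
    flag := ToBool16 (True);
    if FromBool16 (flag) then
      WriteLn ('set')
  end.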
And one more topic (also related to some kind of sub-types, at least):
Dr John Stockton jrs@merlyn.demon.co.uk wrote once in c.l.p.b. (comp.lang.pascal.borland):
A better language might include sub-types of real, R>0, R>=0, R<0, R<=0, R<>0, with range checking.
...or, more generally, arbitrary intervals. Or, even more generally, sub-types with a user-defined function (super_type) : Boolean that decides whether an element belongs to the sub-type. (Of course, subranges of ordinal types are a special case of this, too, at least in theory...)
Since all of this would only affect range checking, it's not an urgent topic for now. Once range checking is there, it might not even be very difficult to implement: just use the user-defined function instead of the "standard" range checking function...
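Until something like this exists, the effect can be emulated by hand, e.g. by routing assignments to a "positive real" through a checking function. A sketch only; PositiveReal and CheckPositive are made-up names, and the "sub-type" is just a synonym for Real:

  program PosRealDemo;

  type
    PositiveReal = Real;   { no real sub-typing yet; the name only documents the intent }

  { user-defined "membership" check, as described above: halt on violation }
  function CheckPositive (x : Real) : PositiveReal;
  begin
    if x <= 0 then
      begin
        WriteLn ('range check failed: value must be > 0');
        Halt (1)
      end;
    CheckPositive := x
  end;

  var
    r : PositiveReal;

  begin
    r := CheckPositive (Sqrt (2.0));   { ok }
    WriteLn (r : 10 : 6);
    r := CheckPositive (-1.0)          { triggers the check at run time }
  end.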
BTW: Does the PXSC standard have any ideas about such things?