On 02 Aug 2006, at 22:16, Frank Heckenbach wrote:
Waldek Hebisch wrote:
There is still the issue of alignment: we need alignment to avoid shifts when lower bounds do not match, and set variables must also be allocated on a proper (word) boundary. Looks messy.
I agree.
Allocation on a proper word boundary can be easily done by keeping the format as an array of MedCard, and only typecasting to something else inside the set/test/remove helpers.
I think the alignment issue could in that case be handled in the same way as deciding the size of sets mentioned by Waldek below (a tradeoff of size vs efficiency).
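A minimal sketch in C of what I mean by keeping the storage as an array of words and only typecasting inside the helpers. The names `medcard`, `word_set`, `SET_WORDS` and the three helpers are made up for illustration, and the bit order within a byte (least significant bit first) is just one possible choice, not a statement about what GPC or FPC do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MedCard: an unsigned machine word. */
typedef unsigned long medcard;

#define SET_WORDS 4   /* illustrative size only */
#define SET_BITS  (SET_WORDS * sizeof(medcard) * 8)

/* The storage is declared as words, so the compiler word-aligns it. */
typedef struct {
    medcard words[SET_WORDS];
} word_set;

/* The helpers view the same storage as bytes. Accessing an object
   through unsigned char is always allowed in C. */
void set_include(word_set *s, size_t elem)
{
    unsigned char *bytes = (unsigned char *)s->words;
    assert(elem < SET_BITS);
    bytes[elem / 8] |= (unsigned char)(1u << (elem % 8));
}

void set_exclude(word_set *s, size_t elem)
{
    unsigned char *bytes = (unsigned char *)s->words;
    assert(elem < SET_BITS);
    bytes[elem / 8] &= (unsigned char)~(1u << (elem % 8));
}

int set_test(const word_set *s, size_t elem)
{
    const unsigned char *bytes = (const unsigned char *)s->words;
    assert(elem < SET_BITS);
    return (bytes[elem / 8] >> (elem % 8)) & 1;
}
```

Only the helpers know about the byte view, so the declared type (and therefore the alignment) stays that of a word array.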
I would say proper big endian representation. Currently both GPC and FPC use little endian bit order for set hunks, which conflicts with big endian byte order.
Indeed, it's a bit strange as it is now.
I personally don't consider it strange, since all CPUs I know of have the same endianness as far as bit ordering is concerned (regardless of their byte endianness). At least there's no architecture I know of where the byte with value 1 is represented as $80.
Using proper big endian representation we'd be independent of word size (though not alignment). The byte order would then be the same between big and little endian machines, though not the bit order within a byte, so no binary compatibility (but we don't have that now, so we wouldn't lose anything).
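To make the "independent of word size" point concrete, here is a rough C sketch of big endian bit order addressed byte by byte. The function names are hypothetical. Element 0 is the most significant bit of byte 0, so on a big endian machine the same bytes read as a 16, 32 or 64 bit word always have element 0 in the top bit.

```c
#include <assert.h>
#include <stddef.h>

/* Element i lives in byte i / 8, at bit position i % 8 counted from
   the most significant end of that byte. */
void be_set_include(unsigned char *set, size_t nbits, size_t elem)
{
    assert(elem < nbits);
    set[elem / 8] |= (unsigned char)(0x80u >> (elem % 8));
}

int be_set_test(const unsigned char *set, size_t nbits, size_t elem)
{
    assert(elem < nbits);
    return (set[elem / 8] & (0x80u >> (elem % 8))) != 0;
}
```

Nothing in the addressing depends on a word size; only the allocation size and alignment still do.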
Bit swapping when porting from big to little endian is a lot less obvious than byte swapping though. We'd at least have to provide routines to do this (maybe GPC has them already, but FPC doesn't).
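Such a routine would not be large. A sketch in C, with made-up names, that reverses the bit order inside every byte of a set while leaving the byte order alone:

```c
#include <stddef.h>

/* Reverse the bits of one byte: swap nibbles, then bit pairs,
   then adjacent bits. */
unsigned char reverse_bits8(unsigned char b)
{
    b = (unsigned char)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (unsigned char)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (unsigned char)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}

/* Convert a whole set in place between LSB-first and MSB-first bit
   order within each byte. */
void set_reverse_bits(unsigned char *set, size_t nbytes)
{
    size_t i;
    for (i = 0; i < nbytes; i++)
        set[i] = reverse_bits8(set[i]);
}
```

Applying it twice gives back the original set, so the same routine serves for both directions.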
That said, there are two big arguments in favour of using that solution (i.e., treating sets basically as packed bit arrays on big endian architectures):
a) gdb has a define called BITS_BIG_ENDIAN which is set to the same value as the byte endianness of the target architecture. If this define is set, it currently treats sets as packed bit arrays on big endian architectures (but not on little endian architectures, there it treats them the same way as FPC and GPC currently store their sets).
b) at least Think Pascal also uses this set format. I do not have MW Pascal to test against.
Using big endian bit order would remove this discrepancy. However, I am thinking of a much simpler scheme: trying to allocate 1, 2, 4 or 8 bytes (in that order) and, if that fails, allocating a sequence of 8-byte words (all that assuming a 64 bit machine, with obvious changes for other word lengths). The main reason is that with such a scheme one can perform operations on small sets inline, using just a couple of instructions.
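As an illustration of the "couple of instructions" argument, here is a sketch in C of a set that fits in a single 64 bit word. The type and function names are invented for the example, and it assumes element numbers below 64 with element 0 in the least significant bit.

```c
#include <stdint.h>

/* Hypothetical small set: at most 64 elements, one machine word. */
typedef uint64_t small_set;

/* elem must be less than 64 in the two functions that take it. */
small_set small_set_singleton(unsigned elem)
{
    return (small_set)1 << elem;
}

int small_set_contains(small_set s, unsigned elem)
{
    return (int)((s >> elem) & 1u);
}

/* Each set operation is a single bitwise operation on one word. */
small_set small_set_union(small_set a, small_set b)        { return a | b; }
small_set small_set_intersection(small_set a, small_set b) { return a & b; }
small_set small_set_difference(small_set a, small_set b)   { return a & ~b; }
```

With sets that need a sequence of words, the same operations turn into loops over the words, which is where a call to a helper routine becomes more reasonable.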
Seems reasonable.
To me as well overall, although I'm personally in favour of always using the same cut-off and extension sizes regardless of the native word size (e.g. 1, 2, 4, 8, 12, 16, ... everywhere), to keep the same sets the same size on 32 and 64 bit archs.
Making people extend their set base types so the sets are a multiple of 8 bytes on both 32 and 64 bit archs seems awkward: it may mess up bit-packed records elsewhere as well, and for enums it may amount to adding a bunch of dummy enum elements (which doesn't look nice either).
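The fixed progression 1, 2, 4, 8, 12, 16, ... could be computed like this. It is only a sketch in C; the function name is hypothetical, and the choice of 4 bytes as the extension step above 8 bytes is simply read off the example sizes above.

```c
#include <stddef.h>

/* Storage size in bytes for a set with nbits elements, using the same
   cut-offs on every architecture: 1, 2, 4, 8, then multiples of 4. */
size_t set_storage_bytes(size_t nbits)
{
    size_t bytes = (nbits + 7) / 8;   /* minimum number of bytes */

    if (bytes <= 1) return 1;
    if (bytes <= 2) return 2;
    if (bytes <= 4) return 4;
    if (bytes <= 8) return 8;
    return (bytes + 3) / 4 * 4;       /* round up to a multiple of 4 */
}
```

Because nothing here depends on the native word size, a set of 65 elements takes 12 bytes on both 32 and 64 bit targets.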
If we can get the extensions working through the backend and gdb, and perhaps even "officially" approved, I'd certainly prefer this. Otherwise, the "alignment detection", though it seems a bit backward, looks best to me.
I agree. Concerning M2: it uses a hack for (some?) larger sets (m2-valprint.c):
    case TYPE_CODE_STRUCT:
      if (m2_is_long_set (type))
        m2_print_long_set (type, valaddr, embedded_offset, address,
                           stream, format, pretty);
      else
        cp_print_value_fields (type, type, valaddr, embedded_offset,
                               address, stream, format,
                               recurse, pretty, NULL, 0);
m2_is_long_set checks if all the fields of the record are consecutive sets (i.e. sets of consecutive range types). I don't really understand why this is useful though, nor do I see at first sight what m2_print_long_set does so differently compared to the M2 print code for TYPE_CODE_SET.
But for some reason it gave me an easy idea to solve the gdb alignment problem we have: even if it's a set of 48..50 (which we will store as a set of 0..63), put in the debug info that it's a set of 0..50 and gdb will print everything correctly.
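In C-like terms the idea amounts to the following sketch. The names and the `align_bits` parameter are made up for illustration, and it assumes non-negative bounds and that gdb locates each element relative to the lower bound given in the debug info.

```c
#include <assert.h>

typedef struct {
    long low;
    long high;
} set_bounds;

/* Bounds actually used for storage: the lower bound is rounded down
   and the upper bound rounded up to the alignment (in elements). */
set_bounds storage_bounds(long low, long high, long align_bits)
{
    set_bounds b;
    assert(low >= 0 && high >= low && align_bits > 0);
    b.low = low - (low % align_bits);
    b.high = high - (high % align_bits) + align_bits - 1;
    return b;
}

/* Bounds written to the debug info: the aligned lower bound, so bit
   positions match the storage, but the declared upper bound. */
set_bounds debug_info_bounds(long low, long high, long align_bits)
{
    set_bounds b = storage_bounds(low, high, align_bits);
    b.high = high;
    return b;
}
```

For a set of 48..50 with 64 element alignment this gives storage bounds 0..63 and debug info bounds 0..50. The elements 0..47 then exist as far as the debugger is concerned, but they are never set.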
Jonas