Jonas Maebe wrote:
Anyway, you could still do the adding/subbing/... of sets by typecasting the set to a MedCard array of the appropriate size. This issue is only with inserting/removing/testing elements (and in that case you're doing more or less random access, so there is less chance that the chunk you need is already in the cache).
There is still the issue of alignment: we need alignment to avoid shifts when the lower bounds do not match, and set variables must also be allocated on a proper (word) boundary. Looks messy.
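To make the word-wise idea concrete, here is a minimal sketch in C (the names set_chunk and set_union are made up for illustration; GPC's actual RTS routines differ) of a union over two sets viewed as arrays of machine words. It only works when both operands have the same lower bound and alignment, which is exactly the messy part mentioned above:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical chunk type standing in for MedCard on a 64-bit target. */
    typedef uint64_t set_chunk;

    /* Word-wise union of two sets with identical bounds and alignment.
       If the lower bounds differed, every chunk of 'src' would first
       have to be shifted to line it up with 'dst'. */
    static void set_union(set_chunk *dst, const set_chunk *src, size_t nchunks)
    {
        for (size_t i = 0; i < nchunks; i++)
            dst[i] |= src[i];
    }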
In principle we can change the set representation. In fact, there is one change that I intend to make in the future: currently even the smallest sets use a MedCard-sized word; I plan to allocate smaller space for them (the smallest unit which can represent them).
There have been plans for a long time to do that in FPC as well. If you do this as meticulously as Delphi does (its set size grows by 1 byte at a time), then you more or less need by definition a little-endian representation of sets in all cases though (because of the left-over bytes at the end).
I would say a proper big-endian representation. Currently both GPC and FPC use little-endian bit order for set hunks, which conflicts with big-endian byte order. Using big-endian bit order would remove this discrepancy. However, I am thinking about a much simpler scheme: trying to allocate 1, 2, 4, or 8 bytes (in that order) and, if that fails, allocating a sequence of 8-byte words (all that assuming a 64-bit machine, with obvious changes for other word lengths). The main reason is that with such a scheme one can perform operations on small sets inline, using just a couple of instructions. The Delphi way requires a rather complicated instruction sequence or special runtime routines.
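To make the bit-order point concrete (a sketch only, not the actual GPC or FPC code, and assuming a 64-bit chunk): with little-endian bit order, element e of a chunk sits at bit position e mod 64 counting from the least significant bit; with big-endian bit order it sits at position 63 - (e mod 64), so element numbers increase in the same direction as byte addresses on a big-endian machine. Either way, membership in a set that fits a single chunk is just a shift and a mask, which is what makes the 1/2/4/8-byte scheme attractive for inline code:

    #include <stdint.h>

    #define CHUNK_BITS 64  /* assuming a 64-bit machine, as above */

    /* Little-endian bit order: element e is bit (e mod 64), counted
       from the least significant bit of the chunk. */
    static int in_set_le(uint64_t chunk, unsigned e)
    {
        return (chunk >> (e % CHUNK_BITS)) & 1;
    }

    /* Big-endian bit order: element e is bit (e mod 64), counted from
       the most significant bit, matching big-endian byte order. */
    static int in_set_be(uint64_t chunk, unsigned e)
    {
        return (chunk >> (CHUNK_BITS - 1 - (e % CHUNK_BITS))) & 1;
    }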
For this reason, a number of FPC developers are in favour of using "little endian" sets on all platforms. I'm a bit wary of breaking backwards binary compatibility though, and possibly also compatibility with other big endian Pascal compilers (does anyone know how CodeWarrior/PPC and/or Think Pascal store their sets?)
I do not know. But I have looked into the XL Pascal and Sun Pascal manuals. Both claim that sets are implemented as bitstrings. XL Pascal limits sets to 256 elements and all of them are 0-based. It looks like Sun Pascal aligns sets, but the manual omits the exact rules. The manuals are pretty imprecise, but at least the Sun example shows that bits are stored in consecutive bytes (though I was not able to find a specification of the bit order).
(and still, big-endian machines would use a different representation than little-endian ones).
Big and little endian machines using different representations is logical, and people porting from little to big endian (and vice versa) are used to adding byte-swapping code all over the place.
I got used to making changes when going from a 32-bit machine to a 64-bit system.
If someone sees a way to automatically detect the set format used inside gdb itself, that would be great as well, of course.
What do you think?
My first thought was to change the GPC representation to match gdb, but while we can rather easily (and with minor performance impact) change the bit order in sets, the alignment problem remains...
Well, those are two different issues, I think, which can be solved separately.
Hmm, let me see: if we use natural alignment (to the chunk boundary) and the set needs bigger alignment, then the size of the variable gets bigger. So we can detect the alignment used from the set bound and the size of the set variable. AFAIK it is possible to tell gdb about the correct size. So, if we agree to use natural alignment, gdb can detect the exact amount used (sometimes it cannot detect whether alignment is used at all, so gdb must know whether the natural alignment rule is used).
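A rough sketch of that arithmetic in C (the function name is made up; this only covers the natural-alignment case and assumes non-negative bounds and 64-bit chunks): the stored set starts at the chunk boundary at or below the declared lower bound, so the declared bounds alone determine what the variable's size should be, and gdb, knowing the actual size, can confirm that the rule was applied and recover the origin:

    #include <stddef.h>

    #define CHUNK_BITS 64  /* chunk = MedCard on a 64-bit target (assumption) */

    /* Size in bytes of a variable of type 'set of low..high' under
       natural alignment: storage begins at the chunk boundary at or
       below 'low', so an unaligned lower bound makes the variable
       bigger.  Assumes 0 <= low <= high for simplicity. */
    static size_t natural_set_size(long low, long high)
    {
        long origin  = (low / CHUNK_BITS) * CHUNK_BITS;        /* round down */
        long nbits   = high - origin + 1;
        long nchunks = (nbits + CHUNK_BITS - 1) / CHUNK_BITS;  /* round up */
        return (size_t)(nchunks * (CHUNK_BITS / 8));
    }

For example, a 'set of 3 .. 10' would then occupy one full 8-byte chunk even though eight bits would suffice, and that size, together with the declared bounds, is what lets gdb work out where the first stored bit lies.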
So, ATM for me bit reversal (plus teaching gdb about alignment) is the most attractive solution.
In principle gdb could try to detect the compiler: gpc uses some pretty characteristic symbols (like `_p_GPC_RTS_VERSION_20060215') and I suspect that FPC is doing something similar.
There are indeed a lot of FPC_* symbols, but I don't really like such a solution, and I'm not sure whether the gdb people would like it either.
AFAICS any method to choose between set representations is going to be non-standard. I am not sure if some ad hoc extensions to the format of the debug info would be any better.