Jonas Maebe wrote:
Hello,
gdb currently displays the contents of set variables incorrectly on big-endian machines for at least GPC and FPC. The reason is that it treats a set as a linear array of bits, while both GPC and FPC treat it as an array of ordinals in which bits are set using "1 shl (elementnr mod (sizeof(ordinal)*8))".
I am surprised, but AFAICS you are right. (ATM I cannot test on a big-endian machine, since there is no gdb there, but the problem already appears on little-endian machines.)
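To illustrate the mismatch, here is a rough C sketch (not taken from the attached patch; the 32-bit chunk size and the simplified view of gdb's bit numbering are assumptions based on the description above). It computes which memory byte holds a given set element under each interpretation:

  #include <stdio.h>

  /* FPC/GPC layout as described above: the set is an array of 32-bit
     ordinals and element N corresponds to bit (N mod 32) of ordinal
     (N div 32).  On a big-endian machine the least significant byte of
     each ordinal is stored last, so the byte order inside a chunk is
     reversed. */
  static void chunked_location(unsigned n, int big_endian,
                               unsigned *byte, unsigned *bit)
  {
      unsigned chunk         = n / 32;
      unsigned bit_in_chunk  = n % 32;
      unsigned byte_in_chunk = bit_in_chunk / 8;

      if (big_endian)
          byte_in_chunk = 3 - byte_in_chunk;

      *byte = chunk * 4 + byte_in_chunk;
      *bit  = bit_in_chunk % 8;
  }

  /* gdb's view (simplified): the set is a flat sequence of bits, so
     element N lives in byte N/8 at bit N%8. */
  static void linear_location(unsigned n, unsigned *byte, unsigned *bit)
  {
      *byte = n / 8;
      *bit  = n % 8;
  }

  int main(void)
  {
      unsigned cb, cbit, lb, lbit;

      chunked_location(17, 1, &cb, &cbit);   /* element 17, big-endian */
      linear_location(17, &lb, &lbit);

      /* prints: compiler stores element 17 in byte 1, gdb looks in byte 2 */
      printf("compiler stores element 17 in byte %u, gdb looks in byte %u\n",
             cb, lb);
      return 0;
  }

On a little-endian target the two byte indices coincide, which is why the plain bit-order problem only shows up on big-endian machines.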
The problem with creating a patch is that in FPC the size of this ordinal is always 32 bits, while in GPC it is MedCard (which is 64 bits on 64-bit architectures).
I do not know what FPC is doing, but GPC also "aligns" sets, so for example `set of 11..37' is stored as a subset of `set of 0..37'. gdb has no idea of set alignment, so such sets are printed wrongly even on little-endian machines.
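A minimal sketch of what that alignment means for element lookup, assuming the lower bound is rounded down to a chunk boundary (that is my reading of the example above; the 64-bit chunk width is likewise an assumption and not essential to the point):

  #include <stdio.h>

  #define CHUNK_BITS 64   /* MedCard width on a 64-bit machine (assumed) */

  /* GPC-style storage: `set of LOW..HIGH' starts at LOW rounded down to a
     chunk boundary, so element E occupies bit (E - aligned_low). */
  static unsigned gpc_bit(unsigned low, unsigned e)
  {
      unsigned aligned_low = (low / CHUNK_BITS) * CHUNK_BITS;
      return e - aligned_low;
  }

  /* gdb assumes bit 0 of the storage corresponds to the declared lower
     bound LOW. */
  static unsigned gdb_bit(unsigned low, unsigned e)
  {
      return e - low;
  }

  int main(void)
  {
      /* For `set of 11..37', element 20 sits at bit 20 of GPC's storage,
         but gdb looks at bit 9 -- the bit GPC actually uses for element 9. */
      printf("GPC bit: %u, gdb bit: %u\n", gpc_bit(11, 20), gdb_bit(11, 20));
      return 0;
  }

So even when the bit order matches, the offset is off by the amount of padding in front of the declared lower bound.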
The main reasons for FPC to always use 32 bits are
a) binary compatibility
b) performance (even on 64-bit machines, loading/storing a 32-bit value from/to memory is often faster)
I think that there is a subtle interaction between space usage and instruction count. For a data-heavy program, storing sets as byte sequences will minimize space usage, which may pay off by reducing cache misses. OTOH, 64-bit machines tend to have 64-bit buses, so when working in cache the machine should handle a 64-bit chunk as fast as a 32-bit chunk. Hence, for mass operations (copy, sum or intersection) on moderate or large sets, 64-bit chunks should give much better speed than 32-bit chunks.
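As a rough illustration of the instruction-count argument (the chunk width and set size here are arbitrary, not anything GPC or FPC actually mandates): a union or intersection over a 256-element set needs 4 loop iterations with 64-bit chunks, 8 with 32-bit chunks, and 32 with single bytes.

  #include <stdint.h>
  #include <stddef.h>

  /* Mass operations over a set stored as an array of 64-bit chunks. */
  static void set_union64(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t nchunks)
  {
      for (size_t i = 0; i < nchunks; i++)
          dst[i] = a[i] | b[i];      /* set sum */
  }

  static void set_intersect64(uint64_t *dst, const uint64_t *a,
                              const uint64_t *b, size_t nchunks)
  {
      for (size_t i = 0; i < nchunks; i++)
          dst[i] = a[i] & b[i];      /* set intersection */
  }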
I have found several mails in the archives of this list noting that people should not rely on GPC sets having a particular format, so I wonder whether this could be changed in GPC? Regardless of those warnings, I guess it will still cause problems (the same binary compatibility issue mentioned above, for people who created data files containing sets on 64-bit machines with GPC), but it will also have advantages (making files containing sets compatible with both 32-bit and 64-bit apps compiled by GPC).
In principle we can change the set representation. In fact, there is one change that I intend to make in the future: currently even the smallest sets use a MedCard-sized word, and I plan to allocate smaller space for them (the smallest unit which can represent them). OTOH I am not convinced that always using 32-bit chunks for sets is the best choice. GPC on 64-bit machines uses 64-bit types in so many places that I doubt the usefulness of having the same set representation (and big-endian machines would still use a different representation than little-endian ones).
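A sketch of how such an allocation rule might look (this is only my reading of the plan; the exact rounding below is an assumption, not GPC code):

  #include <stddef.h>

  /* Pick the smallest natural storage unit that covers the element range;
     larger sets are rounded up to whole 64-bit words. */
  static size_t set_storage_bytes(unsigned long nelements)
  {
      if (nelements <= 8)  return 1;
      if (nelements <= 16) return 2;
      if (nelements <= 32) return 4;
      if (nelements <= 64) return 8;
      return ((nelements + 63) / 64) * 8;
  }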
If someone sees a way to automatically detect the set format used inside gdb itself, that would of course be great as well.
What do you think?
My first thought was to change the GPC representation to match gdb's, but while we can rather easily (and with minor performance impact) change the bit order in sets, the alignment problem remains...
Technically, in the compiler proper the change would just be to set a few parameters to different values. The main change would be to the runtime support. Here we depend very much on set alignment (of course removing the alignment is doable, but we would get both lower-performing and more complicated code).
I am afraid that set representation affects not only FPC and GPC. There is also GNU Modula-2 (it would be nice to have calling-convention compatibility with Modula-2). AFAIK Modula-2 currently uses the default gcc representation (which is the same as gdb's).
Also, most RISC vendors had Pascal compilers. It seems that vendors are now dropping Pascal support, but for gdb this may still be an issue.
In principle gdb could try to detect the compiler: gpc uses some pretty characteristic symbols (like `_p_GPC_RTS_VERSION_20060215') and I suspect that FPC is doing something similar.
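A minimal sketch of that detection idea, deliberately kept free of gdb's internal symbol-table API; the GPC symbol prefix is taken from the example above, while the FPC marker used here is only a hypothetical placeholder:

  #include <string.h>

  enum pascal_compiler { PC_UNKNOWN, PC_GPC, PC_FPC };

  /* Guess the producing compiler from a symbol name found in the binary. */
  static enum pascal_compiler classify_symbol(const char *name)
  {
      if (strncmp(name, "_p_GPC_RTS_VERSION_", 19) == 0)
          return PC_GPC;
      if (strncmp(name, "FPC_", 4) == 0)   /* hypothetical FPC marker */
          return PC_FPC;
      return PC_UNKNOWN;
  }

gdb would have to run something like this over the symbols of each loaded binary and then select the matching set layout.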
Jonas
PS: I've attached a patch to gdb which fixes the problem for big-endian FPC and 32-bit GPC apps. I think it may still have to be changed so that it does not treat bitstrings differently from sets, because the stabs docs (http://www.cygwin.com/stabs.html) state:
S Indicate that this type is a string instead of an array of characters, or a bitstring instead of a set. It doesn't change the layout of the data being represented, but does enable the debugger to know which type it is.
What are "bitstrings" in Pascal anyway? gdb displays them as
B'1..3,5,7' (sets are printed as [1..3,5,7])
so they don't seem to be the same as a bit-packed array of Boolean or similar.
Bitstrings are a Chill speciality (Chill has both sets and bitstrings; they share most of the backend support in gcc). Chill is included in gcc-2.95.x and AFAICS its data representation agrees with current gdb.