Jonas Maebe wrote:
Hello,
gdb currently displays the contents of set variables incorrectly on big-endian machines for at least GPC and FPC. The reason is that it treats a set as a linear array of bits, while both GPC and FPC treat it as an array of ordinals in which bits are set using "1 shl (elementnr mod (sizeof(ordinal)*8))".
I am surprised, but AFAICS you are right. (ATM I cannot test on a big-endian machine, since there is no gdb there, but the problem already appears on little-endian machines.)
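To illustrate the mismatch, here is a rough C sketch (not taken from the attached patch; the 32-bit chunk size and the simplified view of gdb's bit numbering are assumptions based on the description above). It computes which memory byte holds a given set element under each interpretation:

  #include <stdio.h>

  /* FPC/GPC layout as described above: the set is an array of 32-bit
     ordinals and element N corresponds to bit (N mod 32) of ordinal
     (N div 32).  On a big-endian machine the least significant byte of
     each ordinal is stored last, so the byte order inside a chunk is
     reversed. */
  static void chunked_location(unsigned n, int big_endian,
                               unsigned *byte, unsigned *bit)
  {
      unsigned chunk         = n / 32;
      unsigned bit_in_chunk  = n % 32;
      unsigned byte_in_chunk = bit_in_chunk / 8;

      if (big_endian)
          byte_in_chunk = 3 - byte_in_chunk;

      *byte = chunk * 4 + byte_in_chunk;
      *bit  = bit_in_chunk % 8;
  }

  /* gdb's view (simplified): the set is a flat sequence of bits, so
     element N lives in byte N/8 at bit N%8. */
  static void linear_location(unsigned n, unsigned *byte, unsigned *bit)
  {
      *byte = n / 8;
      *bit  = n % 8;
  }

  int main(void)
  {
      unsigned cb, cbit, lb, lbit;

      chunked_location(17, 1, &cb, &cbit);   /* element 17, big-endian */
      linear_location(17, &lb, &lbit);

      /* prints: compiler stores element 17 in byte 1, gdb looks in byte 2 */
      printf("compiler stores element 17 in byte %u, gdb looks in byte %u\n",
             cb, lb);
      return 0;
  }

On a little-endian target the two byte indices coincide, which is why the plain bit-order problem only shows up on big-endian machines.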
The problem with creating a patch is that in FPC the size of this ordinal is always 32 bits, while in GPC it is MedCard (which is 64 bits on 64-bit architectures).
I do not know what FPC is doing, but GPC also "aligns" sets, so for example `set of 11..37' is stored as a subset of `set of 0..37'. gdb has no idea of set alignment, so such sets are printed wrongly even on little-endian machines.
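A minimal sketch of what that alignment means for element lookup, assuming the lower bound is rounded down to a chunk boundary (that is my reading of the example above; the 64-bit chunk width is likewise an assumption and not essential to the point):

  #include <stdio.h>

  #define CHUNK_BITS 64   /* MedCard width on a 64-bit machine (assumed) */

  /* GPC-style storage: `set of LOW..HIGH' starts at LOW rounded down to a
     chunk boundary, so element E occupies bit (E - aligned_low). */
  static unsigned gpc_bit(unsigned low, unsigned e)
  {
      unsigned aligned_low = (low / CHUNK_BITS) * CHUNK_BITS;
      return e - aligned_low;
  }

  /* gdb assumes bit 0 of the storage corresponds to the declared lower
     bound LOW. */
  static unsigned gdb_bit(unsigned low, unsigned e)
  {
      return e - low;
  }

  int main(void)
  {
      /* For `set of 11..37', element 20 sits at bit 20 of GPC's storage,
         but gdb looks at bit 9 -- the bit GPC actually uses for element 9. */
      printf("GPC bit: %u, gdb bit: %u\n", gpc_bit(11, 20), gdb_bit(11, 20));
      return 0;
  }

So even when the bit order matches, the offset is off by the amount of padding in front of the declared lower bound.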
The main reasons for FPC to always use 32 bits are
a) binary compatibility
b) performance (even on 64-bit machines, loading/storing a 32-bit value from/to memory is often faster)
I think that there is a subtle interaction between space usage and instruction count. For a data-heavy program, storing sets as byte sequences will minimize space usage, which may pay off by reducing cache misses. OTOH, 64-bit machines tend to have 64-bit buses, so when working in cache the machine should handle a 64-bit chunk as fast as a 32-bit chunk. Hence, for mass operations (copy, sum or intersection) on moderate or large sets, 64-bit chunks should give much better speed than 32-bit chunks.
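As a rough illustration of the instruction-count argument (the chunk width and set size here are arbitrary, not anything GPC or FPC actually mandates): a union or intersection over a 256-element set needs 4 loop iterations with 64-bit chunks, 8 with 32-bit chunks, and 32 with single bytes.

  #include <stdint.h>
  #include <stddef.h>

  /* Mass operations over a set stored as an array of 64-bit chunks. */
  static void set_union64(uint64_t *dst, const uint64_t *a,
                          const uint64_t *b, size_t nchunks)
  {
      for (size_t i = 0; i < nchunks; i++)
          dst[i] = a[i] | b[i];      /* set sum */
  }

  static void set_intersect64(uint64_t *dst, const uint64_t *a,
                              const uint64_t *b, size_t nchunks)
  {
      for (size_t i = 0; i < nchunks; i++)
          dst[i] = a[i] & b[i];      /* set intersection */
  }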
I have found several mails in the archives of this list noting that people should not rely on GPC sets having a particular format, so I wonder whether this could be changed in GPC? Regardless of those warnings, I guess it will still cause problems (the same binary compatibility issue mentioned above, for people who created data files containing sets on 64-bit machines with GPC), but it will also have advantages (making files containing sets compatible with both 32-bit and 64-bit apps compiled by GPC).
In principle we can change the set representation. In fact, there is one change that I intend to make in the future: currently even the smallest sets use a MedCard-sized word, and I plan to allocate smaller space for them (the smallest unit which can represent them). OTOH I am not convinced that always using 32-bit chunks for sets is the best choice. GPC on 64-bit machines uses 64-bit types in so many places that I doubt the usefulness of having the same set representation (and big-endian machines would still use a different representation than little-endian ones).
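A sketch of how such an allocation rule might look (this is only my reading of the plan; the exact rounding below is an assumption, not GPC code):

  #include <stddef.h>

  /* Pick the smallest natural storage unit that covers the element range;
     larger sets are rounded up to whole 64-bit words. */
  static size_t set_storage_bytes(unsigned long nelements)
  {
      if (nelements <= 8)  return 1;
      if (nelements <= 16) return 2;
      if (nelements <= 32) return 4;
      if (nelements <= 64) return 8;
      return ((nelements + 63) / 64) * 8;
  }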
If someone sees a way to automatically detect the set format used inside gdb itself, that would of course be great as well.
What do you think?
My first thought was to change the GPC representation to match gdb's, but while we can rather easily (and with minor performance impact) change the bit order in sets, the alignment problem remains...
Technically, in the compiler proper the change would just be to set a few parameters to different values. The main change would be to the runtime support. Here we depend very much on set alignment (of course removing the alignment is doable, but we would get both lower-performing and more complicated code).
I am afraid that set representation affects not only FPC and GPC. There is also GNU Modula-2 (it would be nice to have calling-convention compatibility with Modula-2). AFAIK Modula-2 currently uses the default gcc representation (which is the same as gdb's).
Also, most RISC vendors had Pascal compilers. It seems that vendors are now dropping Pascal support, but for gdb this may still be an issue.
In principle gdb could try to detect the compiler: gpc uses some pretty characteristic symbols (like `_p_GPC_RTS_VERSION_20060215') and I suspect that FPC is doing something similar.
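A minimal sketch of that detection idea, deliberately kept free of gdb's internal symbol-table API; the GPC symbol prefix is taken from the example above, while the FPC marker used here is only a hypothetical placeholder:

  #include <string.h>

  enum pascal_compiler { PC_UNKNOWN, PC_GPC, PC_FPC };

  /* Guess the producing compiler from a symbol name found in the binary. */
  static enum pascal_compiler classify_symbol(const char *name)
  {
      if (strncmp(name, "_p_GPC_RTS_VERSION_", 19) == 0)
          return PC_GPC;
      if (strncmp(name, "FPC_", 4) == 0)   /* hypothetical FPC marker */
          return PC_FPC;
      return PC_UNKNOWN;
  }

gdb would have to run something like this over the symbols of each loaded binary and then select the matching set layout.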
Jonas
PS: I've attached a patch to gdb which fixes the problem for big-endian FPC and 32-bit GPC apps. I think it may still have to be changed so that it does not treat bitstrings differently from sets, because the stabs docs (http://www.cygwin.com/stabs.html) state:
S Indicate that this type is a string instead of an array of characters, or a bitstring instead of a set. It doesn't change the layout of the data being represented, but does enable the debugger to know which type it is.
What are "bitstrings" in Pascal anyway? gdb displays them as
B'1..3,5,7' (sets are printed as [1..3,5,7])
so they don't seem to be the same as a bit-packed array of Boolean or similar.
Bitstrings are a Chill speciality (Chill has both sets and bitstrings; they share most of the backend support in gcc). Chill is included in gcc-2.95.x and AFAICS its data representation agrees with current gdb.