Jonas Maebe wrote:
Anyway, you could still do the adding/subbing/... of sets by typecasting the set to a MedCard array of the appropriate size. This issue is only with inserting/removing/testing elements (and in that case you're doing more or less random access, so there is less chance that the chunk you need is already in the cache).
There is still the issue of alignment: we need alignment to avoid shifts when the lower bounds do not match, and set variables must also be allocated on a proper (word) boundary. Looks messy.
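To make the word-wise idea concrete, here is a minimal sketch in C (the names set_chunk and set_union are made up for illustration; GPC's actual RTS routines differ) of a union over two sets viewed as arrays of machine words. It only works when both operands have the same lower bound and alignment, which is exactly the messy part mentioned above:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical chunk type standing in for MedCard on a 64-bit target. */
    typedef uint64_t set_chunk;

    /* Word-wise union of two sets with identical bounds and alignment.
       If the lower bounds differed, every chunk of 'src' would first
       have to be shifted to line it up with 'dst'. */
    static void set_union(set_chunk *dst, const set_chunk *src, size_t nchunks)
    {
        for (size_t i = 0; i < nchunks; i++)
            dst[i] |= src[i];
    }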
In principle we can change the set representation. In fact, there is one change that I intend to make in the future: currently even the smallest sets use a MedCard-sized word; I plan to allocate smaller space for them (the smallest unit which can represent them).
There have been plans for a long time to do that in FPC as well. If you do this as meticulously as Delphi does (its set size grows by 1 byte at a time), then you more or less need by definition a little-endian representation of sets in all cases though (because of the left-over bytes at the end).
I would say a proper big-endian representation. Currently both GPC and FPC use little-endian bit order for set hunks, which conflicts with big-endian byte order. Using big-endian bit order would remove this discrepancy. However, I am thinking about a much simpler scheme: trying to allocate 1, 2, 4, or 8 bytes (in that order) and, if that fails, allocating a sequence of 8-byte words (all that assuming a 64-bit machine, with obvious changes for other word lengths). The main reason is that with such a scheme one can perform operations on small sets inline, using just a couple of instructions. The Delphi way requires a rather complicated instruction sequence or special runtime routines.
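To make the bit-order point concrete (a sketch only, not the actual GPC or FPC code, and assuming a 64-bit chunk): with little-endian bit order, element e of a chunk sits at bit position e mod 64 counting from the least significant bit; with big-endian bit order it sits at position 63 - (e mod 64), so element numbers increase in the same direction as byte addresses on a big-endian machine. Either way, membership in a set that fits a single chunk is just a shift and a mask, which is what makes the 1/2/4/8-byte scheme attractive for inline code:

    #include <stdint.h>

    #define CHUNK_BITS 64  /* assuming a 64-bit machine, as above */

    /* Little-endian bit order: element e is bit (e mod 64), counted
       from the least significant bit of the chunk. */
    static int in_set_le(uint64_t chunk, unsigned e)
    {
        return (chunk >> (e % CHUNK_BITS)) & 1;
    }

    /* Big-endian bit order: element e is bit (e mod 64), counted from
       the most significant bit, matching big-endian byte order. */
    static int in_set_be(uint64_t chunk, unsigned e)
    {
        return (chunk >> (CHUNK_BITS - 1 - (e % CHUNK_BITS))) & 1;
    }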
For this reason, a number of FPC developers are in favour of using "little endian" sets on all platforms. I'm a bit wary of breaking backwards binary compatibility though, and possibly also compatibility with other big endian Pascal compilers (does anyone know how CodeWarrior/PPC and/or Think Pascal store their sets?)
I do not know. But I have looked into the XL Pascal and Sun Pascal manuals. Both claim that sets are implemented as bitstrings. XL Pascal limits sets to 256 elements and all of them are 0-based. It looks like Sun Pascal aligns sets, but the manual omits the exact rules. The manuals are pretty imprecise, but at least the Sun example shows that bits are stored in consecutive bytes (though I was not able to find a specification of the bit order).
(and still, big-endian machines would use a different representation than little-endian ones).
Big and little endian machines using different representations is logical, and people porting from little to big endian (and vice versa) are used to adding byte-swapping code all over the place.
I got used to making changes when going from a 32-bit machine to a 64-bit system.
If someone sees a way to automatically detect the set format used inside gdb itself, that would be great as well, of course.
What do you think?
My first thought was to change the GPC representation to match gdb, but while we can rather easily (and with minor performance impact) change the bit order in sets, the alignment problem remains...
Well, those are two different issues, I think, which can be solved separately.
Hmm, let me see: if we use natural alignment (to the chunk boundary) and the set needs bigger alignment, then the size of the variable gets bigger. So we can detect the alignment used from the set bound and the size of the set variable. AFAIK it is possible to tell gdb about the correct size. So, if we agree to use natural alignment, gdb can detect the exact amount used (sometimes it cannot detect whether alignment is used at all, so gdb must know whether the natural alignment rule is used).
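A rough sketch of that arithmetic in C (the function name is made up; this only covers the natural-alignment case and assumes non-negative bounds and 64-bit chunks): the stored set starts at the chunk boundary at or below the declared lower bound, so the declared bounds alone determine what the variable's size should be, and gdb, knowing the actual size, can confirm that the rule was applied and recover the origin:

    #include <stddef.h>

    #define CHUNK_BITS 64  /* chunk = MedCard on a 64-bit target (assumption) */

    /* Size in bytes of a variable of type 'set of low..high' under
       natural alignment: storage begins at the chunk boundary at or
       below 'low', so an unaligned lower bound makes the variable
       bigger.  Assumes 0 <= low <= high for simplicity. */
    static size_t natural_set_size(long low, long high)
    {
        long origin  = (low / CHUNK_BITS) * CHUNK_BITS;        /* round down */
        long nbits   = high - origin + 1;
        long nchunks = (nbits + CHUNK_BITS - 1) / CHUNK_BITS;  /* round up */
        return (size_t)(nchunks * (CHUNK_BITS / 8));
    }

For example, a 'set of 3 .. 10' would then occupy one full 8-byte chunk even though eight bits would suffice, and that size, together with the declared bounds, is what lets gdb work out where the first stored bit lies.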
So, ATM for me bit reversal (plus teaching gdb about alignment) is the most attractive solution.
In principle gdb could try to detect the compiler: gpc uses some pretty characteristic symbols (like `_p_GPC_RTS_VERSION_20060215') and I suspect that FPC is doing something similar.
There are indeed a lot of FPC_* symbols, but I don't really like such a solution, and I'm not sure whether the gdb people would like it either.
AFAICS any method to choose between set representations is going to be non-standard. I am not sure if some ad hoc extensions to the format of the debug info would be any better.