On 06 Aug 2006, at 00:40, Frank Heckenbach wrote:
Jonas Maebe wrote:
[snip]
What I think is strange is the byte order you get with current (GPC and apparently also FPC) sets on big-endian machines, e.g. for a 0-based set on a 32-bit machine:
Byte   Elements
 0     24..31
 1     16..23
 2      8..15
 3      0..7
 4     56..63
 5     48..55
etc.
This is because the bytes within a word are big-endian, but we always treat the sequence of words as little-endian (lowest elements first), so we actually get a mixed endianness when considering set elements.
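To make the layout concrete, here is an illustrative C sketch (not GPC's or FPC's actual code; all names are made up) of the word-based representation being described: element n lives in word n/32, bit n%32 of that word. Because the operations only ever touch whole words, the same code works on both endiannesses, and on a big-endian host memory byte 0 of word 0 ends up holding bits 24..31, exactly as in the table above.

```c
#include <stdint.h>

/* Illustrative sketch of a word-based set (elements 0..63).
 * Names are hypothetical, not taken from GPC or FPC. */
#define SET_WORDS 2

typedef uint32_t SmallSet[SET_WORDS];

/* Insert element n: pure word arithmetic, endian-agnostic. */
static void set_include(SmallSet s, unsigned n)
{
    s[n / 32] |= (uint32_t)1 << (n % 32);
}

/* Test membership of element n. */
static int set_in(const SmallSet s, unsigned n)
{
    return (s[n / 32] >> (n % 32)) & 1;
}
```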
I disagree that for sets this is strange, if only for the reason that apparently both the GPC and FPC implementors did it that way independently, and without any hacking or ugly workarounds.
On the contrary, this representation is extremely convenient because it allows one to implement all set routines in an endian-agnostic way (both in the rtl/rts and in the compiler), and the only thing needed to load values from one endianness architecture on another is byte-swapping.
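The byte-swapping mentioned here really is all there is to it under this representation; a minimal illustrative sketch (not either compiler's code) would be a per-word swap applied when loading a set written on the other endianness:

```c
#include <stdint.h>
#include <stddef.h>

/* Reverse the bytes of one 32-bit word. */
static uint32_t bswap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Convert a whole set between endiannesses: since the word layout is
 * shared, swapping each word's bytes is the only conversion needed. */
static void swap_set(uint32_t *words, size_t n)
{
    for (size_t i = 0; i < n; i++)
        words[i] = bswap32(words[i]);
}
```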
At a very low level it may seem strange once you start to think about it (I also thought it was strange the first time I realised this), but does that matter if it's easy to work with at a high level (both for the compiler implementors and for end users, no less)?
Indeed, it would also reverse the order of elements in a byte (perhaps that's what you meant above), so the set element 1, e.g., wouldn't correspond to the "bit #1", i.e. value 2, anymore, but instead "bit #6", i.e. value $40.
This is indeed what I was referring to.
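Concretely, the two byte-wise bit orders being discussed come down to which end of the byte element 0 occupies. An illustrative sketch (hypothetical helper names, not from either compiler):

```c
#include <stdint.h>

/* Little-endian-style numbering: element e of a byte is bit e,
 * so element 1 has value 2. */
static uint8_t mask_le(unsigned e) { return (uint8_t)(1u << (e % 8)); }

/* Big-endian-style numbering: the lowest-indexed element sits in the
 * highest-valued bit, so element 1 is bit #6, i.e. value $40. */
static uint8_t mask_be(unsigned e) { return (uint8_t)(0x80u >> (e % 8)); }
```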
BTW, I actually used to wonder about the latter when I previously saw it on its own (WRT packed arrays, IIRC). Why put the lowest-indexed values in the highest-valued bits? But when looking at it in the context of words, as above, it makes sense. (Because if you don't, i.e. if you want to have the same byte order and bit order as on little-endian, you get strange shifts, like 24 for element 0, 25 for 1, ..., 31 for 7, 16 for 8, 17 for 9, ... This is awkward for single elements already, but gets really ugly when you need bit masks for ranges etc.)
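The "strange shifts" in that parenthesis follow from keeping little-endian bit order inside bytes while the machine stores words big-endian; an illustrative helper (not from either compiler) makes the pattern explicit:

```c
/* For element e (0..31) of a word on a big-endian machine, compute its
 * bit position within the word when bytes keep little-endian bit order:
 * memory byte b of the word holds word bits (3-b)*8 .. (3-b)*8+7. */
static unsigned awkward_shift(unsigned e)
{
    unsigned byte = e / 8;              /* which memory byte of the word */
    return (3 - byte) * 8 + e % 8;      /* e=0 -> 24, e=7 -> 31, e=8 -> 16 */
}
```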
As long as you treat the set as an "array" of components of the same size which is used when inserting/testing/deleting elements, your range code can remain generic (as e.g. GPC's SetIncludeRange demonstrates). Of course, to make sets binary compatible between little and big endian archs you have to use bytes as opposed to words when doing these operations, and therefore you'd have to do the same with the range code.
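A byte-wise range insertion in this spirit might look like the following sketch (this is in the style the text attributes to GPC's SetIncludeRange, but it is not GPC's code; it assumes the big-endian-within-byte numbering, element e -> bit 0x80 >> (e % 8)):

```c
#include <stdint.h>
#include <string.h>

/* Include all elements lo..hi (inclusive) in a byte-array set. */
static void include_range(uint8_t *set, unsigned lo, unsigned hi)
{
    unsigned b0 = lo / 8, b1 = hi / 8;
    uint8_t m0 = (uint8_t)(0xFFu >> (lo % 8));       /* elements lo%8..7   */
    uint8_t m1 = (uint8_t)(0xFFu << (7 - hi % 8));   /* elements 0..hi%8   */
    if (b0 == b1) {                     /* range within a single byte */
        set[b0] |= (uint8_t)(m0 & m1);
        return;
    }
    set[b0] |= m0;                      /* partial first byte  */
    memset(set + b0 + 1, 0xFF, b1 - b0 - 1);  /* full middle bytes */
    set[b1] |= m1;                      /* partial last byte   */
}
```

Because this works on bytes rather than words, the same bit patterns come out on both endiannesses, which is exactly the binary-compatibility point made above.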
For adding/subtracting/... it wouldn't make a difference, of course; they could keep working on words.
Bit swapping when porting from big to little endian is a lot less obvious than byte swapping, though. We'd at least have to provide routines to do this (maybe GPC has them already, but FPC doesn't).
I don't think we have such a routine, but it wouldn't be hard to provide one (probably table-based for efficiency).
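Such a table-based routine is short; here is an illustrative sketch (hypothetical, not an existing GPC or FPC routine) that reverses the bit order within each byte via a 256-entry lookup table:

```c
#include <stdint.h>

/* Reverse the bits of one byte using a lazily built 256-entry table
 * (in a real rtl/rts the table would likely be precomputed). */
static uint8_t rev8(uint8_t b)
{
    static uint8_t table[256];
    static int ready = 0;
    if (!ready) {
        for (int i = 0; i < 256; i++) {
            uint8_t r = 0;
            for (int j = 0; j < 8; j++)
                if (i & (1 << j))
                    r |= (uint8_t)(0x80 >> j);
            table[i] = r;
        }
        ready = 1;
    }
    return table[b];
}
```

Applied to every byte of a set (after the byte swap discussed earlier), this would convert between the two per-byte bit orders; note rev8 is its own inverse.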
Indeed.
Making people extend their set base types so the sets are a multiple of 8 bytes on both 32 and 64 bit archs seems awkward: it may mess up bit-packed records elsewhere as well, and for enums it may amount to adding a bunch of dummy enum elements (which doesn't look nice either).
You seem to be very concerned with sets in binary file formats, judging from these and previous concerns.
I am indeed. These sorts of issues can waste a lot of programmer time when going from one platform to another, time that could be spent on much more useful things.
I admit that's not such a big concern of mine (in particular, in contrast to runtime efficiency, or proper gdb support, of course). Actually I haven't seen many file formats containing sets. That's probably because many languages don't have sets, so file formats that should be accessible from different languages probably use bit-fields or something like this instead.
It would at least be a problem for FPC itself. Our ppu files (comparable to a combination of GPC's .gpd and .gpi files) contain a ton of sets. Flags describing properties of functions/procedures, variables, types, ... are all sets in FPC, and all are stored that way in the .ppu files.
If sets have different sizes on 32 and 64 bit archs, the compiler would need hacks in the ppu writing and reading routines if we want to keep the ability to cross-compile usable ppu files from 32 bit to 64 bit platforms (and vice versa). Similar for ppudump, which "disassembles" ppu files.
Sets may indeed be rare in data files shared among a lot of different programming environments, but they are quite common in data files used by a single Pascal program (simply because there may be no need to make them easily accessible from other languages, as with e.g. our ppu files, and because sets are a first-class datatype in Pascal).
m2_is_long_set checks if all the fields of the record are consecutive sets (i.e. sets of consecutive range types). I don't really understand why this is useful though, nor do I see at first sight what m2_print_long_set does so differently compared to the M2 print code for TYPE_CODE_SET.
I don't currently have the time to check the code carefully, but since you speak of a record, does the compiler divide large sets internally into record fields (so the debugger has to reassemble them to look as one set), or something like this?
That's what it looks like, yes. The code I posted came from gdb, not gcc though. I don't know Modula-2, and I know next to nothing of gcc internals, so I don't feel like diving into that.
(Otherwise I wonder when a programmer would use "consecutive sets" manually.)
I'm also pretty sure the compiler does it, but as I said I have no idea why it does this since at first sight I can't determine a difference in binary layout compared to the conventional TYPE_CODE_SET way, or what gdb does differently for the two (apart from the fact that the routine for printing these record-sets is a lot more complex than the regular one because it has to hop from field to field).
Jonas