Jonas Maebe wrote:
I would say proper big endian representation. Currently both GPC and FPC use little endian bit order for set hunks, which conflicts with big endian byte order.
Indeed, it's a bit strange as it is now.
I personally don't consider it strange since all CPUs I know of have the same endianness as far as bit ordering is concerned (regardless of their byte endianness). At least there's no architecture I know of where the byte with value 1 is represented as $80.
I think that's a tautology, as we don't see the internal representation (i.e., which bit of memory is actually used), but we consider the value 1 (i.e., the value which behaves like 1 in operations) as being represented as 1.
What I think is strange is when you look at the byte order with current (GPC and apparently also FPC) sets on big-endian machines, e.g. for a 0-based set on a 32 bit machine:
Byte  Elements
  0   24..31
  1   16..23
  2    8..15
  3    0..7
  4   56..63
  5   48..55
etc.
This is because the bytes within a word are big-endian, but we always treat the sequence of words as little-endian (lowest elements first), so we actually get a mixed endianness when considering set elements.
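To make the table above concrete, here's a small C sketch (purely illustrative; the function name is made up and this is not code from GPC or FPC) that computes which byte an element lands in under this mixed scheme on a big-endian 32-bit machine:

  #include <stdio.h>

  /* Illustration only: which byte/bit a set element ends up in under the
     current scheme on a big-endian 32-bit machine, i.e. words stored
     lowest-elements-first, but little-endian bit numbering within each
     word (so bit 0 of a word lives in its least significant, last byte). */
  static void current_mixed_layout (unsigned elem, unsigned *byte, unsigned *bit)
  {
    unsigned word = elem / 32;          /* words in element order */
    unsigned bit_in_word = elem % 32;   /* little-endian bit numbering */
    *byte = word * 4 + (3 - bit_in_word / 8);  /* big-endian bytes within the word */
    *bit = bit_in_word % 8;             /* element's value in that byte: 1 << bit */
  }

  int main (void)
  {
    for (unsigned e = 0; e < 48; e += 8)
      {
        unsigned byte, bit;
        current_mixed_layout (e, &byte, &bit);
        printf ("elements %2u..%2u -> byte %u\n", e, e + 7, byte);
      }
    return 0;
  }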
Using proper big endian representation, as Waldek suggested, would mean reversing the bits in a word, so we'd get
Byte  Elements
  0    0..7
  1    8..15
  2   16..23
  3   24..31
  4   32..39
  5   40..47
just like on little-endian.
Indeed, it would also reverse the order of elements within a byte (perhaps that's what you meant above), so set element 1, e.g., would no longer correspond to "bit #1", i.e. value 2, but instead to "bit #6", i.e. value $40.
BTW, I actually used to wonder about the latter when I previously saw it on its own (WRT packed arrays, IIRC). Why put the lowest-indexed values in the highest-valued bits? But when looking at it in the context of words, as above, it makes sense. (Because if you don't, i.e. if you want the same byte order and bit order as on little-endian, you get strange shifts, like 24 for element 0, 25 for 1, ..., 31 for 7, 16 for 8, 17 for 9, ... This is already awkward for single elements, but gets really ugly when you need bit masks for ranges etc.)
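To spell out the two alternatives (again only an illustrative sketch with made-up names, not actual compiler code): proper big-endian bit order gives a uniform mapping, while keeping the little-endian byte and bit order produces the jumpy shifts described above as soon as you address the storage as whole words on a big-endian machine:

  /* Proper big-endian bit order: element 0 is the most significant bit of
     byte 0, so a range of elements is always a contiguous run of bits, and
     the shift of element e within a loaded 32-bit word is simply 31 - (e % 32). */
  static void big_endian_layout (unsigned e, unsigned *byte, unsigned *bit)
  {
    *byte = e / 8;
    *bit = 7 - e % 8;   /* element 0 -> value $80, element 1 -> value $40, ... */
  }

  /* Keeping little-endian byte and bit order instead: fine byte-wise, but
     when a big-endian machine loads a 32-bit word, the shift of element e
     within that word becomes 24 for e = 0, 25 for 1, ..., 31 for 7, then
     drops back to 16 for 8, and so on -- awkward for range masks. */
  static unsigned little_endian_shift_in_be_word (unsigned e)
  {
    return (3 - (e % 32) / 8) * 8 + e % 8;
  }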
Using proper big endian representation we'd be independent of word size (though not alignment). The byte order would then be the same between big and little endian machines, though not the bit order within a byte, so no binary compatibility (but we don't have that now, so we wouldn't lose anything).
Bit swapping when porting from big to little endian is a lot less obvious than byte swapping though. We'd at least have to provide routines to do this (maybe GPC has them already, but FPC doesn't).
I don't think we have such a routine, but it wouldn't be hard to provide one (probably table-based for efficiency).
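For what it's worth, a table-based routine is simple enough; something along these lines (a hypothetical helper in C, not an existing GPC or FPC routine):

  #include <stdint.h>
  #include <stddef.h>

  /* Hypothetical helper: reverse the bit order within each byte of a
     buffer, using a 256-entry lookup table built on first use. */
  static uint8_t reverse_table[256];
  static int table_ready;

  static void build_reverse_table (void)
  {
    for (int i = 0; i < 256; i++)
      {
        uint8_t r = 0;
        for (int b = 0; b < 8; b++)
          if (i & (1 << b))
            r |= 0x80 >> b;   /* e.g. value 1 becomes $80 */
        reverse_table[i] = r;
      }
    table_ready = 1;
  }

  void reverse_set_bits (uint8_t *buf, size_t len)
  {
    if (!table_ready)
      build_reverse_table ();
    for (size_t i = 0; i < len; i++)
      buf[i] = reverse_table[buf[i]];
  }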
Using big endian bit order would remove this discrepancy. However, I'm thinking about a much simpler scheme: trying to allocate 1, 2, 4 or 8 bytes (in that order), and if that fails, allocating a sequence of 8-byte words (all that assuming a 64-bit machine, with obvious changes for other word lengths). The main reason is that with such a scheme one can perform operations on small sets inline, using just a couple of instructions.
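Just to illustrate the sizing rule (the function is made up, only a sketch of the idea for a 64-bit machine):

  /* Sketch of the proposed sizing on a 64-bit machine: try 1, 2, 4 or
     8 bytes, and if the set doesn't fit, use a whole number of 8-byte
     words.  Small sets then fit in a single register. */
  static unsigned set_size_in_bytes (unsigned elements)
  {
    unsigned bytes = (elements + 7) / 8;  /* minimum bytes needed */
    if (bytes <= 1) return 1;
    if (bytes <= 2) return 2;
    if (bytes <= 4) return 4;
    if (bytes <= 8) return 8;
    return (bytes + 7) / 8 * 8;           /* round up to 8-byte words */
  }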
Seems reasonable.
To me as well, overall, although I'm personally in favour of always using the same cut-off and extension sizes regardless of the native word size (e.g. 1, 2, 4, 8, 12, 16, ... everywhere) to keep the same sets the same size on 32 and 64 bit archs.
Making people extend their set base types so the sets are a multiple of 8 bytes on both 32 and 64 bit archs seems awkward: it may mess up bit-packed records elsewhere as well, and for enums it may amount to adding a bunch of dummy enum elements (which doesn't look nice either).
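The only difference to the sketch above would be the growth step beyond 8 bytes; a word-size-independent variant (again hypothetical) might look like:

  /* Same cut-offs on every architecture: 1, 2, 4, 8 bytes for small sets,
     then growing in 4-byte steps (12, 16, 20, ...), so a given set type
     has the same size on 32- and 64-bit machines. */
  static unsigned portable_set_size_in_bytes (unsigned elements)
  {
    unsigned bytes = (elements + 7) / 8;
    if (bytes <= 1) return 1;
    if (bytes <= 2) return 2;
    if (bytes <= 4) return 4;
    if (bytes <= 8) return 8;
    return (bytes + 3) / 4 * 4;           /* multiples of 4 beyond 8 bytes */
  }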
You seem to be very concerned with sets in binary file formats, judging from these and previous remarks. I admit that's not such a big concern of mine (in particular, in contrast to runtime efficiency or proper gdb support, of course). Actually, I haven't seen many file formats containing sets. That's probably because many languages don't have sets, so file formats that should be accessible from different languages probably use bit-fields or something like that instead.
If we can get the extensions working through the backend and gdb, and perhaps even "officially" approved, I'd certainly prefer this. Otherwise, the "alignment detection", though it seems a bit backward, looks best to me.
I agree. Concerning M2: it uses a hack for (some?) larger sets (m2-valprint.c):
    case TYPE_CODE_STRUCT:
      if (m2_is_long_set (type))
        m2_print_long_set (type, valaddr, embedded_offset, address,
                           stream, format, pretty);
      else
        cp_print_value_fields (type, type, valaddr, embedded_offset, address,
                               stream, format, recurse, pretty, NULL, 0);
m2_is_long_set checks if all the fields of the record are consecutive sets (i.e. sets of consecutive range types). I don't really understand why this is useful though, nor do I see at first sight what m2_print_long_set does so differently compared to the M2 print code for TYPE_CODE_SET.
I don't currently have the time to check the code carefully, but since you speak of a record, does the compiler divide large sets internally into record fields (so the debugger has to reassemble them to look like one set), or something like this? (Otherwise I wonder when a programmer would use "consecutive sets" manually.)
But for some reason it gave me an easy idea to solve the gdb alignment problem we have: even if it's a set of 48..50 (which we will store as a set of 0..63), put in the debug info that it's a set of 0..50 and gdb will print everything correctly.
Well, I think we make sure that unused bits are stored as 0, so it wouldn't print any spurious values.
OTOH, the gcc backend generates most of the debug info from the type information itself, so I don't know if we can easily "fake" it. (In another case we tried lying to the backend, and it failed badly.) Maybe it wouldn't be as bad here since the memory layout wouldn't change, but we'd probably need to store the real bounds (for range-checking etc.) elsewhere. I'm not sure if this will work well. Waldek?
Frank