Jonas Maebe wrote:
On 06 Aug 2006, at 00:40, Frank Heckenbach wrote:
Jonas Maebe wrote:
[snip]
What I think is strange is the byte order you get with current (GPC and apparently also FPC) sets on big-endian machines, e.g. for a 0-based set on a 32-bit machine:
Byte  Elements
 0    24..31
 1    16..23
 2     8..15
 3     0..7
 4    56..63
 5    48..55
etc.
This is because the bytes within a word are big-endian, but we always treat the sequence of words as little-endian (lowest elements first), so we actually get a mixed endianness when considering set elements.
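For concreteness, here's a little sketch (mine, not taken from either compiler's sources) that computes where each element lands under this layout, assuming 32-bit words and a 0-based set on a big-endian machine:

  program MixedEndianLayout;

  const
    BitsPerWord = 32;

  var
    EL, ByteIndex, BitIndex: Integer;

  begin
    for EL := 0 to 63 do
    begin
      { words are stored lowest-elements-first, but within each
        big-endian word the bytes count from the other end }
      ByteIndex := (EL div BitsPerWord) * 4 + 3 - (EL mod BitsPerWord) div 8;
      BitIndex := EL mod 8;  { bit 0 = least significant bit of the byte }
      WriteLn ('element ', EL, ' -> byte ', ByteIndex, ', bit ', BitIndex)
    end
  end.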
I disagree that for sets this is strange, if only for the reason that apparently both the GPC and FPC implementors did it that way independently, and without any hacking or ugly workarounds.
What I mean is that the memory layout is strange when you look closely at it. The code to handle it is not strange; that's probably why we developed it like this.
On the contrary, this representation is extremely convenient because it allows one to implement all set routines in an endian-agnostic way (both in the rtl/rts and in the compiler), and the only thing needed to load values written on an architecture of the other endianness is byte-swapping.
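To illustrate that conversion, a minimal sketch of the byte-swapping step (SwapSetWords is a made-up name, not an actual rtl/rts routine), assuming 32-bit words as in the example above:

  procedure SwapSetWords (var Buf: array of Byte);
  var
    I: Integer;
    T: Byte;
  begin
    I := 0;
    { reverse the four bytes of each 32-bit word in place }
    while I + 3 <= High (Buf) do
    begin
      T := Buf[I];     Buf[I]     := Buf[I + 3]; Buf[I + 3] := T;
      T := Buf[I + 1]; Buf[I + 1] := Buf[I + 2]; Buf[I + 2] := T;
      Inc (I, 4)
    end
  end;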
At a very low level it may seem strange once you start to think about it (I also thought it strange the first time I realised this), but does that matter if it's easy to work with at a high level? (Both for the compiler implementors and the end users, no less.)
At a high level it's not a problem. But you started this thread with gdb access, which is a low-level issue. The advantage of formats without a "strange" memory layout is that they're also agnostic of the element sizes used (the original problem). But of course, that's not an absolute requirement. In fact, I'd prefer if there were a way to tell gdb about the element size (and bit/byte/word endianness and alignment), so different formats could be tried and used without worrying about whether gdb supports them.
You seem to be very concerned with sets in binary file formats, judging from these and previous concerns.
I am indeed. These sorts of issues can waste a lot of programmer time when going from one platform to another, time which could be spent on much more useful things.
As you explained that FPC itself needs this, I understand your concern. ;-) (It might be one of the biggest users of sets in binary files; otherwise I haven't seen many programs that do.) However, as you said, you need some conversions anyway, so does it make a big difference which kind of conversions (bit, byte, word reversing) they are, once you have generic routines for these (which is a small one-time effort, of course)?
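For concreteness, such generic routines could look roughly like this (a sketch of mine, not quoted from any RTL; the word-reversing case is the same as the byte swap shown earlier):

  function ReverseBitsInByte (B: Byte): Byte;
  var
    I: Integer;
    R: Byte;
  begin
    R := 0;
    { mirror bit 0 <-> bit 7, bit 1 <-> bit 6, etc. }
    for I := 0 to 7 do
      if (B and (1 shl I)) <> 0 then
        R := R or (1 shl (7 - I));
    ReverseBitsInByte := R
  end;

  procedure ReverseBytes (var Buf: array of Byte);
  var
    I, J: Integer;
    T: Byte;
  begin
    { reverse the whole byte sequence in place }
    I := 0;
    J := High (Buf);
    while I < J do
    begin
      T := Buf[I]; Buf[I] := Buf[J]; Buf[J] := T;
      Inc (I);
      Dec (J)
    end
  end;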
Gale Paeper wrote:
68K and PPC CodeWarrior Pascal, THINK Pascal, and MPW Pascal all use the same data format for sets. The format is an array of bytes, and (from the CodeWarrior Pascal documentation) the formula for determining a specific set element bit in the array is:
bit := BAND(SET[|SET| - 1 - EL div 8], BSL(1, EL mod 8))
where BAND and BSL are intrinsics for the 68K instructions for bitwise AND and bitwise shift left, respectively.
OK, so that's just the reverse of what Waldek suggested. It's also word-size-agnostic and allows for simple range operations with any word size. (It's just quite different from the little-endian format in that the byte order is completely reversed.)
EL is the element number which corresponds to the ord of the element in the base type the set is derived from. |SET| is the number of bytes in the set, so the byte array has an index subrange of 0..(|SET| - 1).
The byte array is always sized/allocated with an element number zero base even in the cases of a subrange base type set (e.g., set of [5..15] will be allocated the same as set of [0..15] and the bits for 0..4 won't be used).
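Putting the formula and the sizing rule together, a membership test in this format might be sketched as follows (ElementInSet is a hypothetical name; standard `and`/`shl` stand in for the BAND/BSL intrinsics):

  function ElementInSet (const SetBytes: array of Byte; EL: Integer): Boolean;
  var
    SetSize: Integer;
  begin
    SetSize := High (SetBytes) + 1;  { |SET| }
    { assumes EL >= 0; the zero-based array leaves no room
      for negative element numbers }
    ElementInSet :=
      (SetBytes[SetSize - 1 - EL div 8] and (1 shl (EL mod 8))) <> 0
  end;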
What about negative lower bounds? I take it they're not supported in this format.
Frank