I've followed this thread so far with interest, sometimes with amazement ... I will reply to some fragments of several mails below, but before, some general comments.
I think some of you (especially Chuck ;-) are really seeing the issue as more difficult than it really is. The difficulties you see might stem from two causes:
- You tend to see things at too low a level (we've had this discussion before). When you're already on the assembler level and dealing with offsets etc., it might be hard, indeed. But when you stay at the Pascal level, many things look (and are ;-) much easier. This also holds for the actual implementation, BTW. We don't have to hand-code assembler code for the checks. We already have a backend which can produce and optimize code for all kinds of expressions and statements (represented in an internal form which is close enough to Pascal code, so for the purpose of this discussion we can assume we actually insert Pascal code for the runtime checks).
- Mixing up of different kinds of runtime checks (that applies to many of you). It's clear that GPC currently does very little runtime checks, and it should be improved. But, it doesn't help to mix them all together -- one tends to lose overview, and they'll have to be implemented separately, anyway.
In particular I can imagine the following kinds of checks (and, BTW, it's clear that all runtime checks will be disableable (is this a word? ;-) by compiler options -- there's not really a need to repeat this in every such discussion):
- Range checks -- I use this term in the following strict sense because I'm used to it from other environments; if someone has a better suggestion, let me know: A value of an ordinal type is used in a situation where only a subrange of this type is acceptable (e.g. assignments, parameters, array indexing, ...)
- Overflow checks: The result of some operation does not fit in the target type (e.g., adding two large integers). That's different from range checking, since here the result is not even a valid value of the type. Whereas range checks can be done during the assignment etc., overflow checking must be done in connection with the operation itself, before it yields a wrong value. (So, e.g., when you `Write' the result, overflow checking might still need to be done, whereas no range checking is ever necessary there.)
- Variant checks: Also quite different from range checks. This is about whether some field can be accessed at all (at runtime), independent of what's done with the field (reading/writing/passing).
- nil/invalid pointer checks
- undefined value checks
- and probably much more ...
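To make the distinction between the first two kinds concrete, here's a small sketch in C (illustrative only -- GPC's backend of course emits such checks inline from its internal representation, not via helper functions, and the error routines here just abort):

```c
#include <stdlib.h>

/* Illustrative sketch, not GPC's actual emitted code.  A range check
   validates an already-valid value against a subrange at the point of
   use; an overflow check must guard the operation itself, before a
   wrong result ever exists. */

/* Range check: 'v' is a valid integer; is it within lo..hi? */
static int range_checked(int v, int lo, int hi)
{
    if (v < lo || v > hi)
        abort();            /* runtime range error */
    return v;
}

/* Overflow check: the sum of two valid ints may not be representable
   at all, so the check is tied to the '+' itself (using the GCC/Clang
   checked-arithmetic builtin). */
static int add_checked(int a, int b)
{
    int sum;
    if (__builtin_add_overflow(a, b, &sum))
        abort();            /* runtime overflow error */
    return sum;
}
```

Note that `add_checked' must run even if the sum is never assigned to a subrange variable, whereas `range_checked' only ever runs at an assignment, parameter passing, indexing, etc.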
J. David Bryan wrote:
GPC behaves correctly now, but it would perhaps be more useful if variant assignments were checked at run time as part of the general implementation of range checking (which is on the "to do" list).
As just explained, variant checking is not part of general range checking and it was not on the list AFAICS. I'm putting it there now.
It is even impossible if the variant selector is not stored....
Well, yes, but I don't see why that would prevent implementation in the case where it was stored. In the latter case, it should be possible at run time to verify before access that the selector is set to a proper value.
If the selector contains no tag-field and is not a discriminant-identifier, it is still possible to store the selector (the standard doesn't say what's stored or where). When allocating with `New' and a selector given, it can be stored. When a variant is assigned to, the corresponding selector value can be stored. When reading from a variant, it can be checked against the stored selector (if none was stored, it's an error, anyway). Of course, that's a little more difficult than in the case where there's an explicit tag-field or a discriminant, but it should be possible.
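A sketch of this hidden-selector idea in C (the field names and the layout with an extra selector word are my illustration, not GPC's actual representation):

```c
#include <stdlib.h>

/* Illustrative sketch of a compiler-maintained hidden selector for a
   variant record whose source declares no tag field.  The names and
   the 'VARIANT_UNDEFINED' convention are invented for this example. */

enum { VARIANT_UNDEFINED = 0, VARIANT_C = 1, VARIANT_D = 2 };

typedef struct {
    int hidden_selector;    /* stored even though the source has no tag */
    union { int c; int d; } u;
} VariantRec;

/* Writing to a variant field records which variant is now active. */
static void variant_write_c(VariantRec *r, int value)
{
    r->hidden_selector = VARIANT_C;
    r->u.c = value;
}

/* Reading checks the stored selector; if nothing was ever stored, the
   selector is still VARIANT_UNDEFINED, so that error is caught too. */
static int variant_read_c(const VariantRec *r)
{
    if (r->hidden_selector != VARIANT_C)
        abort();            /* runtime variant error */
    return r->u.c;
}
```

`New' with a selector given would simply initialize `hidden_selector' to the corresponding value instead of VARIANT_UNDEFINED.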
However, the issue also borders on (general) undefined value checks, needed to catch errors such as the following, which a simple variant check would miss:
program Foo;

var
  a: record
    case b: Integer of
      1: (c: Integer);
      2: (d: Integer)
  end;

begin
  a.b := 1;
  a.c := 2;
  a.b := 2;
  a.b := 1;
  WriteLn (a.c)  { WRONG }
end.
A way to go might be to define a magic value for "undefined" for each type and store this value into each variant field when a variant becomes active. Of course, this really is quite some overhead ...
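The magic-value idea, sketched in C (the choice of INT_MIN as the "undefined" value and the explicit `set_tag' helper are purely illustrative assumptions):

```c
#include <limits.h>
#include <stdlib.h>

/* Illustrative sketch of the "magic undefined value" idea: whenever a
   variant becomes active, its fields are set to a per-type magic
   value, and every read checks for it.  INT_MIN as the magic value is
   an invented convention for this example, nothing GPC defines. */

#define UNDEFINED_INT INT_MIN

typedef struct {
    int b;                  /* explicit tag field */
    union { int c; int d; } u;
} Rec;

/* Assigning the tag activates a variant and poisons its fields
   (simplified here: the union shares storage, so one store covers
   both int-sized fields). */
static void set_tag(Rec *r, int tag)
{
    r->b = tag;
    r->u.c = UNDEFINED_INT;
}

static int rec_read_c(const Rec *r)
{
    if (r->b != 1)
        abort();            /* variant error */
    if (r->u.c == UNDEFINED_INT)
        abort();            /* undefined value error: catches the Foo bug */
    return r->u.c;
}
```

In the `Foo' example above, the second `a.b := 1' would poison `a.c', so the final `WriteLn (a.c)' would be caught even though the variant check alone passes.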
Prof. A Olowofoyeku (The African Chief) wrote:
I think I understand the point being made. If you write something like "r.a := b;", how does the compiler perform the range check (either at compile time or at runtime)? The value of "b" could be anything and might have been set at any point in the code, in any number of ways (which might not be obvious at compile time, because it might depend on user input). And, presumably, "b" might be a 16-bit integer, and so the value of "b" could be "19000" or whatever. Or am I missing something?
Others have replied to that already, but I really don't see why you see a problem there. Of course, the actual value might come from anywhere -- that's why it's a runtime check. At the time of assignment, the actual value is known (obviously) and can therefore be checked. I also don't see what difference a 16-bit integer would make. It has a value just as well, and this value can be within or outside the range of the target, whether the target is 8, 16, 32 or however many bits wide. (In some cases it's known at compile time that the value can't be out of range, but this only simplifies things.)
J. David Bryan wrote:
To implement a check for this requirement, the compiler might modify the above range-checking pseudo-code to:
if rcd.a = 1 then
  if (something >= -5) and (something <= 512) then
    rcd.twobyte := something
  else
    range_error
else
  variant_error
Exactly. I'd prefer to rearrange this a little, taking into account that the error procedures don't return:
if rcd.a <> 1 then
  variant_error;
if (something < -5) or (something > 512) then
  range_error;
rcd.twobyte := something;
This makes it quite clear that both checks (range and variant) are completely independent, and can be individually turned on and off (even the order in which they're done doesn't matter).
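The same structure in C, with each check behind its own switch (the macro names are invented for this sketch; GPC's real compiler options work at a different level, of course):

```c
#include <stdlib.h>

/* Illustrative sketch of the rearranged pseudo-code: variant check and
   range check are independent statements, each behind its own
   compile-time switch, and their order doesn't matter. */

#ifndef VARIANT_CHECKS
#define VARIANT_CHECKS 1
#endif
#ifndef RANGE_CHECKS
#define RANGE_CHECKS 1
#endif

typedef struct {
    int a;                  /* variant selector */
    union { short twobyte; } u;
} Rcd;

static void assign_twobyte(Rcd *rcd, int something)
{
#if VARIANT_CHECKS
    if (rcd->a != 1)
        abort();            /* variant_error */
#endif
#if RANGE_CHECKS
    if (something < -5 || something > 512)
        abort();            /* range_error */
#endif
    rcd->u.twobyte = (short)something;
}
```

Compiling with `-DVARIANT_CHECKS=0' or `-DRANGE_CHECKS=0' drops either check without affecting the other, which is exactly the independence argued for above.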
Of course, there are cases where some checks can be omitted by compile-time knowledge. Some cases are easy to detect, some not so easy (you get the full scale from a constant comparison to the halting problem ...). Doing the optimal thing in this regard (if there is an optimal thing) is very hard. As far as I'm concerned, when I do it, I'll only do the simplest optimizations (basically, constant comparisons), and perhaps the backend will catch a few more. Anything else is IMHO not worth the effort since we're only talking about an optional debugging aid. For the occasional tight loop where speed really matters, the programmer might want to choose types etc. in order to help the compiler avoid unnecessary checks, rather than hoping for some complicated optimizations.
CBFalconer wrote:
This logic can probably be simplified with unrestricted set size. But consider:
r = RECORD
      CASE i : integer OF
        0..1000, 10000    : (c : char);
        -9999, 3000..4000 : (j : integer);
    END; (* r *)
Note the negative value!!
Here too, others have replied, but I wonder what's so special about the negative value. Comparing the selector with -9999 is just as easy as comparing it with 1.
I remain unconvinced that ranges in variant selectors are a GOOD THING.
One example that comes to mind might be a parse tree for arithmetic expressions. The variant selector could be an enum type that represents the operator. Since all binary operators have the same fields, it would be natural to put them in one variant, and a range (provided the binary operators are in sequence) is just a little easier to maintain than a list of values.
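Sketched in C (the operator names and node layout are invented for this example):

```c
/* Illustrative sketch of the parse-tree example: the binary operators
   are consecutive in the enum, so the variant check for the shared
   'binary' fields is a single range comparison instead of a list. */

typedef enum { opConst, opNeg, opAdd, opSub, opMul, opDiv } Op;

typedef struct Node {
    Op op;
    union {
        int value;                                  /* opConst      */
        struct Node *operand;                       /* opNeg        */
        struct { struct Node *left, *right; } bin;  /* opAdd..opDiv */
    } u;
} Node;

/* One comparison pair covers the whole range of binary operators;
   adding opMod between opMul and opDiv would need no change here. */
static int is_binary(Op op)
{
    return op >= opAdd && op <= opDiv;
}
```

The maintenance point is the comment in `is_binary': with an explicit value list, every new binary operator would have to be added to every such check.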
J. David Bryan wrote:
But consider a "regular" (executable) CASE statement appearing in the body of a program. To decide whether a specific contained statement is to be executed, the same sequence of possibly disjoint checks must be made. The only difference between this and the variant access situation is the action taken after the check: a transfer of execution or an allowed field access. So the checking code already exists in GPC; it "simply" :-) needs to be emitted in association with a variant field access.
Not exactly. In a `case' statement, the compiler produces code for all variants at once (possibly using jump tables etc.), whereas for a variant access we need to check only one case. In fact, the code to be used here is quite similar to that emitted for an `in' expression whose right operand is a set constructor.
Essentially, yes, although I would imagine that the loop would be unrolled to perform sequential checks to improve speed.
Indeed (or rather, a series of checks would be generated right away instead of a table and a loop; that's also easier to do).
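For Chuck's example above, such a series of checks would boil down to something like this (hand-written C for illustration; the real code comes out of the backend, of course):

```c
/* Illustrative sketch: the membership test for the selector list
   '0..1000, 10000, -9999, 3000..4000' lowered into a short chain of
   comparisons, rather than a table and a loop.  Ranges cost two
   comparisons, single values one. */
static int selector_matches(int i)
{
    return (i >= 0 && i <= 1000)
        || i == 10000
        || i == -9999
        || (i >= 3000 && i <= 4000);
}
```

The negative value -9999 really is just one more equality comparison, which is the point made above.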
The GNAT (Ada) compiler actually puts the selector-checking code for a specific variant into a compiler-generated function, so that accesses to any fields contained within that variant may be checked by calling the common routine (i.e., avoiding the replication of checking code at each field reference within the program).
I doubt whether that's too useful. The cases where the function call overhead might be worth it (very complex lists of selectors/ranges) are probably very rare in practice, so it doesn't seem worth optimizing for them.
[Assembler stuff skipped. This goes to you, too -- if you think something can be improved in the assembler code, then please contact the respective backend maintainers. Any improvement made there at the low level would benefit all languages and all situations (e.g., explicit comparisons as well as automatic checks). Here, let's please stay at the high level.]
Frank