Hi Folks!
I'm not sure whether it's a feature or a bug, but I've read that setting a case selector in variant records makes some variants unavailable from that point on. The following program demonstrates that setting a case selector still allows unselected variants to be overwritten:
program Fail3;
type
  TGeoType = (Circle, Rect);
  TGeometry = record
    X, Y: Integer;
    case aType: TGeoType of
      Circle: (Radius: Integer);
      Rect:   (Width, Height: Integer)
  end;
var Geometry: TGeometry;
begin
  with Geometry do
    begin
      X := 10;
      Y := 20;
      aType := Rect;
      Radius := 30;
      Width := 40;
      Height := 50
    end
end.
Version:
========
Reading specs from /usr/local/bin/../lib/gcc-lib/i686-pc-linux-gnu/2.95.2/specs
gpc version 20021111, based on gcc-2.95.2 19991024 (release)
Eike
On 19 Nov 2002 at 10:11, Eike Lange wrote:
I'm not sure whether it's a feature or a bug, but I've read that setting a case selector in variant records makes some variants unavailable from that point on.
ISO-7185 (Pascal) and ISO-10206 (Extended Pascal) say:
6.5.3.3 Field-designators
[...]
It shall be an error unless a variant of a record-variable is active for the entirety of each reference and access to each component of the variant.
So you are correct (a variant part becomes "active" when the selector value matches the associated case constant).
The following program demonstrates that setting a case selector still allows unselected variants to be overwritten:
The standards also say:
3.2 Error
A violation by a program of the requirements of this International Standard that a processor is permitted to leave undetected.
So GPC complies with the ISO standards.
- how _should_ GPC behave?
GPC behaves correctly now, but it would perhaps be more useful if variant assignments were checked at run time as part of the general implementation of range checking (which is on the "to do" list).
-- Dave
"J. David Bryan" wrote:
On 19 Nov 2002 at 10:11, Eike Lange wrote:
I'm not sure whether it's a feature or a bug, but I've read that setting a case selector in variant records makes some variants unavailable from that point on.
ISO-7185 (Pascal) and ISO-10206 (Extended Pascal) say:
6.5.3.3 Field-designators
[...]
It shall be an error unless a variant of a record-variable is active for the entirety of each reference and access to each component of the variant.
So you are correct (a variant part becomes "active" when the selector value matches the associated case constant).
The following program demonstrates that setting a case selector still allows unselected variants to be overwritten:
The standards also say:
3.2 Error
A violation by a program of the requirements of this International Standard that a processor is permitted to leave undetected.
So GPC complies with the ISO standards.
- how _should_ GPC behave?
GPC behaves correctly now, but it would perhaps be more useful if variant assignments were checked at run time as part of the general implementation of range checking (which is on the "to do" list).
This particular error is **very** hard to detect at run-time in any efficient manner. It means treating variant records entirely differently from any other type of record. Note that nothing requires that the components be accessed within a with statement.
While I am one of the biggest mutterers about the lack of run-time range-checking, I would not expect this one to ever be implemented. It is even impossible if the variant selector is not stored, as in:
athing = RECORD
           a, b : integer;
           CASE c OF
             c1: (d : boolean);
             c2: (e : real)
         END; (* record athing *)
...
thing : athing;
thingptr : ^athing;
When "thing.e := realvalue" is always legal unless the variable declaration was of the form:
new(thingptr, c1);
...
WITH thingptr^ DO BEGIN
  e := realvalue;   (* possibly detectable *)
END;
because how can you then handle:
dispose(thingptr); new(thingptr, c2); etc.
and I am not willing to give up the freedom of variant records with no selector element.
On 19 Nov 2002 at 12:00, CBFalconer wrote:
This particular error is **very** hard to detect at run-time in any efficient manner.
I'm not sure I understand this. Is it not a matter of checking, before each component access, whether the variant selector (or selectors) is in the range of the corresponding case-constant-list? Isn't that a simple range check, e.g., equivalent to that performed before an assignment?
It is even impossible if the variant selector is not stored....
Well, yes, but I don't see why that would prevent implementation in the case where it was stored. In the latter case, it should be possible at run time to verify before access that the selector is set to a proper value.
...I am not willing to give up the freedom of variant records with no selector element.
There's no question here of "giving up freedom," as the standard requires that the tag-field be optional. But I do not see anything in the standard that requires that such checking be "all or nothing," i.e., must cover all possible cases or none of them. If some errors are detected and others are not, that appears to be perfectly in line with the requirements of ISO 10206.
-- Dave
"J. David Bryan" wrote:
On 19 Nov 2002 at 12:00, CBFalconer wrote:
This particular error is **very** hard to detect at run-time in any efficient manner.
I'm not sure I understand this. Is it not a matter of checking, before each component access, whether the variant selector (or selectors) is in the range of the corresponding case-constant-list? Isn't that a simple range check, e.g., equivalent to that performed before an assignment?
No. Let's consider
TYPE r = RECORD
       CASE a : integer OF
         0: ( onebyte : 0..127);
         1: ( twobyte : -5..512);
     END;
   END;
VAR rcd : r;
....
r.twobyte := something;   (* something may be an integer *)
The compiler tables hold something describing the location of r, and offsets from there to a, onebyte, and twobyte. It also holds acceptable ranges for each value. It can generate code (for a stack machine) something like:
LDCA r                (* address of r *)
LDC  offsetof twobyte
ADD                   (* forming address of twobyte, or maybe onebyte *)
LOD  something        (* whatever is to be stored *)
CHK  minval, maxval   (* which may cause an exception *)
STI  1                (* putting it away *)
Now, where is the compiler going to get the values of minval and maxval? To do so it would have to read a, and translate that somehow into minval and maxval at run time. Where are those tables stored? Nothing insists that a be a valid initialized value before storing something in onebyte. Remember, onebyte and twobyte have the same offsets. They can store different subranges. a may have been set in another procedure far, far away.
In addition, the compiler knows that it is storing a twobyte value. If it got it from something of appropriate type it knows it need not do a check on this store, and can totally elide the CHK operation. No such efficiencies if we have to read and translate the a value first.
Without that check the only differences for storing onebyte and twobyte are the (known at compile time) values of minval and maxval, and quite possibly the operand of STI, which specifies the size of the thing to be written.
In other words full checking here is a monstrous can of worms.
Checking that the variant selector is in range never enters into it (except when storing that value at some other time). The names in the different variants are known to be disjoint (standard specified) so that the name used specifies the variant desired. Also there is no default "otherwise" clause in this flavor of a CASE statement AFAIK.
On Tue, Nov 19, 2002 at 04:55:47PM -0500, CBFalconer wrote:
"J. David Bryan" wrote:
On 19 Nov 2002 at 12:00, CBFalconer wrote:
This particular error is **very** hard to detect at run-time in any efficient manner.
I'm not sure I understand this. Is it not a matter of checking, before each component access, whether the variant selector (or selectors) is in the range of the corresponding case-constant-list? Isn't that a simple range check, e.g., equivalent to that performed before an assignment?
No. Let's consider
TYPE r = RECORD
       CASE a : integer OF
         0: ( onebyte : 0..127);
         1: ( twobyte : -5..512);
     END;
(BTW, one of these ENDs is spurious, I think.)
END;
VAR rcd : r;
....
r.twobyte := something;   (* something may be an integer *)
The compiler tables hold something describing the location of r, and offsets from there to a, onebyte, and twobyte. It also holds acceptable ranges for each value. It can generate code (for a stack machine) something like:
LDCA r                (* address of r *)
LDC  offsetof twobyte
ADD                   (* forming address of twobyte, or maybe onebyte *)
LOD  something        (* whatever is to be stored *)
CHK  minval, maxval   (* which may cause an exception *)
STI  1                (* putting it away *)
Now, where is the compiler going to get the values of minval and maxval? To do so it would have to read a, and translate that somehow into minval and maxval at run time. Where are those tables stored? Nothing insists that a be a valid initialized value before storing something in onebyte. Remember, onebyte and twobyte have the same offsets. They can store different subranges. a may have been set in another procedure far, far away.
Don't understand this. The problem was to check that the component accessed is valid, which is as simple as ``r.a = 1?'' in your example. Anyway the range check which you seem to be talking about is also easy: the code stores something into r.twobyte, hence the value should be range-checked against the definition of twobyte, i.e. minval = -5 and maxval = 512. These bounds are known at compile time, and they have nothing to do with the current value of r.a.
In addition, the compiler knows that it is storing a twobyte value. If it got it from something of appropriate type it knows it need not do a check on this store, and can totally elide the CHK operation. No such efficiencies if we have to read and translate the a value first.
Without that check the only differences for storing onebyte and twobyte are the (known at compile time) values of minval and maxval, and quite possibly the operand of STI, which specifies the size of the thing to be written.
In other words full checking here is a monstrous can of worms.
Checking that the variant selector is in range never enters into it (except when storing that value at some other time). The names
Yes, it does. Here, the ``range'' was 1 .. 1, but the ISO 10206 standard permits a general range (or list of ranges) in such a situation.
type R = record
       case A: Integer of
         0 .. 24, 29:  (OneByte: 0 .. 127);
         25, 30 .. 42: (TwoByte: -5 .. 512);
         otherwise     (MoreByte: Real)
     end;
in the different variants are known to be disjoint (standard specified) so that the name used specifies the variant desired. Also there is no default "otherwise" clause in this flavor of a CASE statement AFAIK.
See above.
Emil
On 20 Nov 2002 at 14:10, Emil Jerabek wrote:
[...]
Now, where is the compiler going to get the values of minval and maxval? To do so it would have to read a, and translate that somehow into minval and maxval at run time. Where are those tables stored? Nothing insists that a be a valid initialized value before storing something in onebyte. Remember, onebyte and twobyte have the same offsets. They can store different subranges. a may have been set in another procedure far, far away.
Don't understand this. The problem was to check that the component accessed is valid, which is as simple as ``r.a = 1?'' in your example. Anyway the range check which you seem to be talking about is also easy: the code stores something into r.twobyte, hence the value should be range-checked against the definition of twobyte, i.e. minval = -5 and maxval = 512. These bounds are known at compile time, and they have nothing to do with the current value of r.a.
I think I understand the point being made. If you write something like "r.a := b;", how does the compiler perform the range check (either at compile time or at runtime)? The value of "b" could be anything and might have been set at any point in the code, in any number of ways (which might not be obvious at compile time, because it might depend on user input). And, presumably, "b" might be a 16-bit integer, and so the value of "b" could be "19000" or whatever. Or am I missing something?
Best regards,
The Chief
---------
Prof. Abimbola Olowofoyeku (The African Chief)
Web: http://www.bigfoot.com/~african_chief/
On 19 Nov 2002 at 16:55, CBFalconer wrote:
VAR rcd : r;
....
r.twobyte := something;   (* something may be an integer *)
I assume you mean "rcd.twobyte := something" here.
Now, where is the compiler going to get the values of minval and maxval? To do so it would have to read a, and translate that somehow into minval and maxval at run time.
Not at all. The assignment to "twobyte" means that the range is known at compile time from twobyte's type, so the (run-time) check can be generated with constants. That's no different from any other run-time range check, e.g. in pseudo-code:
rcd.twobyte := something
might be generated by the compiler as:
if something >= -5 and something <= 512 then
  rcd.twobyte := something
else
  range_error
Nothing insists that a be a valid initialized value before storing something in onebyte.
The standard does. From my original post from ISO 10206, section 6.5.3.3:
"It shall be an error unless a variant of a record-variable is active for the entirety of each reference and access to each component of the variant."
This insists that not only must "a" be a valid value, the value must correspond with the case-constant-list associated with the accessed variant, because a particular variant is only "active" if "a" is set appropriately.
To implement a check for this requirement, the compiler might modify the above range-checking pseudo-code to:
if rcd.a = 1 then
  if something >= -5 and something <= 512 then
    rcd.twobyte := something
  else
    range_error
else
  variant_error
Remember, onebyte and twobyte have the same offsets. They can store different subranges.
That is true but not relevant, because access always designates "onebyte" or "twobyte" and so the range is known to the compiler at the point of access from the type (and therefore appropriate range-checking code may be generated).
In other words full checking here is a monstrous can of worms.
It appears to me to be quite straightforward, unless I am missing your point entirely. Checking for correct variant accesses would appear to be a matter of adding a second range check (for the variant selector) to the range check presumed already present for assignment.
Note that Ada, e.g., requires that variant accesses be checked, i.e., it's not allowed for such checks to be omitted as in the Pascal standard. Using the GNU Ada compiler (GNAT) on an Ada program equivalent to your example, the compiler proceeds as I have outlined above, e.g., it generates run-time code to compare the value of the variant selector ("discriminant" in Ada- ese) to the case-constant-list ("discrete_choice_list") values before proceeding with the assignment. If the comparison fails, a "constraint error" is raised.
-- Dave
On 20 Nov 2002 at 15:50, Prof. A Olowofoyeku (The African Chief) wrote:
If you write something like "r.a := b;", how does the compiler perform the range check (either at compile time or at runtime)?
Given the original definition of "a" as "integer", the compiler could generate code to ensure that the value of "b" was within the range of integer (assuming that "b" is of a type with a larger range).
There are really two checks of importance here. One is the simple range check upon an assignment to "a". That would be no different in implementation from any other range check, e.g., for assignment to a simple variable. The other is the variant check when a field of a variable of type "r" is accessed. This latter check simply requires that "a" be verified as having the correct value for access to the designated variant part. Given that a specific variant part may be activated by a range of values, a range check is appropriate in this case as well.
I confess that I do not know the EP standard well enough to say with certainty whether it is illegal to assign to "a" a value that is within the range of "a"'s type but not one of the case-constant-list values. In other words, is:
rcd.a := 2;
...an error? Or does an error occur only if a field reference is made subsequently (because at that point the selector is not valid for the given field reference)? I can't find language in the standard that prohibits the above assignment, so I suspect but am not certain that it is legal.
-- Dave
"J. David Bryan" wrote:
On 19 Nov 2002 at 16:55, CBFalconer wrote:
VAR rcd : r;
....
r.twobyte := something;   (* something may be an integer *)
I assume you mean "rcd.twobyte := something" here.
Now, where is the compiler going to get the values of minval and maxval? To do so it would have to read a, and translate that somehow into minval and maxval at run time.
Not at all. The assignment to "twobyte" means that the range is known at compile time from twobyte's type, so the (run-time) check can be generated with constants. That's no different from any other run-time range check, e.g. in pseudo-code:
rcd.twobyte := something
might be generated by the compiler as:
if something >= -5 and something <= 512 then
  rcd.twobyte := something
else
  range_error
Nothing insists that a be a valid initialized value before storing something in onebyte.
The standard does. From my original post from ISO 10206, section 6.5.3.3:
"It shall be an error unless a variant of a record-variable is active for the entirety of each reference and access to each component of the variant."
This insists that not only must "a" be a valid value, the value must correspond with the case-constant-list associated with the accessed variant, because a particular variant is only "active" if "a" is set appropriately.
You are right - that provision makes a world of difference. There is still the problem of untangling nested variants. And the multiple ranges per variant, as pointed out by Emil, complicates that variant selector check, unless I am again confused. This means the variant check has to be something like (variation on your code, snipped):
selectorok := false;
v := firstrange;  (* How to handle otherwise?? *)
while not selectorok and (v <= lastrange) do
  if (rcd.a >= v.minrangept) and (rcd.a <= v.maxrangept) then
    selectorok := true
  else
    v := succ(v);
if not selectorok then
  varianterror
else if (value < id.minvalue) or (value > id.maxvalue) then
  rangerror
else
  store(value, runtimeaddress^);
where everything except v, value, and selectorok is a compile-time constant. It is still a lot of code to emit to store a single char in a variant field! :-) Known restrictions on the source of value can still eliminate the minvalue/maxvalue checks. Anything of this nature should be suppressible without destroying generalized range checks.
This logic can probably be simplified with unrestricted set size. But consider:
r = RECORD
      CASE i : integer OF
        0..1000, 10000:    (c : char);
        -9999, 3000..4000: (j : integer);
    END; (* r *)
Note the negative value!!
I remain unconvinced that ranges in variant selectors are a GOOD THING.
On 20 Nov 2002 at 14:03, CBFalconer wrote:
There is still the problem of untangling nested variants.
True, but this is a matter of adding a range check on each of the variant selectors. One could, for example, declare a type with ten nested variants, and then an access to a field of the most deeply nested variant would involve ten sequential range checks before permitting the access, but that is what is necessary to detect the potential error. It would be slow, but it's still straightforward. (As a matter of practicality, I don't recall ever using more than two nested variants, and I would guess that the typical case is one variant, so the typical speed penalty would be nothing like the unusual case offered above.)
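For illustration, here is a rough sketch (type and field names are invented, and VariantError merely stands in for whatever the runtime would actually do) of a record with one nested variant and the pair of selector checks that would conceptually guard an access to the innermost field:

type
  TNested = record
              case Outer: 0 .. 1 of
                0: (A: Char);
                1: (case Inner: Boolean of
                      False: (Count: Integer);
                      True:  (Ratio: Real))
            end;

var Rec: TNested;

{ Conceptual checks before an access to Rec.Ratio: }

  if Rec.Outer <> 1 then VariantError;     { outer selector }
  if Rec.Inner <> True then VariantError;  { nested selector }
  { ... only then read or write Rec.Ratio ... }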
And the multiple ranges per variant, as pointed out by Emil, complicates that variant selector check, unless I am again confused.
No, it does indeed complicate the issue, as multiple (disjoint) checks need to be made.
But consider a "regular" (executable) CASE statement appearing in the body of a program. To decide whether a specific contained statement is to be executed, the same sequence of possibly disjoint checks must be made. The only difference between this and the variant access situation is the action taken after the check: a transfer of execution or an allowed field access. So the checking code already exists in GPC; it "simply" :-) needs to be emitted in association with a variant field access.
This means the variant check has to be something like (variation on your code, snipped)....
Essentially, yes, although I would imagine that the loop would be unrolled to perform sequential checks to improve speed. The GNAT (Ada) compiler actually puts the selector-checking code for a specific variant into a compiler-generated function, so that accesses to any fields contained within that variant may be checked by calling the common routine (i.e., avoiding the replication of checking code at each field reference within the program).
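For the R type Emil gave above, such a compiler-generated guard might look roughly like this (a sketch only; the function name and VariantError are invented):

function TwoByteActive (const Rec: R): Boolean;
begin
  TwoByteActive := (Rec.A = 25) or ((Rec.A >= 30) and (Rec.A <= 42))
end;

{ ... and every access to Rec.TwoByte would then be preceded by: }
  if not TwoByteActive (Rec) then VariantError;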
It is still a lot of code to emit to store a single char in a variant field! :-)
It certainly does involve extra code, to be sure. How much code is involved depends on the checks required and the processor instruction set. For example, the Motorola 68020 (a GPC target) can do an interval range check with a single instruction, so using your original example, the 68020 code for:
rcd.twobyte := something;
...might be (where D0 is a machine data register, "#" designates a numeric literal, and "Bxx" are "branch if tested condition is true" instructions):
MOVE a,D0          \
CMP  #1,D0          > variant-checking code
BNE  variant_error /
MOVE something,D0  \
CMP2 bounds,D0      > range-checking code
BCS  range_error    > ("bounds" are "-5,512" in this case)
MOVE D0,twobyte    /
Contrast this with the non-range-checking case:
MOVE something,twobyte
Certainly, more code is required, but not horrendously more code.
But consider:
r = RECORD
      CASE i : integer OF
        0..1000, 10000:    (c : char);
        -9999, 3000..4000: (j : integer);
    END; (* r *)
OK, now the equivalent 68020 variant-checking code becomes:
     MOVE i,D0
     CMP  #-9999,D0
     BEQ  ok
     CMP2 bounds,D0     ; bounds are "3000,4000"
     BCS  variant_error
ok:
More complicated ranges would require correspondingly more code, but I don't see that the problem is intractable. As you have said, providing range-checking is of significant benefit. It appears to me that the benefit can be extended to variant-checking with little additional complication.
-- Dave
"J. David Bryan" wrote:
On 20 Nov 2002 at 14:03, CBFalconer wrote:
.. snip ...
And the multiple ranges per variant, as pointed out by Emil, complicates that variant selector check, unless I am again confused.
No, it does indeed complicate the issue, as multiple (disjoint) checks need to be made.
But consider a "regular" (executable) CASE statement appearing in the body of a program. To decide whether a specific contained statement is to be executed, the same sequence of possibly disjoint checks must be made. The only difference between this and the variant access situation is the action taken after the check: a transfer of execution or an allowed field access. So the checking code already exists in GPC; it "simply" :-) needs to be emitted in association with a variant field access.
I don't know what GPC does, but most CASE executors will use a table, so that execution checks are limited to the overall range of that table. Missing entries will either point to error spawning code or to the otherwise case. It takes a very sparse and small (or alternatively very large) table to encourage simulation by if then elseif.
At any rate, I am not especially worried about checking correct variant tags, but I am worried about run time checks in general. I repeat - do the easy ones first.
I've followed this thread so far with interest, sometimes with amazement ... I will reply to some fragments of several mails below, but before, some general comments.
I think some of you (especially Chuck ;-) are really seeing the issue as more difficult than it really is. The difficulties you see might stem from two sources:
- You tend to see things at too low a level (we've had this discussion before). When you're already on the assembler level and dealing with offsets etc., it might be hard, indeed. But when you stay at the Pascal level, many things look (and are ;-) much easier. This also holds for the actual implementation, BTW. We don't have to hand-code assembler code for the checks. We already have a backend which can produce and optimize code for all kinds of expressions and statements (represented in an internal form which is close enough to Pascal code, so for the purpose of this discussion we can assume we actually insert Pascal code for the runtime checks).
- Mixing up different kinds of runtime checks (this applies to many of you). It's clear that GPC currently does very few runtime checks, and this should be improved. But it doesn't help to mix them all together -- one tends to lose the overview, and they'll have to be implemented separately, anyway.
In particular I can imagine the following kinds of checks (and, BTW, it's clear that all runtime checks will be disableable (is this a word? ;-) by compiler options -- there's not really a need to repeat this in every such discussion):
- Range checks -- I use this term in the following strict sense because I'm used to it from other environments; if someone has a better suggestion, let me know: A value of an ordinal type is used in a situation where only a subrange of this type is acceptable (e.g. assignments, parameters, array indexing, ...)
- Overflow checks: The result of some operation does not fit in the target type (e.g., adding two large integers). That's different from range checking, since here the result is not even a valid value of the integer type. Whereas range checks can be done during the assignment etc., overflow checking must be done in connection with the operation itself, before it yields a wrong value. (So, e.g., when you `Write' the result, overflow checking might still need to be done, whereas no range checking is ever necessary there.) A small sketch contrasting range and overflow checks follows after this list.
- Variant checks: Also quite different from range checks. This is about whether some field can be accessed at all (at runtime), independent of what's done with the field (reading/writing/passing).
- nil/invalid pointer checks
- undefined value checks
- and probably much more ...
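As a quick illustration of the difference between the first two kinds (a hedged sketch; RangeError and OverflowError are placeholders for the respective runtime failures, and only the positive-overflow case is shown):

var
  Small: 0 .. 100;
  I, J: Integer;

  ...
  { overflow check: guards the operation itself, before a wrong value exists }
  if (J > 0) and (I > MaxInt - J) then OverflowError;
  I := I + J;

  { range check: guards the assignment into the subrange target }
  if (I < 0) or (I > 100) then RangeError;
  Small := I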
J. David Bryan wrote:
GPC behaves correctly now, but it would perhaps be more useful if variant assignments were checked at run time as part of the general implementation of range checking (which is on the "to do" list).
As just explained, variant checking is not part of general range checking and it was not on the list AFAICS. I'm putting it there now.
It is even impossible if the variant selector is not stored....
Well, yes, but I don't see why that would prevent implementation in the case where it was stored. In the latter case, it should be possible at run time to verify before access that the selector is set to a proper value.
If the selector contains no tag-field and is not a discriminant-identifier, it is still possible to store the selector (the standard doesn't say what's stored or where). When allocating with `New' and a selector given, it can be stored. When a variant is assigned to, the corresponding selector value can be stored. When reading from a variant, it can be checked against the stored selector (if none was stored, it's an error, anyway). Of course, that's a little more difficult than in the case where there's an explicit tag-field or a discriminant, but it should be possible.
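A hedged sketch of what that could mean at the Pascal level (the shadow field and its values are purely illustrative; the standard does not prescribe any particular representation):

type
  TTagless = record
               case Boolean of          { no tag-field declared by the program }
                 False: (I: Integer);
                 True:  (R: Real)
             end;

  { What the compiler might conceptually maintain instead: }
  TShadowed = record
                Data: TTagless;
                HiddenSelector: (Unset, UsingI, UsingR)
              end;

{ An assignment "X.I := 5" would set HiddenSelector to UsingI first;      }
{ a read of "X.R" would first check HiddenSelector = UsingR (Unset or     }
{ UsingI being the error case), analogous to the explicit tag checks.     }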
However, the issue also borders to (general) undefined value checks, to catch errors such as the following which a simple variant check would miss:
program Foo;
var a: record
         case b: Integer of
           1: (c: Integer);
           2: (d: Integer)
       end;
begin
  a.b := 1;
  a.c := 2;
  a.b := 2;
  a.b := 1;
  WriteLn (a.c) { WRONG }
end.
A way to go might be to define a magic value for "undefined" for each type and store this value into each variant field when a variant becomes active. Of course, this really is quite some overhead ...
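Roughly (a sketch; UndefinedInteger and UndefinedValueError are invented placeholders), for the Foo program above the compiler would insert something like:

  a.b := 1;                 { variant 1 becomes active again }
  a.c := UndefinedInteger;  { compiler-inserted: mark the re-activated field }
  ...
  if a.c = UndefinedInteger then UndefinedValueError;  { compiler-inserted }
  WriteLn (a.c)             { the WRONG line above would now be caught }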
Prof. A Olowofoyeku (The African Chief) wrote:
I think I understand the point being made. If you write something like "r.a := b;", how does the compiler perform the range check (either at compile time or at runtime)? The value of "b" could be anything and might have been set at any point in the code, in any number of ways (which might not be obvious at compile time, because it might depend on user input). And, presumably, "b" might be a 16-bit integer, and so the value of "b" could be "19000" or whatever. Or am I missing something?
Others have replied to that already, but I really don't see why you see a problem there. Of course, the actual value might come from anywhere -- that's why it's a runtime check. At the time of assignment, the actual value is known (obviously) and therefore can be checked. I also don't see what difference a 16-bit integer would make. It has a value just as well, and this value can be inside or out of the range of the target, whether that is an 8, 16, 32 or however large value. (In some cases it's known at compile time that it can't be out of range, but this only simplifies things.)
J. David Bryan wrote:
To implement a check for this requirement, the compiler might modify the above range-checking pseudo-code to:
if rcd.a = 1 then
  if something >= -5 and something <= 512 then
    rcd.twobyte := something
  else
    range_error
else
  variant_error
Exactly. I'd prefer to rearrange this a little, taking into account that the error procedures don't return:
if rcd.a <> 1 then variant_error;
if (something < -5) or (something > 512) then range_error;
rcd.twobyte := something;
This makes it quite clear that both checks (range and variant) are completely independent, and can be individually turned on and off (even the order in which they're done doesn't matter).
Of course, there are cases where some checks can be omitted by compile-time knowledge. Some cases are easy to detect, some not so easy (you get the full scale from a constant comparison to the halting problem ...). Doing the optimal thing in this regard (if there is an optimal thing) is very hard. As far as I'm concerned, when I do it, I'll only do the simplest optimizations (basically, constant comparisons), and perhaps the backend will catch a few more. Anything else is IMHO not worth the effort since we're only talking about an optional debugging aid. For the occasional tight loop where speed really matters, the programmer might want to choose types etc. in order to help the compiler avoid unnecessary checks, rather than hoping for some complicated optimizations.
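For instance (using Chuck's r from above), the simplest case is where both checks reduce to constant comparisons and can be dropped entirely at compile time:

  rcd.a := 1;        { selector is a constant matching the twobyte variant }
  rcd.twobyte := 7;  { the value 7 is a constant within -5 .. 512 }
  { neither a variant check nor a range check needs to be emitted here }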
CBFalconer wrote:
This logic can probably be simplified with unrestricted set size. But consider:
r = RECORD
      CASE i : integer OF
        0..1000, 10000:    (c : char);
        -9999, 3000..4000: (j : integer);
    END; (* r *)
Note the negative value!!
Also here, others have replied, but I wonder what's so special about the negative value. Comparing the selector with -9999 is just as easy as comparing with 1.
I remain unconvinced that ranges in variant selectors are a GOOD THING.
One example that comes to mind might be a parse tree for arithmetic expressions. The variant selector could be an enum type that represents the operator. Since all binary operators have the same fields, it would be natural to put them in one variant, and a range (provided the binary operators are in sequence) is just a little easier to maintain than a list of values.
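Something like this, as a rough sketch (all names invented):

type
  TOperator = (opConst, opNeg, opAdd, opSub, opMul, opDiv);
  PExpr = ^TExpr;
  TExpr = record
            case Op: TOperator of
              opConst:        (Value: Real);
              opNeg:          (Operand: PExpr);
              opAdd .. opDiv: (Left, Right: PExpr)  { one range covers all binary operators }
          end;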
J. David Bryan wrote:
But consider a "regular" (executable) CASE statement appearing in the body of a program. To decide whether a specific contained statement is to be executed, the same sequence of possibly disjoint checks must be made. The only difference between this and the variant access situation is the action taken after the check: a transfer of execution or an allowed field access. So the checking code already exists in GPC; it "simply" :-) needs to be emitted in association with a variant field access.
Not exactly. In a `case' statement, the compiler produces code for all variants at once (possibly using jump tables etc.), for a variant access we need to check only one case. In fact, the code to be used here is quite similar to that emitted for an `in' expression whose right operand is a set constructor.
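E.g., for the TwoByte variant of the R type above, the emitted check would essentially be (a sketch; VariantError is a placeholder, and this assumes the selector values fit into a set):

  if not (Rec.A in [25, 30 .. 42]) then VariantError;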
Essentially, yes, although I would imagine that the loop would be unrolled to perform sequential checks to improve speed.
Indeed (or rather, a series of checks would be generated right away instead of a table and a loop; that's also easier to do).
The GNAT (Ada) compiler actually puts the selector-checking code for a specific variant into a compiler-generated function, so that accesses to any fields contained within that variant may be checked by calling the common routine (i.e., avoiding the replication of checking code at each field reference within the program).
I doubt whether that's too useful. The cases where the function call overhead might be worth it (very complex lists of selectors/ranges) are probably very rare in practice, so it doesn't seem worth optimizing for them.
[Assembler stuff skipped. This goes to you, too -- if you think something can be improved in the assembler code, then please contact the respective backend maintainers. Any improvement made there at the low level would benefit all languages and all situations (e.g., explicit comparisons as well as automatic checks). Here, let's please stay at the high level.]
Frank
On 21 Nov 2002 at 15:47, CBFalconer wrote:
I don't know what GPC does, but most CASE executors will use a table....
See my separate reply to Frank Heckenbach on this subject.
I repeat - do the easy ones first.
I have attempted to show that variant checks are no harder in principle than range checks, but I see that I have been unsuccessful.
-- Dave
On 22 Nov 2002 at 2:17, Frank Heckenbach wrote:
As just explained, variant checking is not part of general range checking....
Perhaps I should have said, "part of a general implementation of _constraint_ checking...." It would indeed be useful to have several independently specifiable checking options for special circumstances, but I should think that, from a user perspective, whether:
rcd.twobyte := something^ + 1;
...fails from the result being out of range, or the variant selector value being wrong, or "something" being nil, or the addition overflowing is of secondary importance to catching the error at all.
I do take your point that, from an implementation view, each of these types of constraint checks are implemented differently.
a.b := 2; a.b := 1; WriteLn (a.c) { WRONG }
But _is_ that wrong?
ISO 10206, section 6.5.3.3, says:
"When a variant becomes non-active, all of its components shall become totally-undefined."
But "totally-undefined" is not a value-bearing state, from 6.2.4, so the value of "a.c" is not changed by the change in the variant selector. Once "a.b" is set to 1, the first variant becomes active again, so references to "a.c" are again legal. On what basis, therefore, is the third statement above wrong?
Of course, there are cases where some checks can be omitted by compile-time knowledge. [...] As far as I'm concerned, when I do it, I'll only do the simplest optimizations....
That is quite reasonable. Having the compiler emit redundant checks is far better than omitting a needed one.
For the occasional tight loop where speed really matters, the programmer might want to choose types etc. in order to help the compiler avoid unnecessary checks, rather than hoping for some complicated optimizations.
Or perhaps allow checking to be turned off and on via compiler directives.
Not exactly. In a `case' statement, the compiler produces code for all variants at once (possibly using jump tables etc.), for a variant access we need to check only one case.
I fear that again I have failed to make myself clear. A variant field access would, of course, check only the particular case-constant-list associated with that variant. What I meant was that the logic used to generate a _single_ case alternative comparison is the same (or similar) logic as is needed to generate a variant selector check. Consider:
program p;
var a, b : integer;
begin
  case a of
    -9999, 1..5: b := 2
  end
end.
Not a jump table in sight in the GPC-generated assembly code. ;-) My point was that handling variant selector checking isn't overwhelmingly difficult, as the problem is similar to handling case-constant-lists in regular CASE statements.
I doubt whether that's too useful. The cases where the function call overhead might be worth it (very complex lists of selectors/ranges) are probably very rare in practice, so it doesn't seem worth optimizing for them.
Whether the check is inlined or a function call is simply an implementation issue. I would imagine this is done simply to limit the code size (consider inlining the identical check hundreds or thousands of times). I have not investigated specifically, but I would not be surprised if specifying "-finline-functions" would change this behavior in GNAT.
[Assembler stuff skipped. This goes to you, too -- if you think something can be improved in the assembler code, then please contact the respective backend maintainers.
Oh dear, I really must improve my use of English! ;-) I shall endeavor to be clearer in future communications.
I was not making any comment at all on the GPC-generated code. I was simply trying to counter the argument that variant checking would inevitably add an intolerable overhead to a program by offering an illustration.
Here, let's please stay at the high level.]
I agree in principle. However, it is my observation that programmers today generally have little knowledge of the code generated by compilers for specific constructs. I see this commonly expressed in erroneous assertions regarding the performance of specific compilers or language constructs, e.g., "Ada (or Pascal :-) generates inefficient code when compared to C," when an examination of the code generated for equivalent programs reveals otherwise.
My illustration at the assembly level was merely to demonstrate that variant checking need not involve large amounts of code -- something that may not be immediately evident to those not accustomed to looking at the code generated for, e.g., an "if-then-else" statement.
-- Dave
J. David Bryan wrote:
On 22 Nov 2002 at 2:17, Frank Heckenbach wrote:
As just explained, variant checking is not part of general range checking....
Perhaps I should have said, "part of a general implementation of _constraint_ checking...." It would indeed be useful to have several independently specifiable checking options for special circumstances, but I should think that, from a user perspective, whether:
rcd.twobyte := something^ + 1;
...fails from the result being out of range, or the variant selector value being wrong, or "something" being nil, or the addition overflowing is of secondary importance to catching the error at all.
For a programmer with "good intentions", that's probably true. But seeing that there are some (also in the "standards camp") who like to misuse variant records for type conversions, they'd probably want to turn off those checks.
And, of course, the implementation cost/benefit ratio might look quite different for the different kinds of checks. Range checks (in the strict sense) seem to be quite essential for many (and the good news is that I'll probably implement them soon), whereas I'm not so sure how many would benefit from variant checks (I think in my own code I hardly ever change a selector once a variable is allocated and initialized).
a.b := 2; a.b := 1; WriteLn (a.c) { WRONG }
But _is_ that wrong?
ISO 10206, section 6.5.3.3, says:
"When a variant becomes non-active, all of its components shall become totally-undefined."
But "totally-undefined" is not a value-bearing state, from 6.2.4, so the value of "a.c" is not changed by the change in the variant selector. Once "a.b" is set to 1, the first variant becomes active again, so references to "a.c" are again legal. On what basis, therefore, is the third statement above wrong?
References to a.c are valid again, but the state of a.c is (still) totally-undefined. So it follows, to my understanding, that reading a.c is invalid, until it is written to again.
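I.e. (continuing the earlier Foo example), the read only becomes valid again after a fresh assignment:

  a.b := 2;
  a.b := 1;      { variant 1 is active again, but a.c is totally-undefined }
  a.c := 3;      { a.c gets a value again }
  WriteLn (a.c)  { OK now }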
For the occasional tight loop where speed really matters, the programmer might want to choose types etc. in order to help the compiler avoid unnecessary checks, rather than hoping for some complicated optimizations.
Or perhaps allow checking to be turned off and on via compiler directives.
Of course, they can also do that. (Though I prefer myself to avoid any compiler directives when the same can be achieved otherwise, e.g. by using an appropriate subrange type for an array index.)
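E.g., a minimal sketch of that style -- with the index variable declared over exactly the array's index range, no index check is needed inside the loop:

program SubrangeIndex;
type
  TIndex = 1 .. 10;
var
  Data: array [TIndex] of Integer;
  I: TIndex;
begin
  for I := 1 to 10 do
    Data [I] := 0  { I cannot leave TIndex, so no index check is required }
end.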
Not exactly. In a `case' statement, the compiler produces code for all variants at once (possibly using jump tables etc.), for a variant access we need to check only one case.
I fear that again I have failed to make myself clear. A variant field access would, of course, check only the particular case-constant-list associated with that variant. What I meant was that the logic used to generate a _single_ case alternative comparison is the same (or similar) logic as is needed to generate a variant selector check. Consider:
program p;
var a, b : integer;
begin
  case a of
    -9999, 1..5: b := 2
  end
end.
Not a jump table in sight in the GPC-generated assembly code. ;-) My point was that handling variant selector checking isn't overwhelmingly difficult, as the problem is similar to handling case-constant-lists in regular CASE statements.
Of course, in this case a jump table would make no sense. My point was also that handling variant selector checking isn't overwhelmingly difficult -- just that we might rather reuse some other code than what you suggested.
I doubt whether that's too useful. The cases where the function call overhead might be worth it (very complex lists of selectors/ranges) are probably very rare in practice, so it doesn't seem worth optimizing for them.
Whether the check is inlined or a function call is simply an implementation issue.
Of course. Aren't we talking about implementation issues here? ;-)
I would imagine this is done simply to limit the code size (consider inlining the identical check hundreds or thousands of times).
For a single value or range, an explicit check is hardly more code than a function call. And I guess more complicated case lists are quite rare.
I have not investigated specifically, but I would not be surprised if specifying "-finline-functions" would change this behavior in GNAT.
I guess so.
Here, let's please stay at the high level.]
I agree in principle. However, it is my observation that programmers today generally have little knowledge of the code generated by compilers for specific constructs. I see this commonly expressed in erroneous assertions regarding the performance of specific compilers or language constructs, e.g., "Ada (or Pascal :-) generates inefficient code when compared to C," when an examination of the code generated for equivalent programs reveals otherwise.
Of course, this behaviour of some programmers is not new. Years ago I saw someone in a newsgroup proudly presenting a hand-optimized assembler coding of `Abs' -- which turned out worse than the code that BP produced (and BP is not exactly famous for its sophisticated optimizations ;-). Apparently he had not even checked it before rolling his own.
Also, it's well-known that most "amateur optimizers" spend a lot of effort optimizing in the wrong places, those that are not time-critical at all. (A compiler can afford to optimize everything, but hand-coded assembly is only worth the effort in a few places, and doesn't even make a measurable difference in most other places.)
I have attempted to show that variant checks are no harder in principle than range checks, but I see that I have been unsuccessful.
In general I agree. Compared to range checks, there's one thing that makes variant checks easier to implement -- it's immediately clear when to apply them (whereas range checks occur in various situations) -- and several things that make them more difficult:
- several case constants/ranges (just a little harder to do)
- otherwise branch (needs a negation of all other branches; see the sketch after this list)
- no explicit tag-field (quite a bit harder, as I described)
- checks for accessing undefined fields (a topic of its own, probably quite hard in general)
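Regarding the otherwise branch (second point above), for the R type discussed earlier the guard for MoreByte would have to establish that the selector matches none of the listed constants, roughly (a sketch, with the same caveats as before):

  if Rec.A in [0 .. 24, 25, 29, 30 .. 42] then VariantError;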
Frank
On 23 Nov 2002 at 2:07, Frank Heckenbach wrote:
References to a.c are valid again, but the state of a.c is (still) totally-undefined. So it follows, to my understanding, that reading a.c is invalid, until it is written to again.
After reviewing the standard again, I believe that you are correct. I was confused by the retention of the values of the fields in the reactivated variant. Even though the values are retained, the variant-part state is still totally-undefined, as you say.
-- Dave