the magics of eof()

List overview All Threads
Download

newer

older

Re: Cannot compile gpc-20041218 on...

Francesco Bonomi

17 Dec 2004 17 Dec '04

6:30 p.m.

Dear friends,

I don't know if what I found is a bug or a correct behaviour I don't understand.

The following program halts with error 'attempt ro read past end of file';

program test; var f : file of char; begin rewrite(f,'test.dat'); seek(f,0); f^:='x'; (*GIVES ERROR HERE*) put(f); close(f); end.

If I just insert a call reading eof(f) everything seems to work:

program test; var f : file of char; begin rewrite(f,'test.dat'); seek(f,0); if eof(f) then ; (* do nothing *) f^:='x'; put(f); close(f); end.

I am puzzled!

(working with gpc 20030830)

Ciao, Francesco

Show replies by date

CBFalconer

17 Dec 17 Dec

8:37 p.m.

Francesco Bonomi wrote:

...

I don't know if what I found is a bug or a correct behaviour I don't understand.

The following program halts with error 'attempt ro read past end of file';

program test; var f : file of char; begin rewrite(f,'test.dat'); seek(f,0); f^:='x'; (*GIVES ERROR HERE*) put(f); close(f); end.

Read the definitions of file state in the standard. As far as ISO 7185 is concerned, a seek is an extension. At any rate, rewrite has created a new file, positioned it at eof, and no reading is possible. The seek has probably created an anomalous condition in which reading is actually possible, and using lazy i/o has marked the file as having a get pending. Use of f^ says that that get must be implemented, but the file is at eof and cannot do so.

...

If I just insert a call reading eof(f) everything seems to work:

"seems" is the appropriate word. Read the ISO10206 standard, which adds extensions allowing you to have this sort of random access. At any rate, reading from an empty file should still trigger an error.

-- Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net) Available for consulting/temporary embedded and systems. http://cbfalconer.home.att.net USE worldnet address!

Waldek Hebisch

30 Jan 30 Jan

3:51 a.m.

Francesco Bonomi wrote:

...

Dear friends,

I don't know if what I found is a bug or a correct behaviour I don't understand.

The following program halts with error 'attempt ro read past end of file';

program test; var f : file of char; begin rewrite(f,'test.dat'); seek(f,0); f^:='x'; (*GIVES ERROR HERE*) put(f); close(f); end.

I consider it as a bug. The following patch corrects the problem:

--- rts/files.pas.bb 2005-01-30 00:55:45.498296856 +0100 +++ rts/files.pas 2005-01-30 03:40:10.262624728 +0100 @@ -1810,7 +1813,6 @@ if IsReadable (f) and not f^.Status.EOF then begin GetN (f); - CheckReadableNotEOF (f) end else { Buffer cannot be read in. But perhaps someone only wants to

After the patch is applied `fjf490.pas' fails. However, I think that `fjf490.pas' is wrong. At least in standard mode accessing buffer variable when at the end of file is _not_ an error: the postcondition of `get' says that at the end of file value of the file buffer is undefined, and I see no restriction on access to undefined variables.

So, IMHO the following program is correct (and works with the patch):

program fjf490b(Output);

var t: Text; c: Char;

begin Rewrite (t); Reset (t); c := t^; writeln('OK') end.

-- Waldek Hebisch hebisch@math.uni.wroc.pl

CBFalconer

12:06 p.m.

Waldek Hebisch wrote:

...

... snip ...

...

So, IMHO the following program is correct (and works with the patch):

program fjf490b(Output);

var t: Text; c: Char;

begin Rewrite (t); Reset (t);

I disagree. At this point t is open, and using lazy i/o a get is pending.

...

c := t^;

To implement this the system should perform the actual get. The get fails, because the file is at EOF, signalling "attempt to read past EOF". It has nothing to do with the definedness of t^. Similarly eoln(t) should fail.

...

writeln('OK')

So this should be "writeln('Failed');

...

end.

Accessing f^, or executing eoln or any read on t, should call something that checks whether a get is pending, and executes it if so. This is a library issue. The compiler issure is to make sure that the f^ action calls the routine, which should return the pointer. You cannot have lazy i/o and just access the file pointer as another address.

-- "If you want to post a followup via groups.google.com, don't use the broken "Reply" link at the bottom of the article. Click on "show options" at the top of the article, then click on the "Reply" at the bottom of the article headers." - Keith Thompson

Waldek Hebisch

5:18 p.m.

CBFalconer wrote:

...

Waldek Hebisch wrote:

...
... snip ...

...
So, IMHO the following program is correct (and works with the patch):

program fjf490b(Output);

var t: Text; c: Char;

begin Rewrite (t); Reset (t);

I disagree. At this point t is open, and using lazy i/o a get is pending.

You mix the standard with implementation details. Standard suggests, but does not requre lazy i/o. Basic criterion for correctness of lazy i/o is that programs working without lazy i/o should still work. So, from this point `t^' should be available to the programmer (read the postcondition of `Reset'). The standard says that content of `t^' is undefined (since t is at eof), which in plain language means that the implementation should put some garbage into it. AFAICS if you do not want to look at the garbage, you need explicitly test for eof.

...

...
c := t^;

To implement this the system should perform the actual get. The get fails, because the file is at EOF, signalling "attempt to read past EOF". It has nothing to do with the definedness of t^. Similarly eoln(t) should fail.

Note that accessing `t^' is logically quite different from `get'. Remember that `t^' references a record _after_ current file position. At this point `get' would attempt to move file position past the end of file and access _second_ record past the end of file, which indeed is an error. `eoln' is different.

...

Accessing f^, or executing eoln or any read on t, should call something that checks whether a get is pending, and executes it if so. This is a library issue. The compiler issure is to make sure that the f^ action calls the routine, which should return the pointer. You cannot have lazy i/o and just access the file pointer as another address.

All the magic has to be in the compiler and the runtime. The user can treat the file buffer almost like another variable.

-- Waldek Hebisch hebisch@math.uni.wroc.pl

CBFalconer

6:12 p.m.

Waldek Hebisch wrote:

...

CBFalconer wrote:

...
Waldek Hebisch wrote:

...
... snip ...

...
So, IMHO the following program is correct (and works with the patch):

program fjf490b(Output);

var t: Text; c: Char;

begin Rewrite (t); Reset (t);

I disagree. At this point t is open, and using lazy i/o a get is pending.

You mix the standard with implementation details. Standard suggests, but does not requre lazy i/o. Basic criterion for correctness of lazy i/o is that programs working without lazy i/o should still work. So, from this point `t^' should be available to the programmer (read the postcondition of `Reset'). The standard says that content of `t^' is undefined (since t is at eof), which in plain language means that the implementation should put some garbage into it. AFAICS if you do not want to look at the garbage, you need explicitly test for eof.

...
...
c := t^;

To implement this the system should perform the actual get. The get fails, because the file is at EOF, signalling "attempt to read past EOF". It has nothing to do with the definedness of t^. Similarly eoln(t) should fail.

Note that accessing `t^' is logically quite different from `get'. Remember that `t^' references a record _after_ current file position. At this point `get' would attempt to move file position past the end of file and access _second_ record past the end of file, which indeed is an error. `eoln' is different.

Without lazyio the access to t^ should check for eof, and error out if it is set. Again, the error is "read past eof". That way there is no need for a unique invalid value in whatever actually holds the value f^. This keeps both methods agreeing with standards.

...

...
Accessing f^, or executing eoln or any read on t, should call something that checks whether a get is pending, and executes it if so. This is a library issue. The compiler issure is to make sure that the f^ action calls the routine, which should return the pointer. You cannot have lazy i/o and just access the file pointer as another address.

All the magic has to be in the compiler and the runtime. The user can treat the file buffer almost like another variable.

I don't think we really disagree. The key point is that the access to f^ has to be protected by checking EOF. BTW, eof also has to check for a pending get, and this gets tricky because it musn't execute such if it is already set. Auto final eoln insertion can be done by saying that if eof is true then eoln is true. The point is to avoid any logical differences between a lazyio system and a pure batch system.

Tha anomaly arrives when the user checks eof or eoln, and these can unexpectedly force an actual get. So a loop of the form:

WHILE NOT eoln DO BEGIN prompt('with something:'); read(values); IF eoln readln; END;

is likely to do its prompting at the wrong moment because the final get in the expansion of readln has set 'getpending' The cure is to leave the line incompletely read within the loop:

WHILE NOT eof DO BEGIN prompt('with something); WHILE NOT eoln DO BEGIN read(values); process(values); END; linendingstuff; readln; END;

but this is the sort of i/o considerations that seem to drive the C programmers out of their skulls.

Frank Heckenbach

6 Feb 6 Feb

2:31 p.m.

Waldek Hebisch wrote:

...

After the patch is applied `fjf490.pas' fails. However, I think that `fjf490.pas' is wrong. At least in standard mode accessing buffer variable when at the end of file is _not_ an error: the postcondition of `get' says that at the end of file value of the file buffer is undefined, and I see no restriction on access to undefined variables.

What about this (6.8.1: Expressions - General):

: When a primary is used, it shall be an error if the variable : denoted by a variableÂaccess of the primary is undefined. : : primary > variableÂaccess

Of course, we could say that we currently don't check for undefined access in general (as a processor is permitted to leave errors undetected), so why should we bother here?

Frank

-- Frank Heckenbach, frank@g-n-u.de, http://fjf.gnu.de/, 7977168E GPC To-Do list, latest features, fixed bugs: http://www.gnu-pascal.de/todo.html GPC download signing key: ACB3 79B2 7EB2 B7A7 EFDE D101 CD02 4C9D 0FE0 E5E8

Waldek Hebisch

11 Feb 11 Feb

4:19 a.m.

Frank Heckenbach wrote:

...

Waldek Hebisch wrote:

...
After the patch is applied `fjf490.pas' fails. However, I think that `fjf490.pas' is wrong. At least in standard mode accessing buffer variable when at the end of file is _not_ an error: the postcondition of `get' says that at the end of file value of the file buffer is undefined, and I see no restriction on access to undefined variables.

What about this (6.8.1: Expressions - General):

: When a primary is used, it shall be an error if the variable : denoted by a variable?access of the primary is undefined. : : primary > variable?access

Yes, we may raise error in `fjf490.pas'.

...

Of course, we could say that we currently don't check for undefined access in general (as a processor is permitted to leave errors undetected), so why should we bother here?

Worse, ATM we fail correct programs (the original report was about a program _writing_ to the buffer variable). AFAICS to properly distinguish read form writes we have to do that in the core compiler. It seems that we could do that using apropriate tree code and delaying expansion of the reference. The question is how much do we care.

-- Waldek Hebisch hebisch@math.uni.wroc.pl

Frank Heckenbach

13 Feb 13 Feb

5:34 p.m.

Waldek Hebisch wrote:

...

Frank Heckenbach wrote:

...
Waldek Hebisch wrote:

...
After the patch is applied `fjf490.pas' fails. However, I think that `fjf490.pas' is wrong. At least in standard mode accessing buffer variable when at the end of file is _not_ an error: the postcondition of `get' says that at the end of file value of the file buffer is undefined, and I see no restriction on access to undefined variables.

What about this (6.8.1: Expressions - General):

: When a primary is used, it shall be an error if the variable : denoted by a variable?access of the primary is undefined. : : primary > variable?access

Yes, we may raise error in `fjf490.pas'.

...
Of course, we could say that we currently don't check for undefined access in general (as a processor is permitted to leave errors undetected), so why should we bother here?

Worse, ATM we fail correct programs (the original report was about a program _writing_ to the buffer variable). AFAICS to properly distinguish read form writes we have to do that in the core compiler. It seems that we could do that using apropriate tree code and delaying expansion of the reference.

Either that (though it may be impossible to check at compile-time when used as a `var' parameter), or do it at runtime by marking the buffer undefined and give an error on reading. For general undefined-checks (with normal variables) we'd probably have to do this anyway (which seems to be a huge effort, both in the compiler and at runtime, nothing I plan to do anytime soon).

...

The question is how much do we care.

Probably not too much. So I'm applying your change and changing fjf490.pas, but adding a comment there (for possibly future changes in this area).

Frank

7442

Age (days ago)

7500

Last active (days ago)

gpc@gnu.de

8 comments

4 participants

tags (0)

participants (4)

CBFalconer
Francesco Bonomi
Frank Heckenbach
Waldek Hebisch