turn off lazyIO on disk files


Wolfgang Helbig

6 May 2008, 1:13 p.m.

Hi,

The program, Don Knuth's TeX adapted to GPC version 2.1, does heavy I/O on text files. I noticed a very high percentage of system time. I believe this is due to lazy I/O, which seems to be unbuffered.

GPC should apply lazy I/O when a text file comes from the keyboard, but use buffered I/O when it comes from disk.

Is there a way to turn off lazy I/O on non-terminal text files?

Greetings,

Wolfgang Helbig


Frank Heckenbach

3 Jul 2008, 6:22 p.m.

Wolfgang Helbig wrote:

...


GPC uses buffers on reading, but only reads as much data as is available at any time. E.g., with the default buffer size of $4000 (16384 bytes), it tries to read that many bytes when it needs more input. If the underlying read() system call returns fewer (e.g., because input from a terminal, pipe, socket or other device has fewer bytes available), it doesn't call read() again to fill the buffer completely until more input is actually required. So there shouldn't be a problem here: disk input should be fast because the buffer is usually filled completely, while terminal input doesn't block unnecessarily.
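The following minimal C sketch makes the read-side behaviour concrete. It assumes a POSIX system and shows a buffer that is refilled with a single read() call, and only when it runs empty. It is not GPC's actual runtime code. The names inbuf, inbuf_init and inbuf_getc are hypothetical, and IN_BUF_SIZE simply mirrors the $4000 default mentioned above.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define IN_BUF_SIZE 0x4000  /* 16384 bytes, like GPC's default of $4000 */

/* Hypothetical input buffer; not GPC's actual runtime structure. */
struct inbuf {
    int fd;                          /* underlying file descriptor */
    unsigned char data[IN_BUF_SIZE];
    size_t pos;                      /* index of the next byte to return */
    size_t len;                      /* number of valid bytes in data */
};

void inbuf_init(struct inbuf *b, int fd)
{
    b->fd = fd;
    b->pos = 0;
    b->len = 0;
}

/* Return the next byte, or -1 at end of input or on error.
   The buffer is refilled only once it is empty, and with a single
   read() call: a disk file usually fills it completely, while a
   terminal or pipe returns only what is available right now. */
int inbuf_getc(struct inbuf *b)
{
    if (b->pos == b->len) {
        ssize_t n = read(b->fd, b->data, IN_BUF_SIZE);
        if (n <= 0)          /* EINTR is not retried in this sketch */
            return -1;
        b->pos = 0;
        b->len = (size_t)n;
    }
    return b->data[b->pos++];
}
```

For a disk file, each refill normally returns a full buffer, so there is roughly one system call per 16384 bytes. For a pipe or terminal, read() hands back whatever is there, and the caller gets it without waiting for the buffer to fill.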

On writing, GPC doesn't buffer at all yet; it's simply not implemented. It's not that it couldn't be done, but it's trickier, because GPC supports various ways of seeking, pre-reading, getting the file position etc., which would all have to take the buffers into account.

If it's a serious problem, you could kludge it by installing a user-defined file write routine that does the buffering. This works as long as your application only writes sequentially. I've done this once (see RewriteBuffer in cgi.pas in http://fjf.gnu.de/misc/cgiprogs.tar.bz2). Let me know if you need more details.
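The kludge described above amounts to a write routine that collects bytes in memory and hands them to the system in large chunks. Below is a minimal C sketch of that idea. It assumes a POSIX system and strictly sequential output. The type outbuf and its functions are hypothetical names; they are neither the RewriteBuffer routine from cgi.pas nor a GPC API. The seek wrapper illustrates why seeking and file-position operations complicate a general implementation: each one has to flush, or otherwise account for, pending data first.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define OUT_BUF_SIZE 0x4000  /* 16384 bytes, chosen to match the read side */

/* Hypothetical output buffer for strictly sequential writes. */
struct outbuf {
    int fd;                          /* underlying file descriptor */
    unsigned char data[OUT_BUF_SIZE];
    size_t len;                      /* bytes waiting in data */
    unsigned long syscalls;          /* write() calls issued, for illustration */
};

void outbuf_init(struct outbuf *b, int fd)
{
    b->fd = fd;
    b->len = 0;
    b->syscalls = 0;
}

/* Hand all pending bytes to the system; returns 0 on success, -1 on error. */
int outbuf_flush(struct outbuf *b)
{
    size_t done = 0;
    while (done < b->len) {
        ssize_t n = write(b->fd, b->data + done, b->len - done);
        b->syscalls++;
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    b->len = 0;
    return 0;
}

/* Append one byte, flushing first if the buffer is full. */
int outbuf_putc(struct outbuf *b, unsigned char c)
{
    if (b->len == OUT_BUF_SIZE && outbuf_flush(b) != 0)
        return -1;
    b->data[b->len++] = c;
    return 0;
}

/* Seeking must flush first; otherwise the pending bytes would later be
   written at the new position instead of the old one. */
off_t outbuf_seek(struct outbuf *b, off_t offset, int whence)
{
    if (outbuf_flush(b) != 0)
        return (off_t)-1;
    return lseek(b->fd, offset, whence);
}
```

A program using this has to call outbuf_flush before closing the descriptor. Otherwise up to OUT_BUF_SIZE bytes are silently lost.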

Frank

--
Frank Heckenbach, f.heckenbach@fh-soft.de, http://fjf.gnu.de/, 7977168E
GPC To-Do list, latest features, fixed bugs: http://www.gnu-pascal.de/todo.html
GPC download signing key: ACB3 79B2 7EB2 B7A7 EFDE D101 CD02 4C9D 0FE0 E5E8

Emil Jerabek

4 Jul 2008, 12:02 p.m.

On Thu, Jul 03, 2008 at 06:22:33PM +0200, Frank Heckenbach wrote:

...


The bottleneck in TeX's I/O is the output to the DVI file, which is done byte by byte (there's no other choice in standard Pascal). Knuth himself notes that this is inefficient; the porter is expected to optimize it:

---- tex.web -----
@ The actual output of |dvi_buf[a..b]| to |dvi_file| is performed by calling
|write_dvi(a,b)|. For best results, this procedure should be optimized to
run as fast as possible on each particular system, since it is part of
\TeX's inner loop. It is safe to assume that |a| and |b+1| will both be
multiples of 4 when |write_dvi(a,b)| is called; therefore it is possible on
many machines to use efficient methods to pack four bytes per word and to
output an array of words with one system call.
@^system dependencies@>
@^inner loop@>
@^defecation@>

@p procedure write_dvi(@!a,@!b:dvi_index);
var k:dvi_index;
begin for k:=a to b do write(dvi_file,dvi_buf[k]);
end;
---------------

In the case of GPC, it is trivial to replace the write_dvi procedure with a single call to BlockWrite. This makes a huge difference in the running time.
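In C terms, the difference is between one write() per byte and one call for the whole range. The functions below are hypothetical C analogues of write_dvi, not GPC's BlockWrite itself. They assume a POSIX system and an unbuffered file descriptor, which matches Frank's description of GPC's current output path. Each version counts the system calls it makes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical C analogues of TeX's write_dvi(a, b), which outputs
   dvi_buf[a..b] inclusive. Both assume a <= b, as TeX guarantees.
   Each returns the number of write() calls made, or -1 on error. */

/* Mirrors the tex.web loop on an unbuffered file: b - a + 1 calls. */
long write_dvi_bytewise(int fd, const unsigned char *dvi_buf, size_t a, size_t b)
{
    long calls = 0;
    for (size_t k = a; k <= b; k++) {
        if (write(fd, &dvi_buf[k], 1) != 1)
            return -1;
        calls++;
    }
    return calls;
}

/* Block version in the spirit of a single BlockWrite: the whole range
   normally goes out in one call; the loop only handles short writes. */
long write_dvi_block(int fd, const unsigned char *dvi_buf, size_t a, size_t b)
{
    size_t count = b - a + 1, done = 0;
    long calls = 0;
    while (done < count) {
        ssize_t n = write(fd, dvi_buf + a + done, count - done);
        if (n <= 0)
            return -1;
        calls++;
        done += (size_t)n;
    }
    return calls;
}
```

With unbuffered output, the block version replaces b - a + 1 system calls with normally one. That is presumably where most of the running-time gain comes from. Neither version needs the multiple-of-4 guarantee, which only matters for the word-packing trick Knuth mentions.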

Emil Jerabek
