Hi,
My program, Don Knuth's TeX adapted to GPC version 2.1, does heavy I/O on text files. I noticed that a very high percentage of its running time is system time. I believe this is due to lazy I/O, which seems to be unbuffered.
GPC should apply lazy I/O when the text file comes from the keyboard, but use buffered I/O when it comes from the disk.
Is there a way to turn off lazy I/O on nonterminal text files?
Greetings,
Wolfgang Helbig
Wolfgang Helbig wrote:
> Is there a way to turn off lazy I/O on nonterminal text files?
GPC uses buffers on reading, but only reads as much data as is available at any time. E.g., with a default buffer size of $4000 (16 KiB), it tries to read that many bytes when it needs more input, but if the underlying read() system call returns fewer (e.g., because input from a terminal, pipe, socket or other device has fewer bytes available), it doesn't call read() again (to fill the buffer completely) until more input is actually required. So there shouldn't be a problem here (i.e., disk input should be fast because the buffer is usually filled completely, while terminal input doesn't block unnecessarily).
On writing, GPC doesn't buffer at all; it's not implemented yet. It's not that it couldn't be done, but it's trickier, because GPC supports various ways of seeking, pre-reading, getting the file position etc., all of which would have to take the buffers into account.
If it's a serious problem you could kludge it by installing a user-defined file write routine which can do the buffering. If you know that your application only does sequential writes, this would work. I've done this once (see RewriteBuffer in cgi.pas in http://fjf.gnu.de/misc/cgiprogs.tar.bz2). Let me know if you need more details.
Frank
On Thu, Jul 03, 2008 at 06:22:33PM +0200, Frank Heckenbach wrote:
> On writing, GPC doesn't buffer yet at all; it's not implemented yet. It's not that it couldn't be done, but it's more tricky, because GPC supports various ways of seeking, pre-reading, getting the file position etc., which all would have to take account of the buffers.
The bottleneck in TeX's I/O is the output to the dvi file, which is done byte by byte (there's no other choice in standard Pascal). Knuth himself notes that this is inefficient and expects the porter to optimize it:
---- tex.web -----
@ The actual output of |dvi_buf[a..b]| to |dvi_file| is performed
by calling |write_dvi(a,b)|. For best results, this procedure should be
optimized to run as fast as possible on each particular system, since
it is part of \TeX's inner loop. It is safe to assume that |a| and |b+1|
will both be multiples of 4 when |write_dvi(a,b)| is called; therefore
it is possible on many machines to use efficient methods to pack four bytes
per word and to output an array of words with one system call.
@^system dependencies@>
@^inner loop@>
@^defecation@>

@p procedure write_dvi(@!a,@!b:dvi_index);
var k:dvi_index;
begin for k:=a to b do write(dvi_file,dvi_buf[k]);
end;
---------------
In the case of GPC, it is trivial to replace the write_dvi procedure with a single call to BlockWrite. This makes a huge difference in the running time.
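A minimal sketch of such a replacement, not the actual patch. It assumes that the DVI file is declared as an untyped file opened with a block size of 1 (GPC's Borland-style Assign/Rewrite/BlockWrite extensions), and that dvi_buf is a plain array of Byte so that its elements are contiguous bytes; tex.web itself declares a typed byte file and a packed array of eight_bits, so a real port may need to adjust these declarations. The file name and the test data are hypothetical.

```pascal
program WriteDviSketch;
{ Sketch only: relies on GPC's Borland-style untyped files and BlockWrite. }

const
  dvi_buf_size = 800;  { illustrative value for the output buffer size }

type
  dvi_index = 0..dvi_buf_size;

var
  { Assumption: untyped file with a block size of one byte,
    not the typed byte file that tex.web declares. }
  dvi_file: file;
  { Assumption: plain array of Byte, so that elements a..b are
    contiguous bytes and dvi_buf[a] can be passed by reference. }
  dvi_buf: array [dvi_index] of Byte;
  k: dvi_index;

{ Writes dvi_buf[a..b] with one BlockWrite call
  instead of b-a+1 separate calls to write. }
procedure write_dvi(a, b: dvi_index);
begin
  if b >= a then
    BlockWrite(dvi_file, dvi_buf[a], b - a + 1);
end;

begin
  Assign(dvi_file, 'sketch.dvi');  { hypothetical output name }
  Rewrite(dvi_file, 1);            { block size 1: counts are in bytes }
  for k := 0 to dvi_buf_size do
    dvi_buf[k] := k mod 256;       { dummy data for the demonstration }
  write_dvi(0, dvi_buf_size - 1);
  Close(dvi_file);
end.
```

With this shape, each call to write_dvi hands the whole range to the run-time system at once, which matters because GPC does not buffer output itself.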
Emil Jerabek