Waldek Hebisch wrote:
Frank Heckenbach wrote:
Yes (both frontend and backend version). Much more in GPI files depends on endianness (besides checksums). We could detect and convert at runtime, but of course, it would slow things down even more (which I'm sure you wouldn't like too much, Adriaan). The only benefit would be to people who cross-compile *and* can't recompile on each host system for some strange reason ...
Well, I believe that we can make the GPI reader/writer both faster and more portable. For example, we could bulk convert integers between endiannesses, so that only folks who need compatibility will pay for it. But ATM I am looking for low-hanging fruit ...
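Roughly what I have in mind (only a sketch -- the 32-bit width and the function name are made up for illustration, not GPC's actual types):

  #include <stdint.h>
  #include <stddef.h>

  /* Sketch only: bulk byte-swap an array of 32-bit integers in
     place.  Called only when the GPI file's stored endianness
     differs from the host's, so native readers pay nothing.  */
  static void
  swap_gpi_ints (uint32_t *buf, size_t count)
  {
    size_t i;
    for (i = 0; i < count; i++)
      {
        uint32_t v = buf[i];
        buf[i] = (v >> 24) | ((v >> 8) & 0x0000ff00u)
                 | ((v << 8) & 0x00ff0000u) | (v << 24);
      }
  }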
Yes, it's not so trivial, as the nodes don't consist only of integers, but also of bytes/bit fields and strings. So at least some parsing effort would be required, comparable in size to store_node_fields and load_node, and it would have to be kept in sync with them (=> more maintenance effort for future changes).
As far as I'm concerned, feel free to change the checksums. I don't insist on the current algorithm (I think the comment in module.c doesn't really suggest I do ;-). But I wouldn't like to abandon checksums. AFAICS, they do catch some cases which would otherwise lead to obscure bugs. Perhaps GP will avoid such cases in the future (when all GP bugs are fixed :-), but even though --automake will then be phased out, such problems can still arise with hand-made make rules (which will probably always be used).
IIUC the most likely bug avoided due to checksums is reading an inconsistent GPI file. That can be detected by putting a random number (stamp) in the GPI header and checking that at the end of reading (or when the reader finds something wrong) the stamp is still the same.
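Something like this (a rough sketch; the stamp offset and the names are invented for illustration):

  #include <stdio.h>
  #include <stdint.h>

  /* Sketch: re-read the stamp at the end of reading and compare it
     with the one seen when the file was opened.  If another process
     rewrote the GPI file in the meantime, the (random) stamp will
     have changed.  The offset 8 is invented for illustration.  */
  static int
  gpi_stamp_unchanged (FILE *f, uint32_t stamp_at_start)
  {
    uint32_t stamp_now;
    if (fseek (f, 8L, SEEK_SET) != 0
        || fread (&stamp_now, sizeof stamp_now, 1, f) != 1)
      return 0;
    return stamp_now == stamp_at_start;
  }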
This wouldn't protect against corruption of the file's contents (which could happen on bad media, after system crashes, etc.). You must be thinking of an entirely different problem if a number in the header can change while the file is being read, IIUYC!? Are you thinking of two simultaneous processes writing to the same file or something like that?
But anyway, there are several uses of checksums, see the notes in internals.texi. Perhaps we've been talking about different things all the time. Protection against inconsistent GPI files is just one of them, and IMHO the least important. The more important one is to protect against inconsistent imports, including indirect imports. That's a problem that would lead to strange and very hard-to-trace bugs. A perfectly working GP should be able to prevent this situation from happening. GPC's automake cannot always do this with indirect imports. Hand-written make rules may or may not do this, depending on how they're written and how complex the project is. Therefore I think we should keep the check in GPC to reject such invalid imports. (Of course, checksums may not be the only way to achieve this.)
Still, a simple checksum should take less than 1% of compile time, so we can live with it (BTW, backend folks are willing to risk serious bugs to get a 1% speed improvement of the compiler).
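For example, something as cheap as this would do (an illustration only, not a proposal for the exact algorithm; the name is invented):

  #include <stddef.h>
  #include <stdint.h>

  /* A deliberately cheap checksum: one rotate and one addition per
     word.  Much weaker than a CRC, but enough to catch the kind of
     accidental corruption discussed here, and nearly free.  */
  static uint32_t
  cheap_checksum (const uint32_t *buf, size_t count)
  {
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < count; i++)
      sum = ((sum << 1) | (sum >> 31)) + buf[i];
    return sum;
  }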
Didn't I ever mention that I don't always agree with all design decisions of backend folks ...? IMHO, this is a particularly bad example (but perhaps benchmark-obsessed people force them to). To me the 1980s are over. (But according to the theory of relativity, to some people they may not. ;-)
Concerning GP versus `--automake': I think we need to fix the main automake problems if we want GP to work well. Basically:
Just to clarify what we're talking about:
- If you mean by "automake problems" problems with the current `--automake' implementation, I can't see why we need to fix them in order to make GP work well, as GP is there to replace automake.
- Main automake problems to me are difficulties handling indirect recompilation needs. -- That's obvious from the way of doing things: Automake only has a local view, so the best it can do is try to rescue things in the last minute, i.e. recompiling other modules when reading their GPI files was already started. Add indirect requirements and cyclic dependencies to it, and you get all the problems we have with automake. Whereas GP (just like make) has a global view and can do things in the right order from the beginning.
- to have a separate GPI file for the implementation (so that the interface GPI stays consistent during compilation)
This would enable a `-j' option to compile the implementation of A while another process compiles a B that uses A. IMHO, that would be a minor optimization; otherwise I can't see it as a big problem.
- compile interface and implementation separately (even if in the same file)
Now referring to your 1% above ... ;-) I think this adds to compilation time (e.g., by having to load all imported GPIs twice), and I'd guess it would be a little more than 1%. Therefore I'd prefer (and that's what GP does) to do so only when needed, i.e. with cyclic imports, and not in the normal case.
- store dependencies in GPI files
I deliberately avoided this in GP, to make GP independent of GPC's internals. If you want to change this, please at least put it on a high enough level, so GP won't have to know too much about GPC's internals (i.e., only the outermost layer of the GPI file format, presumably).
For `--automake' specifically:
- allow only _one_ compilation of a given part; anything more _is_ a cyclic dependency, hence illegal (the current way is just a work-around for the lack of 2)
All of --automake is just a work-around for a (previously) missing GP, so my answer is just to drop --automake. (Until we had GP, this was a compromise, as some people needed cyclic dependencies (or claimed to ;-), and could only use --automake. We couldn't do the whole thing in --automake, but at least make some common cases work most of the time.)
For make (and possibly GP):
- have an option to print _all_ dependencies (includes + imports)
It might have some advantages, but also drawbacks. In particular, AFAICS, we'd need a do-nothing parse run in GPC, i.e., basically a nop-flag check in every nontrivial action.
OTOH, GP could have its own parser, initially copied from GPC's parser, but stripped of anything non-import related (which is a lot). Of course, the current Q&D parser in GP is not the final word, but I'd still tend to favour a lean parser in GP, unless you know a way to do it in GPC without rather pervasive changes in the parser.
The GP parser as I envision it would use the same lexer as GPC (the lexer doesn't do serious things in its actions anyway, so the problems I mentioned for the parser wouldn't apply), and the preprocessor (according to my plans) would be integrated with the lexer. So then GP could even extract include information directly from its integrated preprocessor. This would actually save one process-call overhead per module for GP, and may therefore be worthwhile in itself. (You may notice that we'd then have two programs that contain the same preprocessor and lexer code, but since it's the same source code, I wouldn't mind this.)
Sure, this doesn't cater for semi-automatic generation of Makefiles. But quite frankly, I don't see this as a main requirement we have to solve. (Actually, most projects have a common formatting, so a simple sed script will do the trick.) But perhaps we could even let GP take over this part, by adding a --print-dependencies option to it. This would rid GPC of another side-feature, which is probably not a bad idea either, looking at its complexity ...
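The output stage of such an option could be as simple as this (a sketch; collecting the unit names via GP's parser is the real work and is not shown, and all names here are invented):

  #include <stdio.h>

  /* Emit one make-style rule for a unit and its imports.  */
  static void
  print_dependencies (const char *unit, const char *const *imports,
                      int import_count)
  {
    int i;
    printf ("%s.o:", unit);
    for (i = 0; i < import_count; i++)
      printf (" %s.gpi", imports[i]);
    printf ("\n");
  }

Called with unit "foo" and imports "bar" and "baz", this would print `foo.o: bar.gpi baz.gpi'.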
I'm not going into technical details of my plans for the integrated preprocessor here, so that this mail won't become too long. We can discuss them separately if wanted.
Also, please note that the buffer passed to compute_checksum is currently not aligned, so larger-than-byte operations will be slow on some processors and invalid on the rest. But perhaps we should make it aligned. A consistent solution would be to round up each chunk (and the header) in a GPI file to a multiple of the "word" size. This would add some code and a few bytes in GPI files, but it's probably worthwhile.
I will have to check. My impression was that we just use the address we got from xmalloc, so it should be OK. Aligning data in GPI files may allow full-word access in the GPI reader, so it may be worthwhile.
There are two calls:
checksum = compute_checksum (wb.outbuf, wb.outbufcount);
This one should be OK, as you say.
if (compute_checksum (mptr (gpi_file, start_of_nodes), ...
This one depends on start_of_nodes, i.e. the alignment of a chunk in a GPI file.
But compute_checksum works on the whole buffer, so for it we only need a few bytes of zero padding at the end.
s/buffer/chunk/, yes. So we only need to pad and align chunks (alignment follows from padding when done for all chunks), which really isn't so bad (basically negligible) WRT file size.
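E.g. along these lines (a sketch; the word size and the names are placeholders):

  #include <stddef.h>
  #include <string.h>

  #define GPI_WORD 4  /* assumed "word" size; a placeholder */

  /* Round a chunk size up to a multiple of the word size and fill
     the gap with zero bytes, so every chunk starts aligned and the
     checksum can run full words over it.  Returns the padded size;
     the buffer must have room for the padding.  */
  static size_t
  pad_chunk (char *buf, size_t size)
  {
    size_t padded = (size + GPI_WORD - 1) & ~(size_t) (GPI_WORD - 1);
    memset (buf + size, 0, padded - size);
    return padded;
  }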
Yes, we should store the size of gpi_int. But I think that gpidump can easily use this information at runtime to read "any" GPI file (though we should keep the version check in place). Of course it costs time, but IMHO gpidump is not speed critical.
OK, might be better. I wasn't concerned so much about running time as about the effort required, but perhaps it's not so bad after all. If we use LongestCard internally, and do all size/endianness conversions on input, it might be a rather localized change.
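Something like this sketch (gpi_int_size and the endianness flag would come from the GPI header; uint64_t stands in for LongestCard):

  #include <stdio.h>
  #include <stdint.h>

  /* Sketch: read one integer of the size recorded in the GPI
     header and widen it to the largest type, converting endianness
     on input.  After this point gpidump can work with a single
     type, whatever the file was written with.  Returns 0 on
     failure.  */
  static int
  read_gpi_int (FILE *f, int gpi_int_size, int big_endian,
                uint64_t *result)
  {
    unsigned char bytes[8];
    uint64_t v = 0;
    int i;
    if (gpi_int_size < 1 || gpi_int_size > 8
        || fread (bytes, 1, gpi_int_size, f) != (size_t) gpi_int_size)
      return 0;
    for (i = 0; i < gpi_int_size; i++)
      v |= (uint64_t) bytes[big_endian ? i : gpi_int_size - 1 - i]
           << (8 * (gpi_int_size - 1 - i));
    *result = v;
    return 1;
  }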
Frank