One thing that stood out in the profile was that compute_checksum, a two-line function called once for each loaded GPI, is responsible for 12% of the total time. It scans through the entire GPI's bytes (in my case, including my 27 MB GPCMacOSAll gpi once for each of the 250 units in my project), which adds up to around 43 seconds of the 6-minute rebuild time.
static gpi_int
compute_checksum_original (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;
  for (n = 0; n < size; n++)
    sum += n * buf[n];
  return sum;
}
It appears that gpi_int is a 64-bit int on my system.
I did some timing of some improvements:
time= 42.1  sum= 46473731586140096   compute_checksum_original
time= 35.0  sum= 46473731586140096   compute_checksum_unrolled
time= 18.8  sum= 8551919141529536    compute_checksum_native
time= 12.2  sum= -2963523148067389   compute_checksum_shift
time=  7.8  sum= -852473248          compute_checksum_add
(time is roughly the number of seconds gpc spends just calling compute_checksum on GPCMacOSAll.gpi during my complete rebuild).
The unrolled version is pretty trivial, chops about 7 seconds off and has no consequences.
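The idea is just to handle several bytes per loop iteration; roughly like this (a sketch of the idea only -- the attached testchecksum.c has the routine I actually timed):

static gpi_int
compute_checksum_unrolled_sketch (unsigned char *buf, gpi_int size)
{
  /* Same sum as the original, but four bytes handled per iteration.  */
  gpi_int sum = 0, n = 0;
  for (; n + 4 <= size; n += 4)
    sum += n * buf[n] + (n + 1) * buf[n + 1]
         + (n + 2) * buf[n + 2] + (n + 3) * buf[n + 3];
  for (; n < size; n++)   /* remaining 0..3 bytes */
    sum += n * buf[n];
  return sum;
}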
Switching to a native int instead of gpi_int affects the value of the checksum, reducing its precision somewhat, but given the purpose of the checksum, I don't think that matters in practice. This drops the time by a further 16 seconds. The downside is that it changes the existing gpi checksum, meaning all gpis would have to be recompiled.
Switching to a shift instead of a multiply drops the time a further 6 seconds. However, the PowerPC is very good at shifts (and quite good at multiplies, for that matter), so this should be tested at least on an Intel system to see how it compares.
Switching to just basic addition drops the time a further 4 seconds, but at the loss of quite a lot of bits in the checksum.
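Roughly, the ideas behind these three variants are as follows (simplified sketches with illustrative names -- see the attached testchecksum.c for the routines actually timed; gpi_int is assumed to be the 64-bit type from above):

/* "native": keep the 64-bit sum but do the per-byte index multiply at the
   native register width, where it may wrap; that is what changes the value. */
static gpi_int
compute_checksum_native_sketch (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0;
  unsigned int n;                        /* native-width index */
  for (n = 0; n < (unsigned int) size; n++)
    sum += n * buf[n];                   /* product wraps at native width */
  return sum;
}

/* "shift": replace the multiply by a shift of the accumulator.  */
static gpi_int
compute_checksum_shift_sketch (unsigned char *buf, gpi_int size)
{
  unsigned long long sum = 0;            /* unsigned keeps the shift well defined */
  gpi_int n;
  for (n = 0; n < size; n++)
    sum = (sum << 1) + buf[n];
  return (gpi_int) sum;
}

/* "add": a plain byte sum -- fastest, but keeps far fewer bits.  */
static int
compute_checksum_add_sketch (unsigned char *buf, gpi_int size)
{
  unsigned int sum = 0;
  gpi_int n;
  for (n = 0; n < size; n++)
    sum += buf[n];
  return (int) sum;
}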
I would recommend that the loop be unrolled (trivial and with no consequences), and that one of the native checksum routines, or something like it, then be used. The code can use the new routine for new gpis; when reading existing gpis, it first checks whether the checksum matches the new value and, failing that, whether it matches the old value. That way users will never notice the change. A possible patch for this is included below. I implemented this and it dropped the build time from 6 to 5 minutes.
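Schematically, the compatibility check is just the following (illustrative names -- compute_checksum_new stands for whichever faster routine gets adopted, compute_checksum_original is the current routine quoted above):

static gpi_int compute_checksum_original (unsigned char *buf, gpi_int size);
static gpi_int compute_checksum_new (unsigned char *buf, gpi_int size);

static int
gpi_checksum_ok (unsigned char *buf, gpi_int size, gpi_int stored)
{
  if (compute_checksum_new (buf, size) == stored)
    return 1;                            /* gpi written by the new gpc */
  /* Fall back to the old checksum so existing gpis keep working.  */
  return compute_checksum_original (buf, size) == stored;
}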
I've attached the testchecksum.c file I used to time the various checksum routines. It could be run on an Intel platform to see how the relative speeds compare, so that a generally good solution can be chosen.
Please note, I don't have any attachment to the particular checksum routines included in testchecksum.c; they were just some quick routines I wrote to try out various solutions and see the effect they would have. Any fast checksum routine would be fine - a table-driven CRC might work, for example.
Enjoy, Peter.
Peter N Lewis wrote:
One thing that stood out in the profile was that compute_checksum, a two line function that is called once for each loaded gpi is responsible for 12% of the total time. It scans through the entire gpi bytes (in this case, including my GPCMacOSAll 27Meg gpi, once for each of the 250 units in my project), that adds up to around 43 seconds of the 6 minute rebuild time.
static gpi_int
compute_checksum_original (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;
  for (n = 0; n < size; n++)
    sum += n * buf[n];
  return sum;
}
It appears that gpc_int is a 64 bit int on my system.
I did some timing of some improvements:
time= 42.1  sum= 46473731586140096   compute_checksum_original
time= 35.0  sum= 46473731586140096   compute_checksum_unrolled
time= 18.8  sum= 8551919141529536    compute_checksum_native
time= 12.2  sum= -2963523148067389   compute_checksum_shift
time=  7.8  sum= -852473248          compute_checksum_add
(time is roughly the time in seconds gpc is taking just calling compute_checksum on the GPCMacOSAll.gpi in my complete rebuild).
I did a little test, also trying a few other checksums. Remarks:
1) I do not know if we need a checksum at all
2) loading interfaces should not be a bottleneck; we should be able to compile many modules in a single run, loading interfaces just once in the whole run
3) On 32-bit AMD, when computing a 64-bit checksum the bottleneck is the lack of registers, while the fastest checksum is probably limited by DRAM speed
4) I am slightly surprised by the Mac results: a G5 Mac can (and should) use 64-bit arithmetic even for 32-bit applications, and the Mac has many registers
5) On 64-bit machines the current checksum seems to be reasonably fast
6) The current checksum can be computed using only additions (compute_checksum_ladd), and on AMD64 it is the second fastest
7) On 32-bit machines the current checksum can be computed using mostly 32-bit operations (compute_checksum_short and compute_checksum_sadd)
8) If we want a checksum but do not care which one we use, then summing 32-bit words (compute_checksum_lladd) may be a good solution
I have attached a modified test program.
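To make remark 8 concrete, the word-summing idea is roughly the following (a sketch only -- see the attached program for the actual compute_checksum_lladd; it assumes the buffer is 4-byte aligned and zero-padded to a multiple of 4 bytes, and the result depends on byte order):

#include <stdint.h>

/* Sketch of a word-summing checksum: add the buffer as 32-bit words.  */
static int64_t
checksum_words32_sketch (const unsigned char *buf, int64_t size)
{
  const int32_t *p = (const int32_t *) buf;
  int64_t sum = 0, i;

  for (i = 0; i < size / 4; i++)
    sum += p[i];
  return sum;
}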
32 bit AMD 1.250 GHz gpi_int = long long
sizeof: 8
time= 527408  sum= 46473731586140096   compute_checksum_original
time= 101484  sum= 46473731586140096   compute_checksum_short
time= 365382  sum= 46473731586140096   compute_checksum_ladd
time= 120329  sum= 46473731586140096   compute_checksum_sadd
time=  62523  sum= -169197487615280    compute_checksum_lladd
time= 503241  sum= 46473731586140096   compute_checksum_unrolled
time= 140273  sum= 8551919141529536    compute_checksum_native
time= 155221  sum= -2963523148067389   compute_checksum_shift
time=  74935  sum= -852473248          compute_checksum_add
AMD 1.250 GHz gpi_int = long
sizeof: 4
time= 103675  sum= -694933568    compute_checksum_original
time= 100753  sum= -694933568    compute_checksum_short
time= 119555  sum= -694933568    compute_checksum_ladd
time= 120073  sum= -694933568    compute_checksum_sadd
time=  62365  sum= -1545956656   compute_checksum_lladd
time= 103780  sum= -694933568    compute_checksum_unrolled
time= 104034  sum= -694933568    compute_checksum_native
time=  78179  sum= -8794685      compute_checksum_shift
time=  72357  sum= -852473248    compute_checksum_add
AMD Athlon(tm) 64 Processor 3000+ (1.8 GHz)
sizeof: 8
time= 48315  sum= 46473731586140096   compute_checksum_original
time= 76613  sum= 46473731586140096   compute_checksum_short
time= 34075  sum= 46473731586140096   compute_checksum_ladd
time= 47050  sum= 46473731586140096   compute_checksum_sadd
time= 12772  sum= -169197487615280    compute_checksum_lladd
time= 57156  sum= 46473731586140096   compute_checksum_unrolled
time= 61925  sum= 8551919141529536    compute_checksum_native
time= 42195  sum= -2963523148067389   compute_checksum_shift
time= 28440  sum= -852473248          compute_checksum_add
Waldek Hebisch wrote:
Peter N Lewis wrote:
One thing that stood out in the profile was that compute_checksum, a two line function that is called once for each loaded gpi is responsible for 12% of the total time. It scans through the entire gpi bytes (in this case, including my GPCMacOSAll 27Meg gpi, once for each of the 250 units in my project), that adds up to around 43 seconds of the 6 minute rebuild time.
<snip>
I did a little test trying also a few other checksums. Remarks:
- loading interfaces should not be a bottleneck, we should be able to compile many modules in a single run, loading interfaces just once in the whole run
The following table is interesting (I can't remember if I posted it before to this list; sorry if I did). It compares compiling the Mac interfaces in two versions: one huge file versus many small units.
(timing in seconds)            GPC   GP   FPC
1 unit with 205000 lines         6    8     8
270 separate units              43   56    13
- I slightly surprised by Mac results: G5 Mac can (and should) use 64-bit arithmetic even for 32-bit applications, also Mac has many registers
Should it use 64-bit integer math instructions, even when -mpowerpc64 is not passed?
Here are some more results (on a Mac G5 with a single 1.8 GHz PowerPC). I agree that the results are not too good ...
[G5:testgpc/adriaan/c] adriaan% gpcgcc testchecksum2.c -o testchecksum2
[G5:testgpc/adriaan/c] adriaan% ./testchecksum2
sizeof: 8
time= 598382  sum= 46473731586140096   compute_checksum_original
time= 590724  sum= 46473731586140096   compute_checksum_short
time= 511008  sum= 46473731586140096   compute_checksum_ladd
time= 335757  sum= 46473731586140096   compute_checksum_sadd
time= 103442  sum= -56390852623264     compute_checksum_lladd
time= 611977  sum= 46473731586140096   compute_checksum_unrolled
time= 421558  sum= 8551919141529536    compute_checksum_native
time= 644395  sum= -2963523148067389   compute_checksum_shift
time= 354873  sum= -852473248          compute_checksum_add
[G5:testgpc/adriaan/c] adriaan% gpcgcc testchecksum2.c -o testchecksum2 -O3
[G5:testgpc/adriaan/c] adriaan% ./testchecksum2
sizeof: 8
time= 496856  sum= 46473731586140096   compute_checksum_original
time= 117594  sum= 46473731586140096   compute_checksum_short
time= 141747  sum= 46473731586140096   compute_checksum_ladd
time=  89039  sum= 46473731586140096   compute_checksum_sadd
time=  21697  sum= -56390852623264     compute_checksum_lladd
time= 221469  sum= 46473731586140096   compute_checksum_unrolled
time= 118434  sum= 8551919141529536    compute_checksum_native
time=  87099  sum= -2963523148067389   compute_checksum_shift
time=  38747  sum= -852473248          compute_checksum_add
[G5:testgpc/adriaan/c] adriaan% gpcgcc testchecksum2.c -o testchecksum2 -O3 -mpowerpc64
[G5:testgpc/adriaan/c] adriaan% ./testchecksum2
sizeof: 8
time= 315570  sum= 46473731586140096   compute_checksum_original
time= 201964  sum= 46473731586140096   compute_checksum_short
time=  57288  sum= 46473731586140096   compute_checksum_ladd
time=  90656  sum= 46473731586140096   compute_checksum_sadd
time=  20596  sum= -56390852623264     compute_checksum_lladd
time= 113470  sum= 46473731586140096   compute_checksum_unrolled
time= 102893  sum= 8551919141529536    compute_checksum_native
time=  64930  sum= -2963523148067389   compute_checksum_shift
time=  38770  sum= -852473248          compute_checksum_add
Regards,
Adriaan van Os
Waldek Hebisch wrote:
Peter N Lewis wrote:
One thing that stood out in the profile was that compute_checksum, a two line function that is called once for each loaded gpi is responsible for 12% of the total time. It scans through the entire gpi bytes (in this case, including my GPCMacOSAll 27Meg gpi, once for each of the 250 units in my project), that adds up to around 43 seconds of the 6 minute rebuild time.
static gpi_int
compute_checksum_original (unsigned char *buf, gpi_int size)
{
  gpi_int sum = 0, n = 0;
  for (n = 0; n < size; n++)
    sum += n * buf[n];
  return sum;
}
It appears that gpc_int is a 64 bit int on my system.
I did some timing of some improvements:
time= 42.1  sum= 46473731586140096   compute_checksum_original
time= 35.0  sum= 46473731586140096   compute_checksum_unrolled
time= 18.8  sum= 8551919141529536    compute_checksum_native
time= 12.2  sum= -2963523148067389   compute_checksum_shift
time=  7.8  sum= -852473248          compute_checksum_add
(time is roughly the time in seconds gpc is taking just calling compute_checksum on the GPCMacOSAll.gpi in my complete rebuild).
I did a little test trying also a few other checksums. Remarks:
- I do not know if we need a checksum at all
- loading interfaces should not be a bottleneck, we should be able to compile many modules in a single run, loading interfaces just once in the whole run
- On 32-bit AMD when computing 64-bit checksum the bottlneck is lack of registers, while the fastest checksum is probably limited by DRAM speed
- I slightly surprised by Mac results: G5 Mac can (and should) use 64-bit arithmetic even for 32-bit applications, also Mac has many registers
- On 64-bit machines current checksum seem to be reasonably fast
- The current checksum can be computed using only additions (compute_checksum_ladd) and on AMD 64 it is the second
- On 32-bit machines current checksum can be computed using mostly 32-bit operations (compute_checksum_short and compute_checksum_sadd)
- If we want a checksum but do not care which one we use then summing 32-bit words (compute_checksum_lladd) may be good solution
Two more notes:
1. compute_checksum_lladd is not endianness-neutral (but that may not (yet) be a problem).
2. gpidump.pas must be changed also, e.g. to match compute_checksum_lladd
{$local R-, X+, W-}
function ComputeChecksum (const Buf: array of Byte) = Sum: GPIInt;
var
  i, iLoopCount: GPIInt;
  p: PCInteger;
begin
  Sum := 0;
  iLoopCount := (High (Buf) + 1) div SizeOf (CInteger);
  p := PCInteger (@Buf);
  for i := 1 to iLoopCount do
    begin
      Sum := Sum + p^;
      p := p + 1
    end
end;
{$endlocal}
What I don't understand is that gpidump.pas uses the MedCard type, which is four bytes on Mac OS X.
type GPIInt = MedCard;
where gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X.
typedef HOST_WIDE_INT gpi_int;
Using gpidump (with an unchanged gpc-20051104) returns "invalid endianness marker".
Regards,
Adriaan van Os
{$local R-, X+, W-}
function ComputeChecksum (const Buf: array of Byte) = Sum: GPIInt;
var
  i, iLoopCount: GPIInt;
  p: PCInteger;
begin
  Sum := 0;
  iLoopCount := (High (Buf) + 1) div SizeOf (CInteger);
  p := PCInteger (@Buf);
  for i := 1 to iLoopCount do
    begin
      Sum := Sum + p^;
      p := p + 1
    end
end;
{$endlocal}
Sorry, I found a faster version after my post:
{$local R-, X+, W-}
function ComputeChecksum (const Buf: array of Byte) = Sum: GPIInt;
var
  iLoopCount: GPIInt;
  pb, pe, p: PCInteger;
begin
  Sum := 0;
  iLoopCount := (High (Buf) + 1) div SizeOf (CInteger);
  pb := PCInteger (@Buf);
  pe := pb + iLoopCount - 1;
  for p := pb to pe do
    Sum := Sum + p^;
end;
{$endlocal}
Regards,
Adriaan van Os
Adriaan van Os wrote:
Two more notes:
- compute_checksum_lladd is not endianness-neutral (but that may not
(yet) be a problem).
ATM gpi files depend on the exact compiler version, so it seems reasonable to choose the endianness of the compiler.
- gpidump.pas must be changed also, e.g. to match
compute_checksum_lladd
To make things clear: for gpc-20051104 I left the checksum as is. I did not want to change the checksum without looking at the consequences (the need to change gpidump.pas is one such consequence). At first I wanted just to replace the current routine with an equivalent faster routine, but IMHO the speed tests do not give a clear winner (and changing the checksum promises a much higher gain anyway).
What I don't understand is that gpidump.pas uses the MedCard type, which is four bytes on Mac OS X.
type GPIInt = MedCard;
where gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X.
typedef HOST_WIDE_INT gpi_int;
gpidump is wrong. OTOH I am slightly surprised by the Mac OS X values. Such values are typical for a cross compiler running on a 32-bit machine but producing 64-bit code.
Using gpidump (with an unchanged gpc-200521104) returns "invalid endianness marker".
Looks like a problem with the size of GPIInt.
Waldek Hebisch wrote:
What I don't understand is that gpidump.pas uses the MedCard type, which is four bytes on Mac OS X.
type GPIInt = MedCard;
where gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X.
typedef HOST_WIDE_INT gpi_int;
gpidump is wrong. OTOH I am slightly surprised by Mac OS X values. Such values are typical for a cross compiler running on 32-bit machine but producing 64-bit code.
They changed HOST_WIDE_INT to eight bytes because of the -mpowerpc64 switch, which can be used to produce 64-bit code even when running the OS in 32-bit mode (to use the G5 more efficiently).
See gcc discussions at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13987 http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00398.html http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02339.html.
Using gpidump (with an unchanged gpc-200521104) returns "invalid endianness marker".
Looks like problem with the size of GPIInt.
I changed it to LongWord, but now run into other problems with gpidump (I will have a look at that later, maybe I can produce a patch).
Regards,
Adriaan van Os
Waldek Hebisch wrote:
What I don't understand is that gpidump.pas uses the MedCard type, which is four bytes on Mac OS X.
type GPIInt = MedCard;
where gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X.
typedef HOST_WIDE_INT gpi_int;
gpidump is wrong. OTOH I am slightly surprised by Mac OS X values. Such values are typical for a cross compiler running on 32-bit machine but producing 64-bit code.
They changed HOST_WIDE_INT to eight bytes because of the -mpowerpc64 switch, which can be used to produce 64-bit code even when running the OS in 32-bit mode (to use the G5 more efficient).
See gcc discussions at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13987 http://gcc.gnu.org/ml/gcc-patches/2004-06/msg00398.html http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02339.html.
The gcc discussion seems to suggest, by the way, that we use HOST_WIDEST_FAST_INT if available instead of HOST_WIDE_INT for gpi_int.
Regards,
Adriaan van Os
Waldek Hebisch wrote:
Adriaan van Os wrote:
Two more notes:
- compute_checksum_lladd is not endianness-neutral (but that may not
(yet) be a problem).
ATM gpi files depend on exact compiler version, so it seems reasonable to choose endianness of the compiler.
Yes (both frontend and backend version). Much more in GPI files depends on endianness (besides checksums). We could detect and convert at runtime, but of course, it would slow things down even more (which I'm sure you wouldn't like too much, Adriaan). The only benefit would be to people who cross-compile *and* can't recompile on each host system for some strange reason ...
- gpidump.pas must be changed also, e.g. to match
compute_checksum_lladd
To make things clear: for gpc-20051104 I left the checksum "as is". I did not want to change the the checksum without looking at the consequences (need to change gpidump.pas is one of such consequences). At first I wanted just to replace current routine by an equivalent faster routine, but IMHO the speed tests does not give a clear winner (and changing checksum promises much higher gain anyway).
As far as I'm concerned, feel free to change the checksums. I don't insist on the current algorithm (I think the comment in module.c doesn't really suggest I do ;-). But I wouldn't like to abandon checksums. AFAICS, they do catch some cases which would otherwise lead to obscure bugs. Perhaps GP will avoid such cases in the future (when all GP bugs are fixed :-), but even though --automake will then be faded out, such problems can still arise with hand-made make rules (which will probably always be used).
Also, please note that the buffer passed to compute_checksum is currently not aligned, so larger-than-byte operations will be slow on some processors and invalid on the rest. But perhaps we should make it aligned. A consistent solution would be to round up each chunk (and the header) in a GPI file to a multiple of the "word" size. This would add some code and a few bytes to GPI files, but it's probably worthwhile.
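Schematically, the round-up is just the usual align-to-a-power-of-two computation (an example only; the constant assumes an 8-byte gpi_int):

#include <stddef.h>

/* Example: round a chunk size up to a multiple of the "word" size,
   here assumed to be 8 bytes.  The extra bytes would be written as zeros. */
#define GPI_CHUNK_ALIGN 8

static size_t
round_up_chunk_size (size_t size)
{
  return (size + GPI_CHUNK_ALIGN - 1) & ~(size_t) (GPI_CHUNK_ALIGN - 1);
}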
What I don't understand is that gpidump.pas uses the MedCard type, which is four bytes on Mac OS X.
type GPIInt = MedCard;
where gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X.
typedef HOST_WIDE_INT gpi_int;
gpidump is wrong. OTOH I am slightly surprised by Mac OS X values. Such values are typical for a cross compiler running on 32-bit machine but producing 64-bit code.
Using gpidump (with an unchanged gpc-200521104) returns "invalid endianness marker".
Looks like problem with the size of GPIInt.
Yes. Since these types are not standard C types, it might not be so easy to get to them in a Pascal file.
We could store the size of the type used in the GPI file (similar to the endianness marker). The size itself can always be stored in 1 byte, especially if measured in bytes not bits, so no endianness and size issues here.
Then gpidump (and gpc) could at least give a proper error message on size mismatch. Or perhaps we could even extract the size at gpidump-build-time (perhaps in mk-t-inc) from a GPI file of one of the freshly compiled (with the target compiler) RTS units, and build gpidump using a matching `Cardinal attribute (size = ...)' type.
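E.g., on the reading side the check could be as simple as this (illustrative code only, not the actual GPI reader):

#include <stdio.h>

/* Illustrative: one header byte records sizeof (gpi_int) in bytes, so a
   reader built with a different size can complain clearly instead of
   tripping over a misread endianness marker later.  */
static int
check_gpi_int_size (FILE *f, int expected)
{
  int stored = getc (f);
  if (stored == EOF || stored != expected)
    {
      fprintf (stderr, "gpi_int size mismatch (got %d bytes, expected %d)\n",
               stored, expected);
      return 0;
    }
  return 1;
}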
Frank
Frank Heckenbach wrote:
Waldek Hebisch wrote:
Adriaan van Os wrote:
Two more notes:
- compute_checksum_lladd is not endianness-neutral (but that may not
(yet) be a problem).
ATM gpi files depend on exact compiler version, so it seems reasonable to choose endianness of the compiler.
Yes (both frontend and backend version). Much more in GPI files depends on endianness (besides checksums). We could detect and convert at runtime, but of course, it would slow down things even more (which I'm sure you wouldn't like too much, Adriaan). The only benefit would only to people who cross-compile *and* can't recompile on each host system for some strange reason ...
Well, I believe that we can make the GPI reader/writer both faster and more portable. For example, we could bulk-convert integers between endiannesses, so that only folks who need compatibility will pay for it. But ATM I am looking for low-hanging fruit ...
- gpidump.pas must be changed also, e.g. to match
compute_checksum_lladd
To make things clear: for gpc-20051104 I left the checksum "as is". I did not want to change the the checksum without looking at the consequences (need to change gpidump.pas is one of such consequences). At first I wanted just to replace current routine by an equivalent faster routine, but IMHO the speed tests does not give a clear winner (and changing checksum promises much higher gain anyway).
As far as I'm concerned, feel free to change the checksums. I don't insist on the current algorithm (I think the comment in module.c doesn't really suggest I do ;-). But I wouldn't like to abandon checksums. AFAICS, they do catch some cases which would otherwise lead to obscure bugs. Perhaps GP will avoid such cases in the future (when all GP bugs are fixed :-), but even though --automake will then be faded out, such problems can still arise with hand-made make rules (which will probably always be used).
IIUC the most likely bug avoided due to checksums is reading an inconsistent GPI file. That can be detected by putting a random number (stamp) in the GPI header and checking, at the end of reading (or when the reader finds something wrong), that the stamp is still the same. Still, a simple checksum should take less than 1% of compile time, so we can live with it (BTW, backend folks are willing to risk serious bugs to get a 1% speed improvement of the compiler).
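Schematically (illustrative only, not tied to the actual GPI layout): the writer would store the same random stamp in the header and once more at the very end of the file, and the reader would compare the two:

#include <stdio.h>

/* If the trailer stamp is missing or differs from the header stamp, the
   file was truncated or rewritten underneath the reader -- detected
   without checksumming the whole contents.  */
static int
gpi_stamps_consistent (FILE *f, unsigned long header_stamp)
{
  unsigned long trailer_stamp;

  if (fseek (f, -(long) sizeof trailer_stamp, SEEK_END) != 0)
    return 0;
  if (fread (&trailer_stamp, sizeof trailer_stamp, 1, f) != 1)
    return 0;
  return trailer_stamp == header_stamp;
}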
Concerning GP versus `--automake': I think we need to fix the main automake problems if we want GP to work well. Basically:
1) have a separate GPI file for the implementation (so that the interface GPI stays consistent during compilation)
2) compile interface and implementation separately (even if in the same file)
3) store dependencies in GPI files
For `--automake' specifically:
4) allow only _one_ compilation of a given part; anything more _is_ a cyclic dependency, hence illegal (the current way is just a work-around for the lack of 2)
For make (and possibly GP):
5) have an option to print _all_ dependencies (includes + imports)
1, 3 and 5 could make GP both simpler and more reliable (AFAICS GP itself implements/uses 2 and 4).
Also, please note that the buffer passed to compute_checksum is currently not aligned, so larger than byte-operations will be slow on some processors and invalid on the rest. But perhaps we should make is aligned. A consistent solution would be to round up each chunk (and the header) in a GPI file) to a multiple of the "word" size. This would add some code and add a few bytes in GPI files, but it's probably worthwhile.
I will have to check. My impression was that we just use the address we got from xmalloc, so it should be OK. Aligning data in GPI files may allow fullword access in the GPI reader, so it may be worthwhile. But compute_checksum works on the whole buffer, so for it we only need a few bytes of zero padding at the end.
What I don't understand is that gpidump.pas uses the MedCard type, which is four bytes on Mac OS X.
type GPIInt = MedCard;
where gpc.h uses HOST_WIDE_INT, which is eight bytes on Mac OS X.
typedef HOST_WIDE_INT gpi_int;
gpidump is wrong. OTOH I am slightly surprised by Mac OS X values. Such values are typical for a cross compiler running on 32-bit machine but producing 64-bit code.
Using gpidump (with an unchanged gpc-200521104) returns "invalid endianness marker".
Looks like problem with the size of GPIInt.
Yes. Since these types are no standard C types, it might not be so easy to get to them in a Pascal file.
We could store the size of the type used in the GPI file (similar to the endianness marker). The size itself can always be stored in 1 byte, especially if measured in bytes not bits, so no endianness and size issues here.
Then gpidump (and gpc) could at least give a proper error message on size mismatch. Or perhaps we could even extract the size at gpidump-build-time (perhaps in mk-t-inc) from a GPI file of one of the freshly compiled (with the target compiler) RTS units, and build gpidump using a matching `Cardinal attribute (size = ...)' type.
Yes, we should store the size of gpi_int. But I think that gpidump can easily use this information at runtime to read "any" GPI file (but we should keep the version check in place). Of course it costs time, but IMHO gpidump is not speed critical.
Waldek Hebisch wrote:
Frank Heckenbach wrote:
Yes (both frontend and backend version). Much more in GPI files depends on endianness (besides checksums). We could detect and convert at runtime, but of course, it would slow down things even more (which I'm sure you wouldn't like too much, Adriaan). The only benefit would only to people who cross-compile *and* can't recompile on each host system for some strange reason ...
Well, I belive that we can make GPI reader/writer both faster and more portable. For example we could bulk convert integers between endiannes, so that only folks that need comatibility will pay for it. But ATM I am looking low hanging fruits ...
Yes, it's not so trivial, as the nodes don't consist only of integers, but also of bytes/bit fields and strings. So at least some parsing effort would be required, comparable in size to store_node_fields and load_node, and it would have to be kept in sync with them (=> more maintenance effort for future changes).
As far as I'm concerned, feel free to change the checksums. I don't insist on the current algorithm (I think the comment in module.c doesn't really suggest I do ;-). But I wouldn't like to abandon checksums. AFAICS, they do catch some cases which would otherwise lead to obscure bugs. Perhaps GP will avoid such cases in the future (when all GP bugs are fixed :-), but even though --automake will then be faded out, such problems can still arise with hand-made make rules (which will probably always be used).
IIUC the most likely bug avoided due to checksums is reading inconsistent GPI file. That can be detected putting a random number (stamp) in the GPI header and checking that at the end of reading (or when the reader finds something wrong) the stamp is still the same.
This wouldn't protect against corruption of the file's contents (which could happen on bad media, after system crashes, etc.). You must be thinking about an entirely different problem, where a number in the header changes while the file is being read, IIUYC!? Are you thinking about two simultaneous processes writing to the same file or something like that?
But anyway, there are several uses of checksums, see the notes in internals.texi. Perhaps we've been talking about different things all the time. Protection against inconsistent GPI files is just one, and IMHO the least important, one. The more important one is to protect against inconsistent imports, including indirect imports. That's a problem that would lead to strange and very hard to trace bugs. A perfectly working GP should be able to avoid this situation from happening. GPC's automake cannot always do this with indirect imports. Hand-written make rules may or may not do this, depending on how they're written and how complex the project is. Therefore I think we should keep the check in GPC to reject such invalid imports. (Of course, checksums may not be the only way to achieve this.)
Still, a simple checksum should take less then 1% of compile time so we can live with it (BTW, backend folks are willing to risk serious bugs to get 1% speed improvement of the compiler).
Didn't I ever mention that I don't always agree with all design decisions of backend folks ...? IMHO, this is a particularly bad example (but perhaps benchmark-obsessed people force them to). To me the 1980s are over. (But according to the theory of relativity, to some people they may not. ;-)
Concering GP versus `--automake': I think we need to fix main automake problems if we want GP to work well. Basically:
Just to clarify what we're talking about:
- If you mean by "automake problems" problems with the current `--automake' implementation, I can't see why we need to fix them in order to make GP work well, as GP is there to replace automake.
- Main automake problems to me are difficulties handling indirect recompilation needs. -- That's obvious from the way of doing things: Automake only has a local view, so the best it can do is try to rescue things in the last minute, i.e. recompiling other modules when reading their GPI files was already started. Add indirect requirements and cyclic dependencies to it, and you get all the problems we have with automake. Whereas GP (just like make) has a global view and can do things in the right order from the beginning.
- to have separate GPI file for implementation (so that interface
GPI stays consistent during compilation)
This would enable a `-j' option to compile the implementation of A while another process compiles a B that uses A. IMHO, that would be a minor optimization, otherwise I can't see it as a big problem.
- compile iterface and implementation separately (even if in the same
file)
Now referring to your 1% above ... ;-) I think this adds to compilation time (e.g., by having to load all imported GPIs twice), and I'd guess it would be a little more than 1%. Therefore I'd prefer (and that's what GP does) to do so only when needed, i.e. with cyclic imports, and not in the normal case.
- store dependencies in GPI files
I deliberately avoided this in GP, to make GP independent of GPC's internals. If you want to change this, please at least put it on a high enough level, so GP won't have to know too much about GPC's internals (i.e., only the outermost layer of the GPI file format, presumably).
For `--automake' specifically:
- allow only _one_ compilation of given part, anyting more _is_
cyclic dependency, hence illegal (current way is just work-around for lack of 2)
All of --automake is just a work-around for a (previously) missing GP, so my answer is just to drop --automake. (Until we had GP, this was a compromise, as some people needed cyclic dependencies (or claimed to ;-), and could only use --automake. We couldn't do the whole thing in --automake, but at least make some common cases work most of the time.)
For make (and possibly GP):
- have option to print _all_ dependencies (includes + imports)
It might have some advantages, but also drawbacks. In particular, AFAICS, we'd need a do-nothing parse run in GPC, i.e., basically a nop-flag check in every nontrivial action.
OTOH, GP could have its own parser, initially copied from GPC's parser, but stripped of anything non-import related (which is a lot). Of course, the current Q&D parser in GP is not the final word, but I'd still tend to favour a lean parser in GP, unless you know a way to do it in GPC without rather pervasive changes in the parser.
The GP parser as I envision it would use the same lexer as GPC (the lexer doesn't do serious things in its actions anyway, so the problems I mentioned for the parser wouldn't apply), and the preprocessor (according to my plans) would be integrated with the lexer. So then GP could even extract include information directly from its integrated preprocessor. This would actually save one process call overhead per module from GP, and may therefore be worthwhile in itself. (You may notice that we'd then have two programs that contain the same preprocessor and lexer code, but since it's the same source code, I wouldn't mind this.)
Sure, this doesn't cater for semi-automatic generation of Makefiles. But quite frankly, I don't see this as a main requirement we have to solve. (Actually, most projects have a common formatting, so a simple sed script will do the trick.) But perhaps we could even let GP take this part, by adding a --print-dependencies option to it. This would rid GPC from another side-feature, which is probably not a bad idea either, looking at its complexity ...
I'm not going into technical details of my plans for the integrated preprocessor here, so that this mail won't become too long. We can discuss them separately if wanted.
Also, please note that the buffer passed to compute_checksum is currently not aligned, so larger than byte-operations will be slow on some processors and invalid on the rest. But perhaps we should make is aligned. A consistent solution would be to round up each chunk (and the header) in a GPI file) to a multiple of the "word" size. This would add some code and add a few bytes in GPI files, but it's probably worthwhile.
I will have to check. My impression was that we just use adress we got from xmalloc, so it should be OK. Aligning data in GPI files my allow fullword access in GPI reader so it may be worthwile.
There are two calls:
checksum = compute_checksum (wb.outbuf, wb.outbufcount);
This one should be ok as you say.
if (compute_checksum (mptr (gpi_file, start_of_nodes), ...
This one depends on start_of_nodes, i.e. the alignment of a chunk in a GPI file.
But compute_checksum works on the whole buffer, so for it we only need a few bytes of zero padding at the end.
s/buffer/chunk/, yes. So we only need to pad and align chunks (alignment follows from padding when done for all chunks), which really isn't so bad (basically negligible) WRT file size.
Yes, we should store the size of gpi_int. But I think that gpidump can easily use this information at runtime to read "any" GPI file (but we should keep version check in place). Of course it costs time, but IMHO gpidump is not speed critical.
OK, might be better. I wasn't concerned so much about running time, but effort required, but perhaps it's not so bad after all. If we use LongestCard internally, and do all size/endianness conversions on input, it might be a rather localized change.
Frank
Frank Heckenbach wrote:
Concering GP versus `--automake': I think we need to fix main automake problems if we want GP to work well. Basically:
Just to clarify what we're talking about:
If you mean by "automake problems" problems with the current `--automake' implementation, I can't see why we need to fix them in order to make GP work well, as GP is there to replace automake.
Main automake problems to me are difficulties handling indirect recompilation needs. -- That's obvious from the way of doing things: Automake only has a local view, so the best it can do is try to rescue things in the last minute, i.e. recompiling other modules when reading their GPI files was already started. Add indirect requirements and cyclic dependencies to it, and you get all the problems we have with automake. Whereas GP (just like make) has a global view and can do things in the right order from the beginning.
- to have separate GPI file for implementation (so that interface
GPI stays consistent during compilation)
This would enable a `-j' option to compile the implementation of A while another process compiles a B that uses A. IMHO, that would be a minor optimization, otherwise I can't see it as a big problem.
- compile iterface and implementation separately (even if in the same
file)
Now referring to your 1% above ... ;-) I think this adds to compilation time (e.g., by having to load all imported GPIs twice), and I'd guess it would be a little more than 1%. Therefore I'd prefer (and that's what GP does) to do so only when needed, i.e. with cyclic imports, and not in the normal case.
Compiling interface and implementation parts separately indeed adds the overhead of importing GPIs twice (unless, of course, the symbol tables are stored in shared memory and passed as a parameter to GPC). However, this applies to a full build. In actual practice, you often are working on one unit at a time (maybe making small changes to type definitions in earlier units). So, what you want is the fastest way to compile *that one* file. Compiling implementation parts of other units is at that moment just a waste of time.
Typically, while developing there will be many more compilations of single units than full builds. For example, I am now porting a big application to GPC and it is progressing very nicely, but compile times are horrible. The scenario:
1. compile a unit
2. fix problems in the unit (mostly in the implementation part)
3. recompile the same unit (takes up to 20 minutes, even without any interface changes)
4. fix more problems
5. goto 1
I guess there is a bug in GP (probably related to cyclic dependencies) that triggers (too many) recompilations, even when only the implementation part of a unit has changed (or maybe even when nothing has changed).
Anyway (since I have a lot of time available during compile runs) I volunteer:
* to find the GP dependency bug or bugs
* to add a switch to GP to compile only the interface parts of used units, to speed up compiling a single unit.
Regards,
Adriaan van Os
Frank Heckenbach wrote:
Waldek Hebisch wrote:
Frank Heckenbach wrote:
Yes (both frontend and backend version). Much more in GPI files depends on endianness (besides checksums). We could detect and convert at runtime, but of course, it would slow down things even more (which I'm sure you wouldn't like too much, Adriaan). The only benefit would only to people who cross-compile *and* can't recompile on each host system for some strange reason ...
Well, I belive that we can make GPI reader/writer both faster and more portable. For example we could bulk convert integers between endiannes, so that only folks that need comatibility will pay for it. But ATM I am looking low hanging fruits ...
Yes, it's not so trivial, as the nodes don't consist only of integers, also of bytes/bit fields and strings. So at least some parsing effort would be required, comparable in size to store_node_fields and load_node, and has to be kept in sync with them (=> more maintenance effort for future changes).
As far as I'm concerned, feel free to change the checksums. I don't insist on the current algorithm (I think the comment in module.c doesn't really suggest I do ;-). But I wouldn't like to abandon checksums. AFAICS, they do catch some cases which would otherwise lead to obscure bugs. Perhaps GP will avoid such cases in the future (when all GP bugs are fixed :-), but even though --automake will then be faded out, such problems can still arise with hand-made make rules (which will probably always be used).
IIUC the most likely bug avoided due to checksums is reading inconsistent GPI file. That can be detected putting a random number (stamp) in the GPI header and checking that at the end of reading (or when the reader finds something wrong) the stamp is still the same.
This wouldn't protect against corruption of the file's contents (which could happen on bad media, after system crashes, etc.). You must be thinking about an entirely different problem, when a number in the header changes while reading the file IIUYC!? Are you thinking about two simultaneous processes writing to the same file or something like that?
About writing to a GPI file during compilation: it is caused by multiple recompilations during a run, but it probably also makes the problem with multiple recompilations much worse. Yes, a random stamp does not detect bad media. It can protect against crashes (if you write a copy at the end of the file).
But anyway, there are several uses of checksums, see the notes in internals.texi. Perhaps we've been talking about different things all the time. Protection against inconsistent GPI files is just one, and IMHO the least important, one. The more important one is to protect against inconsistent imports, including indirect imports.
AFAICS checksums buy you only one advantage over random stamps: you can verify that the checksum you wrote out agrees with the content you read in. In other words, checksums protect against bad media (a buggy OS counts as bad media too). For all other uses random stamps work as well.
I am not sure how important protection against bad media is: most external media use ECC, so the probability of error is quite low, and when an undetected error occurs it is likely to be so serious that we detect the inconsistency on reading.
Concering GP versus `--automake': I think we need to fix main automake problems if we want GP to work well. Basically:
Just to clarify what we're talking about:
If you mean by "automake problems" problems with the current `--automake' implementation, I can't see why we need to fix them in order to make GP work well, as GP is there to replace automake.
Main automake problems to me are difficulties handling indirect recompilation needs. -- That's obvious from the way of doing things: Automake only has a local view, so the best it can do is try to rescue things in the last minute, i.e. recompiling other modules when reading their GPI files was already started. Add indirect requirements and cyclic dependencies to it, and you get all the problems we have with automake. Whereas GP (just like make) has a global view and can do things in the right order from the beginning.
My point is: cyclic dependencies are an artifact which will vanish when we properly distinguish implementations from interfaces. That is the fundamental problem, common to both automake and GP. Neither make nor automake will work well with cyclic dependencies.
- to have separate GPI file for implementation (so that interface
GPI stays consistent during compilation)
This would enable a `-j' option to compile the implementation of A while another process compiles a B that uses A. IMHO, that would be a minor optimization, otherwise I can't see it as a big problem.
ATM GP wants to recompile the same file over and over again. You think that a more clever check in GP will solve this. Maybe. But the traditional method, which works quite well, is: compare timestamps. If the timestamps of the dependencies are earlier than that of the target, then the target is up to date. That works well if the dependencies form a DAG.
Mixing implementation with interface breaks timestamps. I understand that in GP you want to do better than timestamps allow. But I find it slightly inconsistent that you want checksums in GPI files (here you want redundancy for better error checking), but in the case of GP you believe that you can ignore the possibility of using timestamps as a sanity check.
We have written about this multiple times without a conclusion. So a simple question: would you object if I add a new GPI file, say `module-imp.gpi', and put all implementation info there?
- compile iterface and implementation separately (even if in the same
file)
Now referring to your 1% above ... ;-) I think this adds to compilation time (e.g., by having to load all imported GPIs twice), and I'd guess it would be a little more than 1%. Therefore I'd prefer (and that's what GP does) to do so only when needed, i.e. with cyclic imports, and not in the normal case.
Sure, you can try to optimize. But the process should be equivalent, otherwise you will have spurious recompilations and lose more time.
For make (and possibly GP):
- have option to print _all_ dependencies (includes + imports)
It might have some advantages, but also drawbacks. In particular, AFAICS, we'd need a do-nothing parse run in GPC, i.e., basically a nop-flag check in every nontrivial action.
We do have `-fsyntax-only' already. In fact, the preferred way is to compile the file and find dependencies at the same time -- then make uses the old dependencies; if they are OK (most of the time) it goes on, if not, make compiles the new dependencies and again compiles the file.
Compiling interface and implementation parts separately indeed adds the overhead of importing GPIs twice (unless, of course, the symbol tables are stored in shared memory and passed as a parameter to GPC). However, this applies to a full build. In actual practice, you often are working on one unit at a time (maybe making small changes to type definitions in earlier units). So, what you want is the fastest way to compile *that one* file. Compiling implementation parts of other units is at that moment just a waste of time.
Typically, while developing there will be many more compiiations of single units than full builds. For example, I am porting now a big application to GPC and it is progressing very nicely, but compile times are horrible. The scenario:
- compile a unit
- fix problems in the unit (mostly in the implementation part)
- recompile the same unit (takes up to 20 minutes, even without any interface changes)
- fix more problems
- goto 1
I guess there is a bug in GP (probably related to cyclic dependencies) that triggers (too many) recompilations, even when only the implementation part of a unit has changed (or maybe even when nothing has changed).
Anyway (since I have a lot of time available during compile runs) I volunteer:
- to find the GP dependency bug or bugs
- to add a switch to GP to compile only the interface parts of used
units, to speed up compiling a single unit.
I am now implementing for gp:
--compile-file
  Compile a file as fast as possible. Imported units that need
  recompilation are compiled --interface-only, which implies that
  successive linking will fail. Therefore, this option is useful during
  the development phase only. This option implies -c

--check-file
  Check the syntax of a file as fast as possible. Imported units that
  need recompilation are compiled --interface-only. This option implies -c
It mostly works already (I will post a diff later). The time gain is dramatic.
Regards,
Adriaan van Os
Adriaan van Os wrote:
Compiling interface and implementation parts separately indeed adds the overhead of importing GPIs twice (unless, of course, the symbol tables are stored in shared memory and passed as a parameter to GPC). However, this applies to a full build. In actual practice, you often are working on one unit at a time (maybe making small changes to type definitions in earlier units). So, what you want is the fastest way to compile *that one* file. Compiling implementation parts of other units is at that moment just a waste of time.
Typically, while developing there will be many more compiiations of single units than full builds. For example, I am porting now a big application to GPC and it is progressing very nicely, but compile times are horrible. The scenario:
- compile a unit
- fix problems in the unit (mostly in the implementation part)
- recompile the same unit (takes up to 20 minutes, even without any interface changes)
- fix more problems
- goto 1
I guess there is a bug in GP (probably related to cyclic dependencies) that triggers (too many) recompilations, even when only the implementation part of a unit has changed (or maybe even when nothing has changed).
Anyway (since I have a lot of time available during compile runs) I volunteer:
- to find the GP dependency bug or bugs
- to add a switch to GP to compile only the interface parts of used
units, to speed up compiling a single unit.
I am now implementing for gp:
--compile-file
  Compile a file as fast as possible. Imported units that need
  recompilation are compiled --interface-only, which implies that
  successive linking will fail. Therefore, this option is useful during
  the development phase only. This option implies -c

--check-file
  Check the syntax of a file as fast as possible. Imported units that
  need recompilation are compiled --interface-only. This option implies -c
It mostly works already (I will post a diff later). The time gain is dramatic.
IIUC, this is only necessary because of the bug you suppose in your previous mail (otherwise no other units would have to be recompiled if not changed)? So it might be a better use of time to concentrate on the actual bug than on temporary work-arounds. Did you already send a reproducible test case (if you can't or don't want to fix the problem in gp yourself)?
Frank
Frank Heckenbach wrote:
Adriaan van Os wrote:
Compiling interface and implementation parts separately indeed adds the overhead of importing GPIs twice (unless, of course, the symbol tables are stored in shared memory and passed as a parameter to GPC). However, this applies to a full build. In actual practice, you often are working on one unit at a time (maybe making small changes to type definitions in earlier units). So, what you want is the fastest way to compile *that one* file. Compiling implementation parts of other units is at that moment just a waste of time.
Typically, while developing there will be many more compiiations of single units than full builds. For example, I am porting now a big application to GPC and it is progressing very nicely, but compile times are horrible. The scenario:
- compile a unit
- fix problems in the unit (mostly in the implementation part)
- recompile the same unit (takes up to 20 minutes, even without any interface changes)
- fix more problems
- goto 1
I guess there is a bug in GP (probably related to cyclic dependencies) that triggers (too many) recompilations, even when only the implementation part of a unit has changed (or maybe even when nothing has changed).
Anyway (since I have a lot of time available during compile runs) I volunteer:
- to find the GP dependency bug or bugs
- to add a switch to GP to compile only the interface parts of used
units, to speed up compiling a single unit.
I am now implementing for gp:
--compile-file
  Compile a file as fast as possible. Imported units that need
  recompilation are compiled --interface-only, which implies that
  successive linking will fail. Therefore, this option is useful during
  the development phase only. This option implies -c

--check-file
  Check the syntax of a file as fast as possible. Imported units that
  need recompilation are compiled --interface-only. This option implies -c
It mostly works already (I will post a diff later). The time gain is dramatic.
IIUC, this is only necessary because of the bug you suppose in your previous mail (otherwise no other units would have to be recompiled if not changed)? So it might be a better use of time to concentrate on the acutal bug than on temporary work-arounds.
No, I consider this a valuable feature, although the impact may in general not be so dramatic as in my current project, porting an application to GPC with many cyclic dependencies (which I didn't write myself).
Did you already send a reproducible test case (if you can't or don't want to fix the problem in gp yourself)?
I will try to be useful and fix gp myself. Compile times are also spoiled by the quadratic operator search (http://www.gnu-pascal.de/crystal/gpc/en/mail10055.html) and that is something I cannot fix easily myself (at some points in the compilation I see the IDE's progress bar and line counter move line by line, often lines with calculations).
Regards,
Adriaan van Os
Hi all,
Just my two cents: like Mr Van Os, I myself have the problem of cyclic dependencies in many programs I ported from SUN Pascal to GNU Pascal. There are hundreds of separate modules with cyclic dependencies up to the 8th level.
For now, I use make to build the programs because of these dependencies. I have to call it two times to build the interfaces, because there are also cyclic dependencies in some interface sections that import others. Then I call it a third time to build the objects and link the programs. I know this is not the cleanest way but at least it works.
I tried gp and it did not work because of these dependency problems. But at least it was useful, because it helped me remove some of them. However, it would be a big job to remove them all and I have no time to do this for now.
If gp could manage the cyclic dependencies (I think it should continue to display them anyway) I would use it for sure. So I agree with Mr Van Os that it is a valuable feature in this case (porting programs to GNU Pascal).
Although I am not sure if I can completely build the programs with it since they link with many third party libraries and I have not found the way to do it in the gp documentation yet.
Regards
Pascal Viandier pascal@accovia.com
-----Original Message----- From: gpc-owner@gnu.de [mailto:gpc-owner@gnu.de] On Behalf Of Adriaan van Os Sent: November 11, 2005 02:04 To: gpc@gnu.de Subject: Re: compiling with gp (was
Frank Heckenbach wrote:
Adriaan van Os wrote:
Compiling interface and implementation parts separately indeed adds the overhead of importing GPIs twice (unless, of course, the symbol tables are stored in shared memory and passed as a parameter to GPC). However, this applies to a full build. In actual practice, you often are working on one unit at a time (maybe making small changes to type definitions in earlier units). So, what you want is the fastest way to compile *that one* file. Compiling implementation parts of other units is at that moment just a waste of time.
Typically, while developing there will be many more compiiations of single units than full builds. For example, I am porting now a big application to GPC and it is progressing very nicely, but compile times are horrible. The scenario:
- compile a unit
- fix problems in the unit (mostly in the implementation part)
- recompile the same unit (takes up to 20 minutes, even without any interface changes)
- fix more problems
- goto 1
I guess there is a bug in GP (probably related to cyclic dependencies) that triggers (too many) recompilations, even when only the implementation part of a unit has changed (or maybe even when nothing has changed).
Anyway (since I have a lot of time available during compile runs) I volunteer:
- to find the GP dependency bug or bugs
- to add a switch to GP to compile only the interface parts of used
units, to speed up compiling a single unit.
I am now implementing for gp:
--compile-file
  Compile a file as fast as possible. Imported units that need
  recompilation are compiled --interface-only, which implies that
  successive linking will fail. Therefore, this option is useful during
  the development phase only. This option implies -c

--check-file
  Check the syntax of a file as fast as possible. Imported units that
  need recompilation are compiled --interface-only. This option implies -c
It mostly works already (I will post a diff later). The time gain is dramatic.
IIUC, this is only necessary because of the bug you suppose in your previous mail (otherwise no other units would have to be recompiled if not changed)? So it might be a better use of time to concentrate on the acutal bug than on temporary work-arounds.
No, I consider this a valuable feature, although the impact may in general not be so dramatic as in my current project, porting an application to GPC with many cyclic dependencies (which I didn't write myself).
Did you already send a reproducible test case (if you can't or don't want to fix the problem in gp yourself)?
I will try to be useful and fix gp myself. Compile times are also spoiled by the quadratic operator search (http://www.gnu-pascal.de/crystal/gpc/en/mail10055.html) and that is something I cannot fix easily myself (at some points in the compilation I see the IDE's progress bar and line counter move line by line, often lines with calculations).
Regards,
Adriaan van Os
Pascal Viandier wrote:
Just my two cents: Like Mr Van Os, I have myself the problem of cyclic dependencies in many programs I ported from SUN Pascal to GNU Pascal. There are hundreds of separate modules with cyclic dependencies up to 8th level.
To avoid misunderstandings: GP generally should be able to handle cyclic dependencies. I've used it to compile a project with ~200 units, all interdependent (not written by me, obviously :-).
So we don't need to discuss whether to do that.
For now, I use make to build the programs because of these dependencies. I have to call it two times to build the interfaces, because there are also cyclic dependencies in some interface sections that import others.
Cyclic dependencies within interfaces are forbidden, both in Borland and Extended Pascal. (I.e., interface A uses interface B, and interface B uses interface A.) GP will also reject this, and there's really no sane way to do this, as you could easily construct circular definitions and contradictions this way.
What you can do is: Interface A uses interface B, and implementation B uses interface A. Then GP should, in this order, compile interface B, all of A, implementation B.
If that's what you need, then GP should work for you. There may be a bug, perhaps the same one Adriaan found, and then, of course, I'd like to fix it, once I know what it is ...
Although I am not sure if I can completely build the programs with it since they link with many third party libraries and I have not found the way to do it in the gp documentation yet.
Libraries are easy. There are two ways, both work the same with GP as with automake, and the second one also with manual make:
- Put {$L} directives in the source (see e.g., the GMP and RegEx units that come with GPC).
- Add -l options on the command line.
I am now implementing for gp:
--compile-file   Compile a file as fast as possible. Imported units that need recompilation are compiled --interface-only, which implies that subsequent linking will fail. Therefore, this option is useful during the development phase only. This option implies -c.
--check-file     Check the syntax of a file as fast as possible. Imported units that need recompilation are compiled --interface-only. This option implies -c.
It mostly works already (I will post a diff later). The time gain is dramatic.
IIUC, this is only necessary because of the bug you suppose in your previous mail (otherwise no other units would have to be recompiled if not changed)? So it might be a better use of time to concentrate on the actual bug than on temporary work-arounds.
No, I consider this a valuable feature, although the impact may in general not be so dramatic as in my current project, porting an application to GPC with many cyclic dependencies (which I didn't write myself).
Perhaps I don't quite understand it yet. Provided GP makes no mistakes, imports are not recompiled unless necessary. So I suppose you mean the situation where you changed the implementation of an imported unit? But you don't want to compile (and therefore check for syntax errors) this other unit?
I will try to be useful and fix gp myself. Compile times are also spoiled by the quadratic operator search (http://www.gnu-pascal.de/crystal/gpc/en/mail10055.html) and that is something I cannot fix easily myself (at some points in the compilation I see the IDE's progress bar and line counter move line by line, often lines with calculations).
I see. BTW, Waldek are you interested in the general solution of this issue (including overloaded routines)? I'm not too much ATM (there are many more urgent issues). If you aren't either, perhaps I could at least try some quick fix to ease this problem somewhat. -- Still using name-based search, with all its disadvantages, but perhaps storing a list of actually defined operators in the OPERATOR_DECL (this will also be a preparation for the real solution), and then for each operator retrieve the type names and compare them -- I suppose if we get rid of the ACONCAT and get_identifier in the loop this way, this might make things much faster.
Frank
Frank Heckenbach wrote:
No, I consider this a valuable feature, although the impact may in general not be so dramatic as in my current project, porting an application to GPC with many cyclic dependencies (which I didn't write myself).
Perhaps I don't quite understand it yet. Provided GP makes no mistakes, imports are not recompiled unless necessary. So I suppose you mean the situation where you changed the implementation of an imported unit? But you don't want to compile (and therefore check for syntax errors) this other unit?
OK, I will try to explain what I mean. Suppose you have a project like this (without USES clauses in IMPLEMENTATION sections, all very well structured):

Unit A001
Unit A002
....
Unit A134
....
Unit A200
Program P
Say the build time for the whole project is about half an hour or an hour (I am not exaggerating). Now, you are working on Unit A134 happily porting the project to GPC, fixing numerous compilation problems. While doing so, you discover that you need to add a standard compatibility routine to Unit A001 or that you must change a string parameter from VAR to CONST in Unit A002. Doing that would mean ... recompiling Units A001 to A133, in other words waiting half an hour ! But ... with --compile-file you can quickly recompile only the interface sections of Unit A001 to A133 to continue working on Unit A134. You reduce the time you have to wait from half an hour to half a minute. That is a huge gain of time, even in a perfectly well structured project and with an absolutely perfect gp !
I hope this explains why I am so fond of --compile-file and --check-file.
Regards,
Adriaan van Os
Hi,
The schema of the cyclic dependencies I face is:
- Module A implementation section imports Module B
- Module B implementation section imports Module A and Module C
- Module C implementation section imports Module A and Module B
Please note the dependencies are in the implementation sections, not in the interfaces. Although for each module the interface section is in a separate file included in the appropriate implementation file (i.e. A.pas includes A.h but no other file), I think this is not the cause of the problem.
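In unit-skeleton form, the schema might look roughly like this (a hypothetical minimal reconstruction; the separate interface include files are left out):

unit A;
interface
implementation
uses B;
end.

unit B;
interface
implementation
uses A, C;
end.

unit C;
interface
implementation
uses A, B;
end.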
With this, gp says: cyclic interface dependency A.pas -> B.pas -> C.pas and it stops.
Hope this schema will help clarify the situation.
Regards
Pascal pascal@accovia.com
Pascal Viandier wrote:
The schema of the cyclic dependencies I face is:
- Module A implementation section imports Module B
- Module B implementation section imports Module A and Module C
- Module C implementation section imports Module A and Module B (not Module C)
Please note the dependencies are in the implementation sections, not in the interfaces.
Although for each module the interface section is in a separate file included in the appropriate implementation file (i.e. A.pas includes A.h but no other file), I think this is not the cause of the problem. With this, gp says: cyclic interface dependency A.pas -> B.pas -> C.pas and it stops.
Oh yes, it's a bug, thanks for finding it.
Here's a patch (yes, it's a quick fix, but this code will change anyway ...). You need flex to rebuild, or get the archive from my web site, which includes the flex-generated file.
This bug *might* also cause some spurious recompilations, so those of you who noticed any might want to try it as well.
Frank
Frank Heckenbach wrote:
I will try to be useful and fix gp myself. Compile times are also spoiled by the quadratic operator search (http://www.gnu-pascal.de/crystal/gpc/en/mail10055.html) and that is something I cannot fix easily myself (at some points in the compilation I see the IDE's progress bar and line counter move line by line, often lines with calculations).
I see. BTW, Waldek are you interested in the general solution of this issue (including overloaded routines)? I'm not too much ATM (there are many more urgent issues). If you aren't either, perhaps I could at least try some quick fix to ease this problem somewhat. -- Still using name-based search, with all its disadvantages, but perhaps storing a list of actually defined operators in the OPERATOR_DECL (this will also be a preparation for the real solution), and then for each operator retrieve the type names and compare them -- I suppose if we get rid of the ACONCAT and get_identifier in the loop this way, this might make things much faster.
As I wrote, operator search is inconsistent with the rest of GPC. Namely, the operator definition is associated with the type declaration (or, equivalently, the type name). I think that in principle we should still do a search, but only for _supertypes_ of a given type. Here by a supertype I mean a bigger type for ordinal types and an ancestor in the case of objects (maybe we should also consider an undiscriminated schema as a supertype of a discriminated one). I would say that an operator "matches" if the types of its formal parameters are supertypes of the actual parameters.
If multiple operators match the given arguments then we need some extra rules. If there is a most specific operator, then we should use it. Otherwise we have to decide what to do -- there is a semantic problem here, but I expect the list of "closely matching" operators to be short, so almost any method to resolve the final ambiguity is likely to be fast enough.
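For illustration, a hypothetical sketch of these proposed rules, written with the same operator syntax as the foo4 examples later in this thread (the program and the Small type are invented; this shows the proposal, not current GPC behaviour):

program superdemo;
type
  Small = 0..100;   { Integer and LongInt would both be supertypes of Small }
var
  a, b: Small;
operator f (x, y: Integer) = c: Integer;
begin
  c := 1
end;
operator f (x, y: LongInt) = c: Integer;
begin
  c := 2
end;
begin
  { Under the proposed rules, both `f' operators match `a f b' (their formal }
  { types are supertypes of Small); the Integer version is the most specific }
  { match, so it would be the one chosen.                                    }
  writeln (a f b)
end.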
Search on tuples of supertypes still may be slow (especially if we generalize the search to overloaded functions with multiple arguments), so we may need a better algorithm and smarter data structures.
ATM I would prefer to think more before implementing general solution. OTOH I think that I can quickly make the search cleaner and faster by associating operators with types. Namely Apple interfaces generate many alias names for types and GPC spends quite a lot of time searching for all pairs of aliases. Also, for many types there are no operators defined, and current GPC still is doing the full search. A flag on types should be enough to prevent such search.
Waldek Hebisch wrote:
Frank Heckenbach wrote:
I will try to be useful and fix gp myself. Compile times are also spoiled by the quadratic operator search (http://www.gnu-pascal.de/crystal/gpc/en/mail10055.html) and that is something I cannot fix easily myself (at some points in the compilation I see the IDE's progress bar and line counter move line by line, often lines with calculations).
I see. BTW, Waldek are you interested in the general solution of this issue (including overloaded routines)? I'm not too much ATM (there are many more urgent issues). If you aren't either, perhaps I could at least try some quick fix to ease this problem somewhat. -- Still using name-based search, with all its disadvantages, but perhaps storing a list of actually defined operators in the OPERATOR_DECL (this will also be a preparation for the real solution), and then for each operator retrieve the type names and compare them -- I suppose if we get rid of the ACONCAT and get_identifier in the loop this way, this might make things much faster.
As I wrote, operator search is inconsistent with the rest of GPC. Namely, the operator definition is associated with the type declaration (or, equivalently, the type name).
(I agree, but here I was just looking for a quick fix.)
I think that in principle we should still do a search, but only for _supertypes_ of a given type. Here by a supertype I mean a bigger type for ordinal types and an ancestor in the case of objects (maybe we should also consider an undiscriminated schema as a supertype of a discriminated one). I would say that an operator "matches" if the types of its formal parameters are supertypes of the actual parameters.
(This is already somewhat questionable; an alternative would be to say whether they're valid as parameters, in particular assignment-compatible for value parameters. And this easily leads to the question of the types of expressions, e.g. does the sum of two Byte variables match a CInteger parameter? It might not, according to your rule, if the sum is computed as Integer and Integer is larger than CInteger. But if Integer and CInteger are of the same size, it would match. I wouldn't like operator matching to become platform-dependent.)
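A hypothetical sketch of that concern (again only an illustration of the point, not a statement of current behaviour):

program matchdemo;
var
  b1, b2: Byte;
operator f (x, y: CInteger) = c: CInteger;
begin
  c := x + y
end;
begin
  { Whether `(b1 + b2) f (b1 + b2)' matches would depend on the type of the  }
  { intermediate sum: if it is computed as Integer and Integer is larger     }
  { than CInteger, the supertype rule would reject it; if Integer and        }
  { CInteger have the same size, it would match -- i.e. platform-dependent.  }
  writeln ((b1 + b2) f (b1 + b2))
end.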
If multiple operators match the given arguments then we need some extra rules. If there is a most specific operator, then we should use it.
(In a case like the above, this could mean that, depending on the platform, a different operator is silently chosen -- no error on either platform, which seems a bit dangerous to me ...)
Otherwise we have to decide what to do -- there is a semantic problem here, but I expect the list of "closely matching" operators to be short, so almost any method to resolve the final ambiguity is likely to be fast enough.
(I'm more worried about consistent rules here.)
Search on tuples of supertypes still may be slow (especially if we generalize the search to overloaded functions with multiple arguments), so we may need a better algorithm and smarter data structures.
ATM I would prefer to think more before implementing general solution.
That's exactly what I mean. Postpone the more difficult work and controversial issues, and do something quick for the current performance problem now.
OTOH I think that I can quickly make the search cleaner and faster by associating operators with types. Namely Apple interfaces generate many alias names for types and GPC spends quite a lot of time searching for all pairs of aliases. Also, for many types there are no operators defined, and current GPC still is doing the full search. A flag on types should be enough to prevent such search.
But this flag must be consistently preserved in variants, ordinal subtypes, etc. I don't know offhand how hard this is. Even if it's not so easy and we don't do it, I think we can gain some speed, as I described above -- make use of OPERATOR_DECL, also as a step towards the real solution later, and avoid ACONCAT and get_identifier. This should reduce the constant factor considerably (pointer comparisons instead of string operations) and make the complexity dependent on #typenames * #operators instead of #typenames ^ 2. Provided #operators (per symbol) << #typenames (per operator argument type), as I understand is the case here, this should be quite a bit faster.
Frank
Frank Heckenbach wrote:
Waldek Hebisch wrote:
OTOH I think that I can quickly make the search cleaner and faster by associating operators with types. Namely Apple interfaces generate many alias names for types and GPC spends quite a lot of time searching for all pairs of aliases. Also, for many types there are no operators defined, and current GPC still is doing the full search. A flag on types should be enough to prevent such search.
But this flag must be consistently preserved in variants, ordinal subtypes, etc. I don't know offhand how hard this is. Even if it's not so easy and we don't do it, I think we can gain some speed, as I described above -- make use of OPERATOR_DECL, also as a step towards the real solution later, and avoid ACONCAT and get_identifier. This should reduce the constant factor considerably (pointer comparisons instead of string operations) and make the complexity dependent on #typenames * #operators instead of #typenames ^ 2. Provided #operators (per symbol) << #typenames (per operator argument type), as I understand is the case here, this should be quite a bit faster.
Maybe I was not clear enough: I want to associate the definition with the main variant. Keeping the other rules as is, we will have at most 4 searches for operators. The flag may reduce 4 to 0.
Waldek Hebisch wrote:
Frank Heckenbach wrote:
Waldek Hebisch wrote:
OTOH I think that I can quickly make the search cleaner and faster by associating operators with types. Namely Apple interfaces generate many alias names for types and GPC spends quite a lot of time searching for all pairs of aliases. Also, for many types there are no operators defined, and current GPC still is doing the full search. A flag on types should be enough to prevent such search.
But this flag must be consistently preserved in variants, ordinal subtypes, etc. I don't know offhand how hard this is. Even if it's not so easy and we don't do it, I think we can gain some speed, as I described above -- make use of OPERATOR_DECL, also as a step towards the real solution later, and avoid ACONCAT and get_identifier. This should reduce the constant factor considerably (pointer comparisons instead of string operations) and make the complexity dependent on #typenames * #operators instead of #typenames ^ 2. Provided #operators (per symbol) << #typenames (per operator argument type), as I understand is the case here, this should be quite a bit faster.
Maybe I was not clear enough: I want to associate the definition with the main variant. Keeping the other rules as is, we will have at most 4 searches for operators. The flag may reduce 4 to 0.
I'm not quite sure how you mean to do this, without changing existing rules (where the given variant is tried first, so currently one can have different operators for equivalent types of different names).
Say we have variants a and b of Integer, and operators + on pairs of a and of b. Currently they're stored as BPlus_A_A and BPlus_B_B and looked up by name. Associating both operators with the main variant (Integer) would give the same name and thus a conflict. Or am I overlooking something?
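A minimal sketch of that situation (program name and result values invented; the mangled names are the ones described above):

program variants;
type
  a = Integer;
  b = Integer;
operator + (x, y: a) = c: a;
begin
  c := 1   { currently stored under a name like BPlus_A_A }
end;
operator + (x, y: b) = c: b;
begin
  c := 2   { currently stored under a name like BPlus_B_B }
end;
begin
end.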
BTW, I'm not trying to defend current rules, I'd just like to avoid changing them for a quick fix, when we know they'll have to be changed again. In the worst case, affected code would have to be rewritten twice within "short" time.
I was thinking of storing a list of operators defined for a given symbol in a OPERATOR_DECL because (a) we'll probably need this anyway for the real solution and (b) this part should be uncontroversial (the question will be how we use this list later). The main implementation issues about this way are GPI handling (but shouldn't be too hard), and possibly scoping (that's why I started the other thread).
Frank
Frank Heckenbach wrote:
Waldek Hebisch wrote:
Maybe I was not clear enough: I want to associate the definition with the main variant. Keeping the other rules as is, we will have at most 4 searches for operators. The flag may reduce 4 to 0.
I'm not quite sure how you mean to do this, without changing existing rules (where the given variant is tried first, so currently one can have different operators for equivalent types of different names).
Say we have variants a and b of Integer, and operators + on pairs of a and of b. Currently they're stored as BPlus_A_A and BPlus_B_B and looked up by name. Associating both operators with the main variant (Integer) would give the same name and thus a conflict. Or am I overlooking something?
I wrote that I _want_ to change the rules. Two variants of Integer _are_ the same type, so current overloading really breaks normal rules.
A nice example of how strange the current rules are:
program foo4;
type t1 = record i1: integer end;
operator f (x, y : t1) = c : integer;
begin
  writeln('outer f');
  c := 1;
end;
operator f (x, y: char) = c : integer;
begin
  writeln('second f');
  c := 1;
end;
procedure p;
type t1 = char;
var c1, c2 : t1;
begin
  writeln(c1 f c2);
end;
begin
end.
the above gives me:
foo4.p: In procedure `p':
foo4.p:17: error: incompatible type for argument 1 of `f (t1, t1)'
foo4.p:3: error: routine declaration
so the search returns the integer version, even though the arguments are of char type and a char version is defined.
BTW, I'm not trying to defend current rules, I'd just like to avoid changing them for a quick fix, when we know they'll have to be changed again. In the worst case, affected code would have to be rewritten twice within "short" time.
I think that the fundamental rule is: overloading should depend only on the types of the arguments, not on their names.
BTW, it looks like the current rules are a quick hack: we need a piece of information which uniquely identifies a type and is propagated via interfaces. The name of the main variant is a first approximation, but for some types it is not defined. OTOH, function arguments always have names and look nicer in error messages.
I was thinking of storing a list of operators defined for a given symbol in a OPERATOR_DECL because (a) we'll probably need this anyway for the real solution and (b) this part should be uncontroversial (the question will be how we use this list later). The main implementation issues about this way are GPI handling (but shouldn't be too hard), and possibly scoping (that's why I started the other thread).
Well, we need a map: operator x type x type -> function. We may have a global list of all tuples, we can attach list to operator, we can attach list to types, we can use a hash table ... Whatever representation we choose we need to add/remove imported operators as needed.
Waldek Hebisch wrote:
I wrote that I _want_ to change the rules. Two variants of Integer _are_ the same type, so current overloading really breaks normal rules.
A nice example of how strange the current rules are:
As I wrote, I don't need to be convinced how strange current rules are.
BTW, it looks like the current rules are a quick hack:
Yes, they are! Haven't I mentioned this often enough yet (in this and other threads)?
I think there's something strange about this discussion, so let me recapitulate:
There's no question that we need better rules in the long run. But it's also clear, as previous discussions have shown, that there are some controversial issues that need to be resolved, and then it will be some work to implement better rules.
OTOH, Adriaan's problem is not one of strange rules (AFAIK), but a simple performance problem that can probably be solved, or at least alleviated, without long discussions (or could have ... ;-), and with less implementation work.
That's why I initially asked whether you are interested in the general solution of this issue (and have the time and willingness to resolve the issues in discussions, and then do the implementation, I should have added). As you replied: "ATM I would prefer to think more before implementing general solution.", I took this for a no, so I assumed we were talking about the quick fix only.
Anyway, I see 3 options now:
1. Do a quick fix (you or me), without changing the rules. (And do the whole thing sometime later.)
2. Do a quick fix, with changing the rules. I explained in my previous mail why I dislike this. (Of course, you may disagree, but I don't think you've said that/why you think that changing the rules a little now, and more later, would be useful.)
3. Do the whole thing, resolve controversies etc. (you -- I don't have time for it anytime soon), of course with changing the rules.
Which one are we talking about?
Now talking about the whole thing (which I'd actually rather not do now, unless you really want to implement it soon):
we need a piece of information which uniquely identifies a type and is propagated via interfaces. The name of the main variant is a first approximation, but for some types it is not defined.
(And different types can have the same name, say in different modules.)
OTOH function arguments always have names and look nicer in error messages.
Indeed, for error messages we should make some effort to get reasonable names, but it may not have to be perfect. The identification will *then* surely be by types, not by names. That's always been clear to me, and I don't think I've ever said otherwise, not even in previous threads, so why are we discussing this again and again?
I was thinking of storing a list of operators defined for a given symbol in a OPERATOR_DECL because (a) we'll probably need this anyway for the real solution and (b) this part should be uncontroversial (the question will be how we use this list later). The main implementation issues about this way are GPI handling (but shouldn't be too hard), and possibly scoping (that's why I started the other thread).
Well, we need a map: operator x type x type -> function. We may have a global list of all tuples, we can attach list to operator, we can attach list to types, we can use a hash table ... Whatever representation we choose we need to add/remove imported operators as needed.
Then we'd need to know the exact types we search for, so if (later) we want to implement closest-match rules etc., we'd need to normalize types in a way that guarantees that all possibly matching types will be found. Sure, it's possible, but probably not the least-effort solution. (Yes, I know, more effort can be justified by performance issues, but compared to the current way with c n^2 and a large c, maybe anything else will just be fast enough for any reasonable code ...)
I thought about OPERATOR_DECL because we have this as a placeholder anyway (which currently doesn't have much other purpose), and we often look up the operator name before anyway, so it would reduce the search space by the number of operator symbols (not important for a hash table, but for other implementations). And it could take care of scoping with little effort -- if we decide we want that.
Apropos remove imported operators: I suppose you refer to local imports. So we have scoping issues regardless of whether we restrict operator declarations to the global level, don't we?
Frank
Frank Heckenbach wrote:
Waldek Hebisch wrote:
I wrote that I _want_ to change the rules. Two variants of Integer _are_ the same type, so current overloading really breaks normal rules.
A nice example of how strange the current rules are:
As I wrote, I don't need to be convinced how strange current rules are.
BTW, it looks like the current rules are a quick hack:
Yes, they are! Haven't I mentioned this often enough yet (in this and other threads)?
I think there's something strange about this discussion, so let me recapitulate:
There's no question that we need better rules in the long run. But it's also clear, as previous discussions have shown, that there are some controversial issues that need to be resolved, and then it will be some work to implement better rules.
OTOH, Adriaan's problem is not one of strange rules (AFAIK), but a simple performance problem that can probably be solved, or at least alleviated, without long discussions (or could have ... ;-), and with less implementation work.
That's why I initially asked whether you are interested in the general solution of this issue (and have the time and willingness to resolve the issues in discussions, and then do the implementation, I should have added). As you replied: "ATM I would prefer to think more before implementing general solution.", I took this for a no, so I assumed we were talking about the quick fix only.
There is a Grand Design, involving function overloading and objects. It will take more than two weeks to implement. Also, the Grand Design depends on (or interacts with) currently unimplemented features. And of course, there are issues to be resolved before the Grand Design can be called a design. For me, Grand Design = general solution.
To make my answer clearer: I do not accept your dichotomy between a general solution and a quick fix. Or, putting it differently, for me a quick fix is making implemented things work correctly and barfing on unimplemented stuff.
Anyway, I see 3 options now:
1. Do a quick fix (you or me), without changing the rules. (And do the whole thing sometime later.)
My point is: the current rules are so strange that improving speed without changing them requires some real work. Less than I thought, but still comparable to doing things correctly. In particular, one has to search for operators in a very specific order.
I guess that you really mean:
1'. Change the rules and hope that nobody will notice. I admit that in my proposal there is a chance to invalidate some real programs.
2. Do a quick fix, with changing the rules. I explained in my previous mail why I dislike this. (Of course, you may disagree, but I don't think you've said that/why you think that changing the rules a little now, and more later, would be useful.)
The idea was: accept uncontroversial and easy-to-handle cases and reject everything else. In other words, I want a quick fix which fits into the general solution.
Why change the rules now:
1) someone may be tempted to take advantage of the current behaviour; removing the strangeness quickly reduces the risk of breaking such programs
2) the later changes should be compatible (i.e. not change the meaning of existing programs)
3. Do the whole thing, resolve controversies etc. (you -- I don't have time for it anytime soon), of course with changing the rules.
Which one are we talking about?
I certainly do not have time to do now what I consider a general solution. But I want a fix which is part of the general solution, which means that we need to agree on some parts of the general solution. Or we may conclude that the general solution is too controversial to try to implement part of it.
I have actually tried to implement what I proposed in the first post, and I got it to pass the test suite. But now I see problems with scope/modules:
1) my hack will break down when exporting operators acting on built-in types
2) it has similarly strange behaviour when type names are shadowed in inner scopes
I must admit that I do not know how to correctly handle the following program:
program foo4;
type t1 = string(20);
operator f (x, y : t1) = c : integer;
begin
  writeln('outer f');
  c := 1;
end;

procedure p(n : integer);
  type t1 = string(n);
  var c1, c2 : t1;
  operator f (x, y: t1) = c : integer;
  begin
    writeln('second f');
    c := 1;
  end;
  procedure q;
    type t1 = string(20);
    var c1, c2 : t1;
  begin
    writeln(c1 f c2);
  end;
begin
  q;
end;

begin
  p(5);
  p(20);
end.
ATM we just ICE, so we can forbid operators on dynamic types. If we make it legal, then for n = 5 procedure q should see the outer declaration, but for n = 20 the inner declaration should shadow the outer one, which means dispatching at run time.
Now talking about the whole thing (which I'd actually rather not do now, unless you really want to implement it soon):
we need a piece of information which uniquely identifies a type and is propagated via interfaces. The name of the main variant is a first approximation, but for some types it is not defined.
(And different types can have the same name, say in different modules.)
Or just going to inner scope ...
OTOH function arguments always have names and look nicer in error messages.
Indeed, for error messages we should make some effort to get reasonable names, but it may not have to be perfect. The identification will *then* surely be by types, not by names. That's always been clear to me, and I don't think I've ever said otherwise, not even in previous threads, so why are we discussing this again and again?
I was thinking of storing a list of operators defined for a given symbol in a OPERATOR_DECL because (a) we'll probably need this anyway for the real solution and (b) this part should be uncontroversial (the question will be how we use this list later). The main implementation issues about this way are GPI handling (but shouldn't be too hard), and possibly scoping (that's why I started the other thread).
Well, we need a map: operator x type x type -> function. We may have a global list of all tuples, we can attach list to operator, we can attach list to types, we can use a hash table ... Whatever representation we choose we need to add/remove imported operators as needed.
Then we'd need to know the exact types we search for, so if (later) we want to implement closest-match rules etc., we'd need to normalize types in a way that guarantees that all possibly matching types will be found. Sure, it's possible, but probably not the least-effort solution. (Yes, I know, more effort can be justified by performance issues, but compared to the current way with c n^2 and a large c, maybe anything else will just be fast enough for any reasonable code ...)
I thought about OPERATOR_DECL because we have this as a placeholder anyway (which currently doesn't have much other purpose), and we often look up the operator name before anyway, so it would reduce the search space by the number of operator symbols (not important for a hash table, but for other implementations). And it could take care of scoping with little effort -- if we decide we want that.
Apropos remove imported operators: I suppose you refer to local imports. So we have scoping issues regardless of whether we restrict operator declarations to the global level, don't we?
Yes, local imports. We could forbid importing operators in local scopes, but such a restriction looks too kludgy to me (still, it may be good for a quick fix). So we probably should handle scoping.
Concerning OPERATOR_DECL: I agree that attaching a list of operators to the OPERATOR_DECL may give an easy and reasonably fast solution. But I am afraid that for scoping we need a separate list anyway. We also need to attach/detach operators on import, so I am not sure how much such a list buys us.
Frank Heckenbach wrote:
OTOH, Adriaan's problem is not one of strange rules (AFAIK), but a simple performance problem that can probably be solved, or at least alleviated, without long discussions (or could have ... ;-), and with less implementation work.
That's why I initially asked whether you are interested in the general solution of this issue (and have the time and willingness to resolve the issues in discussions, and then do the implementation, I should have added). As you replied: "ATM I would prefer to think more before implementing general solution.", I took this for a no, so I assumed we were talking about the quick fix only.
As I wrote, I would like to have a fix which is part of a general solution. But I must admit that ATM I do not see a really quick fix. I intend to come back to this problem in a few weeks, but if you want to do something in the meantime, just go on.