On Sun, 8 Jun 2003, Frank Heckenbach wrote:
Mirsad Todorovac wrote:
You might think that the function is very optimized, since it requires only two comparisons and a lookup in table per character checked?
Alas, GPC does a proper call to the real memcpy() function of complete ``v'' array on each call of function DigitValue() !!!
Normal local variables are (by definition!) created each time the routine is called. To avoid this, give them the `static' attribute (non-standard), or declare them as `const' (non-standard, BP). The latter is obviously preferrable since the array is really constant.
``Attribute (static)'' pushes copying out of function, but ``attribute (const)'' is still copying the array in each function call!, giving (i686):
Digitvalue.0: pushl %ebp movl %esp, %ebp subl $296, %esp movl %ebx, -4(%ebp) leal -280(%ebp), %edx movzbl 8(%ebp), %ebx movl %edx, (%esp) movl %ecx, -12(%ebp) movl $256, 8(%esp) movl $.LC0, 4(%esp) call memcpy ;-( movzbl %bl, %edx movl -4(%ebp), %ebx movsbl -280(%edx,%ebp),%eax movl %ebp, %esp popl %ebp ret
instead of ``attribute (static)'' version:
Digitvalue.0: pushl %ebp movl %esp, %ebp subl $4, %esp movzbl 8(%ebp), %edx movsbl V.1(%edx),%eax movl %ebp, %esp popl %ebp ret
Besides this, since array [0..255] can be and probably should be properly aligned on 32-bit boundary, compiler could call (considerably faster) mempcpy_by4, which copies 32-bit-wise, and/or could (at least with -O3) inline the code.
(IMHO the former especially since the compiler knows the length of the array at compile time.)
BTW, since you only access the array in the range '0' .. 'z', you only need to declare this part and can omit the lots of `-1' entries. Also, char indices are perfectly alright, so you don't have to use `Ord' here.
Thanks for reminding me. I chose to remove that if a in ['0'..'z'] ...
Just FYI, making ``v'' array [0..255] of Integer (for aligned access) made it even 10s slower (probably problems with FSB and cache), instead of what is commonly said,
Probably because of the initialization (see above) which takes 4 or 8 times as long then, of course (and which takes most time at all).
No, I am comparing static versions of array [0..255] of ByteInt and array [0..255] of MedInt ... the latter is still 10% slower. IMHO this is because former occupies less space in cache, obviously, which is on 1.1 GHZ/133 MHz FSB machine probably quite relevant.
and complete code is not a bit faster from this variant:
function DigitValue (Dig: Char): Integer; attribute (inline, const); var d : Integer; attribute (register); begin if (Dig >= '0') and (Dig <= '9' ) then DigitValue := Ord (Dig) - Ord ('0') else if (Dig >= 'a') and (Dig <= 'z') then DigitValue := Ord (Dig) - Ord ('a') + 10 else if (Dig >= 'A') and (Dig <= 'Z') then DigitValue := Ord (Dig) - Ord ('A') + 10 else DigitValue := -1 end;
... even though this code has six branches.
Only 3 branches (the backend optimized better than you think -- unfortunately in this case only with `{$B+}', since Boolean shortcuts require special handling) which makes the array above look rather questionable ...
Not completelly ;-)
Here are timings:
mtm017 arr byteint 252.01 mtm017 arr medint 279.63 (+10.96%) mtm017 iteite 286.36 (+13.63%) mtm017 iteite + and_then 285.77 (+13.40%)
(And_then is an equivalent of {B+}, isn't it?)
So, the answer is: using array still *is* faster ;-)
Besides, it is a small function and it's fit for inlining, and
begin Digitvalue := v[ Dig ]; end;
inlines better than all that
if ... then ... else if ... then ... else if ... then ...;
stuff, IMHO. So, we save both time and space :-)
(Once we made the array constant to evade copying on each call, of course.)
Mirsad