Re: GPC preformance kludge with local

9 Jun 2003


      On Sun, 8 Jun 2003, Frank Heckenbach wrote:
...
Mirsad Todorovac wrote:
...
You might think that the function is very optimized, since it requires
only two comparisons and a lookup in table per character checked?
Alas, GPC does a proper call to the real memcpy() function of complete
``v'' array on each call of function DigitValue() !!!
Normal local variables are (by definition!) created each time the
routine is called. To avoid this, give them the `static' attribute
(non-standard), or declare them as `const' (non-standard, BP). The
latter is obviously preferrable since the array is really constant.
``Attribute (static)'' pushes copying out of function, but ``attribute
(const)'' is still copying the array in each function call!, giving
(i686):
Digitvalue.0:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $296, %esp
        movl    %ebx, -4(%ebp)
        leal    -280(%ebp), %edx
        movzbl  8(%ebp), %ebx
        movl    %edx, (%esp)
        movl    %ecx, -12(%ebp)
        movl    $256, 8(%esp)
        movl    $.LC0, 4(%esp)
        call    memcpy                           ;-(
        movzbl  %bl, %edx
        movl    -4(%ebp), %ebx
        movsbl  -280(%edx,%ebp),%eax
        movl    %ebp, %esp
        popl    %ebp
        ret
instead of ``attribute (static)'' version:
Digitvalue.0:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $4, %esp
        movzbl  8(%ebp), %edx
        movsbl  V.1(%edx),%eax
        movl    %ebp, %esp
        popl    %ebp
        ret
Besides this, since array [0..255] can be and probably should be properly
aligned on 32-bit boundary, compiler could call (considerably faster)
mempcpy_by4, which copies 32-bit-wise, and/or could (at least with -O3)
inline the code.
(IMHO the former especially since the compiler knows the length of
the array at compile time.)
...
BTW, since you only access the array in the range '0' .. 'z', you
only need to declare this part and can omit the lots of `-1'
entries. Also, char indices are perfectly alright, so you don't have
to use `Ord' here.
Thanks for reminding me. I chose to remove that if a in ['0'..'z'] ...
...
...
Just FYI, making ``v'' array [0..255] of Integer (for aligned access) made
it even 10s slower (probably problems with FSB and cache), instead of what
is commonly said,
Probably because of the initialization (see above) which takes 4 or
8 times as long then, of course (and which takes most time at all).
No, I am comparing static versions of array [0..255] of ByteInt and
                                      array [0..255] of MedInt ...
the latter is still 10% slower. IMHO this is because former occupies less
space in cache, obviously, which is on 1.1 GHZ/133 MHz FSB machine
probably quite relevant.
...
...
and complete code is not a bit faster from this variant:
function DigitValue (Dig: Char): Integer; attribute (inline, const);
  var d : Integer; attribute (register);
  begin
    if      (Dig >= '0') and (Dig <= '9' ) then
      DigitValue  := Ord (Dig) - Ord ('0')
    else if (Dig >= 'a') and (Dig <= 'z') then
      DigitValue  := Ord (Dig) - Ord ('a') + 10
    else if (Dig >= 'A') and (Dig <= 'Z') then
      DigitValue  := Ord (Dig) - Ord ('A') + 10
    else
      DigitValue := -1
  end;
... even though this code has six branches.
Only 3 branches (the backend optimized better than you think --
unfortunately in this case only with `{$B+}', since Boolean
shortcuts require special handling) which makes the array above look
rather questionable ...
Not completelly ;-)
Here are timings:
mtm017 arr byteint       252.01
mtm017 arr medint        279.63 (+10.96%)
mtm017 iteite            286.36 (+13.63%)
mtm017 iteite + and_then 285.77 (+13.40%)
(And_then is an equivalent of {B+}, isn't it?)
So, the answer is: using array still *is* faster ;-)
Besides, it is a small function and it's fit for inlining, and
begin
     Digitvalue := v[ Dig ];
   end;
inlines better than all that
if ... then ...
   else if ... then ...
   else if ... then ...;
stuff, IMHO. So, we save both time and space :-)
(Once we made the array constant to evade copying on each call, of
course.)
Mirsad
-- 
"I have a dream!" -- Martin Luther King Jr.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

1997

1996

1995

Re: GPC preformance kludge with local