Frank Heckenbach wrote:
Mirsad Todorovac wrote:
You might think the function is well optimized, since it needs only two comparisons and one table lookup per character checked?
Alas, on every call of DigitValue(), GPC makes a real call to memcpy() to copy the complete ``v'' array!!!
Normal local variables are (by definition!) created, and initialized if they have an initial value, each time the routine is called. To avoid this, give them the `static' attribute (non-standard), or declare them as `const' (non-standard, BP). The latter is obviously preferable since the array really is constant.
BTW, since you only access the array in the range '0' .. 'z', you only need to declare that part and can omit the many `-1' entries. Also, char indices are perfectly all right, so you don't have to use `Ord' here.
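A minimal sketch of the suggestion above, assuming a compiler that accepts BP-style typed constants (GPC and FPC both do): the table becomes a local `const', so it is set up once rather than copied on each call, covers only '0'..'z', and is indexed by Char directly. The program name and the test characters are illustrative; like the original DigitValue, it assumes ASCII.

```pascal
program ConstTableSketch;

{ DigitValue with its table as a local typed constant: no per-call
  copy.  Only '0'..'z' is stored and Char is used as the index.
  ASCII is assumed, as in the original function. }

function DigitValue (Dig: Char): Integer;
const
  v: array ['0' .. 'z'] of Integer =
    ( 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,   { '0'..'9' }
     -1, -1, -1, -1, -1, -1, -1,               { ':'..'@' }
     10, 11, 12, 13, 14, 15, 16, 17, 18, 19,   { 'A'..'J' }
     20, 21, 22, 23, 24, 25, 26, 27, 28, 29,   { 'K'..'T' }
     30, 31, 32, 33, 34, 35,                   { 'U'..'Z' }
     -1, -1, -1, -1, -1, -1,                   { '['..'`' }
     10, 11, 12, 13, 14, 15, 16, 17, 18, 19,   { 'a'..'j' }
     20, 21, 22, 23, 24, 25, 26, 27, 28, 29,   { 'k'..'t' }
     30, 31, 32, 33, 34, 35);                  { 'u'..'z' }
begin
  { Only index the table when Dig lies inside its bounds }
  if (Dig >= '0') and (Dig <= 'z') then
    DigitValue := v[Dig]
  else
    DigitValue := -1
end;

begin
  WriteLn (DigitValue ('7'), ' ', DigitValue ('F'), ' ',
           DigitValue ('z'), ' ', DigitValue ('#'))
end.
```

The 75 entries are exactly the ASCII codes 48 ('0') through 122 ('z'), so characters outside that range never touch the table.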
Just FYI, making ``v'' an array [0..255] of Integer (for aligned access) made it even 10 s slower (probably FSB and cache effects), contrary to what is commonly said,
Probably because of the initialization (see above), which then of course takes 4 or 8 times as long (and which accounts for most of the time anyway).
and the complete code is not a bit faster than this variant:
function DigitValue (Dig: Char): Integer; attribute (inline, const);
var
  d : Integer; attribute (register);
begin
  if (Dig >= '0') and (Dig <= '9') then
    DigitValue := Ord (Dig) - Ord ('0')
  else if (Dig >= 'a') and (Dig <= 'z') then
    DigitValue := Ord (Dig) - Ord ('a') + 10
  else if (Dig >= 'A') and (Dig <= 'Z') then
    DigitValue := Ord (Dig) - Ord ('A') + 10
  else
    DigitValue := -1
end;
... even though this code has six branches.
Only 3 branches (the backend optimizes better than you think; unfortunately, in this case only with `{$B+}', since Boolean short-circuit evaluation requires special handling), which makes the array above look rather questionable ...
My view:
PROGRAM try(output);
CONST
  digits  = ['0'..'9'];
  upcase  = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
             'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'];
  lowcase = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
             'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'];
  ordmaxchar = 255; (* not guaranteed immutable *)
  (* and so far I am char set independent - EBCDIC is fine *)

TYPE
  base = 2..36;

VAR
  digval     : ARRAY[char] OF integer;
  alfamerics : SET OF char;
(* 1--------------1 *)
PROCEDURE initdigval;
VAR c : char;
BEGIN (* initdigval *)
  alfamerics := digits + upcase + lowcase;
  FOR c := chr(0) TO chr(ordmaxchar) DO BEGIN
    digval[c] := -1;
    IF c IN digits THEN digval[c] := ord(c) - ord('0');
  END;
  digval['a'] := 10; digval['A'] := 10;
  digval['b'] := 11; digval['B'] := 11;
  digval['c'] := 12; digval['C'] := 12;
  (* I'm tired -- you get the idea *)
END; (* initdigval *)
(* 1--------------1 *)
(* Now we are forced to get into 'what is a string'.
 * Feel free to adjust to other possibilities.
 * My attitude is that a number can be a substring
 * and that an invalid char signifies the end of
 * that substring. So this routine doesn't deserve
 * to exist; the process should extract a number
 * from a string and indicate the end of the substring. *)
FUNCTION isvalidnumber(VAR s : string; b : base) : boolean;
VAR cv : integer;
BEGIN (* isvalidnumber *)
  cv := digval[s[1]];
  isvalidnumber := (cv < b) AND (cv >= 0);
  (* this statement may indicate that -1 is a poor choice *)
  (* Maybe the default should be MAXINT in initdigval *)
END; (* isvalidnumber *)
and I won't tire you with further code. However, portability should IMO include proper adaptation to char sets, which the above handles. Simple variants can handle languages with other characters, other casing rules, etc. For number conversion we need not make any such allowances, but we do need to spell things out.
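For completeness, here is one way the elided part of initdigval could be filled in. It is a sketch, not the poster's code: the helper `letter' is a hypothetical name, and ordmaxchar = 255 is kept from the post. Every letter pair is assigned explicitly, so nothing assumes the letters are contiguous; only '0'..'9' are, which ISO 7185 guarantees.

```pascal
program InitDigvalSketch;

{ Completes initdigval without relying on letter contiguity, so an
  EBCDIC char set is fine.  'letter' is an illustrative helper. }

const
  ordmaxchar = 255;   { assumed, as in the original post }

var
  digval: array [char] of integer;

{ Give a lower/upper-case pair the same digit value }
procedure letter (lo, up: char; v: integer);
begin
  digval[lo] := v;
  digval[up] := v
end;

procedure initdigval;
var
  c: char;
begin
  { Digits are contiguous by the standard; everything else starts at -1 }
  for c := chr (0) to chr (ordmaxchar) do
    if c in ['0'..'9'] then digval[c] := ord (c) - ord ('0')
    else digval[c] := -1;
  letter ('a', 'A', 10); letter ('b', 'B', 11); letter ('c', 'C', 12);
  letter ('d', 'D', 13); letter ('e', 'E', 14); letter ('f', 'F', 15);
  letter ('g', 'G', 16); letter ('h', 'H', 17); letter ('i', 'I', 18);
  letter ('j', 'J', 19); letter ('k', 'K', 20); letter ('l', 'L', 21);
  letter ('m', 'M', 22); letter ('n', 'N', 23); letter ('o', 'O', 24);
  letter ('p', 'P', 25); letter ('q', 'Q', 26); letter ('r', 'R', 27);
  letter ('s', 'S', 28); letter ('t', 'T', 29); letter ('u', 'U', 30);
  letter ('v', 'V', 31); letter ('w', 'W', 32); letter ('x', 'X', 33);
  letter ('y', 'Y', 34); letter ('z', 'Z', 35)
end;

begin
  initdigval;
  writeln (digval['7'], ' ', digval['f'], ' ', digval['Z'], ' ', digval['?'])
end.
```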
There should be no need for assert statements when subranges are properly restricted, as in the type definition of base above.
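A rough sketch of the routine the comment in isvalidnumber argues for: scan a number in base b and report where it ends, with the subrange type base doing the range restriction instead of an assert. readnumber and digitof are hypothetical names. To keep it short, digit values come from a branch test (ASCII assumed) rather than the digval table, non-digits map to MAXINT as the comment suggests (so `< b' is the only check needed), and overflow is not detected.

```pascal
program ReadNumberSketch;

type
  base = 2..36;

var
  v, n, stop: integer;

{ Digit value of c, or maxint if c is not a digit in any base }
function digitof (c: char): integer;
begin
  if (c >= '0') and (c <= '9') then digitof := ord (c) - ord ('0')
  else if (c >= 'a') and (c <= 'z') then digitof := ord (c) - ord ('a') + 10
  else if (c >= 'A') and (c <= 'Z') then digitof := ord (c) - ord ('A') + 10
  else digitof := maxint
end;

{ Scans s from position start and stops at the first character that
  is not a digit in base b.  Returns that position (length (s) + 1 if
  the number runs to the end); value receives the number and count
  the digits consumed (0 means no number was found). }
function readnumber (s: string; start: integer; b: base;
                     var value, count: integer): integer;
var
  i, d: integer;
  done: boolean;
begin
  value := 0;
  count := 0;
  i := start;
  done := false;
  { The flag avoids indexing s past its end, whatever the $B setting }
  while (i <= length (s)) and not done do
  begin
    d := digitof (s[i]);
    if d < b then
    begin
      value := value * b + d;
      count := count + 1;
      i := i + 1
    end
    else
      done := true
  end;
  readnumber := i
end;

begin
  stop := readnumber ('ff7g', 1, 16, v, n);
  writeln (v, ' ', n, ' ', stop)
end.
```

Here 'g' is not a hex digit, so the scan stops at position 4 after reading ff7 (4087).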
Note that the above table can be used on an EBCDIC machine to translate alfamerics to ASCII.
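One way to make that concrete, as a minimal sketch: since digval gives each alphanumeric its value regardless of its native code, the ASCII code follows from whether the char is a digit, an upper-case letter or a lower-case letter (ASCII '0' = 48, 'A' = 65, 'a' = 97). The names letter, uppers, lowers and toascii are illustrative; the letter sets are built alongside digval instead of using the post's upcase/lowcase constants, and ordmaxchar = 255 is assumed as before.

```pascal
program EbcdicToAsciiSketch;

const
  ordmaxchar = 255;   { assumed, as in the original post }

var
  digval: array [char] of integer;
  uppers, lowers: set of char;

{ Register a letter pair: same digit value, and remember its case }
procedure letter (lo, up: char; v: integer);
begin
  digval[lo] := v;  digval[up] := v;
  lowers := lowers + [lo];  uppers := uppers + [up]
end;

procedure initdigval;
var
  c: char;
begin
  lowers := [];  uppers := [];
  for c := chr (0) to chr (ordmaxchar) do
    if c in ['0'..'9'] then digval[c] := ord (c) - ord ('0')
    else digval[c] := -1;
  letter ('a', 'A', 10); letter ('b', 'B', 11); letter ('c', 'C', 12);
  letter ('d', 'D', 13); letter ('e', 'E', 14); letter ('f', 'F', 15);
  letter ('g', 'G', 16); letter ('h', 'H', 17); letter ('i', 'I', 18);
  letter ('j', 'J', 19); letter ('k', 'K', 20); letter ('l', 'L', 21);
  letter ('m', 'M', 22); letter ('n', 'N', 23); letter ('o', 'O', 24);
  letter ('p', 'P', 25); letter ('q', 'Q', 26); letter ('r', 'R', 27);
  letter ('s', 'S', 28); letter ('t', 'T', 29); letter ('u', 'U', 30);
  letter ('v', 'V', 31); letter ('w', 'W', 32); letter ('x', 'X', 33);
  letter ('y', 'Y', 34); letter ('z', 'Z', 35)
end;

{ ASCII code of an alphanumeric c, or -1 for anything else }
function toascii (c: char): integer;
begin
  if c in ['0'..'9'] then toascii := 48 + digval[c]
  else if c in uppers then toascii := 65 + digval[c] - 10
  else if c in lowers then toascii := 97 + digval[c] - 10
  else toascii := -1
end;

begin
  initdigval;
  writeln (toascii ('0'), ' ', toascii ('Z'), ' ', toascii ('k'), ' ', toascii ('+'))
end.
```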
aside: Frank, I am still mulling a reply to your previous note on initialization - I think you misunderstand my attitude.