> A perfect hash table allows for constant search time.
> At runtime, you cannot generally generate a perfect hash table (without overhead that is often not worth it), so O(log N) is usually acceptable.
> But you're explicitly talking about compile-time constant strings, and in this case, you can usually make a perfect hash table. So that's your comparison.
> As I wrote in PM, have you looked at programs such as gperf?
> These are some of the interesting issues when dealing with a lot of compile-time constant strings, rather than squeezing out a byte or two in return for worse performance and unstable programs.
Making a perfect hash requires 7 additional tables: 3 for the independent hashes of the string, 3 to convert the hash values into summands, and one to convert the sums into pointers to the strings. (There is a trick that can reduce this to 5 tables, but it makes the first hashing round slower.)
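For illustration, a lookup under that kind of scheme might look like the sketch below. The names, table sizes, and hash multipliers are all made up, and the real table contents would come from an offline construction that is omitted here:

    /* Sketch of a lookup in a 3-hash perfect-hash scheme: three
       independent hashes of the string index three summand tables,
       and the sum indexes a table of string pointers.  All names,
       sizes, and multipliers are hypothetical; the real table
       contents come from an offline construction not shown here. */
    #include <string.h>

    enum { HRANGE = 64, NKEYS = 32 };   /* assumed table sizes */

    /* A toy string hash; three different multipliers stand in for
       the three "independent" hashes. */
    static unsigned h(const char *s, unsigned mult)
    {
        unsigned v = 0;
        while (*s)
            v = v * mult + (unsigned char)*s++;
        return v % HRANGE;
    }

    static const unsigned g1[HRANGE], g2[HRANGE], g3[HRANGE]; /* summands */
    static const char *const slot[NKEYS];   /* sum -> string pointer */

    const char *lookup(const char *key)
    {
        unsigned sum = (g1[h(key, 31)] + g2[h(key, 37)]
                      + g3[h(key, 41)]) % NKEYS;
        const char *s = slot[sum];
        /* The hash is collision-free only for the stored keys, so an
           unknown key still needs one final comparison. */
        return (s != NULL && strcmp(key, s) == 0) ? s : NULL;
    }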
On the other hand, organizing the indices into a heap, as I suggested, requires no extra storage.
As I pointed out in earlier posts, the issue of whether to use a hash, or a tree, or even a hash tree is independent of how the strings are stored. It just happens that my method of storing the strings allows you to create a heap that speeds up searching without using any additional storage.
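The storage method itself isn't shown in this excerpt, but as a minimal sketch, assuming the string pointers (or indices) are laid out in heap order, the implicit tree can be searched with index arithmetic alone, with the children of slot i at 2i+1 and 2i+2, and no pointers or extra tables:

    /* Minimal sketch of searching an array laid out in heap order as
       an implicit binary search tree: tree[0] is the root, and the
       children of tree[i] sit at tree[2i+1] and tree[2i+2].  The
       layout itself encodes the tree, so no extra storage is needed.
       The heap-order layout is assumed here; the storage details
       from the earlier posts are not shown in this excerpt. */
    #include <string.h>

    int heap_search(const char *const tree[], unsigned n, const char *key)
    {
        unsigned i = 0;
        while (i < n) {
            int c = strcmp(key, tree[i]);
            if (c == 0)
                return (int)i;              /* found at slot i */
            i = 2 * i + (c < 0 ? 1 : 2);    /* descend left or right */
        }
        return -1;                          /* not found */
    }

    /* Example: for the sorted keys "a" < "b" < "c", heap order is
       { "b", "a", "c" }: "b" at the root, "a" left, "c" right. */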
Frank Rubin
Contestcen@aol.com wrote:
>> A perfect hash table allows for constant search time.
>> At runtime, you cannot generally generate a perfect hash table (without overhead that is often not worth it), so O(log N) is usually acceptable.
>> But you're explicitly talking about compile-time constant strings, and in this case, you can usually make a perfect hash table. So that's your comparison.
>> As I wrote in PM, have you looked at programs such as gperf?
>> These are some of the interesting issues when dealing with a lot of compile-time constant strings, rather than squeezing out a byte or two in return for worse performance and unstable programs.
> Making a perfect hash requires 7 additional tables: 3 for the independent hashes of the string, 3 to convert the hash values into summands, and one to convert the sums into pointers to the strings. (There is a trick that can reduce this to 5 tables, but it makes the first hashing round slower.)
Any references for these claims?
gperf needs only one additional table. It doesn't need several "hashing rounds", only one very short hash function. Did you actually look at it (or comparable programs)?
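For reference, the code gperf emits has roughly the shape sketched below: a single asso_values table and a hash that adds the key length to a couple of table entries selected by key characters. Which character positions are used, and the table contents, depend on the concrete key set; the zeros and the empty word list here are placeholders:

    /* Simplified shape of gperf's generated code: one asso_values
       table, and a hash combining the key length with table entries
       selected by a couple of key characters.  The real values and
       character positions are chosen by gperf for the actual key
       set; everything here is a placeholder. */
    #include <string.h>

    static unsigned hash(const char *str, unsigned len)
    {
        static const unsigned char asso_values[256] = { 0 }; /* from gperf */
        return len + asso_values[(unsigned char)str[0]]
                   + asso_values[(unsigned char)str[len - 1]];
    }

    const char *in_word_set(const char *str, unsigned len)
    {
        static const char *const wordlist[] = { "" };        /* from gperf */
        if (len >= 1) {   /* gperf emits min/max length checks first */
            unsigned key = hash(str, len);
            /* The perfect hash gives distinct values for the stored
               keys, so a single strcmp decides membership. */
            if (key < sizeof wordlist / sizeof wordlist[0]
                && strcmp(str, wordlist[key]) == 0)
                return wordlist[key];
        }
        return NULL;
    }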
> As I pointed out in earlier posts, the issue of whether to use a hash, or a tree, or even a hash tree is independent of how the strings are stored. It just happens that my method of storing the strings allows you to create a heap that speeds up searching without using any additional storage.
Often, speed is more important than some constant storage overhead.
Frank