> A perfect hash table allows for constant search time.
> At runtime, you cannot generally generate a perfect hash table (without overhead that is often not worth it), so O(log N) is usually acceptable.
> But you're explicitly talking about compile-time constant strings, and in this case, you can usually make a perfect hash table. So that's your comparison.
> As I wrote in PM, have you looked at programs such as gperf?
> These are some of the interesting issues when dealing with a lot of compile-time constant strings, rather than squeezing out a byte or two in return for worse performance and unstable programs.
Making a perfect hash requires 7 additional tables: 3 for the independent hashes of the string, 3 to convert the hash values into summands, and one to convert the sums into pointers to the strings. (There is a trick that can reduce this to 5 tables, but it makes the first hashing round slower.)
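For illustration, a lookup under that kind of scheme might look like the sketch below. The names, table sizes, and hash multipliers are all made up, and the real table contents would come from an offline construction that is omitted here:

    /* Sketch of a lookup in a 3-hash perfect-hash scheme: three
       independent hashes of the string index three summand tables,
       and the sum indexes a table of string pointers.  All names,
       sizes, and multipliers are hypothetical; the real table
       contents come from an offline construction not shown here. */
    #include <string.h>

    enum { HRANGE = 64, NKEYS = 32 };   /* assumed table sizes */

    /* A toy string hash; three different multipliers stand in for
       the three "independent" hashes. */
    static unsigned h(const char *s, unsigned mult)
    {
        unsigned v = 0;
        while (*s)
            v = v * mult + (unsigned char)*s++;
        return v % HRANGE;
    }

    static const unsigned g1[HRANGE], g2[HRANGE], g3[HRANGE]; /* summands */
    static const char *const slot[NKEYS];   /* sum -> string pointer */

    const char *lookup(const char *key)
    {
        unsigned sum = (g1[h(key, 31)] + g2[h(key, 37)]
                      + g3[h(key, 41)]) % NKEYS;
        const char *s = slot[sum];
        /* The hash is collision-free only for the stored keys, so an
           unknown key still needs one final comparison. */
        return (s != NULL && strcmp(key, s) == 0) ? s : NULL;
    }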
On the other hand, organizing the indices into a heap, as I suggested, requires no extra storage.
As I pointed out in earlier posts, the issue of whether to use a hash, or a tree, or even a hash tree is independent of how the strings are stored. It just happens that my method of storing the strings allows you to create a heap that speeds up searching without using any additional storage.
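The storage method itself isn't shown in this excerpt, but as a minimal sketch, assuming the string pointers (or indices) are laid out in heap order, the implicit tree can be searched with index arithmetic alone, with the children of slot i at 2i+1 and 2i+2, and no pointers or extra tables:

    /* Minimal sketch of searching an array laid out in heap order as
       an implicit binary search tree: tree[0] is the root, and the
       children of tree[i] sit at tree[2i+1] and tree[2i+2].  The
       layout itself encodes the tree, so no extra storage is needed.
       The heap-order layout is assumed here; the storage details
       from the earlier posts are not shown in this excerpt. */
    #include <string.h>

    int heap_search(const char *const tree[], unsigned n, const char *key)
    {
        unsigned i = 0;
        while (i < n) {
            int c = strcmp(key, tree[i]);
            if (c == 0)
                return (int)i;              /* found at slot i */
            i = 2 * i + (c < 0 ? 1 : 2);    /* descend left or right */
        }
        return -1;                          /* not found */
    }

    /* Example: for the sorted keys "a" < "b" < "c", heap order is
       { "b", "a", "c" }: "b" at the root, "a" left, "c" right. */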
Frank Rubin
Contestcen@aol.com wrote:
>> A perfect hash table allows for constant search time.
>> At runtime, you cannot generally generate a perfect hash table (without overhead that is often not worth it), so O(log N) is usually acceptable.
>> But you're explicitly talking about compile-time constant strings, and in this case, you can usually make a perfect hash table. So that's your comparison.
>> As I wrote in PM, have you looked at programs such as gperf?
>> These are some of the interesting issues when dealing with a lot of compile-time constant strings, rather than squeezing out a byte or two in return for worse performance and unstable programs.
> Making a perfect hash requires 7 additional tables: 3 for the independent hashes of the string, 3 to convert the hash values into summands, and one to convert the sums into pointers to the strings. (There is a trick that can reduce this to 5 tables, but it makes the first hashing round slower.)
Any references for these claims?
gperf needs only one additional table. It doesn't need several "hashing rounds", only one very short hash function. Did you actually look at it (or comparable programs)?
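For reference, the code gperf emits has roughly the shape sketched below: a single asso_values table and a hash that adds the key length to a couple of table entries selected by key characters. Which character positions are used, and the table contents, depend on the concrete key set; the zeros and the empty word list here are placeholders:

    /* Simplified shape of gperf's generated code: one asso_values
       table, and a hash combining the key length with table entries
       selected by a couple of key characters.  The real values and
       character positions are chosen by gperf for the actual key
       set; everything here is a placeholder. */
    #include <string.h>

    static unsigned hash(const char *str, unsigned len)
    {
        static const unsigned char asso_values[256] = { 0 }; /* from gperf */
        return len + asso_values[(unsigned char)str[0]]
                   + asso_values[(unsigned char)str[len - 1]];
    }

    const char *in_word_set(const char *str, unsigned len)
    {
        static const char *const wordlist[] = { "" };        /* from gperf */
        if (len >= 1) {   /* gperf emits min/max length checks first */
            unsigned key = hash(str, len);
            /* The perfect hash gives distinct values for the stored
               keys, so a single strcmp decides membership. */
            if (key < sizeof wordlist / sizeof wordlist[0]
                && strcmp(str, wordlist[key]) == 0)
                return wordlist[key];
        }
        return NULL;
    }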
> As I pointed out in earlier posts, the issue of whether to use a hash, or a tree, or even a hash tree is independent of how the strings are stored. It just happens that my method of storing the strings allows you to create a heap that speeds up searching without using any additional storage.
Often, speed is more important than some constant storage overhead.
Frank