Contestcen@aol.com wrote:
> Making a perfect hash requires 7 additional tables: 3 independent hashes
> of the string, 3 tables to convert the hash values into summands, and one
> table to convert the sums into pointers to the strings. (There is a trick
> that can reduce this to 5 tables, but it makes the first hashing round
> slower.)
Any references for these claims?
gperf needs only one additional table. It doesn't need several "hashing rounds", only one very short hash function. Did you actually look at it (or comparable programs)?
> First, let me clarify some terms. There are two properties of hash
> functions involved here. The first is that every keyword hashes to a
> different value. This is called "collision-free" hashing, or
> "single-probe" hashing. The second property is that the hash function
> for the N keywords maps into a set of N things (such as N pointers, or
> the integers from 0 to N-1). This is called "minimal" hashing. If the
> hash function is both collision-free and minimal, then it is called
> "perfect."
According to other definitions I know (see e.g. Wikipedia), the first one is called "perfect hashing", the second one is called "minimal perfect hashing".
The former is really more important. If you get a few unused slots because you can't achieve minimal perfect hashing, that's probably still much better than your solution for minimal perfect hashing, both in space (if it needs 5 or 7 tables as you say, plus the accompanying code) and in time, obviously.
> I looked up the gperf program, and found that it satisfies the first
> property most of the time, except for large sets of keywords, and the
> second property rarely, only for small keyword sets. So, from my
> perspective, it does not achieve perfect hashing. Also, it appears to
> use 3 tables, not 1.
The one I know produces one hash table, and of course, the array with the strings, so at most 2 tables if you count that way.
> In any case, we are getting off the subject, which was how to store a
> set of varying-length strings with minimal wasted space.
Yawn!
BTW, this is apparently your chosen subject, which no one else here seems to be interested in. The original subject was about runtime storage.
Also, most people here, including the Chief and me, are more interested in efficient access than in saving a small constant number of bytes. The penalties, e.g., for unaligned access, even on systems where it doesn't crash right away, might more than cancel out the savings, through larger code size alone.
> I mentioned that an additional advantage is that its pointer table or
> index table could be arranged to provide O(log N) time searching
> without using any additional tables or space.
And I mentioned that O(log N) is a rather bad bound for strings known at compile time.
> If a practical method for searching the table in minimal time is
> required, then a self-expanding hash table would be a more suitable
> solution, since these can be built during runtime in linear time, with
> moderate wasted space.
For runtime, yes. And one of the issues is how to allocate space for the strings in a non-wasteful way, which was the original question and has long been answered.
For compile-time, you probably want a compile-time built hash table. Of course, this is probably out of the scope of a macro processor, so sorry, your macro processor is just not relevant here.
I understand that you want to advertise it, but it's really getting too much. It might be better if you find suitable applications to refer to -- this means problems that people are really having (not only yourself, apparently; which excludes this topic), and which concern GPC if you post on this list (this excludes certain problems another compiler has with longer strings).
Frank