Packing strings () - Gpc

13 Jul 2005


      ...
...
If the strings are not known until run time, the most
efficient way is to allocate a large block of storage, and move the
strings in end-to-end, using a separate array to hold either pointers or
indices into the large block.
When the block is full, allocate another large block using New.  There
are two ways to manage the storage.  (1) Extension block method:  Use a
second array to point to the set of large blocks.  Each string will be
accessed using two pointers or two indices.  The first will point to the
particular block, and the second will point to the string within the
block.  (2) Single block method:  Make the new block twice as large as
the previous block, move the existing strings into the new block and
then Dispose the old block.
Very convoluted. A linked list or string collection would do just fine. 
And I am still not sure how you can make the claim that one method or 
the other is the "most efficient" way of doing something.
In this case I should have said "the most efficient ways that I know."
A linked list of strings may or may not be comparably efficient in terms of 
space usage.  In the extension block method, for each block of strings you also 
need a table of integer indices into the block.  The pointers in the linked 
list are probably twice as large as the integer indices, so they take more room.
  In addition, the pointers may need to be aligned, so they may take an 
average of 5.5 bytes, compared to 2 bytes for the indices.  On the other hand, you 
don't know beforehand how many strings you will have, so you might allocate 
the index table larger than is actually needed.
However, the index table offers an advantage.  It can be sorted according to 
the alphabetic order of the strings, so the table can be searched by a binary 
search, or they may be arranged into a heap, which also makes for log(N) time 
searching.  Both of these methods require no extra storage.
On the other hand, the linked list has to be searched serially.  If you chose 
to use a hash table with the linked list, that would occupy even more 
storage.  As I indicated in an earlier posting, the decision to use a hash table, 
heap or tree to speed up access to the strings is independent of the method used 
to store the strings.
To head off any future argument that doubling the size of the single block 
can waste a lot of storage, let me point out that doubling is just an example.  
You could increase it by half, or by 10% each time, at the expense of copying 
the list of strings more frequently.  This is simply a time-vs-space 
trade-off.
Frank Rubin