If the strings are not known until run time, the most efficient way is to allocate a large block of storage, and move the strings in end-to-end, using a separate array to hold either pointers or indices into the large block.
When the block is full, allocate another large block using New. There are two ways to manage the storage. (1) Extension block method: Use a second array to point to the set of large blocks. Each string will be accessed using two pointers or two indices. The first will point to the particular block, and the second will point to the string within the block. (2) Single block method: Make the new block twice as large as the previous block, move the existing strings into the new block and then Dispose the old block.
Very convoluted. A linked list or string collection would do just fine. And I am still not sure how you can make the claim that one method or the other is the "most efficient" way of doing something.
In this case I should have said "the most efficient ways that I know."
A linked list of strings may or may not be comparably efficient in terms of space usage. In the extension block method, for each block of strings you also need a table of integer indices into the block. The pointers in the linked list are probably twice as large as the integer indices, so they take more room. In addition, the pointers may need to be aligned, so they may take an average of 5.5 bytes, compared to 2 bytes for the indices. On the other hand, you don't know beforehand how many strings you will have, so you might allocate the index table larger than is actually needed.
However, the index table offers an advantage. It can be sorted according to the alphabetic order of the strings, so the table can be searched by a binary search, or they may be arranged into a heap, which also makes for log(N) time searching. Both of these methods require no extra storage.
On the other hand, the linked list has to be searched serially. If you chose to use a hash table with the linked list, that would occupy even more storage. As I indicated in an earlier posting, the decision to use a hash table, heap or tree to speed up access to the strings is independent of the method used to store the strings.
To head off any future argument that doubling the size of the single block can waste a lot of storage, let me point out that doubling is just an example. You could increase it by half, or by 10% each time, at the expense of copying the list of strings more frequently. This is simply a time-vs-space trade-off.
Frank Rubin
Contestcen@aol.com wrote:
However, the index table offers an advantage. It can be sorted according to the alphabetic order of the strings, so the table can be searched by a binary search, or they may be arranged into a heap, which also makes for log(N) time searching.
A perfect hash table allows for constant search time.
At runtime, you cannot generally generate a perfect hash table (without overhead that is often not worth it), so O(log N) is usually ok.
But you're explicitly talking about compile-time constant strings, and in this case, you can usually make a perfact hash table. So that's your comparison.
As I wrote in PM, have you looked at programs such as gperf?
These are some of the interesting issues when dealing with a lot of compile-time constant strings, rather than squeezing out a byte or two in return for worse performance and unstable programs.
Frank