I don't know the internals of the GPC New procedure, but in various environments where I have worked in the past there was no guarantee that
successive
calls to New would allocate adjacent blocks of storage, hence no guarantee
that
there would be no gaps or wasted space. It all depended on what sequence
of
calls to New and Dispose had occurred earlier.
You can never rely on successive `New' calls to return adjacent blocks of storage. That's not a reliable way to enlarge an array. And I don't know of any compiler for Pascal, or any other language for that matter, where such a thing would work. Sometimes there are special routines for that (in GPC the non-standard ReAllocMem, use with caution), otherwise you have to allocate a new array and copy the existing elements, or use another data structure that doesn't require contiguous memory, such as lists or trees.
I think we have lost the trail here.
Peter's original posting on this subject asked if there was a way to store character strings of different lengths without wasting space for the short strings. Frank Heckenbach suggested that Peter store each string using the New procedure, so each string would get a space its own size. I said that the Pascal Macro Compiler could store the strings without any gaps or wasted space. However, this works only if the strings are known at compile time.
I also pointed out that using New does not assure that there would be no gaps because New does not always allocate consecutive adjacent blocks of storage. In the storage-allocation system that I use most often, storage is allocated in units of 16 bytes, because free storage is kept in a tree structure with each element containing a length, an up pointer and two down pointers. So if you allocated a separate block for each character string there could be a lot of wasted space.
Conclusion: If the strings are known at compile time, the most efficient way to store them without gaps is to use the Pascal Macro Compiler. If the strings are not known until run time, the most efficient way is to allocate a large block of storage, and move the strings in end-to-end, using a separate array to hold either pointers or indices into the large block.
When the block is full, allocate another large block using New. There are two ways to manage the storage. (1) Extension block method: Use a second array to point to the set of large blocks. Each string will be accessed using two pointers or two indices. The first will point to the particular block, and the second will point to the string within the block. (2) Single block method: Make the new block twice as large as the previous block, move the existing strings into the new block and then Dispose the old block.
Frank Rubin
Contestcen@aol.com wrote:
Peter's original posting on this subject asked if there was a way to store character strings of different lengths without wasting space for the short strings. Frank Heckenbach suggested that Peter store each string using the New procedure, so each string would get a space its own size. I said that the Pascal Macro Compiler could store the strings without any gaps or wasted space. However, this works only if the strings are known at compile time.
I also pointed out that using New does not assure that there would be no gaps because New does not always allocate consecutive adjacent blocks of storage. In the storage-allocation system that I use most often, storage is allocated in units of 16 bytes, because free storage is kept in a tree structure with each element containing a length, an up pointer and two down pointers.
That's one method. However, AFAIK, modern memory managers use something more elaborate, such as keeping free blocks of different sizes in separate lists/trees/pages.
So if you allocated a separate block for each character string there could be a lot of wasted space.
OTOH, strings must be aligned at least to `Integer' size. (Accessing the Capacity and Length fields requires alignment on some platforms and is at least more efficient on others; accessing the characters can also be more efficient when aligned.) Does your preprocessor take care of this?
(But again, I'm quite sure that Peter's question referred to runtime storage, so that's all kind of beside the point.)
Conclusion: If the strings are known at compile time, the most efficient way to store them without gaps is to use the Pascal Macro Compiler. If the strings are not known until run time, the most efficient way is to allocate a large block of storage, and move the strings in end-to-end, using a separate array to hold either pointers or indices into the large block.
Not necessarily. There are various data structures such as hashes, lists and trees for a reaons. Each have their particular advantages and disadvantages. Which one is best, depends on the application and is not always easy to decide. Claiming that any particular method is "the most efficient" without qualification doesn't give your statements much credibility, sorry.
PS: Can you please turn off HTML mail. At least you got the quoting right this time. :-)
Frank
At 19:58 +0200 11/7/05, Frank Heckenbach wrote:
(But again, I'm quite sure that Peter's question referred to runtime storage, so that's all kind of beside the point.)
Yep, that's true. And while I appreciate the elegance of the New( s, size ) approach, I'd still be interested to know how, or if, it is possible, at run time, to pack together GPC schema strings in a block of allocated memory, such that:
a) there is not much wasted memory (ie, wasting SizeOf(Integer) is fine, wasting 200+ bytes for a short string is not so good). b) the strings can be accessed directly (with ^String())
New is very clean, but it has a number of disadvantages:
a) Memory managers do not always work well with vast numbers of small allocations. If you know in advance that strings will be added in sequence in large quantities, you can do much better than a general memory manager.
b) Memory allocation can be a reasonably expensive operation.
c) Memory blocks inevitably have some overhead, often as much as 16 bytes, to handle the memory allocator storage information.
Of course, any solution should only be implemented after profiling demonstrates that the simple clean solution is an expensive bottleneck - I'd just like to know if there is a way to pack Schema strings together to have that as an an option. Then I can compare and contrast the expense.
The Pascal Macro Compiler is not rally what I was looking for. It's a clever little chunk of code, but it's not what I had in mind.
Thanks, Peter.
Peter N Lewis wrote:
And while I appreciate the elegance of the New( s, size ) approach, I'd still be interested to know how, or if, it is possible, at run time, to pack together GPC schema strings in a block of allocated memory, such that:
a) there is not much wasted memory (ie, wasting SizeOf(Integer) is fine, wasting 200+ bytes for a short string is not so good).
The latter never happens AFAIK, unless you allocate a string of a fixed "maximum" size.
b) the strings can be accessed directly (with ^String())
New is very clean, but it has a number of disadvantages:
a) Memory managers do not always work well with vast numbers of small allocations.
Do you have details? Of course, it depends on waht you mean by "not always". There probably are some better and some worse, but do you know which systems have bad ones? Fixing the MM would be a much more general solution than a hand-made solution for Pascal, strings, if explicitly used ...
Of course, any solution should only be implemented after profiling demonstrates that the simple clean solution is an expensive bottleneck - I'd just like to know if there is a way to pack Schema strings together to have that as an an option. Then I can compare and contrast the expense.
You can, of course, always create your pointers whichever way you like, taking care of alignment and sizes yourself. To get the size, you can use `SizeOf'. However, since it requires a type, or expressions of this type, it's a little more tricky to use here. This is one solution, perhaps the easiest one (pass the length of the string to be created).
function SizeOfString (Length: Integer): SizeType; type t = String (Length); begin SizeOfString := SizeOf (t) end;
But there's another problem. The string (in particular, the capacity) must be initialized. `New' does this automatically. You can call `Initialize (MyNewPointer^)', but for this to work, the pointer must know the capacity. So it must be declared as such. The following should work. In this example, you can replace the `GetMem' call with anything that points p to a properly aligned memory block of the given size.
program StringNewDemo;
type PString = ^String;
function StringNew (Length: Integer): PString; type t = String (Length); var p: ^t; begin GetMem (p, SizeOf (p^)); Initialize (p^); StringNew := p end;
var p: PString;
begin p := StringNew (7); p^ := 'foo bar'; WriteLn (p^) end.
Alternatively, you can hook into GPC's memory allocation. See GetMemPtr, FreeMemPtr, ReAllocMemPtr in gpc.pas. This way you can redirect *all* de-/allocations, but you can swap these routine pointers any time, to restrict it to certain pointers. (Or, e.g., you could check the requested size, handle small allocations yourself and pass larger to the original routines, which you saved before hooking the pointers. This way, you can handle not only strings, but all smaller allocations. This may or may not be what you want.)
When doing this, you can then allocate your strings simply with `New'.
Note that memory allocated differently must not be freed via the default routines, i.e. either never `Dispose' your strings at all, or hook FreeMemPtr as well. Also note that some library routines may de-/allocate memory, so to avoid confusion better make sure de-/allocation is either always or never hooked while library functions may be called.
Frank
At 2:05 +0200 12/7/05, Frank Heckenbach wrote:
a) there is not much wasted memory (ie, wasting SizeOf(Integer) is fine, wasting 200+ bytes for a short string is not so good).
The latter never happens AFAIK, unless you allocate a string of a fixed "maximum" size.
Which would be what you'd need to do if you wanted to allocate a block of memory containing multiple strings - or at least, that is what I'm trying to avoid while allocating a block of memory.
a) Memory managers do not always work well with vast numbers of small allocations.
Do you have details? Of course, it depends on waht you mean by "not always". There probably are some better and some worse, but do you know which systems have bad ones? Fixing the MM would be a much more general solution than a hand-made solution for Pascal, strings, if explicitly used ...
Memory manages vary from system to system, version to version. I know early Mac memory manages did not like large numbers of small pointers. Of course, that was a different processor and a different operating system to modern Macs. Memory managers are tuned very carefully, but they are very general purpose in nature, it is inevitable they will not be optimum for many particular tasks.
Of course, any solution should only be implemented after profiling demonstrates that the simple clean solution is an expensive bottleneck - I'd just like to know if there is a way to pack Schema strings together to have that as an an option. Then I can compare and contrast the expense.
You can, of course, always create your pointers whichever way you like, taking care of alignment and sizes yourself. To get the size, you can use `SizeOf'. However, since it requires a type, or expressions of this type, it's a little more tricky to use here. This is one solution, perhaps the easiest one (pass the length of the string to be created).
function SizeOfString (Length: Integer): SizeType; type t = String (Length); begin SizeOfString := SizeOf (t) end;
Ok, great. That's what I wanted to know.
I wrote up a test program (attached below). It allocates 10000 x 20 strings, ranging in size from 5 to 24 characters. On my Mac, the New one took about 50% more time to run than the single block allocation one. Caveats:
* The single block size is pre-calculated, to be fair it needs to be calculated based on the sum of the required sizes.
* On the flip side, nothing is ever released, and Disposing all the pointers would certainly take longer than disposing the single block.
* Perhaps most importantly, the New solution took a few minutes to write and run reliably. The Ptr based one took much longer to write and required several corrections to the code before it past the tests. Clearly it is far more complicated, convoluted, and error prone.
I fear I am still stuck in the old days of caring too much about memory and cycles. Mind you, even with a machine running several billion instructions per second, it is amazing how often I am waiting for it to finish some task, so perhaps these are still issues - just issues that should be carefully chosen...
Thanks muchly for the assistance though, I feel I am much more in command of Schemas and the way things work now (both the clean way and the dirty way), although I certainly have a lot of new GPC features to learn to use to simplify some of the crud left over in my code from 68k days! Peter.
program peterZ;
type UInt8 = Cardinal attribute( size = 8 ); UInt8Ptr = ^UInt8; UInt16 = Cardinal attribute( size = 16 ); UInt16Ptr = ^UInt16; UInt32 = Cardinal attribute( size = 32 ); UInt32Ptr = ^UInt32; UInt64 = Cardinal attribute( size = 64 ); UInt64Ptr = ^UInt64; String255 = String(255); String255Ptr = ^String255; StringPtr = ^String;
function TheTime: UInt64; var t: TimeStamp; begin GetTimeStamp( t ); return ((t.Hour * 60 + t.Minute) * 60 + t.Second) * 1000000 + t.Microsecond; end;
const kMaxStrings = 20; kRepeats = 10000; kTotalStrings = kMaxStrings * kRepeats;
var gTheStrings: array[1..kMaxStrings] of String255;
type StringArray = array[1..kTotalStrings] of StringPtr; StringArrayPtr = ^StringArray;
procedure InitTheStrings; var s: String255; i: Integer; begin s := 'test'; for i := 1 to kMaxStrings do begin s := s + Chr(Ord('A') + i - 1); gTheStrings[i] := s; end; end;
procedure TestNew( var sap: StringArrayPtr ); var i, j, p: Integer; begin New(sap); p := 1; for i := 1 to kRepeats do begin for j := 1 to kMaxStrings do begin New( sap^[p], Length( gTheStrings[j] ) ); sap^[p]^ := gTheStrings[j]; Inc(p); end; end; end;
procedure TestPtr( var sap: StringArrayPtr );
procedure StringNew( var storagep: StringPtr; var sp: StringPtr; const s: String ); type StringN = String (Length(s)); StringNPtr = ^StringN; var p: StringNPtr; siz: Integer; begin sp := StringPtr(storagep); p := StringNPtr(storagep); Initialize (p^); p^ := s; siz := (SizeOf(StringN)+3) and not 3; // Round up to four bytes Inc(UInt32(storagep),siz); end;
type StorageSpace = array[1..kTotalStrings*10] of UInt32; // 40 bytes per string var storage: ^StorageSpace; storagep: StringPtr; i, j, p: Integer; begin New(sap); New(storage); storagep := StringPtr(storage); p := 1; for i := 1 to kRepeats do begin for j := 1 to kMaxStrings do begin StringNew( storagep, sap^[p], gTheStrings[j] ); Inc(p); end; end; end;
function CheckSap( sap: StringArrayPtr ): String255; var good: Boolean; i, j, p: Integer; begin good := true; for i := 1 to kRepeats do begin for j := 1 to kMaxStrings do begin good := good and (sap^[(i-1)*kMaxStrings+j]^ = gTheStrings[j]); end; end; if good then begin return 'OK'; end else begin return 'failed'; end; end;
type TestProcType = procedure( var sap: StringArrayPtr );
procedure RunTest( const name: String; TestProc: TestProcType ); var start: UInt64; sap: StringArrayPtr; begin start := TheTime; TestProc( sap ); WriteLn( name, ': ', TheTime - start, ' ', CheckSap( sap ) ); Dispose( sap ); sap := nil; end;
var start: UInt64; sap: StringArrayPtr; begin WriteLn( TheTime ); InitTheStrings; RunTest( 'TestNew', TestNew ); RunTest( 'TestPtr', TestPtr ); RunTest( 'TestNew', TestNew ); RunTest( 'TestPtr', TestPtr ); end.
Peter N Lewis wrote:
- Perhaps most importantly, the New solution took a few minutes to
write and run reliably. The Ptr based one took much longer to write and required several corrections to the code before it past the tests. Clearly it is far more complicated, convoluted, and error prone.
Obviously. ;-)
I fear I am still stuck in the old days of caring too much about memory and cycles. Mind you, even with a machine running several billion instructions per second, it is amazing how often I am waiting for it to finish some task, so perhaps these are still issues - just issues that should be carefully chosen...
Since I got my current machine, I'm often "shocked" how many simultaneous tasks I have to start to keep it busy (or lose overview myself ;-) ...
Just two comments to make your code more portable:
siz := (SizeOf(StringN)+3) and not 3; // Round up to four bytes
Better `SizeOf (Integer) - 1)' instead of 3.
Inc(UInt32(storagep),siz);
Better PtrCard instead of UInt32 (a pointer is not always 32 bit).
Frank
Frank Heckenbach wrote:
Just two comments to make your code more portable:
PS: I think this type cast is unnecessary:
sp := StringPtr(storagep);
Frank
On 11 Jul 2005 at 13:37, Contestcen@aol.com wrote:
[...]
Conclusion: If the strings are known at compile time, the most efficient way to store them without gaps is to use the Pascal Macro Compiler.
A very bold claim indeed - and I am not sure how you can make such a claim, without seeing every string-handling routine that every programmer has written.
If the strings are not known until run time, the most efficient way is to allocate a large block of storage, and move the strings in end-to-end, using a separate array to hold either pointers or indices into the large block.
When the block is full, allocate another large block using New. There are two ways to manage the storage. (1) Extension block method: Use a second array to point to the set of large blocks. Each string will be accessed using two pointers or two indices. The first will point to the particular block, and the second will point to the string within the block. (2) Single block method: Make the new block twice as large as the previous block, move the existing strings into the new block and then Dispose the old block.
Very convoluted. A linked list or string collection would do just fine. And I am still not sure how you can make the claim that one method or the other is the "most efficient" way of doing something.
Best regards, The Chief -------- Prof. Abimbola A. Olowofoyeku (The African Chief) web: http://www.greatchief.plus.com/