This issue came up on the MacPascal list, but since I recently ran across an RTS thread safety issue, I thought I'd post it here too, at least for reference if not for anything else.
There is a multi-platform multi-threading unit for gpc, see http://www.gnu-pascal.de/crystal/gpc/en/mail10658.html (but also read http://www.gnu-pascal.de/crystal/gpc/en/thread10512.html).
One thing I didn't see mentioned in the thread, but I discovered recently is that the routines like ReadStr/WriteStr are not thread safe. In particular they use a global variable (LastReadWriteStrFDR) for the file handle internally (to avoid re-allocating it each time), which is not MP thread safe. Also, they use an IOResult global variable (InOutRes variable).
To properly handle MP threading, the RTS would, at the least, require everything to have a returned error result, rather than using IOResult, which would be a big change (although this would be desirable for New as well).
This change would be incompatible with existing RTS routines, although it could possibly be accomplished by placing a thread safe layer underneath the thread-unsafe standard RTS API, for example (very loose pseudo code):
procedure WriteStr( xxx )
  var error: yyy;
begin
  fh := GetReadWriteStrFDR;
  error := WriteStrSafe( fh, xxx );
  InOutRes := error;
  conditionally exit with a runtime error
end;
It would also likely require additions/changes to the way the compiler interacted with the RTS.
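The layering idea above can be sketched in C, assuming a core routine that reports errors only through its return value and a thin traditional wrapper that copies the result into a global. All names here (write_str_safe, write_int_str, io_result, in_out_res) are hypothetical stand-ins, not GPC RTS routines:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the RTS-wide InOutRes variable. */
static int in_out_res = 0;

/* Thread-safe core: touches only its arguments and reports failure
   through the return value (0 = success, nonzero = error). */
int write_str_safe(char *dest, size_t capacity, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int needed = vsnprintf(dest, capacity, fmt, args);
    va_end(args);
    if (needed < 0)
        return 1;                 /* formatting error */
    if ((size_t)needed >= capacity)
        return 2;                 /* result did not fit and was truncated */
    return 0;
}

/* Traditional, thread-unsafe entry point layered on top: it only
   copies the returned error into the shared global. */
void write_int_str(char *dest, size_t capacity, int value)
{
    in_out_res = write_str_safe(dest, capacity, "%d", value);
}

/* IOResult-style accessor: returns the last error and resets it. */
int io_result(void)
{
    int result = in_out_res;
    in_out_res = 0;
    return result;
}
```

Threaded code would call write_str_safe directly and never touch in_out_res. Non-threaded code keeps the familiar interface at the cost of one extra call.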
Any way you slice it, it'd be a fair amount of work to be entirely thread safe.
Enjoy, Peter.
Peter N Lewis wrote:
One thing I didn't see mentioned in the thread, but I discovered recently is that the routines like ReadStr/WriteStr are not thread safe. In particular they use a global variable (LastReadWriteStrFDR) for the file handle internally (to avoid re-allocating it each time), which is not MP thread safe. Also, they use an IOResult global variable (InOutRes variable).
That's right. And there are more global variables in the RTS, including user-visible ones such as `Input' and `Output', and internal ones such as `FDRList'. Though a few are only initialized once (e.g., by the special RTS command-line options) and never changed, so probably harmless. Others such as `ExpEpsReal' in `Ln1Plus' are initialized on demand, so they can be shared, but initialization has to avoid races (it doesn't currently, and I'm not sure if just moving the `Inited := True' statement would be really safe -- it may need a mutex in the worst case).
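For the on-demand initialization case, one possible shape is POSIX pthread_once, which guarantees that the initializer runs exactly once and that its writes are visible to every later caller. Merely reordering a flag assignment gives no such guarantee, since the compiler or CPU may reorder the stores. This is a sketch only: exp_eps_real, init_exp_eps and get_exp_eps are hypothetical names, the value is a placeholder, and it assumes the RTS may depend on pthreads, which the thread itself questions:

```c
#include <float.h>
#include <pthread.h>

static pthread_once_t eps_once = PTHREAD_ONCE_INIT;
static double exp_eps_real;   /* hypothetical analogue of ExpEpsReal */
static int init_calls = 0;    /* only here to show the initializer runs once */

static void init_exp_eps(void)
{
    exp_eps_real = DBL_EPSILON;   /* placeholder for the real computation */
    init_calls++;
}

/* Safe to call from any thread: the first caller initializes,
   all other callers wait until initialization has finished. */
double get_exp_eps(void)
{
    pthread_once(&eps_once, init_exp_eps);
    return exp_eps_real;
}

int get_init_calls(void)
{
    return init_calls;
}
```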
So I think we should start with a list of RTS variables and classify them to find out how to deal with each. Did you start making such a list already perhaps?
To properly handle MP threading, the RTS would, at the least, require everything to have a returned error result, rather than using IOResult, which would be a big change (although this would be desirable for New as well).
This change would be incompatible with existing RTS routines,
And unfortunately also with existing Pascal standards (for `New') or dialects (for `IOResult').
although it could possibly be accomplished by placing a thread safe layer underneath the thread-unsafe standard RTS API, for example (very loose pseudo code):
procedure WriteStr( xxx )
  var error: yyy;
begin
  fh := GetReadWriteStrFDR;
  error := WriteStrSafe( fh, xxx );
  InOutRes := error;
  conditionally exit with a runtime error
end;
It would also likely require additions/changes to the way the compiler interacted with the RTS.
This way would not be BP compatible, since the raising of runtime errors must be done in the caller as it depends on the {$I+} setting (at the place of call). Therefore (and because BP allows direct access to InOutRes), it probably has to be a global variable. -- Or at least look like one. The compiler could rewrite read and write usage of it, but what about passing it by reference? BP allows this, and I'm not sure this isn't done in practice (ATM I'm not even sure WRT my own code, though I could check that).
If we look at C, they have a similar issue with `errno'. AFAIK, thread libraries rewrite it (via a macro) to something like `(foo^)' (in Pascal syntax) where foo is a thread-dependent function or expression that returns the address of a per-thread variable. This allows all kinds of usage including reference passing (address taking in C), so it might work with some effort.
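The errno-style trick can be sketched as follows, assuming POSIX thread-specific data. The macro name InOutRes and the helpers in_out_res_location, clear_error and value_after_worker are hypothetical and chosen only to mirror the discussion. Because the macro expands to a dereferenced pointer, both assignment and address taking (passing by reference) work:

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t res_key;
static pthread_once_t res_key_once = PTHREAD_ONCE_INIT;

static void make_res_key(void)
{
    /* free() acts as destructor for a thread's slot when it terminates */
    pthread_key_create(&res_key, free);
}

/* Returns the address of the calling thread's private error variable,
   allocating it on first use. */
int *in_out_res_location(void)
{
    pthread_once(&res_key_once, make_res_key);
    int *slot = pthread_getspecific(res_key);
    if (slot == NULL) {
        slot = calloc(1, sizeof *slot);
        if (slot == NULL)
            abort();
        pthread_setspecific(res_key, slot);
    }
    return slot;
}

/* Looks like a global variable to the rest of the program. */
#define InOutRes (*in_out_res_location())

/* Takes the variable by reference, as BP code may do. */
void clear_error(int *res)
{
    *res = 0;
}

static void *worker(void *unused)
{
    (void)unused;
    InOutRes = 7;   /* changes only the worker thread's copy */
    return NULL;
}

/* Sets the caller's copy, lets another thread write its own copy,
   then returns the caller's copy again (or -1 if no thread started). */
int value_after_worker(void)
{
    pthread_t t;
    InOutRes = 123;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return InOutRes;
}
```

The price is a function call on every access, which is the overhead discussed elsewhere in this thread.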
Any way you slice it, it'd be a fair amount of work to be entirely thread safe.
I suppose so. It's probably obvious from previous mails that I'm not really a fan of multithreading -- so far, in every case others would have (or have) said requires threading, I solved it with either multiplexed I/O (poll, select) or separate processes (with separate data) -- which are not without problems, either, but to me they seemed easier to handle. I just mention this to say that I'm not going to spend much effort in this direction myself.
Anyway, before someone starts working on it, we should carefully discuss how to do it, and weigh the advantages and disadvantages.
One disadvantage I see with your approach is that it adds overhead, even to non-threaded programs, something I obviously don't like. I also wouldn't like requiring a thread library to start with. (I've had bad experience trying to even build some foreign programs that required particular thread libraries ... incompatible versions and what not.) This probably means we'll need two versions of the RTS library, though, of course, we should be able to build them from the same source, using conditionals -- or even a compiler flag (we might need one for the compiler/RTS interface anyway, as you say, so perhaps we can test this flag with {$ifopt} in the RTS, but that's just a detail).
We should also, as far as possible, concentrate the differences in as few places as possible, even if this means using some macros or other ugly tricks. But first of all, the fewer variables we need to deal with, the easier it will become.
Of course, apart from variables, we have to look at non-thread-safe library routines. (I'm not sure which are -- what about malloc etc. for a start? Does libc make it thread-safe already?)
Frank
At 1:54 +0100 24/2/06, Frank Heckenbach wrote:
So I think we should start with a list of RTS variables and classify them to find out how to deal with each. Did you start making such a list already perhaps?
It's a good idea. No, I haven't made a start.
My solution in general is simply not to use the RTS at all (so, for example, I just use my own NumToStr routine instead of using ReadStr to parse numbers, and I've always used my own memory allocation routines (except for New(obj), which can't be avoided) and file access routines).
This change would be incompatible with existing RTS routines,
And unfortunately also with existing Pascal standards (for `New') or dialects (for `IOResult').
Yes, that was really what I meant to say. Incompatible with not only the existing RTS (and hence programs using it), but existing Pascal standards (and hence programs using them being ported to GPC).
although it could possibly be accomplished by placing a thread safe layer underneath the thread-unsafe standard RTS API, for example (very loose pseudo code):
procedure WriteStr( xxx )
  var error: yyy;
begin
  fh := GetReadWriteStrFDR;
  error := WriteStrSafe( fh, xxx );
  InOutRes := error;
  conditionally exit with a runtime error
end;
It would also likely require additions/changes to the way the compiler interacted with the RTS.
This way would not be BP compatible, since the raising of runtime errors must be done in the caller as it depends on the {$I+} setting (at the place of call).
My point was more that we could implement a thread safe RTS underneath a thread-unsafe API. People needing thread safeness could then use the thread safe API to the RTS instead of the thread unsafe standard API.
Therefore (and because BP allows direct access to InOutRes), it probably has to be a global variable. -- Or at least look like one. The compiler could rewrite read and write usage of it, but what about passing it by reference? BP allows this, and I'm not sure this isn't done in practice (ATM I'm not even sure WRT my own code, though I could check that).
If we look at C, they have a similar issue with `errno'. AFAIK, thread libraries rewrite it (via a macro) to something like `(foo^)' (in Pascal syntax) where foo is a thread-dependent function or expression that returns the address of a per-thread variable. This allows all kinds of usage including reference passing (address taking in C), so it might work with some effort.
Yes, I believe errno is handled that way, as a macro to access a per-thread stored variable.
We could presumably do the same thing with InOutRes and some compiler tricks (so the compiler silently converts InOutRes := 123 to ThreadSetInOutRes( 123 ) for example). But that requires the RTS knowing about per-thread variables, so I'm not sure it is the right way to go.
Any way you slice it, it'd be a fair amount of work to be entirely thread safe.
I suppose so. It's probably obvious from previous mails that I'm not really a fan of multithreading -- so far, in every case others would have (or have) said requires threading, I solved it with either multiplexed I/O (poll, select) or separate processes (with separate data) -- which are not without problems, either, but to me they seemed easier to handle. I just mention this to say that I'm not going to spend much effort in this direction myself.
Same here, but for different reasons - I use multithreading, but don't use the RTS, so it's not a high priority for me either.
One disadvantage I see with your approach is that it adds overhead, even to non-threaded programs, something I obviously don't like. I also wouldn't like requiring a thread library to start with. (I've had bad experience trying to even build some foreign programs that required particular thread libraries ... incompatible versions and what not.) This probably means we'll need two versions of the RTS library, though, of course, we should be able to build them from the same source, using conditionals -- or even a compiler flag (we might need one for the compiler/RTS interface anyway, as you say, so perhaps we can test this flag with {$ifopt} in the RTS, but that's just a detail).
I agree that the RTS should not depend on the thread library, especially because it varies so much and there is no single thread library (there are at least three different ones on Mac OS, probably more). Similarly, there are different APIs for different mutexes and such.
That's why I'd suggest rather than have the RTS become thread aware, instead have the RTS provide an API that is inherently thread safe because it has kicked the thread-unsafeness up to the programmer. They can then cheerfully ignore the issue if they aren't threaded, or handle it appropriately for their threading architecture if they are.
As far as overhead, generally I would expect this to be very minimal, since the typical requirement is that the thread safe RTS API does part of the job, and the thread unsafe standard API does the rest in the same way it does now. So it probably is not much more than an extra procedure call overhead, though obviously there may be worse cases than that.
We should also, as far as possible, concentrate the differences in as few places as possible, even if this means using some macros or other ugly tricks. But first of all, the fewer variables we need to deal with, the easier it will become.
Of course, apart from variables, we have to look at non-thread-safe library routines. (I'm not sure which are -- what about malloc etc. for a start? Does libc make it thread-safe already?)
That varies from system to system. Most everything under Mac OS X is thread safe now, but not all of it was in earlier releases.
This is another issue which makes the whole problem quite hard. Saying the RTS (or even some API of it) is thread safe is probably not possible without qualifying what systems it is running on.
Anyway, I wasn't really suggesting that a major exercise be undertaken to make the RTS thread safe, more to ensure people are aware that it isn't (and to keep in people's minds that those parts of the RTS that cannot be avoided, e.g. parts that the compiler calls without direct knowledge by the programmer, really need to be thread safe).
Enjoy, Peter.
Peter N Lewis wrote:
At 1:54 +0100 24/2/06, Frank Heckenbach wrote:
So I think we should start with a list of RTS variables and classify them to find out how to deal with each. Did you start making such a list already perhaps?
It's a good idea. No, I haven't made a start.
My solution in general is simply not to use the RTS at all (so, for example, I just use my own NumToStr routine instead of using ReadStr to parse numbers, and I've always used my own memory allocation routines (except for New(obj), which can't be avoided) and file access routines).
As you prefer, but since many features, including most I/O, memory allocation and string routines, call the RTS internally, I'm not sure I'd call the remains really Pascal. ;-) And we don't actually guarantee that any particular built-in feature does/will not use RTS routines. The RTS is tied rather strictly to the compiler (unlike C and libc, e.g. -- more like GCC and the internally called libgcc.a).
This change would be incompatible with existing RTS routines,
And unfortunately also with existing Pascal standards (for `New') or dialects (for `IOResult').
Yes, that was really what I meant to say. Incompatible with not only the existing RTS (and hence programs using it), but existing Pascal standards (and hence programs using them being ported to GPC).
Or newly written for GPC, using any of those standard or dialect features, which includes pretty much everything of my own code, for example ...
although it could possibly be accomplished by placing a thread safe layer underneath the thread-unsafe standard RTS API, for example (very loose pseudo code):
procedure WriteStr( xxx )
  var error: yyy;
begin
  fh := GetReadWriteStrFDR;
  error := WriteStrSafe( fh, xxx );
  InOutRes := error;
  conditionally exit with a runtime error
end;
It would also likely require additions/changes to the way the compiler interacted with the RTS.
This way would not be BP compatible, since the raising of runtime errors must be done in the caller as it depends on the {$I+} setting (at the place of call).
My point was more that we could implement a thread safe RTS underneath a thread-unsafe API. People needing thread safeness could then use the thread safe API to the RTS instead of the thread unsafe standard API.
For user-called routines, sure. But IMHO the tricky parts are the compiler-called ones such as your example WriteStr, and of course, the whole I/O-result handling (which depends to a large degree on compiler magic).
Therefore (and because BP allows direct access to InOutRes), it probably has to be a global variable. -- Or at least look like one. The compiler could rewrite read and write usage of it, but what about passing it by reference? BP allows this, and I'm not sure this isn't done in practice (ATM I'm not even sure WRT my own code, though I could check that).
If we look at C, they have a similar issue with `errno'. AFAIK, thread libraries rewrite it (via a macro) to something like `(foo^)' (in Pascal syntax) where foo is a thread-dependent function or expression that returns the address of a per-thread variable. This allows all kinds of usage including reference passing (address taking in C), so it might work with some effort.
Yes, I believe errno is handled that way, as a macro to access a per-thread stored variable.
We could presumably do the same thing with InOutRes and some compiler tricks (so the compiler silently converts InOutRes := 123 to ThreadSetInOutRes( 123 ) for example). But that requires the RTS knowing about per-thread variables, so I'm not sure it is the right way to go.
This way wouldn't cope with reference passing (address taking), so we'd need something more like `ThreadGetInOutResPointer^ := 123'. But according to Waldek, this won't be necessary when the backend supports per-thread variables.
That's why I'd suggest rather than have the RTS become thread aware, instead have the RTS provide an API that is inherently thread safe because it has kicked the thread-unsafeness up to the programmer.
For some parts, this could work. And for some, we can perhaps even eliminate the problems. (Some global variables might be avoidable, though probably not many, as I usually don't use them unwarranted. Some are only there for optimization, such as LastReadWriteStrFDR, and some are mostly read-only such as ExpEpsReal, perhaps we can do something easier with them.)
For non-thread-safe routines only called by the user we can probably add thread-safe interfaces where necessary (and perhaps declare obsolete and later remove the other ones, unless they are required for compatibility with something).
But again, the real problems are the compiler-called ones. Just one example: the I/O system keeps a list of all currently open files. This list must obviously be global. So it seems its access should be somehow synchronized (mutex etc.) -- which, in this particular case, is not a matter of efficiency, as opening and closing are "rare" and costly operations anyway, but it raises the questions of thread library selection and dependence. Unless perhaps we provide hooks for the synchronization code that default to nops and that the user has to fill in if threaded. This is just a quick idea, and may be a kludge, I'd have to think about it more. (And whether it's really viable might depend on how many such areas there really are, which I don't know offhand.)
BTW, I'm aware of the irony -- hooks = global (pointer) variables, which are generally to be avoided in threading. (But obviously, these hooks would have to be set before starting threads, and not changed within threads, so they're not actually a problem AFAICS.)
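The hook idea might look roughly like this in C. It is only a sketch: rts_set_lock_hooks, register_file, count_open_files and the file_node list are hypothetical and merely stand in for the RTS open-file list. The library side defaults to no-ops and needs no thread library. The last part shows what a threaded program could install, here using a pthread mutex as one possible choice:

```c
#include <pthread.h>
#include <stddef.h>

/* --- library side: no thread library needed --- */
static void no_op(void) {}

static void (*rts_lock_hook)(void) = no_op;
static void (*rts_unlock_hook)(void) = no_op;

/* Must be called before any threads are started. */
void rts_set_lock_hooks(void (*lock)(void), void (*unlock)(void))
{
    rts_lock_hook = lock ? lock : no_op;
    rts_unlock_hook = unlock ? unlock : no_op;
}

struct file_node {
    int handle;
    struct file_node *next;
};

static struct file_node *open_files = NULL;   /* global list of open files */

void register_file(struct file_node *node)
{
    rts_lock_hook();
    node->next = open_files;
    open_files = node;
    rts_unlock_hook();
}

int count_open_files(void)
{
    int n = 0;
    rts_lock_hook();
    for (struct file_node *p = open_files; p != NULL; p = p->next)
        n++;
    rts_unlock_hook();
    return n;
}

/* --- user side: what a threaded program could plug in --- */
static pthread_mutex_t files_mutex = PTHREAD_MUTEX_INITIALIZER;
static int lock_calls = 0;   /* only to make the effect observable */

static void lock_files(void)
{
    pthread_mutex_lock(&files_mutex);
    lock_calls++;
}

static void unlock_files(void)
{
    pthread_mutex_unlock(&files_mutex);
}
```

A non-threaded program pays only for two indirect calls to empty functions per list operation.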
Of course, apart from variables, we have to look at non-thread-safe library routines. (I'm not sure which are -- what about malloc etc. for a start? Does libc make it thread-safe already?)
That varies from system to system.
So we can only assume what's guaranteed by POSIX etc.
This is another issue which makes the whole problem quite hard. Saying the RTS (or even some API of it) is thread safe is probably not possible without qualifying what systems it is running on.
POSIX-compatible (well, mostly ;-) systems, as usual.
Anyway, I wasn't really suggesting that a major exercise be undertaken to make the RTS thread safe,
Since several people have asked for this over time, it suggests it should perhaps be done sometime. But I'm not one of them. ;-)
more to ensure people are aware that it isn't (and to keep in people's minds that those parts of the RTS that cannot be avoided, e.g. parts that the compiler calls without direct knowledge by the programmer, really need to be thread safe).
Which is probably the hardest part anyway, unfortunately ...
Frank
Peter N Lewis wrote:
This issue came up on the MacPascal list, but since I recently ran across an RTS thread safety issue, I thought I'd post it here too, at least for reference if not for anything else.
There is a multi-platform multi-threading unit for gpc, see http://www.gnu-pascal.de/crystal/gpc/en/mail10658.html (but also read http://www.gnu-pascal.de/crystal/gpc/en/thread10512.html).
One thing I didn't see mentioned in the thread, but I discovered recently is that the routines like ReadStr/WriteStr are not thread safe. In particular they use a global variable (LastReadWriteStrFDR) for the file handle internally (to avoid re-allocating it each time), which is not MP thread safe. Also, they use an IOResult global variable (InOutRes variable).
To properly handle MP threading, the RTS would, at the least, require everything to have a returned error result, rather than using IOResult, which would be a big change (although this would be desirable for New as well).
There is no need to eliminate variables. GCC (starting from version 3.3) supports thread local variables. At least on Linux access to thread local variables is reasonably cheap (only slightly more expensive than normal variables). So, one can turn IOResult into a thread local variable and use it as before.
Of course, for this also GPC should support thread local variables. ATM GPC does not support thread local variables, mainly because we want to have the same feature set with all backends and older backends (IIRC 3.2 and earlier) do not support them.
Also, thread safety may have considerable cost on single-threaded programs. For example Java and glibc try to be thread safe, and in effect some single-threaded programs run about 5 times slower than with thread unsafe versions. I think that we shall pay the price, but I do not know what others think.
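On the C side, GCC's thread-local storage is requested with the `__thread' storage-class keyword at the declaration; uses of the variable are written like those of any other variable. A minimal sketch, with hypothetical names (in_out_res, worker, demo_thread_local) and assuming a target where GCC supports TLS:

```c
#include <pthread.h>

/* One independent copy of this variable exists per thread. */
static __thread int in_out_res;

static void *worker(void *result)
{
    in_out_res = 5;                 /* writes the worker's copy only */
    *(int *)result = in_out_res;    /* report what the worker saw */
    return NULL;
}

/* Returns the calling thread's value after the worker has run and
   stores the worker's value in *worker_value (-1 if no thread started). */
int demo_thread_local(int *worker_value)
{
    pthread_t t;
    in_out_res = 1;
    if (pthread_create(&t, NULL, worker, worker_value) != 0)
        return -1;
    pthread_join(t, NULL);
    return in_out_res;
}
```

How a Pascal front end would expose this (an attribute, a ThreadVar section, or something else) is a separate design question that this sketch does not answer.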
Waldek Hebisch wrote:
There is no need to eliminate variables. GCC (starting from version 3.3) supports thread local variables. At least on Linux access to thread local variables is reasonably cheap (only slightly more expensive than normal variables). So, one can turn IOResult into a thread local variable and use it as before.
Ah, good! Still, IMHO, we should check which RTS variables don't need to be thread-local to avoid wasting effort. I.e., those that are initialized only once (possibly taking care of the initialization).
And we need to consider external side-effects, i.e. does it make sense to have thread-local `Input', each reading into its own, thread-local buffer? I suppose not, so we probably do need other synchronisation mechanisms in the RTS.
Of course, for this also GPC should support thread local variables. ATM GPC does not support thread local variables, mainly because we want to have the same feature set with all backends and older backends (IIRC 3.2 and earlier) do not support them.
Same feature set where reasonably possible. As earlier GCC versions are already obsolescent (I don't want to drop them immediately, but is there any real need for versions before 3.4 anymore?), we should not waste any effort back-porting things. An error message when trying to use them with an earlier backend should be clear enough, so those who need threads will know they have to use 3.3 or newer.
BTW, how are they activated? As a special attribute or something like this? I.e., special handling in the frontend required only at declaration time, or also on usage?
Also, thread safety may have considerable cost on single-threaded programs. For example Java and glibc try to be thread safe, and in effect some single-threaded programs run about 5 times slower than with thread unsafe versions. I think that we shall pay the price, but I do not know what others think.
I don't agree, i.e. we should be able to turn it on when needed only. As I wrote, this probably requires two versions of the RTS to be compiled and some options etc., but this seems a smaller price to pay in the end.
Frank
On Fri, 24 Feb 2006, Frank Heckenbach wrote:
Waldek Hebisch wrote:
Same feature set where reasonably possible. As earlier GCC versions are already obsolescent (I don't want to drop them immediately, but is there any real need for versions before 3.4 anymore?), we should not waste any effort back-porting things. An error message when trying to use them with an earlier backend should be clear enough, so those who need threads will know they have to use 3.3 or newer.
Actually, I'm still running GCC 2.95 on several of my older UNIX boxes because my efforts to compile newer versions there have failed. Not that I need a new GPC on any of them, but it goes to show 2.95 is still in active use in at least a few places.
That said, I think you can safely drop support for GCC versions prior to 2.95.
Back to lurking...
--------------------------| John L. Ries | Salford Systems | Phone: (619)543-8880 x107 | or (435)865-5723 | --------------------------|
John L. Ries wrote:
On Fri, 24 Feb 2006, Frank Heckenbach wrote:
Waldek Hebisch wrote:
Same feature set where reasonably possible. As earlier GCC versions are already obsolescent (I don't want to drop them immediately, but is there any real need for versions before 3.4 anymore?), we should not waste any effort back-porting things. An error message when trying to use them with an earlier backend should be clear enough, so those who need threads will know they have to use 3.3 or newer.
Actually, I'm still running GCC 2.95 on several of my older UNIX boxes because my efforts to compile newer versions there have failed. Not that I need a new GPC on any of them, but it goes to show 2.95 is still in active use in at least a few places.
That said, I think you can safely drop support for GCC versions prior to 2.95.
Sooner or later support for 2.95 will be dropped -- in fact, probably together with 2.8.1, as many relevant internal differences (mostly the backend memory management) are the same, and we really should get rid of them sometime. So, while it's not urgent yet, you might try, when you find the time, to build newer versions on those machines and report any problems. Which kind of systems are they, BTW?
Frank
On Fri, 24 Feb 2006, Frank Heckenbach wrote:
John L. Ries wrote:
On Fri, 24 Feb 2006, Frank Heckenbach wrote:
Waldek Hebisch wrote:
Same feature set where reasonably possible. As earlier GCC versions are already obsolescent (I don't want to drop them immediately, but is there any real need for versions before 3.4 anymore?), we should not waste any effort back-porting things. An error message when trying to use them with an earlier backend should be clear enough, so those who need threads will know they have to use 3.3 or newer.
Actually, I'm still running GCC 2.95 on several of my older UNIX boxes because my efforts to compile newer versions there have failed. Not that I need a new GPC on any of them, but it goes to show 2.95 is still in active use in at least a few places.
That said, I think you can safely drop support for GCC versions prior to 2.95.
Sooner or later support for 2.95 will be dropped -- in fact, probably together with 2.8.1, as many relevant internal differences (mostly the backend memory management) are the same, and we really should get rid of them sometime. So, while it's not urgent yet, you might try, when you find the time, to build newer versions on those machines and report any problems. Which kind of systems are they, BTW?
Frank
IBM AIX 4.2
DEC UNIX 5.0
SGI IRIX 6.5
John L. Ries
John L. Ries wrote:
Sooner or later support for 2.95 will be dropped -- in fact, probably together with 2.8.1, as many relevant internal differences (mostly the backend memory management) are the same, and we really should get rid of them sometime. So, while it's not urgent yet, you might try, when you find the time, to build newer versions on those machines and report any problems. Which kind of systems are they, BTW?
IBM AIX 4.2
DEC UNIX 5.0
SGI IRIX 6.5
I successfully built GPC with gcc-3.2.3 on AIX 5.1, and with gcc-3.3.3 on IRIX 64 6.5. Both were in 2004, and I don't know whether newer GPC or GCC versions will cause new problems. (The GCC versions are just those I happened to use then, and probably not strict requirements.)
I don't know if AIX 4.2 is much different from 5.1, and I haven't used DEC UNIX at all.
If you want to try it, I suggest you start with Waldek's last GPC snapshot and a 3.4 version of GCC (trying older GCC versions only if there are insurmountable problems, as those will eventually be dropped as well). If it doesn't work, you might try building GCC without GPC, to find out whether the problems are in GCC or GPC.
Frank
Hi,
On Sat, Feb 25, 2006 at 05:27:17AM +0100, Frank Heckenbach wrote:
I don't know if AIX 4.2 is much different from 5.1, and I haven't used DEC UNIX at all.
AIX 4.2 is something completely different from 5.1 (no 64 bit support, etc.) - but as GCC is available for it, I would be surprised if GPC won't work "out of the box".
gert
Gert Doering wrote:
Hi,
On Sat, Feb 25, 2006 at 05:27:17AM +0100, Frank Heckenbach wrote:
I don't know if AIX 4.2 is much different from 5.1, and I haven't used DEC UNIX at all.
AIX 4.2 is something completely different from 5.1 (no 64 bit support, etc.) - but as GCC is available for it, I would be surprised if GPC won't work "out of the box".
Do recent GCC versions (3.x, in particular 3.4.x) work well on 4.2? If so, GPC should also, indeed.
Frank
Forwarding the reply from Jonas Maebe to the MacPascal mailing list:
a) first an index into some structure which contains the value of this variable in case of a multi-threaded program
How are these indices allocated? AFAICS, they must be globally unique, even across units. Do you do something like a base-index per unit, and running indices within unit? Or an extra pass in the main program to sort all indices in the units used?
All threadvars in a compilation unit are collected in a threadvarlist, e.g. in case of the example program it becomes:
.globl THREADVARLIST_P$PROGRAM
THREADVARLIST_P$PROGRAM:
        .long U_P$PROGRAM_A
        .long 4,0
As you can see, it consists of "address of variable" followed by the size of the variable. A zero marks the end of the threadvarlist.
The threadvarlists of all compilation units are collected when compiling a main program or library (similar to how the initialisation routines of all used units are collected, I guess GPC also already contains some mechanism for this). We then generate a threadvartable from this in the program/library:
.globl FPC_THREADVARTABLES
FPC_THREADVARTABLES:
        .long 2
        .long THREADVARLIST_SYSTEM
        .long THREADVARLIST_P$PROGRAM
(format: nr of entries followed by the addresses of the threadvarlists)
When the first thread is started (via BeginThread(), we don't detect if someone uses pthread_create or so), the thread manager walks the threadvartable and the referenced lists and fills in all indexes. This thread manager is fully pluggable, so everyone is free to use his own (e.g. if you implement some user space fibers or so).
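The table walk described above can be sketched in C as follows. The structure layout and the names threadvar_entry and assign_threadvar_offsets are assumptions made for illustration, not FPC's actual declarations, and the rounding to pointer size is this sketch's own simplification of alignment:

```c
#include <stddef.h>

/* One entry per thread variable; an entry with a NULL slot ends a list. */
struct threadvar_entry {
    size_t *offset_slot;   /* where the assigned index/offset is stored */
    size_t  size;          /* size of the variable's value in bytes */
};

/* Walks all per-unit lists and gives every thread variable a unique
   offset into a per-thread block. Returns the block size required. */
size_t assign_threadvar_offsets(const struct threadvar_entry *const *lists,
                                size_t list_count)
{
    const size_t align = sizeof(void *);
    size_t next_offset = 0;

    for (size_t i = 0; i < list_count; i++) {
        for (const struct threadvar_entry *e = lists[i];
             e->offset_slot != NULL; e++) {
            *e->offset_slot = next_offset;
            /* round each size up so the next variable stays aligned */
            next_offset += (e->size + align - 1) / align * align;
        }
    }
    return next_offset;
}
```

Since this runs once, when the first thread is started, the indices only need to be unique within the running program, so no extra compiler pass across units is needed for that part.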
My speaking of PPC assembler is not really good, but I read it as something like this, right?
if FPC_THREADVAR_RELOCATE <> nil then
  r3 := FPC_THREADVAR_RELOCATE^ (a)
else
  r3 := Pointer (@a) + 4;
LongInt (r3^) := 5;
Correct.
Of course, such a solution for ioresult is slower than returning an error from a function,
Sure, the function call is expensive. Of course, it's only needed when actually multi-threaded, which is a plus as far as I'm concerned. Still, it would be good if it could be avoided. One way might be to store offsets instead of indices, but computing globally-contiguous offsets is even harder than globally-unique indices. (AFAICS, it could be done in an extra pass in the compiler or perhaps easier by automatically "registering" all thread variables at runtime.)
At least in FPC's pthread-based thread manager, it is an offset. But this offset is still relative to something thread-unique, which you have to look up each time the variable is accessed. That's what FPC_THREADVAR_RELOCATE does:
function CRelocateThreadvar(offset : dword) : pointer;
begin
  CRelocateThreadvar:=pthread_getspecific(tlskey)+Offset;
end;
Hmm, I guess this dword should be replaced by a ptruint :) (although someone allocating more than 4GB of threadvars possibly deserves to crash ;)
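For comparison, here is the same relocation step as a C sketch, using uintptr_t for the offset in the spirit of the ptruint remark. The names relocate_threadvar, TLS_BLOCK_SIZE and the two offsets are hypothetical. The lazy allocation of the per-thread block is an assumption of this sketch; a real thread manager would more likely allocate it when the thread starts:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define TLS_BLOCK_SIZE 64   /* hypothetical total size of all thread variables */

/* Hypothetical offsets, as a compiler or thread manager might assign them. */
enum { OFFSET_IN_OUT_RES = 0, OFFSET_STACK_LENGTH = 8 };

static pthread_key_t tls_key;
static pthread_once_t tls_key_once = PTHREAD_ONCE_INIT;

static void create_tls_key(void)
{
    /* free() releases a thread's block when that thread terminates */
    pthread_key_create(&tls_key, free);
}

/* Maps a variable's offset to its address inside the calling
   thread's private block. */
void *relocate_threadvar(uintptr_t offset)
{
    pthread_once(&tls_key_once, create_tls_key);
    char *block = pthread_getspecific(tls_key);
    if (block == NULL) {
        block = calloc(1, TLS_BLOCK_SIZE);   /* zero-initialized variables */
        if (block == NULL)
            abort();
        pthread_setspecific(tls_key, block);
    }
    return block + offset;
}
```

Every access costs one pthread_getspecific lookup plus an addition, which is the per-access overhead being weighed in this thread.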
So this might be an option. (But again, first I'd like to see which variables are actually affected and thus how big the effects would actually be. InOutRes might well be the worst, because most-often used one, but not the only one, of course.)
These are the threadvars in FPC's system unit:
ThreadVar
  ThreadID    : TThreadID;
  { Standard In- and Output }
  ErrOutput,
  Output,
  Input,
  StdOut,
  StdErr      : Text;
  InOutRes    : Word;
  { Stack checking }
  StackBottom : Pointer;
  StackLength : SizeUInt;
There's a few more in TP-compatibility units (like doserror in the Dos unit and some crt things) and some for Delphi-style exception handling, but that's about it. Of course, users can also declare their own threadvars in their programs and units.
You can download our rtl via svn (http://www.freepascal.org/develop.html#svn) and have a look at how it's done. The support routines are in rtl/inc/threadvr.inc, rtl/unix/cthreads.pp (pthreads-based thread manager, in a separate unit because this is dependent on libc and most of our targets do not require/depend on libc by default) and rtl/win32/systhrd.inc, rtl/emx/systhrd.inc, rtl/netware/systhrd.inc, rtl/os2/systhrd.inc (but the pthreads-based one can be used for all targets which have a libpthread, so I guess that's enough for GPC).
Since our RTL is under a slightly modified LGPL (allows static linking as long as you make the modifications to the FPC-RTL-licensed code available), license-wise I don't think there is any problem for you to reuse things. If there is, we could dual-license it under the regular LGPL as well I suppose (the main reason for the static linking exception is that some OS'es, like Dos, simply do not support dynamic linking, and support for creating dynamic libraries was not available for all OS'es in our compiler from the start either).
Jonas
_______________________________________________ MacPascal mailing list http://lists.sonic.net/mailman/listinfo/mac-pascal
Frank Heckenbach wrote:
Of course, for this also GPC should support thread local variables. ATM GPC does not support thread local variables, mainly because we want to have the same feature set with all backends and older backends (IIRC 3.2 and earlier) do not support them.
Same feature set where reasonably possible. As earlier GCC versions are already obsolescent (I don't want to drop them immediately, but is there any real need for versions before 3.4 anymore?), we should not waste any effort back-porting things. An error message when trying to use them with an earlier backend should be clear enough, so those who need threads will know they have to use 3.3 or newer.
BTW, how are they activated? As a special attribute or something like this? I.e., special handling in the frontend required only at declaration time, or also on usage?
See http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html.
Thread-local storage (TLS) is a mechanism by which variables are allocated such that there is one instance of the variable per extant thread. The run-time model GCC uses to implement this originates in the IA-64 processor-specific ABI, but has since been migrated to other processors as well. It requires significant support from the linker (ld), dynamic linker (ld.so), and system libraries (libc.so and libpthread.so), so it is not available everywhere.
Not many targets support it (e.g. Darwin doesn't). So, I don't think we can use the feature (unfortunately).
At the user level, the extension is visible with a new storage class keyword: __thread. For example:
  __thread int i;
  extern __thread struct state s;
  static __thread char *p;
Regards,
Adriaan van Os
Adriaan van Os wrote:
Frank Heckenbach wrote:
Of course, for this also GPC should support thread local variables. ATM GPC does not support thread local variables, mainly because we want to have the same feature set with all backends and older backends (IIRC 3.2 and earlier) do not support them.
Same feature set where reasonably possible. As earlier GCC versions are already obsolescent (I don't want to drop them immediately, but is there any real need for versions before 3.4 anymore?), we should not waste any effort back-porting things. An error message when trying to use them with an earlier backend should be clear enough, so those who need threads will know they have to use 3.3 or newer.
BTW, how are they activated? As a special attribute or something like this? I.e., special handling in the frontend required only at declaration time, or also on usage?
See http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html.
Thread-local storage (TLS) is a mechanism by which variables are allocated such that there is one instance of the variable per extant thread. The run-time model GCC uses to implement this originates in the IA-64 processor-specific ABI, but has since been migrated to other processors as well. It requires significant support from the linker (ld), dynamic linker (ld.so), and system libraries (libc.so and libpthread.so), so it is not available everywhere.
BTW, if this means it only works with dynamic linking, this would be another reason against. I wouldn't like to give up the option of static linking. (I'm not sure if it means this, or if for static linking the requirements of ld.so simply disappear and .so is replaced with .a.)
Not many targets support it (e.g. Darwin doesn't). So, I don't think we can use the feature (unfortunately).
Not so good. Unless it's foreseeable that a later backend version supports it everywhere, I agree, we shouldn't use it. So we'd have to roll our own, probably similar to how glibc does it for errno.
AFAIUI, the basic idea is to get some thread-unique value (which the thread library probably provides), and use it to retrieve a pointer to the thread-local variable (as an array index, via hashing, list search, etc. -- that's a matter of efficiency which has to be well considered) and automatically dereference the pointer (in C, this is done in a macro).
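The idea described above might look roughly like the following C sketch, assuming a POSIX threads library is available. All names here (inoutres_key, inoutres_location, the InOutRes macro, inoutres_demo) are made up for illustration and are not taken from GPC or glibc; the lookup uses pthread keys, which is only one of the possible strategies mentioned.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not GPC or glibc code. */
static pthread_key_t  inoutres_key;
static pthread_once_t inoutres_once = PTHREAD_ONCE_INIT;

static void inoutres_make_key(void)
{
    /* free() is called on a thread's slot when that thread exits. */
    pthread_key_create(&inoutres_key, free);
}

/* Return the address of the calling thread's private copy. */
int *inoutres_location(void)
{
    int *p;

    pthread_once(&inoutres_once, inoutres_make_key);
    p = pthread_getspecific(inoutres_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);      /* first access in this thread */
        if (p == NULL)
            abort();
        pthread_setspecific(inoutres_key, p);
    }
    return p;
}

/* The macro hides the lookup, so callers keep using a plain name. */
#define InOutRes (*inoutres_location())

static void *inoutres_worker(void *arg)
{
    InOutRes = *(int *)arg;            /* writes only this thread's copy */
    return NULL;
}

/* Returns 1 if the caller's value survives a write from another thread. */
int inoutres_demo(void)
{
    pthread_t t;
    int other = 42;

    InOutRes = 7;
    if (pthread_create(&t, NULL, inoutres_worker, &other) != 0)
        return 0;
    pthread_join(t, NULL);
    return InOutRes == 7;
}
```

Every access pays for a function call plus a key lookup, which is the efficiency question raised above.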
Frank
Forwarding a message from Jonas Mabe to the MacPascal mailing list:
Peter N Lewis wrote:
To properly handle MP threading, the RTS would, at the least, require everything to have a returned error result, rather than using IOResult, which would be a big change (although this would be desirable for New as well).
FWIW, in Free Pascal we solved this problem with the introduction of the "threadvar" keyword (which may have been copied from Delphi, I don't know). A threadvar consists of two parts:
a) first an index into some structure which contains the value of this variable in case of a multi-threaded program
b) next room to contain the value of this variable in case of a single-threaded program.
E.g. the assignment in this program:
  threadvar
    a: longint;

  begin
    a := 5;
  end.
is compiled into this:
  # [5] a := 5;
          lis     r2,ha16(FPC_THREADVAR_RELOCATE)
          lwz     r2,lo16(FPC_THREADVAR_RELOCATE)(r2)
          cmplwi  cr0,r2,0
          beq     cr0,L5
          lis     r4,ha16(U_P$PROGRAM_A)
          lwz     r3,lo16(U_P$PROGRAM_A)(r4)
          mtctr   r2
          bctrl
          b       L6
  L5:
          lis     r3,ha16(U_P$PROGRAM_A+4)
          addi    r3,r3,lo16(U_P$PROGRAM_A+4)
  L6:
          li      r2,5
          stw     r2,0(r3)
Of course, such a solution for ioresult is slower than returning an error from a function, but on the other hand it is fully compatible with existing code (also in the RTL/RTS, except if such variables are somewhere used in assembler code).
Jonas
Adriaan van Os wrote:
Forwarding a message from Jonas Mabe to the MacPascal mailing list:
Peter N Lewis wrote:
To properly handle MP threading, the RTS would, at the least, require everything to have a returned error result, rather than using IOResult, which would be a big change (although this would be desirable for New as well).
FWIW, in Free Pascal we solved this problem with the introduction of the "threadvar" keyword (which may have been copied from Delphi, I don't know). A threadvar consists of two parts:
a) first an index into some structure which contains the value of this variable in case of a multi-threaded program
How are these indices allocated? AFAICS, they must be globally unique, even across units. Do you do something like a base-index per unit, and running indices within unit? Or an extra pass in the main program to sort all indices in the units used?
b) next room to contain the value of this variable in case of a single-threaded program.
Interesting idea. Of course, it adds a little storage (one index per variable) even in the single-threaded case, but since there shouldn't be too many thread variables, that seems quite acceptable.
E.g. the assignment in this program:
  threadvar
    a: longint;

  begin
    a := 5;
  end.
is compiled into this:
  # [5] a := 5;
          lis     r2,ha16(FPC_THREADVAR_RELOCATE)
          lwz     r2,lo16(FPC_THREADVAR_RELOCATE)(r2)
          cmplwi  cr0,r2,0
          beq     cr0,L5
          lis     r4,ha16(U_P$PROGRAM_A)
          lwz     r3,lo16(U_P$PROGRAM_A)(r4)
          mtctr   r2
          bctrl
          b       L6
  L5:
          lis     r3,ha16(U_P$PROGRAM_A+4)
          addi    r3,r3,lo16(U_P$PROGRAM_A+4)
  L6:
          li      r2,5
          stw     r2,0(r3)
My speaking of PPC assembler is not really good, but I read it as something like this, right?
  if FPC_THREADVAR_RELOCATE <> nil then
    r3 := FPC_THREADVAR_RELOCATE^ (a)
  else
    r3 := Pointer (@a) + 4;
  LongInt (r3^) := 5;
Of course, such a solution for ioresult is slower than returning an error from a function,
Sure, the function call is expensive. Of course, it's only needed when actually multi-threaded which is a plus as far as I'm concerned. Still, it would be good if it could be avoided. One way might be to store offsets instead of indices, but computing globally-contiguous offsets is even harder than globally-unique indices. (AFAICS, it could be done in an extra pass in the compiler or perhaps easier by automatically "registering" all thread variables at runtime.)
but on the other hand it is fully compatible with existing code (also in the RTL/RTS, except if such variables are somewhere used in assembler code).
Fortunately, we don't have to worry about assembler code in the RTS.
So this might be an option. (But again, first I'd like to see which variables are actually affected and thus how big the effects would actually be. InOutRes might well be the worst, because most-often used one, but not the only one, of course.)
Frank
Forwarding the reply from Jonas Mabe to the MacPascal mailing list:
a) first an index into some structure which contains the value of this variable in case of a multi-threaded program
How are these indices allocated? AFAICS, they must be globally unique, even across units. Do you do something like a base-index per unit, and running indices within unit? Or an extra pass in the main program to sort all indices in the units used?
All threadvars in a compilation unit are collected in a threadvarlist, e.g. in case of the example program it becomes:
  .globl THREADVARLIST_P$PROGRAM
  THREADVARLIST_P$PROGRAM:
          .long   U_P$PROGRAM_A
          .long   4,0
As you can see, it consists of "address of variable" followed by the size of the variable. A zero marks the end of the threadvarlist.
The threadvarlists of all compilation units are collected when compiling a main program or library (similar to how the initialisation routines of all used units are collected, I guess GPC also already contains some mechanism for this). We then generate a threadvartable from this in the program/library:
  .globl FPC_THREADVARTABLES
  FPC_THREADVARTABLES:
          .long   2
          .long   THREADVARLIST_SYSTEM
          .long   THREADVARLIST_P$PROGRAM
(format: nr of entries followed by the addresses of the threadvarlists)
When the first thread is started (via BeginThread(), we don't detect if someone uses pthread_create or so), the thread manager walks the threadvartable and the referenced lists and fills in all indexes. This thread manager is fully pluggable, so everyone is free to use his own (e.g. if you implement some user space fibers or so).
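As a rough illustration of the walk described above, here is a C sketch. The struct and function names (tv_entry, tv_table, tv_assign_offsets) are hypothetical and not FPC's; the sketch assumes the value filled in is a running byte offset into a per-thread block, and it ignores alignment for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per threadvar: where its index field lives, and its size.
   Hypothetical layout modelled on the "address, size" pairs above. */
struct tv_entry {
    uint32_t *index_field;   /* NULL terminates the list */
    uint32_t  size;
};

/* Per-program table: number of lists, then the lists themselves. */
struct tv_table {
    size_t                  count;
    const struct tv_entry **lists;
};

/* Give every threadvar a running offset into a per-thread block and
   return the total block size each thread needs.
   Simplification: alignment of the individual variables is ignored. */
size_t tv_assign_offsets(const struct tv_table *table)
{
    size_t offset = 0;

    for (size_t i = 0; i < table->count; i++) {
        const struct tv_entry *e;

        for (e = table->lists[i]; e->index_field != NULL; e++) {
            *e->index_field = (uint32_t)offset;
            offset += e->size;
        }
    }
    return offset;
}
```

Because the offsets are only computed at run time, when the first thread is started, no extra compiler pass across units is needed; the cost is that the table of lists has to be assembled when the main program or library is built.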
My speaking of PPC assembler is not really good, but I read it as something like this, right?
  if FPC_THREADVAR_RELOCATE <> nil then
    r3 := FPC_THREADVAR_RELOCATE^ (a)
  else
    r3 := Pointer (@a) + 4;
  LongInt (r3^) := 5;
Correct.
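For readers more at home in C, the confirmed reading could be sketched as follows. The names (threadvar_long, threadvar_relocate, threadvar_addr, fake_relocate) are invented for this sketch and the types are assumptions; fake_relocate is a toy stand-in for a thread manager, not a real per-thread lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-part layout: the index first, the single-threaded storage after it. */
struct threadvar_long {
    uint32_t index;   /* filled in by the thread manager */
    int32_t  value;   /* used directly while no thread manager is installed */
};

/* NULL until a thread manager installs its lookup routine. */
typedef void *(*relocate_fn)(uint32_t index);
relocate_fn threadvar_relocate = NULL;

struct threadvar_long a;

/* The address computation performed before the store in "a := 5". */
int32_t *threadvar_addr(struct threadvar_long *tv)
{
    if (threadvar_relocate != NULL)
        return threadvar_relocate(tv->index);   /* multi-threaded path */
    return &tv->value;                           /* single-threaded: @a + 4 */
}

/* Toy stand-in for a thread manager: one global array, no real threads. */
int32_t fake_slots[4];

void *fake_relocate(uint32_t index)
{
    return &fake_slots[index];
}
```

In the single-threaded case the only overhead is one load and one compare against NULL; the indirect call happens only once a relocate routine has been installed.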
Of course, such a solution for ioresult is slower than returning an error from a function,
Sure, the function call is expensive. Of course, it's only needed when actually multi-threaded which is a plus as far as I'm concerned. Still, it would be good if it could be avoided. One way might be to store offsets instead of indices, but computing globally-contiguous offsets is even harder than globally-unique indices. (AFAICS, it could be done in an extra pass in the compiler or perhaps easier by automatically "registering" all thread variables at runtime.)
At least in FPC's pthread-based thread manager, it is an offset. But this offset is still relative to something thread-unique, which you have to look up each time the variable is accessed. That's what FPC_THREADVAR_RELOCATE does:
  function CRelocateThreadvar(offset : dword) : pointer;
  begin
    CRelocateThreadvar:=pthread_getspecific(tlskey)+Offset;
  end;
Hmm, I guess this dword should be replaced by a ptruint :) (although someone allocating more than 4GB of threadvars possibly deserves to crash ;)
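A C rendering of the same base-plus-offset scheme might look like the sketch below, with uintptr_t playing the role that ptruint would play. The helper names (tls_init, tls_attach_thread, relocate_threadvar, tls_demo) and the fixed block size are assumptions made for this example, not part of FPC's RTL.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

static pthread_key_t tls_key;
enum { TLS_BLOCK_SIZE = 64 };   /* assumed total size of all threadvars */

/* Call once, before any thread touches a threadvar. */
int tls_init(void)
{
    return pthread_key_create(&tls_key, free);
}

/* Call at the start of every thread, including the main one. */
int tls_attach_thread(void)
{
    void *block = calloc(1, TLS_BLOCK_SIZE);

    if (block == NULL)
        return -1;
    return pthread_setspecific(tls_key, block);
}

/* Per-thread base address plus the variable's offset. */
void *relocate_threadvar(uintptr_t offset)
{
    return (char *)pthread_getspecific(tls_key) + offset;
}

static void *tls_worker(void *arg)
{
    (void)arg;
    if (tls_attach_thread() != 0)
        return NULL;
    *(int32_t *)relocate_threadvar(8) = 1234;   /* this thread's copy only */
    return NULL;
}

/* Returns the caller's value at offset 8 after another thread wrote there. */
int tls_demo(void)
{
    pthread_t t;

    if (tls_init() != 0 || tls_attach_thread() != 0)
        return -1;
    *(int32_t *)relocate_threadvar(8) = 5;
    if (pthread_create(&t, NULL, tls_worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    return *(int32_t *)relocate_threadvar(8);
}
```

Compared with one key per variable, a single key per thread keeps the number of keys constant, at the price of having to know all offsets and the total block size before the first thread starts.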
So this might be an option. (But again, first I'd like to see which variables are actually affected and thus how big the effects would actually be. InOutRes might well be the worst, because most-often used one, but not the only one, of course.)
These are the threadvars in FPC's system unit:
  ThreadVar
    ThreadID : TThreadID;
    { Standard In- and Output }
    ErrOutput,
    Output,
    Input,
    StdOut,
    StdErr : Text;
    InOutRes : Word;
    { Stack checking }
    StackBottom : Pointer;
    StackLength : SizeUInt;
There's a few more in TP-compatibility units (like doserror in the Dos unit and some crt things) and some for Delphi-style exception handling, but that's about it. Of course, users can also declare their own threadvars in their programs and units.
You can download our rtl via svn (http://www.freepascal.org/develop.html#svn) and have a look at how it's done. The support routines are in rtl/inc/threadvr.inc, rtl/unix/cthreads.pp (pthreads-based thread manager, in a separate unit because this is dependent on libc and most of our targets do not require/depend on libc by default) and rtl/win32/systhrd.inc, rtl/emx/systhrd.inc, rtl/netware/systhrd.inc, rtl/os2/systhrd.inc (but the pthreads-based one can be used for all targets which have a libpthread, so I guess that's enough for GPC).
Since our RTL is under a slightly modified LGPL (allows static linking as long as you make the modifications to the FPC-RTL-licensed code available), license-wise I don't think there is any problem for you to reuse things. If there is, we could dual-license it under the regular LGPL as well I suppose (the main reason for the static linking exception is that some OS'es, like Dos, simply do not support dynamic linking, and support for creating dynamic libraries was not available for all OS'es in our compiler from the start either).
Jonas
Adriaan van Os wrote:
The threadvarlists of all compilation units are collected when compiling a main program or library (similar to how the initialisation routines of all used units are collected, I guess GPC also already contains some mechanism for this).
Actually, it doesn't. It used to, until several years ago. But this method wasn't supported by all linkers (GNU ld does, sure, but e.g., IIRC the SGI linker didn't), so unit initialization wouldn't work there, and we switched to calling initializers explicitly (automatically, of course) from the main program.
When the first thread is started (via BeginThread(), we don't detect if someone uses pthread_create or so), the thread manager walks the threadvartable and the referenced lists and fills in all indexes. This thread manager is fully pluggable, so everyone is free to use his own (e.g. if you implement some user space fibers or so).
Yes, this is something I'd like to see as well.
So this might be an option. (But again, first I'd like to see which variables are actually affected and thus how big the effects would actually be. InOutRes might well be the worst, because most-often used one, but not the only one, of course.)
These are the threadvars in FPC's system unit:
  ThreadVar
    ThreadID : TThreadID;
    { Standard In- and Output }
    ErrOutput,
    Output,
    Input,
    StdOut,
    StdErr : Text;
    InOutRes : Word;
    { Stack checking }
    StackBottom : Pointer;
    StackLength : SizeUInt;
I guess we'd get a similar list in GPC. However, as I've said before, I'm skeptical about Input etc. Do we really want a per-thread Input, each with its own buffer (which means, if several of them are actually used, it becomes hard to foretell in which one each sequence of input bytes ends up)? Similar for Output. For StdErr, one could argue that it shouldn't buffer at all (which GPC's in fact doesn't, and neither does stderr in C by default AFAIK).
There's a few more in TP-compatibility units (like doserror in the Dos unit and some crt things)
Well, for CRT, just making the internal state per-thread wouldn't really help much AFAICS. E.g., if each thread has its own screen buffer, and updates the real screen from it, this would probably result in chaos. IOW, as there's only one physical screen (unless multi-headed), I think one probably needs some common data across threads. Or you leave it up to the user -- as, e.g., ncurses recommends doing all curses I/O within one thread.
Since our RTL is under a slightly modified LGPL (allows static linking as long as you make the modifications to the FPC-RTL-licensed code available), license-wise I don't think there is any problem for you to reuse things. If there is, we could dual-license it under the regular LGPL as well I suppose (the main reason for the static linking exception is that some OS'es, like Dos, simply do not support dynamic linking, and support for creating dynamic libraries was not available for all OS'es in our compiler from the start either).
BTW, standard LGPL allows static linking as well. You just have to distribute object files of the non-free parts, so users can relink. But I suppose for some reason that's not desired?
Frank