Hi,
On 8/4/10, Frank Heckenbach <ih8mj@fjf.gnu.de> wrote:
> However, speed optimizations are hard, so my personal interest would just be to shrink it.
> Actually understanding speed optimization on modern architectures is hard. Many things that were faster on older processors are now slower: caching, superscalar execution, branch prediction, etc. It's really hard to evaluate even simple assembler code WRT performance.
It's a mess, even for GCC; I know they're suffering trying to target so much. It's vastly different from GCC 2.7.2.3, when 486 was the best it could do (386 + alignment). And compile times suffer from that extra complexity. I just wish GCC -O0 was as fast, but it's not. :-(
>> FPU, MMX, 3DNow!, SSE, AVX ... which to support? I think most people would (probably incorrectly) say that FPU/MMX is deprecated. Gah, I hate modern computing sometimes, always complicating things, never making it easier.
> Sure. BTW, AFAIK the GCC backend doesn't support any of these (yet?), don't know about LLVM.
-ftree-vectorize is supported, but I'm not sure how well it works overall. And GCC has always (I think?) assumed an FPU is present (real or emulated). AVX isn't out yet, I think, and is yet another ball of wax. (SSE is implemented even by AMD, but Intel never bothered with 3DNow!, so that's less useful. But even my now-dead AMD laptop supported everything up through SSE3.) I blindly assume GCC on AMD64 does something with SSE2, but who knows.
In short, not sure it's worth officially supporting any of this in a compiler. And yet this is the exact area where hand-written assembly is still direly needed. Personally I find it too complex (and boring), but it does speed up stuff sometimes.
> So in this case you're on your own. An additional problem is that, e.g., FPU and MMX are mutually exclusive (switching is expensive and destroys the state), so a compiler couldn't simply use both without coordination with other parts of the program.
Yes, and SSE rectified that but required explicit OS support to FXSAVE everything.
>> So you want something similar to "long long long int"?? Actually, GPC by default makes "longint" 64-bit! Which in rare cases can be confusing. ;-)
> Why confusing? It's one power of two larger than "Integer".
Only confusing for extreme portability. I think FPC and VPC default to 32-bit for it. (And yes, I know about _BP_UNPORTABLE_TYPES_ or whatever.) In other words, my Befunge "benchmark" counts down from -1 to MAXINT, and it takes much longer (!) when that is 64-bit. ;-)