The error is returned by the linker rather than the compiler itself. It is not a bug, just a size limitation of the default memory model. Linux x86_64 provides a "large" code model, as pointed out by VJo, but it is not supported by GCC before 4.6: http://stackoverflow.com/questions/6296837/gcc-compile-error...
What a great technical problem, except for this bit:
> Btw, I didn't try to hide behind "this is scientific computing -- no way to optimize". It's just that the basis for this code is something that comes out of a "black box" where I have no real access to
A research assistantship I held was based on classified data; all of the published work had to be approved by the DoD, and the actual data we used wasn't allowed to be published, which made our results entirely unreproducible.
I'm certain the methods being used here are just wrong. It looks like it's code generated by a symbolic algebra system. Chances are that using a more scalable methodology would eliminate the code size problem. For example, symbolic manipulations often have exponential space complexity in the depth of the expression tree, but automatic differentiation can do the required operation in polynomial time and space (often linear or log-linear).
Unfortunately, "no way to optimize" is basically a statement that the questioner is not interested in an algorithmic way to make the problem go away.
It seems you don't understand the context of research projects. When you're assigned to write a program that processes data coming from another group, it's valid to say that the input data is a 'black box' to you. That still doesn't mean the whole research is a 'black box' or somehow not 'real science'.
Science is a black box. We make theories about the world we live in based upon the results of experiments. Richard Feynman commented on this way of thinking about science in one of his videos.
How? You handed me a black box. You wrote some code, and you fed it some data, and you made some prediction, and all I really have is your word for it that your prediction worked, but I sure can't do anything with it. I don't know what it is or how it works, how to extend it, how to build on it, anything.
That's not science by any useful definition of the term.
Generally not. My last project involved building a framework in which scientists could run calculations for certain types of risk. A prototype built by some of said scientists involved code like seen in the SO question. After looking at the types of calculations being done, we figured out that most of the equations that were used were similar and could be generalized. We also moved all the coefficients and parameters for all the equations into configuration files. I suspect that a thorough evaluation of this code would reveal something similar as in my case.
Some of the comments suggest that the problem is too much code, and that the data (of which there is a lot, granted) isn't contributing to the problem. I'm not sure that's the case, though.
If the data is part of the problem, I wonder if he can write a new linker script to rearrange the sections in the file so all code is below the signed 32-bit (~2GB) boundary. Though that raises the question... will it be able to address the data? Does initialized data access use 32- or 64-bit offsets in the small/medium models?
At any rate, it seems gcc 4.6 supports the x86_64 "large model", which should solve the problem without code/data changes.
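For reference, the relevant GCC flag is `-mcmodel` (the filename `sim.c` below is just a placeholder); to the best of my understanding the models differ in what they assume fits below 2 GB:

```shell
# GCC x86-64 code models (large requires GCC >= 4.6):
#   small  (default): code and data assumed to fit in the low 2 GB
#   medium: code stays below 2 GB, but large data objects may live above it
#   large : no size assumptions for either code or data
gcc -mcmodel=medium -o sim sim.c   # often enough when only the data is huge
gcc -mcmodel=large  -o sim sim.c   # when the code itself exceeds 2 GB
```

The medium model is the cheaper fix if the code fits but the data doesn't, since only accesses to big objects pay for 64-bit addressing.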
LuaJIT has very good double-precision floating point support, and it looks like SSE/SIMD instructions were not used directly in the original code.
LuaJIT should garbage-collect unused code, and failing that it can always regenerate it. It should be able to compile and execute the volume of code he has (and it looks like all of his code is machine-generated anyway, just targeting C++ instead of Lua).
Last I checked, the LuaJIT allocator on AMD64 platforms uses only a small part of the address space for the Lua heap, partly so that more efficient type-punned representations can be used internally. I don't remember what the limit is exactly, but it's only a few GB (and beyond that the GC starts having trouble anyway). I don't know whether this applies to the machine-code JIT output, or to external cdata arrays, but it's something to watch out for here.