GCC compile error with 2GB of data (stackoverflow.com)
74 points by p4bl0 on June 9, 2011 | 21 comments



tl;dr:

The error is returned by the linker rather than the compiler itself. It is not a bug, just a size limitation of the default memory model. Linux x86_64 provides a `large model' -- as pointed out by VJo -- but it is not supported by GCC before 4.6: http://stackoverflow.com/questions/6296837/gcc-compile-error...
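
For anyone who wants to reproduce the failure mode, here's a minimal sketch (my own toy repro, not the questioner's code) of blowing past the default small memory model:

    // big.cpp -- a single static object larger than 2GB overflows the
    // 32-bit relocations assumed by the default -mcmodel=small.
    static char blob[3ULL * 1024 * 1024 * 1024];  // 3GB, lands in .bss

    int main() {
        blob[0] = 1;    // touch it so the access is actually emitted
        return blob[0];
    }

Linking this with a plain `g++ big.cpp' fails with something along the lines of "relocation truncated to fit: R_X86_64_32S against symbol `blob'" -- i.e. it's the linker, not the compiler, that gives up.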


What a great technical problem, except for this bit:

> Btw, I didn't try to hide behind "this is scientific computing -- no way to optimize". It's just that the basis for this code is something that comes out of a "black box" where I have no real access to

Black box != science.


A research assistantship I held was based on classified data; all of the published work had to be approved by the DoD, and the actual data we used wasn't allowed to be published, which made our results entirely unreproducible.


I'm certain the methods being used here are just wrong. It looks like it's code generated by a symbolic algebra system. Chances are that using a more scalable methodology would eliminate the code size problem. For example, symbolic manipulations often have exponential space complexity in the depth of the expression tree, but automatic differentiation can do the required operation in polynomial time and space (often linear or log-linear).

Unfortunately, "no way to optimize" is basically a statement that the questioner is not interested in an algorithmic way to make the problem go away.
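
For the curious, here's a toy sketch of the forward-mode automatic differentiation idea using dual numbers -- my own illustration, nothing to do with the questioner's actual code. Each operation carries a (value, derivative) pair, so the cost stays proportional to the original expression instead of exploding the way a fully expanded symbolic derivative can:

    // dual.cpp -- forward-mode automatic differentiation with dual numbers
    #include <cstdio>

    struct Dual {
        double v;   // value
        double d;   // derivative with respect to the input
    };

    Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
    Dual operator*(Dual a, Dual b) {
        return {a.v * b.v, a.d * b.v + a.v * b.d};  // product rule
    }

    // f(x) = x * (x + 3) * (x + 3): nested products like this can blow
    // up symbolically, but cost O(1) extra work per operation here.
    Dual f(Dual x) {
        Dual c{3.0, 0.0};           // constants carry zero derivative
        return x * (x + c) * (x + c);
    }

    int main() {
        Dual x{2.0, 1.0};           // seed with dx/dx = 1
        Dual y = f(x);
        std::printf("f(2) = %g, f'(2) = %g\n", y.v, y.d);  // 50 and 45
        return 0;
    }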


It seems you don't understand the context of research projects. When you're assigned to write a program that processes data X coming from another group, it's valid to say that that input data is a 'black box' to you. That still doesn't mean that the whole research is a 'black box' or somehow not 'real science'.


I wish that were true.


Science is a black box. We make theories about the world we live in based upon the results of experiments. Richard Feynman commented on this way of thinking about science in one of his videos.

http://www.youtube.com/watch?v=o1dgrvlWML4


Science studies a black box. The science itself shouldn't be a black box. What is the added value of a black box representation of a black box?


> What is the added value of a black box representation of a black box?

The ability to make predictions.


How? You handed me a black box. You wrote some code, and you fed it some data, and you made some prediction, and all I really have is your word for it that your prediction worked, but I sure can't do anything with it. I don't know what it is or how it works, how to extend it, how to build on it, anything.

That's not science by any useful definition of the term.


What do you call a machine so complicated that we do not know its source code?

It's called the human body, and the black box is known as Medical Science. :)


We know its source code. We just don’t understand it.

http://www.ornl.gov/sci/techresources/Human_Genome/home.shtm...


Is including all that data in the object code really necessary?


Generally not. My last project involved building a framework in which scientists could run calculations for certain types of risk. A prototype built by some of said scientists involved code like that seen in the SO question. After looking at the types of calculations being done, we figured out that most of the equations used were similar and could be generalized. We also moved all the coefficients and parameters for the equations into configuration files. I suspect that a thorough evaluation of this code would reveal something similar to what I found in my case.
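
In rough outline it looked something like this (hypothetical sketch -- the file name and the polynomial form are stand-ins for our actual equations):

    // eval.cpp -- instead of one hard-coded expression per equation,
    // read coefficients from a config file and evaluate one
    // generalized form (here a polynomial, via Horner's rule).
    #include <fstream>
    #include <iostream>
    #include <vector>

    int main() {
        // coeffs.txt: whitespace-separated coefficients, highest
        // degree first, e.g. "2.0 -3.5 1.0" for 2x^2 - 3.5x + 1
        std::ifstream in("coeffs.txt");
        std::vector<double> c;
        for (double a; in >> a; ) c.push_back(a);

        double x = 1.5, y = 0.0;
        for (double a : c) y = y * x + a;   // Horner's rule

        std::cout << "p(" << x << ") = " << y << "\n";
        return 0;
    }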


Some of the comments suggest that the problem is too much code, and that the data (of which there is a lot, granted) isn't contributing to the problem. I'm not sure that's the case, though.

If the data is part of the problem, I wonder if he can write a new linker script to rearrange the sections in the file so all code is below the signed 32-bit (~2GB) boundary. Though that raises the question... will it be able to address the data? Does initialized data access use 32- or 64-bit offsets in the small/medium models?

At any rate, it seems gcc 4.6 supports the x86_64 "large model", which should solve the problem without code/data changes.
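
As far as I can tell from the GCC docs, the small model uses 32-bit offsets for everything, while the medium model keeps code and small data below 2GB but moves objects bigger than -mlarge-data-threshold into .ldata/.lbss with full 64-bit addressing. So, roughly (untested):

    # medium model: code stays below 2GB; big data objects get
    # 64-bit addressing (threshold is in bytes, default 65536)
    g++ -mcmodel=medium -mlarge-data-threshold=65536 big.cpp

    # large model: no 2GB assumption for code or data (GCC >= 4.6)
    g++ -mcmodel=large big.cpp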


Absolutely not. This appears to be code generated from a data set, precomputing at build time what most developers would compute at runtime.


Those expressions look a lot like an alternating series to me. You should be able to generate an expression to produce them fairly easily.
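
e.g., if the generated file is really just the unrolled terms of an alternating series, a loop with a term recurrence replaces all of it. A toy sketch (we can't see the real expressions, so sin(x)'s Taylor series stands in):

    // series.cpp -- compute an alternating series term by term instead
    // of as one giant unrolled expression in generated source code.
    #include <cmath>
    #include <cstdio>

    // sin(x) = x - x^3/3! + x^5/5! - ...; each term is the previous
    // one times -x^2 / ((2k)(2k+1)).
    double sin_series(double x, int terms) {
        double term = x, sum = x;
        for (int k = 1; k < terms; ++k) {
            term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
            sum  += term;
        }
        return sum;
    }

    int main() {
        std::printf("series: %.12f\n", sin_series(1.0, 10));
        std::printf("libm:   %.12f\n", std::sin(1.0));
        return 0;
    }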


This sorta reminds me of the silly things people do with JavaScript.


Such as... an MP3 decoder? https://github.com/nddrylliog/jsmad


I suggested trying LuaJIT.

It has very good double-precision floating point support, and it looks like SSE/SIMD weren't being used directly in the code anyway.

LuaJIT garbage-collects unused code and can always regenerate it if needed. It should be able to compile and execute the tons of code he has (and it looks like all his stuff is generated anyway, just targeting C++ instead).


Last I checked, the LuaJIT allocator on AMD64 platforms uses only a small part of the address space for the Lua heap, partly so that more efficient type-punned representations can be used internally. I don't remember what the limit is exactly, but it's only a few GB (and beyond that the GC starts having trouble anyway). I don't know whether this applies to the machine-code JIT output, or to external cdata arrays, but it's something to watch out for here.



