You'll always need a pointer to the objects, no matter the language. But you'll ...

fauigerzigerk · on Sept 20, 2010

Let's leave the string issue aside. It has nothing to do with the point I was trying to make. Consider the example I gave in another reply:

  class Point { int x; int y; } 
  class Rect { Point a; Point b; }
  List<Rect> list ...

In Java, each element of this list of Rect objects carries three pointers (the list's reference to the Rect, plus the Rect's references to its two Points) that a functionally equivalent list in C++, C# or Go would not have to hold.
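For comparison, a minimal C++ sketch of the same classes (an illustration only; `make_rects` and `element_stride` are hypothetical helper names). Because the `Point` members are held by value, each `Rect` contains its two `Point`s directly, and `std::vector` stores the `Rect` objects themselves back to back:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C++ equivalents of the Java classes above. Members are held by value,
// so a Rect physically contains its two Points: no per-member pointers.
struct Point { int x; int y; };
struct Rect { Point a; Point b; };

// Build n rectangles. std::vector stores the Rect objects themselves in
// one contiguous buffer, not pointers to separately allocated Rects.
std::vector<Rect> make_rects(int n) {
    std::vector<Rect> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i)
        v.push_back(Rect{Point{i, i}, Point{i + 1, i + 1}});
    return v;
}

// Byte distance between consecutive elements: exactly sizeof(Rect),
// because the elements sit side by side in the vector's buffer.
std::ptrdiff_t element_stride(const std::vector<Rect>& v) {
    assert(v.size() >= 2);
    return reinterpret_cast<const char*>(&v[1]) -
           reinterpret_cast<const char*>(&v[0]);
}
```

On mainstream compilers `sizeof(Rect)` is simply four `int`s, with no hidden pointers.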

elblanco · on Sept 20, 2010

Are you talking about pointers to objects? Are there any languages that allow you to have objects that aren't pointed to? Maybe C-style structs? Is that what you are talking about? Complaining about pointers to objects is like complaining about the wasted byte at the end of a null-terminated string.

There's a general move away from structs for some reason. My guess is that structs do have a problem with platform independence if you rely too much on the primitive types being consistent across platforms.

An int being a 4-byte chunk of memory isn't always true (actually, if memory serves, an int was supposed to be standardized as the word length of the local system, so it could be all kinds of oddball lengths, like 24 bits).

So my guess is that most modern languages looked at this problem and decided that cross-platform compatibility is more important. But to be honest, with a virtualized runtime like the JVM, you can define the primitive types as a standardized abstraction on top of the runtime system, so in that sense it doesn't really make all that much sense not to have them, except maybe to adhere to principles of data hiding or some such.

However, strictly speaking, the code you provided would probably be a similar pointerfest in C++ or C#, since you aren't using structs. I don't know enough about Go to reply intelligently about that language.
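On the int-size point: a small C++ sketch (not from the thread; `WirePoint` and `int_width_bits` are hypothetical names). The C and C++ standards give a plain `int` the natural size suggested by the architecture and only require it to be at least 16 bits wide, while the `<cstdint>` exact-width types fix the size on platforms that provide them:

```cpp
#include <climits>
#include <cstdint>

// A plain int only has a guaranteed minimum range (at least 16 bits);
// its actual width follows the target architecture.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int is at least 16 bits");

// Exact-width types have the same size everywhere they exist, which is
// what you want for data shared across platforms.
struct WirePoint {
    std::int32_t x;  // exactly 32 bits, no padding bits
    std::int32_t y;
};

static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");

// Width of a plain int on the current target, in bits.
constexpr int int_width_bits() { return static_cast<int>(sizeof(int) * CHAR_BIT); }
```

Exact-width members fix each field's size, but a compiler may still insert padding between fields, so alignment also matters when sharing a struct layout across platforms.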

scott_s · on Sept 20, 2010

The problem is not that the pointers waste space. The problem is that they introduce an irregular memory layout. If you want to, say, offload your data to a GPU, then having a regular, structured memory layout is a huge benefit. But you'll kill performance if you first have to chase pointers to get the data into a format the GPU can handle. The same reasoning applies to other optimizations, such as SIMD on conventional processors.
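A minimal C++ sketch of that extra gather step (the names `ScatteredPoints`, `pack_contiguous` and `sum_x` are hypothetical, and no actual GPU API is used). Separately allocated objects have to be copied into one contiguous buffer before they can be shipped to a device or fed to a vectorized loop, whereas a contiguous array is already in that form:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Point { float x; float y; };

// Pointer-heavy layout: each element is allocated separately,
// roughly how a Java List<Point> looks in memory.
using ScatteredPoints = std::vector<std::unique_ptr<Point>>;

// Gather the scattered objects into one contiguous buffer. This extra
// pass dereferences every element through a pointer, which is the cost
// paid before the data can be offloaded or vectorized.
std::vector<Point> pack_contiguous(const ScatteredPoints& src) {
    std::vector<Point> dst;
    dst.reserve(src.size());
    for (const auto& p : src) {
        assert(p != nullptr);
        dst.push_back(*p);  // one pointer dereference per element
    }
    return dst;
}

// A tight loop over contiguous data: the kind compilers can vectorize.
float sum_x(const std::vector<Point>& pts) {
    float s = 0.0f;
    for (const Point& p : pts) s += p.x;
    return s;
}
```

With a `std::vector<Point>` from the start, the packing pass is unnecessary: `pts.data()` already points at `pts.size()` packed elements.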

And, for the record, C++ would not use pointers. In C++, if you place one object inside the definition of another, the whole object will be there, not a pointer. You control memory layout in C++. And if your objects are plain old data (POD), then it behaves just as C does. A C++ class that does not have any virtual members is laid out like a struct.
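To illustrate (a sketch assuming a C++11 or later compiler; `manhattan` and `width` are made-up members): non-virtual member functions add no per-object storage, and a member object is embedded whole, so the compile-time checks below hold:

```cpp
#include <cstddef>
#include <type_traits>

struct Point {
    int x;
    int y;
    // Non-virtual member functions take no space in the object.
    constexpr int manhattan() const { return (x < 0 ? -x : x) + (y < 0 ? -y : y); }
};

class Rect {
public:
    Point a;  // the whole Point is stored inside Rect, not a pointer to it
    Point b;
    constexpr int width() const { return b.x - a.x; }
};

// No virtual functions, so no hidden vtable pointer: Rect is
// standard-layout (offsetof is valid) and is exactly two Points.
static_assert(std::is_standard_layout<Rect>::value, "Rect is standard-layout");
static_assert(offsetof(Rect, b) == sizeof(Point), "b directly follows a");
static_assert(sizeof(Rect) == 2 * sizeof(Point), "no hidden pointer");
```

Virtual functions are not the only thing that can weaken these guarantees: mixing public and private data members, or using virtual base classes, also takes a class out of the standard-layout category.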

elblanco · on Sept 20, 2010

>The problem is that they introduce irregular memory layout.

Good point. I thought that the original thread was about pointers eating up space.

My understanding is that C++ class definitions, because they may themselves contain function references (even just constructors/destructors and access methods), would represent the object with a pointer on the call stack to the location of the object (and all its data + method pointers) on the heap, which, like you said, could end up anywhere, resulting in an irregular memory layout. Otherwise, suppose a class is defined as a collection of objects (which are defined as collections of objects, and so on down), some of which may be of arbitrary length: you'd never know a priori how much memory to allocate to hold these irregular data structures. Far easier to just allocate a bunch of word-length pointers pointing to whatever random blobs of address space the OS gives back on malloc requests.

But yeah, if it's just POD then one would assume an easy optimization the compiler could make would be to just create the objects as contiguous blocks of object-sized memory. I'm all rusty on some of this stuff (last time I seriously used C++ Borland was still a major player in the compiler business and templates were highly experimental) so I'm sure I'm quite out of date these days.
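For what it's worth, a small C++ sketch of how variable-length members are usually handled (the `Label` type and `make_labels` are hypothetical). The size of every class is fixed at compile time; a member such as `std::string` keeps a fixed-size object inside the enclosing class, while longer contents live in a separate heap buffer:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Label {
    int x;
    int y;
    std::string text;  // fixed-size object; long text is stored in a heap buffer
};

// Every Label has the same sizeof, whatever its text length, so the
// vector can still hold the Label objects themselves contiguously.
std::vector<Label> make_labels() {
    std::vector<Label> v;
    v.push_back(Label{0, 0, "short"});
    v.push_back(Label{1, 1, std::string(1000, 'z')});
    assert(v[1].text.size() == 1000);
    return v;
}
```

Only the variable-length part sits behind a pointer; the fixed-size fields stay inline.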

scott_s · on Sept 20, 2010

To be fair, wasted space was a point many people were making. But it's not the only consideration to make.

Anyway, most C++ compilers will turn the following:

  obj.method(arg);

Into:

  method(&obj, arg);

Conceptually, if not literally. One of the design principles of C++, as stated by Stroustrup, is that you don't pay for what you don't use. So if your classes have no virtual functions and are not a part of an inheritance hierarchy, they should be just plain ol' data. C++ compilers will generate different implementations for different kinds of classes. When virtual members are involved, a C++ object is usually more than just plain ol' data.
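A rough C++ sketch of that lowering (`Counter_add` is a hypothetical name standing in for whatever the compiler actually emits; name mangling is ignored). The object's address becomes a hidden `this` argument, so a class with only non-virtual methods needs no method pointers in each object:

```cpp
#include <cassert>

struct Counter {
    int value;
    void add(int n) { value += n; }  // called as c.add(n)
};

// Hand-written equivalent of Counter::add: the object arrives as an
// explicit pointer, playing the role of the implicit `this`.
void Counter_add(Counter* self, int n) { self->value += n; }

// No virtual functions, so no per-object function pointers.
static_assert(sizeof(Counter) == sizeof(int), "Counter holds only its data");

int demo() {
    Counter c{0};
    c.add(2);            // conceptually Counter_add(&c, 2)
    Counter_add(&c, 3);  // same effect, written out explicitly
    assert(c.value == 5);
    return c.value;
}
```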

But, even in such objects, consider:

  class Circle: public Shape {
    virtual void draw();
    Point coords;
  };

In this case, a Circle object will probably not be just POD, because it will need to resolve draw() at runtime. But the Point object itself will still live inside the Circle object, rather than the Circle holding a pointer to a Point stored elsewhere. If you wanted a pointer, you would say:

  class Circle: public Shape {
    virtual void draw();
    Point* coords;
  };

And, of course, you would be responsible for managing the dynamic memory for the coords member.
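A compilable sketch of both variants, under some assumptions: `Shape` is an abstract base with a virtual destructor, `Point` is a two-int struct, and `std::unique_ptr` (C++14 `make_unique`) stands in for the raw `Point*` so the dynamic memory is released automatically:

```cpp
#include <memory>
#include <type_traits>

struct Point { int x; int y; };

struct Shape {
    virtual ~Shape() = default;
    virtual void draw() = 0;
};

// Point stored by value: it lives inside every Circle, alongside the
// hidden vtable pointer that the virtual draw() requires.
struct Circle : Shape {
    Point coords{0, 0};
    void draw() override {}
};

// Point held through a pointer: a separate heap allocation per object,
// owned and freed by the unique_ptr.
struct CircleByPtr : Shape {
    std::unique_ptr<Point> coords = std::make_unique<Point>();
    void draw() override {}
};

static_assert(std::is_polymorphic<Circle>::value, "Circle has virtual functions");
```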

fauigerzigerk · on Sept 20, 2010

I'm struggling a little to find the right terms, because I honestly don't know the correct language-independent computer science term for what I call structured value types. What I mean is simply this:

  C++:
  vector<Rect> v;
  for(int i = 0; i < 1000000; ++i)
    v.push_back(Rect(Point(i, i), Point(i + 1, i + 1)));

  Java:
  List<Rect> v = new ArrayList<Rect>();
  for(int i = 0; i < 1000000; ++i)
    v.add(new Rect(new Point(i, i), new Point(i + 1, i + 1)));

On a 64-bit machine, the C++ vector will use 16MB of RAM, whilst the Java List will use around 48MB of RAM. Also, the C++ Rects will be tightly packed, that is, located next to each other in memory, and can be iterated over very cache-efficiently. The Java Rects will be all over the place. Iterating over them might cause a very large number of cache misses.
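A hedged C++ sketch that makes the C++ half of the snippet compile. It assumes `Point(int, int)` and `Rect(Point, Point)` constructors, which the snippet implies but doesn't show; `payload_bytes` and `sum_widths` are hypothetical helpers. With a 4-byte `int`, `sizeof(Rect)` is 16 bytes, so 1,000,000 elements come to 16,000,000 bytes of element data, about 16MB. Without `reserve`, repeated `push_back` calls can leave the vector's capacity above its size, so the actual allocation may be somewhat larger:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
    int x, y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

struct Rect {
    Point a, b;
    Rect(Point a_, Point b_) : a(a_), b(b_) {}
};

// Bytes occupied by the elements themselves: no per-element pointers
// or object headers, just size() * sizeof(Rect).
std::size_t payload_bytes(const std::vector<Rect>& v) {
    return v.size() * sizeof(Rect);
}

// Sequential scan: each iteration reads the Rect right after the previous
// one, the access pattern caches and hardware prefetchers handle best.
long long sum_widths(const std::vector<Rect>& v) {
    long long total = 0;
    for (const Rect& r : v) {
        assert(r.b.x >= r.a.x);  // well-formed rects have non-negative width
        total += r.b.x - r.a.x;
    }
    return total;
}
```

The Java figures depend on the JVM's object header size and pointer width, so they are not computed here.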

[Edit] Actually, I believe the Java list will take even more space, because I forgot the per-object overhead for the Point objects, which is 8 bytes each (in addition to the pointers). So the Java list would take 64MB if my ad hoc calculation is right.

elblanco · on Sept 20, 2010

Yeah, I think scott_s is right: the issue is that Java just dumps pointers on the call stack to objects on the heap. And heap objects can end up all over the place in all kinds of non-optimal locations, like out of cache or in different pages of virtual memory.

There's not really any such thing as just a block of data, allocated for however many bytes the object needs and dumped on the stack like good old C structs, AFAIK.

It's one of the problems of languages that don't let you do your own memory management.

fauigerzigerk · on Sept 20, 2010

It's not a consequence of automatic memory management. Both C# and Go have automatic memory management but their memory usage would be roughly comparable to C++ in this scenario.

scott_s · on Sept 20, 2010

This is sometimes referred to as POD: Plain Old Data.

brown9-2 · on Sept 20, 2010

I think fauigerzigerk might be referring to "value types" in C# (http://msdn.microsoft.com/en-us/library/s1ax56ch.aspx), so structs basically.