JDK 7 Features (java.net)
61 points by puredanger on Sept 20, 2010 | 71 comments



I wonder what's going on in that Java group at Oracle. How can it possibly take three more years or so to implement something like list and map literals or a date picker UI component?

The one and only change to the VM that I would like to see is not even planned: structured value types. This single issue is responsible for Java's massive memory usage. Half of that memory is used for completely unnecessary pointers and it will lead to C or C++ grabbing the title of 100 year language.


Memory usage is fairly moot, and will just become more and more moot.

When you're watching a 3d projected HD video with 5.1 audio on your smartphone, what's using the memory... A few pointers? Or the actual media? And in any event, we'll likely have a few TB of memory even on phones to play with in a few years.


You chose an interesting example. The movie can be played and processed quickly precisely because it has an explicit memory layout, probably hardware-mapped in places. It's processed with as few indirect operations as possible.

And this was exactly the complaint in the parent post - this is not possible to do in Java. If you have a list of Integers, you have a list of Integers (please correct me if I'm mistaken here), so every integer needs an explicit pointer in the list itself, plus the overhead of boxing. It's not the pointer to the list that matters - it's the fact that your list takes roughly 4 times as much memory (pointer, lock, tag, value) - or maybe more - I don't remember the object implementation details for the JVM.

For scenarios where you operate on big tables of data, it matters, because it kills cache performance and uses up many times the amount of memory that you'd normally use.


That's exactly what I meant. In the case of int you can work around it using int[], but if you need more structure, like a complex number type, tuples, pairs, etc., the workarounds get really tedious. Basically, what you can do is stuff it all into a large byte[] and deserialise on access. That's slow and kills productivity.


It's hardly a 'work around' :/

  int[][]
or

  int[] reals;
  int[] imaginaries;


Java's multidimensional arrays aren't stored contiguously in memory because rows can be different lengths. If you have an int[][] a, then a[1] is simply a pointer to an int[]. You can't escape the excess of pointers.

If you try something like having an array for each component as a substitute for an array of structs, then that places fairly strict upper bounds on the length of the arrays, beyond which you are guaranteed to have killed cache locality.
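
To make the first point concrete, here's a minimal sketch of what an int[][] really is (each row is its own heap object):

  int[][] a = new int[3][];
  a[0] = new int[10];
  a[1] = new int[2];   // rows are separate objects and can have different lengths
  int[] row = a[1];    // a[1] is just a reference to an int[] living elsewhere on the heap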


int[] is fairly tight. If you're going for performance or optimum memory usage, using lists and boxing isn't going to be a good plan.

I once wrote an x86 emulator in java and later an ARM emulator (long story). It ran fast enough :)


So what is a good plan to store a large in-memory list of

  class Amount {
    String currencySymbol;
    int amount;
  }

?


Immutability would allow you to re-use Amount instances; this is what the Integer class does for "common" int values (I believe -128 through 127).
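
You can see that cache in action (default behaviour; the upper bound is tunable on recent JVMs):

  System.out.println(Integer.valueOf(100) == Integer.valueOf(100));     // true - same cached instance
  System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000));   // false - two distinct objects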


True but it doesn't solve my problem which is the extra pointer I need to hold the Amount object.


Depends on the value of "large". But two main points would be:

1. Intern the strings.

2. Store only indexes and operate on a "main list" of currencySymbols[i] and amounts[i], to reduce the overhead of boxed ints (rough sketch below).

Of course this is not always possible... Every scenario has its own solution, I guess.
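
Something along these lines (just a sketch - the names are made up, and whether the symbol is an interned String or a small int code is a detail):

  int n = 1000000;
  String[] currencySymbols = new String[n];   // one "column" per field instead of one object per row
  int[] amounts = new int[n];                 // raw ints - no boxing, no per-Amount object header

  currencySymbols[0] = "EUR".intern();        // literals are interned anyway; shown for the general case
  amounts[0] = 1999;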


String interning is a side issue here. But what you suggest is basically to destructure all structured types into lots of individual arrays. That's a technique I use frequently, but it's a very tedious and error-prone thing to do once the data structures get a little more complex. Still, I agree that it's sometimes a workable solution. It's not a workable solution if Amount objects are part of other structures.

Consider this:

  class Point { int x; int y; }
  class Rect { Point a; Point b; }
  List<Rect> list ...


Don't use int for money. Use BigDecimal.

I know performance is an issue, but you can always find non-hackish solutions: like a memory upgrade from management, because you're writing safe, clean code :-)


BigDecimal uses 10 times as much memory as an int but I didn't intend to properly model currency amounts anyway.



Any sane developer will track money in the lowest unit - e.g. cents rather than dollars.


A sane developer is as rare as a sufficiently smart compiler.


Use an int to represent the currencySymbol as well. If you need to resolve it, use a lookup table mapping ints -> symbols.


How does that help? I still have to keep that extra pointer in memory for each Amount object.


You'll always need a pointer to the objects, no matter the language. But you'll save on having to store a pointer from those objects to a String object. With a 2-character symbol that's roughly 8 bytes for the pointer (I believe Java still uses 64-bit pointers, no?) plus something like 16 bytes for the String itself - call it 24 bytes per object - minus the 4 bytes for the int, so around 20 bytes saved per object. Assuming these are transaction objects, that's a crap-ton if you have lots of transactions.


Let's leave the string issue aside. It has nothing to do with the point I was trying to make. Consider the example I gave in another reply:

  class Point { int x; int y; } 
  class Rect { Point a; Point b; }
  List<Rect> list ...
  
In Java, this list of Rect objects holds three pointers per element that a functionally equivalent list in C++, C# or Go would not have to hold.


Are you talking about pointers to objects? Are there any languages that allow you to have objects that aren't pointed to? Maybe C-style structs? Is that what you are talking about? Complaining about pointers to objects is like complaining about the wasted byte at the end of a null-terminated string.

There's a general move away from structs for some reason. My guess is that structs do have a problem with platform independence if you rely too much on the primitive types being consistent across platforms.

An int being a 4-byte chunk of memory isn't always true (actually, if memory serves, an int was supposed to be standardized as the word length of the local system, so it could be all kinds of oddball lengths, like 24-bits).

So my guess is that most modern languages looked at this problem and decided that cross-platform compatibility is more important. But to be honest, with a virtualized run-time, like the JVM, you can define the primitive types as a standardized abstraction on top of the runtime system... so in that sense it doesn't really make all that much sense not to have them, except maybe to adhere to principles of data hiding or some such.

However, strictly speaking, the code you provided would probably be a similar pointerfest in C++ or C# since you aren't using structs. I don't know enough about Go to reply intelligently about that language.


The problem is not that the pointers waste space. The problem is that they introduce an irregular memory layout. If you want to, say, offload your data to a GPU, then having a structured memory layout is a huge benefit. But you'll kill performance if you have to chase pointers just to get the data into a format the GPU can handle. Other optimizations - such as SIMD on conventional processors - also apply.

And, for the record, C++ would not use pointers. In C++, if you place one object inside the definition of another, the whole object will be there, not a pointer. You control memory layout in C++. And if your objects are plain old data (POD), then it behaves just as C does. A C++ class that does not have any virtual members is laid out like a struct.


>The problem is that they introduce irregular memory layout.

Good point. I thought that the original thread was about pointers eating up space.

My understanding is that C++ class definitions, because they themselves may contain function references... even just constructors/destructors and access methods, would represent the object with a pointer on the call stack to the location of the object (and all its data + method pointers) on the heap (which, like you said, could end up anywhere, resulting in an irregular memory layout). Otherwise, suppose a class is defined as a collection of objects (which are defined as collections of objects on down, etc.), some of which may be of arbitrary length... ergo you'd never know how much memory to allocate a priori to hold these irregular data structures. Far easier to just allocate a bunch of word-length pointers pointing to whatever random blobs of address space the OS gives back on malloc requests.

But yeah, if it's just POD then one would assume an easy optimization the compiler could make would be to just create the objects as contiguous blocks of object-sized memory. I'm all rusty on some of this stuff (last time I seriously used C++ Borland was still a major player in the compiler business and templates were highly experimental) so I'm sure I'm quite out of date these days.


To be fair, wasted space was a point many people were making. But it's not the only consideration to make.

Anyway, most C++ compilers will turn the following:

  obj.method(arg);

Into:

  method(&obj, arg);

Conceptually, if not literally. One of the design principles of C++, as stated by Stroustrup, is that you don't pay for what you don't use. So if your classes have no virtual functions and are not a part of an inheritance hierarchy, they should be just plain ol' data. C++ compilers will generate different implementations for different kinds of classes. When virtual members are involved, a C++ object is usually more than just plain ol' data.

But, even in such objects, consider:

  class Circle: public Shape {
    virtual void draw();
    Point coords;
  };

In this case, a Circle object will probably not be just POD because it will need to resolve draw() at runtime. But, a Point object will live somewhere in a Circle object - not just a pointer to a Point object. If you wanted that, then you would say:

  class Circle: public Shape {
    virtual void draw();
    Point* coords;
  };

And, of course, you would be responsible for managing the dynamic memory for the coords member.


I'm struggling for the right terms a little bit because I honestly don't know the correct programming language independent computer science term for what I call structured value types. What I mean is simply this:

  C++:
  vector<Rect> v;
  for(int i = 0; i < 1000000; ++i)
    v.push_back(Rect(Point(i, i), Point(i + 1, i + 1)));

  Java:
  List<Rect> v = new ArrayList<Rect>();
  for(int i = 0; i < 1000000; ++i)
    v.add(new Rect(new Point(i, i), new Point(i + 1, i + 1)));

On a 64 bit machine, the C++ vector will use 16MB of RAM whilst the Java List will use around 48MB of RAM. Also, the C++ Rects will be tight, that is, located next to each other in memory, and can be iterated over very cache-efficiently. The Java Rects will be all over the place. Iterating over them might cause a very large number of cache misses.

[Edit] Actually, I believe the Java list will take even more space because I forgot the per-object overhead for the Point objects, which is 8 bytes each (in addition to the pointers). So the Java list would take 64MB if my ad hoc calculation is right.
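
For the record, the tedious workaround I mentioned earlier looks roughly like this - one flat int[] with a stride of 4 per Rect (just a sketch; the layout convention is ad hoc):

  int n = 1000000;
  int[] rects = new int[n * 4];    // x1, y1, x2, y2 for each Rect, packed contiguously like the C++ vector
  for (int i = 0; i < n; i++) {
      int base = i * 4;
      rects[base]     = i;         // a.x
      rects[base + 1] = i;         // a.y
      rects[base + 2] = i + 1;     // b.x
      rects[base + 3] = i + 1;     // b.y
  }
  // reading Rect i back means manual index arithmetic instead of list.get(i).a.x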


Yeah, I think scott_s is right, the issue is that Java just dumps pointers on the call stack to objects on the heap. And heap objects can end up all over the place in all kinds of non-optimal locations, like out of cache, or in different pages of virtual memory.

There's not really any such thing as just a block of data, allocated for however many bytes the object needs, dumped on the stack... like good old C structs, AFAIK.

It's one of the problems of languages that don't let you do your own memory management.


It's not a consequence of automatic memory management. Both C# and Go have automatic memory management but their memory usage would be roughly comparable to C++ in this scenario.


This is sometimes referred to as POD: Plain Old Data.


I think fauigerzigerk might be referring to "value types" in C# (http://msdn.microsoft.com/en-us/library/s1ax56ch.aspx), so structs basically.


  String[] currencySymbols;
  int[] amounts;

?


Yes, but now there is no language-enforced relationship between the two. That currencySymbols[i] and amounts[i] are, in the mind of the programmer, two fields of the same object is no longer expressed in the language itself.


Memory usage will never be moot as long as you have a very large cost difference between fast storage and slow mass storage. Absolute sizes don't tell you anything without comparing them to the amount of data that you want to process and index.

You have chosen an example where streaming is possible. But sequential access is not what you want in databases, search engines, or many forms of data analysis.


This means that Oracle chose Mark Reinhold's plan B (http://blogs.sun.com/mr/entry/rethinking_jdk7).

From the point of view of a web application developer, JDK 7 is not very interesting. JDK 7 will contain a few language updates like switch-cases based on strings. No major changes or features that would radically change a web app developer's coding routines. What do you think?
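
For reference, the string switch is just this kind of thing (a sketch - it obviously needs a JDK 7 compiler):

  String day = "SATURDAY";
  switch (day) {                       // switching on a String, new in JDK 7
      case "SATURDAY":
      case "SUNDAY":
          System.out.println("weekend");
          break;
      default:
          System.out.println("weekday");
  }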


As a web application developer, on Java, I think there is a lot to look forward to.

Mostly in JSR 292.

I see a point where I'm running the best of ruby/python/groovy as core parts of a Java deployment environment.

Edit: Not that you can't now, but now this will bring very good reasons to run a hybrid environment combining the best of both systems.


Unfortunately JSR 292 will not deliver as you'd think.

Some people don't have such a good impression of it, e.g. Rich Hickey (Clojure): http://clojure-log.n01se.net/date/2010-06-01.html

     method handles are just going to be an entirely new thing hotspot et al 
     are going to have to be taught to optimize. They are already awesome at 
     optimizing ordinary class method dispatch
     bobo_: invokedynamic is not needed for removing reflection
     ....
     it is just a different way of doing call site caches, which you can already 
     do today with classes and methods

What JSR 292 does is simplify the work a compiler architect needs to do, since implementing call-site caches is hard work - not to mention memory management, since interpreters like JRuby generate lots of classes that end up in PermGen ... so stuff like java.dyn.AnonymousClassLoader may provide some relief.

But on the other hand, compilers will still need to support Java pre-7, since adoption in the enterprise is really slow (many companies are still on version 1.4).

There is a backport for invokedynamic though, but it remains to be seen if it's any good: http://code.google.com/p/jvm-language-runtime/

Notably missing features with a lot more bang than InvokeDynamic:

      tail-call optimization
      fixnums
      coroutines

All in all, Java 7 is a monumental failure and Oracle is wasting resources on merging JRockit with Sun's JVM, instead of saving it.

Say what you want about .NET, but their releases have been coherent, with each one adding value. If only Microsoft's management would see this as an opportunity and open up .NET a bit - but they probably won't.


Everyone could implement their own solution to call-site caching (and they do), but pushing it down into the JVM as a single, clearly defined way of doing it gives optimization efforts much more to work with. What benefits one system should benefit them all.

There will be a schism between libraries that move to Java 7 only and those that keep their own call-cache mechanism, but I doubt any of the established projects would ditch their existing solutions; it would most likely only affect new libraries.

If an enterprise is busy being tied up in Java 1.4, and missing out on the drastic speed and memory improvements in the HotSpot 1.6 line, then I doubt that adding dynamic languages is a priority. JRuby, for example, requires 1.6 (from the literature I've seen and personal experience, but they don't explicitly say so on their site).

I also disagree generally about how Oracle is dealing with Java 7, but I am willing to see what the end result is.

Microsoft has been heavily involved in the IronPython/IronRuby and Mono efforts. They do see the power of .NET across systems.


> Microsoft has been heavily involved in the IronPython/IronRuby and Mono efforts. They do see the power of .NET across systems.

Mono is kind of behind ... I have high hopes for version 2.8/3.0 with the new garbage collector, but the prerelease crashed with a segfault, which kind of turned me off for now; I'll retry it when the final version is released.

What bothers me about it is that currently memory management sucks, some bugs remain unfixed (tail calls don't work properly, hence F# is unusable, and AFAIK fixing that bug requires some major changes), and there are not many Linux-specific APIs for server-side stuff ... the async I/O APIs, for example, are there only for compatibility but don't work properly.

I also kind of expected Mono to be more than a .NET clone, and yet there is no alternative to ASP.NET, which is heavy and is made with .NET's constraints in mind. On Mono it leaks memory, for instance.

They could do more for Mono than they already have ... like having 1 or 2 experienced engineers help with the garbage collector, or granting them the IP to learn from / use the garbage collector from .NET.

Also, the C# ECMA standard is 2 versions behind, what's up with that?


> If only Microsoft's management would see this as an opportunity and open up .NET a bit, probably not.

What exactly would you like to open up?


Features like invokedynamic don't impact Java developers much day-to-day, but they should make JVM languages like JRuby and Scala faster.


I'm not too familiar with Scala, but I guess you mean when you use structural types?


The main motivation for adding closures was not having to include a big pile of boilerplate interfaces coming from the JSR 166 fork/join framework. Now fork/join is in JDK 7, but closures won't make it till JDK 8, which means we'll be stuck with the crappier API forever.

In my opinion, JDK 7 won't give corporations enough incentive to upgrade. Remember how long it took for JDK 6 to be adopted? So why release a JDK version if the majority of people won't use it? Why not spend a year or more building new, cohesively designed APIs with closures in mind?


Actually, that's not accurate. The motivation for adding closures was to avoid adding those additional interfaces for the ParallelArray abstraction that goes over fork/join. Fork/join has always been planned to go in for JDK 7. AFAIK, ParallelArray will not go into the new JDK 7 plan so the API stuff you mention is not an issue.


Oh, thanks for clarifying that, I thought ParallelArray was part of it.


I like Scala, but I'm still surprised a simpler "better Java" alternative hasn't taken off like:

http://code.google.com/p/stab-language/ (C# syntax compiled directly to the JVM)

http://boo.codehaus.org/ (Python syntax, normally for .NET, but there is a pure JVM fork in there somewhere)

The barrier to adoption these days is a robust Eclipse IDE--if either of these projects had one, I think they'd see a lot of adoption.


I think Fan might be the simpler, better Java - but like you say, I don't think it's taken off. http://fantom.org/


As a Java developer who's moved over to Ruby, the JDK 7 feature list was starting to make me think Java wouldn't be such a painful language to work with. Just 3 features would have made me think about working with Java again: type inference, closures and collection literals. Anything else they're working on for the Java language is wasted time. They've never been great at writing APIs; they should stick to improving the language and let the open source community drive the APIs forward.
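
For what it's worth, the only type inference that survived is the generics "diamond" from Project Coin (a rough sketch - collection literals didn't make JDK 7, and closures are now deferred to JDK 8):

  Map<String, List<Integer>> m = new HashMap<>();   // the compiler infers the type arguments
  List<String> names = new ArrayList<>();           // vs. new ArrayList<String>()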


No closures :-(


They're still coming, but have been put off until JDK 8, which is scheduled for late 2012 (about 6 months later than the original JDK 7 scheduled date). By the fast-moving standards of the software world this does make them a long way off, though...


Read carefully:

"Deferred to JDK 8 or later"

In the Java world, this is a good way of saying never ...


Especially when you are writing Java code to run on client machines. There it's not even safe to rely on 1.4 features yet.

So I guess I'll be able to use closures the moment I can stop trying to make my web pages work in IE 6: in the year 2040.


That's maybe the most important problem with Java if you are using it in an environment you do not know, or do not know well (as is the case on client machines - especially when using Java on the web). Because of that, using Java 7 features in web pages: no way, maybe no way ever, if you do not want to exclude a big mass of people out there on the market.


People still write client side Java? *shudder* I love Java, but good god would I never write an applet again.


You do realise that 'client side' includes people's own machines, right? You don't need to do applets for those.

Besides, applets are way way better now than they used to be.


Java has closures.... anonymous inner types
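
i.e. something like this (a sketch - captured locals have to be final, which is part of why people keep asking for real closures):

  final int threshold = 10;
  Runnable r = new Runnable() {
      public void run() {                              // the "closure" body
          System.out.println(threshold > 5 ? "big" : "small");
      }
  };
  r.run();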


Keep using Scala ;-)


You're right. Java the language should just die! They should work on the VM/GC and JIT. Programmers should work with Clojure or Scala.


It might be just syntactic sugar, but I would really like to see some sort of annotation for auto-generating setters and getters at compile time. Eclipse makes it pretty easy to generate these, but it sure would clean up class clutter.
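
Something like this already exists as a third-party compile-time hack (Project Lombok) - a rough sketch, assuming its @Getter/@Setter annotations:

  import lombok.Getter;
  import lombok.Setter;

  @Getter @Setter
  class Amount {
      private String currencySymbol;
      private int amount;
  }
  // getCurrencySymbol(), setCurrencySymbol(String), getAmount() and setAmount(int)
  // are generated at compile time, so they never clutter the source.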


Why bother? All Java IDEs do that with a keystroke.


There are a lot of great steps forward for the Java language in 7. My only question is when is Oracle going to release it so it can start being adopted?


TLS 1.2 = Transport Layer Security: http://tools.ietf.org/html/rfc5246


Java is everywhere. The Berkeley CS162 course uses Java to teach OS design and internals - they write pseudo-code examples in Java, a la SaveRegisters(). It is madness, isn't it? ^_^


Why madness? It has a clear, logical, consistent and easy to understand syntax.


Really? Easier than C syntax, which is more appropriate in this context, and on which Java's syntax is based?


Way easier than C syntax. Even though I have long since mastered various assembly languages and higher level languages, C pointers and their arithmetic still confuse me.

There's nothing worse than something like bar = *(&foo + 88) or whatever it is.


> Really? Easier than C syntax

Significantly easier. Think about all those pointers that simply aren't there now.

> which is a more appropriate in this context

This is a value judgement.

The generally nicer semantics (automatic memory management! proper strings!) are also worth a lot.


I'm a TA for an OS course that is closely following the Berkeley CS162 course, down to using Java for assignments. We've had much less trouble than with the C++ we used to use.


Why not plain C, which is the obvious choice? I think I know the answer - "because of the buzz". ^_^

And of course, using a VM-hosted language to describe things that are two levels of abstraction lower is a little bit funny.


> Why not plain C, which is the obvious choice?

Because manual memory management is a PITA.

> And of course, using a VM-hosted language to describe things that are two levels of abstraction lower is a little bit funny.

Why? As long as you're clear that it's a pedagogical tool, it's perfectly fine. And there are research OSes (e.g. Singularity) that are almost entirely managed.


We're talking about the simplicity and clarity of a language's syntax. Why memory management?

btw, to whom is memory management a PITA?


> We're talking about the simplicity and clarity of a language's syntax. Why memory management?

Both the syntax and the semantics are nicer.

> btw, to whom is memory management a PITA?

To anyone who tries to do it. That automatic memory management is easier to program with than manual memory management is self-evident.



