Eclipse Vert.x goes Native (vertx.io)
182 points by Samtaran on June 5, 2018 | 67 comments



This is impressive, although the cynic that I am can't help pointing out that we must have gone wrong somewhere when a "hello world" HTTP server application "only" taking 40MB of RES is seen as an improvement. That's more than 40 million bytes to open a socket, receive a few bytes, parse them and then write a few bytes back. The overhead of modern software development is insane.

Of course, I'm not being entirely honest: a lot of this overhead is probably a constant and wouldn't grow linearly for more complex applications.
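For context, the program in question really is tiny; a minimal Vert.x HTTP server (a sketch against the standard Vert.x 3.x core API) is just:

    import io.vertx.core.Vertx;

    public class Hello {
        public static void main(String[] args) {
            // Open a socket, parse HTTP, write a few bytes back; this
            // handful of lines is what ends up costing ~40MB of RES.
            Vertx.vertx().createHttpServer()
                .requestHandler(req -> req.response().end("Hello, world!"))
                .listen(8080);
        }
    }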


Every application comes with an entire operating system's worth of code nowadays.

Download a desktop chat client or a little menu bar utility — each app includes its own copy of Electron at 150MB.

Make a "Hello world" web client app using React with the recommended "create-react-app" toolchain — your project will first download nearly 200MB of code via npm (and each project you start will have its own copy of this stuff).

Make a web server with a framework like Rails, Vert.x or whatever — be prepared to spend hours downloading dependencies and setting up your local environment just so.

I don't have any good ideas for how to break this cycle. By 2028, I expect every to-do app will include its own 4GB copy of Ubuntu because it's convenient.


> I don't have any good ideas for how to break this cycle. By 2028, I expect every to-do app will include its own 4GB copy of Ubuntu because it's convenient.

2028? More like 2015. It’s already like that for much of what runs in Docker containers.


Is it? Most people I’ve seen working with Docker have tried their best to use Alpine for everything. Even something with a lot of OS dependencies tends to end up at around 200MB on the high end.


My nginx image is around 16MB until I add Python (for certbot).


Kind of, except that the base ubuntu image is more like 70MB.


I was thinking of the Ubuntu desktop. Each desktop utility could be its own virtual machine, and it wouldn't be so far removed from what already happens with Electron.



And if things are using the same base, you don't have multiple copies but only one copy of the base image.


I was an early critic (technorati objector) of the app server movement.

Then I needed to have a UI to monitor & manage a bunch of independent loosely coupled processes. Think poor man's workflow engine. Imagine my chagrin when I knowingly, faithfully recreated the Windows Task Manager for my stack.

I was very late to the Unix world. But I really think the Unix way is something akin to a golden law of the universe. Which each new cohort needs to rediscover for itself.


> and each project you start will have its own copy of this stuff

I don't get why npm can't set up a content-addressed shared repository, something à la m2/maven. This seems like such a waste of space.

I'm not up to speed with filesystems, but IMHO content-addressing by default should be a standard feature as well.
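The core idea fits in a few lines; here's a minimal sketch in Java (the Cas class and layout are made up, with SHA-256 as the content key):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;

    public class Cas {
        // Store each artifact under the hash of its bytes, so two projects
        // that depend on identical package contents share one copy on disk.
        public static Path store(Path storeRoot, byte[] content) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder key = new StringBuilder();
            for (byte b : digest) key.append(String.format("%02x", b));
            Path target = storeRoot.resolve(key.toString());
            if (Files.notExists(target)) {
                Files.write(target, content);  // identical content is written once
            }
            return target;  // each project links to this shared path
        }
    }

(pnpm does essentially this for npm packages, hard-linking from a global content-addressed store into each project's node_modules.)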


Every cell in your body includes gigabytes worth of source code for the entire organism. Maybe this kind of redundancy isn’t such a bad idea? The industry has tried shared library schemes many many times and it always turns out to be a mess.


I think a shift got lost in the metaphor. Most people don't care if they have hundreds of copies of distros scattered around their humongous disks, but they might care if all those distros are running and actively using RAM and CPU cycles.

In the same vein, I'm pretty sure that while all those cells contain a copy of DNA, they don't all execute the whole copy all the time.



There are over 10^13 cells in the human body. If my computer had enough memory to fit 10 trillion copies of Electron, I wouldn't be complaining.


> If my computer had enough memory to fit 10 trillion copies of Electron, I wouldn't be complaining.

You would, once you saw the power bill. ;-) SCNR


The set of dependencies that comes with CRA (and other similar tools) includes:

- A compiler

- A bundler/linker

- An optimizing minifier

- A linter

- A development server with live reloading

- A test runner

All of those are isolated and scoped to that one project, and they are all build-time dependencies only. The build output you actually deploy is just static JS/HTML/CSS files, and developers are encouraged to keep those at a few hundred KB tops.

It's also important to understand that Javascript packages are effectively distributed as source, which affects the number of files on disk. (Granted, many NPM packages do include unnecessary files in the published artifacts, but Javascript itself is a major factor there.)

Meanwhile, Xcode is supposedly something like 8GB installed, Visual Studio is multiple gigs, and if you were to look at the actual on-disk size of any C++ compiler toolchain, it would be a minimum of several dozen megs; those are just usually preinstalled on Linux or Mac systems.

So, context is pretty important here. A couple hundred MB for a complete JS build toolchain is perfectly fine :)


I've seen an embedded device running a Yocto/Poky flavour of Linux where the main app chroots into Ubuntu 14.04 for no apparent reason.


You wish. More like each subcomponent of Ubuntu will contain its own full operating system environment as well.


This is actually very, very sane. I've been a professional developer for 25 years, and in the old days we used to -- or, rather, had to -- make use of a lot of crazy hacks to get what we wanted, in terms of performance and RAM footprint, from slow, RAM-limited machines. This resulted both in brittle code (which wasn't so bad, because the programs had to be small) and in increased development time.

These days, you can increase the performance of your code by writing well-optimized code, using a better compiler (but there's a limit to that), getting more CPUs (but there's a limit to that, too, due to hardware support, programming style and Amdahl's law) and getting more RAM. By far the most expensive option is writing hand-optimized code, and the cheapest, least limited resource (except for special cases of constrained environments) is RAM. Every MB of RAM spent on making your code faster while still keeping development costs low is a MB well spent. Of course, nothing here should be taken to the extreme.


I'm really struggling with this.

Most of my professional career, I've optimized my time. My novel strategy has always been "do less work" (aka a month of coding can save a week of design).

80% of a project is maintenance, right? Mitigating that means clear code, good docs, safety net of tests, etc.

Is that 80% still true? I've heard, but cannot yet validate, that code now only lives 3 years.

I just can't believe how much code we're all creating.

Over the weekend, I rewrote some incomprehensible goo which took months to create. A module all our services use. A real pain point.

My rewrite is 1/6th the size.

Was the rewrite worth it?

I honestly can't say.


> I've heard, but cannot yet validate, that code now only lives 3 years.

That would be the rate at which the Hello Worlds and the front-end pieces get replaced. Back-end or enterprise code is very different in my experience.

Most enterprise internal apps actually live for much, much longer than that. These are seen as cost centers and never get the necessary cash to be properly refactored. They are usually riddled with copious amounts of lava-flow anti-patterns [1], and they outlive, several times over, the users, the developers, the management, and the founders/CEOs. If the hardware dies, they get put as-is on a VM that emulates the old hardware. Development is incremental and grinds slowly down, tending towards the asymptotic ratio of changing business needs to the amount of archeology required.

[1] https://en.wikipedia.org/wiki/Lava_flow_(programming)


I can see how you can make that argument, it makes a lot of sense on the surface.

However, it has the slight problem of not actually being true.

First, accessing RAM is (apart from I/O) the most expensive part of your program these days.

"The reality today is very different: computation is essentially free, because it happens “in the cracks” between data fetch and data store; on a clocked processor, there is little difference in energy consumption between performing an arithmetic op- eration and peforming a no-op."

-- Andrew Black, Object-oriented programming: some history, and challenges for the next fifty years

So making your code more RAM intensive is very unlikely to make it faster, and much more likely to make it slower.

Second: yes, there is a relationship between time-saving technologies and RAM use, but it is mostly coincidental, not causal. My NeXT cube (16MB RAM, 400MB HD) had a very productive development environment, and earlier Smalltalk systems were arguably more productive, with even less resource use.

We're slowly groping towards fast feedback loops and loosely coupled components (and oftentimes away from them, hence the slowness), and sometimes we pick up some goodness almost as if by accident.

They don't need lots of RAM; the Alto had 128KB.


> However, it has the slight problem of not actually being true.

The slight problem of your explanation showing that you do not actually understand the issue makes this unprovoked snide remark not only unnecessary but also embarrassing.

> First, accessing RAM is (apart from I/O) the most expensive part of your program these days.

That is absolutely true but has nothing to do with my argument, as the cost of access is not dependent (at least not much, and NUMA aside) on how much RAM you're using (once you exceed cache sizes etc.). A thousand cache misses cost the same whether your heap is 100 MB or 1 TB. In fact, using more RAM can make it easier to access RAM less, and so can make your application significantly faster precisely because access is expensive.

For example, suppose a common operation in your program is using the average of a set of numbers (one that doesn't change often). Computing that average on demand would require a large number of RAM operations (depending on the size of the set). On the other hand, computing that value once and caching it would need more RAM but would access it less.
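In code, that trade looks something like this (an illustrative sketch, not from any particular codebase):

    // Trade RAM for fewer memory accesses: keep a cached average instead
    // of re-scanning the whole data set on every query.
    class Stats {
        private final double[] samples;   // large, rarely-changing data set
        private double cachedAvg;         // the extra RAM we spend...
        private boolean dirty = true;

        Stats(double[] samples) { this.samples = samples; }

        double average() {
            if (dirty) {                  // ...so the full O(n) walk of RAM
                double sum = 0;           // happens once, not on every call
                for (double s : samples) sum += s;
                cachedAvg = sum / samples.length;
                dirty = false;
            }
            return cachedAvg;
        }

        void update(int i, double v) { samples[i] = v; dirty = true; }
    }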

Another example: some modern GCs (like ZGC) have worst-case pauses that are independent of the size of the heap, but the rate of collection increases the less memory you have. Therefore, increasing your heap would only reduce the amount of time spent in GC (because it reduces RAM access by increasing RAM usage).

> Second: yes, there is a relationship between time-saving technologies and RAM use, but it is mostly coincidental, not causal.

It's very hard to learn from old examples because they operated in a very different environment under very different constraints and requirements (also, RAM constraints affected Smalltalk's ability to cache JITted code, and made it less efficient). But in order to make my argument less controversial, let me put it this way: if you can pay for productivity/performance with RAM you're making a good deal.


> However, it has the slight problem of not actually being true.

> The slight problem of your explanation showing that you do not actually understand the issue makes this unprovoked snide remark not only unnecessary but also embarrassing.

I thought it was a snide remark too. I wish people would stop using flashy statements just to make themselves sound smarter.


Check out this article, in which the author was able to run an app written in Java with only 4MB of RAM using GraalVM:

https://sites.google.com/a/athaydes.com/renato-athaydes/post...

That clearly shows what's possible.
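For anyone who hasn't tried it, the basic workflow is roughly this (a sketch; flags and output names vary by GraalVM version):

    javac HelloWorld.java
    native-image HelloWorld     # AOT-compiles to a standalone binary
    ./helloworld -Xmx4m         # SubstrateVM binaries accept heap flags at run time

IIRC a runtime heap cap like that last one is how the linked article squeezes the app into a few MB.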


Yes, but GraalVM is still a VM with all that implies: an emulation of a computer system to execute code in a platform-independent environment, where another benefit, security-wise, is the walled-garden nature. I think that all of this done in 40MB is pretty fascinating!! It depends on how you look at it?

If you want the lowered resource use, go ahead and code in C, or why not Free Pascal, with its underrated, pretty mature and stable API on a multitude of platform targets. There are plenty of active projects with rich libraries. But there are also several disadvantages to not running on a VM.

I think that these days this kind of thing sometimes makes sense, but in my opinion mostly only if you are into operating systems, embedded systems, hardware drivers, and so on. I don't think the added effort is worth it otherwise, except for maybe personal pet projects. :)


> GraalVM is still a VM with all that implies. An emulation of a computer system to execute code in a platform independent environment,

Not with GraalVM native image compilation. It does ahead-of-time compilation and eliminates the bytecode, as far as I understand. It still uses managed memory and other abstractions, but it's quite far from a typical Java VM. (And definitely far from a system VM.)


> Yes, but GraalVM is still a VM with all that implies.

What does it imply? For one thing, the term "VM" itself is vague. GraalVM isn't an "emulation of a computer system" in the way a "VM" like VirtualBox is.


It sorta is. It was like a single Docker image called "Java" that gave you a complete runtime environment with all the OS internals available through a standard API so long as you wrote everything in one language.


An API is not an emulation of a thing X. It's a way for you to access the actual X.

Besides, as a sibling comment pointed out, even that loses all significance with native image compilation that throws out all the API methods that are not used, and inlines away the ones that are used.


Not a lot of pruning. Incidentally, writing the same thing in Go (for example) yields more or less the same binary size.

You should see how quickly the binary size grows as you add more of your code (answer: not much) and what the alternative is: a txt file containing "hello world" and a statically built nginx server.


They list the binary size as 27MB, whereas a similar Go binary is just under 5MB. So I'm not sure where you got your numbers, but they're wrong. Also, you were responding to a comment about the runtime resident memory (RSS) of the app being 40MB, not about the file size.


It's not overhead unless you are intent on only making a hello world app and stopping there. Assuming you are going to make something more involved, you are going to use those extra features. Optimizing the size of the executable for the hello world case (outside of a fun weekend project) is insane - optimization should be done for the types of apps that are doing real work.

I'm not sure why I'm saying any of this since you seem to be aware of it - your second paragraph says the same thing and completely contradicts your first paragraph.


What is the scarcest resource for the average developer? Is it CPU? Is it memory? Is it network bandwidth? Or is it his own time?

I guess for most developers it is a reasonable trade-off to blow a few megabytes here and some kilobits there in order to save even a few hours of precious calendar time. When everybody contributing to the dependency graph does that, it adds up.


On my machine it's an order of magnitude larger than `true`.

    /usr/bin/time -f "\nmaxRSS\t%MkB" true
    
    maxRSS 1276kB
I'm not sure what to make of that.


Missing under limitations is the lack of Windows support (though they say it's coming). Also, I would caution anyone against putting their eggs in this Oracle basket. This is not a community project; we know who the owner is, and they do have a premium version. Sure, if you have existing JVM stuff and want a native app, go ahead, just don't build a reliance on it. Otherwise, even if you believe Oracle will be a good steward of the project (which I do believe), it's still OK to avoid it on principle, given the owner's other activities. Of course, stances like these can't be 100% consistent across everything (especially legacy or de facto standards), but it's just something to keep in mind.


I hadn't heard of Vert.x until my most recent job, where we use it for our platform. We currently deploy in OpenJDK Docker containers.

This seems neat, but I'm wondering about dependencies. Does Graal rebuild all your dependent jars to be native? What if you have unsupported things like reflection in a dependency? I'm looking through their website and can't seem to find any answers.

We've already run into dependencies for Vert.x that puke on Java 9 (deprecated/removed APIs) and are currently still on Java 8. This seems nice, but I have a feeling that, for production, you'd end up writing a lot around GraalVM.


The "limitations" link in the posted page has a section on "Reflection", [1]. It says: "Support Status: Mostly supported ... Individual classes, methods, and fields that should be accessible via reflection must be specified during native image generation in a configuration file ..."

I'd assume libraries your app depends on should be usable via reflection as well, and that their own calls to reflection APIs should work too, as long as the classes that your code, or the library code, wants to reach are somehow included in the Java --> native conversion configuration files.

[1] https://github.com/oracle/graal/blob/master/substratevm/LIMI...
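For illustration, such a configuration file is JSON along these lines (the class name here is hypothetical), passed to the image builder with a flag like -H:ReflectionConfigurationFiles=reflect-config.json:

    [
      {
        "name" : "com.example.MyHandler",
        "allDeclaredConstructors" : true,
        "allPublicMethods" : true
      }
    ]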


My understanding is that Graal is doing whole-application compilation, so the dependent jars will be compiled.

Reflection is an issue that they have a workaround for. There's a good article illustrating various issues [1] in the case of Netty, written by one of the Graal team. (There you'll see an example of a reflection configuration file.)

[1] https://medium.com/graalvm/instant-netty-startup-using-graal...


Excellent! I've started using Vert.x with Kotlin instead of PHP for making RESTful APIs and it's wonderful. Event-driven, strongly typed and performant. Looking forward to trying this out.


Did you find the documentation to be a bit painful? I gave it a shot for API endpoints too, and was very frustrated that Kotlin felt like a fourth class citizen... and the existing docs displayed one of my biggest pet peeves with most Java-esque docs; "we won't mention the imports, you'll probably have an IDE that figures those out magically". As a Vim user, that's a bit more difficult.


AOT compilation for Java alone is just a technique to improve startup time for command line apps vs actually native apps; AOT shared libs still need JVM infrastructure for GC.

Also, going all-in on async doesn't seem that helpful on the JVM where the vast majority of existing code (= what makes the JVM valuable) uses synchronous I/O.

So what's the point of this? To start a webserver really fast? I always thought the value of vert.x is to become part of a node.js/CommonJS runtime for the JVM (like what Oracle and RedHat were trying a couple years ago).


I've messed with async on the JVM. In my experience, it doubles speed at most for CRUD apps with heavy DB access. Async is only useful if you're thread-starved, and most DB calls are fast enough that that never happens.

The one place it makes a huge difference is when you're waiting on other, possibly slow, services like third-party REST calls. If you have 100 threads and each call takes 1000ms, you will run out of threads at 100 req/sec, which is quite low for the JVM.

Using async in these situations allows thousands of requests to be parked while your 100 threads still do work, letting you handle maybe 10k req/sec even with a slow partner service.

A lot of JS programmers don't understand the relationship between thread starvation and async; they act like async is magic. If you're not thread-starved (everything you do is pretty quick), then async is useless.
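In Vert.x terms, the slow-partner case looks roughly like this (a sketch using the vertx-web-client module; the URL is made up):

    import io.vertx.core.Vertx;
    import io.vertx.ext.web.client.WebClient;

    public class AsyncDemo {
        public static void main(String[] args) {
            Vertx vertx = Vertx.vertx();
            WebClient client = WebClient.create(vertx);
            // Hypothetical slow partner service: the event-loop thread is
            // not parked for the ~1000ms the call takes, so it is free to
            // fire off thousands of other requests while this one is in flight.
            client.getAbs("http://slow.example.com/api")
                  .send(ar -> {
                      if (ar.succeeded()) {
                          System.out.println(ar.result().statusCode());
                      } else {
                          ar.cause().printStackTrace();
                      }
                  });
        }
    }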


> I've messed with async on JVM.

What library or framework did you use? Using ExecutorService or similar would still have those 100 threads blocked even if they run in their own future, correct? The only thing it would allow is queuing the other incoming requests, but it does not time slice or switch to another request upon hitting an IO request.


You can use fiber libraries like Quasar as your executor, just like Go uses for regular threading.

The double-edged sword of Java is that you can do anything. But the learning curve is huge because there's so much crap out there.


If I had to bet on that, I'd say it is a move to make it a really great platform for serverless, where it can absorb traffic spikes with ease and go to sleep very fast.

The logic required in a typical web server is not complicated. Most of what the hardware does is filling and emptying memory/caches, waiting for the disk or the network, etc. Most useful computations are like very thin spikes that need a vast context to exist. All the current work being done is about carving a lot of this context down, which will drive the operating cost down a lot.

The JVM has several other initiatives in the spirit of this. Consider these 2 JEPs that just got added:

* Dynamic Max Memory Limit [1]

* Timely Reducing Unused Committed Memory [2]

There is also the Loom initiative [3], to provide a base for blocking-style code to run on very cheap threads, providing tiny threads with a tiny context.

IMHO these make it possible to have an AWS Lambda service with a typical wake-up time around 10µs, rather than a few hundred ms. Your functions (and CPU costs) could sleep in between IO/disk/network requests!

[1] http://openjdk.java.net/jeps/8204088

[2] http://openjdk.java.net/jeps/8204089

[3] http://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.htm...


> AOT compilation for Java alone is just a technique to improve startup time for command line apps vs actually native apps; AOT shared libs still need JVM infrastructure for GC.

That’s true with the JVM’s AOT project, but SubstrateVM is entirely separate – you get a single binary that runs standalone.


> Also, going all-in on async doesn't seem that helpful on the JVM where the vast majority of existing code (= what makes the JVM valuable) uses synchronous I/O.

I'm not sure that the majority of years-old, backwards-compatible code being sync IO has much to do with the value of async, so long as there are ample amounts of async code/libs. I wrote a backend in Scala using an async Postgres lib; I hopped over three different ones, all of them async, before writing my own. For things like network IO, in fact, there may be more maintained async code than sync code.


Great, but what's the GC story? Does SubstrateVM include a generational GC?


Yes, according to https://nirvdrum.com/2017/02/15/truffleruby-on-the-substrate...:

"... the SVM has a new generational garbage collector (GC) that’s different from the JVM’s. The SVM ahead-of-time (AOT) compiler uses a 1 GB young generation size and a 3 GB old generation size by default."


Yes, SubstrateVM provides generational GC. Just not traditional Java dynamic classloading or JIT compilation.


My bro and my bestie worked on a Vert.x project. So I got to hear all about it.

I can only comment thru comparison. Due to multiple poor life choices, I'm currently stuck maintaining multiple large nodejs code bases.

I'm "maturing" our code base from callbacks to promises (futures) to async/await as able.

It's terrible.

Whatever the future of async I/O programming is, it's not events, promises, callbacks. I've been reading about and playing with actors, CSP, Erlang/Elixir/Phoenix. If I were doing greenfield, that's where I'd place my bets.


There is a great approach in Quasar [1] where you program simple blocking code, and the runtime (via bytecode manipulation with a Java agent) unmounts and remounts thin contexts on the fat thread you are in. A light thread, if you will. No events/promises/callbacks. It does require unmounting-aware APIs, though, making you incur the usual fat-thread mounting/unmounting cost when they are not available.
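A minimal sketch of what that looks like with Quasar's fiber API (assuming the Quasar agent is on the JVM command line for instrumentation):

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.strands.SuspendableRunnable;

    public class FiberDemo {
        public static void main(String[] args) throws Exception {
            Fiber<Void> fiber = new Fiber<>((SuspendableRunnable) () -> {
                // Reads like blocking code, but Fiber.sleep unmounts this
                // lightweight fiber; the underlying thread keeps working.
                Fiber.sleep(1000);
                System.out.println("resumed after a cheap 1s pause");
            });
            fiber.start();
            fiber.join();
        }
    }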

The author of this project is currently implementing the base element of this approach (continuations) in the JVM [2], probably along with the required rewrite of the Java standard library. I'm very excited about this!

You could start writing code in Quasar now; IIRC they said it will use these continuations when they are made available. I would not be surprised if it were one of the first frameworks to make use of them.

[1] http://docs.paralleluniverse.co/quasar/

[2] http://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.htm...


Latest news on the website is from 2016. Latest commit on GitHub is from January; build broken. Wouldn't be my first choice for starting a new project, I guess.


> unmounting-aware APIs

Does that mean libraries need to be rewritten to support it? In my experience, important IO-driven libraries like database access/ORMs tend to be extremely poorly supported for non-language-native concurrency strategies, or even concurrency strategies added late in a language's lifecycle.


> Does that mean libraries need to be rewritten to support it?

Yup. Here are the list of supported libraries [1].

> even concurrency strategies added late in a language's lifecycle

Quasar code uses a regular Java exception as a trick to stop execution and unmount, to sort of piggyback on the language features. Fibers do also implement the Thread interface; I think what you pointed at was indeed a concern of the author. I don't know what the Loom initiative will look like.

If you want to see how a library is adapted, here is JDBC [2].

[1] http://docs.paralleluniverse.co/comsat/#getting-started

[2] https://github.com/puniverse/comsat/tree/master/comsat-jdbc/...


Does native code execution outweigh the loss of JIT optimisation?


For big applications with high concurrency and long lifetimes, where reducing the working set relative to available CPU cache sizes is what matters, it probably won't.

But for a lot of smaller applications, for applications where you need a lot of JIT/JVM warm-up to avoid issues on deploy, and for CLI tools, it will definitely be a net positive.


If I understand it correctly, the JIT compiles hot spots down to native code, so if the entire executable is already native, it must be just as fast or faster.


JIT compilation makes decisions about how to optimize the generated code based on run-time analysis of the program. AOT compilation doesn't have access to that information, which can lead to slower code.

However, because AOT compilation isn't delaying the execution of the program, it is allowed to take much more time to optimize the output.
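A concrete example of such a decision is speculative devirtualization: a JIT can watch a call site, notice it only ever sees one receiver type, and inline through the interface, deoptimizing later if the assumption breaks. An illustrative sketch:

    interface Shape { double area(); }

    final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    class Summer {
        // If, at run time, 'shapes' only ever holds Circles, a JIT can
        // speculate, inline Circle.area() into this loop, and recompile
        // if the assumption is later invalidated. An AOT compiler sees
        // every possible Shape implementation and must keep the virtual
        // dispatch (unless profile data justifies the same speculation).
        static double total(Shape[] shapes) {
            double sum = 0;
            for (Shape s : shapes) sum += s.area();
            return sum;
        }
    }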


You can AOT compile with PGO (Profile Guided Optimization) though.


And if you can record profiling data from every machine the code will run on, you might be able to be confident that no JIT could do better. That is, until a new processor comes around that wasn't available at the time of compilation (but is architecture-compatible with the executable): a JIT can still tune for things like its cache sizes or new processor features, while the AOT executable is stuck, unable to take advantage of them.


Agreed, but there are ways around it.

For example, Android P will upload PGO data into the store, which is then distributed across all devices, and fed into the on-device AOT compiler.


I would argue that is still a JIT in the spirit of this discussion so far. That, or we need another term to disambiguate AOT (pre-distribution) from AOT (post-distribution).


It is called PGO.


There are some optimizations HotSpot can do that Substrate cannot.



