More

pkhagah · on May 3, 2018

Is TornadoFx supported native without JVM? AFAIK TornadoFx is just a layer around JavaFX. So, Kotlin/TornadoFx should use the same resources as Java+JavaFX.

dropofwill · on May 3, 2018

Yeah, I think the usual approach is to ship the JVM with the app.

And last I looked it (like ScalaFX) runs against Java 8, so no Jigsaw to shrink app sizes.

tomc1985 · on May 3, 2018

Which it does. JavaFX doesn't do native but its given widget takes a very conservative and elegant approach to styling.

pkhagah · on Dec 18, 2017

I don't think openssl in phronix benchmarks use armv8 instructions, which makes those benchmarks unfair. There are benchmarks on cloudfare with openssl-devel, with hardware armv8, where arm performed decently. We have to wait and see how this pine64 does.

https://blog.cloudflare.com/arm-takes-wing/

redshirt · on Dec 18, 2017

This is entirely to lacking an optimized version for the new aarch64 instruction set. Similar problem noted in Go benchmarks awhile back, see 100x speed up here: https://blog.minio.io/accelerating-sha256-by-100x-in-golang-...

revelation · on Dec 18, 2017

armv7 and armv8 is just 64 bit, so I guess a corner case like openssl signs could benefit massively (lots of bigint) but normal workloads only marginally, while taking a massive hit on memory (don't want the 1 GiB model).

The ARM they tested at Cloudflare is also of course nothing like the mobile chip that powers the pine.

theresistor · on Dec 18, 2017

AArch64 is actually a completely new and much improved instruction set. More registers, less warts that inhibit out-of-order execution, etc.

pkhagah · on Nov 13, 2017

IITM will start with tapeout of microcontroller class(C-class) CPU. But IITM is designing mobile, server and HPC class CPUs.

http://rise.cse.iitm.ac.in/shakti.html

pkhagah · on Sept 26, 2017

It can be enabled with extension.

https://testpilot.firefox.com/experiments/containers

pkhagah · on Sept 26, 2017

Issue bigger then demonetization is NPA. Banks need appx. $50 billion USD to comply with the new norms. These huge NPA's are given in UPA2 term. Your beloved Manmohan Singh didn't utter a word against this. When the banks are looted systematically.

kamaal · on Sept 27, 2017

Finance minister and the PM himself said that they did DeMo to demonstrate their heroics in killing black money.

Should I believe them or you.

pkhagah · on Sept 20, 2017

It's not about using duckduckgo to search with google. I use it when duckduckgo results aren't good enough, instead of typing everything in google again. I have been using it less frequently recently.

pkhagah · on Sept 18, 2017

Loans and sanctions. China helped SriLanka in UN when human rights issues came up. Also India's cultural influence is not an answer to everything. It has limited influence. Myanmar for example evicted Indians forcefully in 60's. Bangladesh and Nepal are trying to balance China and India influence.

https://en.wikipedia.org/wiki/Burmese_Indians

pkhagah · on June 3, 2017

India is still focusing on thorium. But slowly. Design for AHWR if recently finalized and site selection in progress.

https://en.wikipedia.org/wiki/Advanced_heavy-water_reactor

pkhagah · on May 18, 2017

What lossy compression? Were you guys throwing bits into /dev/null?

pkhagah · on May 13, 2017

> The OCaml garbage collector is a modern hybrid generational/incremental collector which outperforms hand-allocation in most cases. Unlike the Java GC, which gives GCs a bad name, the OCaml GC doesn't allocate huge amounts of memory at start-up, nor does it appear to have arbitrary fixed limits that need to be overridden by hand.

From the linked article. Can anyone confirm or deny this? I have hard time believing this statement.

gsb · on May 13, 2017

As far as it goes I think it is a pretty fair statement. However it is important to know of other decisions in the language which make it easier for the GC. Particularly, the use of tagged values and a GIL. Tagging makes it simpler to distinguish heap values from other values, and the GIL means that the GC does not need to operate concurrently.

jblow · on May 13, 2017

It is simply not true unless you have a pathological idea of "hand-allocation" (which to be fair, some programmers do program like).

Let me put it this way ... all "garbage collection is fast" claims are saying the following thing:

"It is faster for the programmer to destroy information about his program's memory use (by not putting that information into the program), and to have the runtime system dynamically rediscover that information via a constantly-running global search and then use what it gleans to somehow be fast, than it is for the programmer to just exploit the information that he already knows."

It sure sounds like nonsense to me.

_ph_ · on May 13, 2017

You are talking about an abstract case. Yes, if an objects lifetime is perfectly easy to determine, manual memory management is faster. But in many typical programs with real dynamic memory requirements, this is not the case. One has to keep track about the life time of objects in the program logic. Either by storing additional information as for reference counting, or careful designing your program logic so the status of an object is always clearly known.

The latter might not bear a cost in CPU cycles, but certainly in programmer cycles, if the program logic allows for it at all. It also puts the burden of correctness on the programmer vs the garbage collector. And for most contemporary programs, correctness is the larger challenge then execution speed, especially with GCs which can run in parallel with your program logic, so can utilize unused CPUs "for free".

mcguire · on May 13, 2017

"This paper presents a tracing and simulation-based experimental methodology that executes unaltered Java programs as if they used explicit memory management. We use this framework to compare the time-space performance of a range of garbage collectors to ex- plicit memory management with the Lea memory allocator. Com- paring runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with ex- plicit memory management when given enough memory. In par- ticular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collec- tion’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Gar- bage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management."

http://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf [2005]

Consider what happens when you allocate or free memory manually: the manager had to do some amount of work to track the status of the memory.

Now consider a 2-space, moving collector. Allocation is a pointer bump and range check. Collection means tracing and copying the live objects. That's it.

It's not as simple as it seems.

vvanders · on May 13, 2017

Yup, an arena allocator that you drop at the end of an operation will always be faster than a GC.

yosefk · on May 13, 2017

Well, yeah, but if you need some of the allocated data at the end of the operation, you'll end up copying it, which will have a runtime cost, might introduce bugs (because of pointer invalidation and the ease of forgetting to update some pointers) and then none of the two will get counted as "cost of manual memory management." C++ with its value semantics encourages unnecessary copying tremendously and it's never counted as "cost of memory allocation" or "producing garbage", on the contrary, Stroustrup says things like "C++ is my favorite GC language because it generates so little garbage to begin with."

This is not to say that it's fair to call OCaml "efficient" in the memory department based on a GC benchmark; TFA is full of examples where OCaml allocates things on the heap that TFA recommends to allocate elsewhere and shows you how to maul your code to get there.

My only point is that how well a programming system uses memory is a very hard question because (A) there are many different use cases and (B) you can't isolate "memory performance" into a few easily measurable things like time spent allocating, time spent in GC and peak memory use - there are other things like what your program has to do outside of the allocator to cope with its semantics and how the performance of code using the memory objects is affected by the layout encouraged by the allocator and these things cannot be measured in isolation from the rest of the program.

pjmlp · on May 13, 2017

Not if you have cascaded data structures, have some kind of destructor like operations to call, synchronize threads, or if the memory allocator does a call into the OS to release the memory back to the OS.

mcguire · on May 13, 2017

Two facts of computer science: Destructors suck. And destructors are wonderful.

pjmlp · on May 13, 2017

I loved Herb Sutter's talk at CppCon 2016, where he showed how careless written destructors can lead to "stop-the-world" and stack overflows in complex data structures.

mcguire · on May 13, 2017

Arenas are probably the fastest memory management technique this side of fixed allocation and stacks.

If the lifetime of the objects is contained within the lifetime of the arena. If not, you have to copy live objects out before you destroy the arena, at which point you are implementing a n-space moving collector.

rbehrends · on May 13, 2017

For a modern generational GC, the minor heap works just like an arena allocator (at least for the data that can be thrown away), so this shouldn't result in any performance difference (assuming the minor heap is large enough).

gsb · on May 13, 2017

I think that when you find arguments that GC can be competitive with manual memory management, they always rely on the assumption that it is in the context of idiomatic code. For a GC language, idiomatic code allows the use of a generational collector which can automatically operate in a manner similar to an arena allocator. I suppose this is a matter of opinion, but I think that using arena allocators in a language like c++ is not strictly idiomatic - you should only add them when you find they are required for optimisation, and they have a cognitive overhead. If you make the wrong choice with your arena allocator then either you can make performance worse or you can introduce memory safety problems.

So this is the way that a GC might be competitive with manual memory management - if the benefit of arena-like allocation offsets the additional tracing performed by the GC.

yosefk · on May 13, 2017

Isn't it harder to deallocate the data if you actually check that it's unused, as a GC must, instead of just assuming it?

barrkel · on May 13, 2017

A GC doesn't check if memory is unused; it looks for memory that's used and frees whatever is leftover. Having a small fraction of used memory to total memory is what makes GC cheap (in compute time). It also means that allocating short-lived memory with abandon is cheap, if that's most of your allocation.

yosefk · on May 13, 2017

Got it, an interesting point and it follows that a use case can be engineered where a GC adds essentially zero overhead over an arena allocator.

But in the general case where some objects are short-lived and others aren't, surely manually splitting your allocations to malloc for long-lived ones and some sort of arena_alloc for short-lived ones ought to be faster than allocating them all in one place, then copying the ones which are still reachable out of the area reserved for short-lived objects?

(This is not to say a GC-based system will be slower "on average" because nobody knows what the "average" is. A realistic arena-based system can have objects most of which are short-lived but some do need to live longer and you only find out long after they're allocated; in that case, one has to manually reallocate those objects just like a GC would, and doing it, say, the C++ way is definitely more bug-prone than GC's bug-free handling of this, and one way to make it less bug-prone in C++ is to have deeper copies and avoid trying to minimize copying, and now you might easily be slower than a GC. I'm just saying that it's very easy to find a case when a system not getting any hints from the programmer wrt object lifecycles and instead discovering them fully automatically would be slower than a system which does get these hints. And of course a GC can provide ways to supply these hints, I'm just not aware of one which does - perhaps it's avoided on the theory that the GC algorithm might change and you don't want to make hints which operate in terms not portable between algorithms a part of your interface.)

rbehrends · on May 13, 2017

> But in the general case where some objects are short-lived and others aren't, surely manually splitting your allocations to malloc for long-lived ones and some sort of arena_alloc for short-lived ones ought to be faster than allocating them all in one place, then copying the ones which are still reachable out of the area reserved for short-lived objects?

This is where things get difficult, actually, and you need benchmarks because:

1. You still have a bump allocator that's much faster than a general purpose first-fit/best-fit allocator and has generally better memory locality than a pool allocator.

2. Offsetting that may be the additional tracing and/or copying you are doing as a result of garbage collection.

3. Manual memory management techniques often have their own performance costs: std::shared_ptr and std::unique_ptr both add overhead and you sometimes see additional copying where lifetimes are difficult to predict.

Which one has the higher cost is often something that can be only tested for with actual code (and can go either way).

I'll also note that this is primarily of interest for functional languages, which often have a high allocation rate. For imperative programs, the memory allocation rate (and ratio of memory that contains pointers to memory that doesn't) is often so low that even a very basic mark/sweep collector would be fine, as long as pause times don't matter in your application domain. This means that it's often not really a practical issue, one way or the other.

barrkel · on May 13, 2017

IMO the design of the memory system has a strong influence (or should have a strong influence) on the architecture of a system when performance is a big concern. What the "average" is, is under the control of the program itself; a program written for a generational GC should be written differently to one with a refcount-based GC, and both should be different to one for a simple mark and sweep GC.

Similarly, designing a program with manual allocation for speed means using some kind of zone or arena allocation, probably mixed with some stack-oriented allocation that mirrors the control stack (not necessarily consuming CPU stack space). The design of the memory system needs to be integrated with the architecture of the program and the lifecycle of the values it needs to track.

I don't think there's any simple winner. Different problem spaces require different treatments of memory. For example, stateless servers have little need for long-lived memory; they're well suited to generational GCs, but also to arenas (though GC is easier to keep correct, and usually more fluent in practice, without rewriting too much of the standard library). Mix in in-process caching, and things start getting murkier. Put caching in a different process, keeping it simple, at the cost of some IPC; tradeoffs, etc.

gsg · on May 13, 2017

In general you can't drop the contents of the arena on the floor in a GCed language, since you might have been wrong about those objects being short-lived (or because they are short lived, but they were allocated shortly before the young heap filled up and happen to still be alive). So tracing/copying is still necessary.

It is quite possible for objects that are expected to tenure to be allocated directly into the major heap. Many systems use various criteria to decide when this should be done. In particular it is common to allocate very large objects directly to avoid ever having to copy them.

There are also static analyses which attempt to determine which objects can be safely allocated in an arena-like way, that is, region inference. In some systems in addition to inference the programmer is given ways to specify that a given object should live within a certain region. MLkit is the usual example of such a system. Region systems don't seem to have been particularly popular, but there's still some work being done on them here and there.

rbehrends · on May 13, 2017

Unused data is by definition unreachable from any GC roots and should not incur additional tracing work.

ori_b · on May 13, 2017

That means that the heap grows with the highest address allocated within it, unless you incur additional overhead for copying the data around.

rbehrends · on May 13, 2017

This is exactly what a generational GC does. Once the minor heap fills up, it evicts data that is still in use and moves it to the major heap.

Note that this overhead is only incurred for data still in use. We are not comparing memory management strategies in general, but the special case of temporarily allocating data that can be thrown away at the end.

ori_b · on May 13, 2017

It is also not what an arena allocator does.

gsb · on May 13, 2017

Absolutely, but I think the "in most cases" of the original quote was meant to imply simple malloc/free style memory management rather than the use of tailored arena/region allocators.

rwmj · on May 13, 2017

That could blow away your cache if the arena is large. Better to measure it.

andersson42 · on May 13, 2017

Heap allocation and smartpointers (as an alternative to GC) is by no means free, in that context GC performance favorable in many scenarios, for example when doing large number of small allocations.

But in the end, if your application have serious performance requirements, you should avoid allocating memory in the hot path altogether.

_ph_ · on May 13, 2017

So much this. GCs don't mean you shouldn't care about reasonable memory allocations. In hot paths, not allocating will always be faster than allocating. Having a GC does not prevent you from using object pools and buffers, where it is a clear performance win. But having a GC means, you are not required to do so, where there is no bottle neck. The difference between a "fast" and a "slow" GCed language often is, how well the programmer keeps some control about memory allocations.

Fun fact: the Go GC is written in pure Go code, which does not perform any heap allocations. So there is enough control to write code in Go which does not require any heap allocations. There is even a compiler switch for the development of the GC code which makes heap allocations a compile error.

qznc · on May 13, 2017

Java gave GCs a bad name? It made them mainstream!

_ph_ · on May 13, 2017

In a sense, both :). Java made GC mainstream and since Hotspot has an excellent GC, solving the problem for many use cases. However, the Java language design is extremely performance-hostile with respect to memory allocations. Notable points are:

- no value types mean, that there are tons of separate heap allocations which have to be referenced by pointers, this puts a ton of unnecessary load on the GC. It also means, that you fully depend on escape analysis when passing objects to functions for avoiding heap allocation. And the existing primitive types (int, float) often require boxing.

- the design of the standard libraries was, especially at the beginning full of wasteful allocations. Most string handling code was a constant series of allocations (having 16 byte chars didn't help this either).

- far to many Java libraries build extremely complex object hierarchies and protocols. Just reading from a text file means instantiating several objects. This adds to the GC pressure.

dom0 · on May 13, 2017

A good example how to not write performance sensitive code, in general, and specifically in Java, would be Minecraft. Though commercially successful Minecraft is (or at least was) an excellent study of how to not do stuff, be it memory management, networking or rendering (when I played Minecraft it still used immediate mode GL if I recall correctly)

mcguire · on May 13, 2017

Let's break it down. Hopefully someone can fill in the parts I don't know.

> The OCaml garbage collector is a modern hybrid generational/incremental collector which outperforms hand-allocation in most cases.

Ocaml is mostly functional and likes to allocate many short lived objects. With enough memory, a moving collector is very good at handling that load.

> Unlike the Java GC, which gives GCs a bad name, the OCaml GC doesn't allocate huge amounts of memory at start-up, nor does it appear to have arbitrary fixed limits that need to be overridden by hand.

Java's GC (-s; there's a bunch) use a heap with a fixed maximum size and an initial allocation. It has also received much more research attention over the decades. It also has a lot of knobs to fiddle with.

I've run production Java web app servers with multi-gig heaps where almost all requests were handled in the young generation. The knobs and visibility were very nice.

Ocaml doesn't use a fixed size heap, so it can conceivably take over all of memory. It also doesn't have all of the knobs. But it works pretty well.

joshmarlow · on May 13, 2017

Jane Street are big players in the OCaml community (they build/maintain what's arguably the defacto standard library for the language - Jane Street Core) and apparently make a lot of money being good at what they do; so I'm inclined to believe them.

haimez · on May 13, 2017

Devils advocate: they're not doing anything better than anyone else in the market and they don't make a product. They're very invested in their ecosystem which is only as good as they make it because they have a very quiet, well funded (pun intended) echo chamber and a lot of spare cycles to maintain a STDLIB on top of their actual business.

AlphaSite · on May 13, 2017

It may be the but from a couple of talks I've gotten they go to pretty extreme lengths to maintain that performance.

Cieplak · on May 13, 2017

@pron probably has the answer to this

searealist · on May 13, 2017

It's nonsense, the only way to make a GC fast is to use many times as much memory as manual management (about 6x to reach parity[1]).

[1] https://people.cs.umass.edu/~emery/pubs/04-17.pdf

rwmj · on May 13, 2017

This paper assumes that malloc/free is cost-free and that free is always called at the earliest possible place.

mcguire · on May 13, 2017

The paper uses the Lea malloc, which is fast, but not free.

rwmj · on May 13, 2017

By "cost-free" I mean that they use precise traces so that memory is always freed at the exact moment it is no longer used. So they underestimate (greatly IMHO) the memory used by malloc/free versus the garbage collector. It would be a superhuman programmer who always called free() on the line exactly following the last use of every piece of allocated memory.

The second problem I have with the paper is that it underestimates the kinds of structures which GC makes possible -- eg sets of trees sharing nodes. And the corollary to that is that to implement those structures with malloc/free, programmers tend to reach for ref-counting, which adds to memory usage and is generally a terrible form of garbage collection for many other reasons.

Nevertheless it's an interesting paper which does add to the debate (I studied it about 10 years ago). It is worth reading, even though I have concerns about the methodology and hence the results.

searealist · on May 14, 2017

> By "cost-free" I mean that they use precise traces so that memory is always freed at the exact moment it is no longer used. So they underestimate (greatly IMHO) the memory used by malloc/free versus the garbage collector. It would be a superhuman programmer who always called free() on the line exactly following the last use of every piece of allocated memory.

Let's say you are correct, that the manual memory management presented in the paper is too unrealistic. What do you think real world manual memory management would bring the multiple to? 5x? 4x?

> The second problem I have with the paper is that it underestimates the kinds of structures which GC makes possible -- eg sets of trees sharing nodes. And the corollary to that is that to implement those structures with malloc/free, programmers tend to reach for ref-counting, which adds to memory usage and is generally a terrible form of garbage collection for many other reasons.

These data structures are extremely niche, and when used when sharing is not truly needed just lead to time/space inefficiencies.