On comparing languages, C++ and Go (grok.se)
126 points by abelsson on Oct 5, 2013 | 127 comments



I've been following this whole business card raytracer story and wonder if people might be missing the forest for the trees.

It would be a little nutty to suggest that Golang 1.1 is going to give optimized C code a run for its money. Nobody could seriously be suggesting that.

What is surprising is that the naive expressions of an "interesting" compute-bound program in the two languages are as close as they are.

Most C/C++ code --- the overwhelming majority, in fact --- is not especially performance sensitive. It often happens to have performance and memory footprint demands that exceed the capabilities of naive Python, but that fit squarely into the capabilities of naive C.

The expectation of many C programmers, myself included, is that there'd still be a marked difference between Go and C for this kind of code. But it appears that there may not be.

This doesn't suggest that I'd want to try to fit Golang into a kernel module and write a driver with it, but it does further suggest that maybe I'd be a little silly to write my next "must be faster than Python" program in C.


That's interesting. I had the opposite reaction regarding Golang capabilities while following this saga. I'm not sure how 'idiomatic' the Golang code is, but at first glance it just seems less expressive (more lines of code) than either C++ or Java! I didn't think that was possible.

So whenever people talk about the expressiveness of Golang it just seems like a design gone bad. The designers wanted a programming language with the expressiveness of Python and the speed of C; they ended up with a language with the expressiveness of C and the speed of Python.


I agree, I honestly don't see where the expressiveness claims of Go come from. I've always put it in the Java-like bin of languages (which is not necessarily a bad thing).

I suppose the one thing that Go does well (compared to C++ or Java) is builtin concurrency and communication across tasks.
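
A minimal sketch of what that looks like in practice (not from the thread's raytracer; the names are made up for illustration): goroutines fan the work out, a channel carries the results back.

    package main

    import "fmt"

    func main() {
        results := make(chan int)
        for i := 0; i < 4; i++ {
            go func(n int) { results <- n * n }(i) // one goroutine per task
        }
        for i := 0; i < 4; i++ {
            fmt.Println(<-results) // receive results as tasks finish
        }
    }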


My first impression of it was that it was like a cross between Java and Python. It is unmistakably similar to Java conceptually and syntactically; they are sibling languages, both designed to streamline, simplify, or modernize C.

I'm a Java-literate C/C++ programmer. I would avoid writing straight Java code at all costs; I find it immiserating. Here are some reasons off the top of my head that Golang is more pleasant to work in:

* The syntax is deliberately streamlined, including implicit declarations, semicolon insertion, lightweight expression syntax, the capital-letter-exports-a-symbol thing

* It has fully functional idiomatic closures

* Interfaces get rid of adapter class BS

* The table type (maps, in Golang) is baked into the language, like Python, not a library, like C++ and Java

* Clearer, more direct control over memory layout; data structures in Golang feel like C
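
To make a couple of those concrete, here is a minimal sketch (hypothetical names, not from any real codebase) showing an implicitly satisfied interface, a closure, and the built-in map type:

    package main

    import "fmt"

    // Any type with a Describe method satisfies this interface implicitly;
    // no adapter class is needed.
    type Describer interface {
        Describe() string
    }

    // A plain struct; the memory layout feels like C.
    type Point struct{ X, Y int }

    func (p Point) Describe() string { return fmt.Sprintf("(%d,%d)", p.X, p.Y) }

    func main() {
        counts := map[string]int{}    // maps are built into the language
        record := func(d Describer) { // a closure capturing counts
            counts[d.Describe()]++
        }
        record(Point{2, 3})
        record(Point{2, 3})
        fmt.Println(counts) // map[(2,3):2]
    }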

I don't know if Golang's standard library is that much better than Java's, but it was obviously designed carefully by and for systems programmers, so I find it remarkably easy to work with.

It also feels like a much smaller system than Java. Almost every time I write a significant amount of Golang code, I find myself answering questions by just reading the standard library source code. It's easy to get your head around. I've written compiler/runtime-level code for the JVM and I still don't have a great grip on all of Java.


I agree with all of the above and I write Java code for a living currently (Android/Dalvik though, not for the JVM).

Another cool aspect of the last point (Go being small and lightweight) is that if you've got gcc and Mercurial on a supported platform, building the latest Go from source is as easy as:

    hg clone http://code.google.com/p/go
    cd go/src
    ./make.bash

Got to build a local copy of the JVM and/or JDK for some reason? Good luck with that (even ignoring all the licensing, OpenJDK vs closed, etc)


Have you compared it with ML or Haskell?


Brevity and expressiveness are not the same.


Are you sure about that? Given that any [Turing-complete] language feature can be implemented in any other [Turing-complete] language, the only conceivable difference is in fact length of implementation.

In other words, it is possible to express anything in one Turing-complete language that is possible to express in another.


Except my brain isn't a Turing machine. What expressiveness means then is how easily I can fit the concepts of the code in my head (not on my hard drive). This is not the same as how short the code is, though the two are often closely related.


Expressiveness is power and it applies to the act of writing code, not reading it. I agree that it is ideal if one's code can be read and comprehended easily by others although that has little to do with expressiveness.

http://www.paulgraham.com/power.html


To me, expressiveness is about expressing an idea with as little incidental complexity as possible, which is definitely different from expressing it in as few characters as possible.


I don't think it is different actually. 'Kolmogorov complexity' (which is essentially the global minimum 'essential complexity' of a particular algorithm) is specified in terms of length.


Applying Kolmogorov complexity in this context is a bit more tricky than you make it appear. Kolmogorov complexity measures the complexity of a string as the length of the shortest possible program that generates that string.

So to measure the complexity of a program P we have to write another program G that generates P and then take G's length. It's not a given that a generator for a verbose language is necessarily longer than a generator for a terse language.

But more importantly, what we want to measure is the mental effort required to write and understand code in different languages. To measure that effort in terms of Kolmogorov complexity, we'd have to write a program that generates our states of mind while programming.

Good luck with that ;-)


Actually, given a Turing machine M which outputs x when given x, the description of program T0 (for terse language T) equals the concatenation of the [specification of the] machine M with T0: M T0. Given program J1 (for verbose language J), its description equals M J1. Given a sufficiently small M and large T0, the M factor's importance is quite reduced and we see that Kolmogorov complexity is roughly analogous to the respective sizes of T0 and J1 themselves (for this specific Turing machine M).


You're forgetting that Kolmogorov complexity is the shortest possible description, not just any description of the desired output. The concatenation you're suggesting is highly unlikely to be the shortest possible one.

You have to think of it in terms of compression, because that's what Kolmogorov complexity is really about. It is certainly possible that after compression J1 is shorter than T0.


You're forgetting that I get to pick the Turing machine to use. Kolmogorov complexity is the shortest possible description relative to a particular Turing machine. Re-read my previous post with that in mind.


No, you don't get to pick the particular Turing machine if by "particular Turing machine" you mean a Turing machine including a particular transition function, which is the conventional definition of a Turing machine (http://en.wikipedia.org/wiki/Turing_machine#Formal_definitio...).

In terms of this definition we need two Turing machines, one whose transition function can be proven to be the shortest that describes L1. And another one whose transition function can be proven to be the shortest that describes T0. These two Turing machines or transition functions are compressed representations of L1 and T0 respectively.

What I'm saying is that the one describing L1 can be shorter than the one describing T0 even if L1 is longer than T0.


I think we are approaching this from different angles. I am thinking of a single contrived M for whom a simple copy input to output is the primary supported operation and all other operations/modes, although still capable of performing arbitrary computation [hence able to execute 'compressed' programs if you should wish to attempt to provide them and also, by definition, still Turing complete], are exceedingly clumsy/verbose.

I have not only the transition function but 6 other entities to define to accomplish this (really 5 barring the 'blank symbol') so I am certain that it is not only possible but likely much smaller than one might think at first. With such a small M (that penalizes attempts to compress T0 and J1), and a large T0 [such as the code base for Microsoft Windows ported to F#] and an even larger J1 [such as the same code base written in C], we still see my desired result.


I understand what you're saying but I fail to see how your M is related to Kolmogorov complexity. You're not free to define some contrived M with particular properties that make your assumption true. Kolmogorov complexity explicitly asks for the shortest possible M for each individual string (i.e L1 and T0). Here's the definition from Wikipedia (http://en.wikipedia.org/wiki/Kolmogorov_complexity):

"More formally, the complexity of a string is the length of the shortest possible description of the string in some fixed universal description language".

M is not that universal language. M is a program written in the universal language of Turing machines. I guess that is our misunderstanding. If you define M itself as that universal language but M only ever does that one contrived thing you said (i.e. there is only one program written in that language) then it is not a universal language any longer and any comparison of lengths would become completely pointless.


In my case, the 'universal description language' is one that supports a single operation, let's say CC for 'carbon-copy', and some other number of operations (contrived to have a quite clumsy/verbose interface). The CC operation takes one input--the 'text' to output (which in my case would be T0 and J1). The 'other operations' would be only the minimal set required to still be Turing-complete (and once again would be devised such that no sane person would want to use them for any sufficiently complicated task). So, M is the implementation of the above UDL.

Then, the question becomes, how much logic do we have to put into M itself to achieve the above? I posit that it would not be very much in fact (but that should be demonstrable if someone were sufficiently motivated).

I think KC requires that M be constant across the texts you wish to compare (so that texts across various languages are normalized to an underlying architecture).

If one views M in this light (as an attempt to level the playing field across competing hardwares/algorithms/languages), then we could see it as the concatenation of the following:

1 - the specification of the raw hardware.
2 - the microcode for the hardware.
3 - BIOS, drivers, HAL, OS, etc.
4 - language implementation/runtime/shared libraries, etc.

Now, to compare any two real-world implementations, we need to port them both to the language implemented in #4, for running on the hardware specified by #1-3 and sum their length with M's length.

And if we want to compare two algorithms from different languages (but running on the same 'machine'), we can shift the codes in #4 into the 'texts' of the programs themselves (so as to keep M constant between them). We can do this all the way down to the minimum M required to be Turing complete (and nearly all the way 'up' to the programs under test); i.e., the division between M and a particular text, L0, is somewhat variable.


So you are constructing a particular universal language that is designed to make any compression algorithm written in it so long that the concatenation of the algorithm and its own output would always be longer than its input.

First of all, I doubt that this is even possible, because the input, i.e. L1 and T0, is arbitrarily long in the general case, while the algorithm's length must be determined before it sees any input, i.e. it is constant. That's related to the invariance theorem: http://en.wikipedia.org/wiki/Kolmogorov_complexity#Invarianc...
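
For reference, the invariance theorem being cited says, roughly, that switching between universal description languages changes Kolmogorov complexity by at most an additive constant that depends only on the pair of languages, not on the string:

    \exists c \;\forall s : \; \lvert K_{U_1}(s) - K_{U_2}(s) \rvert \le c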

But even if it were possible or if we agree to arbitrarily limit the size of the input for practical purposes, you wouldn't have achieved your original goal, which was to show that a verbose language can never be as expressive as a terse language.

Your universal language is neither a Turing machine nor any other generally accepted universal language, so no one would agree to accept this particular language as a test of Kolmogorov complexity.

If you can disprove the invariance theorem you may have come up with a valid criticism of Kolmogorov complexity itself, but you were the one who initially based his argument about the expressiveness of programming languages on Kolmogorov complexity. If you invalidate Kolmogorov complexity you invalidate your own argument as well.

In any event, this has been a fun thought experiment.


Yes, fun thought experiment indeed.

I think I am suggesting to construct a particular universal language designed to make any compression of some reasonable T0 and J1 written in that UL (e.g., the source code of Microsoft Windows written in F# and C respectively) longer than T0 and J1 themselves, hence leaving them to be their own Kolmogorov limit (if you will). This seems entirely doable to me within the confines of the 5-tuples afforded me by the formal definition of a Turing machine (although I can see how others, including you, might be less convinced). :-) Maybe I, or someone else sufficiently motivated, will attempt it someday.


See Felleisen's "On the Expressive Power of Programming Languages"[1] for one formalization that differs from conciseness. Essentially, Turing complete languages can express the same programs at the very coarse "whole program" level, but the paper advocates taking a more local view to assess expressiveness (e.g. what programs in L1 can you write in L2 without having to do a whole program transformation). See also Neal Gafter's related commentary[2] (in the context of the various proposals for closures in Java).

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.51.4...

[2] http://gafter.blogspot.com/2007/03/on-expressive-power-of-pr...


One doesn't have to look far before he finds a fatal flaw in the commentary. Particularly this:

"In my mind, a language construct is expressive if it enables you to write (and use) an API that can't be written (and used) without the construct." - Neil Gafter

Once more, there is no such construct. All APIs can be written and consumed without language support. Otherwise, Boost would not have had all the new C++ goodies (as an add-on library) before they became part of the language itself. And, of course, one can always put the 'human compiler' to work to produce boilerplate that would be produced by the compiler if the language in question supported the feature in question (if there isn't an add-on library providing the feature in question). One can always do this but the discriminating one tries to avoid it and instead chooses 'expressive' (i.e., conciseness-facilitating) languages. [In the case of closures in Java, one [human-compiler] would merely use one-off interfaces or anonymous classes].

I will take a look at the other links but I gave up on the commentary after seeing such a blatant falsehood.


C++ has features like operator overloading that make faking lambdas via Boost feasible. And even Java has anonymous inner classes that allow a rather ugly local transformation somewhat akin to closures. But if you removed that feature, then you'd need a non-local transformation to fake them.


You apparently didn't read all of my comment as I mentioned Java's anonymous inner classes being used to fake lambdas near the end. And, the point remains, given any TC language L, all possible language features for all possible languages are implementable within L itself. It doesn't matter if L has operator overloading or not; only that it be TC.


Are you sure about that?

Yes.


You can be sure if you want but unless you back up your claim with some reasoning, others will consider you wrong.


A line from real code (a library):

    "* ^^"._? #> (a.!?(1, Get())) must be_==(Full(Answer(1))) ?~! eventually
It is very short; most people would implement the same thing using 5-10 lines of code. I prefer 5-10 lines of code over regex-like syntax.


A more obvious example is minified JS versus regular JS. Briefer? Definitely. More expressive? Definitely not.


"Brevity and expressiveness are not the same."

"Are you sure about that?"

"Yes."

Do I need to explain?


I think you're confusing expressiveness of the language with expressiveness of a particular text written in that language. English certainly affords you the opportunity to share more thoughts but given that you seemingly only wanted to express an affirmative, in English 'yes' is pretty much the minimum. You could have said 'Ya' but that is slang. If you open up the choice of language then you could have said 'si', or '#t' or even just '1'. But with such a short intended message, language choice hardly matters.

Certainly if you intended to say more than 'yes' then you failed at adequately expressing the message to begin with so comparisons at that point are moot: i.e., two pieces of text must both express precisely the same ideas before comparisons of them make sense.


He gets points for brevity. You gotta admit that :)


I'm not sure if it's because the JVM has a bad rep, or people like the familiar feeling of Go (smaller learning curve), or some other reason, but it surprises me that so many people jump to Go for performance similar to what you'd get from a modern JVM language (Clojure, Scala), but with less expressiveness, fewer libraries, and (I believe) less tooling.

Of course this only really applies to long-running apps (web servers); if startup time matters then certainly Go wins.

Perhaps I need to try Go out, but I just don't see what the selling point is.


People are trying to get as much space as possible between themselves and the Java community's culture of complexity. I don't think it's a technology issue.


"the Java community's culture of complexity" -- great evocative phrase.


s/Java/Enterprise/g


> but it surprises me so many people jump to Go for performance similar to what you'd get on a modern JVM language

In a word: hype. If Go did not have the Google brand name attached to it, it wouldn't have gone anywhere.


You're empirically wrong.

I actually did write a fair amount of Go code (my primary languages are C++ and Python).

On real, non-toy programs it is significantly more concise than C++, in the ballpark of Python code.

This isn't visible on code snippets (less than 500 LOC) like the toy raytracer, but trust me: you'll see a big difference on a 10k LOC codebase.


I've just wrapped up a Go program that's not a toy, albeit not as large as 10k - it's a little over 3k. It was not nearly as terse as Python, although the performance increases made it well worth it. This is something that surely varies a great deal from one application to another.


Without knowing what your program did, it is hard to see what language features caused the difference. In any case, why was Java not a better option than Golang?


Go outputs a statically linked binary that just RUNS. Java needs a quite heavyweight runtime to be installed, that imposes quite a bit of startup overhead. That's just one reason - for short runtime CLI-type utilities it's not in the same ballpark.


OK, so let's go with your use case. You can run basic Java ("hello world") under 3 MB with a startup time of about 0.1 seconds. So that is the true overhead if you really care about tight code. The default values are pretty large. Everything else (memory/startup) is added due to external libraries that are needed and additional memory as the program grows.

Given that "kkowalczyk" talked about 10K-line programs, what applications are you thinking of that are 10K lines and cannot tolerate a 0.1-second/3 MB overhead? Would you restart Java every time? Note that even a simple hello-world C program has about 0.01 seconds/0.5 MB of overhead.


I was going to say there's no way java programs boot in 0.1 seconds, but looked it up to be sure. Here's the results on my mac/i7, defaulted to server mode:

    $ time java HelloWorld
    Hello, world!

    real	0m0.101s
Touché. 0.1 seconds is exactly right, at least on my setup. That said, javac is slow given the program is 5 lines of code:

    $ time javac HelloWorld.java 

    real	0m0.511s
    user	0m0.833s
    sys	0m0.050s
And I'd venture to guess that there must be something to the JIT being pretty slow for real-world applications, otherwise people wouldn't complain so frequently about it. Maybe aspects of JIT optimization increase linearly-ish with the amount/complexity of code?

FWIW, we went with Go at my work instead of Java because our application is memory-intensive, and there are huge gains there in Go over Java.


A program that consists solely of println("Hello, World!"); is pretty stateless and trivial, there's not much that a JIT could do with this.

Somewhat anecdotally, I remember hearing Rich Hickey talking about how the JVM JIT loves stateless code. I'm pretty sure the vast majority of Java code in use is deeply stateful. I don't know how much of a difference this actually makes but it seemed like a relevant data point.


You are off by two orders of magnitude in the C case, at least with my trivial test case of writing hello world, and then a runner program that forks/execs/waitpids 10,000 times.

If you know what crt0.c does, you can see C is also pretty much the asymptote of what you're going to get from program startup, so it's a little silly to make the comparison.


Is that with the JVM already running or not?


> Java needs a quite heavyweight runtime to be installed, that imposes quite a bit of startup overhead.

One can always use one of the commercial native compilers for Java, any of them can easily produce static binaries with fast startup times.

Don't measure Java the language, by Oracle JVM the implementation.


A lot of the verbosity in Go comes from error handling; the language forces you to deal with errors (I think it's a good thing, some may disagree). The expressiveness is real: 'duck typing' does give it the feel of a scripting language like Python, and it is way less verbose than Java (apart from error handling). And to be fair, Go is faster than Python (the numbers tell you that) and (slightly?) more expressive than C -- and just in a league of its own for concurrency.
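
For illustration, a minimal sketch (hypothetical file name) of the check-every-error pattern being referred to; each fallible call returns an error value that has to be handled explicitly:

    package main

    import (
        "log"
        "os"
    )

    func main() {
        f, err := os.Open("config.txt") // hypothetical file
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        info, err := f.Stat()
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("%s is %d bytes", info.Name(), info.Size())
    }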


> and just in a league of its own for concurrency.

Not really. Other languages are far superior in this regard (see Scala or Rust for instance).


I think the reason both versions lack conciseness is that they're faithful translations of one another. I wouldn't call either well structured, but maybe that's because I don't understand the problem domain well enough. T() in the C++ version especially looks awful.


To concur with what kkowalczyk said, there is little expressiveness advantage to be gained for this specific example -- math is math, and outside of compressing it down to a couple of library calls, what could possibly be done better?

"they ended up with a language with the expressiveness of C and the speed of python."

This is just pure troll. Apparently you have a Python variant of this renderer that matches the already impressive Go performance? Please reveal it.


> It would be a little nutty to suggest that Golang 1.1 is going to give optimized C code a run for its money. Nobody could seriously be suggesting that.

AFAICT, the entire situation started only because the article was submitted to /r/Golang and /r/C++ with the trollbait title "Business Card Ray Tracer: Go faster than C++", and not because of anything in the article itself (which was actually a pretty good article).


Ray-tracing, especially the simple kind in this example, is all about vector maths. CPUs are extremely good at this type of task. Finding Go performs well at this shouldn't be surprising. Any decent compiler will be able to produce good code for this task as it maps very closely to what CPUs do best, meaning you don't need much fancy analysis.
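
For context, this is roughly the kind of code involved (a sketch, not the article's actual source); each operation maps almost directly onto a handful of floating-point instructions, so there is little for a clever optimizer to add or get wrong:

    package vec

    // Vec is a 3-component vector of the sort a simple ray tracer is built on.
    type Vec struct{ X, Y, Z float64 }

    func (a Vec) Add(b Vec) Vec       { return Vec{a.X + b.X, a.Y + b.Y, a.Z + b.Z} }
    func (a Vec) Scale(s float64) Vec { return Vec{a.X * s, a.Y * s, a.Z * s} }
    func (a Vec) Dot(b Vec) float64   { return a.X*b.X + a.Y*b.Y + a.Z*b.Z }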

I think the majority of languages in popular use are faster than Python. I believe that Go is popular with the Python / Ruby crowd because idiomatic Go is quite close to what they do already. I.e. you don't need to learn much to shift from Python or Ruby to Go. Using a language like Scala, for instance, is a much bigger jump.


Actually, looking deeply at Go from a performance perspective (as I have been the last couple of days) has revealed a bunch of low-hanging fruit/missed optimization opportunities in the Go compiler. That was 50% of the intent of the grandfather blog post.


Sure, there's plenty of work still to be done. Go 1.0 is only 18 months old, with 1.1 only 6 months old.

C++ has 30 years of history behind it, MS VC++ is 20 years old, Intel's C++ compiler is at least 10 years old, etc.


Finding Go performs well at this shouldn't be surprising.

Go doesn't do SIMD at all (see note 1). Personally I leverage Go coupled with the Intel Compiler (Go happily links with and uses very high performance C-built libraries, where I'm rocking out with SSE3 / AVX / AVX2).
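
A rough sketch of that linking pattern via cgo; the library name, flags, and dot4 function are made up for illustration, standing in for a SIMD-optimized routine built with a C compiler:

    package fastmath

    /*
    #cgo LDFLAGS: -L${SRCDIR} -lfastmath
    // dot4 is a hypothetical routine from the C-built library.
    float dot4(const float *a, const float *b);
    */
    import "C"

    import "unsafe"

    // Dot4 calls into the C library; the calling Go code stays ordinary Go.
    func Dot4(a, b *[4]float32) float32 {
        return float32(C.dot4(
            (*C.float)(unsafe.Pointer(&a[0])),
            (*C.float)(unsafe.Pointer(&b[0])),
        ))
    }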

To respond to something that Ptacek said above, many of us do expect Go to achieve C-level performance eventually. There is nothing stopping the Go compiler from using SIMD and automatic vectorization, it just doesn't yet. There is nothing about the language that prohibits it from a very high level of optimization, and indeed the language is generally sparse in a manner that allows for those optimizations.

*1 - For performance-critical code you are supposed to use gccgo, which uses the same intermediary as the C compiler, allowing it to do all of the vectorization and the like. Unfortunately for this specific code gccgo generates terrible code, yielding a runtime that is orders of magnitude slower (albeit absolutely tiny). Haven't looked into why that is.


> There is nothing stopping the Go compiler from using SIMD and automatic vectorization, it just doesn't yet.

Those optimizations would almost certainly reduce the speed of the Go compiler (requiring SSA form and aliasing info).

> There is nothing about the language that prohibits it from a very high level of optimization, indeed the language is generally sparse in a manner that allows for those optimizations.

Autovectorization is very sensitive to good output from alias analysis. This is where the const and restrict keywords in C, absent in Go, are useful. I think you will at least need runtime guards in Go, whereas they are not necessary in well-written C.


My understanding is that automatic vectorization is still quite sensitive to how code is written. The compiler may fail to vectorize one implementation of an algorithm while vectorizing another, due to details in the implementation of both the code and the compiler.

My point is not about vectorization though. Code that uses mostly vectors, math, and function calls has a very direct translation to machine code. I expect all compilers to generate approximately the same machine code for this type of code, assuming vectorization doesn't come into play. So I don't expect to see large differences in performance. Of course there will be some difference, but not the order of magnitude one sees between compiled (statically or JITed) languages and interpreted languages.


> assuming vectorization doesn't come into play.

Now that 256-bit AVX registers process 4 numbers in one go even when one uses 64-bit floats (and 8 with 32-bit floats), vectorization comes into play more and more.

With 64-bit floats and 128-bit SSE registers it was kinda possible to ignore vectorization, as it gave less than a 2x speedup. But no more.


"There is nothing stopping the Go compiler from using SIMD and automatic vectorization, it just doesn't yet. "

A JVM could compile byte code to SIMD instructions. Most of them don't, yet.


Auto-vectorization is impossible for a lot of real-world code because it requires changing how data is laid out in memory. Notice that the AVX version of the raytracer actually involves packing blocks of x components into a single 256-bit-wide variable. Realistically, a compiler is not going to be smart enough to figure that out.


Absolutely true, though of course you could do the same memory layout with the Go code. If we're talking about compiler comparisons, a vectorization-suitable tighter inner loop that operates on contiguous memory would be a good high performance comparison. The standard Go compiler would not vectorize it...yet...though honestly I don't know what the state of gccgo is or whether it yields an intermediary that brings the gcc vectorization into play.
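
For instance, a sketch (field names assumed, not the article's code) of the same structure-of-arrays layout expressed on the Go side:

    package scene

    // Spheres keeps each component in its own contiguous slice -- the
    // vectorization-friendly layout described above -- instead of a
    // slice of per-sphere structs.
    type Spheres struct {
        X, Y, Z []float64 // centre components
        R       []float64 // radii
    }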

And of course the reason you code for auto-vectorization is for ease of platform support. The linked AVX code will not run on the vast majority of virtual machines, or any CPU made prior to 2012. Nor will it take advantage of AVX2. I use the Intel compiler and either yield builds that I can target to specific processors or technology levels or I can add support for virtually all technologies, such that the same code will vectorize on AVX2, failing that AVX, failing that SSE3.2, failing that... you get the picture. With a suitable ARM compiler the same code would vectorize to NEON, etc.


There have been languages in this space for a very long time. Pascal and Algol 68 are both very old examples (see the Go vs Brand X comparison). Then there's Modula or Oberon not quite as long ago. In the last 20 years, everything from Java to OCaml to Scala to C# to F# to Haskell to Common Lisp to Lua have been developed.

So it's not especially interesting that Go is in this space as well. The most surprising thing about Go is that its developers seem never to have heard of any of the above languages (with the exception of Java).


I don't think it's interesting that Go has this combination of attributes. I think it's interesting that the "right" boring systems programming language got a degree of tooling and traction sufficient to make it, unlike Modula and Oberon, a viable mainstream choice.

I am fine with boring languages. The first language love of my life is C. It's hard to get more boring than C. If a system I build is going to be clever or sophisticated, I'm fine with that being expressed in my code, rather than as the product of the environment I happen to be working in.


It's hard to get more boring than C.

C... boring? I don't think so. Not with 200+ undefined or implementation-defined behaviors. Not when even something as simple as "a = b + c" can invoke nasal demons.

C is so loosely defined that it keeps you on your toes constantly. Every single line is like walking down a dark corridor with poisonous snakes... and your torch just guttered out.

There's nothing boring about C. It's pure thrill and danger. If you're not experiencing that, you're probably writing terribly non-portable, brittle code, ready to unexpectedly invoke undefined or implementation defined behavior.


First, Go isn't a systems programming language, any more than Java is.

Second, C is _far_ from boring. It's boring now because we all live in its shadow, but the idea that we should have high-level languages for writing low-level software in? That's C, right there.

Lots of other languages that now seem boring were interesting to start (Java, Perl, ...). What's frustrating about Go, and the hype it gets, is that it isn't interesting in any of those ways, and yet so many people, including its designers, seem to think it's revolutionary.


Seems to be pretty similar to Oberon, which was used to build a few OSes.

https://news.ycombinator.com/item?id=6498878

As you should be aware, since you mentioned Oberon.


Only because it has Google as the Goodfather, otherwise no one would care about Go.


I don't think it's nutty at all. Ideally the language shouldn't be tied to performance - the compiler and optimizer will do it. His method of optimization hints that the compiler is still immature.

From another angle, consider that in HPC FORTRAN codes will often get the best performance - and FORTRAN doesn't have real pointers.


C++ code is 5x faster after some standard optimizations, so it's not really in the same league. Faster than Python for sure, but still unusable for anything CPU-bound, is the message I get from all of this.

This doesn't show anything new though; math/graphics programming is a perfect use case for C/C++, you won't really benefit from anything Go has to offer, and some of its features actually become an annoyance for this kind of application. The biggest strength of Go doesn't really shine either, as the simple parallelism needed for a ray tracer is a matter of a few lines of code in both C and C++.


C++ code is 5x faster after some standard optimizations

There is nothing standard about the optimizations -- direct AVX use is enormously uncommon, even among extremely high performance code.


I'll grant you that - AVX is pretty uncommon. I originally wrote it with SSE2 (there's a working version in the next to last commit on the github repo), but rewrote it using AVX because.. well, I hadn't used it before.

But I wouldn't say writing media and signal processing inner loops using SIMD intrinsics is uncommon. The style of optimization I illustrate is pretty common, perhaps minus the AVX code path. Most widely used video/image processing, ray tracing and other compute bound libraries will probably be SIMD optimized in some fashion (probably with different code paths for different processors). You gain 1-8x performance, which is pretty significant. It's on the same order of magnitude speedup as threading your program.

I have yet to see anyone truly and systematically trusting automatic vectorization, but perhaps there are libs out there I've missed. Anyone know of some?


Hi there, speaking of autovectorization, I actually tried it on this last night. After seeing kid0man's post, I decided to try optimizing it and selected that same loop as my target. (When I wrote the original C++ program, I was favoring conciseness and portability over performance, naturally.)

I made many of the same transformations as you did: switching the object's data to a structure of arrays, splitting out the computation of the normal from the loop, etc. (even an int hit = -1). My goal was to coax Intel's compiler into autovectorizing that loop, without directly using vector intrinsics. I succeeded, but the result turned out to be noticeably slower than just compiling kid0man's with -fast. Part of that, I suspect, is that it generated suboptimal code for the reduction over the minimum, where a human programmer would have used a movemask as you did.

That said, I'm fairly curious to experiment with seeing how it would perform with the kernel compiled via ispc [1].

Regarding direct use of SIMD intrinsics in inner loops, I have to agree with you that it's still reasonably common for this type of thing. I've certainly done it before in ray tracing contexts [2], and I've seen many others do it as well, e.g. [3] and [4]. Autovectorization and things like Intel's array notation extension [5] seem to be getting better all the time, but I don't think it's generally as performant yet as direct use of intrinsics. In the cases where it is, it usually seems to have taken a fair amount of coaxing and prodding.

[1] http://ispc.github.io/

[2] http://www.cs.utah.edu/~aek/research/triangle.pdf

[3] https://github.com/embree/embree

[4] http://visual-computing.intel-research.net/publications/pape...

[5] http://software.intel.com/en-us/blogs/2010/09/03/simd-parall...


As an intermediate step, before starting with the SSE intrinsics I rewrote the code in a form that should have been reasonably suitable for an autovectorizer (an inner loop over a fixed number of elements - I imagine it probably looked fairly similar to your code), but my gcc with -ftree-vectorize didn't do much with it. I didn't really explore that path further though.

I actually did a version which did the reduction over the minimum purely using SIMD and then a post step which reduced the SIMD minimums to a single scalar. It was somewhat tricky to get the index right, and in the end it turned out to not be faster (at least not for the little example of 32 objects; I imagine you would gain something on a more complex scene).

Anyway, it was a fun little exercise and it has sparked some interesting discussion. Thanks for posting the original.


Funny, where in the ANSI C++ standard is the entry for AVX/SSE2?


But I wouldn't say writing media and signal processing inner loops using SIMD intrinsics is uncommon.

But at that point this has nothing to do with Go or C++, and I find this whole discussion rather disingenuous (at first I thought you were detailing the maturity of C(++) compilers and their superior support of auto-vectorization, which would be a reasonable angle): You can import the Intel math libraries and call them from Go (I know, as I do it regularly. See my submissions).


No, it doesn't. I tried to make that point too, but perhaps it didn't come across very clearly. For these kind of tight inner loops a language only ever gets in the way, the difference is really only how difficult it is to get rid of the conveniences you don't want. (The much vaunted zero-cost of features you don't use in C++ lingo I guess)

I still think that a systems programming language need to offer escape hatches, whilst striving towards ease of use in the common case. C++ has plenty of hatches, at the cost of horrific complexity.

But suppose I'm willing to pay the cost of writing my code in 5 different code paths for different processors for that extra 2-4x of performance. Very few languages offer that possibility, and most of those who do only offer to call a C library. I'm the guy stuck writing the Intel math libraries of the world, and I want something more reasonable to do it in.


The point that corresation is making is that none of the optimizations you did had anything to do with C++. You could have easily done them for the Go version, but you didn't. Then you put up a chart and said "this is why C++ is better." Huh?


Leave Britney alone!

I don't see why people feel that C++ needs to be replaced, when I write C++ I have many levels of scope - and while dangerous it is not impossible and the empowerment makes me feel like a god.

Programming is not incremental. If we spend all day writing a Python back-end and it doesn't give the performance numbers, that day was a complete waste. When I think about C++ I know that code written in C++ will take me 100% of the way - even if it takes longer to write.


"I don't see why people feel that C++ needs to be replaced"

Here are some of my reasons:

1. It is impossible to write high-level code without dealing with (and often getting bogged-down by) low-level issues in C++. Why should I be forced to choose between different "smart" pointer types? Why should I be forced to decide how variables should be captured by a lexical closure? Sure, such decisions might make sense when you want to squeeze out a constant-factor improvement in performance, but they do nothing to help you get things done in the first place.

2. Error handling and recovery is needlessly and pointlessly complicated. You can throw exceptions, except for the places where you cannot, and once caught there is not much you can do to fix the problem. It is so bad that the C++ standard library actually requires certain errors to not be reported at all.

3. Extending the language is impractical. Look at what it took just to add a simple feature, lexical closures, to the language: modifications to the compiler. At best C++ gives you operator overloading, but you do not even have the ability to define new operators. Lisp, Scala, and numerous other high-level languages give programmers the ability to add new syntax and new features to the language without having to rewrite the compiler.

I am not familiar enough with Go to say that it addresses any of this, but I know why I stopped using C++ and why I have not regretted that decision. All the above make writing reliable code difficult. I actually switched away from C++ when I needed my code to scale better, because improving the scalability required a high-level approach and I did not have time to debug low-level problems. Even C++ gurus wind up having to deal with dangling pointers, buffer overflows, and other needless problems with their code -- that takes time and mental effort away from important things in most cases.

"When I think about C++ I know that a code written in C++ will take me 100% of the way - even if it takes longer to write."

The same is true of any programming language if the amount of time spent on the program is irrelevant. I am not sure what sort of work you do, but for what I have been working on, getting things done is considered higher-priority than squeezing out a constant factor improvement. Nobody complains about faster code, but everyone complains about late, buggy, and incomplete code.


C++ does not force you to use smart pointers.


-- Why should I be forced to choose between different "smart" pointer types

If you don't want to decide then write all your types with value semantics and pass by value. How types are going to behave when passed should be decided before you write 'class{}'. It's a semantic decision. For types that you're borrowing, and not writing yourself, pass a shared_ptr or refer to the documentation.

-- Why should I be forced to decide how variables should be captured by a lexical closure?

Same thing applies. Auto-capture [=] everything by value. If your type doesn't have any (sane) value semantics, use a shared_ptr or a reference.

-- You can throw exceptions, except for the places where you cannot, and once caught you there is not much you can do to fix the problem.

You can throw an exception anywhere safely in correct code. The default assumption in the language is "anything can throw, any time, anywhere", so if your code doesn't at least provide the basic or weak exception guarantee you're swimming against the tide. Doing so usually improves the encapsulation and structure of code imo anyway.

-- once caught you there is not much you can do to fix the problem.

Exceptions are more like hardware exceptions or page faults than typical error states. You should only throw when you cannot reach a state expected by the caller. Ultimately, it comes down to API design, not philosophy.

    // Clearly the only sane thing to do here if you
    // can't stat() the file is throw an exception.
    size_t get_file_size(string filename);

    // Some flexibility. Could probably avoid throwing. 
    optional<size_t> get_file_size(string filename);

    // Better still, and easy to overload with the above
    optional<size_t> get_file_size(string filename, error_code&);
... the better your API, the more easily you can avoid having to throw. This isn't a new problem either; if you look at the C standard library there are many deprecated functions that provide no means of reporting an error at all except to return undefined results.

-- extending the language is impractical.

Writing STL-like generic algorithms is trivial. Writing new data structures is trivial. Existing operators can be overloaded to augment scalar types or, more ambitiously, re-purposed to create DSLs. You have user-defined literals, initializer lists, and uniform initialization.

How would you like to extend further without it being completely alien to the existing language?

-- I actually switched away from C++ when I needed my code to scale better, because improving the scalability required a high-level approach and I did not have time to debug low-level problems

You should write more about this.

-- C++ gurus wind up having to deal with dangling pointers, buffer overflows, and other needless problems with their code

Not really bad pointers and buffer overflows these days. More slogging through pages and pages of compiler errors and hunting for library solutions to problems that should really be solved in the standard library (For me lately: ranges, matrices, more complex containers).

In any case, all languages have their share of friction. Look at that new Bitcoin daemon written in Go that hit the front page a few hours ago. The author had to debug 3 separate Go garbage collector issues.


Correct me if I am wrong - but exceptions should not leave destructors. So you can throw them in destructors but you have to catch them there too - I think that is what the commenter you are replying to is getting at. So it has to be RAII because the destructor will not communicate up a hierarchy. This imposes a lot of overhead writing the destructor.

Further to the general theme of the thread - C++ compile times are bad, Go compile times are quite nice - I think this is significant when prototyping and testing units of code.


Combining different libraries with their respective memory management and error handling ideas is one of my biggest issues with C++. You have to keep so many things in mind to use every API according to its own peculiar rules. One slip of the mind and you're in big trouble.

Also, getting all those libraries to compile with a particular compiler/stdlib combination is a big hassle. Things break in non obvious ways because of weird implicit template instantiations that are basically untestable by the creator of the library.

These types of integration issues are never going to go away and therefore C++ will occupy a stable niche forever but will never become mainstream for application development again, regardless of any language improvements.


Mainstream again? What's the majority of code being written in nowadays?


Most new application development is done in Java and C# (and perhaps JavaScript).

C++ is used for a long list of specialized tasks like embedded systems, games, in-memory computing, database systems, high performance scientific stuff, some low latency trading algos and a few core UI things like browsers. I don't consider any of those mainstream application development.


"The same is true of any programming language if the amount of time spent on the program is irrelevant."

No it's not. That's what the original post was arguing (I think successfully).


It most certainly is true, if you can spend an unlimited amount of time coding. Python is too slow? You can isolate a subset of Python and write an interpreter that emits very efficient machine code for that subset, then write all your code with that subset. Sure it will take a long time, but so what? -- we started with the premise that the amount of time we spend writing our programs is irrelevant.


Yes, you can call a binary C/C++ function from Python.

The important point is that C++ does it for me, automagically.


So you can write a really efficient program in Python if you change the definition of what Python does (that's what you're doing by rewriting the interpreter)? I think you're moving the goalposts here.


Why would you need to change what Python does (i.e. the semantics of the language)? All I said was that you would find a subset of Python that is useful for your project, then put together a Python compiler that is very good at optimizing that subset. This is not at all unprecedented: Microsoft does this with the C compiler used for Excel and ITA software did this with Lisp when they modified CMUCL's code generation routines (CMUCL and the SBCL fork are particularly interesting here, as they expose their code generation modules to the programmer, reducing the effort needed to hack your compiler).

The point is that these things require a substantial time commitment, which is what I said in the first place. If you have unlimited time to work on your project, you can use whatever language you want -- your time is unlimited -- and still meet all functional and performance requirements.

Of course, time is rarely unlimited. Usually you need to meet a deadline, and that means that you need to prioritize, and your priorities will guide your language choice. In my experience, functional requirements are typically higher-priority than performance (with a reasonable margin). Getting the right answer slowly is usually better than getting the wrong answer quickly; getting all features to work is usually more important than producing a fast system that implements only half the features.


If a python back-end doesn't give the performance numbers, then you find the bottleneck and write that 1% of code in C++. Tada - you get the performance anyway, but gain in development speed (which often is more valuable than the cheap processors).


> Personally, I’m hoping for Rust.

That was my thought while reading article; Rust seems like the answer here. I'm coming from the opposite direction than the OP: I'm unwilling to give up the expressiveness of Ruby and friends in order to write micro-optimized C++ code, and I'm hoping Rust will give me the best of both worlds.


Yeah, everyone’s got their eye on Rust as far as low-level expressive programming goes. Nimrod[1] also looks quite good:

> Nimrod is a statically typed, imperative programming language that tries to give the programmer ultimate power without compromises on runtime efficiency.

In my spare time, I’m working on a statically typed concatenative language called Kitten[2] with similar goals.

[1]: http://nimrod-code.org/

[2]: http://github.com/evincarofautumn/kitten


Speaking as a C++ programmer, Rust appeals to me a great deal over Nimrod because you can turn off (or just not use) Rust's GC and still have memory-safe code.

I believe you can turn off Nimrod's GC as well, but you lose any guarantee of memory safety when you do. Not to mention -- aren't all pointers in Nimrod reference counted? That's going to take a fairly significant performance toll due to cache effects alone.


Nimrod borrowed Modula-3's idea of having both traced and untraced pointers (using ref and ptr as keywords, respectively). As long as you only use ptr references, no overhead for reference counting (or other forms of garbage collection) is generated.

Of course, in order to use untraced references, you also have to use manual memory management (though you can use the macro/metaprogramming system to reduce the pain somewhat).

For traced references, Nimrod uses deferred reference counting; i.e. reference counts are only updated when they are stored on the heap or in a global variable (similar to write barriers for generational/incremental garbage collectors). If a reference count reaches zero, the reference is stored in a zero count table. At intervals, a separate pass checks if any references in the zero count table are still referenced from the stack (and does cycle detection where needed).

Deferred reference counting avoids the biggest problem with naive reference counting, which is that just assigning a pointer to a local variable (to inspect the contents of an object, say), can trigger up to two reference count updates. Conversely, with deferred reference counting, a read-only traversal of a data structure will not perform any reference count changes at all. Similarly, purely temporary allocations will not require any reference count changes, either.


> That's going to take a fairly significant performance toll due to cache effects alone.

It's not so bad because in Nimrod they are non-atomically reference counted and deferred reference counting is used to avoid touching the RC for stack references, at the cost of some minor latency for stack scanning. (Disclaimer: I work on Rust and my Nimrod info is possibly wrong and/or out of date.)


Even for non performance critical code, Rust is looking like it'll be a winner.


Does Kitten also try to not make (too many) compromises on runtime efficiency?


It’s very much a work in progress, but yes, as far as the language design goes. Right now there’s a horrifically inefficient AST-walking interpreter, essentially the simplest thing that can possibly work. That gives us integration tests, a standard library, and a REPL. We are working on the design and implementation of a native runtime as free time allows.

The language looks and feels fairly high level, like a hybrid imperative-functional language. But in reality, it’s closer to (stack machine) assembly. There is a relatively straightforward mapping to x86 or ARM, so it’s easy to look at some code and mentally compile it at -O0 the way you might in C. Each Kitten instruction usually corresponds to one or two x86 instructions.

Evaluation is strict, because excessive laziness can negatively impact predictability and performance, and you can get many of the benefits of laziness in different ways. All types have immutable value semantics; sharing is essentially an optimisation, and can be implemented with a tracing GC or a reference-counting one because the language is cycle-free. And function definitions are often extremely short, so they benefit from aggressive inlining.

The general feeling is that we want absolutely as lightweight a runtime as possible—no type tagging, value boxing, runtime dispatch, reflection, garbage collection, &c. unless the programmer asks for them. At the same time, we want to avoid infecting code with low-level details such as pointers, lifetimes, and ownership. Correctness first, followed shortly by performance.

Kitten is attacking the same problems as Rust and Nimrod, but from a very different direction. And of course we are only a small handful of people working in our spare time, and I write most of the code. It would be wonderful to have some more people on board.


It feels like a very modern language overall, I particularly like the memory management ideas. It's refreshing to see something which tries to find a middle ground between error prone manual handling and unpredictable GC. Now someone just needs to write a good Rust IDE.


There was actually some work done over the summer with... DWARF? So that Eclipse can use its debugger on Rust code. I'm on mobile or I'd find you a link.



> Rust has a great approach to safe regions. That's a hard problem, and Rust has had to expend a considerable amount of firepower on it (four kinds of pointers etc). D does not offer safe regions; I believe the language design precludes that without at least an amount of discipline. So Rust is better than D at safe regions. However, like in chess, good language design is to not sacrifice the whole for the beauty of the part. I think D is better than Rust at a lot of other things, because it has firepower it can afford to expend at problems that are also hard, and just as important.

http://www.reddit.com/r/IAmA/comments/1nl9at/i_am_a_member_o...


Rust seems like the answer here.

How so? To me the callout to Rust diminished the entire article because it made it almost anti-Go for no particular reason (a la "Go is poopy anyway, but maybe [some other unproven option] will be the savior"). This same sort of nonsense occurred with the statement "on my not-very-fast-at-all Core i3-2100T" regarding the C++ performance, which is just narrative nonsense given that the same circa-2012, AVX-equipped processor was what yielded his Go numbers.

The author demonstrated C code that was 2x faster than Go code run through the standard gc chain, when the C code is run through a very mature, hyper-optimized C compiler, which is actually really impressively good for Go. They then try to pound home the point by inlining AVX, which is absolutely ridiculous for such a language comparison. It borders on pure troll, and I'm really surprised that so many people are falling for this.


Both Rust and C/C++ sit squarely in the "no GC" or "optional GC" camp, which has very important implications for certain applications which strongly favor that compared to a generalized system-wide GC.


> How so? To me the callout to Rust diminished the entire article because it made it almost anti-Go for no particular reason

It might seem that way if Go and Rust were competitors, but given that they seem to have pretty different philosophies and goals, that doesn't seem to be the case.


"On comparing languages, gcc vs 6g"

Yet again the fallacy that comparing implementations equals comparing languages.

In the left corner: the 6g compiler toolchain, which the authors admit has yet to receive lots of improvements in the optimizer.

In the right corner: the battle-tested gcc optimizer, with circa 30 years of investment, aided by language extensions that are not part of any ANSI/ISO standard and are not language specific.

Of course "C++" wins.

People, just spend time learning about compiler design instead of posting such benchmarks. And before someone accuses me of being a Go fanboy, I think my complaints about Go's design are well known by some here.


I think all there is to get from these posts is this:

Presumption: Writing Go code is more fun than writing C++ code.

Demonstration: You can write performant Go code that's not too far from C++ code.

Result: Cool, here's a more-fun-than-C++ language I can use as a step down the complexity path when I need performance.

Or like tptacek said.


Fun is really very subjective.


One correction:

Presumption: Writing correct Go code is more fun than C++.


C++ can't be so easily replaced by Rust, even though Rust seems to be a better language. The more likely case is that Rust's great ideas get ported into a C++NewX standard. Even if Rust is to be our new systems language, we will have to wait 12-15 years for it to actually happen.


I don't think a ray tracer is a good example for Go.

A ray-tracer workload can be fully split across hardware cores. Only the input data needs to be shared, and even the input mostly doesn't create any race conditions. The algorithm can even run on a GPU.

This is a handicap for Go. What Go wants to solve is safe and easy concurrency, without race conditions, for complex logic. So (IMO) Go takes on some overhead (or sacrifices some performance features) for that goal. But in the ray-tracer example, this ability is mostly not required.


Of course, if you want your site to be up when it has 25 upvotes on Hacker News, it's best not to worry about whether your web application is written in C++, Go, or Ruby (or, sadly but likely, PHP), but about making sure it doesn't actually have to spawn hundreds of processes and hundreds of connections and allocate hundreds of megs of RAM to accommodate your visitors.

So, please just use nginx to host some static HTML files for your blog, and fetch your discussion boards asynchronously.


Yep. Octopress really helps with that


So, please just use nginx to host some static html files for your blog

Wordpress + W3 Total Cache can handle a hundred HNs with ease. There is absolutely no reason to go to the past, and poorly configured blogs don't justify that Luddite argument.


If you want to "poke at the processor" with Go, its toolchain makes it pretty easy to use assembly in your package.


Go uses an assembly syntax completely different from everyone else. That's mighty annoying.


Where "everyone else" is at least two different syntaxes in the first place, Intel and AT&T.


Ha? Doesn't it work through the 'import "C"' and then it's just regular C?


I meant something like this, i.e. not going through C at all: http://golang.org/src/pkg/syscall/asm_freebsd_amd64.s


Oh. You mean the Plan 9 assembler... It's pretty good looking to me, so I haven't really considered that an issue. I guess you're on the other side of the AT&T vs. Intel assembly fence... But just how much assembly are we writing these days anyhow? I say, just wrap it all up in Go (or C through cgo) and be done with it.


It looks like AT&T assembly without the %s. Not really that weird.


You can use C files and GCC syntax in your Go project. Go will use Cgo and compile it with GCC.


C++ does it automatically; I think this is a call for Google to spend some more time optimizing floating-point workloads.


The article doesn't mention the reasoning for using -O2 instead of -O3 or -Ofast. It seems that they give better performance in this case, as does adding -funroll-all-loops (which, combined with -Ofast, gives almost a 10% speedup). I am compiling with GCC 4.8.1, so perhaps the author's 4.6.3 (what?) behaves differently.



