I'd love to see how Go 1.1 compares against modern JITted scripting languages, because I think those will be a major part of Go's current and future competition.
Hopefully the benchmarks game will one day include those JIT variants. V8 is there, but sadly for example PyPy, HHVM (Facebook's PHP TJIT) and LuaJIT are not.
I think we're in dire need of a new benchmarks game. The author:
removed PyPy
removed the clear overview table that showed all the languages, keeping only the barely visible gray bar charts with vertical text (why not horizontal bar charts? They can be scrolled and read without tilting your head)
refuses to add LuaJIT and other languages
responds to most change inquiries with "publish your own version" which probably means that he doesn't really care about the project anymore
As a result, the spirit of the benchmarks game was lost a long time ago, as it's made less fun with each iteration.
Not only that, but "the author" has for many years provided the source code for the scripts to make the measurements -- so anyone who actually wants to can easily start a new benchmarks game.
It's far from "easy". It's not like you could just fork the code and run the benchmarks game - you would not only have to set up the environment for all the languages, but also understand the not-so-well-documented idiosyncrasies and deficiencies of the implementation (hardcoded paths? Seriously?)
It doesn't seem like the code was originally written to be understood and modified by others, and it's not fully documented. As such, your claim that it's simple to fork isn't really valid. (However, I'm willing to admit the possibility that the configuration complexity is not easily avoidable.)
I agree that it's far from easy to write a benchmarks game, set up a test environment and invest oh-so-many CPU cycles into running the tests periodically. But I wouldn't go around telling people that forking the original is easy either - that would be very misleading.
Much will have to be done to improve the current engine to make it easier to fork. I'm seriously considering doing that (or alternatively making a new benchmarks game from scratch, with the intent of making it easy to configure and fork)
> Its not like you could just fork the code and run the benchmarks game...
Some people download the scripts and make measurements straight-away, as always YMMV.
1) REQUIREMENTS
Willingness to read the README
Willingness to write ini file name=value properties
Willingness to (sometimes) write make commands
Install Python 2.5+ (these are Python scripts)
> 12.2) Choose a file extension to identify programs
> and measurements made with the new language
> implementation, for example - python3.
Okay, picking .luajit
> 12.3) In the [commandlines] section of the ini file, define a command line that will be used to run program source code files that have the new file extension you chose.
> For example, for file extension python3
[commandlines]
python3 = $PYTHON3 %X %A
No comment on what %X or %A may mean - they could mean anything, really. Of course %X is the source (or built) file and %A are the arguments. Easy enough.
luajit = $LUAJIT %X %A
> 12.4.1) EITHER alias existing source code files that have a different file extension with the new file extension, in the [alias] section of the ini file.
> For example, re-use all source code files with file extension python but make measurements identified with file extension python3
[alias]
python = python3
So let me get this straight: to define the "python3" alias, I need to add a "python" key to the [alias] section and set it to "python3". Definitely not intuitive - I would rewrite this to be reversed.
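For what it's worth, here's a minimal sketch (in Python, assuming a simple single key=value schema - this is not bencher's actual code) of what the [alias] section does today, plus the inversion I'm suggesting:

```python
import configparser

# The [alias] section as it works today: the key is the EXISTING
# extension whose source files get reused, the value is the NEW
# extension the measurements are filed under.
config = configparser.ConfigParser()
config.read_string("""
[alias]
python = python3
""")

# Inverted, the mapping reads the intuitive way:
# new extension -> existing extension to borrow sources from.
reuse = {new: existing for existing, new in config["alias"].items()}
print(reuse)  # -> {'python3': 'python'}
```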
So uhh, I have to configure the parameters for every benchmark. Why don't you share your configuration? Oh, I see, there are various paths defined there appropriate for your environment.
Wait what? Why don't you separate the test configuration from the environment configuration?
First conclusion:
The bencher needs more work to make it easier to fork:
* improve the README by actually mentioning which ini file users should look at, telling them about the tmp dir, explaining the value syntax for the commandlines section, etc. (important)
* invert the entries in the [alias] section, because the way it's set up now makes no sense. (not that important)
* separate test and environment configuration in different files and share the test configuration files used on the website to help people reproduce the same results. (very important)
* fix the script to use the introspection bindings for gtop
Second conclusion:
The claim that it's "easy" to set up has been nullified.
>>Without setting up the language environments? I doubt it.<<
You seem to have run nbody.python without setting up the language environment for it.
I'm going to ignore all the rest of your editorializing and try to find something of substance.
Let's just note that you've jumped from #3.1 to #12.1 -- ignoring #3.2 which checks for problems and the subsequent sections that work through those problems and explain some of what you later find so puzzling.
>>Where is that INI file? The README doesn't say. Oh, it's my.linux.ini (it's not mentioned anywhere in the document).<<
You seem to have found the ini file.
>>No comment on what %X or %A may mean<<
Actual comments on what %X or %A may mean:
; %X %T %B %I %A in commandlines are replaced like this:
;
; nbody.python-4.python %X = nbody.python-4.python
; nbody.python-4.python %T = nbody
; nbody.python-4.python %B = nbody.python-4
; nbody.python-4.python %I = 4
;
; %A = [testrange] value or 0 when the program takes input from stdin
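The substitution those comments describe can be sketched in a few lines (a hypothetical helper written for illustration, not bencher's actual implementation):

```python
import re

def expand(template, filename, arg):
    """Expand bencher-style %X/%T/%B/%I/%A placeholders for a source
    file named like <test>.<ext>-<id>.<ext> (illustrative sketch)."""
    base = re.sub(r"\.[^.]+$", "", filename)   # strip final extension -> %B
    test = filename.split(".", 1)[0]           # leading test name -> %T
    m = re.search(r"-(\d+)$", base)
    prog_id = m.group(1) if m else ""          # trailing program id -> %I
    return (template.replace("%X", filename)
                    .replace("%T", test)
                    .replace("%B", base)
                    .replace("%I", prog_id)
                    .replace("%A", str(arg)))

# Matches the worked example in the ini comments:
print(expand("$PYTHON3 %X %A", "nbody.python-4.python", 500000))
# -> $PYTHON3 nbody.python-4.python 500000
```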
>>What, only python? Oh, I need to copy the programs from the "bench" dir. Where in the readme does it say that? Nowhere.<<
You don't seem to have read sections #3 through #11.
>>Okay, did that, reran the script, and it runs all the nbody and regexdna benchmarks, and none of the others. Why?<<
You don't seem to have read sections #3 through #11.
>>Oh, it uses an old way of importing GTop<<
That actually is worth updating the readme about!
>>of course, I had to remove everything in tmp/* before re-running the test, otherwise it thinks there is "nothing to be done".<<
No you didn't. You don't seem to have read sections #3 through #11.
>>No summary dir here.<<
That actually was a bug! Empty directories weren't included in the snapshot zip.
>>improve readme by...<<
You didn't seem to read sections #3 through #11 (very important)
No, you didn't seem to read my suggestions at all.
>>What, only python? Oh, I need to copy the programs from the "bench" dir. Where in the readme does it say that? Nowhere.<<
> You don't seem to have read sections #3 through #11.
If you grep the file, there is no mention of the bench dir anywhere in sections 3-11. Please do tell where you found it. The README does say how to add new programs, but it doesn't say where the original shootout programs are in the distribution.
>> of course, I had to remove everything in tmp/* before re-running the test, otherwise it thinks there is "nothing to be done".<<
> No you didn't. You don't seem to have read sections #3 through #11.
No, you didn't read what I was doing. I had to remove everything in tmp/* because I was trying to enable gtop by editing the source code of the program. The old results were made without gtop and only contained cpu and elapsed time.
Finally, the "actual comments" on what variables are available for the command-line strings aren't written anywhere in the README. Only "%A" is explained in 9.3, and "%I" isn't even mentioned.
So it looks like all my suggestions remain valid:
Repeating my first set of suggestions, with the "tmp" dir point removed on the assumption that the summary-dir bug will be fixed:
1) improve the README by mentioning the location of my.linux.ini and my.win32.ini and adding the complete syntax for the commandlines section, etc. (important)
2) invert the entries in the [alias] section, because the way it's set up now makes no sense. (not important)
3) separate test and environment configuration in different files and share the test configuration files used on the website to help people reproduce the same results. (very important)
4) fix the script to use the introspection bindings for gtop
Note that you failed to comment on (3), which would by far improve the bencher the most. In fact, that is exactly where I stopped trying to reproduce your results - at the point where I had to extract all the values for the testrange from your CSV files, as the ones used on the website aren't distributed anywhere.
But of course, you're free to continue ignoring useful comments from other people. Which brings me to the original point, we're in dire need of a new benchmarks game.
> You seem to have found the benchmarks game "original programs" in the project tarball.
> People measure their own programs with bencher, not the benchmarks game programs -- it's not dependent on the benchmarks game programs.
It is not, but if you sincerely want to make it easy for people to reproduce all or parts of the benchmark, you should at least document where they can find all the settings of the original.
>>I had to remove everything in tmp/* because...<<
#5.3 #5.4
Fair enough, I missed those two points in the README. A weaker point still remains, though: it's not very user-friendly or obvious - say, a --force flag (or something in the spirit of make's --always-make) would be.
These things may not seem important, but I think they are when distributing software with the intent of someone else running it.
>>testrange from your CSV files, as the ones used on the website aren't distributed anywhere<<
Ah, in nanobench. Well, that was definitely not easy or obvious, was it? My fault for not using find on the directory.
And how about that suggestion of splitting the configuration file to two separate files: program configuration and environment configuration? It would make it easier to run the benchmarks on the same set of programs with the same parameters, except in a differently configured environment. (the set of language implementations and their build and run commands would be a part of the environment configuration)
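A sketch of what that split might look like (the file names, and all the values, are illustrative - only the [commandlines] and [testrange] section names come from the actual ini):

```ini
; --- environment.ini: machine-specific, per-user ---
[commandlines]
python3 = /usr/local/bin/python3 %X %A
luajit = /usr/local/bin/luajit %X %A

; --- programs.ini: shareable alongside the published results ---
[testrange]
nbody = 500000 5000000 50000000
regexdna = 50000 500000 5000000
```

Anyone could then drop in the published programs.ini and only have to write environment.ini for their own machine.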
> Your comments are useful because they show what someone might become confused about.
That was exactly my point - to prove that your "easily" claim is a stretch. The README does not cover everything, especially not the parts needed for someone to bench the programs found on the website. Which is understandable, since it's a README for the bencher program.
Perhaps a separate README for the entire archive (documenting where the original configurations and programs are) will fix most of these confusion issues. More descriptive directory names would also help (e.g. "game-programs" and "game-configurations")
And if I seem so negative, it's because I used to love the benchmarks game, and every change you've made lately has significantly subtracted from its fun factor while not significantly adding anything.
For example, that recent removal of the (not very useful, but still extremely fun!) combined language comparison table. Why remove things? It's sufficient to simply warn the user. If they ignore the warning and take those numbers seriously, that is their own fault. Why should they ruin the fun for everyone else?
>>to prove that your "easily" claim is a stretch<<
> Yes, you were trying to prove that -- you were finding fault instead of finding how to make things work.
> Other people found how to make things work, because that's what they were trying to do.
Not really. The first time I downloaded the zip (about a year ago) I sincerely tried to make it work and did read the readme. Spent about 1 hour then gave up. This time I simply retraced those exact same steps with the intent of showing the current deficiencies.
>>Its sufficient to simply warn the user.<<
> Was it sufficient to list
> Willingness to read the README
> as a requirement? Apparently not.
Yes, it is. It's my fault for not reading the README completely. (Though, to be fair, not all directories of the distribution were covered.)
You don't have to remove the complete zip archive and then add a quiz to the website testing whether the user has read the README before allowing them to download it.
If that doesn't work either, what then? Will you completely remove the download link to the archive, ruining it for those willing to read the README? The way I see it, you've done your part, the rest is up to the user.
This is where I strongly disagree with your approach - and this is why I want to make a new benchmarks game.
Do you mean the project tarball or do you mean the bencher zip?
> I sincerely tried to make it work
Did you ask for help?
> This time I simply retraced those exact same steps
This time you've told everyone such-and-such isn't in the README when it is; such-and-such aren't written anywhere when they are; such-and-such aren't distributed anywhere when they are.
None of that stuff changed in the last year.
>>and this is why I want to make a new benchmarks game<<
So get the tarball tomorrow, read the README, find out what doesn't work for you and fix-it (if you're missing python-gtop install it), make measurements and publish them. Easy.
> This time you've told everyone such-and-such isn't in the README when it is
I clearly demonstrated all the deficiencies of the current distribution, especially in regards to running the same programs (with the same arguments) as those on the website.
your explanation of the commandline variables was missing from the readme
reference to the actual location of the ini file (my.linux.ini). I expect a section like this:
The bencher is configured with an INI file.
There are two example ini files included with the
distribution: my.linux.ini and my.win32.ini,
located in the "makefiles" directory.
this can't be found anywhere in the distribution:
The ini files from the game website can be found
in the nanobench/makefiles directory:
<listing of the files>
Then I will agree that the documentation is complete, and that the process to run your own benchmarks is almost "easy".
If you separate the configuration files, organize the directories more appropriately and write a more condensed README that skips the condescending act towards the user, then I will agree that running your own benchmarks is easy.
> Did you ask for help?
I wouldn't have had to ask for help if the documentation was complete and adequate.
I will however check out the tarball tomorrow.
Oh and what about the suggestion to separate the environment configuration from the programs configuration? No comment, I guess...
Yeah, but the request isn't to include some random language, it's to use the fast implementations of any given language. Why wouldn't you, on a site that benchmarks speed?
I think not doing so is unhelpful, actually - if you don't realize there are JITs for Python and Lua that aren't being included in the benchmarks, you will come away with a completely wrong impression of the possible performance of those languages.
Because there was a big spitting match a couple of years ago, and it seems the maintainer picked his favorite languages and features and chose (seemingly on purpose) to represent others in a bad light.
For a couple of years I've wanted to "cull the herd" but
my curiosity (and interest in promoting experimental
language implementations) stopped me doing so.
The most that Alex Gaynor's nonsense did was prompt me
once more to consider whether the time was ripe.
Apr 2011
"These are not the only compilers and interpreters.
These are not the only programs that could be written.
These are not the only tasks that could be solved.
These are just 10 tiny examples."
Let's say I make a website called the Prettiest Person Game, and I throw up some professional crazy-awesome shots of myself and my buddies, and I also put up all igouy's old driver's license photos, and then create a whole bunch of different ways we can compare these photos and see who's prettier.
And let's say this site becomes really popular. And people start posting articles like, "wow, dilap is like 10x prettier than igouy in every single way!"
And then igouy's friends go, "whoaaaa, hold on a minute, those are terrible photos of igouy! he can really look much better than that!"
And then I'm like, "Hey, I ain't got time for this -- I told you already these are not the only photos in the world. Go take your own photos if you want a different comparison!"
tl;dr
Measurement is highly specific -- the time taken for
this benchmark task, by this program, with this
programming language implementation, with these
options, on this computer, with these workloads.
I fondly remember The Great Language Shootout by Doug Bagley; I could understand the majority of the problems, there were a lot of languages represented. Good times.
They were there. PyPy was there, at least; LuaJIT too, I think.
All got yanked out. The benchmarks are bogus anyway, unless you plan on working on n-body or fasta search or a short list of other things they run.
It would be nice to compare it to the old version. At any rate, it seems like it's a lot more competitive with Java in performance and it uses a lot less memory:
Go vs. v8 is a reasonable comparison. Node.js is probably the closest substitute for the way Go is most often used today (as API servers or backend app components for web applications).
I have no comment as to whether this "benchmarks game" stuff tells us anything meaningful.
Russ Cox, in response to a regex benchmark vs. Python/Ruby.
"You assume the benchmark is worth something.
First of all, Ruby and Python are using C implementations
of the regexp search, so Go is being beat by C, not by Ruby.
Second, Go is using a different algorithm for regexp matching
than the C implementations in those other languages.
The algorithm Go uses guarantees to complete in time that is
linear in the length of the input. The algorithm that Ruby/Python/etc
are using can take time exponential in the length of the input,
although on trivial cases it typically runs quite fast."[1]
As a member of the masses of programmers who typically deal with truckloads of trivial problems, I find his reply worth as much as the benchmark he criticizes. I've never used anything but a trivial regexp (for some values of trivial) in production code (one-shot problems don't count), and I don't give a damn what is being benchmarked here - I only care if my Python script runs fast.
Actually, I might agree with Go here. I agree that most of my regexes are the trivial kind that don't exhibit exponential behavior, but that's only because I also control the input to all those regexes. But Go's intended niche appears to be user-facing server-side programs, and in that environment, if you ever run a naive exponential-time regex against user-supplied input, you're setting yourself up for a potential DoS attack.
> But Go's intended niche appears to be for user-facing server-side programs, and in that environment if you ever run a naive exponential-time regex against user-supplied input you're setting yourself up for a potential DoS attack.
Yes, but for larger recognizers one would probably use something like Ragel anyway. A DSL that provides intersection, union, difference, concatenation, composition, etc. makes it far easier to construct elaborate automata from smaller expressions than the 'compile a big regex string' approach does.
Why not use the potentially exponential algorithm initially, then time out and fall back to the linear algorithm if the sometimes-quick one has taken longer than the linear one would have? At most a 2x slowdown for complex cases, and a substantial speedup for the simple ones.
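The fallback idea could be sketched roughly like this (a hypothetical illustration, not any real engine's code - CPython's backtracking re plays the "fast but potentially exponential" engine, and since the stdlib has no linear-time regex engine, the linear side is stood in by a trivial equivalent check):

```python
import multiprocessing
import re

def _backtrack_match(pattern, text, queue):
    # Worker: CPython's backtracking engine, potentially exponential.
    queue.put(re.fullmatch(pattern, text) is not None)

def match_with_timeout(pattern, text, budget, linear_fallback):
    # Try the risky backtracking engine first; if it blows the time
    # budget, kill it and fall back to a guaranteed-linear check.
    queue = multiprocessing.Queue()
    worker = multiprocessing.Process(
        target=_backtrack_match, args=(pattern, text, queue))
    worker.start()
    worker.join(budget)
    if worker.is_alive():
        worker.terminate()
        worker.join()
        return linear_fallback(text)
    return queue.get()

# (a+)+b failing on a string of a's is a classic exponential case;
# the equivalent linear check here is simply "a's then a b".
linear = lambda text: bool(re.fullmatch(r"a+b", text))
print(match_with_timeout(r"(a+)+b", "a" * 40, 0.5, linear))  # -> False
```

A real engine would do this in-process, of course; a subprocess just makes the "kill it after the budget" step easy to demonstrate.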
It depends, if you have the opportunity to build a DFA once, why not do it? It's only a one-time cost.
I think the more serious problem is that what most people consider to be a 'regular expression' does not express a regular language, and hence cannot be expressed as a finite state automaton.
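A concrete illustration of that point (my example, using Python's backtracking re):

```python
import re

# (\w+)-\1 uses a backreference: it matches a word followed by an
# exact repeat of itself, i.e. the language { w-w }. That language
# is not regular (pumping lemma), so no finite automaton recognizes
# it; backtracking engines like Python's re handle it, while
# linear-time engines (RE2, Go's regexp) reject the syntax outright.
pattern = re.compile(r"^(\w+)-\1$")

print(bool(pattern.match("abc-abc")))  # -> True
print(bool(pattern.match("abc-abd")))  # -> False
```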
I think his answer is useful. It does explain what's going on. If the performance characteristics of PCRE suit your problem better it's trivial to write a cgo wrapper for it. There's even one on github, though I don't know if it works.
I'd be interested in seeing compile times. In a talk posted recently, Rob Pike seemed to be emphasizing #include inefficiencies and the ratio of bytes read by the compiler to bytes of source input for C/C++, and how Go was better at this.
Isn't comparing against a scripting language apples and oranges? Compiled vs. interpreted, one would assume, will always look like this - although modern sophisticated JITs have definitely narrowed the gap.