The State of Statistics in Julia (johnmyleswhite.com)
69 points by wallflower on Dec 2, 2012 | 12 comments



I tried Julia last week while looking for an R replacement. I was pretty excited, but the web REPL is broken on the current OS X build. Try as I might, I could not get the web REPL to run when building it myself.

It still seems quite immature, but I'm very excited about its future.


We're phasing out the web REPL since it's no longer actively developed. At some point I'm going to take a crack at providing a Julia backend for the IPython Notebook [1].

[1] http://ipython.org/ipython-doc/dev/interactive/htmlnotebook....


That will actually be an interesting test of the IPython architecture. Hopefully the team got it right with client/server, ZeroMQ, and WebSockets.


That's unfortunate, as I was trying to get an RStudio-like experience.


No reason that can't happen too, but the current web REPL is more of a proof of concept.


I tried Julia recently and am very excited about it as well, especially with the addition of DataFrames. The only thing that worries me is the disorganized nature of the Base and Core packages. I fear that 5 years from now Julia will have a mess of functions in the global namespace like PHP, rather than a clean standard library like Python. Is this fear misplaced?


The namespace mechanism was only added a few months ago, and we're still working out some of the kinks in how it works. The plan going forward is to slice Base into an organized hierarchy but still have everything available by default without too many imports. The "using" keyword works rather similarly to the same keyword in C++ and now in Ruby 2.0 [1], and aims to give the best of both the "single huge namespace" approach (C, PHP, Matlab) and the "many small namespaces" approach (Python, Java).

[1] http://blog.headius.com/2012/11/refining-ruby.html


Make sure you get namespaces right, right now.

This is one of the biggest problems in Matlab: 20+ years later, they still can't fix it, and the namespace problem is still huge.


Nothing more will be added to Core (though a few things might be removed), and very little more will be added to Base. Most new stuff in the future will be in separate packages and modules.


This might be slightly off-topic, but are DataFrames intended to be similar to R's data.frame, where the contents of the entire table are loaded into memory and operated on all at once, or more like SQL with the possibility of aggregate functions calculated incrementally? Since a lot of statistics can be calculated incrementally (OLS, obviously, but even something like MLE based on nonlinear optimization, if you allow for multiple passes through the dataset) R's approach really bugs me... even though providing the right tools for an aggregate function approach would be (I imagine) quite a bit more difficult.
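To make the "calculated incrementally" idea concrete, here is a minimal Python sketch of one-pass OLS for a single predictor. It keeps only running sums (the sufficient statistics) and never holds the full table in memory. The class name `StreamingOLS` and the chunked toy data are hypothetical, chosen for illustration; this is not how R or Julia's DataFrames work.

```python
from dataclasses import dataclass


@dataclass
class StreamingOLS:
    """Running sums for fitting y = intercept + slope * x, one chunk at a time."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        # Fold one chunk of rows into the running sums; the chunk can be discarded afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form solution of the normal equations for one predictor.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Toy data generated from y = 1 + 2x, delivered in two chunks.
model = StreamingOLS()
for chunk_x, chunk_y in [([0.0, 1.0], [1.0, 3.0]), ([2.0, 3.0], [5.0, 7.0])]:
    model.update(chunk_x, chunk_y)
intercept, slope = model.coefficients()
```

The same pattern extends to several predictors by accumulating the matrices X'X and X'y instead of scalar sums, though a numerically careful version would likely use an updating QR factorization rather than raw sums.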


DataFrames are in-memory. It sounds like you're describing what we're calling DataStreams, which are still a work in progress, but do already exist. We're also building SGD (stochastic gradient descent) for doing things like OLS incrementally.
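As a rough illustration of fitting OLS by stochastic gradient descent, here is a small Python sketch that updates the coefficients one row at a time over several passes. The function name `sgd_ols`, the learning rate, and the toy data are assumptions made for this example; it does not reflect the actual DataStreams or SGD code in Julia.

```python
import random


def sgd_ols(samples, lr=0.05, epochs=500, seed=0):
    """Fit y = intercept + slope * x by SGD on squared error, one row per update."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rows = list(samples)
    intercept, slope = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(rows)  # visit rows in a fresh random order on each pass
        for x, y in rows:
            error = (intercept + slope * x) - y
            # Gradient of 0.5 * error**2 with respect to (intercept, slope).
            intercept -= lr * error
            slope -= lr * error * x
    return intercept, slope


# Noise-free toy data from y = 1 + 2x; real data would need a decaying step size.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
sgd_intercept, sgd_slope = sgd_ols(data)
```

Each update touches a single row, so the rows could just as well come from a file or database cursor that is re-read on every pass instead of from an in-memory list.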


It's rather amazing how far this has come in a very short time. Excellent post, John!



