"Based on distributed Solr" might be an oversimplification. It uses Solr as its indexing engine, but really, that engine could be any single-node indexer... including single ES nodes. Basically, Yokozuna adds real grown-up distributed systems computer science to the OSS distributed search space. http://docs.basho.com/riak/2.0.0beta1/dev/advanced/search/
I think the biggest pluses for Erlang as a language are that fault tolerance (Erlang's "let it crash" + OTP supervision trees) and distribution of tasks (processes interact via message passing, whether within one node or across nodes) are built-in language features. They aren't libraries or afterthoughts. This subtle shift manifests itself in all Erlang code. There's rarely a fear that some esoteric library you want to use has a memory leak because it's keeping state in some static hashtable, or that semaphores are used incorrectly, causing a race condition.
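For readers coming from Go, here is a rough Go approximation of the "let it crash" + supervisor idea: one supervisor restarting one crashing worker. It is a sketch under stated assumptions, not how OTP works: `runWorker` and `supervise` are hypothetical names, the worker is a goroutine whose panic is recovered (Erlang would instead let an isolated process die and notify the supervisor through a link), and the restart limit ignores the time window that real OTP restart intensity uses.

```go
package main

import "fmt"

// runWorker runs work in its own goroutine and turns a panic into an error,
// so a crash in the worker does not take down the caller. This is only a
// rough stand-in for Erlang process isolation: goroutines share memory.
func runWorker(work func()) error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("worker crashed: %v", r)
			}
		}()
		work()
		done <- nil // only reached if work did not panic
	}()
	return <-done
}

// supervise restarts work after each crash, giving up after maxRestarts
// restarts. It loosely mirrors an OTP supervisor's restart intensity, without
// the time window, restart strategies, or child specs a real one has.
func supervise(work func(), maxRestarts int) (int, error) {
	restarts := 0
	for {
		err := runWorker(work)
		if err == nil {
			return restarts, nil // worker finished normally
		}
		if restarts >= maxRestarts {
			return restarts, fmt.Errorf("restart limit reached: %w", err)
		}
		restarts++ // "let it crash": start a fresh worker instead of repairing state
	}
}

func main() {
	attempts := 0
	// flaky crashes on its first two runs, then succeeds.
	flaky := func() {
		attempts++
		if attempts < 3 {
			panic("let it crash")
		}
	}
	restarts, err := supervise(flaky, 5)
	fmt.Println(restarts, err) // 2 <nil>
}
```

The design choice worth noticing is that the supervisor never tries to fix a crashed worker's state; it just starts a new one, which is the habit Erlang builds in from the start.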
Quick question: I don't know much about Erlang compared to other functional languages, but how is Erlang superior to another language which claims to have some of these traits, such as Go?
There was a really good post earlier this year about how Erlang does scheduling (http://jlouisramblings.blogspot.fr/2013/01/how-erlang-does-s...). I think one of the important takeaways is this: "[Erlang] values low latency over raw throughput, which is not common in programming language runtimes". It's worth a quick read.
Ultimately I find that Erlang's strength is not one thing or another; it's the combination of all of its traits: from its functional nature, to OTP providing a consistent application structure, to its VM. All of these things put together make it interesting and useful.
I've yet to get into Haskell (I will eventually) but I really do find that constructing applications in a functional manner, with state being stored in independent processes within the running application, is a superior way to build systems.
One final note: I really like Go as well, for very specific reasons. Go's resource usage (both in terms of CPU consumption and RAM) is soooo nice, especially coming from Ruby. I was never a C programmer but I can wrap my head around Go and use it to build low resource usage apps with specific purposes, and sometimes this is exactly what is required.
It's been a few years since I've done any Erlang work, but there are a couple of places off the top of my head where Erlang takes fault tolerance to a higher level compared to Go.
While Go's goroutines give you a similar sort of concurrency, they don't work in the distributed sense the way Erlang's processes do. Go lets me fiddle around with the goroutines on my computer, but that's it. With Erlang, I can natively talk to the processes on any of the computers in my cluster. Just pass the target node's name (e.g. `rpc:call('app@10.0.0.5', Module, Function, Args)`) and I can call a function on a completely different computer. While you could emulate this with XML-RPC or JSON, it's not baked into the language and it's not treated as a first-class operation the way it is in Erlang.
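For comparison, the "emulate it with a library" route in Go looks roughly like the sketch below, using the standard library's `net/rpc` package (gob-encoded; `net/rpc/jsonrpc` provides a JSON codec instead). `Arith`, `Args` and `callOverPipe` are hypothetical names, and `net.Pipe` replaces a real TCP connection so the example runs on one machine. The contrast with Erlang is all the setup (registering a service, wiring up a codec, managing the connection) that happens before the actual call.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Arith is a hypothetical service. net/rpc only exposes exported methods of
// the form func (t *T) Name(args *A, reply *R) error.
type Arith struct{}

// Args is the hypothetical request payload for Arith.Add.
type Args struct{ A, B int }

// Add is the remotely callable method.
func (t *Arith) Add(args *Args, reply *int) error {
	*reply = args.A + args.B
	return nil
}

// callOverPipe serves Arith on one end of an in-memory connection and calls
// it from the other. In a real cluster the connection would come from
// net.Dial/net.Listen to another machine; net.Pipe keeps the sketch local.
func callOverPipe(a, b int) (int, error) {
	server := rpc.NewServer()
	if err := server.Register(new(Arith)); err != nil {
		return 0, err
	}
	serverConn, clientConn := net.Pipe()
	go server.ServeConn(serverConn) // returns when the client hangs up

	client := rpc.NewClient(clientConn)
	defer client.Close()

	var sum int
	err := client.Call("Arith.Add", &Args{A: a, B: b}, &sum)
	return sum, err
}

func main() {
	sum, err := callOverPipe(2, 3)
	fmt.Println(sum, err) // 5 <nil>
}
```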
Also, Erlang's basic libraries were built with hot code loading in mind. You shouldn't have to stop and restart your software just to fix a bug. Now, I've seen discussion of how it's possible to do this with Go, but Erlang makes the idea so pervasive that almost every beginner's guide includes how to do it. As a side note, being a dynamic language has some advantages for hot code loading, though this can be a matter of taste.
The Netchan project, I think, showed how that particular limitation (distributed goroutines) will be overcome in time and, I think, built into the language runtime.
This is just idle speculation, but it seemed to me, reading the mailing list ages ago when this was discussed, that netchan worked in many use cases but not others, and it just was not the right time in the development of the language to be distracted by a tertiary feature like that. Now, I could be misreading the whole thing and maybe it will never be a language feature. But if I were a betting man...
Adding on to what others have said, I'll mention that while Go's concurrency features match Erlang's concurrency primitives pretty closely (`go routine()` compared to `spawn(fun routine/0)`, Go channels compared to `Pid ! {message}` and `receive`), almost nobody who does work in Erlang uses those primitives directly. Most concurrent/distributed systems in Erlang are built on OTP, the set of libraries built on those primitives, which has its own set of higher-level tools and abstractions (supervisors, gen_servers, FSMs) that provide a lot of the failsafes and profiling built in. Basic actor concurrency is simple in principle but, like most things in distributed computing, extremely hairy in practice, and OTP's libraries and templates take care of almost all of it (a reasonable analogy might be naked C++ vs. C++ with the standard library and Boost: you CAN implement your own raw pointer-munging data structures, but almost nobody does).
The Wikipedia article has some confusion about whether Erlang uses the actor model for concurrency. If you have some spare time, it would be nice to clean it up.
* Fault tolerance comes first. Built in from the ground up.
* High concurrency.
Yep, that is it. OK, it has many other features, but they amazingly all fall out of those two. Here are the secondary features and how they relate to the primary ones.
* Fault isolation. If a system is to be fault tolerant, faults must be isolated. If one process crashes it shouldn't bring down the whole system.
* Easily parallelizable. Because processes are isolated and high concurrency was a goal, it was easy to build in a sane scheduling algorithm that can spread the load across multiple CPU cores.
* Functional. Functional programming discourages handling large amounts of mutable state. Think of a large Java class with 50 instance variables that could be modified by 20 different methods. Functional programming encourages passing the state along explicitly instead.
* Built-in distribution. It is hard to make a fault-tolerant system (OK, impossible in practice, to be more precise) without redundant hardware. Servers will fail, but your service must not. You must have more than one server ready to take over. Distributing your application across multiple physical machines is built in. You send a message to a local Erlang process like this:
Pid ! Msg.
Here is how you send a message to a process running in another data center, maybe halfway across the world:
Pid ! Msg.
That is pretty nice.
* The system is responsive. A non-responsive system can be considered a failed system in some domains. Think about a mail server: if the user clicks on a message and it takes 5 s to open it, they may well consider the system broken. This also comes out of concurrency and fault isolation. As the load increases, instead of throwing errors everywhere, the system gracefully absorbs the load while staying responsive.
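The `Pid ! Msg` point (same syntax for a local and a remote process) is the part Go doesn't give you out of the box. Below is a hand-rolled Go sketch of the idea with hypothetical names throughout (`Pid`, `localPid`, `remotePid`, `serveNode`): one interface, two implementations, one of which serializes the message with `encoding/gob` over a connection. `net.Pipe` stands in for a real network link. Erlang's runtime also handles node naming, connection setup and failure detection (links and monitors work across nodes), none of which this sketch attempts.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

// Pid is a hypothetical interface: senders use the same Send call whether the
// target lives in this process or on another node, echoing `Pid ! Msg`.
type Pid interface {
	Send(msg string) error
}

// localPid delivers straight into an in-process mailbox.
type localPid struct{ mailbox chan<- string }

func (p localPid) Send(msg string) error {
	p.mailbox <- msg
	return nil
}

// remotePid serializes the message onto a connection to another "node".
type remotePid struct{ enc *gob.Encoder }

func (p remotePid) Send(msg string) error { return p.enc.Encode(msg) }

// serveNode is the receiving side: it decodes incoming messages and hands
// them to a local Pid until the connection closes.
func serveNode(conn net.Conn, target Pid) {
	dec := gob.NewDecoder(conn)
	for {
		var msg string
		if err := dec.Decode(&msg); err != nil {
			return
		}
		_ = target.Send(msg)
	}
}

// deliverBoth sends one message through each kind of Pid and returns what
// arrived in the mailbox, in order.
func deliverBoth() ([]string, error) {
	mailbox := make(chan string, 2)
	local := localPid{mailbox: mailbox}

	// net.Pipe stands in for a TCP link between two nodes.
	nodeA, nodeB := net.Pipe()
	defer nodeA.Close() // closing one end makes serveNode's Decode fail and return
	go serveNode(nodeB, local)
	remote := remotePid{enc: gob.NewEncoder(nodeA)}

	// The same Send call works for both targets.
	pids := []Pid{local, remote}
	msgs := []string{"sent locally", "sent over the wire"}
	for i, pid := range pids {
		if err := pid.Send(msgs[i]); err != nil {
			return nil, err
		}
	}
	return []string{<-mailbox, <-mailbox}, nil
}

func main() {
	got, err := deliverBoth()
	fmt.Println(got, err) // [sent locally sent over the wire] <nil>
}
```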
There are at least 2 major differences relevant to message processing:
1) Erlang runs in a VM which provides preemptive scheduling and built-in message flow control (the cost of sending a message grows with the size of the receiver's mailbox).
2) Unlike other VM languages, the Erlang VM provides a per-process heap, which helps eliminate VM-wide garbage collection pauses. None of the JVM-based solutions can provide low-latency processing unless you manage the heap by hand.
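Go has no direct counterpart to the flow control in 1). Erlang mailboxes themselves are unbounded, so that mechanism works by making sends more expensive rather than refusing them. The nearest built-in Go tool is a bounded channel, which is a different technique (a hard capacity limit). A minimal sketch, where `trySend`, `flood` and the capacity of 3 are hypothetical:

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the mailbox had
// room. A plain `mailbox <- v` would instead block the sender until the
// receiver drains a slot, which is Go's usual form of backpressure.
func trySend(mailbox chan<- int, v int) bool {
	select {
	case mailbox <- v:
		return true
	default:
		return false // mailbox full: the sender learns about overload immediately
	}
}

// flood pushes n messages at a mailbox of the given capacity with no
// receiver running and counts how many were accepted and rejected.
func flood(capacity, n int) (accepted, rejected int) {
	mailbox := make(chan int, capacity)
	for i := 0; i < n; i++ {
		if trySend(mailbox, i) {
			accepted++
		} else {
			rejected++
		}
	}
	return accepted, rejected
}

func main() {
	fmt.Println(flood(3, 5)) // 3 2
}
```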
As people said before, Erlang combines several traits in a nice package, which is hard to beat.
There are other differences in the language as well. Being a functional language with TCO, it allows you to implement a lot of algorithms that would be cumbersome in imperative languages. Powerful pattern matching capabilities let you structure the program in a clean way, and pattern matching on binaries makes most binary protocol parsers one-liners.
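To make the binary pattern-matching point concrete: in Erlang, a length-prefixed frame can be split in a single match such as `<<Version:8, Len:16, Payload:Len/binary, Rest/binary>> = Bin`. The Go sketch below does the same job by hand with `encoding/binary`. The frame layout (1-byte version, 2-byte big-endian length, then the payload) and the names `packet` and `parsePacket` are hypothetical, not a real protocol.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// packet is a hypothetical wire format: 1-byte version, 2-byte big-endian
// payload length, then the payload.
type packet struct {
	Version byte
	Payload []byte
}

// parsePacket is a hand-written counterpart of the Erlang match
//
//	<<Version:8, Len:16, Payload:Len/binary, Rest/binary>> = Bin
//
// It returns the parsed packet and any trailing bytes.
func parsePacket(b []byte) (packet, []byte, error) {
	if len(b) < 3 {
		return packet{}, nil, errors.New("short header")
	}
	version := b[0]
	n := int(binary.BigEndian.Uint16(b[1:3]))
	if len(b) < 3+n {
		return packet{}, nil, errors.New("short payload")
	}
	return packet{Version: version, Payload: b[3 : 3+n]}, b[3+n:], nil
}

func main() {
	data := []byte{1, 0, 2, 'h', 'i', 0xFF}
	p, rest, err := parsePacket(data)
	fmt.Println(p.Version, string(p.Payload), rest, err) // 1 hi [255] <nil>
}
```

Every bounds check here is something the Erlang match does implicitly: if the binary is too short, the match simply fails.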
I've barely touched either language myself, but my understanding is that in Go you can do almost anything you could do in C, whereas Erlang is much more restrictive about how processes communicate, so there's more of a guarantee of fault tolerance in Erlang. I could be completely wrong, though, so please don't take this at face value.
In a sense, yes. Luwak was just basically plain-old Riak with some support for huge files. CS is a full solution for storing large assets, along with multitenancy, role management, reporting, monitoring and pretty much anything else you'd expect from a cloud store like S3.
I hear that. Riak docs used to famously suck out loud. It wasn't that the information was bad; it's just that no one had the time to devote to making them clean, consistent, or easy to navigate (i.e. information architecture).
Now we have some of the best NoSQL docs[1] around (clearly there's still more to do). A nice side-effect of cleaner navigation is that PRs to our repo[2] have increased dramatically.