I know Twitter's just done the dirty with Cassandra, but it seems Cassandra is getting a massive PR boost this week, despite being one of the older (where old is 2008!) NoSQL systems.
Can anyone explain/posit an idea as to why MongoDB and CouchDB have been stealing the thunder for the past year rather than Cassandra? It just seems odd.
It's actually very easy to set up. Install it and run ./bin/cassandra -f. It's nothing like the troubles with HBase.
Administration is pretty hands off for most tasks. For instance, you want three copies of each piece of data? Set the replication factor to 3. Other parts are difficult to "get" if you don't have an understanding of the code. My best advice is to make sure you approach Cassandra from a developer perspective. You can't treat it like a black box quite yet.
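To make the replication factor point a bit more concrete, here is a rough Python sketch of the simplest placement idea: the node that owns a key's token holds the first copy, and the next nodes clockwise on the ring hold the rest. This is a simplified model, not Cassandra's code; the ring layout, node names, and the function name replicas_for_token are all made up for illustration, and real clusters can use other placement strategies.

```python
import bisect

def replicas_for_token(key_token, ring, replication_factor=3):
    """Return the nodes that would hold copies of a key.

    ring is a list of (token, node_name) pairs sorted by token.
    A node owns every token up to and including its own, so the
    first replica is the first node whose token is >= key_token,
    wrapping around to the start of the ring if there is none.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    # Never ask for more copies than there are nodes.
    count = min(replication_factor, len(ring))
    # Walk clockwise from the owner to pick the remaining replicas.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical four-node ring with evenly spaced tokens.
ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
print(replicas_for_token(30, ring))
```

With a replication factor of 3 on this made-up ring, a key whose token is 30 lands on node-c, node-d and node-a.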
Not sure where you are getting your information from, but Cassandra is one of the easiest setups I've used. It couldn't be simpler. I do not have experience with MongoDB though.
Since Cassandra uses Apache Thrift as the default RPC mechanism, exposing the Thrift layer to any uncontrolled input can be dangerous. We use firewalls on our nodes to make sure our Thrift ports are only exposed to a very small set of machines, because even just telnetting into the port and typing "hello" can cause the JVM to OOM.
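If you want to double-check that the firewall is doing its job, something like the following Python sketch can be run from a machine that should NOT have access. It only opens a TCP connection and closes it again without sending any bytes. The port 9160 is Cassandra's default Thrift port; the helper name port_is_reachable and the host name in the usage line are placeholders, so adjust them to your own setup.

```python
import socket

THRIFT_PORT = 9160  # Cassandra's default Thrift RPC port; change if yours differs

def port_is_reachable(host, port=THRIFT_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    The connection is closed immediately and no data is sent,
    so this does not feed anything to the Thrift layer.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), or unresolvable host.
        return False

if __name__ == "__main__":
    # "cassandra-node.example" is a placeholder host name.
    print(port_is_reachable("cassandra-node.example"))
```

From an untrusted machine this should print False; from one of the few machines you allow, True.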
--
I use Redis and heavily guard its telnetable port, but it doesn't OOM. This issue should have been fixed before public release, imo. You wouldn't want something as simple and common as a port scan to shut down your data layer.
Can someone explain why Protocol Buffers (http://code.google.com/apis/protocolbuffers/) is not being used more widely? Is it because of the limited language support (Java/C++/Python only) or some other reason?
Protocol Buffers has limited language support, the implementation in one popular language (Python) blows goats, and even though it predates Thrift, it was not released until after Thrift arrived, so its community is smaller. The number of people who need this sort of product is relatively limited, so the early traction that Thrift has gained will take a bit of time or some compelling reasons to overcome.
I thought Cassandra used only keys for selects - but in this post I see you can also use slices of from..to values. Are there any other predicates that one can use? Like ones implementing 'LIKE string%' or 'LIKE %string%'?
It doesn't look like that from the API wiki, but maybe someone knows if that's possible, or planned.
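For what it's worth, a from..to slice over sorted names can stand in for a 'LIKE string%' prefix match, since every name with a given prefix sits in one contiguous range. A '%string%' match has no such range and would need a scan or a separate index. The Python sketch below only models the idea on an in-memory sorted list; it is not the Cassandra API, and the function names slice_range and prefix_slice are made up for illustration.

```python
import bisect

def slice_range(sorted_names, start, finish):
    """Return all names n with start <= n <= finish.

    sorted_names must already be sorted, the way names are kept
    in order by the store's comparator.
    """
    lo = bisect.bisect_left(sorted_names, start)
    hi = bisect.bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]

def prefix_slice(sorted_names, prefix):
    """Emulate LIKE 'prefix%' with a from..to range.

    The upper bound is the prefix followed by the highest Unicode
    code point. Assumption: no name contains that character.
    """
    return slice_range(sorted_names, prefix, prefix + "\U0010ffff")

names = sorted(["apple", "apricot", "banana", "band", "bandit", "cherry"])
print(slice_range(names, "apricot", "band"))
print(prefix_slice(names, "ban"))
```

The first call returns apricot, banana and band; the second returns the three names starting with "ban".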
The post doesn't mention the environment in which you're running Cassandra. Any chance you're running it in the cloud (EC2?), or are you running it on real h/w?
We are running in a virtualized environment, and we've done a pretty good job of adding capacity for performance and space reasons ahead of demand. Load balancing is pretty hands off. It's just something that takes some time and a fundamental understanding, as there are still ways to shoot yourself in the foot with this system.