alexpopescu · on July 3, 2013

While both CouchDB and RethinkDB store JSON, the differences between them are more radical. I cannot post a comparison as extensive as the one with MongoDB, but here are some aspects.

Please keep in mind that this is not an authoritative comparison and it may contain mistakes. Also, as with many such systems, the aspects covered are in reality not easy to describe in just a few words.

Platforms:

- RethinkDB: Linux, OS X

- CouchDB: where Erlang VM is supported

Data model:

- both JSON

Data access:

- RethinkDB: Unified chainable dynamic query language

- CouchDB: key-value, incremental map/reduce

Javascript integration:

- RethinkDB: V8 engine; JS expressions are allowed pretty much anywhere in the RQL

- CouchDB: SpiderMonkey (?); incremental map/reduce, views are JS-based

Access languages:

- RethinkDB: Protocol Buffers

- CouchDB: HTTP

Indexing:

- RethinkDB: Multiple types of indexes (primary key, compound, secondary, arbitrarily computed)

- CouchDB: incremental indexes based on view functions

Sharding:

- RethinkDB: Guided range-based sharding (supervised/guided/advised/trained)

- CouchDB: -

Replication:

- RethinkDB: sync and async replication

- CouchDB: bi-directional replication can be set between multiple CouchDB servers

Multi-datacenter:

- RethinkDB: Multiple DC support with per-datacenter replication and write acknowledgements

- CouchDB: (?)

MapReduce:

- RethinkDB: Multiple MapReduce functions executing ReQL or Javascript operations

- CouchDB: views are map/reduce but they need to be pre-defined

Consistency model:

- RethinkDB: Immediate/strong consistency with support for out of date reads

- CouchDB: http://guide.couchdb.org/draft/consistency.html

Atomicity:

- both document level

Durability:

- both durable

Storage engine:

- RethinkDB: Log-structured B-tree serialization with incremental, fully concurrent garbage compactor

- CouchDB: B-tree

Query distribution engine:

- RethinkDB: Transparent routing, distributed and parallelized

- CouchDB: none

Caching engine:

- RethinkDB: Custom per-table configurable B-tree aware caching

- CouchDB: none (?)

apendleton · on July 3, 2013

Are there any plans for Couch-style incrementally-computed aggregates/views in RethinkDB?

alexpopescu · on July 3, 2013

Considering RethinkDB's secondary indexes can be defined around pretty complex ReQL expressions [1], you could get some of it already.

[1] http://rethinkdb.com/docs/pragmatic-faq/#how-do-i-take-advan...

apendleton · on July 3, 2013

Yeah, I thought about that, but it seems like you can only use them to get the "map" part of map/reduce... no aggregation. Unless I'm missing something.

alexpopescu · on July 3, 2013

> Yeah, I thought about that, but it seems like you can only use them to get the "map" part of map/reduce... no aggregation. Unless I'm missing something.

I think that's correct.
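To make the distinction concrete, here is a toy Python sketch. It is not RethinkDB or CouchDB code, and every name in it (`IncrementalCount`, `key_fn`, `index`, `counts`) is hypothetical. It only illustrates the idea discussed above: a secondary index covers the "map" step (key to documents), while a Couch-style view would also keep a "reduce" result up to date on every write.

```python
from collections import defaultdict


class IncrementalCount:
    """Toy model: a secondary index (map) plus a running aggregate (reduce)."""

    def __init__(self, key_fn):
        self.key_fn = key_fn             # the "map" function: document -> key
        self.index = defaultdict(set)    # map result: key -> set of document ids
        self.counts = defaultdict(int)   # reduce result: key -> number of documents

    def insert(self, doc_id, doc):
        key = self.key_fn(doc)
        if doc_id not in self.index[key]:
            self.index[key].add(doc_id)
            self.counts[key] += 1        # aggregate updated on write, not recomputed on read

    def delete(self, doc_id, doc):
        key = self.key_fn(doc)
        if doc_id in self.index[key]:
            self.index[key].remove(doc_id)
            self.counts[key] -= 1


# Hypothetical usage: count documents per language.
view = IncrementalCount(lambda doc: doc["lang"])
view.insert(1, {"lang": "python"})
view.insert(2, {"lang": "python"})
view.insert(3, {"lang": "erlang"})
view.delete(2, {"lang": "python"})
```

With only `index`, answering "how many documents per key" means scanning the matching documents at read time; `counts` is the part that an index alone does not provide.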

coffeemug · on July 3, 2013

Yes. RethinkDB is really well set up to do this due to the underlying parallelized map/reduce infrastructure. This feature is a matter of scheduling priorities. I don't have an ETA yet, but it will almost certainly get done in the medium-term.

apendleton · on July 3, 2013

Awesome. If there's a Github issue about it I'd be interested to follow it.

coffeemug · on July 3, 2013

I just added one -- see https://github.com/rethinkdb/rethinkdb/issues/1118.

alexpopescu · on June 29, 2013

Maybe these other systems are more fluid, but they are also a lot weaker in what their platform gives them. What I mean by this is that the Java world never needed the extra rvm, virtualenv, etc.

The complexity of dependency management doesn't come from declaring dependencies, but from managing conflicts over time and being able to create isolated environments.

Even if many call it the "jar nightmare", I don't know other systems that offer better isolation.

doktrin · on June 29, 2013

I think you've misunderstood the point I was making, which makes sense in context of the video in the OP. I'm not arguing NPM > Maven. I'm arguing NPM is conceptually simpler and easier for a newcomer to use than Maven.

Namely, the speaker made reference to "weaker" DB technologies being used due simply to a frictionless barrier to entry. Mongo was cited as one example. The use of MySQL over Postgres was another. This is basically taken verbatim from the presentation in the video.

This isn't about which systems are "better" or "worse", but which are simpler to get up and running with. Mongo is super duper simple to install and configure. So is NPM. Managing complexity is a real and valid concern, but also not what we're talking about here.

alexpopescu · on June 30, 2013

What I wanted to suggest is that at the tool level npm, gem, and pip are simpler to get started with. But I don't think that's the case at the env level, where you'll need more tools.

So while I've used some wrong words in my comment, I was still (trying at least) referring to simplicity vs complexity.

PS: I've never been very fond of other complexities introduced by the maven approach, but that's a different (and probably longer) story.

alexpopescu · on June 29, 2013

Lately if I see a post starting with "Unless you have been living under a rock", I stop reading.

stack0v3erfl0w · on June 29, 2013

ronaldx · on June 29, 2013

I dislike it particularly because of the implication of stupidity: "I believe only stupid people won't know this, but I'm going to tell you anyway".

In general it's poor form to assume readers know anything - consider (for example) a future lay-reader who is trying to find out this information.

khairul · on June 29, 2013

Because it's a tired, overabused cliche.

agilebyte · on June 29, 2013

But we are not grading an English essay are we? The content of the article was quite good for a "normal" non-tech person.

herghost · on June 29, 2013

I don't think the content of the article was particularly good.

The advice on how to protect yourself from being spied on is essentially:

"stop using the internet; or use HTTPS - which is probably compromised anyway; or encrypt - which is probably compromised anyway; or use a VPN - which I will say next to nothing about"

It's just a rehash of points and opinions already better expressed elsewhere.

lttlrck · on June 29, 2013

We don't want to read an English essay either.

alexpopescu · on May 21, 2013

I'm always impressed how such obvious tools (e.g. memcached, junit) can change the face of software development.

Do you have similar other obvious ones in mind? (please no rails :-)

garraeth · on May 21, 2013

Maybe not so much a tool, but the AJAX paradigm and XMLHttpRequest is, imo, one of the bigger developments in the Internet era.

And thank you very much for making memcached, and happy birthday! I've used this as a core part of my stack for years and love it.

edit: clarity

fosap · on May 21, 2013

Git, or in general distributed VC.

Not obvious to implement. But the concept was obvious.

jlgreco · on May 21, 2013

Even if DVC lacked the D, the concept of versions being a DAG was incredibly important. A very big improvement to how classical CVS systems have developers think about history.

nostrademons · on May 21, 2013

Wikis. Who woulda thunk that a website that anybody can edit would actually work, let alone become an important component of the web and most corporate intranets.

HTTP. The protocol is not that complicated, but it's become ubiquitous, and a cornerstone of the web.

Social news. Nowadays everybody takes links you can vote on as a "duh" feature and wonder what the big deal behind Reddit is, but when it came out in 2005, it was very much a "Why has nobody thought of this before? Oh, right, because nobody will ever use it" invention.

papsosouid · on May 21, 2013

>Nowadays everybody takes links you can vote on as a "duh" feature and wonder what the big deal behind Reddit is, but when it came out in 2005, it was very much a "Why has nobody thought of this before?

It had been thought of before, and done before. The late 90s had tons of sites like that. When reddit came out in 2005, it was very much a "what is so special about slashdot clone #437?".

nostrademons · on May 21, 2013

Didn't slashdot have you vote on the comments and not the stories, and the stories were still picked by editors? I think that was what was new about Reddit and Digg...they were just user-submitted stories and nothing else.

papsosouid · on May 21, 2013

Yes, but everyone had told them over and over again to change that, they just wanted to maintain control. It certainly wasn't a new invention of Digg's, I just used slashdot as the example of reddit being a clone because slashdot was the most well known. kuro5hin was (is?) exactly digg/reddit for example. Social bookmarking sites were around before slashdot too.

jalfresi · on May 22, 2013

Apache Httpd

pramodliv1 · on May 22, 2013

redis?

alexpopescu · on May 6, 2013

Niiiice!

alexpopescu · on May 6, 2013

For everyone thinking this is a text-to-speech plugin for vim (boring): it is not. It's a voice controlled vim (quite cool).

alexpopescu · on May 6, 2013

I use a combination of all these 3 tricks as I have my dotfiles shared between Mac and Linux.

Basically my pythonstartup.py contains the extra check for os:

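    AP_AUTOCOMPLETE = False
    import sys
    try:
        import readline
    except ImportError:
        print("Module readline unavailable")
    else:
        import rlcompleter  # importing it registers the default completer
        readline.parse_and_bind("tab: complete")  # GNU readline syntax
        if sys.platform == 'darwin':
            # libedit syntax, for the readline emulation found on the Mac
            readline.parse_and_bind("bind ^I rl_complete")
        AP_AUTOCOMPLETE = True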
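A possible variation, not part of the original startup file: instead of checking `sys.platform`, one can check which library backs the `readline` module. The Python documentation notes that the text "libedit" appears in `readline.__doc__` when the libedit emulation is in use. The helper name `completion_binding` below is made up for this sketch.

```python
def completion_binding(readline_doc):
    """Pick the tab-completion binding string for parse_and_bind.

    readline_doc is the readline module's __doc__ string (may be None).
    """
    if readline_doc and "libedit" in readline_doc:
        return "bind ^I rl_complete"   # libedit syntax
    return "tab: complete"             # GNU readline syntax


try:
    import readline
except ImportError:
    readline = None                    # e.g. a platform without readline

if readline is not None:
    import rlcompleter                 # importing it registers the default completer
    readline.parse_and_bind(completion_binding(readline.__doc__))
```

This keeps a single `parse_and_bind` call and should also cover a Mac Python built against GNU readline, which the platform check would treat as libedit.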

alexpopescu · on May 6, 2013

It was only yesterday that I thought about using animated GIFs for demoing code typing. That's what I liked most about this post!

landr0id · on May 6, 2013

I disliked how I had to watch the gifs a couple of times though in order to read exactly what he typed for the first snippet to see how it related to what it filled. Text underneath would help, but it does indeed look nice.

drewbarontini · on May 6, 2013

Good point. I just updated the post to fix that.

alexpopescu · on April 29, 2013

URIs are pretty much immutable. My impression is that what the OP suggests is a guaranteed lifetime of the content associated with the URI.

As for this second part, "once it's published it should always remain out there", I'm not very sure it's a good idea. In many cases I'd actually like to be able to say that a piece of content has expired (the content is not relevant anymore).

alexpopescu · on April 7, 2013

I've recently seen this slidedeck [1] from guys on Twitter's data team where they say that most of the time the data mining process is basically:

1. Your boss says something vague

2. You think very hard on how to move the needle

3. Where’s the data?

4. What’s in this dataset?

5. What’s all the f#$#$ crap in the data?

6. Clean the data

7. Run some off-the-shelf data mining algorithm

8. ...

9. Productionize, act on the insight

10. Rinse, repeat

[1] http://www.slideshare.net/Hadoop_Summit/scaling-big-data-min...