> Seven-table joins. Ugh. What? That's what relationship databases are *for*. An...

bowlofpetunias · on Nov 11, 2013

People hate joins because at some point they get in the way of scaling, and getting past that is a huge pain.

Or at least, that's where the original join-hate comes from.

In reality of course, most of us don't have that problem, never had and never will, and it's just being parroted as an excuse for not bothering to understand RDMS's.

Relational database design is a highly undervalued skill outside the enterprise IT world. Many of the best programmers I've worked with couldn't design a proper database if their lives depended on it.

rosser · on Nov 11, 2013

People hate joins because at some point they get in the way of scaling...

No, in fact, they don't.

Poor relational modeling gets in the way of scaling, and that can be geometrically exacerbated by JOINs. A JOIN, in and of itself, is neither good nor bad. It's just a tool, and like all tools, how you use it is what makes it "good" or "bad" — just like you can build a house or bash in a skull with a hammer.

parasubvert · on Nov 12, 2013

In most relational database implementations, joins stop scaling after 10-50 million rows or so assuming an online transactional site.

A time series data warehouse could go into the billions of rows with scalable joins with partitioning and bitmap indices ... but is also only applicable in the unlikely case you could afford oracle at $60-90k/CPU list price

Also, most databases that aren't Oracle don't have high performance materialized views to "preprocess" joins at upsert time, therefore people resort to demoralized tables and their own custom approach to materializing those views.

Then even denormalized tables begin to stop scaling at around 250 million to 500 million rows. So people resort to sharding managed in a custom way.

I haven't even begun to express the scalability impacts of millions of users on a LRU buffer cache used in most RDBMS - that usually is resolved through an in-memory cache (Memcached, Redis) whose coherency is also managed in a custom manner. Or you could spend $$$ for Coherence, Gigaspaces, Gemfire, etc. but that's also unlikely in most web companies.

At the end of all this, even if you bought a cache, you wonder why you're using an RDBMS at all since you're so constrained in your administrative approaches. Cue NoSQL.

of course in practice many devs ignore all of this history and "design by resume" assuming their new social-mobile-chat-photo-conbobulator will be at Facebook scale tomorrow.

olavgg · on Nov 12, 2013

This is not true at all. I've worked on several databases with billions of rows in several tables. A good solution for improving your query performance is to use a multi column index http://www.postgresql.org/docs/9.3/static/indexes-multicolum...

parasubvert · on Nov 12, 2013

What part isn't true?

I'll restate my narrative: Single instance, normalized, unpartitioned databases run into scaling problems the several-hundred million row range especially when under heavy concurrent load.

But once you start moving to multi-instance, partitioned databases, you start to lose the benefits of the relational model as most databases have to restrict how you accomplish things -- e.g. joins are severely restricted.

darkmoth · on Nov 12, 2013

Oracle will handle anything you throw at it, assuming you have the $$$. Ebay uses it for 2 Petabytes of data:

http://www.dba-oracle.com/oracle_news/news_ebay_massive_orac...

parasubvert · on Nov 14, 2013

That's a link discussing an analytic database from SEVEN YEARS ago. eBay has moved on.

Please understand what I am saying:

- Traditional database architectures have limitations on what you can express in SQL for highly available and scalable online transaction processing once you introduce partitioning and clustering.

- Oracle has probably the best support for partitioning and clustering out of all RDBMS, but even that has limits in the billions of rows

- Many companies do not use Oracle for business reasons (licensing/sales/pricing practices)

What I am not saying:

- Oracle sucks (it's the most feature complete and robust RDBMS out there and is );

- Oracle is not used (Amazon, Yahoo, eBay, etc. all use Oracle in various contexts);

- Oracle does not scale (it does, though it requires you, the SQL developer, to intimately know the database physical design at a certain point of scale, which defeats much of why SQL exists to begin with)

batbomb · on Nov 12, 2013

I routinely deal with joins on a 100 million row table and they work just fine. Other than that, I also use a 10 billion row table for searches. This is in Oracle.

k1w1 · on Nov 12, 2013

I've also used databases with more than a 100 million rows in a single table and received realtime query performance in multi-table joins. And this is using Sqlite! No expensive DB licenses - but it was using a high-end SAN since we actually ran thousands of these multi-million row databases in parallel on the same server.

frederickf · on Nov 12, 2013

"demoralized tables" :)

tracker1 · on Nov 11, 2013

For me, it was a database schema with 38 joins (and 2 additional queries) to effectively get the data to display a single page. For that use case, mirroring the data on save to MongoDB was a no-brainer... with geospacial queries out of the box, and a few other indexing features it made a lot of sense.

I wouldn't even think to use MongoDB for certain use cases... but for others, it's a great fit. I think that Cassandra, Riak, Couch, Redis and RethinkDB all have serious advantages and disadvantages to eachother and SQL.

I do find that MongoDB is a very natural fit for node.js development, but am not shy about using the right tool for a job.

Another thing that tends to irk me, is when people use SQL in place of an MQ server.

davidp · on Nov 12, 2013

> For that use case, mirroring the data on save to MongoDB was a no-brainer

I think you just confirmed the OP's point -- MongoDB makes a good cache, not a good primary store. I'm guessing you didn't do updates into that MongoDB store, and always treated the SQL source as "authoritative" when it became necessary. Am I right?

tracker1 · on Nov 13, 2013

I no longer work at the company in question, but the plan was to displace SQL for the records that were being used in MongoDB, for mongo to become the authority. NOTE: this was for a classified ads site for cars. Financial and account transactions and data would remain in the dbms, but vehicle/listing records would have become mongo authoratative.

The transition was difficult because of the sheer number of systems that imported/updated listing records in the database... there wasn't yet a 100% certainty that all records were tagged on update properly, so that they could be re-exported to mongo... each day, all records were tagged ... took about 24 minutes to replicated the active listings (about 50K records), and we're not talking "Big Data" here, but performance was much better doing search/display queries against MongoDB.

drsintoma · on Nov 11, 2013

> In reality of course, most of us don't have that problem, never had and never will

Maybe you never had any problems, but I don't believe "most of us" can say the same. At least me, I'd encountered problems derived from join-abuse in almost every job I've had.

rodgerd · on Nov 12, 2013

That's funny, because I've mostly encountered problems with people who prefer to nest SQL queries inside a sucession of loops in their code, rather than learn how to use SQL properly.

emn13 · on Nov 11, 2013

Yeah, me too. But having said that, I've also seen problems with mongodb and they're much, much, much, much harder to solve.

schrodinger · on Nov 11, 2013

7 isn't necessarily nothing. Each join is O(log(n)), so I believe you're stuck with O(log(n)^7) as a worst case, although in practice it will probably not be so bad since one of the joins will probably limit the result set significantly.

The other problem is that with 7 joins, that's 7! permutations of possible orders in which the database can perform the join. That's a lot of combinations, and often you can run into the optimizer picking a poor plan. Sometimes it picks a good plan initially, and then as your data set changes it can choose a different, suboptimal plan. This leads to unpredictable performance.

I think that in practice, you're best off sticking with only a few joins...

jmelloy · on Nov 11, 2013

If you're regularly doing 7 joins it's a good sign of an over normalized databased.

jacques_chester · on Nov 12, 2013

Nonsense. It very much depends on the problem domain.

_pferreir_ · on Nov 11, 2013

> A lot of people have to write JavaScript, but aren't good at JavaScript, [...] they blame it for being a crappy language, when it's really just their own lack of investment.

I think it's pretty much an accepted fact that JS has its problems. Even Brendan Eich has been quoted as admitting it.

(Note: I am a JS developer myself)

rictic · on Nov 11, 2013

This is true, but the "wtf js is such a fucked up language" meme is outsized compared to the actual problems of javascript. Having worked full time in python for a couple years I could easily show you just as many weird python semantics that will inevitably bite you[1]. I think the grandparent's point has merit, that people expect to invest in their primary language for a project, but when circumstances dictate that they need to use a bit of javascript they find it annoying.

1] What does this program do?

    print object() > object()

ptx · on Nov 12, 2013

  >>> print(object() > object())
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unorderable types: object() > object()

Yet another reason to upgrade to Python 3! :)

raverbashing · on Nov 11, 2013

About your code snippet, it prints a "random" boolean value

You're creating two objects (with random addresses, which affect the __str__ method result, which in turn result in a string comparison that returns False or True)

aduitsis · on Nov 11, 2013

That's nothing compared to the "Perl is Satan!!1" meme that us Perl programmers usually have to put up with ;)

beat · on Nov 11, 2013

Actually, Perl is many different Satans, depending on the particular stylistic quirks of the programmer in question.

clem · on Nov 15, 2013

TMTOWTDI: There's more than one way to damn it!

berntb · on Nov 12, 2013

Not so much with the usual coding standards (see "Best Practices" book etc).

merfakos · on Nov 12, 2013

IMHO, it's well deserved. Start by stop having those $, @, % identifiers for variables and then we 'll talk again about how many more daemons you need to impale.

draegtun · on Nov 13, 2013

Sigils are what make Perl standout, in syntax and behaviour, to (most) other languages so it would be silly to get rid of them!

i.e. it's a differentiator to what is or isn't Perl.

jiaweihli · on Nov 12, 2013

That one is somewhat reasonable (and relatively obscure code you'd probably never write).

A more realistic example: inner classes can't see class variables from their enclosing classes. (Why enclose classes? - builder pattern)

_pferreir_ · on Nov 11, 2013

Every language has its quirks. Python is not perfect either. But JS shows clear signs of bad design decisions, such as the behavior of the == operator.

woah · on Nov 11, 2013

What == does is pretty simple and easy to understand. If you have a hard time with it, use ===. Problem solved.

georgemcbay · on Nov 12, 2013

The problem is that JavaScript does not exist in isolation, there are other languages that use this operator. If you're familiar with any other C-derived language, the way == acts in JavaScript is very unexpected.

Yes, you can learn to deal with it, but that doesn't mean it wasn't a bad design choice. If both forms of equality are required to be operators, == and === should have been swapped. Too late to do it now, woulda, coulda, shoulda, but it certainly is, IMO, a "bad design" smell in the language, and hardly the only one that still bites people.

Another example: the way 'this' scoping works is similarly busted in that while the rules for it are reasonably straightforward in isolation, it is different enough compared to other languages that share the same basic keywords and syntax that it should have been called something else.

To be fair, I don't think much of this has to do with Brendan Eich being a bad PL designer as much as it has to do with the odd history of JavaScript that is still represented in the name of the language ("Take this client language you made which has no connection to Java, and make it look kinda like Java, please!").

achy · on Nov 11, 2013

I agree with you to a point. Joins are your friend. But trying to pull out all of the information about a graph of 'objects' using a single query with multiple one-to-many and many-to-many joins is just as foolish in SQL as in Mongo.

chao- · on Nov 11, 2013

Do you have any resources you would recommend to understand or at least give an overview of indexes?

I learned basic SQL once-upon-a-time and understand the relational algebra side of things, but only truly picked up the finer details and specific engines in piecemeal manner, as needed in various projects.

Negitivefrags · on Nov 11, 2013

http://use-the-index-luke.com/

This is a really good resource for understanding how the queries you do relate to the actual actions that the database engine takes.

daigoba66 · on Nov 11, 2013

http://use-the-index-luke.com/ is a great online book (free) targeted at programmers and developers. It's practically required reading in my opinion.

jmulho · on Nov 11, 2013

http://docs.oracle.com/cd/E11882_01/server.112/e40540/indexi...

Keyframe · on Nov 11, 2013

Star, constellation, snowflake, flat.. Developers (not the author) would benefit from database introductory course even if they are not using databases. I think Stanford did one that was open to everyone.

mkoryak · on Nov 11, 2013

This article ends up agreeing with you at the end, by the way.

pekk · on Nov 11, 2013

When program errors pass silently, that is a legitimate problem in the toolchain.