ORM is an anti-pattern (seldo.com)
286 points by jeromegn on June 15, 2011 | hide | past | favorite | 191 comments



Personally, in developing quite a lot of different data-backed apps, I've never really found the problem ORMs are solving to be a hugely significant one; it seems like a 'quick fix' for coders who don't really understand SQL anyway, which always felt to me to be attacking the problem in the wrong place. SQL isn't that hard....

In any case, while I don't dispute that it might offer speed of startup advantages for some developers, it seems no-one is so far disputing that it simply doesn't scale and, if your project really takes off, it will be creating problems. Call me a fogey if you will but I don't like the idea of launching a project that I know will need very substantial rearchitecting too early in its life.


It's hard to believe this isn't just trolling.

Writing a large app directly using SQL is not "hard," but A) it's exceptionally boring, B) it necessitates mental context-shifting from your app language to SQL, and C) it requires you to think about routine, nuts-and-bolts, below-your-pay-grade, already-solved concerns like SQL injection, transaction management, type casting, etc.
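For the record, those "solved concerns" are easy to get wrong by hand. A minimal sketch using Python's stdlib sqlite3 (the table and data are hypothetical) of the two that bite most often: parameter binding against injection, and explicit transaction scope.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

hostile = "Robert'); DROP TABLE users;--"
with conn:  # transaction scope: commit on success, rollback on exception
    # Never interpolate: f"INSERT INTO users (name) VALUES ('{hostile}')"
    # Placeholder binding stores the hostile string as inert data.
    conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

row = conn.execute("SELECT name FROM users").fetchone()
assert row[0] == hostile  # no injection occurred; the table still exists
```

An ORM (or any decent database layer) does this binding and transaction bookkeeping for you on every single statement.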

Two more points:

I've never seen a large app that either didn't use a popular ORM or didn't end up REINVENTING large chunks of functionality ORMs provide. Good developers write DRY code, and SQL everywhere leads to, eventually, some library that looks remarkably like... an ORM...

Finally, in the case of large web applications (since that's HN's bread and butter), an ORM'd app will not need a "very substantial rearchitecting" to get rid of the ORM. That statement is patently absurd. Early in its life, for scaling (always, IF it becomes necessary) what it will need is dropping down to SQL to tune some aspects of some pages. Architect for doomsday if you like (or, conversely, if you need extra hours to bill) -- or architect to ship.


A) it's exceptionally boring

Perhaps it's because it's so exceptionally boring that it's so error-prone. If you forget to wire something up by hand (especially if it's nullable) and don't notice while you're coding, it can be one of those subtle little bugs that lingers in the background until it really bites you.

Also, you forgot:

D) Difficult and time-consuming to create test automation for.


Yes! At least LINQ-to-SQL gives compile time type checking.


LINQ to SQL has an ambiguous relationship-path decision function when deciding how to map the LINQ expression tree model onto the SQL DBMS's model.

Compile-time type checking is very illusory with heterogeneous execution, not only for this reason but also because it is a distributed application and the schema can change.


And I've been bitten by the truth of your statement. It sure beats the pants off writing SQL by hand.


Is LINQ even in favor anymore? I think we're all about Entity Framework now, which appears to be more or less an ORM with an auto-update component for references to the ORM objects.


No, but you still use LINQ style lambda expressions to do more advanced queries involving the EF objects. It's actually a really, really nice blend. If you're doing .NET dev though, take a look at NHibernate. It's been around longer, it's been stable much longer, it has more features... and it actually has a community which is probably EF's weakest point IMO.


I'm old and set in my ways at the ripe age of 29, and I like my SQL processed as prepared statements that reside in my data objects. I also curse every time I see relationships between data that don't boil down to int (the size of the system's CPU architecture) equivalency checks.


Good developers write DRY code, and SQL everywhere leads to, eventually, some library that looks remarkably like... an ORM...

I agree with this.

I think you can look at this two ways: over time you incur technical debt for a variety of reasons (if your app is being used or growing in any way). The article's author thinks it is best to start at the bottom and build your "model layer" up from scratch.

People who use an ORM start from above and almost certainly will have to modify their entire software stack to address things like scaling and unusual features along the way. As long as you know your ORM well and can do the rearchitecting, this really is not a problem.

Basically whichever approach you take a good developer would end up with the app in roughly the same state.

Personally I would prefer to start with an ORM and work down, just because it is a hell of lot quicker.


There are libraries out there that already reinvent the large chunks of functionality ORMs provide without requiring you to use an actual ORM. I think the middle ground is the best way to go here. Avoid ORMs, avoid native SQL, meet somewhere in the middle.
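One shape that middle ground often takes is a plain query builder: programmatic SQL generation with no object mapping. A toy sketch follows (the API is hypothetical, loosely in the spirit of SQL-generation layers like Fey or SQLAlchemy Core):

```python
def select(table, columns, **where):
    """Build a parameterized SELECT; returns (sql, params).

    Toy sketch only: identifiers are not quoted or validated, so table
    and column names must come from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    if where:
        keys = sorted(where)  # deterministic clause order
        clause = " WHERE " + " AND ".join(f"{k} = ?" for k in keys)
        params = tuple(where[k] for k in keys)
    else:
        clause, params = "", ()
    return f"SELECT {cols} FROM {table}{clause}", params

sql, params = select("users", ["id", "name"], active=1)
# sql    == "SELECT id, name FROM users WHERE active = ?"
# params == (1,)
```

You keep full control of the SQL that runs, but the repetitive string assembly and placeholder wiring live in one library instead of being scattered everywhere.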


Like what?


Perl has Fey. http://search.cpan.org/perldoc?Fey

As ORMs go Fey::ORM (built on top of Fey) is good. It's easy to get under the hood and make it do exactly what you want. http://search.cpan.org/perldoc?Fey::ORM::Manual::Intro


You've never seen a large app that didn't use ORM? What about every large business application from the late 70s-early 2000s?

A) I'm not sure why it's exceptionally boring. Solving problems with SQL is interesting and elegant in its own way (like using recursion or functional programming to create a new algorithm).

Lots of people posting here don't know much about the specifics of the database they are using. Why spend thousands on Oracle or SQL Server and then abstract all those features away?

Phil


I'm definitely an old-school code-sql-by-hand guy and I used an ORM (Entity Framework 4.0 Code-First) for the first time for my startup project, a data-intensive online strategy game.

I've found that for many things, it's so much faster in terms of dev time, especially with the super-cool code-first approach, to get things out there and working using the ORM. I can create a new fully-functioning and reliably-working repository class, along with its test double, in around a minute. Really.

Of course, I have found that I've had to replace bits of it with hand-coded SQL for performance reasons. But I've decided to stick this general approach for now because I don't need to substantially rearchitect everything, I can just replace the very few bits that have proven to be an actual performance bottleneck, and keep the development speed for the many places where runtime speed just doesn't matter as much.


I don't mean to start an argument over which ORM you want to go with, but you might want to give this a look, if you're interested in being able to quickly iterate on a data model and still have test coverage: http://wiki.fluentnhibernate.org/Persistence_specification_t....


Cool. Thanks for sharing.


No problem. Do you have any info about your startup? I have been working on a similar idea as a back-burner thing for years but I'm in the process of doing a different startup, so I think it'll stay there.


http://victorsunited.com

It's a turn-based, (largely) asynchronous, strategic conquest game that uses the same ruleset as a classic board game that you probably played when you were a kid.

The idea is to create a social game that you're not embarrassed to play. You know the trend of people getting together in bars to play Settlers of Catan for hours on Friday nights? We're trying to make a game that same people would want to play for a few minutes at a time from their office on Monday morning.

I've got a half-written article about how we're doing some cool continuous deployment stuff that I plan on sharing with HN soon. In the meantime, any and all feedback is welcome.


I was interested in your game (I played Galaxy PBEM some time ago), but you have a few flaws: 1. You assume that everybody has played the board game, which I haven't, so you don't have an easy introduction for a new player. 2. You have a single-player game, but you set up a high barrier to trying it; I don't understand why you require logging in when a cookie would be enough.


Thanks for the feedback.

1. We do have a pretty straightforward tutorial: http://victorsunited.com/help/tutorial

2. We are working on a way to bypass the login/registration requirement for single player games. To be honest, the Singleplayer mode was something of an afterthought, and the SP experience isn't yet as polished as it should be.


Looking forward to the continuous deployment article.


The problem with writing all your SQL "by hand" is that it's needless code duplication.

Set aside databases for a moment. Let's say that instead we're going to generate a bunch of different types of documents, all in the same output format (PostScript or LaTeX or HTML or whatever). Which approach would you take to this problem?

A) For each type of document, manually write out the necessary code from scratch, starting over again each time, or

B) Write some general-purpose libraries which understand the output format and expose an API that you can call programmatically to generate the documents?

I don't think anyone would choose (A) in this case. But many people do choose it, and pride themselves on choosing it, as soon as the use case changes to generating database queries. Why do we suddenly have to throw away everything we know about writing libraries and avoiding duplication, just to pretend that typing out some SQL makes us "real programmers"?

ORM, in essence, is nothing more than this: a convenient API you can call to do database queries. Why should it be so different?


shocking vote up


It's not that SQL is hard, it's that it becomes very redundant very quickly.

That being said, I use ORMs all the time, but have never really been a massive fan. There's always going to be a little bit of mismatch because object != relational.

I find myself attracted to document databases simply because it feels like there's less of a disconnect between objects and documents.

I do remember an ORM called iBATIS that wasn't too bad though. It actually made my classes look cleaner and didn't treat SQL like a red headed stepchild.

These days, working with ARel doesn't make me feel too dirty.


I have worked with ORMs and with raw SQL. In my personal experience, using raw SQL exclusively becomes time-consuming when you need to make the jump between the raw SQL results and objects.

With a well designed ORM, it should be possible to have the best of both worlds. One of the features that I value in Django's builtin ORM is the ability to almost seamlessly drop down to raw SQL when needed:

https://docs.djangoproject.com/en/dev/topics/db/sql/


"using raw SQL exclusively becomes time consuming when you need to make the jump between the raw sql results and objects"

I don't find that to be true. You write the SQL, you've already written your Objects, you write a 2-3 line row -> object(s) mapper (supported by your own utils, or something like Spring). You have clear control and mapping between SQL and Object.
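A sketch of how small that mapper can be, using Python's stdlib sqlite3 (the User type and table are hypothetical):

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:          # the object you've "already written"
    id: int
    name: str

def map_user(row):   # the 2-3 line row -> object mapper
    return User(id=row[0], name=row[1])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")

# Explicit SQL in, plain objects out; the mapping is fully in view.
users = [map_user(r) for r in conn.execute("SELECT id, name FROM users")]
```

The mapping stays explicit and debuggable, at the cost of writing one such function per query shape.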

This said, I like what I've seen and used of the Django ORM so far, but then my application has not been anywhere close to as complex as the "day job" RDBMS.


If you have an entity with 10 fields and 5 nested collections (which may themselves contain more nested objects) and you need to load the full object for editing (and save it afterward), it will be more than '2-3 lines'.


I don't know how I could manage deployments as effectively as I do today without Migrations.

And I don't know how I could ever write DB functions more clever than, say, the ones DataMapper gives me.

And I don't know how anybody could prefer

"Select * from TableOne Inner Join TableTwo using(SomeCol) Inner Join TableThree using(OtherCol) where TableOne.foo = ? and TableOne.bar = ?"

to, say, ORMs at their best:

result = TableOne.findByFooAndBar("foo", "bar");
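For what it's worth, that finder "magic" is shallow; here is a toy Python sketch of how a dynamic findBy-style method can be generated over stdlib sqlite3 (all names are hypothetical, and unlike a real ORM it does no identifier quoting or join handling):

```python
import sqlite3

class Table:
    def __init__(self, conn, name):
        self.conn, self.name = conn, name

    def __getattr__(self, attr):
        # Synthesize find_by_<col>[_and_<col>...] finders on demand.
        if not attr.startswith("find_by_"):
            raise AttributeError(attr)
        cols = attr[len("find_by_"):].split("_and_")
        where = " AND ".join(f"{c} = ?" for c in cols)
        def finder(*values):
            sql = f"SELECT * FROM {self.name} WHERE {where}"
            return self.conn.execute(sql, values).fetchall()
        return finder

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (foo TEXT, bar TEXT)")
conn.execute("INSERT INTO t1 VALUES ('a', 'b')")
rows = Table(conn, "t1").find_by_foo_and_bar("a", "b")
```

The method name itself carries the WHERE clause; real ORMs layer joins, caching, and identity maps on top of the same idea.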


> result = TableOne.findByFooAndBar("foo", "bar");

This doesn't give me the same information as the SQL. It doesn't tell me, for example, that it's doing an inner join on two tables. Since I know the query you're replicating here, I can assume either that it's automatically joining all the tables related to the table I'm calling (in which case, how do I get a result while only joining specific tables, or none?), or, the more likely case, that it's not doing any join at all and is waiting until I actually access the object, at which point it performs the query on the table.

Simply put, the SQL query tells me exactly what is being called. The ORM does not, and hides away what it's doing. Granted, this is your example, but it's also easy to find this same style in ORMs in the wild.

I'm on the fence about ORMs myself. I use one, and enjoy its use. However, it's also annoying because I'm comfortable with SQL and can whip up a query faster than I can wrestle the ORM into doing what I want.


It's about convention over configuration.

Learn a few conventions (not hard) and there you go. In some ORMs, the magic findByX functions will do no joins; in others they will do all joins. And learning which is quite easy.

And almost universally, the findByFooAndBar-type magic methods are not the most commonly used feature, because often you need complex conditionals (let alone complex joins). But when you just need a quick record, it's wonderfully easy.

You can get cozy with an ORM in an afternoon. After that, you've got nothing ahead of you but timesaves. Sure, sometimes the ORM abstraction will break or you'll find yourself fighting it. At that point you use the ORM's connection manager to execute some raw SQL and be on your way.

Your argument, to me, sounds as wrong as if you said: "I don't like MVC frameworks. Without a framework, when I go to www.example.com/path/to/index.php I know exactly what is being called.

But in an MVC framework, www.example.com/path/to could be anything! The routing engine and framework hide away what they're doing."

But I use it because it's not really as opaque as you portray, and it saves me SO. MUCH. TIME.

And once you learn convention, the ORM makes it EASIER to grok any codebase that uses it than it would if they used straight SQL.

Anyway, as always, YMMV, and I respect your fence-riding. I've used some unfortunate ORMs before (CakePHP, I'm looking at you! Adopt Doctrine!) where instead of writing a simple query I was stuck crafting some insane nested dictionary structure.


Do you recommend any particular good (free preferably) ORMs to practice with? Never used one before in my day to day job :(


Really, I'd just find the best one for whatever language you're strongest in.

My absolute favorite, bar none, is Ruby's DataMapper.

In PHP, the best ORM project is certainly Doctrine. Doctrine can be a little overwhelming at first, but it's worth it. However if you found yourself just not pursuing it because of its complexity, I've found that the Kohana framework has a quaint, easy to use ORM. Not a bad start.

In Python, SQLAlchemy is great.

In .Net obviously LINQ.


> In .Net obviously LINQ.

LINQ is not an ORM. There is LINQ to Objects and LINQ to XML, which query collections and XML documents respectively.

The first ORM over Linq from Microsoft was Linq to SQL. This is now deprecated, and the second one is called Entity Framework (EF), which was a more ambitious but slower-moving project. In the latest release, EF 4.1, it's actually quite good.

NHibernate is still very popular, in part because of the confusion over LINQ to SQL and EF, and in part because it was there first and is very mature.

There are a whole lot of other, less-well-known .NET ORMs -- SubSonic, SimpleData... just google ".net orm". But for new work, most people would only consider EF and NHibernate.

There's even a LINQ to NHibernate: http://ayende.com/blog/4083/nhibernate-linq-1-0-released

That and fluent NHibernate take some of the pain out of configuring NHibernate.


.NET also has NHibernate. I'm not intimately familiar with the pros/cons of LINQ to SQL vs NHibernate, but I believe it gives you a lot more control over things like pre-loading associations, etc (while requiring a lot more configuration).


One con of LINQ to SQL is that it only supports MS's databases. LINQ to Entities is the more flexible version, though you have to go whole hog and use Entity Framework. There is also an Active Record implementation for .NET called Castle ActiveRecord.


> One con of LINQ to SQL is that it only supports MS's DBs.

This is not true for Entity Framework (which is essentially the newest iteration of Linq2Sql.) I have successfully used EF with SQLite in production, and it appears to have a MySQL driver as well (I honestly don't know how well the MySQL driver works.)


That is what I meant by LINQ to Entities, which is what I thought they were calling the new version. There are plenty of DB drivers (I think they call them providers) for EF.


I've been using datamapper recently - I like it, but would be interested to hear what makes you like it so much over the others (which I haven't used, except for a smattering of ActiveRecord).


I'm a Doctrine user as well, and I've been using it for years now. But I still stand by what I said. Mostly it's a result of finding that regardless of the ORM being used, knowing SQL is still required for anything serious (or, not knowing SQL can hurt).


I was forced to port an existing in-house web app from Postgres to Oracle. Thanks to ActiveRecord I just made some tweaks to the db config file and didn't have to touch a line of code. Of course, it's not always this easy but a good ORM can buy you a lot of portability.


My conclusion, drawn from the title alone: The term antipattern has apparently jumped the shark.

Spend five minutes decoding a particularly hairy regular expression? Regexps are an antipattern. Someone writes an inefficient SQL query? SQL is an antipattern. Stub your toe on a curb? Curbs are an antipattern.


This is my favorite response. One of the reasons I came to HN from Reddit was that the proportion of populist/echochamber articles and comments to well thought out and insightful articles and comments changed unfavorably imo. It makes me sad to see the trend continue on HN, perhaps it is inevitable.

My frustration with this article is that while it makes some valid points that could well be articulated as things to watch for, it instead takes that to the extreme and generalizes a single person's biased experience to the notion of a more universal truth. Unfortunately the world is much more complicated than these type of articles like to admit.

Yes, SQL is well known and solves some of these problems, but it also introduces other problems that can overwhelm you depending on the environment and the scale. As an example, my last job was at a company in banking and mortgages. The tangle of banking and mortgage laws differs state by state, and in some cases county by county. At this particular company an unusually large percentage of the people knew how to write SQL, so vast amounts of business logic were written in SQL. This was stored variously in stored procedures, DTS steps, SSIS steps, Excel VBA functions, Access functions, etc., etc., ad nauseam. That may not sound like much until you realize that we counted 50 thousand Excel documents on the network stores, 10 thousand Access databases, 10 thousand stored procedures, etc., etc. This was for an IT team of about 20 people. Does this mean SQL is an anti-pattern? No, it just means that tools taken to the extreme and used well beyond their design tend to stop functioning well.

ORMs certainly have their problems and can definitely slow things down. However, almost all of those arguments could be, and were, used to argue why assembly was better than C, why C was better than Java, why Java was better than Ruby... it almost seems like a pattern. ORMs can save you maintainable-code footprint and can reduce complexity if they are used correctly. Of course, that doesn't mean they can unthinkingly be applied in every situation with no design. But the converse is also not true: unthinkingly applying SQL in all situations is just as perilous and fraught with issues.

I suppose it is against human nature, but I wish for the day when people could have reasonable discussions about issues, recognize that other people have some valid points, and possibly learn from each other and not feel the need to over-sensationalize everything.


This isn't just "a single person's biased experience". This is an experience that millions of developers and users have experienced time and time and time and time and time and time and time and time again.

ORMs make for a bad developer experience, a bad user experience, and a thus a bad overall experience. For any serious app using an ORM, this will always be the case.


It seems like the term was flawed from the start. "Anti" does not mean "bad".


No, it means opposite. Patterns are "the right way to do things". So, the term anti-pattern means "not the right way to do things".


Patterns are recurring designs, so anti-patterns might merely be non-recurring designs. This isn't necessarily a value judgment.


The term entered widespread use after the AntiPatterns book was published. The authors were referring to "patterns of failure" commonly seen in software projects.


Nope, the AntiPatterns book defined it as something like "recurring patterns of failure, which look superficially attractive".


Edit: yes, I am extremely well aware of the conventional definition. I was deliberately offering an alternative spin in context to the parent.

Case in point: an ORM can handle 80% of query plans (patterns). Some it can't handle. Those are not patterns; they are the opposite: unique and non-replicable. That doesn't make them bad.

However, I disagree with the headline. Just because an ORM doesn't handle non-pattern situations doesn't make the concept an anti-pattern.


"The whole point of an abstraction is that it is supposed to simplify"

No, it's supposed to abstract. A simplification is supposed to simplify. Often abstractions have the benefit of simplification, but it's not a requirement.

I migrated a project from MySQL to PostgreSQL last summer, and the project was built on Grails with GORM. I had to migrate the data by hand (mostly easy, save for a couple of edge cases like boolean columns), and I had to change the jdbc driver. That was pretty much it. No rewriting of SQL, no changing of escaping logic, etc. I tell a lie - the auto-sequence generation stuff of postgresql wasn't playing nice with some of the GORM identity stuff, and my code had made some assumptions that turned out not to be 100% true. Those likely would have shown up had I written my own stuff rather than relied on GORM, but it was a little bit of a pain to track those down.

All in all, using the ORM abstracted away the need to write against specific database commands and syntax. A byproduct of that was simplification of most use cases of the database, but the key use was abstraction.


I've seen a lot of the problems that ORM creates with big projects. The most egregious is lack of control. You'll run into some problem caused by some quirk of your ORM system and you'll dig down into the SQL and learn precisely what's causing it, but you still won't be able to fix it because you don't know the magic voodoo incantations to change your config or the ORM client code in the right way to fix it.

When ORM starts to get in the way like that it really makes you wonder whether it's worthwhile.


Not true. Any acceptable ORM system (and there are a lot of those nowadays) allows you to drop into pure SQL, plus several levels of abstraction between that and pure object code.

Really, I don't know which library you're using, but I have never seen such an issue with Django, SQLAlchemy, LINQ to SQL, or Yii's ORM.


I would love to hear about an example of this kind of case.

My experience is that ORM queries, used properly, deliver predictable results. If they don't because of a bug in the ORM, can't you just drop down to the native query and move on?


Dropping down to raw SQL is equivalent to rewriting the affected components. This might be trivial with simple CRUD apps but with large, complex systems that can be a big undertaking. Not to mention that if you end up spending a significant amount of time using straight SQL instead of the ORM that draws into question the value of using the ORM at all. Not to mention causing maintenance and development headaches down the road.

Also, the "you can always drop down to SQL" defense of ORMs is extremely weak. It's great that most ORMs are not so intensely broken as to prevent you from not using it, but in general the "you can just stop using it" argument is not a strong one. Imagine if C++ was defended based on the idea that you can just drop down to raw machine code if you run into problems with the language.

All code has bugs. In my experience ORMs don't save you from having to drill all the way down to native SQL to debug. Then you figure out what SQL changes you'd need to fix the bug and from there you work your way back up to how you'd need to change your ORM config or client code to effect the same thing. ORMs are typically some of the most leaky abstractions of any sort. If people spent as much time examining and tweaking machine code as most ORM users do with SQL code people would question the usefulness of their "high level" programming languages.


Why not patch the ORM then? Personally, I suspect most of the issues people experience with ORMs starts with poor data modeling.

Also, I'm not sure what "large, complex systems" means, but it doesn't sound like the use-case for an ORM to begin with. ORM is best suited for small, flexible, simple systems – which ought to be 99% of what people and companies build.

I just don't see the benefit of complaining about how its an "antipattern".


A bad developer can hang himself with even a short length of rope.


So let's not give them any rope :)


Programming is rope, no matter what language you choose. Ever debugged a several thousand line stored procedure in SQL?


Unfortunately yes. It was 9350 lines of string concatenation that called sp_executesql at the end (T-SQL/SQL2005). I had to remove all the injection attacks...


My main experience is with Hibernate, but I could give you a nice long list of examples. The big problem is that there is a pretense of transparency and simplicity that doesn't actually exist.

For example, Hibernate tries to optimise how much it loads and saves. This means it doesn't load foreign keys automatically and it has its own logic for exactly when and how it flushes data.

If you are loading an object it will by default not load the collections of objects that are related to it - that will happen lazily when those properties are accessed. Sounds fine, right? Exactly what a good developer would do by default anyway. Until you pass one of those objects around to a naive code module that knows nothing of the database and it tries to access the property - which looks like a nice in-memory collection to them - and it now executes an expensive query on the database that you weren't expecting. In fact, the naive code module might get a whole list of these things to work with and it might sequentially query each one. The best case here is that it's thousands of times more expensive than it should be, but the worst case is far worse - the database transaction might have ended or have been rolled back (which generates a giant exception+stack trace), or the rows might have been locked (which might just make things slow but could result in deadlocks since this naive code doesn't know anything about the database).
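The failure mode described above (often called the N+1 query problem) is easy to reproduce in miniature; here's a Python sketch with stdlib sqlite3 and a hypothetical Order/lines schema, counting the queries a lazy "collection" property fires:

```python
import sqlite3

queries = []  # stands in for the database round-trips Hibernate would make

class Order:
    def __init__(self, conn, order_id):
        self.conn, self.order_id = conn, order_id

    @property
    def lines(self):
        # Looks like an in-memory collection to naive calling code,
        # but every access runs a fresh query against the database.
        sql = "SELECT item FROM lines WHERE order_id = ?"
        queries.append(sql)
        return [r[0] for r in self.conn.execute(sql, (self.order_id,))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lines (order_id INTEGER, item TEXT)")
conn.executemany("INSERT INTO lines VALUES (?, ?)",
                 [(i, f"item-{i}") for i in range(100)])

orders = [Order(conn, i) for i in range(100)]
for order in orders:        # the "naive code module" iterating its list
    _ = order.lines         # 100 lazy loads: 100 queries instead of 1
```

A single join would have fetched all of this in one round trip; the lazy property hides that cost from the caller entirely.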

The way I see it in the end is that ORMs try to hide some of the most important parts of your software design from plain sight. You feel like they are hiding trivial stuff that is boring and you don't want to waste time on. But in the end that stuff is absolutely critical to your design and some of the most core stuff you should be thinking about. Having it all hidden away makes you not think about it at all until its too late and you've built a whole giant application on shaky ground.

After being through this enough times I am now expert enough to confidently build applications using Hibernate and save time over plain SQL. But the body of knowledge I had to accumulate just about Hibernate is equal to all the knowledge I ever had to learn about plain old relational databases. These days, if I start a new project using an ORM I try to minimize its use down to just being a convenient utility for loading rows and map them into an object's properties.

The one thing I will say in their defense is that when time is your most critical factor - eg: you're a startup with 3 months of runway and your end product is going to be mainly demo ware to get your next round of funding - an ORM like ActiveRecord, etc. is absolutely the right thing.


It's mainly due to the fact that MORE abstraction means MORE complexity. I'm currently dealing with an NHibernate mess of over 1,000 domain objects (!) and it's an absolute nightmare on the performance and maintainability fronts.


To be fair, 1,000 (or however many you need) stored procedures probably wouldn't be all that much better. I've seen systems taking both approaches and I've seen both go right and wrong. It has more to do with the team than the technology, I think.


We replaced 15,000-odd stored procedures with NHibernate. It's the same turd in a different coat.


If more abstraction is leading to more complexity, then something is very wrong.


Have you seen a Spring/Hibernate Java app go pop on top of Glassfish? The STACK TRACE takes more memory than the address space of an 8-bit processor.

That's TOO MUCH abstraction demonstrated.


It's not worthwhile if you don't understand what the function of the ORM code is. ORM might be an abstraction, but it doesn't mean it doesn't take some elbow grease to learn. Like anything worthwhile, once you get over the learning curve hump, it becomes a tool that you enjoy using precisely because you understand it.

Even when you have to sometimes write SQL inside your ORM!


It sounds like you didn't bother learning how to use the optimization features of your ORM.

Almost all of them have them... plan on using them next time at that stage of the project; it will turn out fine.


What is it about coders and blogs that brings out the "cranky old man" vibe?

An ORM is an insanely convenient way for newbies to use various data stores while avoiding learning umpteen different query languages. The teaching value of ActiveRecord for newbies is hard to overstate. It's also a damn nice way to move your application closer to platform independence – a valuable thing in today's PaaS-integrated stacks.

If you seek efficiency and performance, don't use an ORM. Lick the freezing cold metal, if you want. Nothing is stopping you from doing what you like!

(Also, I haven't dropped down to SQL since Rails 3.x and meta_where. Yes, I realize that my applications "won't scale". They are appropriately scaled for their intended purposes.)


It's not just for newbies. I'm competent with SQL, but that doesn't mean I enjoy writing query after query. We use Django and utilize the ORM... we jump down into SQL in the (rare) case that we really need to.

I'm sure at higher scale (we're mid-level at this point) that we might be hand tuning more and more, but that hasn't happened yet.


> An ORM is an insanely convenient way for newbies to use various data stores while avoid learning umpteen different query languages

Let's count these umpteen different query languages: (1) SQL, (2) ... ?

> If you seek efficiency and performance, don't use an ORM.

If you don't need efficiency and performance, why use a DBMS at all? Isn't the whole point of a DBMS to make data access not only secure but also efficient (B-tree indexes, etc.)?


The variants of SQL are different between different databases.

Even the data-types supported are different.

It's not uncommon for consumer and enterprise products to have to work with different databases. In the case of a consumer product, it'd be a client having only one database type installed and requiring that your software use it. You benefit by writing in an ORM because you can have the same codebase for multiple db installs.

With enterprise customers it'll be having multiple databases installed at once, and having your code interface between all of them. You benefit with the ORM by not having to remember and handcode all of the query differences.


Age.


And experience, to be fair. I think these guys need to get in touch with the Zen concept of "beginner's mind".


One aspect that always comes up is the "inefficiency" of doing a select * from a table with 30 columns when you only need 4 columns. 99% of the time the millisecond performance difference doesn't matter, and if it does, there is a standard, non-default way to handle it in most ORMs.

However, one aspect that is usually conspicuously absent from anti-ORM blog posts is that of development time and cost. ORM usage practically guarantees some known inefficiencies in the generated code, but it lets you implement and pivot really quickly; the time and money saved are easily more than enough to pay for a bump in hardware to overcome the 10% slower code. But to do so is heresy for these people... selecting columns from the database that you do not use is just not done, full stop. Which is cheaper, in dollars, is irrelevant.


This surprised me as well. I was told that, in Postgres at least, there is no difference in query speed between selecting a subset of a record and the entire thing.


Short answer: Any relational database is going to read in the entire row when you select it. You will save network traffic and probably space in some internal buffers only.

Longer answer: Any relational database will read the entire page the row resides on when you select a row.

This means that tables with more columns fit fewer rows per page, so their pages will need to be read/buffered more frequently per row.

This also means there is an exception: columns that are stored in overflow pages away from the rest of the row like blob or text columns or (in some db's) very long varchars may not necessarily be read if you don't select them.

There is an additional, very important exception: if you use a covering index then the db will not need to read the data page the row resides on. For example, if you have an index on (username, user_id) and you select "select user_id from table where username=xxxxx" then it will be able to read the user_id from the leaf node of the index and no bookmark lookup to the data pages will be needed. In some db's the primary key is always "covered" and you never need a bookmark lookup to get it.
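The covering-index behavior described above is easy to observe directly. A minimal sketch using Python's built-in sqlite3 (table and index names are invented for illustration); the query plan shows the lookup is satisfied from the index alone, with no bookmark lookup to the data pages:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, username TEXT, bio TEXT)")
conn.execute("CREATE INDEX ix_name_id ON users (username, user_id)")

# With an index on (username, user_id), this query can be answered from the
# index leaves alone: no read of the data page holding the full row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT user_id FROM users WHERE username = ?",
    ("alice",),
).fetchall()
print(plan[0][-1])  # e.g. "SEARCH users USING COVERING INDEX ix_name_id (username=?)"
```

Add `bio` to the select list and the plan loses the "COVERING" qualifier, because the row itself now has to be fetched.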


It depends. In MySQL, if the only columns you select are indexed columns, the entire query will be pulled from the index, which is usually entirely in-memory, so enormously faster:

http://dev.mysql.com/doc/refman/5.5/en/where-optimizations.h...

(The columns currently all have to be numeric for this to be true, but that's surprisingly often the case)


The query speed may be the same (I don't know if this is true or not), but fetching the data from disk and returning it over a network will certainly be slower, especially if the unnecessary columns contain large strings or blobs.


Forget the speed of the database. What happens when you need to transmit the returned data across a network? Would you rather transmit 32 bytes per row, or 10kB?
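The arithmetic is worth spelling out; a back-of-envelope sketch using the figures above (row sizes are the comment's hypothetical numbers):

```python
rows = 100_000                 # a modest result set
narrow_bytes = 32 * rows       # the 4 columns you actually need
wide_bytes = 10 * 1024 * rows  # 10 kB per row, blobs and all

extra_mb = (wide_bytes - narrow_bytes) / (1024 * 1024)
print(round(extra_mb))  # about 974 MB of needless transfer
```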


Or, like I had happen recently, 10GB?

Of course, that comes down to the scaling problem. The application I'm currently working on can't use ORMs (though we have rolled a few of our own read-only ones), because of 1) the size of the data we're going through, and 2) the lack of support for splitting reads and read/writes to different connections.


(1) is definitely a problem I've seen. SQLAlchemy for example takes forever to initialize objects representing even a single row. Fine if you have less than 100 tables with less than 100 rows each and almost never join, but start scaling, and you run into issues.


"forever to initialize a single row" is obviously some pretty severe hyperbole, but the slowness of loading rows into fully formed objects is largely a product of CPython's sluggishness in creating new objects. This also has nothing to do with how many rows are in the table, only how many rows you happen to be fetching over the wire for a particular query.

Performance is also something SQLAlchemy is obsessive about, so the speed of loading objects in SQLA has increased by orders of magnitude over the course of 0.4, 0.5, 0.6, 0.7. If your opinions are based on earlier versions, your impressions may be out of date.


Sorry to be vague. In my benchmarks it takes around 30 ms to initialize an instance of a model as a result of a query using Python 2.6 and sqlalchemy 0.6. This is in contrast to raw MySQLdb query initialized as a dict which takes 2-4 ms.


MySQLdb is written in C. So you're comparing the speed of raw C code to pure Python (which runs on top of that C code). The vast difference in speed between pure C and Python has little to do with SQLAlchemy which I can assure you generates those objects as fast as possible while still maintaining proper behavior.


MySQLdb is written in Python. _mysql is a Python extension wrapped around libmysqlclient. _mysql is about twice as fast as MySQLdb BTW.

Nevermind all that. My point is that if you want fast code, write it in raw SQL.


not really sure what you mean. MySQLdb, i.e. this project: http://mysql-python.sourceforge.net/FAQ.html ? When you ask this library to execute a query and return rows, the network communication with the MySQL server is done with C code, i.e. with _mysql . It doesn't make sense to say "_mysql is twice as fast as MySQLdb" - "_mysql" is an intrinsic component of it. The Python portion of MySQLdb does very little.

Here is from the FAQ included with the project:

"MySQLdb is a thin Python wrapper around ``_mysql`` which makes it compatible with the Python DB API interface (version 2). In reality, a fair amount of the code which implements the API is in ``_mysql`` for the sake of efficiency."

In fact, if you do want a MySQLdb that's pure Python, there's an entirely separate project for that called pymysql: http://code.google.com/p/pymysql/. This library will be a lot slower than MySQLdb.


I think that would depend on the data type and sizes. A single row in a PG table cannot take up more than page size (8kB). If you have a 100kB TEXT field the row will instead contain a reference to an "overflow" table rather than the data itself. See http://developer.postgresql.org/pgdocs/postgres/storage-toas... -- presumably the corresponding toast data is not looked up unless needed (and you can check for e.g IS NULL in the main row)

(looking over my database, less than 1% of the 30-40GB size is in overflow tables so unless you routinely store large amounts of data it won't affect you).

(I used to think the "toast" tables were temporary tables and wondered when Postgres would delete them).


Well, it depends on whether your columns are blobs or not; it makes a difference (well, at least it used to).

On the other hand, I don't know which ORM you use, but they don't all behave like this, if you tell them you only want columns Name, Age, DOB, they'll only get those columns.


If your database server is on another machine you are now transmitting 30 columns of data over the network instead of 4.


Correct. And for most applications running on most networks, without access to programmatic benchmarking tools, you wouldn't be able to tell one app from the other.

Everyone likes to think they're writing the next Google, Facebook, etc., but the reality is far more humble and boring.


Labeling an ORM as an anti-pattern is throwing the baby out with the bathwater. Sure, you will encounter some cases in which your ORM will be a pain in the ass or even actively work against you, but most good ORMs will allow you to talk to the database directly.

For example, both Hibernate and ActiveRecord allow you to just throw straight sql to your database, returning a bunch of key value data.

Which is exactly what a good solution does: provide large gains for the common cases, and get out of the way for the edge case.


> If your data is objects, stop using a relational database.

What does that even mean?

My data is, well, data. Tables and rows are just ways to represent my data, as are the nested hash and array structures of document storage systems. Oh, and tables and rows are also objects.

What data is "object" and what data is "non-object"?


Yes, I could have clarified that. Here's an attempt:

"Relational" data is data whose value stems from its relationships with other data. For instance, if it is statistical data that is viewed in aggregate rather than as individual rows. Or if you need to answer the question "how many rows look like this row?"

"Object" data is data that is useful in and of itself, and is largely self contained. A pretty good example is a blog post: each post has a bunch of metadata, including possibly a string of comments. But you seldom if ever run queries across batches of blog posts (other than indexing them by date).

It's always bugged me that blog entries -- the staple of the ORM tutorial -- have little to no relational value, which is why they work so well in ORM.


Splitting blog posts by date is hardly something that's rare. See also: all blog posts by tag, all posts by author, all posts containing the word "javascript", latest comments by a particular user, etc.

Well-organised blogs have a reasonable amount of relational data.


Embedding strings of one language in a second language is an anti-pattern.

I've been at a bunch of NYC dev events recently, and people at both Goruco and Percona Live were hating on ORMs. ORMs have gotten really good in the last few years, I think the haters just haven't been using them.

Show developers a good alternative and they will go there. Some of the basic points made in this article ring true, but the suggested alternatives are weak. ARel is a great start to a non-ORM database wrapper in Ruby! Somebody just needs to go there.


Somebody may have already mentioned this, but there's a fantastic essay on this subject, The Vietnam of Computer Science (2006). It's long but so, so worth it.

http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Compute...


My first reflex was to write a comment mentioning that essay and then I found this. Wish I could upvote you more...


Wow, thanks for contributing this to the thread. I had not seen this before. That has to be the best explanation of the tradeoffs/benefits of ORM I've seen anywhere. Most of the content isn't "new" in the sense that if you've done both a lot of SQL and a lot of ORM work, the issues are pretty obvious. But it does a fantastic job of showing why there's just no simple answer currently.


I'd be interested in the author's take on ActiveRecord's implementation of Relational Algebra with ARel[1]:

> To manipulate a SQL query we must manipulate a string. There is a string algebra, but its operations are things like substring, concatenation, substitution, and so forth–not so useful. In the Relational Algebra, there are no queries per se; everything is either a relation or an operation on a relation. Connect the dots and with the algebra we get something like “everything is named_scope” for free.

Also, if I couldn't use something like ActiveRecord in my Rails apps, I'd end up re-writing most of its functionality in my model code somewhere. If I don't get Model#find_by_some_attribute() for free, then I have to spend time writing it.

[1] http://magicscalingsprinkles.wordpress.com/2010/01/28/why-i-...


I like the iBatis (now renamed MyBatis) approach: explicit queries in a separate file that say "input an object of this type reading these fields, output an object of that type setting those fields" and contain raw SQL to be thus parameterized.

This avoids the two largest flaws: live proxies with hidden state pretending to be simple data objects, and SQL being generated with no control. It also avoids a mistake I've only seen two ORMs make but they're common ones: defining its own dialect of not-quite-SQL.

You still get objects mapped in and out of DB queries, it saves you the pointless grunt work of "copy A, put it in B" and it prevents the as-bad-as-ORM anti pattern of "SQL scattered throughout your code".
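For the unfamiliar, a MyBatis mapper entry looks roughly like this (the class and query names here are invented): the SQL is written verbatim in the mapper file, and the framework only binds the input parameter and maps the result columns onto the fields of the declared result type.

```xml
<!-- UserMapper.xml: the raw SQL lives here, not scattered through the code -->
<mapper namespace="com.example.UserMapper">
  <select id="findByUsername" parameterType="string"
          resultType="com.example.User">
    SELECT user_id, username, created_at
    FROM users
    WHERE username = #{username}
  </select>
</mapper>
```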


I think iBatis is great for querying data out of the db and for ETL operations, especially when I'm using a lot of database features. However, IMHO, Hibernate is better for CRUD ops on individual records because it takes care of dirty flagging and managing relationships.


In my experience Hibernate fails badly at dirty flagging and saves every field in the object rather than updating only the changed properties (ActiveRecord is better). I suspect that's a design decision, to avoid an object in cache being partially stale relative to the DB. But it's a problematic solution to something that didn't need to be a problem.

Hibernate does manage relationships - but that is a misfeature. It's doing the wrong thing well. The right thing is not to try to model relations as objects, but to model queries as methods. The relationships exist only in the database - they are not duplicated into the data objects.


"The relationships exist only in the database" - Entity Person contains a collection of ContactInfo entities. In the DB we have tables PERSON and CONTACT_INFO, with a foreign key from CONTACT_INFO to PERSON. What is the difference between the relations in code (between entities) and those in the DB (between tables), in this example?


I wonder if you understood what I was saying? A tool like "mybatis" just reads and creates unproxied POJOs. The relationships are in the DB and are expressed in the SQL. The result objects are just results. With this type of Data access, there is no duplication between the DB's relationships and the ORM's relationships which shadow them.


"In the long term has more bad consequences than good ones."

I would hypothesize that one of the long term good consequences of healthy ORM options is the existence of the vast majority of database backed applications we all know and love. Sure, when/if they got popular someone had to tune some SQL, but how many of those projects would have even been started without ActiveRecord or Hibernate or EJB3 or CoreData?

The opinion that "ORM is an anti-pattern" is ridiculous nonsense.


The main problem with ORMs is that they're trying to work around non-object-oriented data stores. Layers of abstractions, and ORMs in particular, are generally good things—but they can't do magic when it comes to dealing with SQL.

If you're going to be using an ORM, I'd strongly recommend rethinking your data store. Object databases such as MongoDB are a perfect fit, but even a key-value store like Cassandra would be a much better option than SQL. I think it's interesting to note that Core Data, Cocoa's ORM, is one of the fastest data stores out there. It uses SQLite, but defines its own schemas. I believe it'll also let you store pure binary data.


Small nitpick: If you're using a non-relational database, the library you use to connect to it is not an object-relational mapping layer.


You have a point! I guess I'm bastardizing the definition of an ORM. What I mean by it was a library that automatically maps the data layer to objects in your application code, and (often) gives you tools to work with these objects.

I wrote an "ORM" for MongoDB which adds functionality such as transparent relationships.[1] Basically, even though it's not a relational db, it lets you do things like this:

    foreach ($author->books as $book)
    //                ^         ^ this is a Book object
    //                ^ this is an iterator, it loads a
    //                  Book object lazily every iteration.
        echo $book->author->name;
    //              ^ this is an Author object, auto-
    //                matically & lazily loaded & cached.

[1]: https://github.com/kballenegger/MongoModel

Among other many cool features. The point of highlighting that though, was to illustrate that giving up traditional RDBMS doesn't mean giving up on awesome relationships. The only thing missing is subqueries—but honestly, I don't think that's a very big loss.


Yes, Core Data supports storing raw binary data. IIRC it also uses caching (for SQLite stores), and lazily loads related objects as needed. You can customize this behavior to optimize its memory usage for your code.


This seems incredibly naive.

ORMs reduce code duplication. They speed up development, especially when you're treating the underlying data storage as a "dumb" datastore that could just as easily be sqlite or H2 as MySQL or Postgres.

As for ORMs having some sort of negative impact on the queries sent to the underlying database - it really depends on what ORM you use but any ORM I've used had support for pre-loading relationships in advance when required, removing that N+1 problem.

I also want to add that I wrote an ORM for the first company I worked for, and when it was finished it was a drop-in replacement for 90% of the queries in our application - and I mean that literally: the SQL generated by the ORM was exactly the same as the SQL being replaced. The queries that it couldn't replace (mainly reporting queries) already had an aggressively tuned caching layer in front of them anyway, because they were so hairy.

But the real point is this: the performance of the ORM didn't really matter because we were a database driven website that needed to scale - so we had layers upon layers of caching to deal with that issue.

And that is an extremely important point - the way ORMs generalise a lot of queries (every query for an object is always the same no matter what columns you really need) lends itself to extremely good cache performance. Take the query cache of MySQL for example - it stores result sets in a sort of LRU. If you make n queries for the same row in a DB but select different columns each time - you store the same "entity" n times in the query cache. Depending on how big n is, that can cause much worse cache hit performance than simply storing one representation of that entity and letting all n use cases use the attributes they need.

Now, relying on MySQL's query cache for anything would not be smart, but replace it with memcached or Redis or whatever memory-at-a-premium cache and the same point stands. Another example to drive the point home is a result set where you join the result entities to the user query so that you can get all the results back in a single query. In theory this is a great way to reduce the number of queries sent to your DB, but if you have caching there are many times where you could have very low cache hit ratios for user queries, since they tend to be unique (for example, they use a user id), but where you could still get great cache hit performance if certain entities appear often across all those result sets - by leaving out the join and doing N+1 fetches instead.
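The point about canonical entity queries caching well can be sketched in a few lines. A toy stdlib-only cache (in practice this would be memcached or Redis; names and schema are invented): because every call site fetches the entity with the same canonical query, they all share one cache entry instead of n column-subset variants of the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'a@example.com')")

cache = {}
hits = 0

def get_user(user_id):
    """One canonical query per entity: whatever columns a call site actually
    needs, it shares this single cache entry."""
    global hits
    key = ("users", user_id)
    if key in cache:
        hits += 1
        return cache[key]
    row = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
    cache[key] = dict(row)
    return cache[key]

get_user(1)["name"]   # cache miss: hits the database once
get_user(1)["email"]  # cache hit: different columns, same cached entity
print(hits)           # 1
```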

ORMs prevent you from scaling as much as using Python or Ruby over C does.

So I guess that leaves the point about leaky or broken abstractions. Well I would never claim that you can abstract across a whole bunch of databases anyway, I think that's a ridiculous claim that most ORMs make. These types of abstractions when people try to hide the underlying technology are really just a lowest-common-denominator of all the feature sets. So if you chose some technology because you really wanted a differentiating feature then most likely you will find yourself working against such abstractions. Interestingly enough, the dire support for cross-database queries which are perfectly legal in MySQL but not in other vendors is the reason I had to roll my own ORM. But the productivity and maintainability benefits were well worth it.

So yeah I guess what I'm saying is: premature optimization is the root of all evil, there are no silver bullets and performance and scalability is about measuring and optimising where needed. And finally: ORMs are not an anti-pattern.


An IdentityMap would resolve the issue with your second-layer cache, allowing you to select only the columns necessary, which can be a very significant win.

Ruby O/RMs for example generally take a big hit from loading date/time-columns. If you don't need the columns most of the time, then why load them?

Many (not all) databases will take a significant hit when loading TEXT columns. Again, most of the time you probably don't need them, and could be better served with a 'summary' column.

Just nit-picking really. Though I do generally try to think of caching as a last resort; too often it feels like a hammer for compromised design.
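The IdentityMap mentioned above is essentially a per-session dictionary keyed by primary key, guaranteeing one in-memory object per row however it was loaded. A toy sketch, not any particular ORM's API:

```python
class IdentityMap:
    """One in-memory object per (table, primary key), however it was loaded."""

    def __init__(self):
        self._objects = {}

    def get_or_load(self, table, pk, loader):
        key = (table, pk)
        if key not in self._objects:
            self._objects[key] = loader(pk)
        return self._objects[key]

session = IdentityMap()
# Two lookups - even via different queries selecting different columns -
# resolve to the same object, so a second-layer cache stays consistent.
a = session.get_or_load("users", 1, lambda pk: {"id": pk, "name": "alice"})
b = session.get_or_load("users", 1, lambda pk: {"id": pk})
print(a is b)  # True
```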


Calling the author naive is, I think, uncharitable. I've also written my own ORM layer, used for large-scale, high-performance enterprise applications, and I largely agree with the original author's post, so I certainly don't think it's naive. It sounds to me like the thoughts of someone who's run into some real problems in real situations.

I don't disagree with the points you've made around caching, but I do think you're simplifying the problem a bit. Not all performance tuning in DB-intensive applications is around caching, and it often involves query tuning, indexing, and traditional DB-level stuff.

A large part of the abstraction leak around ORMs is around both the caching and that DB-level performance tuning. You have to understand what code is going to generate what queries so that, at the very least, you can tune them by adding in the appropriate indexes in the database. All of a sudden, you're living in SQL land, examining query plans, etc. But if you decide that the change you need to make is to the SQL itself, the ORM layer suddenly gets in your way: you either have to bypass the ORM layer to drop into raw SQL, which at worst is hard to do and at best tends to massively reduce the value proposition of the ORM framework, or you have to try to tweak your code to get it to generate the query that you want, which is often frustrating and far more difficult than just writing the SQL yourself. I don't think I'm that much in the minority of having an experience like, "Hmm, the query I really need to write needs to use an ORDER BY statement that includes a few case clauses . . . now how do I convince this query-generation framework to spit that out so that I don't have to pull back all the results and do the sorting in memory?" It's also worth mentioning that caching doesn't help tune writes, so if scaling your product requires scaling writes, you're probably going to be mucking around in SQL land.

There's a similar problem around query-generation layers that attempt to allow you to just write normal methods and have things executed on the database; because the code is so far removed from the SQL, it makes it really, really easy to write really terribly-performing queries or to write things that will do hugely unnecessary amounts of work.

On a more trivial point, the fetching all columns when you only need a subset of them problem is really an issue sometimes, especially if you A) have to join across a bunch of tables, B) the columns that you want could be retrieved from indexes, rather than requiring actual row reads, or C) the columns that you care about are several removes away from the original search table, but the ORM layer loads everything in between. (For example, Foo->Bar->Baz, my WHERE clause is on Foo, but the only columns I care about are the id on Foo, which is in the index, and a few columns on Baz . . . how do I tell my ORM layer to load nothing else from Foo and nothing at all from Bar? It's a different problem than pre-fetching, because I just don't want anything loaded.)

Now, that's not to say that ORM layers can't be made to perform; of course they can, pretty much all of them have the sorts of hooks you describe, and there's plenty of empirical evidence to that effect. But sometimes the way you make them perform is by just bypassing them.

There's another abstraction point, which is that supporting multiple databases often leads to a least-common-denominator functionality approach; for example, if you want to use a db-specific spatial data type, the ORM has to either provide db-specific functionality, or it might just not support handling that type of data well. The same often comes to things like db-specific functions or query hints; if the ORM layer doesn't handle those things for you, you have to bypass it and drop into raw SQL if you need them.

So really, the argument is not, "ORM's are not functional and no one should use them," it's related to the value proposition of an ORM layer. The value proposition is "This tool will make your life easier, will save you from having to write SQL, and will help you work across multiple databases." If the tool makes life harder than it otherwise would be, then it's not useful, even if it's still possible to do work in it.

So the question is largely around whether or not they make life easier or not. In the simple case, I think the answer is that yes, they do: they make it easier for beginners to get off the ground, they make it easy to do simple queries and writes, and the performance probably doesn't matter anyway.

When things get more complicated, though, the question becomes a lot less clear. Yes, the ORM layer makes it easier to have structured queries that can be cached . . . it also makes it harder to have one-off queries that can be tuned easily based on exactly what data is needed and tweaked to convince the database to generate the right query plan, and it makes it much harder to look at some DB stats, identify a poorly-performing query, and then map it back to the code that generated that query. I know of applications that have basically had to bypass ActiveRecord more and more as they scaled to just do raw SQL queries because making ActiveRecord perform was simply too hard or not possible.

So personally, I prefer an ORM approach that does minimum stuff to let me do the simple things simply (pull rows back and map them to an object, execute simple queries directly on that table), but that's designed from the ground up with the idea that dropping straight into SQL is a normal, accepted part of the workflow, rather than some one-off thing that you should rarely do. But it really depends on your project and your comfort level with SQL.


> Yes, the ORM layer makes it easier to have structured queries that can be cached . . . it also makes it harder to have one-off queries that can be tuned easily based on exactly what data is needed and tweaked to convince the database to generate the right query plan, and it makes it much harder to look at some DB stats, identify a poorly-performing query, and then map it back to the code that generated that query.

The problem I have with your post is that you are repeatedly conflating the high-level idea of an ORM with the (seemingly) Spartan implementations that you have used.

Here is an ORM that provides support for automatically profiling the performance of queries over a period of time:

http://squeryl.org/performance-profiling.html

Sample output:

http://squeryl.org/profileOfH2Tests.html

I cannot begin to tell you how much time this has saved me when optimising performance of my webapp.

Note that that particular ORM also has the advantage of having type safe queries - i.e. it can tell at compile time if there's a syntax error in your query (subject to bugs in the ORM :)) - even in dynamically generated queries. In practice this is a fantastic feature as it is so much safer than building up SQL queries with string manipulation and dealing with multiple code paths that depend on user input. The test paths alone in such code (even if you have a "query builder" layer) are the stuff of nightmares.

There are many features missing from Squeryl though that I've had in other ORMs because it makes different trade-offs. But this is what you do when you choose a library, and it's important to understand what trade-offs you're making upfront... otherwise you might find yourself writing off an entire approach to software development as an anti-pattern because you picked the wrong library.


I think you're missing my point on the performance side; yes an ORM layer can help you identify slow queries, but it's pretty much the database query plan that will tell you why it's slow. Is it doing a full table scan instead of using an index? Is it applying joins in a sub-optimal order? Are the statistics just off, which causes it to use a bad query plan? At that point you're already in SQL/DBA land, but now you have to map that knowledge back to the ORM layer to fix things.

My experience with type-safe query layers is that they tend to be incomplete; they simply don't let you generate the full range of SQL queries because you're restricted by the language's type system. That said, I'm not particularly familiar with Squeryl (and Scala's type system is certainly more expressive than most statically-typed languages), so I can't say what its limitations are; I can only make general statements.

Anyway, I think it's fair to say it's difficult to talk about ORM generally due to the differences between frameworks and approaches. So I'll try to phrase things more clearly, and say that I think the author's original intent, and the part I agree with, is the fundamental premise that ORM abstractions are inherently leaky and that performance needs often result in a desire to go around the ORM framework to handle something more natively in SQL. Some ORM frameworks embrace those limitations, and allow you to use them when you want to and to work around them when you don't; other frameworks fight that limitation and attempt to swallow the world such that you never have to leave the ORM framework, and those frameworks tend to be the ones that become frustrating to work with.

So if I were to attempt to charitably read the original post, I'd say that perhaps saying it's an "antipattern" is taking it too far, but saying that it's a fundamentally flawed, leaky abstraction is totally accurate, and that recognizing that it's fundamentally leaky means that you, as a developer, should probably take that into account in your application design and your library selection, and that there are some techniques that might help you to do that.


In this discussion, everyone seem to broadly assume that it's an all-or-nothing affair.

More precisely, it's a cost-benefit decision. If most of the queries you will make are hampered by the ORM, then by all means, don't use one. But if, as in many (most?) situations, an ORM greatly abstracts and eases design and development for 90%+ of your operations and you have maybe 10% of queries to be either tuned or handwritten (even if they have to be handwritten against multiple database types to preserve portability), then an ORM is a net benefit.

Compare it to inline asm in C, or C modules in Python: the higher-level stuff makes it efficient to work with top-level concepts 90% of the time, but sometimes you have to go down a level to be actually efficient, or even simply to be able to do something at all, even if that means losing some form of independence (which would then mandate writing the same function for a different platform if you want to preserve portability). Not only is it not an anti-pattern, by no means does it mean that the abstraction is fundamentally flawed.

The very idea that "going around" an ORM is somehow proving that ORMs are flawed is simply wrong. There are problems that ORMs are built to solve, and there are problems they can't ever solve. "Going around" is part of the deal because it's not a "work around", it's a "work together".

This is very visible in the article, especially the moment the author states that "I claim that the abstraction of ORM breaks down not for 20% of projects, but close to 100% of them". Indeed this is the case, but for close to 100% of those close-to-100%-projects where it "fails", the ORM is still helpful for managing 90% of the data access implementation. The 10% remainder may need to be implemented at a lower level, but wouldn't it be silly to spend a lot of time hand-writing the 90% of code that gets used 10% of the time? This is what ORMs are about, and it frees up a lot of time to develop the 10% of code that is critical both in usage volume and in performance. Of course such ratios are highly project-dependent, and this is what warrants a thoughtful analysis to select the right tool for each task, of which there can be multiple in a single project, or even _object_. ORM, just as NoSQL, is simply not the end-all be-all solution, yet that does not make it any less valid a pattern.

(edit: cosmetic/typo)


  Caching can only go so far . . . eventually you are going to need to refactor or tweak schemas and adjust queries. Especially if you are dealing with user-centric / write-heavy data, although yes, cache-on-write can also reduce some strain.

 Personally I would take hand-rolled DAOs over magically generated ORM/AR logic any day. The upfront cost is marginally higher, but modifying, adjusting, and logically grouping data is much more straightforward.


That's what I do. I have a DB class in PHP that makes most queries easy (much easier than PEAR or PDO), and all DAOs are similar to each other, so it's a copy-paste-modify job to add a new one.

Then I have static methods to do the special queries that the post author talks about, which may return direct result sets, an integer, generated objects with fewer columns, or large object trees depending on the case.

ORM would be faster (developer time) for a prototype site, but do-it-yourself DAOs work just fine.


treating the underlying data storage as a "dumb" datastore that could just as easily be sqlite or H2 as MySQL or Postgres

If you're doing that then you've missed the point of a database and might as well be using flatfiles.


One of the problems I frequently see is that people complaining about ORM and SQL are thinking mostly of some object wrapping a row (or set of rows) in a table. Then they get into trouble when they want to wrap something more complex involving joins between tables.

All these problems would disappear if people used database views. Then their nice ORM layer (say ActiveRecord) would work perfectly and the nasty joining and updating would be taken care of by the database. I've often wondered if people even realize that database views exist and how powerful they are: http://en.wikipedia.org/wiki/View_(database)

Of course, it's only relatively recently that MySQL has started supporting views properly (in 5.0).

The other nice thing about views is that it means your code using the ORM is simplified because you aren't indirecting through different objects to get at specific values you need to display. It also means that only the necessary data is retrieved from the database.
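A minimal sketch of the idea, using SQLite and made-up table names: the join lives in the database as a view, and the application (or an ORM mapping the view as if it were a plain table) issues only simple single-"table" SELECTs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    -- the nasty join is defined once, in the database
    CREATE VIEW posts_with_author AS
        SELECT p.id, p.title, a.name AS author_name
        FROM posts p JOIN authors a ON a.id = p.author_id;
""")
conn.execute("INSERT INTO authors VALUES (1, 'Ada')")
conn.execute("INSERT INTO posts VALUES (10, 1, 'Hello')")

# the application only sees a flat "table" with exactly the needed columns
rows = conn.execute("SELECT title, author_name FROM posts_with_author").fetchall()
print(rows)  # [('Hello', 'Ada')]
```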


That is the other problem. Every database is a mash of semi-intentional subtle incompatibility with the standard and a host of non-standard features. When an ORM comes along, many try to expose the non-standard features that make sense in some way, but end up needing a custom solution for each DB (see how you do an auto-sequence ID for an entity bean in Postgres vs. MSSQL).

So you have a semi portable layer interfacing with a semi portable environment.


The relational-object problem is called Object-Relational impedance mismatch, duh!(http://en.wikipedia.org/wiki/Object-relational_impedance_mis...)

For me, fast web application development is worth the tradeoff. I usually begin to hate SQL by the second month of a project.


I love SQL. What I hate is the tedium of turning the results into something useable. ORMs eliminate that pretty handily.

As they say, "If it didn't exist, I'd have to invent it."


This is a person who doesn't write many large-scale systems:

You will have considerably more (sometimes serious) bugs if you write all your SQL by hand all the time in an app that uses a lot of DB queries.

Yes, you still need to understand what the ORM does when you do certain things, you still need to understand what nasty joins you're writing and all that. But you can let all the minutiae of what you DO write work out well in a rote, well tested manner.

The article smells a bit of a guy who didn't know SQL, or had a team member who didn't, and they thought just using an ORM would work.

If your app is successful, you will usually need to optimize things. But this is true for SQL or any time-saving abstraction as well, not just ORMs.


What I found most telling was the talk about doing complex joins in your application.

You do them in views and stored procedures. Your application code should be distinct from the data it sources, so people don't have to wade through 400 lines of terrible PHP just to change 'fullname' to 'concat(etc)'.
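The 'fullname' example above can be sketched as a view over a hypothetical people table (SQLite syntax here): when the business definition of fullname changes, you change the view, not every call site in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    -- redefine how fullname is computed here, in one place
    CREATE VIEW people_v AS
        SELECT id, first || ' ' || last AS fullname FROM people;
""")
conn.execute("INSERT INTO people VALUES (1, 'Grace', 'Hopper')")

name = conn.execute("SELECT fullname FROM people_v WHERE id = 1").fetchone()[0]
print(name)  # Grace Hopper
```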

Don't torture your DBAs.


I agree with rimantas. Views and stored procedures should not be used to perform business logic, i.e. I should not have to create a view with massive joins just because a logic need requires it, and I should not have to dive into my database when a business rule changes! Views and stored procedures are useful when you lack a layer of abstraction (e.g. an MS Access frontend) where you may want to put security restrictions on the data that is exposed to a particular group of users, or for performance reasons where a reasonably complex query can run faster as a stored procedure. Of course this is my personal view, and is definitely a point of contention for many people.

The whole point of an ORM is to abstract the data from application code. Business logic can be built on top of it with minimal knowledge of the underlying data storage system except in exceptional cases. ORM frameworks aim to simplify the process of writing this boilerplate code and continue to fulfil most common use cases.


Stored procedures are a sure way to torture both DBAs and devs.


I had a lot of problems working with ORMs when I was 1-2 years into programming. However, I also felt a lot of resistance to learning to use a framework vs. straightforward, procedural code for web apps.

It's a matter of wanting to take the time to learn another system, API, DSL, what-have-you just in order to work with something you already know - SQL. The dislike of HQL resonates with me - I was wondering why I would ever work with PHP Doctrine's DQL. Building SQL queries out of a sequence of OO method calls seems absurd, too. As the article and comments note, you shouldn't have to know SQL well to use an ORM.

There are definitely issues with the ORM/framework working against you, too. I love the organization and features in Rails or Django, but I hate when I spend hours working out how to do something that would take 5 minutes in plain PHP. Same with ORMs. Getting them to do the right type of join, not make unnecessary calls, etc. can be a pain. Sometimes it's that I don't know the software well enough, which could either be my own problem or just a reasonable lack of desire to devote my brain to it. Other times it's that the given ORM really does have shortcomings, conceptually and at the implementation level.

The one ORM I've had the most luck with is Django's. It's straightforward, does what I want, is well documented, and doesn't have too many features.


In the comments I'm noticing no one ask: when should or shouldn't you use an ORM?

Most of the discussions are over the merits of either approach, when to me it seems an ORM has many places it belongs. And a few it doesn't.

For most database-of-record systems, which are a large chunk of your average webapp, an ORM is a godsend. When I say DBOR, I mean things like articles, posts, comments, users, products, transaction history. An ORM saves a large amount of work writing SQL; it covers 95% of your queries (particularly insert, update, delete and simple gets) with minimal effort. You create the model, and let it get dealt with by the ORM. Your objects are mainly records. The pain comes when you start wanting to do analytics and interesting reports - but stick with a reporting tool, and keep this out of your application, and you feel less pain.

But this breaks down when you move to a database that represents a complex real-world system. If you're working on a model that represents, for example, an electrical distribution system, these are not really records. They represent a vast set of complex interrelations. Of course there are still records, but they mean little in isolation, away from the complex relationships of, say, pole->{location,type,maintenance history,conductors,insulator type}, and conductor->{poles traversed,length,a end location,a end join type,b end location,b end join type,material,material batch number,power circuit carried} etc. Then your queries to "find all customers affected by the pole at these coordinates" require joins through: pole, conductor, circuit, serviced area, customers... we're moving rapidly to lots of complex queries, where hand-crafting really is the way to go.


Death by a thousand queries

Wait a moment here. In my experience, most problems like this can be solved by noting something like: When we get X, we also get all of the associated Y's and their Z's. Declarative association of a Batch Query with certain retrievals isn't a newbie project, but it's something a lone programmer can put together in a week in a good ORM. (I've done it.) I would expect this library feature to be common in the Ruby/Python world.
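Mechanically, what such a batch-query feature does under the hood is collapse "one query per X" into a single IN-list query and group the results in memory. A sketch with illustrative X/Y tables:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (id INTEGER PRIMARY KEY);
    CREATE TABLE y (id INTEGER PRIMARY KEY, x_id INTEGER, val TEXT);
""")
conn.executemany("INSERT INTO x VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO y VALUES (?, ?, ?)",
                 [(1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c')])

xs = [r[0] for r in conn.execute("SELECT id FROM x")]

# one batched query instead of one query per X (the N+1 problem)
marks = ",".join("?" * len(xs))
ys = defaultdict(list)
for y_id, x_id, val in conn.execute(
        f"SELECT id, x_id, val FROM y WHERE x_id IN ({marks}) ORDER BY id", xs):
    ys[x_id].append(val)

print(dict(ys))  # {1: ['a', 'b'], 2: ['c']}
```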


It has been a supported feature in just about every other ORM too (Hibernate, Entity Framework, LINQ to SQL...), for ages.

The author doesn't seem to have understood ORMs well enough.


Good observation. Anyone writing a blog article on ORM should probably read the c2 wiki entry with the huge feature comparison chart of ORMs first.


I think the author's point is that by the time you have gone to learn all of the complexities of an ORM, and all the steps you must take to sidestep when you can't bend it to shape... you could have much more easily written it by hand.


If you have a Declarative Batch Query, then you get a bunch of SQL and plumbing code for free. The join and the iterative code you'd need to do that writing it "by hand" is 10x more.


The author couldn't be more wrong.


Yeah I generally do,

    if object in object_cache:
        return object
    elif object in object_list_cache:
        add object to object_cache
        return object
    else:
        populate object_list_cache
        populate object_cache
        return object
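A runnable sketch of that two-tier lookup chain, with hypothetical cache structures and a stand-in for the real database fetch:

```python
object_cache = {}
# e.g. a previously cached query result, keyed by query name
list_cache = {"recent": [{"id": 5, "name": "from-list"}]}

def load_from_db(obj_id):
    # stand-in for the real database fetch (hypothetical)
    return {"id": obj_id, "name": "from-db"}

def get_object(obj_id):
    if obj_id in object_cache:            # 1. per-object cache
        return object_cache[obj_id]
    for objs in list_cache.values():      # 2. scan cached list results
        for obj in objs:
            if obj["id"] == obj_id:
                object_cache[obj_id] = obj
                return obj
    obj = load_from_db(obj_id)            # 3. fall through to the database
    object_cache[obj_id] = obj
    return obj

print(get_object(5)["name"])  # from-list (and now promoted to object_cache)
print(get_object(9)["name"])  # from-db
```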


I think that designating ORMs as "anti-patterns" is a bit strong. Perhaps my understanding of what an ORM is supposed to accomplish is different from the author's, but I think that some of the criticisms that the author levels against ORMs are a bit off.

Inadequate abstraction - I would make the argument that it doesn't make sense to expect an ORM to be able to completely abstract away from the underlying database. The reason that the documentation of the various ORMs is sprinkled with SQL concepts is that the ORM is providing a window into an SQL-based environment. I would never expect otherwise.

Incorrect abstraction - I actually agree with this point, but this does not really seem to reflect on ORMs. This point has much more to do with the ongoing debate between the NoSQL movement and relational databases.

Death by a thousand queries - I hardly think that this is a knock against all ORMs. Different ORMs have different solutions (or a lack thereof) to this problem. I use Django a lot, and Django's built-in ORM offers a lot of "frills" that can help protect against this (lazy loading, selective loading of columns, selective loading of related models). I know that, in the Ruby world, DataMapper seems to have some ways of dealing with this problem as well. It really isn't as simple as saying all ORMs do this, therefore all ORMs are bad. The reality is more nuanced.

Ultimately, my principal problem with this piece is that it seems to conflate its argument for NoSQL and its argument against ORMs. NoSQL is wonderful, but it seems to be somewhat orthogonal to the value of ORMs.

ORMs are not perfect, and there is plenty of room for improvement; however, writing everything in SQL solely due to performance fears will usually turn out to be a case of premature optimization.


FWIW, even Rails' ActiveRecord has the same frills (lazy/eager loading, selective loading of columns, loading related models via join or via grouped selects). It even has modules that apply these fixes automatically :)


One other thing struck me reading this - it feels like premature optimization. Assuming that every ORM is going to be slow and inefficient to the point where you'll need to override or rewrite all the queries will lead to an inefficient use of developer time, and assumes you know a lot about what will matter under real world use conditions.

Yeah, sure, that ORM is adding 200% overhead to the SQL query - it's pulling back 30 columns instead of 4! And... it's taking 38 milliseconds and is run 4 times per day. So what?

And when the model changes and you have an extra few columns to represent more data? You've now got to hunt through every SQL query that could possibly reference that table and make sure it's dealing with the new columns appropriately, instead of having an ORM let the computer do what computers do - compute the changes required.

Yes, there are other anecdotes that can be trotted out to prove the opposite of my 38ms story above. Then we'll fall back to 'right tool for the right job', and ORMs are currently a good middle-ground tool for many of the projects people are developing. Perfect? No. Useful? Yes.


Mhh, the premature optimization school of thought. KISS is good and all, but I think Einstein had it right with his "Everything should be made as simple as possible, but not simpler."

A modest amount of additional upfront complexity is probably worth it if it saves you time and effort in the long run. It's the same reason why it's probably worthwhile to learn a framework rather than rolling your own organic solution, setting up templating, and using version control from the get-go. They all add upfront complexity but pay off in the long run.


The article makes a strong case against ActiveRecord, not against Object Relational Mapping.

Under the heading "The problem with ORM" the author writes,

    The most obvious problem with ORM as an abstraction is
    that it does not adequately abstract away the implementation
    details. The documentation of all the major ORM libraries is
    rife with references to SQL concepts. Some introduce them
    without indicating their equivalents in SQL, while others
    treat the library as merely a set of procedural functions for
    generating SQL. 
This is true of ActiveRecord, it's untrue of object graph ORMs like Apache Cayenne.

I find two patterns to be key to effective ORM:

* Data Access Objects. This is for when you have nothing, and want to get an entrypoint into the schema. In this case, you should be able to write near-pure SQL to get what you want.

* Entity Objects. This is what the DAO will give you back - either an individual or a list. Each instance represents a row in a table, and has methods that will do lookups to foreign keys. Once you have this, you have an entrypoint into the data graph, and can use foreign keys to crawl around to wherever you need to go.

The DAO layer is a simple, centralised place where you can implement permissioning logic.

If you need to do something high-performance (usually some sort of report), you create a custom DAO, and have it return custom entities (instances of classes that don't have 1 to 1 association with a table) that fit your need.
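A minimal sketch of the DAO/entity split described above, in Python with illustrative tables and names (real implementations would add caching, permissioning, and connection management):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE groups (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
""")
conn.execute("INSERT INTO groups VALUES (1, 'admins')")
conn.execute("INSERT INTO users VALUES (1, 'ada', 1)")

class User:
    """Entity object: wraps one row, crawls the graph via foreign keys."""
    def __init__(self, conn, row):
        self.conn = conn
        self.id, self.name, self.group_id = row

    def group_name(self):
        # follow the foreign key to the related table
        return self.conn.execute(
            "SELECT name FROM groups WHERE id = ?", (self.group_id,)
        ).fetchone()[0]

class UserDAO:
    """DAO: the single entrypoint into the users table; near-pure SQL."""
    def __init__(self, conn):
        self.conn = conn

    def by_name(self, name):
        row = self.conn.execute(
            "SELECT id, name, group_id FROM users WHERE name = ?", (name,)
        ).fetchone()
        return User(self.conn, row) if row else None

user = UserDAO(conn).by_name("ada")
print(user.group_name())  # admins
```

A custom reporting DAO would look the same, except `by_name` would run a hand-tuned query and return instances of a custom entity class with no 1-to-1 table mapping.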

I've found that after a certain point of complexity in an application, it becomes impractical not to use an ORM. It's like working in a type-unsafe language. You refactor something, and SQL-in-code breaks all over the place. That path leads to the hiring of dedicated DBAs, and abstraction of the schema behind stateless layers of PL/SQL in a doomed attempt to get to grips with the complexity of the problem space.

I worked on a system with a very tough customer where they repeatedly demanded major schema changes that were sitting in front of a business logic layer and frontend that had already been written. While the project had lots of problems, those particular refactorings were very straightforward. I was able to modify the ORM, and then just fix compilation problems and a few obvious tentacles from them until the application recompiled, at which point it worked again.

Some more criticism:

    This leads naturally to another problem of ORM:
    inefficiency. When you fetch an object, which of its
    properties (columns in the table) do you need? ORM can't
    know, so it gets all of them (or it requires you to say,
    breaking the abstraction).
I'm rusty, but I remember that in WebObjects' EOF at least you can nominate what you want to retrieve, including automatic joins to retrieve stuff across foreign key jumps.

The author's first suggested alternative, "Use objects", offers worse technical debt than ActiveRecord. I anticipate there are a lot of shitty systems being built on top of key-value stores. You can get fast results doing it, but it carries technical debt and doesn't scale horizontally. The key-value store is becoming the next-generation equivalent of "Oh, we'll just build it in Excel, and worry about the consequences later on". But it depends what you're doing. There are situations where foreign keys are good.

The second alternative is "Use SQL in the Model" The advice of the heading doesn't match the content of the text that follows. I think the author means to recommend building a service that wraps the model by answering questions. If not, that's the point I think that should be made.

It's common for companies to create a database, and then have many entrypoints into it. This is a mistake and creates technical debt. As soon as you have multiple entrypoints like this, you lose ability to refactor your schema (because it's impractical to get multiple stakeholders to make concurrent changes) and your system rots.

Instead, you create a model service that wraps the schema, but also has stateful knowledge. For example - it knows the permissions of the user who is talking to it and can tailor its response based on their permissions. Then you return results in a transport format. I can't recommend a good, mainstream mechanism for this. JSON and YAML are fiddly because they're typed; XML is unnecessarily verbose.

Anyway - there's no reason not to use a good ORM in this business logic layer. For small systems - sure - use SQL in the model. For the larger stuff, you have a more maintainable system if you use an ORM. But if it's a complicated space, steer towards Cayenne or Hibernate, rather than active record patterns.


I never found anything wrong with Active Record patterns, in all the projects I worked with. Of course, it depends on the ORM, and it's a pattern that lends itself especially well to dynamic languages (IMHO, Hibernate sucks).

Of course an ORM will have SQL concepts because that's the nature of the beast. You can't abstract those away.

There is also nothing wrong with generating SQL for you, as generating SQL by hand is a task that involves string concatenation. And when adding filters to that SQL based on lots of runtime conditions, it gets freaking painful to do it - it's a lot nicer to work with expressions that are building a syntax tree, to which you can add and add whatever you need.
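A toy illustration of why composable expressions beat string concatenation for runtime filters: conditions accumulate as data, and the SQL plus its bound parameters fall out at the end (table and column names are made up):

```python
def build_query(filters):
    """Compose a WHERE clause from optional runtime conditions,
    with placeholders instead of error-prone string concatenation."""
    clauses, params = [], []
    for column, value in filters:
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT id FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# add as many or as few filters as the request demands
sql, params = build_query([("status", "active"), ("role", "admin")])
print(sql)     # SELECT id FROM users WHERE status = ? AND role = ?
print(params)  # ['active', 'admin']
```

Real ORM expression builders do the same thing with a proper syntax tree (handling OR, joins, quoting, and dialect differences), but the accumulation-then-render shape is the same.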

Also, about efficiency, the dumb ORM in Django can do this (prefetching relationships on one level):

    user_rs = User.objects.filter(groups__name='Badass')\
       .select_related(depth=1)
It can also do this:

    rs = user_rs.values_list('groups__name', 'username')
Which returns an array of (group, name) tuples. No inefficiency there and no leaking abstraction either.

You can also drop to pure SQL, if you're so inclined for efficiency reasons. In that SQL you can also restrict the columns you're selecting, and so the model objects returned will lazily load missing attributes on request ...

    user = User.objects.raw('select id, username from auth_user limit 1')[0]

    # already retrieved
    user.username

    # this works, but a second query is made
    user.first_name
So, yeah, people can use raw SQL or whatever fad-du-jour they want. I'll just be busy working on stuff.


The way I think about it, active record is the strong option for CRUD, but doesn't scale well past it.

Once you're dealing with many tables I think you need the ability to accumulate a graph representing a new state and then commit it as a transaction.

While you could do this in an adhoc manner with active record, it's not a way of life in the way it is in EOF/Cayenne.

There are complications of the object-to-relational mapping that it doesn't deal with. For example - no mechanism for dealing with situations where someone changes data underneath the object graph a user is working against in memory. AR would blast over the top of the data with what it has in memory, potentially breaking in the process.

Some ORM systems have locking levels. For example, with 'optimistic locking' you get an exception if data changed underneath you. It's handy to be able to catch an exception, email the support team, and tell the user to freeze and contact them, in a way that doesn't discard the data they've just entered.
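A sketch of version-column optimistic locking, independent of any particular ORM (table and names are illustrative): the UPDATE only succeeds if the version is still what we read, so a concurrent change surfaces as an exception instead of being silently overwritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, version INTEGER)")
conn.execute("INSERT INTO docs VALUES (1, 'v0', 0)")

class StaleObjectError(Exception):
    pass

def save(doc_id, body, expected_version):
    # succeeds only if nobody bumped the version under us
    cur = conn.execute(
        "UPDATE docs SET body = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (body, doc_id, expected_version))
    if cur.rowcount == 0:
        raise StaleObjectError(f"doc {doc_id} changed underneath us")

save(1, 'mine', 0)          # fine: version was 0, now 1
try:
    save(1, 'theirs', 0)    # stale: version is already 1
except StaleObjectError:
    print("conflict detected")  # notify support, preserve the user's input
```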

I find it backwards that active record exposes methods like insert, delete, update. These are low-level sql concepts, and you shouldn't need to care about them within an object system where you're thinking in terms of a graph of data. It's very wired to the database though. As far as a "mapping" between object and relational, active record is crudely simple.

On a positive note, you can hammer an AR out in your language of choice from memory, without reference to any external libraries or the like. It's a good wrench in the toolbox.


Yeah, but the relational model is terrible for representing graphs; no matter the tools you use, you are still bound by the representation you choose in a relational database.

What I like about AR is precisely its close correspondence with a relational database. Objects have create/save because they correspond to DB-tables and that's not something I want to forget when working with such a database. Personally I want to control when a specific object gets persisted, as to me that point is crucial, both for performance and for data integrity.

When the relationships between tables get too complex, that's a good time to reevaluate your data structures and how you process that data -- as in, do you really need a live graph inside your requests, or will a pre-processed view do, while having hooks for inserting new data and rebuilding your models asynchronously (on a single thread to avoid race conditions)?

Also, there are some databases around that are specialized on storing graphs. Even if the relational model is very general-purpose, sometimes you're better off choosing a system that's more suited for your goals.

    a mechanism for dealing with situations where 
    someone change data underneath the object graph 
    a user is working against in memory
I kind of understand what you're saying, but you should avoid sharing of resources like that. Choose stateless requests, even when building non-web apps.

Do not let users trip over each other and if you do, construct a trail of actions taken to be able to Undo (as in, the Command pattern).


I like your points.

It clarifies for me that the goals of Active Record and other forms of ORM are so different that it's misleading to categorise them together. Active Record presents a simple and unambiguous interface to the way things are, [other systems that I don't have a noun for] try to abstract it away.

The way we use databases is interesting. In the early days of computing, a lot of the low-hanging fruit was picked via software that wrapped databases. As a result, the database tools reached awesome power, maturity, and mindshare, and they now get used for things well outside of their domain.

I suspect there are patterns that get squeezed out by this, and it's a topic I'm interested in exploring.


You described the reasons I stuck with my own pseudo-ActiveRecord on top of NHibernate's basic Transaction and Session support. It's called "unit of work" and it makes much more sense to me. We're very specific (for simplicity and sanity) that all HTTP operations to our webapp are autonomous (from the user perspective) and stateless. No multi-page forms, etc.

So every request is given a session and whatever needs to happen as a result of that request is encapsulated in that session. If something goes wrong, that transaction in that session can be reverted. If NHibernate determines that it needs data modified early in the session for something later in the transaction, it handles it behind the scenes. Then an ActiveRecord pattern can be thrown on top of that and you still get CRUD within a "unit of work" context.
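A stripped-down sketch of the unit-of-work idea (not NHibernate's actual API; plain SQLite for illustration): everything a request does happens inside one session, and a failure anywhere rolls the whole thing back.

```python
import sqlite3

class UnitOfWork:
    """One request, one session: commit everything or roll it all back."""
    def __init__(self, conn):
        self.conn = conn

    def __enter__(self):
        self.conn.execute("BEGIN")
        return self.conn

    def __exit__(self, exc_type, exc, tb):
        self.conn.execute("COMMIT" if exc_type is None else "ROLLBACK")
        return False   # re-raise any exception to the caller

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.execute("CREATE TABLE audit (msg TEXT)")

try:
    with UnitOfWork(conn):
        conn.execute("INSERT INTO audit VALUES ('step 1')")
        raise RuntimeError("something went wrong mid-request")
except RuntimeError:
    pass

remaining = conn.execute("SELECT count(*) FROM audit").fetchone()[0]
print(remaining)  # 0: the insert was rolled back with the failed request
```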

I like it. :)


I've used two different formal ORMs, ActiveRecord and JPA (backed by hibernate), and I've never felt completely at ease with them. In fact, I was lining up to agree that ORMs suck, except that I realized I'm probably using one no matter what I do.

If I have a model object, and I want to persist it in a relational database, then I'm going to need to do something that persists and retrieves this object back and forth from the RDBMS, right? And if I want to retain the flexibility to switch to a different database (or different persistence strategy in general), then I'm going to need some way to specify the implementation details for each possible approach, and swap them in depending on which approach I take. The Java world tends to handle this with an ORM and a DAO tier these days, using DI to swap in the desired implementation (sigh, Java really is a soup of acronyms these days), whereas Rails developers tend to use migrations. But either way, I'm pretty much stuck with an ORM. It may be an ORM that works at a very low level, directly with objects, SQL, connections, and transactions rather than through a higher-level API, but it's still an ORM... (right?)

In spite of the inevitability of an ORM (I really hope I'm defining this correctly), I'm going to agree with a lot of the points made in this blog post. If I'm using SQL, I really don't like having the SQL hidden from me. And I really can't stand languages (like HQL) that force me to re-learn a variant of SQL. ActiveRecord and Migrations are, without question, very productive, but I like to see the objects and understand very directly how they are being persisted and retrieved. I want to see the fields and methods, and I want to see the SQL. I've found that I almost always end up changing it, and it's easier to do that when it isn't all hidden from me.

Rails offers me so much that I can get over this little issue of mine, but I don't feel the same way about Hibernate. My personal experience is that if Rails isn't going to help me, and something has pushed me to use Java, I probably need to write a lot of lowish level code anyway.


Well, I can't speak for others, but in my small-scale use of databases, hand-coded SQL has never been an issue.


Same here, ORM just adds another layer of frustration for me.


I'm a noob when it comes to designing and implementing programs that interface with a relational database. Based on my small experience so far with an ORM, I'd say this post is spot-on, clearly articulating the frustrations I've felt on my project. A friend of mine even wrote a blog post about the problems I've had:

http://es-cue-el.blogspot.com/2011/03/entity-framework-and-i...

That said, though, I do wish there were more detail on this point:

The programming world is currently awash with key-value stores that will allow you to hold elegant, self-contained data structures in huge quantities and access them at lightning speed.

I'd love to know more about such libraries, frameworks, or tools, but this isn't a lot to go on.


I suspect a major problem with ORMs is that a number of people are using them without a full knowledge of SQL beforehand. Relational databases are complex and you need to understand what's going on if you want to make best use of them.

I've never had any issues with ActiveRecord (although I agree that its mixing of data access and business logic can be a problem) - but I also know what needs joins, which columns to include, which indexes to use, when to drop to raw SQL; all from years of writing complex SQL and stored procedures by hand (and I never want to go back to that). And I don't expect ActiveRecord to magically guess that stuff for me.


Search for NoSQL. He's right that it's a misnomer, but it's hard to give a recommendation because they are optimized for different use cases. http://en.wikipedia.org/wiki/NoSQL gives a good set for the various use cases.


I inherited an incredibly hairy, large, mission-critical database at my current job. While we're slowly phasing in its replacement, we'll be interacting with the current mess for a long time to come.

There are seemingly endless little insanities, like "this column references upper(substr(othertable.column,5))". Instead of trying to remember all such idiosyncrasies, I defined them all in SQLAlchemy and added a _lot_ of unit tests to make sure I don't accidentally break one later. Now I can use programmer-friendly ORM joins in production code and not have to worry about getting all the weird rules right each time.
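The approach amounts to putting each idiosyncratic join behind one tested function (or ORM relationship) so the weird rule is written down exactly once. A sketch with made-up table names, using plain SQLite rather than SQLAlchemy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widgets (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE legacy (id INTEGER PRIMARY KEY, blob_key TEXT, note TEXT);
""")
conn.execute("INSERT INTO widgets VALUES (1, 'ABC')")
conn.execute("INSERT INTO legacy VALUES (9, 'xxxxabc', 'hairy')")

def notes_for_widget(widget_id):
    """The one place that knows widgets.code joins on
    upper(substr(legacy.blob_key, 5)) -- and the place unit tests cover."""
    return [r[0] for r in conn.execute(
        "SELECT l.note FROM widgets w "
        "JOIN legacy l ON w.code = upper(substr(l.blob_key, 5)) "
        "WHERE w.id = ?", (widget_id,))]

print(notes_for_widget(1))  # ['hairy']
```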

I'm perfectly comfortable working in SQL. I don't want to write it directly all the time, though, any more than I want to have to write assembler all the time.


Yup, spot on analysis.

I do think he missed one thing, however. The few times I've seen ORM layers work well is when they're custom-built for a specific application. It's still not ideal, but it lets you put all the ugliness in one place, and regain efficiency by sacrificing generality.


I'd say the problem is more with OO. The author touches on this a little towards the end, but ORM fulfills the demand to put relational data into objects. The issue is with all these developers insisting that they want to think only in a limited set of types, even when the page they're rendering needs half the properties of one type and a few more from another type. Something like linq2sql can actually be the answer here: if you use it to select an anonymous type containing exactly what you want, then you only need one round trip and you haven't wasted any processing.


"If your project really does not need any relational data features, then ORM will work perfectly for you, but then you have a different problem: you're using the wrong datastore. The overhead of a relational datastore is enormous; this is a large part of why NoSQL data stores are so much faster."

I have never been able to receive a straight answer to this question: Is there a "NoSQL" database that provides the same ACID properties that major RDBMS databases do? Things like "eventual consistency" are entirely unacceptable for the software that I work on.


I'm not trying to say it's the best fit for you, but Apache CouchDB is fully ACID compliant: http://couchdb.apache.org/docs/overview.html



Interpreted pedantically, removing ORM techniques means dealing directly with resultsets or loosely typed structures. This is definitely what I DO NOT want in any of my views. If you're working with an OO language like C#.NET or Java, good luck!

Any abstraction on any level is going to add a performance hit no matter what.

If this is really an issue, wrap your ORM framework stuff (differentiating it from ORM the pattern) in a DAL so that your BLL does not worry about the existence of the ORM. Then as you scale, optimise your DAL with either inbuilt optimisations or, when desperate, your own hand-written SQL (if you don't even know SQL then you're a poor excuse for a developer).
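A sketch of that DAL advice in Python - the interface and names are invented: the business layer codes against a tiny interface, so an ORM-backed implementation can later be swapped for hand-tuned SQL without touching callers:

```python
import sqlite3
from typing import Optional, Protocol

class UserDAL(Protocol):
    """The only surface the BLL is allowed to see."""
    def get_name(self, user_id: int) -> Optional[str]: ...

class SqlUserDAL:
    """One possible implementation: hand-written SQL. An ORM-backed
    version would satisfy the same Protocol."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get_name(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

def greeting(dal: UserDAL, user_id: int) -> str:
    # Business logic never knows (or cares) what's behind the DAL.
    name = dal.get_name(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")
```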

Think of them like Ikea furniture - they don't look great, and they often don't fit households with complex requirements. But they're highly modular and easy to assemble. So when you need something in a jiffy, just bring it home, fix it up, and it'll serve its purpose. When it no longer fits the purpose, get something else. Every household has to start somewhere.


I'd like to learn more about the author's background with ORMs and especially activerecord ... I know SQL pretty well, in fact when I started using Rails, I insisted on still writing my own SQL queries by hand.

ActiveRecord might be an anti-pattern, or it might not ... I really couldn't care less. What I do know is that I enjoy dealing with the database using ActiveRecord far more than the error-prone, dynamically constructed SQL queries I was writing back in my PHP days.

It makes my life as a coder easier ... I mean, have you ever tried to construct a really complex search function on a web app using SQL? ... it's a pain and a half. ActiveRecord makes stuff like that much easier (named scopes in Rails especially).
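To make the "complex search" pain concrete, here is roughly the plumbing you end up hand-writing without something like named scopes - a Python/sqlite3 sketch with an invented products table:

```python
import sqlite3

def search(conn, name=None, min_price=None, in_stock=None):
    """Compose optional filters by hand - every new filter means more of
    this plumbing, which is what named scopes abstract away."""
    clauses, params = [], []
    if name is not None:
        clauses.append("name LIKE ?")
        params.append(f"%{name}%")
    if min_price is not None:
        clauses.append("price >= ?")
        params.append(min_price)
    if in_stock is not None:
        clauses.append("stock > 0" if in_stock else "stock = 0")
    sql = "SELECT id, name FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           price REAL, stock INTEGER);
    INSERT INTO products VALUES (1, 'widget', 9.99, 3);
    INSERT INTO products VALUES (2, 'gadget', 25.0, 0);
""")
```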

Yes, if you don't understand databases, you're going to use an ORM in shameful ways, but it works well ... very well, if you know what you're doing and you take the time to learn your craft.

I'm glad to have ActiveRecord in my tool belt every morning when I get to work and that ... is what really matters to me.


I think there are definitely some valid points about efficiency while using ORMs when the queries get more complicated. However, on a simpler scale, like a blog, ActiveRecord and other ORMs aren't horribly inefficient, and they're faster and easier for the programmer, which is why people ultimately use them.


The point Seldo was making, however, is that for simple applications like this a relational database doesn't make sense, and is in fact overkill.


Except RDBMSs are ubiquitous. Whether or not they're overkill, they're cheap, easy, and well understood.

What web host in the world doesn't run MySQL? What platforms don't support Sqlite3? What PaaS doesn't give you easy access to Postgres or another database?

Compare that to support and sysadmin knowledge of Mongo, Redis, Couch, whatever: there's no comparison.

It's only "overkill" in the very limited sense of, what, CPU cycles? Unused relational algebra potential?


I think the argument is that SQL+ORM is not that different from NoSQL. You have a point that it's probably easier to use an ORM than a NoSQL store in the real world.


I don't know... I've spent a couple of days writing code for Redis to store a list of 15 objects whose date fields are the 15 lowest. Granted, it's my first time and I didn't/don't know what I'm doing, but it's quite a lot of code for something that would be trivially simple in SQL.
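For comparison, the "15 lowest dates" task really is a one-liner in SQL - stdlib sqlite3, invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO events (date) VALUES (?)",
    [(f"2011-06-{d:02d}",) for d in range(30, 0, -1)],  # inserted out of order
)

# The entire "keep the 15 objects with the lowest dates" problem:
lowest_15 = conn.execute(
    "SELECT id, date FROM events ORDER BY date LIMIT 15"
).fetchall()
```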


In high quality RDBMSes like Oracle and SQL Server, there is not a whole lot you can do to tune your queries and the majority of your bottleneck is likely to be in how you are using your data. If you have inefficiencies, you either have a poorly designed schema or out of date statistics/indices.

The story is slightly different for open source RDBMSes which have inferior cost-based optimizers and multi-version concurrency control implementations that fail on a wider range of corner cases like:

select count(*) from Billing_And_Accounts_Receivable.Transaction

which is slow in PostgreSQL for reasons I'd rather not take the time to teach about here.


"there is not a whole lot you can do to tune your queries"

What does this even mean? For any query which relies on data from different tables, yes there are opportunities to tune the query, and there are opportunities to build different kinds of indexes, and there are opportunities to de-normalize the data and remove the join completely. This is independent of your RDBMS vendor.
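That point is easy to demonstrate with any engine. A sqlite3 sketch (invented table) showing the same query switching from a full table scan to an index search once an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, total REAL)"
)
conn.execute("INSERT INTO orders VALUES (1, 42, 9.99)")

def plan(sql):
    """Return the plan detail text (last column of EXPLAIN QUERY PLAN rows)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer = 42"
before = plan(query)  # typically a full table scan ("SCAN ...")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query)   # typically "... USING INDEX idx_orders_customer ..."
```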

The statement that Oracle/SQL Server are 'high quality', and the implication that open source RDBMSes are not, is flame-bait. All RDBMSes (that I have experience with - including Oracle and SQL Server - so caveat emptor) will hit a performance wall when an index becomes invalid and the engine has to table-scan - and indexes can become invalid automatically as the amount of data in the tables changes. Typically this will happen at 3am when you are on call.

There is also the hardware that the files are stored on, and the balancing act of price/performance of splitting tables onto separate spindles, using SSDs for semi-hot tables, cramming in more ram, upgrading the network connection between db and app servers etc etc. Every real world application needs constant tweaking from query to hardware as bottlenecks appear in different layers. Again these issues are almost entirely independent of the RDBMS (apart from with SQL Server that only runs on Windows, so the deployment platform is limited by what Windows server supports).

Nonrelational datastores can hit performance walls too - and the fact that some of them are newer adds the extra 'fun' that their performance characteristics are not completely known.


Commercial DBMSes have way better customer support than open source DBMSes. Not only do they better understand the performance characteristics of their clients, but they have dedicated teams whose job it is to isolate corner cases where the DBMS is not optimized for a particular environment.

My statement -- there is not a whole lot you can do to tune your queries -- is about rewriting the queries themselves, such as join hints, etc. Changing RAID configurations, organizing the data physically on disk differently, etc. have no bearing on the quality of SQL an ORM generates. So what does? Well, 20 years ago many DBMSes required you to compile stored procedures in order for the cost-based optimizer to cache anything. Today, not only does SQL Server 2008 R2 support fine grained plan caching and allow you to adjust the size of that cache, but you can remove from the cache any plan you dislike. You can also force a poorly performing query to use a specific plan cache. This is one of the ideas behind the LINQ Re-Motion project for .NET.

I am really arguing about where to put work effort, and really NOT disagreeing with you. We are talking past each other.

I personally believe, as I wrote in the author's blog comments, that an ORM based on an algebraic model would likely be better than the big and irregular ORM APIs we have today.


If a project uses object-oriented programming and a relational database, it will have an ORM. Either one written somewhere else (hibernate, Entity Framework, etc.) or one written in house (whether or not it is thought of as an ORM). An ORM maps object data to relational data and back. That's all.
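A minimal sketch of that claim in Python (invented names): even a hand-rolled in-house data layer that shuttles rows into objects and back is an object-relational mapper, whether or not anyone calls it one:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class User:          # the "object" side
    id: int
    name: str

def load_user(conn, user_id):
    """Relational -> object."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return User(*row) if row else None

def save_user(conn, user):
    """Object -> relational."""
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
        (user.id, user.name),
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
save_user(conn, User(1, "Ada"))
```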


I started writing a response in a comment, and then I started frothing at the mouth, and pretty soon it ballooned into a whole blog post:

http://news.ycombinator.com/item?id=2659442


So I don't have deep experience with ORMs except for SQLAlchemy. I've worked with others such as Hibernate.

All I can say is that I love SQLAlchemy. Projects that use it make my life easier. It is easier to read and troubleshoot than pure SQL.

Just because the majority of ORMs suck doesn't mean they all suck. It's like saying "web frameworks" are anti-patterns. ORM done right can be a godsend. Like all patterns, knowing when to apply it is key. Go around creating FactoryFactoryFactoryObjectFactory and of course you'd think it's an anti-pattern.


god. thank you.

I have never met an ORM that didn't eventually rub me the wrong way.


As they say, familiarity breeds contempt.

I can name a hundred things my wife does that annoy me, but that doesn't mean I'm not worlds better with her than without.


True, but some people might be better off looking for new partners - and new alternatives to ORMs.


>When you fetch an object, which of its properties (columns in the table) do you need? ORM can't know, so it gets all of them (or it requires you to say, breaking the abstraction).

Not true. If you take the NHibernate approach of returning a proxy object, then you can get clients to "tell you" without breaking the abstraction. You normally don't worry about this, though, because pulling 30 properties usually isn't much different than pulling 3.
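The proxy trick can be sketched in a few lines of Python - this is just the shape of the idea, not NHibernate's actual machinery, and the schema is invented. A column is fetched only when a client first touches it, so clients "tell" the mapper what they need implicitly:

```python
import sqlite3

class UserProxy:
    COLUMNS = ("name", "email", "bio")

    def __init__(self, conn, user_id):
        self._conn, self._id = conn, user_id

    def __getattr__(self, attr):
        # Called only for attributes not loaded yet.
        if attr not in self.COLUMNS:
            raise AttributeError(attr)
        value = self._conn.execute(
            f"SELECT {attr} FROM users WHERE id = ?",  # attr is whitelisted above
            (self._id,),
        ).fetchone()[0]
        setattr(self, attr, value)  # cache: later accesses skip the database
        return value

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com', 'long bio...')")
u = UserProxy(conn, 1)  # no columns fetched yet
```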


I have felt the pain of Hibernate. It just sucks. But I think it is possible to make an ORM-like solution work. A great example is NeXT's EOF and now Apple's Core Data. The combo of awesome mapping tools and Objective-C's dynamism makes for a system that works very well for its intended purpose. Mind you, Core Data is not a database and likely would not work well for a web service, but I believe that EOF did.


Agree. Talked about how severe ORM impedance mismatch is here: http://datasyndrome.com/post/3257282059/data-driven-recursiv...

Your model needs to fit your view.


Generalizations are an anti-pattern.


ORMs tend to suck; they're the easy way out. Delegating control to a custom ORM says to me: OK, give me performance issues. Design your domain model, keep it simple at the DAL, and use stored procedures. Easy life, ultimate flexibility.


Sometimes 'good enough' really is good enough.

For christ's sake, there's a zillion ways to mitigate ORM related performance hits. One of those zillion is 'stop using an ORM' but it's not likely to be your first choice.


Seems like a "horses for courses" argument, but maybe that's just a sign of my being a cranky old man. More important than using an ORM is that everyone consistently uses the same toolset for the project.

Personally I can't say I'm a big fan of using an ORM. Just too many bad tastes in my mouth over the years from bad implementations. It's probably improved by now, but I long ago developed tools to generate the boilerplate code I need to work with a database. This gives me a generated data layer and a bare business object layer that moves the data out of the data layer. With the metadata available from databases it's fairly simple to automate the generation of the code.

Once you have a tool that generates the code then an ORM has much less to offer.


Another mistake the author is making is calling the ORM "an abstraction".

ORM is a mapping technology: it takes input from one world and turns it into data suitable to another world. It doesn't abstract anything.


Technically, (my interpretation is that) it abstracts your application from your db. The ORM acquires the data, allowing your application code to concentrate on using the data.


I 100% agree with the article. ORMs are dangerous.

ORMs solve a problem we don't really have, but introduce a whole load of new problems we never had before (lazy loading does not work, object relations work completely differently from table constraints, designing the domain layer to fit our persistence layer, learning new proprietary query languages, inability to control SQL queries).

I am continuously amazed at how keen developers are to adopt them as a core part of their product.

Thanks for the article - I feel relieved others feel the same way!


It's a pity and a sign of ignorance that people make the implicit assumption that ORM == the active record pattern, while the other ORM pattern, data mapper, is in fact widely used and superior for many use cases.


BTW, how are you working around the most obvious problems of an ORM these days?


By not using them


Yesterday morning I would've called this guy a cranky old man... but doing some coding last night made me want to kill something. Coming from an Active Record background, I tried using Linq to SQL. My application has a WPF front end and a Windows service on the backend. Passing the same object between the two is driving me insane. The problems are so much more cryptic, and the code I had to write to work around them completely negates any reason for using it in the first place. I'll be ripping it out tonight.


ToList() usually solves most marshalling problems (not that that is your problem, but it's quite common). Nothing wrong with Linq2SQL, it's just another technology that has a learning curve.

There's more than enough information out there to solve most problems I've ever run into with Linq or EF. Whatever you're dealing with, someone out there probably already dealt with and blogged about it or asked a question and got an answer somewhere.


Is this guy talking about object relational mapping or object role modeling?


I'm guessing the former as I've never heard of the latter.

Is it time for a Central Registry for TLAs? (CRT! Damn. It's taken...)



