What's left of NoSQL? (use-the-index-luke.com)
144 points by MarkusWinand on April 25, 2014 | 104 comments



As usual, the next-big-thing-to-replace-the-old-crappy-thing has become nothing more than a useful addition to our toolbox.

Which is not a bad thing, but I wish we could do it without the overhyped nonsense. It really is time for this business to grow the fuck up.

The "web generation" has brought us lots of great changes, I would have quit IT back in the 90's if the internet hadn't happened. But this immature attitude towards everything from technology stacks to how to run a company is really starting to grate.


You are right. They are just tools: nothing more, nothing less.

Those of us who have been doing software development for a long time have heard countless vendors hyping their products. It's what they do and will continue to do. Normal developers have always ignored it and chosen the right tool for the job.

I actually find the anti-NoSQL crowd to be the immature ones; they are often quite patronising, as though I am some idiot for choosing a particular database. It's quite bizarre.


The "web generation" has brought us lots of great changes, I would have quit IT back in the 90's if the internet hadn't happened. But this immature attitude towards everything from technology stacks to how to run a company is really starting to grate.

Reminds me of back in the 90's when object-oriented databases were going to rule the world and software was all going to be write-once-run-everywhere, or the early 2000's when static typing was the downfall of society. Meh. Every generation's got something like that.

As skeptical as I am of the latest trends, I sincerely hope that people stay excited about them, and keep using them. That bubbling cauldron of ideas does generate a lot of noise, but it also generates new ideas and progress.


This is an incredibly short-sighted piece, for two reasons:

1) The data challenges of yesterday's internet giants are the data problems of just about every 21st-century enterprise tomorrow. We're at the start of, or in the midst of, an irreversible data explosion.

2) Established players not taking new technologies and upstarts seriously is never evidence of the threat not being real. We know that, on the contrary, our industry is characterized by constant disruption, where established players for whatever reason do not consistently stay at the front line of innovation and tend to get leapfrogged. (As an analogy, Google+'s lurch into social may be akin to Oracle's lurch into NoSQL. Does that mean social was not real?)

Bottom line: It all depends on what you're doing. But more and more of us will be dealing with more and more complex data in the years to come. Of that, I'm certain.


I agree it's a bit short-sighted, but I think it's a matter of perspective. The mistake most people make is to believe that Oracle (and other relational DB vendors) are competing directly against the various NoSQL stores (this includes doc stores, scalable key-value stores, etc.). I don't think that's true. I think that relational databases are really competing against industry-specific SaaS. So instead of implementing their own database for inventory & sales, companies may opt to use a service. These SaaS companies, in turn, are more likely to adopt a variety of technologies (including NoSQL and relational). Since the SaaS companies serve more than a single company, they're also more likely to adopt easily scalable technology like Cassandra. So fewer businesses will need to purchase databases (of any sort), but the ones that do will have greater scalability needs.


NoSQL is absolutely competing with Oracle and other relational DB vendors. You are seeing it today with the valuations for companies like Mongo or DataStax. SaaS is hardly a big enough market compared to, say, every enterprise. And surely the companies listed on the client pages agree with me.

And people forget something important with SaaS: data sovereignty. Here in Australia, for example, there are many enterprise companies who are forbidden from using ANYTHING that is hosted in the US due to the grey legal area, e.g. the Patriot Act. So in-house databases are absolutely here to stay.


It's interesting; my experience with companies here in the U.S. (mainly SMBs) is that they feel much more comfortable using a SaaS compared to adopting an internal NoSQL store. Granted, most of these companies aren't highly technical, but I suspect for them using a SaaS is easy and doesn't require much internal expertise. Adopting a new NoSQL store, though, may require training, etc. Of course I concede that things may be different in larger enterprises and companies outside the U.S.

I would bet, however, that most of the clients that do business with Mongo or DataStax still intend on using/maintaining a relational system. I haven't really encountered many companies that have decided to completely dump their relational systems in favor of something else.


Agree. Where you see nonrelational popping up analogously is particularly in new SaaS shops, many of which aim to take some of Oracle's pie.


NoSQL is not a superior tech, it's a trade-off.

What you are arguing is that future people will value data security and performance less than scalability... Well, I doubt it, but my crystal ball is as useless as yours.


It's a superior tech because it allows you to develop to your business domain (which changes all the time anyway). Supporting infrastructure really should not be more important than your app code. SQL is almost a first-class citizen in most old apps, which ties dependencies too closely and makes for some bad apps.

Also, for all consumer apps, people value scalability over security and performance: today, yesterday, last week, a year ago, 5 years ago, whatever. You should be using NoSQL for that. And in B2B, for almost all business domains.


I'm not saying it's superior, rather that it finds a significant place in the stack for many of the most cutting-edge data-focused companies. So it's here to stay. My argument is that those cutting-edge data-focused companies will be just about everyone, to varying degrees, in the years to come. There's room for many technologies in the enterprise, and it's naive to assume, as the author does, that Facebook's and Google's challenges are unicorns.


Facebook's and Google's challenges are not unicorns. But they are leviathans, and the author is somewhat correct to observe that NoSQL solves problems that we all "would love to have".

Or perhaps only think we have - I know a lot of developers who believe that an RDBMS should be kept at arm's length and only spoken to through an ORM, or who've never spent much time exploring what they're missing by never straying far from SQLite and MySQL. These same folks seem to have a remarkable ability to run into "big data" problems that "can't be solved by an ACID-compliant RDBMS" well short of the point where I'd only be thinking, "Huh, this DB's getting big enough that I might need to spend more time thinking about my indexing strategy."
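(To make "indexing strategy" concrete, here's a minimal sketch using Python's stdlib sqlite3 and an invented events table; the names are illustrative, not from any real system:)

  import sqlite3

  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE events (user_id INTEGER, created_at TEXT, payload TEXT)")

  query = ("SELECT * FROM events WHERE user_id = ? "
           "ORDER BY created_at DESC")

  # Without an index this is a full table scan, which is what actually
  # hurts long before "big data" does:
  print(db.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

  # "Thinking about my indexing strategy" is often just this:
  db.execute("CREATE INDEX idx_events_user_time ON events (user_id, created_at)")
  print(db.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())  # now uses the index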

I do agree wholeheartedly that more and more companies will be encountering situations where ACID proves to be an impediment to scalability in the future. But that's then, this is now, premature optimization is the root of all evil, and "Getting ready for a problem we might encounter in the coming years" is often just a fancy way of saying, "Solving a problem we don't actually have."


I am wondering if the prolonged latencies I have been experiencing with most Google services in the past few years are somehow related to using Spanner instead of the previously conventional "compute in advance offline" / "serve immediately" approach. The Spanner paper mentioned they sacrificed a bit of latency (from nanoseconds in DHT lookup to milliseconds). I understand it's more convenient for Google developers to have a more predictable thought framework instead of making every single project a piece of art while spending most of the time fighting inconsistencies arising from eventual consistency. The question is whether it was worth it. I remember a time when Google services were amazingly fast - that time is unfortunately gone :-(


In what context are you referring to the "compute in advance offline" part? Sometimes it's costly to compute things in advance and store them, when you might not need them. If storage space becomes a concern, then Google would have to write some fancy algorithms based on data access patterns that determines what to compute in advance, and what to compute JIT. Which complicates things because now they have to write and regularly run programs that tell them what to compute, which simplistically means that they are at least running 3 different tasks, instead of 1 on-the-fly computation.


I meant the traditional NoSQL architecture (like what LinkedIn was using) where you had offline batch processing based on MapReduce executed regularly during the day, and the results were stored in a distributed hash table with super low latency and eventual consistency. This made access super fast but inflexible, i.e. only what was precomputed could be accessed.
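(Roughly this pattern, as a toy sketch; a plain dict stands in for the distributed hash table, and the data is made up:)

  # Offline batch job, run periodically during the day (the MapReduce part):
  def batch_precompute(raw_events):
      recs = {}
      for user, item in raw_events:          # "map" over raw events
          recs.setdefault(user, []).append(item)
      return recs                            # "reduce" step elided

  # Results are bulk-loaded into a low-latency, eventually consistent
  # store; here a dict stands in for the DHT.
  dht = batch_precompute([("alice", "x"), ("alice", "y"), ("bob", "z")])

  # The serving path is a single key lookup: super fast, but only what
  # was precomputed can ever be served.
  def serve(user):
      return dht.get(user, [])

  print(serve("alice"))  # ['x', 'y']
  print(serve("carol"))  # [] -- never precomputed, so never servable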


We switched from SQL to Mongo on http://versus.com one year ago and are quite happy—we couldn't imagine going back.

But I think which DB to use heavily depends on the specific use case.


I agree, Mongo is easy and fast.

My problem with NoSQL is that you MUST know there won't be changing requirements:

Later you might need to do the equivalent of joining 3-4 tables. Then you have trouble... But I'm a coward who looks both left and right before crossing a street.


"My problem with NoSQL is that you MUST know there won't be changing requirements"

I thought the main advantage of document-oriented schema-less databases (I haven't used MongoDB but I have used CouchDB) was that they are supposed to give more flexibility in the early stages of projects.

NB: Personally, I've moved to PostgreSQL and its JSON storage, as then you get most of the benefits of both worlds.
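(For anyone curious, a minimal sketch of that hybrid approach with psycopg2 against PostgreSQL 9.3; the connection string and the docs table are made up for illustration:)

  import json
  import psycopg2

  conn = psycopg2.connect("dbname=app")  # hypothetical database
  cur = conn.cursor()

  # A relational table with a schema-less JSON column: constraints and
  # joins where they help, flexible documents where the shape still moves.
  cur.execute("CREATE TABLE docs (id serial PRIMARY KEY, body json)")
  cur.execute("INSERT INTO docs (body) VALUES (%s)",
              [json.dumps({"type": "post", "title": "hello", "tags": ["db"]})])

  # Query inside the document with the JSON operators:
  cur.execute("SELECT body->>'title' FROM docs WHERE body->>'type' = 'post'")
  print(cur.fetchone())  # ('hello',)
  conn.commit()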


> I thought the main advantage of document-oriented schema-less databases (I haven't used MongoDB but I have used CouchDB) was that they are supposed to give more flexibility in the early stages of projects.

Depends on your requirements, I guess. If you want to maintain even eventual consistency, any update that crosses document boundaries has to be thought about very seriously in order to avoid breakages in the face of concurrent modifications. So, ideally you design your documents such that any one logical operation only has to hit one document.
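(E.g., a sketch with pymongo, with invented collection and field names: the comments live inside the post document, so "add a comment" is one atomic single-document update:)

  from pymongo import MongoClient

  db = MongoClient()["blog"]  # assumes a local mongod

  db.posts.insert_one({"_id": "post1", "title": "hello", "comments": []})

  # One logical operation touches exactly one document, so it is atomic
  # even without any cross-document transaction support:
  db.posts.update_one({"_id": "post1"},
                      {"$push": {"comments": {"author": "bob", "text": "hi"}}})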

This is pretty easy for very simple tasks, but can get rapidly more difficult in the face of more complex or changing requirements. I suspect a substantial proportion of Mongo or Couch based applications simply ignore this problem, and are lucky that they don't have enough concurrent activity that stuff breaks frequently.

Using Postgres/JSON neatly avoids this problem because you get ACID back, and you can do cross-document updates all you like.


>>[NoSQL] are supposed to give more flexibility in the early stages of projects

I agree with what you wrote.

But my point is that _changing_ requirements can screw you, if you find that you _later_ need relational capabilities.


Well, that's why I gave up on CouchDB once I found that I was actually much happier with the hybrid approach of using JSON documents inside a relational structure.


MongoDB is actually well suited to changing requirements.

Because it is schema less and document based you can trivially add new rich schemas to existing documents. And in the case of doing joins, you can always do them in the application layer or using DBRef or MapReduce. These are valid options that are still quite performant.
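(For the record, an application-layer join with pymongo typically looks something like this; collection and field names are invented:)

  from pymongo import MongoClient

  db = MongoClient()["shop"]  # assumes a local mongod

  orders = list(db.orders.find({"status": "open"}))

  # Application-level "join": collect the foreign keys, fetch the related
  # documents in one round trip, then stitch them together in memory.
  user_ids = list({o["user_id"] for o in orders})
  users = {u["_id"]: u for u in db.users.find({"_id": {"$in": user_ids}})}

  for o in orders:
      o["user"] = users.get(o["user_id"])  # what JOIN would have done for us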


"Because it is schema less ... you can trivially add new rich schemas" made me lol. Maybe this makes sense to Mongo users, but I have no idea what you mean here and the language you use to describe this feature is, well, contradictory to say the least.


It means a typo in release-15 auto-creates a whole new database without anyone noticing, and you sit around and wonder where all of the data went yesterday.


Oh man, the memories.

Just today I enjoyed explaining to a developer why his insert into a timestamp failed because of formatting, with a nice English error message. Thanks, Postgres!

Whereas the same thing last year with Mongo just inserted the wrong date into a misspelled key and we didn't figure it out for days.
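(The contrast in a nutshell, sketched with psycopg2 and pymongo; table, key, and connection names are invented:)

  import psycopg2
  import pymongo

  # Postgres: a malformed timestamp fails loudly, at insert time.
  conn = psycopg2.connect("dbname=app")  # hypothetical database
  cur = conn.cursor()
  cur.execute("CREATE TABLE jobs (started_at timestamp)")
  try:
      cur.execute("INSERT INTO jobs VALUES (%s)", ["not-a-date"])
  except psycopg2.DataError as e:
      print(e)  # clear English error message; nothing bad gets stored

  # Mongo: the same mistake, plus a typo'd key, is stored without complaint.
  db = pymongo.MongoClient()["app"]
  db.jobs.insert_one({"strated_at": "not-a-date"})  # silently "succeeds"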


Actually it does seem contradictory reading back on it. I meant it doesn't have a fixed, enforced schema like SQL databases and it is easy to add new data structures e.g. lists, maps, sets.


An interesting article and it does seem like people rushed to embrace NoSQL and are now trying to force it into some kind of consistency after the event - not a bad thing, incidentally - there's a lot of interesting work here (Vector clocks, CRDTs, etc.).

One thing that surprised me though was the lack of a key player: Amazon. Their Dynamo paper was hugely significant and as a company they use eventually consistent stores for a whole swathe of products at scale.

Why mention Facebook and Google but omit this other major player, especially as their experiences tell a different story?


Amazon estimated that each 1 ms of additional latency costs them a few million dollars a year, as well as a decreased rate of returning users. For low latency there is still nothing better than eventual consistency, so they may be driven by the bottom line. Also, from my experience of being a merchant on Amazon with hundreds of thousands of items in inventory that need to be updated almost in realtime as prices/stock change all the time, leading to a few million updates a day (in bulk), I can't envision how Amazon would be able to achieve fast imports from a horde of merchants like me on any SQL system.


Yeah, the Dynamo paper is very clear about the rationale behind creating Dynamo and its usage:

- downtime (lack of availability) is very expensive to them, both directly and also reputationally

- traditional ACID stores don't scale with writes whereas an eventually consistent store can achieve this

So I completely agree with you in short: Amazon have great reasons to pursue approaches based on eventual consistency.


Relational databases don't scale horizontally. That's still as true today as it was a decade ago. Therefore engineers who need to cope with Big Data and Web Scale will continue to migrate away from relational database solutions towards persistence technology centered around distributed systems. It's as simple as that.

If you build your app around a relational database and you need to scale up big, then at some point you're going to hit a brick wall in terms of scaling out storage and/or writes. You either have to build sharding logic into your relational DB app from the beginning (which is a pain that NoSQL saves you from), or else you have to re-architect your entire app when the time comes that you need to deal with scale. Many shops end up borrowing VC money to build out a team to re-architect their systems to handle web scale, but this can be avoided by thinking about data access patterns from the beginning and choosing a technology that can handle your future needs.

https://en.wikipedia.org/wiki/Scalability#Horizontal_and_ver...
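(For illustration, the hand-rolled sharding logic described above often starts out about like this; the shard DSNs and hashing scheme are hypothetical:)

  import hashlib
  import psycopg2

  # The shard map starts life this simple and then grows teeth:
  # resharding, replicas, routing, cross-shard merging...
  SHARD_DSNS = ["dbname=app_shard0", "dbname=app_shard1",
                "dbname=app_shard2", "dbname=app_shard3"]

  def connect_to_shard(customer_id):
      h = int(hashlib.md5(str(customer_id).encode()).hexdigest(), 16)
      return psycopg2.connect(SHARD_DSNS[h % len(SHARD_DSNS)])

  # Every query must now be routed by customer id; anything that spans
  # customers (and therefore shards) needs app-level aggregation.
  conn = connect_to_shard(12345)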


AdWords was run on MySQL up until two years ago. The vast majority of developers working on "web-scale" projects won't get to handle projects with larger requirements than AdWords. From that perspective the whole NoSQL trend makes very little sense, especially given the fact that e.g. PostgreSQL and MySQL have a lot of needed features that many NoSQL databases don't, and that the most "hip" NoSQL database a couple of years ago was a database that doesn't even scale that well (MongoDB).


AdWords was designed with sharding built into its application layer. This essentially means that the developers were forced to implement a custom app-specific persistence layer that rides on top of MySQL.

This limits your ability to write SQL queries because you can only query within a particular shard, and that decreases the usefulness of MySQL and makes it feel more like a NoSQL data-store. At that point you find yourself asking: why didn't we just use a NoSQL store? And oftentimes the answer is "because we wanted to use something we were already familiar with". Sometimes people are willing to add a lot more complexity to their application just to allow themselves to avoid having to learn something new.


"At that point you find yourself asking why didn't we just use a NoSQL store"

Because you still have the full ability to write general SQL within the shard. For many types of application this is useful.


That's a fair point. If you need highly relational queries within a given shard then using a relational database there can make sense, but for many apps the kinds of queries that are used are not highly relational in nature so using a NoSQL store could have reduced the need for all of that app-specific sharding engineering while still allowing for all the queries that were needed by the app.

I personally find that SQL relational databases are not flexible enough for my application, so I go with a polyglot persistence architecture that ties together a NoSQL key-value store with a graph database technology. This way I can use the Cypher query language to access data relationally across all my shards while scaling writes and big data horizontally.

To each his own.


"This way I can use the cypher query language to access data relationally across all my shards while scaling writes and Big-data horizontally"

That sounds pretty powerful. Do you mind sharing more information about the data storage/querying stack you use?


NDA and all of that, but if you pick up this book:

http://pragprog.com/book/rwdata/seven-databases-in-seven-wee...

and grep for the phrase "polyglot persistence" you'll find some great information to get you started.

a video promo for the book: https://www.youtube.com/watch?v=bSAc56YCOaE


Can you talk about what sort of queries you do in Neo4j?

Graphs are perhaps the only area where NoSQL may beat relational databases for expressiveness, rather than alleged scalability or ease of use. I've read the documentation for Neo4j and Cypher, but never used them in anger, so I'd be really interested to know more about how they are actually used.

In particular, I would love to be able to make a comparison between a graph database and a relational database queried using recursive common table expressions, in the context of a real-world problem.


The approach I like is to use a key/value document store as the system of record (SOR). If any data conflicts arise between Redis, the document store, and the graph database, then the SOR wins. A good SOR should contain all of the data necessary to rebuild any other data source in its domain.

Graph databases can scale write throughput fine when those writes are being fed to it from a single source, so it's best to have a service whose sole purpose is to keep the graph database in sync with the SOR. The graph should only store the data you actually need in order to get the queries that you want.

The specific queries in my domain are exactly the kind of thing I'm not supposed to talk about, but I will say you can do things like:

Give me 50 users who live near me (lat/long bounding box), who I'm not already following, who I have not already sent a message to, who have at least two friends who also live near me, and who have matching tags, ordered by number of followers.
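(Not my real queries, but a hedged sketch of what that sort of thing looks like in Cypher, run from Python with the Neo4j driver; the labels, relationship types, and properties are all invented:)

  from neo4j import GraphDatabase

  driver = GraphDatabase.driver("bolt://localhost:7687",
                                auth=("neo4j", "secret"))  # hypothetical server

  CYPHER = """
  MATCH (me:User {id: $me})
  MATCH (u:User)-[:FRIEND]-(f:User)
  WHERE u <> me
    AND u.lat >= $lat0 AND u.lat <= $lat1 AND u.lon >= $lon0 AND u.lon <= $lon1
    AND f.lat >= $lat0 AND f.lat <= $lat1 AND f.lon >= $lon0 AND f.lon <= $lon1
    AND NOT (me)-[:FOLLOWS]->(u)
    AND NOT (me)-[:MESSAGED]->(u)
    AND any(t IN u.tags WHERE t IN me.tags)
  WITH u, count(f) AS nearby_friends
  WHERE nearby_friends >= 2
  RETURN u ORDER BY u.followers DESC LIMIT 50
  """

  with driver.session() as session:
      for record in session.run(CYPHER, me="u1", lat0=51.0, lat1=52.0,
                                lon0=-0.5, lon1=0.5):
          print(record["u"])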

There are a bunch of great examples in this awesome book: http://www.amazon.com/Graph-Databases-Ian-Robinson/dp/144935...


Thanks. Your example sounds like something it would be pretty easy to express with an RCTE in a relational database. How the performance would compare, I have no idea.


Yeah, performance is the thing. You're correct that most of these queries can also be expressed in a relational DB.

The underlying data structures involved in Neo4J do help with performance on these kinds of queries, but in addition to that I find that for me the Cypher query language feels like a more concise and elegant way to ask for the data.

With RCTE it feels like I'm first asking the DB to construct a custom data structure, then I'm asking the DB to query that custom data structure.
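(For comparison, a toy RCTE in SQLite, which supports WITH RECURSIVE as of 3.8.3: walk a follows graph out to two hops. The table and data are invented:)

  import sqlite3

  db = sqlite3.connect(":memory:")
  db.executescript("""
    CREATE TABLE follows (src TEXT, dst TEXT);
    INSERT INTO follows VALUES ('me','a'), ('a','b'), ('b','c');
  """)

  # First build the reachable set (the "custom data structure"),
  # then query it -- the two-step feel described above.
  rows = db.execute("""
    WITH RECURSIVE reachable(uid, depth) AS (
      SELECT dst, 1 FROM follows WHERE src = 'me'
      UNION
      SELECT f.dst, r.depth + 1
      FROM follows f JOIN reachable r ON f.src = r.uid
      WHERE r.depth < 2
    )
    SELECT uid, depth FROM reachable
  """).fetchall()

  print(rows)  # [('a', 1), ('b', 2)]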

With Cypher it feels like I'm simply asking for the exact data I need by specifying the relationships and attribute characteristics that I care about.

This might just be a matter of taste, but if you start playing around with Cypher it just might grow on you.


Sure. But only if ALL the data within that query exists on the shard. It is rare that this would happen if you have anything resembling a normalised schema. In which case you would still be doing a lot of joins in your application layer.

IMHO Sharded MySQL very much belongs in the NoSQL camp.


I don't know that this is true.

Imagine you're LeanKit, or Fog Creek, and you run a kanban board as a service. Or a bug tracker, CMS, whatever. You have many customers, each of whom has no more than thousands of users and millions of items. There are many relationships between objects belonging to a given customer, but precisely zero relationships between objects belonging to different customers.

Shard using the customer identity as a key, and you have nicely spread-out data and the ability to do any query the application might need to, while still having a normalised schema.

There are plenty of other applications whose schemas have this property, or almost have it. In my company, we make financial applications, and a lot of the data has a very similar siloed ownership structure.

The one thing you can't do is reporting queries across your customers. That doesn't seem like a killer, though - it's normal to farm that stuff out to an offline reporting database even in single-server environments.


You do understand how MySQL et al are used in those cases, right? They are treated as dumb key-value stores and sharded horizontally, with joins done in the application layer. They are NOT your typical SQL deployment, and the features you talk about are often meaningless.

And you are 100% wrong about MongoDB not scaling well. The stories you hear of people switching are never about going back to PostgreSQL or MySQL; they are about going to the next level in scalability, e.g. HBase or Cassandra.


Like I said, because the relational database doesn't scale horizontally you are forced to build a sharding layer into your application, which introduces complexity into the application layer and limits your ability to use the relational features of the relational database. At that point you might as well just be using a key-value store that does the sharding for you and offers greater flexibility.


It's only a problem if your app needs cross-shard reporting. And things like Gearman can help you out with the application layer complexity of horizontal scaling.


There are still transactions within one shard, beyond plain SQL. I believe that ACID is the main benefit compared with a NoSQL store.


My NoSQL store of choice gives me document-level ACID semantics along with eventual consistency, MapReduce, and replication.

Here's an example that explains how to get transaction-like guarantees from this kind of NoSQL data-store:

http://guide.couchdb.org/editions/1/en/recipes.html https://en.wikipedia.org/wiki/BigCouch


Interesting, but that isn't a transaction, it is just a workaround, and I don't think you can design your app like that, treating documents as a transaction log and then using views to generate the real information.


That's exactly how I design the parts of my app that need a transactional nature. Map/reduce views make it a breeze to query for a consistent aggregate view of the world.

This is nothing new; it's a well-established technique from the relational world called "event sourcing":

http://www.martinfowler.com/eaaDev/EventSourcing.html

When everything is a write you don't have to worry about conflicts or locks, so that's nice; plus Couch is really good at scaling write throughput with small documents.
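(A toy sketch of that write-only, view-derived style in Python; a list stands in for the document store and a dict for the map/reduce view:)

  # Every change is an append-only event document; nothing is updated in
  # place, so concurrent writers never fight over a document.
  events = [
      {"account": "alice", "delta": +100},
      {"account": "alice", "delta": -30},
      {"account": "bob",   "delta": +50},
  ]

  # The "view": map each event to (key, value) and reduce by key, the way
  # a Couch map/reduce view folds events into current balances.
  def balances(evts):
      view = {}
      for e in evts:                                                    # map
          view[e["account"]] = view.get(e["account"], 0) + e["delta"]  # reduce
      return view

  print(balances(events))  # {'alice': 70, 'bob': 50}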


Urban Airship switched from MongoDB to PostgreSQL.

I see they have since switched from PostgreSQL to Cassandra, though.


The usual thing, and I believe what was done with AdWords, was application-level sharding. This is in effect implementing your own database using MySQL as a glorified key-value store (you will NOT be doing any interesting joins, and you will not have cross-shard transactions unless you do those yourself too), and is thus not really a great argument for relational databases.


Sharding and SQL are orthogonal.

However, SQL doesn't make the problem of handling distributed databases magically disappear.

An example of a distributed, horizontally scalable database supporting strong consistency and offering an SQL interface:

http://static.googleusercontent.com/media/research.google.co... and a layer above: http://static.googleusercontent.com/media/research.google.co...

It still requires the users to carefully organize their data, according to a hierarchical data model provided by the database.

There is a mapping between a hierarchical relational model and a column store model. (EDIT: see one possible mapping in http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf)

The article skims over this aspect, as if all that matters is the syntax of the query language.


Oracle RAC does scale horizontally. Sort of: it requires putting your data in a SAN, but that is mostly horizontally-scalable as well.


I'm far from an expert on this topic, but I am a developer, and I am currently working on a project that leverages both MySQL and ElasticSearch for storage. MySQL handles all of the "boring" data such as users, profiles, comments, etc. ElasticSearch is basically a giant product database: no real relational data, just a normalized structure that is easy to query. ElasticSearch is serving as the primary data store; there is no backing database because ES is just that good.

Whenever I see these "SQL vs NoSQL" arguments, I always have to wonder: Why one over the other? A lot of projects can benefit from both and there's no reason you absolutely HAVE to use one or another. It's perfectly reasonable (and probably ideal) to use more than one storage system in your projects.

If you have a bunch of nails to hammer in and bolts to tighten, you don't choose just a hammer or a wrench to do that job...you grab both and use each for what they do best.


> ElasticSearch is serving as the primary data store; there is no backing database because ES is just that good.

And you've been running this application in production how long?


I agree that using Elasticsearch as a primary data store seems risky. The situation is improving though; the v1 release introduces easy backups (for example, to s3) which from my experience work very well.

Prior to this, backing up was a messy process.


NoSQL discussions always seem to conflate three very different things: storage engines, APIs and architecture. Where do we store the data? How do we access it? How do we make sure it scales?

The "traditional" approach is to use Oracle/SQL Server/MySQL for storage, SQL and/or ORM as an API, and single-server tables-with-relationships as an architecture. Back in the early 2000s, everybody did this. Sure, there were a few performance-minded exceptions that went with sharding or master-slave architectures instead, but those were exceptions.

And single-server architectures tend to behave badly at medium loads. Spend the market rate for a genius DBA, and they still behave badly at high loads. The next step is a 32-core 128GB RAM monstrosity that costs an order of magnitude more than what eight 4-core 16GB servers would cost.

Most NoSQL solutions came with a new architecture. You had the MongoDB flavor of distributed storage, or the BigTable flavor of distributed storage, or the CouchDB flavor of distributed storage, and so on. Properly implemented distributed storage eats high loads for breakfast: just add more servers. This is a good thing.

My issue with the NoSQL movement is that they threw away the baby with the bath water. They threw away the single-server relational architecture, which was a nice change, and they also gave up the old battle-hardened storage engines and the highly expressive SQL language and replaced them with only-recently-experimental engines and ad hoc lean APIs.

It takes time for a storage engine to mature. To have all its performance kinks ironed out and all its bugs smoked out. I still remember the brouhaha around MongoDB persistence guarantees, or the critical data loss bugs in CouchDB.

And the lean APIs just forced all the querying logic back into the application, with all the filtering and the manual indexing and the joins and the approximate but ultimately incorrect implementations of whatever subset of ACID was required at the time. This wasn't an entirely bad thing: it certainly made many developers aware of the performance implications of some joins or transactions. But when you need to write a JOIN or GROUP BY or BEGIN TRANSACTION that you know will scale properly, and there's no API support for it? Feh.

I'm a huge fan of the CouchDB architecture. Distributed change streams, with checkpointed views and cached reductions. But I have been burned by the CouchDB storage engine (can you say "data corruption"?) and I see no point in bending knee to the laconic CouchDB API. So I took the CouchDB architecture and reimplemented it with a PostgreSQL back-end. It's _faster_ (don't underestimate the cost of those HTTP requests), I have trust that after PostgreSQL's decade-long history all threats to my data are long gone, and I can always whip out an SQL query when I do need it.

It's nice to see so many NoSQL solutions migrating back to an SQL-like API and gaining enough maturity to keep your data safe. In the near future, I expect them to be nothing more than "Architecture in a box" solutions for when you don't want to implement specific architectures in SQL. And I expect more and more "architecture plugins" to become available: with a library, turn a herd of SQL databases into a distributed architecture of type X.


Single-server relational backends can survive loads well in excess of what most of us will ever see for our software. There are a handful of truly web-scale companies that need specialized engineering, and the time to solve for that is when you actually have that problem in sight. Conforming SQL engines can do some amazing things with simple and declarative code; giving that up in the hopes you might be Twitter-scale one day is, in my opinion, a quite poor tradeoff.


I certainly agree that most people who pick NoSQL solutions "for scalability" never add a third server to their cluster.


Amazon offers one to eight 800 GB SSD drives in their i2 instances. RAIDing them together gives you pretty awesome performance and space.


> I took the CouchDB architecture and reimplemented it with a PostgreSQL back-end

Can you provide more details on this? Is there a public repo?


You can check it here: https://github.com/RunOrg/RunOrg/tree/master/server/cqrsLib I'm sorry for the state of the official site, there will be one soon.

A `Stream` corresponds to change events (I'm not sure these even had an official name in CouchDB).

A `Projection` corresponds to a design document.

The various `View` implementations are optimized for various aspects of CouchDB views, with `MapView` being the literal equivalent of a CouchDB view. Except they can be chained (you can apply a map to a map).

Unlike CouchDB, views are evaluated eagerly, though the `HardStuffCache` (and other planned `Cache` implementations) are evaluated lazily on a per-document basis.


I think you hit the nail on the head. One other thing I'd add is that these new tools forced developers to build tooling around their databases, too. No more ad-hoc query system (like SQL Developer, or the many fine web tools) where you could explore the data visually or by issuing queries. Since the underlying data formats were entirely domain-specific, building these kinds of general tools became much harder.


I think you're understating the capabilities of SQL databases. Stack Overflow runs off a single dedicated MS SQL Server machine: http://meta.stackexchange.com/questions/10369/which-tools-an...


Your link mentions three servers with 384GB RAM, 16 disks in RAID10, and 16 cores each. Not very far from my 128GB 32-core example :)


How do people who abstract away their database via an ORM feel comfortable about not dealing directly with the data, and about accidentally dropping a DB column via the ORM? I know when I am dealing with the database I am a lot more surgical in my approach than when I am writing code.


Schema changes are often (normally?) done using a separate mechanism and operations called migrations rather than in the main code.

In a production environment they should be well tested before being deployed to the live system. That, plus backups, should help prevent calamities.

There are still possibilities of deleting data, whether through the ORM or direct SQL; both could easily be accomplished, so test before deployment and keep regular backups in case of disaster.


The same way I feel comfortable that I won't accidentally write and execute a SQL script that does it. ORM doesn't make it especially trivial to corrupt your database.


"dropping a db column via ORM" - prevent this via security configuration, release schema changes to production via plain SQL. ORM is not a complete replacement for dealing with DBMS.


You generally only let the ORM create/update your schema during early development. By the time you get to production (or heck, by the time your team is even seriously engaged), you will have disabled that feature. In pretty much every enterprise-grade ORM I've ever seen, schema modification is disabled by default, and must be explicitly turned on. In most Java shops where I've ever worked... unless it's a quick prototype, the database schema will be developed before you start coding anyway. Also, in a typical enterprise scenario, the username with which you configure the ORM lacks admin-level privileges to alter the database.

Most of the criticisms of ORMs come from people who have never used them, beyond maybe working through a Rails chat-room tutorial once upon a time. It really has nothing to do with "abstracting away the database".

As the name indicates, Object-Relational Mapping is merely about reducing the boilerplate required to map a relational schema to programming language objects. If you do that mapping by hand, then you have to make decisions when a table/object has relationships. Picture a CUSTOMER table, which has a foreign key relationship to an ADDRESS table. When your application loads a "Customer" object:

[1] You could "eager fetch", meaning that you go ahead and retrieve all of the ADDRESS rows related to that CUSTOMER, and attach the Address objects to the Customer object. Eager fetching is wasteful and leads to poor performance, because you're hammering your database for values that you often don't ever use.

[2] You could "lazy load"... meaning that your Customer object has an "addresses" field, but you wait until some code tries to use that field before you actually query the ADDRESS table to populate it. This is much better design, but complicates things. The lazy load logic has to go somewhere. You either have to ensure that every piece of code using that object is aware of the lazy load pattern, or you have to stuff database logic into the "getter" method for each lazy-loaded field.

ORMs give you highly performant lazy loading, without the buggy boilerplate suggested by #2 above. Moreover, enterprise-class ORMs typically handle caching for you, to avoid hitting the database unnecessarily, monitoring the state of objects to notice when they've gone stale due to changes on the database side, etc.

Lastly, for complex queries, most ORMs have a query language (e.g. JPQL, HQL, etc.) that is nothing more than a VERY THIN wrapper around SQL. It merely smooths out differences between various vendor dialects. You're not abstracted away from the database; you still very much need to understand SQL and the underlying structure of your data.
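(A bare-bones sketch of the lazy-load pattern from point [2], hand-rolled in Python to show the boilerplate an ORM generates for you; fetch_addresses is a hypothetical stand-in for the real query:)

  def fetch_addresses(customer_id):
      # Hypothetical stand-in for "SELECT * FROM address WHERE customer_id = %s"
      return [{"street": "1 Main St", "customer_id": customer_id}]

  class Customer:
      def __init__(self, customer_id):
          self.customer_id = customer_id
          self._addresses = None  # not loaded yet

      @property
      def addresses(self):
          # Lazy load: hit the ADDRESS table only on first access, then
          # cache. An ORM emits this (plus caching, staleness tracking,
          # identity maps...) for every mapped relationship.
          if self._addresses is None:
              self._addresses = fetch_addresses(self.customer_id)
          return self._addresses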


"you still very much need to understand SQL and the underlying structure of your data"

That's good to hear! Though in my experience of interacting with people who have learned development in the last 3-5 years, many of them are very oblivious to basic database concepts. They mostly think in terms of the objects in the code, relying on the framework to take care of the database. It might even generate the SQL that they just need to execute. But anyone who's worked with a production database from before the advent of frameworks like Rails knows that the only thing worse than messing up an SQL statement is running auto-generated SQL.


I think your characterisation of eager vs lazy fetch is oversimplified. Lazy loading can be a source of substantial performance problems as well, since doing larger operations that do hit a lot of data causes enormous chattiness with the DB, causing you to lose a bunch of time to network I/O. Lazy loading was the source of a lot of early 'ORMs are slow' arguments.

A good ORM will likely default to lazy loading but give you good options for eager retrieval when you know you're going to need it.


And what do you do if in one scenario you need lazy and in another scenario you need eager fetching? Create two separate mappings and two separate class hierarchies to hold them? Ignore the problem and pray no one will notice it? What do you do if you have already closed the DB transaction, and you need to access a lazy collection?

In my experience, this lazy/eager choice is the main reason I don't like rich ORM mappings. I usually don't map collections and don't map foreign keys. If I need to fetch something, I just do it. If I need to return the result of a complex query, I just define a datatype for it. In general, my ORM just converts a recordset to an object, and that's it, besides simple record-level CRUD operations. One-record operations like findByPK or deleteByPK or updateByPK are fine, but for anything else I prefer writing SQL, instead of suffering with some ORM-specific SQL replacement or, $deity forbid, some API. SQL is as good a DSL as you can get for RDBMS access, and is fairly portable between different programming languages and ORMs, so no reason not to use it. But many


Yeah, ORM is really about portability. Your app should be able to plug into whatever database you have drivers for.

And it's also much more convenient to call a find() method than to concatenate together a (vendor-specific) connection string and query string, and iterate over a result set object.


You really should not be doing that. Databases should be nothing but ice-boxes. That's why ORMs. It's a bad practice to spend more time tending to your DB than your GUI. And I'm pretty sure it comes directly from using SQL and stuff like that.


This is exactly wrong for the majority of systems. Your application is ephemeral. The data the application works with might last until after you're dead.


Being more surgical is not the equivalent of spending more time.


>> NoSQL is nothing more than a storm in a teacup

There's been a difference with this "technology cycle" though: there have been some prominent, well-grounded voices from the start of this cycle.

That's a good thing.

That wasn't the case for other cycles - the thin vs fat client cycle, the DAS vs NAS cycle etc. etc.

EDIT: I'm not saying NoSQL has no application. I earn a paycheck working with a huge graph DB (and it's nothing to do with "social", yay!), and have previously been a heavy user of Cassandra.


> NoACID

Perfect!


NoSQL has always been a stupid term, especially as there are "NoSQL" datastores that support SQL, e.g. OrientDB.


The term was perfect. The spectrum of NoSQL implementations out there defies an easy name to cover them all. They aren't all key-value stores, they aren't all json stores, they aren't all big tables, etc.

Indicating that they were NOT within the current paradigm of SQL databases was the only way to go.


The term is borderline moronic!

SQL is a QUERY language. NoRDBMS would be slightly less moronic...

Thbbft!


Why is it a good idea to lump such technologies together, when many of them are fundamentally different? Do we call road vehicles that aren't cars NoCars?


Ironically, we kind of do, at least in the US: we call them trucks. SUVs are a bit of an anomaly, but I've heard plenty of people colloquially call our Explorer a truck.


You call motorbikes, buses, and bicycles trucks?


It's stupid if you don't realize it stands for "Not only SQL".


Finally a good writeup that explains the strange phenomenon of NoSQL. Now, all that's left is to introduce a tax for "drinking the Kool-Aid".


We are stuck with NoSQL in HTML5, because Mozilla and Microsoft refuse to implement WebSQL (http://en.wikipedia.org/wiki/WebSQL )

IndexedDB is fine for storing JSON objects, etc., but a relational database with SQL query syntax, indexes, etc. is more powerful and means less code to write. With IndexedDB one has to reinvent the wheel just to get basic query features.

WebSQL is not deprecated; the W3C Working Group Note actually says:

  'This specification is no longer in active maintenance 
  and the Web Applications Working Group does not intend to 
  maintain it further'.
WebSQL is only available in WebKit-based browsers (Safari, Chrome), which means most mobile browsers. As SQLite is in the public domain, no company would "lose face" if they chose to use it. They could fork SQLite and change the SQL query syntax (parser) to whatever the W3C finds suitable. https://www.sqlite.org

Mozilla Firefox and FirefoxOS have both shipped SQLite for years already, and it can be accessed via an internal JavaScript API. And several Microsoft products already use it anyway (e.g. Forza Xbox games). Microsoft of course also has various other SQL database libraries like MS Access JetRed, MS Outlook JetBlue and SQL Express.

We had a discussion about it recently: https://news.ycombinator.com/item?id=7574754

The new hip thing is "NewSQL" (http://en.wikipedia.org/wiki/NewSQL ). For example, Facebook, Google Ads, etc. are powered by MySQL's InnoDB database engine. I would go as far as to count SQLite in this group.

We would need a movement to convince Mozilla to finally add WebSQL to Firefox and FirefoxOS.


There are good reasons why work has stopped on the WebSQL specification — everybody was using SQLite, and the specification can't be tied so closely to a single implementation of SQL. The specification even has the line "User agents must implement the SQL dialect supported by Sqlite 3.6.19"[1].

1. http://www.w3.org/TR/webdatabase/#web-sql

Edit: here is Mozilla's rationale for not supporting WebSQL: https://hacks.mozilla.org/2010/06/beyond-html5-database-apis...


While that looks ridiculous at first glance, one of the important things about the HTML5 spec was reverse-engineering to describe what browsers actually do. I'd be more worried if it said "the latest stable SQLite"; referring to a frozen version is only being honest about expectations.

I believe SQLite caused issues with MS, but Mozilla was more philosophically opposed to SQL as being legacy tech. Which is understandable, but unfortunate when the standard had momentum and SQL is so well understood.


One of the things that's well understood about SQL is that interop is terrible. No one expects to write anything substantial using Postgres and be able to port it over to MySQL or MSSQL without a serious amount of effort.

On the backend, that's more or less OK, although one of the value propositions of ORMs is that they make it somewhat easier to change databases later by reducing the amount of product-specific SQL that has to be rewritten. On the web, however, it's totally unacceptable. Sites are expected to keep working between multiple browsers and across multiple browser versions. To achieve that level of interop would have required someone to sit down and write a detailed specification of the dialect of SQL that should be supported on the web, and for browser vendors to take the time to implement it. In practice people weren't willing to spend the time doing either of those things; the level of buy-in was enough to fork sqlite but no higher.

Eventually the web might get something like WebSQL. But the way it will happen will be for people to implement SQL in js on top of IndexedDB. If that proves popular but has issues (e.g. with performance) then there will be renewed impetus to do a proper job of a WebSQL specification.


There are well-respected standards such as SQL-92. They could also just take the SQLite specification and use it.

Many SQL databases support multiple SQL dialects.


I fail to see how that's any different than the current case with JS and how we use shims to normalize the usage to a sane, concise API until it stabilizes (if it ever does). Do we really expect there won't be umpteen shims for WebSQL to normalize any syntax and feature inconsistencies?


No one expects to write something in Python and port it over to C# without a serious amount of effort either. I don't think changing databases is more common than changing programming languages.


Which is why web standards often specify javascript APIs, and don't generally say "this should be available in a language that's...well, mostly imperative or object oriented, and maybe there can be some functional features, I guess..."


SQLite implements the SQL-92 standard (http://en.wikipedia.org/wiki/SQL-92 , http://en.wikipedia.org/wiki/SQLite#Features ).

Firefox certainly ships with SQLite and even their IndexedDB implementation is based on top of it internally (the irony): https://plus.google.com/+KevinDangoor/posts/PHqKjkcNbLU

Microsoft has MS Access JetRed, MS Outlook JetBlue and SQL Express SQL Databases that ship already with Windows and/or Office and are SQL-92 compatible. It should be trivial for them to figure out which SQL DB engine would fit best and integrate it in IE 12.


First sentence of http://en.wikipedia.org/wiki/SQLite#Features : "SQLite implements most of the SQL-92 standard for SQL but it lacks some features."


[deleted]


Easy example: write a webapp that relies on SQLite's lack of type safety, then encounter a browser that implements your vision of SQL but using a type-safe backend. It'll break.
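(Concretely, with Python's bundled sqlite3; only the table name is invented:)

  import sqlite3

  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE t (n INTEGER)")
  db.execute("INSERT INTO t VALUES ('not a number')")  # SQLite accepts this

  print(db.execute("SELECT n, typeof(n) FROM t").fetchall())
  # [('not a number', 'text')] -- a strictly typed engine would reject
  # the INSERT instead, and an app relying on this behaviour breaks.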

This is why for this it's important implementation and spec are separated.

Further, the trend for in-browser APIs is for them to be lower level than previously (see also WebGL, asm.js, etc.). You could build an SQL abstraction over IndexedDB if you really want SQL querying, but people have noticed that the killer property for in-browser DB libs is going to be replication and syncing after having disconnected - something a traditional SQL DB is not going to help with. For this reason PouchDB is closer to the API devs will actually be using day-to-day in web apps.


I assume that for an embedded DB, MS would be using a variant of Microsoft SQL CE, which is their counterpart to SQLite. And it's terrible.


SQL CE is no longer under active development.

Most Microsoft tools that need an embedded engine use ESE (Jet Blue), which is bundled with Windows.

For third-party desktop developers, the replacement is SQL Express LocalDB. For WinRT developers, Microsoft has paid the SQLite developers to port it over, so that it passes Store validation.

Windows Phone still uses SQL CE, but the only interface to it is LINQ-to-SQL. So Microsoft could conceivably swap out the backend at any time.


Safari are finally adding IndexedDB, which means for the first time we have a unified storage interface across all the browsers.

IndexedDB is kinda nasty to work with directly, but there are fairly good libraries on top of it. http://www.taffydb.com/ looks great for complex queries. One I work on, http://pouchdb.com, provides syncing (it also works with WebSQL).

WebSQL is also pretty nasty to work with; hand-crafting SQL strings in the browser just feels weird, for one. Even yesterday I had a bug filed which related to the internal SQLite implementation (the entire reason WebSQL is a no-go is that it has one implementation; Richard Hipp would then own web storage).

I haven't seen much support for WebSQL; even among those of us who actually use it, most are just waiting for it to die. IndexedDB in Safari is that final nail.


"Safari are finally adding IndexedDB, which means for the first time we have a unified storage interface across all the browsers."

localStorage has been well supported across all browsers since IE8... Its interface IMO is a perfect fit in JavaScript... with the one exception of not being callback-driven...


Yeah, sorry, not technically correct. While localStorage is great for quickly storing small items, due to having a sync API and having to encode/decode complex data it's a non-starter for more data-driven applications, so I always automatically exclude it.


If you're doing accounting, sure, you would want an ACID-compliant database. There you have a limited number of kinds of data to store, with strict and rarely changing constraints. You can keep it on one big expensive server (plus backup), and it's better to bring the system down than to allow inconsistency.

However, for most web development SQL is seriously not good. You end up with hundreds of loosely coupled tables which constantly change their structure for new features. Half a dozen joins on every request. Hundreds of lines of SQL. Constant pain. And it's not like you cared so much for the consistency - if once a year three comments disappear from your web site, so what?

And it's painful to make SQL multi-master.

For (most of) web development, document databases are so much better. MongoDB is pretty nice because it makes hundreds of lines of SQL and ten files of code redundant - all for one complex document.
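(What I mean, sketched with pymongo and invented field names: the product page below would be several joined tables in SQL, but here it reads and writes as one document:)

  from pymongo import MongoClient

  db = MongoClient()["site"]  # assumes a local mongod

  # One complex document instead of product + specs + reviews tables
  # and the joins needed to reassemble them on every request:
  db.products.insert_one({
      "_id": "phone-x",
      "name": "Phone X",
      "specs": {"screen_in": 5.0, "ram_gb": 2},
      "reviews": [
          {"user": "alice", "stars": 4, "text": "solid"},
          {"user": "bob",   "stars": 5, "text": "great"},
      ],
  })

  # The whole page renders from a single lookup, no joins:
  page = db.products.find_one({"_id": "phone-x"})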





