Postgres 9.2 will feature linear read scalability up to 64 cores (rhaas.blogspot.com)
262 points by chanks on April 3, 2012 | 107 comments



As much as I love PostgreSQL (and I do; it's put food on my table for the last decade), I have to stress that this linear-ish scalability needs both pg >= 9.2 and a Linux kernel >= 3.2. It seems to be a combination of the lseek(2) changes in the kernel, and the lock contention/handling changes in the db.

That is: if you're running on an older kernel, you probably won't see quite as much gain.


We upgraded from 2.6 to 3.2 recently and our Postgres has been flying ever since. The scheduler changes (and presumably the lseek changes) make a huge difference in load. We have not done any performance timings, though, so we don't know if the changes translate to better performance.

To be specific, the change reduced read I/O (http://i.imgur.com/L8NWO.png) and load average (http://i.imgur.com/7793A.png) both by an order of magnitude, and the variance is much tighter than before. (The system is a dual Xeon quad-core X5355/2.66GHz with 32GB RAM and RAID5.)

That improvement was fairly miraculous — a factor of 10x just by upgrading a kernel is not something that happens every day. Still, I would not be surprised if Postgres 9.2 pushes performance even higher.


There is another change of note in Linux 3.2 which could matter for PostgreSQL: the major changes to how writeback of dirty pages works.[1] In your case, though, the changes were mostly to read performance, so I do not think the writeback changes mattered here.

1. http://kernelnewbies.org/Linux_3.2#head-fbc26b4522e4e990a9ea...


This is going to sound pathetic and off topic, but do you have pointers toward any good materials on database performance benchmarking? Your screenshots display exactly the kinds of presentation I am after, yet I've never managed to progress beyond windows performance monitor and a pile of unorganized spreadsheets. I keep getting more and more DBA work dumped on me and it seems like somewhere I've missed the principles of the art of DBA (if there are any). Other than some Celko and Date, everything I come in contact with has market-speak and corporate buzzwords written all over it.


Greg Smith's PostgreSQL 9.0 High Performance [1], Cary Millsap's Optimizing Oracle Performance [2], and - a pale third - Davidson and Ford's Performance Tuning with SQL Server Dynamic Management Views [3]

The first two are bottom-up, structured approaches to benchmarking low-level system performance, with an emphasis on *nix, building up to database performance characterization and investigation. Despite their names, both have a lot of great general, non-product-specific use.

The third I include because it is the only MSSQL-specific book I have on the subject, and it sounds like you're in Windows Land. It has some real gems, but little coverage of the OS or methodology. I cannot over-emphasize how important that methodology is.

Make sure you learn how to use Resource Monitor and SQL Profiler.

If you want to chat about it more, I'm justinpitts at google's mail.

[1] http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Sm... [2] http://www.amazon.com/Optimizing-Oracle-Performance-Cary-Mil... [3] http://www.amazon.com/Performance-Tuning-Server-Dynamic-Mana...


That graph looks like it was generated by rrdtool (http://oss.oetiker.ch/rrdtool/). If you are looking for monitoring of stats like system load, disk io, etc, check out cacti (http://www.cacti.net/). It has quite a few plugins to monitor other stuff as well. It also works for remote servers via snmp.


For monitoring performance of Linux machines I recommend using Munin. It has excellent PostgreSQL plugins, most of them written by a member of the PostgreSQL core team (thanks, Magnus!).

Cacti is also supposed to be good, but I have never used it myself.


The graphs are produced by a web frontend (I forget the name) that reads from Collectd data. Collectd runs on each box and collects performance data that can be graphed on a central box.

We used Munin previously, and I must admit that Collectd feels like a step backwards. Both are very primitive, so we might look for something better soon, perhaps Graphite.

I recommend migrating to Linux (or one of the BSDs) if you want to get serious about system administration. The wealth of mature tools is vastly superior.

The Postgres book by Greg Smith is indispensable if you want to tune a Postgres installation.


One thing to be careful of when producing benchmarks for databases is making sure you avoid artificial testing scenarios that don't match your actual workload.

If you don't already, understanding your concurrency and read/write patterns will go a long way to help you identify which benchmarks are meaningful for you.
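One way to see why the read/write mix matters is to time the same database under two different mixes. The sketch below is purely illustrative (it uses SQLite as a throwaway stand-in, and all table and column names are invented), but the same idea applies to any database you benchmark:

```python
import random
import sqlite3
import time

# Illustrative only: time two different read/write mixes against a throwaway
# in-memory SQLite database, to show that the mix you test with should match
# the mix your application actually produces.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [(i, 100) for i in range(10_000)])

def run_mix(write_fraction, ops=5_000):
    """Run `ops` operations, a `write_fraction` of them being updates."""
    start = time.perf_counter()
    for _ in range(ops):
        i = random.randrange(10_000)
        if random.random() < write_fraction:
            con.execute("UPDATE accounts SET balance = balance + 1 "
                        "WHERE id = ?", (i,))
        else:
            con.execute("SELECT balance FROM accounts WHERE id = ?", (i,))
    con.commit()
    return time.perf_counter() - start

print("90/10 read/write mix: %.3fs" % run_mix(0.10))
print("10/90 read/write mix: %.3fs" % run_mix(0.90))
```

The two timings will generally differ, which is the point: a benchmark that uses the wrong mix measures the wrong thing.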


That's a tremendous improvement. Obviously I don't know the particulars of your situation, but are you planning to move to more current hardware? The 5300-series Clovertown was a 65nm part from the 2006 era. I'd be curious about the effect of using the 3.2+ kernel on the new E5-2600 Sandy Bridge processors.


We have a new cluster that's intended to replace the current one, which as you point out is running on 2006 hardware. (It still performs extremely well, however.)

Those numbers would presumably translate pretty well to newer CPUs as the numbers reflect the change in read I/O. Newer CPUs would handle the load faster, but the read traffic would be the same.


what distros are offering a 3.2 kernel?


Ubuntu 12.04 will, when it's released later this month.


Debian Backports has a 3.2 kernel for Squeeze.


Fedora 16 is on 3.3.0.


Arch, which should be moving to 3.3 soon. Obviously not really a server distro, but still an option if you really want/need bleeding edge.


opensuse tumbleweed (install 12.1 and then follow instructions at http://en.opensuse.org/Portal:Tumbleweed).

not as stable as a standard release though, in my experience (which was a while ago - stability is an important aim for the project, so they may have got better).


The soon-to-be released Ubuntu 12.04.


Gentoo.


Debian testing.


At my last company, one of the biggest complaints against PostgreSQL (when it was being compared to Oracle) was that it "could only scale to 20 cores", which made it unsuitable for "enterprise use". Nice to see that non-issue removed.

Hey Rosser! Didn't know you hung out here.


Shane, I think most of the complaints about postgres at said company were made out of a pre-existing bias towards the expensive commercial solution, rather than against the (at the time, admittedly) somewhat less capable FOSS project. When I was asked about vertical scalability, I told them "16-20 cores right now, but at the rate they're improving things, it'll be 32, and then 64 in the next 3-5 years." Nice to see things tracking so closely with my predictions.


The lseek scalability issue was not very noticeable on PostgreSQL 9.1; it only became obvious after work had started on fixing the scalability problems in PostgreSQL itself.

http://rhaas.blogspot.se/2011/08/linux-and-glibc-scalability...


Does that mean that FreeBSD will become a bit of a second-class-citizen with the release of 9.2? Or are similar optimizations available and planned there?


The llseek scalability issue was caused by the Linux kernel taking a lock on a data structure when no such lock was necessary. While it is possible that FreeBSD has the same problem with llseek, I see no reason to assume that is the case.

FreeBSD probably works as well as Linux >= 3.2 when it comes to llseek, but it might have some other scalability issue which is not present in Linux.

EDIT: Here is a link to the Linux kernel patch, PostgreSQL uses SEEK_END to check the file sizes.

https://lkml.org/lkml/2011/9/15/401
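The check in question can be sketched in a few lines. This is illustrative Python, not PostgreSQL's actual C code: seeking to SEEK_END returns the file's size without reading any data, and it is this cheap-looking call that the pre-3.2 kernel serialized behind a lock.

```python
import os
import tempfile

# PostgreSQL determines a relation file's size with lseek(fd, 0, SEEK_END).
# On Linux < 3.2 every such call took a kernel lock, which became a
# bottleneck under high concurrency.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 8192)   # pretend this is one 8kB Postgres page
    path = f.name

fd = os.open(path, os.O_RDONLY)
size = os.lseek(fd, 0, os.SEEK_END)  # offset of end-of-file == file size
os.close(fd)
os.unlink(path)
print(size)  # 8192
```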


Database vendors typically charge by the number of cores, and 64 cores can get really expensive with database licenses. Hardware costs have decreased drastically over time, but database licensing is still in the dark ages. Postgres 9.2 has a real competitive advantage here. Hopefully it will force the other vendors' licensing costs down.


Evidence:

http://www.oracle.com/us/corporate/pricing/technology-price-...

"The number of required licenses shall be determined by multiplying the total number of cores of the processor by a core processor licensing factor specified on the Oracle Processor Core Factor Table"

http://www.oracle.com/us/corporate/contracts/processor-core-...

They give you a big discount for buying Sun servers (.25 factor). Either way, it's a huge amount of money, the standard edition costs a cool $17,500 per processor so with 64 cores and the best .25 multiplier you're still looking at 16 x $17,500 or $280,000 for the DB processor license (that doesn't cover support or anything else). The Enterprise edition runs an astounding $47,500 per CPU, so you can easily run north of a million dollars per server if you're running a lot of cores.
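The arithmetic above, spelled out (using the list prices and core factor quoted in this thread; this is an illustration, not an Oracle quote):

```python
# Illustrative Oracle licensing arithmetic using the figures quoted above.
cores = 64
core_factor = 0.25           # best-case core factor (Sun hardware)
std_per_processor = 17_500   # Standard Edition list price, USD
ent_per_processor = 47_500   # Enterprise Edition list price, USD

processors = int(cores * core_factor)    # 16 licensable "processors"
print(processors * std_per_processor)    # 280000
print(processors * ent_per_processor)    # 760000
```

With a less favorable core factor (0.5 is typical for x86), the Enterprise figure doubles, which is how a single big server crosses the million-dollar mark.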


Oracle tends to be pretty opaque in their pricing, and part of that is because with any sale this big, there's always going to be a lot of negotiation.

Sure, it's going to be expensive, but only schmucks pay full price for a 64-core license.

Still, it's good to see the best open source database out there delivering cutting edge performance. Great work!


Yes, you can get a big discount. Then when your business booms and you need to expand, upgrade or (for some licenses) renew, Oracle gets to decide how much of your profit to take off the table in licensing fees, up to the full list price.

But of course I'm speaking out of school here. I have no proof Larry Ellison or his minions will pursue such ruthless business tactics. Just a feeling :)


Microsoft SQL Server used to be licensed per socket and not per core (and they used that to market their product against Oracle). Guess what? The new 2012 version will be licensed per core too.


To me, Postgres is the most underestimated database. Not sure if this is a bad thing...


Let me add another database that's "underestimated" (by mainstream corporate America): SQLite3.

SQLite is fast, small, portable, easy & simple to maintain and backup, AND reliable. And unless you are running a high traffic site (or application) it could handle everything a small (even medium) business would need.
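The "easy to maintain and backup" point is concrete: the whole database is one file, and a consistent online backup is a single API call. A minimal sketch (the table and names are invented; the backup API is Python 3.7+):

```python
import os
import sqlite3
import tempfile

# The entire database lives in one ordinary file.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO customers (name) VALUES ('Acme Corp')")
con.commit()

# Online backup: safely copies the live database, even while it is in use.
backup = sqlite3.connect(db_path + ".bak")
con.backup(backup)
print(backup.execute("SELECT name FROM customers").fetchone()[0])  # Acme Corp
```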

Why small companies get talked into running MSSQL or Oracle or MySQL is beyond me. And even if (and that's a big IF) they needed more "power", there's Postgres.

PS: Sorry for hijacking this thread. I'm a big fan boy of both SQLite and Postgres.


SQLite is a great embedded database, but is deliberately not designed for replacing a 'full' database server. It doesn't support highly concurrent usage, it can't be run as a network service, it has a comparatively weak type model.

All of these differences are actually assets for the embedded DB market. They could be fixed, but you would end up with a database that was a winner in neither space.

I think the killer problem with SQLite for businesses is that it essentially locks their data inside the application. With a full SQL server, the data is trivially exposed for use / integration with other systems.


From the description and benchmarks, it seems that the 'SQL API for Berkeley DB', as a drop-in replacement for SQLite, should provide an easy fix for many SQLite-using projects that hit barriers due to concurrency. See:

http://www.oracle.com/technetwork/database/berkeleydb/overvi...

But, I haven't seen nearly as much discussion/declared-usage of this as I would have expected, given the benefits of 'SQLite API way beyond prototype/single-user scale'.

Does anyone have experience with it? Any theories why it isn't so widely used/known?


I would say that when SQLite added WAL mode, they greatly improved its concurrency. In WAL mode, SQLite has better concurrency characteristics than MyISAM, which not too long ago was commonly used.


Thanks for the WAL pointer; it does sound like concurrency is better now. Benchmarking concurrency against MyISAM is a bit like benchmarking reliability against MongoDB though ;-)


Agreed; my point is that since people could (and still can) run mostly fine on MyISAM and MongoDB despite their lack of real concurrency, SQLite in WAL mode should not have any problems with a small to medium website.

Note: Personally I would pick PostgreSQL any day, unless developing an embedded application.


Reasons not to use SQLite?

1) Very limited ALTER TABLE support.
2) Very limited JOIN support.
3) No real multiuser/multiprocess concurrency support. Limited in-process concurrency with WAL.
4) Poor query optimizer, compared to PostgreSQL and even MySQL. Poor index analysis in complex queries.

4 is really the big one. It's surprisingly easy to hit situations where SQLite is orders of magnitude slower than real databases, fails to make proper use of available indexes to narrow range queries, does terabytes more write traffic than necessary, etc. And unlike MySQL/PostgreSQL, the query planner inspection tools are horrible, too.

On top of that, some SQLite features (R-tree, slightly less bad index analysis, ...) must be enabled and aren't compiled in by default. This complicates deployment.
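For comparison, what SQLite does offer for planner inspection is EXPLAIN QUERY PLAN, which is far terser than PostgreSQL's EXPLAIN ANALYZE. A small sketch (table and index names are invented):

```python
import sqlite3

# SQLite's planner inspection: EXPLAIN QUERY PLAN gives a one-line summary
# per table access, with no cost estimates or row counts.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ts INTEGER, kind TEXT)")
con.execute("CREATE INDEX events_ts ON events (ts)")

for row in con.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM events WHERE ts BETWEEN 1 AND 100"):
    print(row)  # the detail column names the index chosen, e.g. events_ts
```

Contrast that with PostgreSQL's EXPLAIN ANALYZE, which reports estimated and actual row counts, costs, and per-node timings.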


And where did I ever say one should use SQLite over Postgres? I wasn't comparing the two.

I was simply stating that SQLite is also overlooked by corporate America.

None of your reasons would impact, say, a department running their "internal" blog on SQLite.


You stated:

Why small companies get talked into running MSSQL or Oracle or MySQL is beyond me. And even if (and that's a big IF) they needed more "power", there's Postgres.

I'm explaining to you that SQLite has big, very real limitations that any of those alternatives work around. And as for Postgres, that's just one more argument never to use SQLite. So how can it be "underestimated"?

SQLite makes sense in one single situation: when you absolutely need an embedded database. A corporate blog is not one of them. SQLite isn't used much because it's limited compared to the alternatives.


SQLite even provides real MVCC (multi-version concurrency control) with decent read/write concurrency if you run it in WAL mode. Run in WAL mode, SQLite is a very competent database.

http://www.sqlite.org/wal.html
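Enabling WAL mode is a one-line pragma, and the setting persists in the database file, so every later connection gets the readers-don't-block-writers behavior. A minimal sketch (path is invented):

```python
import os
import sqlite3
import tempfile

# journal_mode=WAL is persistent: set it once and the database file stays
# in WAL mode for all future connections. (In-memory databases can't use
# WAL, which is why this sketch uses a file.)
path = os.path.join(tempfile.mkdtemp(), "wal.db")
con = sqlite3.connect(path)
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal
```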


I'd imagine these small companies dream of being big some day, so why would they invest the time to use SQLite when, should they hit it big, they would have to completely switch databases?

Of course, that's my speculation, but what's the downside to going with PostgreSQL/MySQL in the first place unless you never intend on getting bigger?


I agree with the thought. While I really like SQLite as a piece of software I see little reason to use it over PostgreSQL unless I need an embedded database.

And even if you have no plan of becoming big (e.g. if you are in business-to-business), I see little reason to pick SQLite, since PostgreSQL can do everything SQLite can and much more, so you never need to worry about outgrowing it.


"SQLite usually will work great as the database engine for low to medium traffic websites (which is to say, 99.9% of all websites). The amount of web traffic that SQLite can handle depends, of course, on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic."

http://www.sqlite.org/whentouse.html


SQLite is indeed a very good piece of software. It's also a fairly idiosyncratic piece of software: for example, table schemas are essentially untyped, and you can store any type of value in any column. The lack of good support for date/time data is problematic. And the complete lack of support for GIS is pretty serious; for many people, PostGIS is the "killer app" for Postgres.
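The "essentially untyped" point is easy to demonstrate: SQLite column types are affinities, not constraints, so a text value goes happily into an INTEGER column. A small sketch:

```python
import sqlite3

# SQLite declared column types are "affinities": values that can't be
# converted are stored as-is, so an INTEGER column can hold text.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.execute("INSERT INTO t VALUES (42)")
con.execute("INSERT INTO t VALUES ('not a number')")  # accepted without error
for value, typ in con.execute("SELECT n, typeof(n) FROM t"):
    print(value, typ)
# 42 integer
# not a number text
```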


I agree with your general sentiment, but there actually exists a GIS extension for SQLite, SpatiaLite[1], so there is no "complete lack of support for GIS". Still, I think PostGIS is vastly superior to most if not all of its competitors.

1. https://en.wikipedia.org/wiki/SpatiaLite


Agreed. I get that MySQL is incredibly simple to set up, so I can sort of understand why people use it for pet projects or whatever. But what I never understood was: why doesn't PostgreSQL see better adoption from the big players (Google, Amazon, Facebook, etc.)?


MySQL is not just some toy database. It's used by a lot of big installations that need a lot of performance, massive replication etc.

Sure it has its issues, but the objections people have against it mainly seem to be prejudice from the MyISAM days along with not appreciating how it shines on the workloads it's optimized for.

It sucks at reporting queries, but it shines when you have a large cluster of machines, multi-level replication chains, and mostly do queries that end up being primary key lookups or primary key range lookups with relatively simple constraints. That's the sort of thing you're likely to do for most of your traffic with any database once you scale up.

PostgreSQL also didn't have some of the features that made MySQL really fast until relatively recently, e.g. being able to entirely resolve a query on indexes without ever looking up the actual data rows.
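The feature referred to here is the index-only (or "covering index") scan: if an index contains every column a query needs, the table rows are never touched. As a stand-in illustration (SQLite also implements this, and reports it in EXPLAIN QUERY PLAN; names below are invented):

```python
import sqlite3

# If the index covers all referenced columns, the query is answered from
# the index alone; SQLite's plan output calls this a COVERING INDEX.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, email TEXT)")
con.execute("CREATE INDEX users_id_email ON users (id, email)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE id = 1").fetchall()
print(plan[0][-1])  # mentions 'COVERING INDEX users_id_email'
```

PostgreSQL 9.2 is the release that adds the equivalent index-only scan.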

I'd say the biggest problem MySQL has at scale is that replicated changes are applied in a single thread whereas updates on master servers are multi-threaded.

I'm very excited by recent improvements in PostgreSQL, and I wish it were the database I worked with professionally, but don't be so quick to dismiss MySQL.


One of the big fundamental differences between MySQL and Postgres is that MySQL has "real" primary keys (the table is sorted by primary key), and Postgres primary keys are just an index with constraints. Neither solution is really better, but for some workloads MySQL will have an advantage here.


PostgreSQL's solution makes sequential scans and secondary indexes faster, while MySQL's makes primary key lookups faster. I believe there also are trade offs when it comes to updates and the cost of updating indexes.



Indeed, one of the main advantages of having the primary key dictate the layout of your data on disk is that you can optimize access to your data for the fewest filesystem page cache misses, by making the primary key the column (or combination of columns) that you know you're going to have predictable batch access to.

Benchmarking that can be tricky, for the same reason that benchmarking anything that reduces cache expiration can be tricky: a single query might not benefit, but if you take a holistic approach and apply it to the whole database, it'll benefit as a whole.

Any big MySQL-powered application is likely to make wide use of this. Which goes to show that benchmark comparisons between databases are often meaningless: you can't benchmark the same schema/queries between databases, because they'll be optimized for the database you started out with.


See the CLUSTER command in PostgreSQL.

Note that with this, you don't have to use unique indexes to dictate on-disk layout. Of course, CLUSTER needs to be re-run occasionally, but it is far more capable.


While CLUSTER is great for optimizing a given index, as of 9.1, it still requires an exclusive lock (no reads or writes).

http://www.postgresql.org/docs/9.1/static/sql-cluster.html


> why doesn't PostgreSQL see better adoption from the big player (Google, Amazon, Facebook, etc.)

This is speculation on my part, but I believe it's because those companies have huge engineering organizations and see their engineering talent as a competitive advantage. So their priorities are very different from those of other large enterprises.

Organizations like Google, etc., are going to have huge architectural diagrams, and then use whatever tools fit most nicely and perform the best as a component of that architecture. And they have the engineering resources to shoehorn it in there, and work around all of the bugs, misfeatures, caveats, and usability problems.

In other words, such companies are never looking for a complete system, because they are the ones building the complete system.

But for organizations where engineering talent is more of a supporting role, even at very large enterprises, the equation changes. Those companies simply can't afford to hire google's engineering team and put it to work in a supporting role. So these organizations are looking for something a little more complete, safe-by-default, extensible, adaptable to their environment, robust, low-maintenance, etc.

I believe it's a big mistake to misjudge what kind of company you are. For instance, blindly following Google's technical choices may be a disaster if engineering is not the central focus of your business.


We are seeing that, slowly. It's a slow process because PostgreSQL used to suck, and MySQL used to be a lot easier to use. But this is changing as PostgreSQL has grown up.


Speaking only in broad generalities, it's because PostgreSQL was not suitable for large scale businesses until lately. The lack of replication, limited performance on multiple cores, etc. drove businesses to MySQL/Oracle in the past, and inertia keeps them there now.


The big deal for me:

Upgrading Postgres between major versions requires a dump and restore or using upgrade tools, meaning potentially long downtime, unless you set up some really ugly replication solutions.

I have servers I'm upgrading to 9.1 now, from 8.x, and we have wasted lots of time on it on some really hacky solutions because taking the downtime required to do an offline upgrade is just not acceptable.

We'd have paid thousands to avoid that problem, as that's what the time spent on avoiding that downtime is costing us (well, our clients).


I think it is fairly reasonable to expect such downtime for a major version upgrade. Is there a DBMS vendor that can promise you otherwise?


Look into pg_upgrade.


It still requires the server to be taken down during the upgrade, and has a bunch of limitations that means it's still more cost effective for us to spend a few days on ugly custom hacks to avoid the pain.


The most obvious balm is a logical replication solution. Unfortunately, those require non-trivial implementation, but in its broad outlines the idea is not controversial for inclusion.

Dimitri Fontaine has posted a large patch to implement a devilish component of that, DDL/Command triggers, receiving a lot of detailed review and attention: https://commitfest.postgresql.org/action/patch_view?id=768

But I think a cohesive solution can only be realistically realized in releases >= 9.3.


> But what I never understood was why doesn't PostgreSQL see better adoption from the big player (Google, Amazon, Facebook, etc.)?

An interesting selection of companies given that all three are known for their use of home-grown databases (BigTable, Dynamo, Cassandra) for their primary offerings that are not of the SQL variety at all.

Though I think it is still a good question. It may have something to do with the ease of setting up MySQL when you are a young startup trying to get something working as quickly as possible, leaving it often hard to justify a change after you've hit the big leagues.


They may be known for their home-grown databases, but they shouldn't be known for their use of them:

Amazon's primary database is Oracle. Dynamo is used for their shopping carts i.e. for storing sessions; Memcache would probably work just as well.

Facebook's database is MySQL, with sharding and Memcache. I was under the impression they stopped using Cassandra entirely?

Google's business (advertising) is built on MySQL, as are many of their sites e.g. YouTube. I think the newer Google-developed sites (e.g. GMail, Reader etc) are indeed built on BigTable.


> Amazon's primary database is Oracle. Dynamo is used for their shopping carts i.e. for storing sessions; Memcache would probably work just as well.

What? Dynamo is a distributed, persistent, highly-available storage system with incremental scalability and advanced techniques for dealing with slow or unavailable servers. Memcache is a single-process daemon that vends an in-memory hash table via TCP. They are not even remotely comparable.

I'm not dissing memcached, it's cool and useful, but its scope is far, far more limited.

In particular, have you noticed that when you add something to your shopping cart on Amazon but don't buy it, it's still there months or years later? The data doesn't disappear just because some process that was holding the data in-memory crashes.


I can't say I've ever put something in my cart, not logged in to Amazon for months, gone back to see it still there, and thought "wow - they must use some sort of webscale storage solution - I'm now going to buy the thing I didn't want to buy six months ago."

From the point of view of solving the actual business problem, therefore, I think something based around Memcache would work just fine!

Dynamo is very useful for persuading you that working for Amazon would be interesting, though.


Have you ever worked in an environment at the scale of Amazon? Machines go down all the time. I used the "six months" example to illustrate that the shopping cart is persistent, but the machine crash could just as easily happen the moment before you push "checkout." Losing shopping cart data just because one machine crashed is totally unacceptable.

Programming distributed systems that must survive machine failure and network partitions is a completely different ball of wax compared to simple web programming. If you haven't done it before, you would not believe how much more complicated it is.

Here is an extremely simple example. Suppose you're a radio station that's taking phone calls from people and you want to give an award to the 5th caller. Implementing a program that does this for a single machine is easy, and could be accomplished with a program something like this:

  import SocketServer

  class MyHandler(SocketServer.BaseRequestHandler):
    def handle(self):
      self.server.caller += 1
      if self.server.caller == 5:
        self.request.sendall("Congratulations, you are the 5th caller!\n")
      else:
        self.request.sendall("Sorry, you're caller #%d\n" % (self.server.caller))

  server = SocketServer.TCPServer(("localhost", 9999), MyHandler)
  server.caller = 0
  server.serve_forever()
Using this program, you can "call" the program by doing "telnet localhost 9999" and the program will tell you what caller you are. This took me about 5 minutes to write, and I'd never used this Python API before.
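One nit worth noting about the sketch above: it is only safe because TCPServer handles connections serially. A Python 3 variant that stays correct under a threading server needs a lock around the shared counter (this is my own hypothetical rewrite, not the original program):

```python
import socketserver
import threading

# Python 3 variant: with ThreadingTCPServer, two callers can be handled at
# once, so the shared counter increment must be protected by a lock.
class CallerHandler(socketserver.BaseRequestHandler):
    def handle(self):
        with self.server.lock:
            self.server.caller += 1
            n = self.server.caller
        if n == 5:
            self.request.sendall(b"Congratulations, you are the 5th caller!\n")
        else:
            self.request.sendall(b"Sorry, you're caller #%d\n" % n)

# Port 0 asks the OS for any free port, to avoid collisions.
server = socketserver.ThreadingTCPServer(("localhost", 0), CallerHandler)
server.caller = 0
server.lock = threading.Lock()
threading.Thread(target=server.serve_forever, daemon=True).start()
print("listening on port %d" % server.server_address[1])
```

Clients can still "call" it with telnet, exactly as before.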

Now imagine that you want to implement this same logic, but using a cluster of machines that could go down at any time. You want the group of machines to form "consensus" about which number each caller is; consensus in this context just means that the group of machines arrives at a single answer, and any machine you ask will give you the same answer.

Finding an algorithm that can do this robustly is so difficult that it was a major breakthrough when one was discovered in the late 1980s. It's called Paxos, and you can read about it here: http://en.wikipedia.org/wiki/Paxos_(computer_science) Even though it has been known for over 20 years, it is still a complex topic that very few people understand the details of.

The point of all of this is just to say; you can't compare a single-process in-memory cache to a distributed and fault-tolerant system. They are completely different beasts, and many business problems do indeed need the latter.


I know about Paxos. I know when to use it, and when not to use it.

Seeing as you came up with the example, using Paxos for an "Nth caller wins" is a really bad idea - the Nth caller to your switchboard likely won't be the one selected. You probably _need_ a single server system, like the example you wrote (but without the threading errors.)


Then why would you compare dynamo and memcached? Or suggest that Amazon could use an in-memory cache as the primary store for shopping cart data? It makes no sense.

If you want to be particular about ordering, then you can always use Paxos to elect a master that handles everything serially and fails over.


I see you work at Google. How about advancing the conversation by telling us how Google stores its sessions?


Let's make a deal. I'll start "advancing the conversation" once you stop misleading people (ie. admit that your initial sweeping claims were incorrect).


Deal. My initial sweeping claims were incorrect.

So: How does Google store its sessions?


So I'm not an expert in Google's front-ends and I'm not sure that Google stores "sessions" in the way you'd generally think of them in web apps. But usually when we have requirements like what you'd need for sessions (highly available, low-latency, highly scalable) we use Megastore:

http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf

http://www.readwriteweb.com/cloud/2011/02/megastore-googles-...

Megastore is a layer on top of Bigtable that adds indexing, synchronous replication across data centers, and ACID semantics within small partitions called "entity groups."


Good deal! Agree that traditional sessions are best avoided, but good to know that megastore is suitable for session data.


Thanks. I didn't know about Amazon's... Can you provide more links/sources on this matter?


Here's a 2007 post: http://highscalability.com/blog/2007/9/18/amazon-architectur...

In short (and filling in some of the gaps with my interpretation), they used "one big Oracle database" up until 2001. The webapps originally talked directly to the database (presumably with local caching), but they introduced an application tier later on. I'd imagine this was as much for data integrity reasons as for scalability.

As they outgrew the one big DB approach, the service approach also let them split their database along service boundaries. So each service still used one big DB, but there were many services.

My understanding is that now there are hundreds of services, and each service is run by a team that can choose their own internal components. But each team is directly responsible for their uptime; if their service breaks the developers get paged in the middle of the night. Traditional databases are therefore still widely chosen; even if sexy technologies like Dynamo get the press.

"Dynamo for show, Oracle for dough" to mutilate an old golfing expression :-)


Thank you.


Skype is a big Postgres user, and they contribute back.


It will be interesting to see how long it takes Microsoft to move it all to MS SQL. My guess is that it will take at least a decade ;-)


Postgres was just not popular enough, I think; it's still considerably less popular than MySQL, and familiarity and operations count for a lot. So the number of "marquee" names falls about in line with what one would expect. Facebook is very nearly out of the running because of its PHP lineage, Amazon has very close ties to Oracle, and Google could probably have gone either way, but MySQL is popular, so it probably got there first.


MySQL gets used because phpmyadmin gives people an easy stepping stone to get started, and then it keeps getting used just because it's what everybody is familiar with.


PostgreSQL 7 and earlier kind of sucked regarding performance - noticeably so, especially compared to MySQL. With PostgreSQL 8 and on, that changed. Unfortunately, a lot of those well-known projects started before PostgreSQL 8 came along, so it was inappropriate for the level of performance they required.


Heroku?


I just switched from MySQL to Postgres for everything (as a result of seeing how powerful and stable it is at my full time job) and it is simply amazing. Easily one of the most impressive pieces of software built.



Is there such a thing as a suite of standardized performance tests for large-scale, multi-core databases? How are people comparing Cassandra, PostgreSQL, mySQL and other options against each other for raw performance?


> Is there such a thing as a suite of standardized performance tests for large-scale, multi-core databases?

There is: the TPC family[1] and their open-source doppelgangers, the OSDL-DBT family[2].

I don't think they've been applied to non-relational databases as yet.

[1] http://www.tpc.org/information/benchmarks.asp [2] http://sourceforge.net/apps/mediawiki/osdldbt/index.php?titl...
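For smaller-scale, hands-on comparisons, PostgreSQL also ships pgbench, a TPC-B-style benchmark tool. A sketch of a read-only run (the scale factor, client, and thread counts here are made up for illustration):

```shell
# Initialize a TPC-B-like dataset at scale factor 100 (~1.5 GB)
pgbench -i -s 100 benchdb

# Run 64 concurrent clients across 8 worker threads for 60 seconds,
# select-only (-S), and report transactions per second at the end
pgbench -c 64 -j 8 -T 60 -S benchdb
```

Select-only runs are what exercise the kind of read scalability the article is about.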


There is no such thing as "raw performance". It's all very, very application-specific.


With this improvement, how much is Postgres hampered by a lack of a parallel query processor?

For OLAP work, it seems to be the primary bottleneck.


A good project to watch in this regard is Postgres-XC. Maybe not quite ready for production but it's close. (http://sourceforge.net/projects/postgres-xc/)


+1 on that.


Yikes, that's a lot of cores. Glad to see the Postgres team keep pushing the scalability envelope.


Is it? I mean, bargain basement budget desktops have more cores than a typical server of 10 years ago. A 24 or 32 core server can't really be considered that exotic these days, can it?


On HN anything that EC2 doesn't provide is considered exotic. And cheap servers still only have 16 cores/32 threads.


To some extent true, but when core counts increase, they tend to double. Given the very new versions of PG and Linux required to get these scalability benefits, it is nice to see them pipelined ahead of the EC2 curve. It takes a long time and a lot of work to correctly alleviate lock contention and to get new versions of Linux disseminated (the new LTS from Canonical will do a lot; I believe it features 3.2).


The number of places that run, e.g., 4U $10,000 HP servers with four 16-core Opterons is reasonably low, I think.


Change the branding to Supermicro and cut the cost to $5,000 and suddenly, many dedicated host providers out there have these servers. We're probably only 2-3 years away from this core count being a $2,000-$3,000 box. Throw ARM and Tilera architecture and whatever AMD is doing with the acquisition of SeaMicro in there and it might even accelerate that pace.



Since databases are the bottleneck in a lot of apps, it makes a lot of sense to spend a ton of money on them. $10k will cover roughly the monthly wage of a good DBA. It won't cover anywhere near the cost of getting a software team to optimize everything for "web scale performance". The only reason you wouldn't want a 64-core machine is that Oracle will try to charge per core, so a machine with fewer cores and better per-core performance might be more cost effective.


That's awesome. I have to admit I have always known Postgres is great and toyed with it, but never used it in a real project, due to the availability of MySQL or client preference. I'll try to put it into the current project. The client wants Oracle since they already have an Oracle license, but I will change the requirement to support Postgres as well.


If possible, try not to go down the "support Postgres as well" route.

I was in your situation, where the client wanted SQL Server since they already had the license. During development, I used PostgreSQL instead, to "support Postgres as well".

In the end, roughly one-third [1] of the total development effort was spent on overcoming SQL Server's limitations, things that you would never have to think about in PostgreSQL.

So, try telling the client that they already have a PostgreSQL license as well, with unlimited future upgrades.

[1] This figure was pulled from ass. The actual productivity loss could be more due to similar reasons outlined in http://news.ycombinator.com/item?id=3784750


Maybe the requirement is Oracle or SQL Server because the client has the resources (DBA, support contracts, etc) to support those platforms in house?


Not in my case; the client doesn't have the resources, and we don't have the expertise. In the end, I became somewhat of an expert in SQL Server (i.e. I know how to start SQL Profiler and read a query plan), at the cost of important functionality not delivered to the client on time.


I'm new to all of this, so does this mean that other databases don't have this kind of performance?


Performance is subjective, but let's dig into the optimization involved here. On the kernel side, whenever PostgreSQL issued an lseek(2) on a database file (which it does to find the file's current size), the Linux kernel would take a lock around the call, so concurrent seeks on the same file were serialized, one at a time. Linux 3.2 removed that lock because it was unnecessary, so those calls now scale across cores without a lock creating contention.

This kind of performance optimization isn't new; concurrency is the name of the game. Erlang is a language built around concurrency, and some databases written in it (CouchDB) scale with more cores due to Erlang's inherent capabilities. So yes, this kind of performance increase has been seen before.


The bulk of the fixes were in PostgreSQL's own locking, though; before those fixes were made, the lseek problem was not apparent to most people, since in almost all cases they hit a bottleneck in the PostgreSQL code before hitting the Linux kernel one.


Thank you! Do you have any experience with CouchDB? I'm sure you've at least heard of Erlang and all the benefits it promises. I'm considering whether to use it for a future project; I might, because the JSON REST API seems convenient to use without having to worry about scale. But I haven't done testing and could use advice.



