These are fantastic improvements. But I wish there is auto-sharding support for PostgreSQL, or at least an auto-sharding layer. For some data sets I really sharding and building things manually just takes too much time. Unfortunately sharding is not as easy as 'insert into server number object_id % NUMBER_OF_SERVERS'; a proper auto-sharding application should also handle moving data from one shard to another.
Adding or removing shards can get complicated and it is better to handle this at your application layer.
For instance, you could keep track of the configuration history - when adding a shard, you create a new version. And when trying to fetch, you could try the latest version of your configuration (which says, for example, that it's on server 3) and if not found you could try the previous version (which says that it's on server 1). You can also add the possibility of removing a shard, however this requires that for every write you'll have to do a secondary write to another server, for redundancy.
In the background you could have a healer that whenever a new version gets pushed, it would go through all entries and move them accordingly. And at the end it would declare such a version to be "healed".
However, what I described above works in case you want to introduce a new server in production immediately. Like say, in the case of something like Twitter, you want to write to the new shard right away and the most important messages are the most recent ones. But you may also want to create a configuration such that a new server isn't hit by the front-end until the healing script hasn't finished migrating.
The thing is, a layer like this gets to be very optimized for your needs and there's no one size fits all.
I have a proof-of-concept that illustrates a way of auto-sharding a Postgres database using a patched version of PL/Proxy: http://code.google.com/p/hotrepart/
The big limitation is that your DB structure has to be hierarchical. I think a lower-level solution at the WAL level would be better.
Postgres has very good reliability (for the longest time, reliability and completeness were the only two driving goals of the project). It also has acceptable-to-good performances (on unices anyway) but the default configuration is completely untuned.
This default is meant to run on very weak machines, so the db "works" everywhere OOTB, many people don't configure their postgres (and it's not been trivial historically, it's gotten a lot better) and end up with miserable performances.
This issue is most visible with "the other open-source database" (MySQL) as it has 1. a much more aggressive default configuration and 2. MyISAM default (I think it might have been switched now, but that's pretty recent), which has very fast reads (and nothing else)
An other issue with postgres is probably the lack of postgres DBAs (let alone good ones) compared to Oracle and MSSQL, though any good and open-minded DBA should be able to get up-to-speed quick on it. The last issue would be tooling, I believe MS provides lots of administration and GUI tools for MSSQL, there's a dearth of those for postgres.
There's a fair number of really good Postgres tools out there, or tools that support Postgres among other platforms, They're just pricey. The free tools are passable but not that good.
For me personally PostgreSQL makes up for the lack of good free graphical tools by providing an excellent command line client, psql, and good utility views and functions. I much prefer it to the CLIs of MySQL, Oracle and SQLite.
I do not have experience with MSSQL, but I have been using PostgreSQL since 5 years ago.
It's a really reliable database, more reliable than MySQL. It has all the features you expect a real RDBMS to have.
It has good performance too. There's this myth going around, with people saying that it doesn't have good performance. However it is just a myth, started because the default configuration is optimized for poor servers without much memory. Make sure to optimize your configuration (i.e. specifying how much memory can PostgreSQL use for certain tasks - you can find guides for that online).
I used MSSQL up to MSSQL 2005 and MySQL up until about the same time. Around then I switched over to PostgreSQL completely and have not had a reason to go back. My biggest complaints had to do with simple replication and that was mostly solved in Postgres 9.0. Postgres is a damn fine database.
Well I am quite honestly bored of paying £240 a month per cpu licensing costs.
I have no objections for paying for licenses but it's just getting too much. They put their costs up recently from £180 a month to £240, when we tried to pass that onto a client they pulled their face, it's getting hard to justify the cost of running a web server on a Microsoft platform
Such comparisons are often difficult since MySQL and Postgres typical perform significantly better on Unix based systems than they do on Windows. Also using the Windows/.NET libraries for connections tend to perform worse even if the database server itself is hosted a Unix box.
In general, I would say you should stay on MSSQL if your server environments are Windows and only look at the OSS alternatives if you're willing (and able) to make a full stack switch to Unix.
At the company I work for we are currently in the process of migrating from MSSQL to PostgreSQL. However, we are trickling to a full stack switch to Linux. In the meantime, the Postgres databases have performed extremely well and on balance we are happy with it.
Unfortunately I would argue that at an enterprise level MSSQL is better than Postgres (pure conjecture here). If nothing else, it is way easier to employ MSSQL experts where I live and next to impossible to find Postgres talent. This pretty much means that we are learning by trial and error what the best practices are.
Great to hear that your company is forward thinking and willing to make the considerable investment to get off the MS Stack.
You might have more luck finding some Oracle DBAs and training them up on PostGres. There's a lot of similarities between the two, and PG/SQL is also very similar to PL/SQL.
Well we do currently use .net but I asked a question the other week on here about different languages etc and one good piece of advice I got was to create a REST service and interact with that.
We have put a lot of thought into it and we were going to give it a go with WCF and REST but I don't see any issue why we cannot buy a UNIX box and put Postgresql on it and put a layer on it with some other language serving the REST?
I will be benchmarking this with a current customer who has over 1 million products and a lot of visitors a day to see how it stands up against MSSQL, we are quite sensible into not jumping into decisions so I appreciate your comment and it shall definitely be part of the benchmark.
Having spent the better part of 3 years creating REST services with .NET, I strongly recommend you do NOT use traditional WCF. The WCF Web API stuff looks interesting but still seems immature. I've found ASP.NET MVC to be the best tool for creating REST apis.
Even MVC is a little heavy and you don't get a lot of things for free with it. Compare it to some of the Java OSS projects like Enunciate and it still requires a lot of extra work.
On the .NET side, I would suggest looking at Demis Ballot's excellent Service Stack library.
http://www.servicestack.net/
I know a lot of this will be up to the architecture and specific nature of each app... but has anyone recently compared a modern MySQL and Postgresql installation on a few baseline metrics? I'd love to see the differences. I am starting to use it more and more, being a Django developer surrounded by Postgres users. But, so far the core differences are scaring me away. psql template1? It doesn't make as much sense to me.
Why connect to template1? It should be invisible. Create a user and a database for your webapp. Matching the unix account name is the simplest way to do that.
In pg_hba.conf, let unix users authenticate as matching PostgreSQL users:
local all all peer
The postgres user on Debian/Ubuntu can create users and databases:
> psql template1? It doesn't make as much sense to me.
What doesn't make sense about it? It opens a postgres shell to the `template1` database, that's all...
And there should be very little reason to connect to `template1`: it is the default template for creating a database (it's copied to the new db file basically), so the only point in connecting to it is e.g. creating a few common tables or enabling extensions so that all the databases you create have them by default.