Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
PostgreSQL 9.5 New Features with Examples [pdf] (hp.com)
140 points by snaga on Aug 11, 2015 | hide | past | favorite | 38 comments


INSERT ON CONFLICT and Row Level Security are awesome new additions.

If you don't know about RLS I recommend you read up on them[1], as they would allow you to handle select/update/delete rights on rows directly on the data itself.

[1]http://michael.otacoo.com/postgresql-2/postgres-9-5-feature-...


> Row Level Security

Awesome! Though having read the explanation I can't think of a secure way to use this in a multi-tenanted app, as once the DB user has permission to execute "SET ROLE", it's got the same security issues with accessing other user data as the current "WHERE company_id=3" style of separation? Would it help this use-case, and does it add security?


> as once the DB user has permission to execute "SET ROLE", it's got the same security issues with accessing other user data as the current "WHERE company_id=3" style of separation?

DB users can only SET ROLE to roles they belong to. If each tenant has its own role it isn't possible to switch between them.

    # CREATE ROLE bob;
    CREATE ROLE
    # CREATE ROLE alice;
    CREATE ROLE
    # SET SESSION AUTHORIZATION bob;
    SET
    > SET ROLE alice;
    ERROR:  permission denied to set role "alice"
"SET SESSION AUTHORIZATION <role>" changes the possible SET ROLE targets to those allowed for <role> (so only the roles it's a member of), whereas SET ROLE keeps the original possible roleset (which is every role if you SET ROLE from an admin account), so SET SESSION AUTHORIZATION is a better proxy for "logged as <role>".

The biggest annoyance is with connection pooling and the inability (to my knowledge) to prevent RESET SESSION AUTHORIZATION at the DB-level: when pooling connections you'll want to have a "base" role which is a NOINHERIT member of all "effective" (tenant) roles, so it can only SET ROLE/SET SESSION AUTHORIZATION but can't do anything (so it can't be exploited), the first thing you do when you get a connection from the pool is SSA to the effective user you want, and the last thing you do when you put the connection back in the pool is a RSA.

    # CREATE TABLE foo ( id serial, c1 varchar, c2 varchar );
    # GRANT SELECT (id, c1, c2) ON foo TO public;
    # GRANT UPDATE (c1) ON foo TO bob;
    # GRANT UPDATE (c2) ON foo TO alice;
    # INSERT INTO foo (c1, c2) VALUES ('a', 'b');
    # SELECT * FROM foo;
     id | c1 | c2 
    ----+----+----
      1 | a  | b
    (3 rows)
    # SET SESSION AUTHORIZATION base;
    > UPDATE foo SET c1 = 'd' WHERE id = 1;
    ERROR:  permission denied for relation foo
    > SET SESSION AUTHORIZATION bob;
    > UPDATE foo SET c1 = 'd' WHERE id = 1;
    UPDATE 1
    > SELECT * FROM foo;
     id | c1 | c2 
    ----+----+----
      1 | d  | b
    (3 rows)
That'll prevent honest mistakes, though an attacker could still RESET SESSION AUTHORIZATION then switch the connection to a different role and go to town. It might be possible to mitigate that at the application level, but that's less convenient.


MS does this with a magic cookie value. https://msdn.microsoft.com/en-us/library/ms181362.aspx

It's been discussed on the pg mailing lists but nothing has happened so far


An output parameter seems somewhat inconvenient to use from an application context, but yes that's a feature which — as the MSDN correctly notes — would be extremely useful for multiuser with connection pooling environment.


For small multi-tenant apps, we use one database per customer. It's simple and works pretty well.


It does help this use case, as you can restrict which role's a user/connection can have. i.e. you can't set role to anything you want if you are not a superuser.


If someone builds this into a database engine it'd be a big win.


by multi-tenant you mean tenant info in table column or by schema, or db?


I'm confused by your question, but I think "table column". Think generic SaaS application: one app, one (sharded) database, one set of tables, used by many different companies.


I wish they referred to it as UPSERT in the syntax.


Can't wait to use the new JSONB operators of 9.5, e.g.:

    SELECT '{"k1":"v1"}'::JSONB || '{"k1":"v2","k2":true}'::JSONB
=> {"k1": "v2", "k2": true}

In 9.4 this is the 'best way' I know:

    SELECT
        ('{' || STRING_AGG(
	    '"' || COALESCE(j2.key, j1.key) ||
	    '": ' || TO_JSON(COALESCE(j2.value, j1.value)
        ), ',') || '}')::JSONB
    FROM JSONB_EACH('{"k1":"v1"}') j1
    FULL OUTER JOIN
    (SELECT * FROM JSONB_EACH('{"k1":"v2","k2":true}')) j2 ON j1.key = j2.key


The JSONB 9.5 operators have been backported to 9.4 and are available here:

https://github.com/erthalion/jsonbx


The ALTER TABLE UNLOGGED feature is awesome, we've been looking for ways to speed up our Django test runner and this seems like it would help.

I've been thinking that a "test-only" mode of Postgres would be quite useful - i.e no persistence/WAL/crash-safety at all, in return for raw speed. My tests don't need any of this, and the Django test runner is awfully wasteful when it comes to setting up and tearing down tests.


Unlogged tables have existed since 9.1. The only thing that was added now was a quick way to toggle between modes. Before, you could achieve the he same by creating a second unlogged table (using 'like x including indexes'), followed by an 'insert into ... select ...' an the subsequent dropping and renaming (remember: DDL is transactional in Postgres, so you can do all of this in a transaction).

Yes, the new toggle is much more convenient, but you do not have to wait for 9.5 if you really want it.

In our case, we converted some tables we use as kind of materialized views the moment 9.1 came out.

edit: fixed autocorrect wittyness (unclogged => unlogged)


Thanks for that tip, I will try and integrate it with our tests later and see if there is much of a speedup (before I was trying to alter Django's CREATE TABLE statements which is harder than I thought).

I was also suggesting something further than unlogged tables, maybe something completely in-memory? Good tests are run in isolation and are really just care about the constraints rather than any form of persistence, combined with the fact that all data inserted is erased at the end of the test anyway it just seems a bit wasteful.


if it's just for testing, you can use a ram disk to store the data directory. If you don't want the entire db on the ramdisk, you can create a tablespace for the ramdisk, and then create tables on it. http://www.postgresql.org/docs/9.5/static/manage-ag-tablespa...


I'm very tempted to try that, but this warning in the docs you linked sounds pretty scary:

> Placing a tablespace on a temporary file system like a ramdisk risks the reliability of the entire cluster

Are you sure putting your test db on a ramdisk won't destroy the development db too?


The idea is, since it's a development db, after you destroy it, you can recreate it easily.

But there is a way to do it without destroying the db. When you're finished, stop postgres, copy the files off of the ramdisk, and then you can destroy the ramdisk. When you want to start it back up, create the ramdisk, copy the files back, and then start postgres.

Obviously something like MySQL's memory tables would be a better solution.. but postgres doesn't have anything similar. So this is as close as you can get.


Thanks for writing back! I don't love either of those options. Maybe I should just make a separate cluster for my test database?!: `initdb -D /pgramdisk` I wonder if anyone has tried this, and what the drawbacks are.


and when you use alter table unlogged.. that's actually what it does anyway -- it rewrites the entire table (holding an exclusive lock the entire time).

So this is really just a convenient syntax.


This was recently discussed on the django mailing list as well and resulted in them putting a quick mention about it in the docs:

https://groups.google.com/d/msg/django-developers/IkRgMxTTzP...

I think the performance gained varies quite a bit. I gave it a quick try and saw very minimal gains locally... still need to look into it some more.


Your tests may be slowed down by initdb more than anything. If you specify "initdb -N", it will skip the fsync at the end of initdb.

You can also turn off fsync in postgresql.conf to stop fsyncs for transactions/checkpoints.


Genuine question : why is this on HP's website ? Are they a Postgres contributor or use it massively internally ?


They have big enterprise software and consulting businesses.



What's the easiest way to try out an alpha version of postgres? I can't find a ubuntu or homebrew pacakged build of it... easiest to build it from source?

edit: seems like the alpha release is available on the downloads page now: http://www.postgresql.org/download/


Building Postgres from source for testing is relatively simple - at least on Ubuntu / Debian.

    useradd --home-dir /home/postgres \
        --create-home --shell /bin/bash \
        --user-group postgres

    apt-get install -y build-essential libkrb5-dev \
        libxml2 libxml2-dev libxslt1-dev \
        libossp-uuid-dev uuid python-dev \
        libreadline6 libreadline-dev

    ./configure --disable-debug --enable-thread-safety \
        --with-gssapi --with-openssl \
        --with-libxml --with-libxslt \
        --with-ossp-uuid --with-python \
        --without-bonjour

    make world && make install
    sudo su - postgres -c "/usr/local/pgsql/bin/initdb -D /home/postgres/cluster -E UTF8"


> "Relatively simple - at least on Ubuntu / Debian"

I realize we're all professionals here and technical know-how is expected, but nothing about this is "simple". Brings me PTSD of working w/ academics' alphabet soup of "open source" simulation code written in C, Perl and Makefiles.

Build tools have come along away, maybe not with vanilla Postgres installs, but I guess that's what DBAs are paid to know these days.


I honestly don't see what's so hard about the gp's set of instructions. Your grandma doesn't need to install alpha builds of PostgreSQL (though I shouldn't assume, there are some technical grandmas out there ;) ), for 99% of people out there they can use "apt-get" or "yum" or whatever and can wait until the release.

So let's turn this around, how would you like to see builds become easier?


I was cranky earlier, I just want better documentation. Postgres has slowly gotten better about this over the years.


> apt-get install -y build-essential libkrb5-dev ...

This can slightly more easily done using 'apt-get build-dep postgresql-9.4'. That'll also pull in the tools to build the docs, but I'd consider that a good thing.

> make world && make install

You're building world, but not installing it -> make install-world.


Wow, I would have killed to have just that list last weekend. I spent ~4 hours figuring most of it out myself.



There is a docker container with everything set up: http://www.databasesoup.com/2015/07/test-postgresql-95-alpha...


If you're on Mac, Postgres.app has an alpha build on github: https://github.com/PostgresApp/PostgresApp/releases


Docker


honestly who still produces PDF in A4 for content which 99% of a time will be read on a screen ...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: