Cloud SQL for PostgreSQL now generally available (googleblog.com)
278 points by CoachRufus87 on April 18, 2018 | hide | past | favorite | 105 comments



Started down the path of using this when it was in beta, but had to abort when we saw there was no option to connect to it from Python App Engine Standard.

Now that it's GA...it looks like that hasn't changed. Is the classic Python App Engine standard environment becoming a second-class citizen? Or was there some reason why this wasn't considered GA-worthy for Postgres?

Trying to understand if going forward Google is trying to push everyone to the flexible environment or not - as I would have really expected connectivity between these two products.


[I am a Googler and my team works on part of this]

You can connect to postgres from app engine standard... as long as it's Java. See this doc https://cloud.google.com/appengine/docs/standard/java/cloud-...

And no, appengine standard is not a second class citizen. Hand-wave-ily, the connectivity path that flex uses works for postgres with minimal changes, but unfortunately some additional work is required to get appengine standard for other languages working for postgres. :(


Thanks. Could you please explain in a little bit more detail how it works for Java GAE Standard and not Python?


Thanks for that, but...not using Java.


Last I checked it wasn't possible to whitelist internal IPs (e.g. Kubernetes nodes or VM instances) to access Cloud SQL instances at all -- the options are either to use the non-standard cloud SQL proxy sidecar app, or allow connections from all endpoints (public or private).

This seems like a major omission, and AWS has had this for ages.


From the docs:

https://cloud.google.com/sql/docs/postgres/connect-external-...

> You can grant any application access to a Cloud SQL instance by authorizing the IP addresses that the application uses to connect.

> You can not specify a private network (for example, 10.x.x.x) as an authorized network.

> PostgreSQL instances support only IPv4 addresses. They are automatically configured with a static IP address.


Ah, misremembered exactly what the issue was -- you're right, individual endpoints can be whitelisted. Internal networks cannot, which is what I (or anyone else using GKE) would need, since node IPs are ephemeral.

I believe the same issue would apply to VM instances that are not pets (in auto-scaling groups, for example), since I'm not aware of being able to auto-assign static IPs there either.


There is also a third option: a small pod listening for node changes on the k8s API that whitelists the node IPs on Cloud SQL. I have been using this for two years.
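A hypothetical sketch of that watcher pod: the kubernetes and Cloud SQL Admin API wiring is shown only in comments (both clients need cluster/GCP credentials), and the helpers below just build the patch body. `authorizedNetworks` under `settings.ipConfiguration` is the real Cloud SQL field; everything else here is illustrative.

```python
def authorized_networks(node_ips):
    """Map node external IPs to Cloud SQL authorizedNetworks entries."""
    # Deduplicate and sort so repeated patches are idempotent.
    return [{"value": ip + "/32"} for ip in sorted(set(node_ips))]

def settings_patch(node_ips):
    """Body for a sqladmin instances().patch() call updating the whitelist."""
    return {"settings": {"ipConfiguration": {
        "authorizedNetworks": authorized_networks(node_ips)}}}

# In the real pod (pseudo-wiring, assuming the kubernetes and
# google-api-python-client libraries):
#   from kubernetes import client, watch
#   for event in watch.Watch().stream(client.CoreV1Api().list_node):
#       ips = [addr.address
#              for node in client.CoreV1Api().list_node().items
#              for addr in node.status.addresses
#              if addr.type == "ExternalIP"]
#       sqladmin.instances().patch(project=PROJECT, instance=INSTANCE,
#                                  body=settings_patch(ips)).execute()

print(settings_patch(["35.190.0.10", "35.190.0.11"]))
```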


I wouldn't say second class citizen (just yet) but the docs and Googlers have been gently nudging people to the flex environment. It can do everything the standard one does and more, so there's really very little reason to stick around on standard.


Well, unless they provide a free tier for flex that is equal to standard's, I wouldn't bother switching.


Cost maybe?


Why not use AWS though? It is so much better, more reliable, and overall super cheap.


Is point-in-time recovery available now in GA? I had checked a couple of weeks ago for the beta service and it was not available. I think for managed DB hosting, point-in-time recovery is a critical feature.


[I'm the Cloud SQL TL] No, it isn't. We agree with you that it's an important feature for managed databases, and we're working to get it right. We decoupled it from this launch to get PostgreSQL to GA faster.


Unrelated to pg but could you badger the spanner team to make a mini spanner product :)

Related to postgres: we have many, many concurrent connections, but a load satisfied by an n1-standard-4 atm. Do you recommend a connection pooler or something else to help us get down to the 100 to 200 connections we need to be at to use Cloud SQL?


Connection pooling is recommended for any non-trivial PG deployment. I can recommend pgbouncer; it has worked flawlessly for us.


[I'm the Cloud SQL TL] We do recommend connection pooling whenever possible. You save server resources, but you also save connection latency.
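For anyone reaching for pgbouncer after this thread: a minimal pgbouncer.ini sketch (the host, auth file, and pool sizes are placeholders to adapt):

```ini
[databases]
; route "app" connections to the Cloud SQL instance (address is a placeholder)
app = host=10.0.0.5 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling gives the biggest reduction in server connections, at
; the cost of session-level features (prepared statements, advisory locks)
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 50
```

With settings like these, thousands of client connections share a pool of 50 server connections, which is how you get under a 100-200 connection cap.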


A mini spanner product in what sense?


What's a TL?


tech lead?


(I am also a Google employee, totally unrelated to this product).

Yes, "TL" is Google jargon for Tech Lead.


Does anyone have insight into or experience using this in production? We're currently running PostgreSQL 10 w/ pg_partman on our own hardware, but looking at several options for cloud migration. Unfortunately, Citus Cloud on GCP doesn't appear to be an option (yet?)

- Google Cloud SQL (PostgreSQL)

- Citus Cloud (AWS Only)

- Citus (managed ourselves) on GCP


We've been running a production workload on Postgres/Google Cloud SQL for about half a year now.

While things are good for the most part, a couple of serious problems related to connectivity have us completely boggled. We're connecting from Google Kubernetes Engine, which seems like it should be a standard combination, but we run into constant problems that we've dumped many, many hours into debugging.

We still haven't figured this problem out. I've found the docs to be very weak on Google's part. A lot of the troubleshooting tips are not very helpful (and can consist of unhelpfully broad strokes like "be sure to use indexes!"). Because Google Cloud is not as popular as AWS, there is less community guidance from others, and what guidance does exist is often in forum threads that feel less than reputable. There's also a big push to get you to talk to sales reps who are not technically knowledgeable and just try to upsell.

Very frustrating. Unclear if moving back to AWS, or hosting our own Postgres, would help.


Could you elaborate a bit on the connectivity problems? What is your setup? Cloud SQL Proxy, any load balancers?


[I'm the Cloud SQL TL] Note that we currently only support PostgreSQL 9.6. Obviously supporting major versions across both MySQL and PostgreSQL is a priority for us.


Hi, Craig from Citus here. If you're interested in Citus being available on other infrastructure providers aside from AWS as a fully managed service, please feel free to reach out to me directly: craig at citusdata.com.


We did, but now use https://aiven.io

Highly recommended if you want a fast and full-featured managed DB service.


what's the benefit of using them vs. AWS or GCP directly?


Latest software (v10.3), better performance (NVMe SSDs), better backups (point-in-time, instant cloning), better features (more extensions, cross-region replication even across different clouds), better flexibility (migrate the master across different clouds), better monitoring (log and Datadog metrics export), and more focused support from a smaller team.


AWS and GCP both have NVMe SSDs for instance types intended for big-ol' DBs

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-inst...


Yes, I know. This thread is about managed database services, specifically about Aiven vs cloud-direct. Aiven still runs VMs on AWS or GCP, but offers NVMe disks, while neither RDS nor Cloud SQL has that available.


Ah I see, i3 isn’t in the RDS pool yet. I imagine it will be eventually?


It's way more expensive. Have you found ROI there?


For our needs, yes. They run on cloud VMs, so there will be a markup, but their startup-4 and higher plans on GCP use local NVMe SSDs, so we get much better performance for the price.

We also make use of cross-regional replicas and are looking at doing it across clouds; if you want that, there isn't any other option short of doing it yourself. For us it's more about the complexity of the deployment than raw DB size, so if you have several TBs then maybe it's not the best fit.


Interesting. I'm curious how they compare to Citus, and whether there are any tight integrations with their other services (managed Kafka or Elasticsearch).



Citus isn't simply PostgreSQL. It has many tradeoffs


What do you mean? It's just an extension to PostgreSQL and runs anywhere that allows you to load it.


We get that, but I left off (for brevity) the reasons why Citus would be a net positive for our use case.


I would appreciate it if you could list some of those differences and tradeoffs, for those of us who are interested in Citus but haven't yet had time to look at it in more detail. Thanks!


We actually just migrated off yesterday... we went with an independent vendor like Aiven instead of the clouds because they move too slowly and don't have enough features.


Like aiven or aiven? :) I’m not aware of too much competition in this space. We’re happy with aiven so far, but they’re quite unknown / under the radar it seems...


They're the best we've found as well, but there are lots of managed database providers of all kinds. https://www.compose.com is another big one.


We’re using compose for redis, but we might switch it to aiven as well, despite aiven having some missing functionality...

For Gcloud and especially in Europe, I think these two are the only options that I’m aware of (at least for PG and redis)


If you want managed Redis, there’s nothing better than RedisLabs


Yeah, I was practically begging them to offer service on GCloud in europe ... :) but they don't cover it...


[I'm the Cloud SQL TL] What features would've kept you on Cloud SQL?


Major: Upgrade to latest version, 9.6 is 18 months old. Cross-region replication.

Minor: less downtime for maintenance, point-in-time restore

The rest is summed up here: https://news.ycombinator.com/item?id=16872723


How do you deal with custom extensions (for example plv8)?

I started off with heroku and they don't support the same subset:

https://cloud.google.com/sql/docs/postgres/extensions

https://devcenter.heroku.com/articles/heroku-postgres-extens...


They don't have a great story around extensions - the ones they do have are unsupported/buggy. For example, PostGIS is missing "ST_GeomFromGeoJSON" because it was compiled with the wrong flag - and it has been this way for over a year despite hundreds of user complaints.


As someone who was migrating from on-premises to gcp, and who needs extensive postgis support, this information is a deal breaker for me. Is there any place I can find more information about postgis and other extensions status? Are you aware of any other bug related to postgis?

Thanks a lot for the information


Aiven (https://aiven.io/postgresql) supports PostGIS on GCP if you are looking for an alternative choice.


Disclaimer: I work on GCP

Can you point me to the complaints? I will take a look.



This is what I expect from Google. They are not customer-centric, and no one really prioritises or cares about customer feedback.


You clearly have never talked to GCP support. They're in a different league from AWS support.


They're all the same quality. Sometimes great, sometimes terrible, but will get the job done as long as you put in the effort. Also bigger customers will obviously get more.

GCP does win on pricing now with role-based support that is a flat rate per user: https://cloud.google.com/support/role-based/


As someone considering GCP, could you relate a story or two about GCP support’s technical competence specifically?


Still working for AMZN?


It appears not since 2011, according to the information sources you would have looked through.


https://cloud.google.com/sql/docs/postgres/extensions

Looks like a preset list of extensions. I'd assume custom extensions would be very difficult to support in managed postgres.


Why is that?

Systems are built for extension; not allowing it deprives them of essential qualities.


At the least, they were waiting for timeouts to appear in plv8, which arrived in 2.3.1 - I do not know the current status of it being brought into Cloud SQL though.


Will we be able to use the Citus extension? Been itching to get Citus running on GCP.


Does any major cloud provider support it?


My understanding is people are able to run and manage Citus themselves on GCP (not Cloud SQL), but Citus Cloud (the managed solution) is only available in AWS.


They do not support the Citus extension currently and custom extension loading is not allowed. You would have to run your own Citus install using VMs.


I'm looking around for more details on these "regional" disks that replicate between two zones at the block level. Is that just a fancy term for OS-level mirrored disks using the cloud persistent disks?


There's a previous blog post[1] about HA replication. It uses block-level replication managed by PD infrastructure.

[1] https://cloudplatform.googleblog.com/2017/11/Cloud-SQL-for-P...


Block device based replication for Postgres seems a bit unconventional given that Postgres has native synchronous replication support with WAL streaming.

Intuition tells me that you might get better performance if you let the DB itself do the replication but I can't really justify that without real review of what happens.

The postgres docs (https://www.postgresql.org/docs/10/static/different-replicat...) say that the WAL solution has no "Master server overhead" in contrast to the File System Replication solution, but it's not explained and I'm not sure what is meant by that.

I guess with a block-device-based solution, recovery takes longer, because failover entails actually mounting the block device (as no two machines can mount it read-write at the same time) and then starting the DB (or, in a more basic implementation, just booting the entire second machine as part of failover), while with WAL streaming both postgres instances would already be running. So failover would be faster with WAL streaming?

It would be great if somebody from GCP could elaborate on what the tradeoffs here are, how long failover takes, and whether we can expect similar performance and behaviour as with WAL shipping.
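For contrast, the native WAL-streaming setup needs only a handful of settings (an illustrative, 9.6-era sketch; `standby1` and the host names are placeholders):

```ini
# On the primary, postgresql.conf:
wal_level = replica
max_wal_senders = 5
synchronous_commit = on
synchronous_standby_names = 'standby1'  # commit waits for this standby's ack

# On the standby, recovery.conf:
#   standby_mode = 'on'
#   primary_conninfo = 'host=primary-host user=replicator application_name=standby1'
```

With this scheme both servers are already running, so failover amounts to a `pg_ctl promote` on the standby, rather than remounting a block device and starting postgres cold.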


Amazon's Aurora Postgres database does a similar thing: your master in one zone replicates to a disk that spans all the other zones. Unlike a normal Postgres RDS instance, it also auto-scales storage to what you use.

Amazon claims better scaling than ordinary Postgres for this.


Just speculating but it’s possible the block level is faster because it’s replicated over a dedicated and optimized SAN rather than (potentially) contending with normal network traffic. I assume the database state would only be crash consistent though.


I read that, but it doesn't answer the question. What are these regional disks, and can they be used directly?


A regional disk is a logical disk that is synchronously replicated at the block level across exactly two zones within the same region [1]. Since the disks are always identical, with no replication lag, the HA control plane can seamlessly fail over the whole database to a new master that plugs into the same disk. It's all in the article.

Regional disks aren't publicly available yet, but they are in alpha [2]. Like normal persistent disks, everything is backed by Google's internal Colossus system [3].

[1] https://cloudplatform.googleblog.com/2017/11/Cloud-SQL-for-P...

[2] https://cloud.google.com/sdk/gcloud/reference/alpha/compute/...

[3] https://cloud.google.com/files/storage_architecture_and_chal...


Great news! This has been a long time coming (over a year since the beta started), looking forward to migrating my MySQL instances over.


Great to see all three major cloud providers offering this now. Should be good for pricing!


Has anyone used the beta and got any feeling for how maintenance downtime impacts things? A bit nervous about how you can only set a "maintenance window" and not be able to plan ahead for disruption; as far as I can tell, they won't even tell you ahead of time. The HA seems really solid (zero-lag "regional disks"), but it's still a bit disconcerting.


The updates take the entire instance down for 2-5 minutes each month. While you can't avoid them, they can be scheduled for particularly low-traffic times. If you're trying to avoid downtime, it's a giant PIA. Even with HA enabled, you still lose the master, slave, and read replicas. I'm not entirely sure what they define HA as, but mandatory monthly downtime doesn't usually fit into my definition.

[Update] That said, from what I understand, they have a roadmap for maintaining read replicas and queued writes during updates. Not sure what the date on it is, though.


[I'm the Cloud SQL TL] I can't comment on timelines, but we're aware that customers are interested in more features around maintenance window scheduling, deferral, and notification, as well as shorter downtime for updates and smarter scheduling within a group of replicas.


Can you confirm that it's impossible to avoid downtime, even with HA, because of forced updates?

Surely that's what HA is? no downtime as you update each node one at a time?

If it's impossible then it's a dealbreaker.


[I'm the Cloud SQL TL] Confirmed. We know it's a problem that we need to fix. HA reduces downtime in unexpected failure cases (live migration for your primary only helps in planned shutdown cases, not if the physical machine fails), but doesn't currently help with maintenance-related downtime.


What's the point of HA if there is a still a maintenance downtime?


Unfortunately last time I used CloudSQL for MySQL it was incredibly unstable. They would take down our master AND standby at the same time for maintenance. When we filed a ticket they just said it was a known bug with no plans to fix.

A major client of mine migrated to AWS because of this and other issues.


I've been thinking about moving us to Google Cloud Platform. What I found in regards to maintenance here: https://cloud.google.com/compute/docs/regions-zones/#mainten... states that they do live migrations without any downtime. Can anyone elaborate? Is this only for Compute Engine? In that case, if one can run postgres on a Compute Engine instance, why not do that instead? Surely, if one can set up a highly available postgres cluster, Google can do updates without affecting uptime???

To be fair, we wouldn't use GCP for anything but virtual servers and storage replication... I have no desire to tie us to Google's infrastructure any more than necessary.

Were your master and standby in the same availability zone? Can't you set diff maintenance windows? WTF?

https://cloud.google.com/sql/faq#maintenancerestart

According to the link above, you can stagger your upgrade windows, it looks like.


"Live migration" refers to how Compute Engine transparently migrates a VM to another physical host [1]. Disk and memory is copied over, and they have some ridiculous technology that keeps network connections alive and re-attaches them to the new VM when it's been switched over, so that it causes, in principle, zero disruptions. This is much more magical than other providers, such as AWS and DigitalOcean, where such a migration results in a reboot.

You can run PostgreSQL on a VM just fine; you just have to manage it yourself. Cloud SQL comes with some upsides (zero management, spectacular HA failover capabilities) and some downsides (lack of extensions, lives on a separate network, no control over the maintenance window); you have to decide what you're willing to live with.

You can set the upgrade window, but it can't be predicted. What you can control is the order — e.g. set your staging instance to "early" and production instance to "late", then hopefully staging should be upgraded first and you'll know ahead of the production upgrade if any issues arose.

[1] https://cloud.google.com/compute/docs/instances/live-migrati...


> they have some ridiculous technology that maintains network connections and re-routes them when everything switches to the new VM

indeed, this is the primary reason i wish to switch. i have no problem maintaining our own stuff, we do that anyway. :) thanks for the details.


If you (or the parent) are interested in some details about that ridiculous technology, there was a paper in NSDI this year: https://www.usenix.org/system/files/conference/nsdi18/nsdi18...

(disclaimer: I'm one of the many authors on the paper, although for building parts of the underlying tech, not writing the prose)


GCP has the best compute, storage, and networking of all the clouds. They are cheaper, faster, more scalable and more reliable than the others. Their managed services leave a lot to be desired (beta status, non-standard interfaces, and other limits) but if you're just looking to run VMs then that is the perfect fit for their cloud.

We consolidated everything on GKE now, which lets us use VMs but still have the Kubernetes control plane looking after things for us, which has been great so far.


Maintenance windows are set for the cluster, not single instances. We were distributed across 3 AZs and Google had no suggestions for mitigating the ~5 minutes of downtime we were seeing every week or two.

The whole experience was so amateur and unprofessional it really soured me on GCE. They do have some cool tech but it seems like their cloud division needs to mature a bit.


There is disruption, yes. It's usually short; however, we always see retries in our logs for a few minutes. Our app doesn't need perfect uptime, though, and we haven't tried the HA setup.


Postgres wire-compatible Spanner when? :P


Wondering what kind of use case you would have for this compatibility?


I know that it's definitely not going to be 100% the same (especially since Spanner doesn't even support SQL DML right now), but I think a drop-in replacement into a managed autoscaling database is a really nice alternative to manual sharding.

Right now basically the options are Aurora, Citus, and running CockroachDB yourself.


Last time I was at Google for a workshop (if you have the chance to visit Google, do it; the food alone is worth it), they didn't seem to push Cloud SQL a lot, because they wanted to guide people more in the Spanner direction. Without a solid RDS counterpart, however, I don't think bigger companies will consider moving from AWS. Happy to see they changed their mind and continue to expand their SQL services. The competition from Google put a lot of pressure on AWS, which seemed to have gotten a bit lazy. Google was ahead of the game with their global load balancers and network speed and quality. Now AWS has countered with their 5th-series C5/M5 instances, which solve the bandwidth problem of the smaller C and M instances.


Finally! Nice to see more PG support. Any news about PostgreSQL 10 support?


I asked that yesterday in a meeting with my gcloud rep. They said all the work has gone into getting to GA, and once that is done, to look for them to start doing things like version updates, more new features, etc.


Since there seem to be some Googlers here who work on Cloud SQL, I wonder: is there any chance Cloud SQL will be available in asia-southeast1 soon? It's the only region (I believe) where Cloud SQL isn't available at all, and one of the main reasons we can't fully migrate from AWS to GCP just yet.


[I am a Cloud SQL-er, and worked on region expansion... among other things]

Cloud SQL has only been available in regions with at least three zones (since we believe that is the minimum to make sure we can maintain HA in the event of a single zone failure). asia-southeast1 currently only has two zones, when a third zone is launched, Cloud SQL will become available in that region.


Thank you. That makes sense and totally understandable. Looking forward to when there's third zone in asia-southeast1. :-)


I tried using this, but couldn't get the root certs to work, so I went with AWS RDS instead. A pity, as it's much cheaper (especially with the beta pricing).


[Cloud SQL person, SSL+connections is my jam]

When you say you couldn't get the root certs to work... what do you mean?

Cloud SQL automatically generates server certificates, and we offer UI+API for creating additional client certificates. The two should not share a root CA.


Do these use the standard persistent disks (capped at 240 MB/s)?


[I am a Cloud SQL-er]

Yes, you can use both standard and SSD persistent disks. If you create a larger instance with more vCPUs and a big enough disk, you can achieve greater than 240 MB/s; see the docs:

https://cloud.google.com/compute/docs/disks/performance#ssd-...


How expensive is this compared to, for example AWS RDS?


The cost is virtually the same.

A db.t2.small instance should compare to a db-pg-g1-small instance. Pricing is around $90 on AWS vs $93 on GCP.

I’m an AWS consultant so I could have messed up on the GCP instance type.



