Hacker News

Very cool!

Could you share the key differences between this and the previous pg_analytics, and the motivation for making it a separate plugin?



Whereas pg_analytics stores the data in Postgres block storage, pg_lakehouse does not use Postgres storage at all.

This makes it a much simpler (and in our opinion, more elegant) extension. We learned that many of our users already stored their Parquet files in S3, so it made sense to connect directly to S3 rather than asking them to ingest those Parquet files into Postgres.

It also accelerates the path to production readiness, since we're not touching Postgres internals (no need to mess with Postgres MVCC, write-ahead logs, transactions, etc.).


If users already have a data lake of some kind generating Parquet files, the case for using Postgres to query that data is questionable. I think the Postgres way of doing things should be prioritised if you want to keep your product in a unique position.


Can you elaborate on what you mean by the "Postgres way of doing things"? Also, what is wrong with using Postgres to query data in external object stores? It is a common occurrence for businesses to store parquet artefacts in object storage, and querying them is often desirable.


It depends. If you're happy with Databricks, etc., you might be good. But we've seen many users want the simplicity of querying data from Postgres for analytics, especially when JOINing analytics and transactional data.
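A minimal sketch of what such a JOIN might look like, assuming a Parquet-backed foreign table (`sales`, hypothetical) exposed through pg_lakehouse alongside an ordinary transactional table (`customers`). The exact DDL is illustrative, modeled on standard Postgres foreign data wrapper syntax; pg_lakehouse's actual setup may differ:

```sql
-- Hypothetical setup: assumes an S3-backed server object (s3_server)
-- has already been configured via the extension's foreign data wrapper.
CREATE EXTENSION pg_lakehouse;

CREATE FOREIGN TABLE sales ()   -- columns inferred from the Parquet schema
SERVER s3_server
OPTIONS (path 's3://my-bucket/sales/', extension 'parquet');

-- JOIN analytical data in S3 with a local transactional table
SELECT c.name, SUM(s.amount) AS total_spend
FROM sales s
JOIN customers c ON c.id = s.customer_id
GROUP BY c.name;
```

The appeal is that the analytical rows never have to be ingested into Postgres storage; they are scanned from object storage at query time, while `customers` stays a normal heap table.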



