so, alternatives?

Aside from the Azure/GCP/AWS internal offerings, I know about Snowflake and Firebolt; Databricks is new to me.




Redshift is pretty terrible, stay away. AWS is even worse at delivering promises than Databricks and that's saying something.

I've heard Google BigQuery is good. It is completely SaaS (like an AWS Athena that actually works).

Unicorns often run their own stack, and you could replicate that if you have the appetite. Netflix and Apple run Trino + Spark on k8s + Iceberg. Uber used their own Hudi thing; not sure if they still do.


Apple is a big Delta Lake (and Databricks) customer: https://www.youtube.com/watch?v=SFeBJxI4Q98


"Big"? No, not really. They use Delta Lake for the security use case, sure, but that pales in comparison to how much Iceberg they use.


https://en.m.wikipedia.org/wiki/Databricks

"Databricks is an enterprise software company founded by the creators of Apache Spark. [...] Databricks develops a web-based platform for working with Spark, that provides automated cluster management and IPython-style notebooks."


Oracle and Teradata still have data warehouse pitches ;)


maybe clickhouse?


Clickhouse is good if you're building an application. It has a lot of great features and incredible performance, but there's an expectation that the people using it know what they're doing and can work around its limitations (like limited support for joins and SQL in general).

Something like Snowflake works much better when you're building a platform that you can give to two hundred data analysts of various skill levels spread over fifty teams, so they can build their own stuff. The nice UI, broad feature set (materialized views, time travel, automatic backups, superfast scaling up and down, ...) and general just-work-iness make it nice for that, but you're going to pay for the privilege.

Databricks is somewhere in the middle - things are way less polished, features don't always work and you still have to figure out things like backups and partitions on S3 on your own, but some people like that. Expect to also pay a pretty penny for hundreds of Spark clusters nobody knows who uses.


When was the last time you used Databricks? You should definitely try it again. Their product offering has improved a lot in the past few years.

> broad feature set

My experience is that the feature sets of Snowflake and Databricks are very similar. Both have time travel support. Snowflake has materialized views, but Databricks has Delta Live Tables. Databricks has a distributed Pandas API, but Snowflake recently introduced Snowpark. Databricks also has autoscaling, and they recently launched a serverless offering that makes autoscaling super fast as well.


Snowflake has much more advanced data security: table-, column-, and row-level security and dynamic data masking policies. The zero-copy cloning is also pretty useful for CI/CD (pretty much the one practical way to do blue-green deployment for data applications).

Databricks has some interesting features (we were originally interested in it as a "nice UI" on top of our AWS data lake for citizen data scientists; using it for industrialized processing was impractical on price compared to AWS Glue), but the security seems lacking: it only goes down to table level and only in SQL and Spark; with R you can't have security at all.

I really liked the Databricks UI and integrated visualizations, though, that's where they are better than Snowflake I think. Of course, they gained those by buying open source Redash.io and ending it.

The part that ended our PoC with them was the price quote they gave us for the expected number of users. Management was like "ok, that sounds reasonable" until I told them that was just the license and did not include EC2 costs; the real cost would be at least twice that. That made everyone angry.


If you want Python, ML and SQL to be easily usable together on the same data, nothing can touch Databricks.


* Apples and oranges: Clickhouse is a query engine while Databricks is a SaaS product/company. Apache Spark could be compared to Clickhouse, Databricks to clickhouse.com/company. The latter is barely a couple months old.

* Databricks pivoted from analytics to ML and it's not just marketing. Clickhouse is all about OLAP use cases.

* Clickhouse competes with Druid/Pinot/Timescale, Spark competes with Flink.



