
Snowflake has some examples of that: https://www.snowflake.com/blog/easy-secure-llm-inference-ret... https://quickstarts.snowflake.com/guide/asking_questions_to_... And I assume these new models would now be available for embedding. Is this different from what Databricks offers?


Vendor says their solution is cheaper than competitor solution. I'm shocked.


Check out 0MQ (ZeroMQ), Informatica Ultra Messaging (formerly 29 West), and aeron.io. They implement reliable uni/multicast protocols over UDP, including strategies for handling NAK storms etc.
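
Rough sketch of the pyzmq flavour of this, using the epgm (PGM over UDP) transport. It assumes a libzmq/pyzmq build with OpenPGM support, and the interface name, multicast group and rate limit are just placeholders:

    # Reliable multicast pub/sub sketch with pyzmq over epgm (PGM in UDP).
    # Assumes libzmq/pyzmq built with OpenPGM; endpoint details are placeholders.
    import sys
    import time
    import zmq

    ENDPOINT = "epgm://eth0;239.192.1.1:5555"   # interface;group:port
    ctx = zmq.Context()

    if sys.argv[1:] == ["pub"]:
        pub = ctx.socket(zmq.PUB)
        pub.setsockopt(zmq.RATE, 10000)         # PGM rate limit in kbit/s
        pub.connect(ENDPOINT)                   # pub and sub can both connect for multicast
        while True:
            pub.send(b"tick")                   # loss recovery (NAKs/retransmits) happens inside PGM
            time.sleep(1)
    else:
        sub = ctx.socket(zmq.SUB)
        sub.setsockopt(zmq.SUBSCRIBE, b"")      # subscribe to everything
        sub.connect(ENDPOINT)
        while True:
            print(sub.recv())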


And to get the very best price for those clusters you'd need to commit to the CSP for three years!

Would love to know the TCO trade-off between procuring, securing and deploying on your own clusters vs having them managed via SaaS.


It's all around the ethos of ease of use. Snowflake does a lot of the smarts in the background so that you don't have the overhead of managing indexes. And not just indexes; there is just less human intervention required overall compared to something like Teradata or even a modern lakehouse.

That said, they've kind of introduced it with the Search Optimization Service, which is like an index across the whole table for fast lookups, but even that is automatically maintained on your behalf.


They're still lacking things in the SQL space. For example, Databricks say they're ACID compliant, but it's only on a single-table basis. Snowflake offers multi-table ACID consistency, which is something that you would expect by default in the data warehousing world. If I'm loading, say, 10 tables in parallel, I want to be able to roll back or commit the complete set of transactions in order to maintain data consistency. I'm sure you could work around this limitation, but it would feel like a hack, especially if you're coming from a traditional DWH world (Teradata, Netezza etc.).
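
Rough sketch of the multi-table case with the Snowflake Python connector and an explicit transaction; the connection parameters and table names are just placeholders, and real loads would more likely be COPY INTO from a stage:

    # Commit or roll back loads into several tables as one unit, using
    # the Snowflake Python connector (snowflake-connector-python).
    # Connection parameters and table names are illustrative placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="...",
        warehouse="load_wh",
        database="analytics",
        schema="staging",
    )

    tables = ["orders", "order_lines", "payments"]   # imagine 10 of these

    cur = conn.cursor()
    try:
        cur.execute("BEGIN")                         # explicit transaction
        for t in tables:
            cur.execute(f"INSERT INTO {t} SELECT * FROM {t}_stage")
        cur.execute("COMMIT")                        # all tables become visible together
    except Exception:
        cur.execute("ROLLBACK")                      # none of the loads land
        raise
    finally:
        cur.close()
        conn.close()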

Snowflake now offers Scala, Java and Python support, so it would seem their capabilities are converging even more, but each keeps its own strengths due to their respective histories.


Actually, you would expect that in an OLTP world. DWs, for the longest time (even Oracle), have recommended you disable txns to get better performance. The logic is implemented in the ETL layer. Very rarely do you need multi-table txns in a large-scale DW.

Snowpark is still inferior.


But you do still have to secure the S3 buckets, right? And I guess also secure the infrastructure you have to deploy in order to run Databricks. Plus then configure for cross-AZ failover etc. So you get flexibility, but I would think at the cost of much more human labor to get it up and running.

Snowflake uses the Arrow data format with their drivers, so is plenty fast enough when retrieving data in general. But it would be way less efficient if a data scientist just does a SELECT * to bring everything back from a table to load into a notebook.
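
For example, with the Python connector the Arrow-backed fetch looks roughly like this; the table and column names are made up, and the point is just to push the projection and filter into the warehouse rather than doing SELECT * into the notebook:

    # Arrow-backed fetch into pandas with the Snowflake Python connector
    # (needs the pandas/pyarrow extras). Table/column names are hypothetical;
    # the filter and projection run in the warehouse, not in the notebook.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...",  # placeholders
    )
    cur = conn.cursor()
    cur.execute("""
        SELECT customer_id, order_date, amount
        FROM orders
        WHERE order_date >= '2021-01-01'
    """)
    df = cur.fetch_pandas_all()   # result batches come back in Arrow format
    print(df.head())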

Snowflake has had Scala support since earlier in the year, along with Java UDFs, and also just announced Python support - not a Python connector, but executing Python code directly on the Snowflake platform. Not GA yet though.


You can use Scala, Java and Python with Snowflake now, as well as process structured, semi-structured and unstructured data. So I guess that means it doesn't fit into the data warehouse category, but is not a lakehouse either.


I actually see them as variations on the same architecture. Databricks keeps their metadata in files, Snowflake keeps theirs in a database, but they both, ultimately, are querying data stored in a columnar format on blob store (and, to be fair, Snowflake have been doing that with ACID-compliant SQL for a lot longer than Databricks). So using SQL over blob at high performance has been around for a while.

Databricks say their solution is better because it's open (though they keep the optimizations you need to run this at scale to themselves, i.e. it's ultimately proprietary). Snowflake say theirs is better because it's a fully managed service, meaning there's no infrastructure to procure or manage, it's fully HA across multiple data centers by default, etc.

Databricks push 'open' but really still want you to use their proprietary tech for first transforming into something usable (Parquet/Delta) and then querying with Photon/SQL, though you can also use other tech. With Snowflake you can just ingest and query, but it has to be through their engine.

Customers should do their own validation and see which one fits their needs best.


Delta is open source, but Databricks keeps optimizations for themselves as proprietary. I'm not sure why it would be any better than Snowflake's solution, which is automatically deployed across multiple AZs as a fully HA system and gives full ACID transaction compliance across any number of tables (not just per-table).


Essentially, with Databricks making Delta open source, you can move away from Databricks to EMR or Presto (with their own optimizations) without incurring a data tax. You're also able to move between cloud providers with ease as the data sits in low-cost buckets.
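
As a quick sketch of that portability, the open-source delta-rs bindings (the deltalake Python package) can read a Delta table straight out of S3 with no Databricks runtime involved; the bucket path here is a placeholder and credentials come from the usual AWS config:

    # Read a Delta table from S3 with the open-source deltalake package
    # (delta-rs), i.e. no Databricks runtime. Bucket path is a placeholder.
    from deltalake import DeltaTable

    dt = DeltaTable("s3://my-bucket/warehouse/orders")
    print(dt.version())     # current table version from the Delta transaction log
    df = dt.to_pandas()     # materialise the table into pandas
    print(df.head())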

