Have you tried to implement row- and column-based security on direct access to cloud storage? It flat out does not work.
Sadly, those things are mutually exclusive at the moment and with the way things are deployed here (large multi-tenant platforms), the security has to take priority.
But if that's not your situation, then obviously it makes sense to make use of that!
> Have you tried to implement row- and column-based security on direct access to cloud storage? It flat out does not work.
It is a solved problem. Essentially you need a central place (with decentralized ownership, for the data mesh fans) to specify the ACLs (row-based, column-based, attribute-based, etc.) and an enforcement layer that understands those ACLs. There are many solutions, including the ones from Databricks. Data discovery, lineage, data quality, etc. go hand in glove.
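A minimal sketch of what such an enforcement layer does, in plain Python (the policy shapes and names here are my own for illustration, not any particular product's API):

```python
# Toy enforcement layer: a central policy store plus a check applied on
# every read. Row ACLs are predicates; column ACLs are allow-lists.
# All names are illustrative, not a real catalog's API.
from dataclasses import dataclass, field
from typing import Callable

Row = dict

@dataclass
class Policy:
    allowed_columns: set
    row_filter: Callable[[Row, str], bool]  # (row, principal) -> visible?

@dataclass
class Catalog:
    policies: dict = field(default_factory=dict)  # table name -> Policy

    def read(self, table: str, rows: list, principal: str) -> list:
        """Every access path goes through this one gate."""
        p = self.policies[table]
        return [
            {k: v for k, v in row.items() if k in p.allowed_columns}
            for row in rows
            if p.row_filter(row, principal)
        ]

catalog = Catalog()
catalog.policies["sales"] = Policy(
    allowed_columns={"region", "amount"},              # mask customer PII
    row_filter=lambda row, who: row["region"] == who,  # see own region only
)

data = [
    {"region": "emea", "amount": 100, "customer": "Alice"},
    {"region": "apac", "amount": 200, "customer": "Bob"},
]
visible = catalog.read("sales", data, principal="emea")
```

The real systems push these checks down into the query engine rather than filtering in application code, but the shape is the same: policies live in one place, and no engine reads storage without consulting them.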
Security is front and centre for almost all organizations now.
This is exactly what FAANGs do with their data platforms. There are literally hundreds of groups within these companies with very strict data isolation requirements between them. Pretty sure something like that is either already possible or will be very soon, there's just too much prior art here.
That's where Databricks comes in, though: you can implement row/column-based security on your data in cloud object storage and use it for all your downstream use cases (not just BI/SQL but AI/ML, without piping data over JDBC/ODBC).
According to their documentation [1], Databricks does not have this capability even for their own engines, and definitely not for "without piping data".
This is what I've personally seen a few times: Databricks claiming they can do something, and then it turns out they can't. Buyer beware of lying salespeople and HN shills.
I don’t understand what capability you are saying Databricks lacks. This capability is literally the entire premise of the Data Lakehouse. With Snowflake you need to export data out/or pipe data over jdbc/odbc to an external tool. With Databricks you can use SQL for data warehousing and when you need you can work with that same data using python to train an ML model without piping data out over jdbc (using the spark engine). One security model, one dataset, multiple use cases (AI/ML/BI/SQL) on one platform.
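To make the premise concrete, a toy sketch in plain Python (the "engines" and names are hypothetical; the point is a single authorization function shared by every access path, instead of each downstream tool re-implementing or skipping access control):

```python
# One dataset, one security check, two "engines": a SQL-ish aggregate
# (the BI path) and an ML-ish feature extractor (the training path).
# Purely illustrative, not any vendor's actual API.
TABLE = [
    {"user": "u1", "spend": 10.0, "ssn": "xxx"},
    {"user": "u2", "spend": 30.0, "ssn": "yyy"},
]

def authorize(rows, principal):
    """One security model: strip masked columns for everyone
    except the 'auditor' principal."""
    if principal == "auditor":
        return rows
    return [{k: v for k, v in r.items() if k != "ssn"} for r in rows]

def sql_engine(rows):
    # BI/SQL use case: aggregate over the governed view.
    return sum(r["spend"] for r in rows)

def ml_engine(rows):
    # AI/ML use case: feature vectors from the same governed view,
    # no export hop over JDBC/ODBC to a separate tool.
    return [[r["spend"]] for r in rows]

analyst_view = authorize(TABLE, "analyst")
total = sql_engine(analyst_view)
features = ml_engine(analyst_view)
```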
They're still lacking things in the SQL space. For example, Databricks says it's ACID compliant, but only on a single-table basis. Snowflake offers multi-table ACID consistency, which is something you would expect by default in the data warehousing world. If I'm loading, say, 10 tables in parallel, I want to be able to roll back or commit the complete set of transactions in order to maintain data consistency. I'm sure you could work around this limitation, but it would feel like a hack, especially if you're coming from a traditional DWH world (Teradata, Netezza, etc.).
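What multi-table atomicity buys you, in miniature (sqlite3 standing in for the warehouse, with a made-up schema): if any loader in the batch fails, every table's writes disappear together.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, total REAL)")
con.execute("CREATE TABLE order_lines (order_id INTEGER, amount REAL)")

try:
    with con:  # one transaction spanning both tables
        con.execute("INSERT INTO orders VALUES (1, 50.0)")
        con.execute("INSERT INTO order_lines VALUES (1, 50.0)")
        raise RuntimeError("second loader failed mid-batch")
except RuntimeError:
    pass

# Both inserts rolled back together -- no half-loaded batch is visible.
orders = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
lines = con.execute("SELECT COUNT(*) FROM order_lines").fetchone()[0]
```

With single-table transactions only, a failure between the two inserts would leave `orders` populated and `order_lines` empty, and cleaning that up becomes your problem.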
Snowflake now offers Scala, Java and Python support, so it would seem their capabilities are converging even more, but both with their own strengths due to their respective histories.
Actually, you would expect that in an OLTP world. DWs, even Oracle's, have for the longest time recommended disabling transactions to get better performance. The logic is implemented in the ETL layer. Very rarely do you need multi-table transactions in a large-scale DW.
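The usual ETL-layer pattern, sketched with a batch flag (sqlite3 again, illustrative schema): load every table tagged with a batch id, then flip one "committed" row only after all loads succeed. Readers filter on committed batches, so the batch appears atomically without any multi-table transaction.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE batches (id INTEGER, committed INTEGER)")
con.execute("CREATE TABLE facts (batch_id INTEGER, v REAL)")

# Load step: write data tagged with its batch id, not yet visible.
con.execute("INSERT INTO batches VALUES (7, 0)")
con.execute("INSERT INTO facts VALUES (7, 1.5)")

def visible_facts(con):
    # Readers only see rows belonging to committed batches.
    return con.execute(
        "SELECT v FROM facts JOIN batches ON batch_id = id "
        "WHERE committed = 1"
    ).fetchall()

before = visible_facts(con)  # empty: batch 7 not committed yet

# "Commit" step: one single-row update once every table has loaded.
con.execute("UPDATE batches SET committed = 1 WHERE id = 7")
after = visible_facts(con)   # batch 7 now visible in one step
```

It works, but as the comment above notes, you're hand-rolling in the ETL layer what the database could give you for free.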