data engineer here, offtopic, but am i the only guy tired of databricks shilling...

benrutter · 2024-03-27T19:26:38 1711567598

Lord no! I'm a data engineer also, feel the same. The part that I find most maddening is that it seems pretty devoid of any sincere attempt to provide value.

Things Databricks offers that make people's lives easier:

- Out-of-the-box Kubernetes with no setup

- Preconfigured Spark

Those are genuinely really useful, but then there's all this extra stuff that makes people's lives worse or drives bad practice:

- Everything is a notebook

- Local development is discouraged

- Version pinning of libraries has very ugly/bad support

- Clusters take 5 minutes to load even if you just want to "print('hello world')"

Sigh! I worked at a company that was Databricks-heavy and am still suffering PTSD. Sorry for the rant.

alexott · 2024-03-27T19:42:48 1711568568

A lot of things changed quite a long time ago: not everything is a notebook, local dev is fully supported, version pinning wasn’t a problem, cluster startup time is heavily dependent on the underlying cloud provider, and serverless notebooks/jobs are coming.

gigatexal · 2024-03-27T21:08:16 1711573696

Glad I’m not the only one. Especially with this notebook stuff they’re pushing. It’s an anti-pattern, I think.

melondonkey · 2024-03-27T16:32:54 1711557174

Data scientist here who’s also tired of the tools. We put so much effort into educating the DSes in our company to get away from notebooks and use IDEs like VS or RStudio, and Databricks has been a step backwards because we didn’t get the integrated version.

mrtranscendence · 2024-03-27T17:50:29 1711561829

I'm a data scientist and I agree that work meant to last should be in a source-controlled project coded via a text editor or IDE. But sometimes it's extremely useful to get -- and iterate on -- immediate results. There's no good way to do that without either notebooks or at least a REPL.

alexott · 2024-03-27T19:35:49 1711568149

There is a VS Code extension, plus databricks-connect… plus DABs. There are a lot of customers doing local-only development.

pandastronaut · 2024-03-27T17:51:07 1711561867

Thank you! I am so tired of all those unmaintainable, undebuggable notebooks. Years ago, Databricks had a specific page in their documentation where they stated that notebooks were not for production-grade software. It has been removed. And now you have a ChatGPT-like assistant in their notebooks... What a step backwards. How can all those developers be so happy without having the bare minimum tools to diagnose their code? And I am not even talking about unit testing here.

alexott · 2024-03-27T19:39:08 1711568348

It’s less about notebooks and more about SDLC practices. Notebooks may encourage writing throwaway code, but if you split the code correctly, you can do unit testing, write modular code, etc. And the ability to use “arbitrary files” as Python packages has existed for quite a while, so you can get the best of both worlds: quick iteration, plus the ability to package your code as a wheel and distribute it.

P.S. here is a simple example of unit testing: https://github.com/alexott/databricks-nutter-repos-demo - I wrote it more than three years ago.
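To illustrate the "split code correctly" point, here is a minimal sketch in plain Python. It is not taken from the linked repo, and every name in it (`normalize_country`, `clean_rows`) is hypothetical. The idea is only that the logic lives in ordinary functions that a notebook can import and a test runner can exercise without a cluster:

```python
# Module-style code: plain functions, no notebook globals.
# All names here are hypothetical and chosen only for illustration.

def normalize_country(code):
    """Return a stripped, upper-cased country code, or None if empty."""
    if code is None:
        return None
    cleaned = code.strip().upper()
    return cleaned or None


def clean_rows(rows):
    """Return new row dicts with 'country' normalized.

    Rows with a missing or empty country are dropped.
    """
    cleaned = []
    for row in rows:
        country = normalize_country(row.get("country"))
        if country is not None:
            cleaned.append({**row, "country": country})
    return cleaned


# Checks that would normally live in a separate test file.
def test_normalize_country():
    assert normalize_country(" de ") == "DE"
    assert normalize_country("   ") is None
    assert normalize_country(None) is None


def test_clean_rows():
    rows = [{"id": 1, "country": "us "}, {"id": 2, "country": ""}, {"id": 3}]
    assert clean_rows(rows) == [{"id": 1, "country": "US"}]


if __name__ == "__main__":
    test_normalize_country()
    test_clean_rows()
    print("all tests passed")
```

A notebook would then just import `clean_rows` and call it, while the same functions get tested locally or in CI.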

VirusNewbie · 2024-03-27T19:28:14 1711567694

Spark is pretty well engineered and quite good.

millenseed · 2024-03-28T12:34:09 1711629249

You might be tired, but there's tons of value for enterprises in using only one do-it-all tool. It's not personal, you know.