Databricks cofounders weren’t interested in starting a business (2021) (forbes.com/sites/kenrickcai)
74 points by hbarka on Jan 20, 2023 | 76 comments



> Down the line, $100 billion is not out of the question, Ghodsi says—and even that could be a conservative figure. It’s simple math: Enterprise AI is already a trillion-dollar market, and it’s certain to grow much larger. If the category leader grabs just 10% of the market, Ghodsi says, that’s revenues of “many, many hundred billions.”

Um... isn't that the same fallacy they trotted out before the dot-com bubble of 2001? [1][2]

1. https://www.inc.com/erik-sherman/the-1-percent-fallacy-that-...

2. https://www.linkedin.com/pulse/why-1-10-billion-100-million-...
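The arithmetic behind the quoted claim can be checked directly. Here is a minimal sketch that takes the article's own figures at face value (a $1 trillion market and a 10% share). The function names are illustrative, the $300 billion target is a hypothetical stand-in for "many, many hundred billions", and nothing here says whether the market size itself is right:

```python
# Figures taken from the quoted passage; treated as assumptions, not verified facts.
MARKET_SIZE_USD = 1_000_000_000_000  # "a trillion-dollar market"
LEADER_SHARE_PCT = 10                # "just 10% of the market"


def implied_revenue(market_size_usd: int, share_pct: int) -> int:
    """Revenue implied by capturing share_pct percent of a market."""
    return market_size_usd * share_pct // 100


def market_needed(target_revenue_usd: int, share_pct: int) -> int:
    """Market size required for share_pct percent to yield the target revenue."""
    return target_revenue_usd * 100 // share_pct


revenue = implied_revenue(MARKET_SIZE_USD, LEADER_SHARE_PCT)
print(f"10% of $1T = ${revenue / 1e9:.0f}B")

# "Many, many hundred billions" is vague; $300B is a hypothetical stand-in.
needed = market_needed(300_000_000_000, LEADER_SHARE_PCT)
print(f"Market needed for $300B at a 10% share: ${needed / 1e12:.0f}T")
```

On those figures a 10% share comes to $100 billion in revenue, a single hundred billion. Reaching several hundred billion would need a market several times larger, quite apart from whether a 10% share is attainable at all.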


Naturally. They're trying to keep a deflating tire inflated.

It's one of the worst con sales pitches you can ever see as an investor.


Upvote for the metaphor. Because how do you keep a deflating tire inflated? Pump, and pump, and pump some more.

(Apologies if I'm overstating what might be obvious to everyone else.)


I mean I know people have a short memory span, I guess I'm just surprised it's this short. I would have thought that most folks with even a cursory knowledge of Silicon Valley history know about the 1% fallacy. I'm not even a founder, and even I know what it refers to.

This quote should have been a giant red flag for Kenrick Cai and Forbes. It should have single-handedly changed the angle of the story from puff piece to investigative. To see them both uncritically regurgitating something so thoroughly disproven is another depressing nail in the coffin of modern-day journalism.


> Enterprise AI is already a trillion-dollar market...

It is? Source? Evidence?

> ... and it’s certain to grow much larger.

Same questions, with at least as much skepticism.


Databricks' unique selling point is a scalable data lake. The AI functionality is obviously used as well; however, its real-world usage is inflated by a lot of hot air.


> “We would tell them, ‘Just take the software for free,’ and they would say ‘No, we have to give you $1 million.’ ”

I’d bet that they actually said “no, we will have to spend $1 million to get it ready for us to use.”


Some weird stuff surrounding Databricks.

I know multiple AE-level folks who have left big cloud companies (e.g. Google, Amazon, SFDC, etc.) for literal 7-figure pay packages at Databricks and then vaporized sometime thereafter.

Lots of NDAs involved!


What do you mean vaporized? Never heard from them after they joined Databricks?


Maybe Databricks has its employees work in a sterile, underground office and click on "scary" numbers all day.


And when they leave for the day, it’s as if they never worked on anything at all!


What's AE level?


Account executive. Aka Sales rep.


Account executive - very senior salesperson.


The most interesting part of this article to me (from mid-2021) is the view of the world's most valuable startups:

- ByteDance $140b

- Stripe $95b

- SpaceX $74b

- Didi $62b

- Instacart $39b

- Klarna $31b

- Epic Games $28.7b

- Databricks $28b

- Rivian $27.6b

- Nubank $25b

Times change...


That's missing Canva, which at the time was $40B, putting it 5th on that list. Canva is now in the $20-24B range.


What happened to their valuation?


> what happened to their valuation?

Figma got acquired by an enterprise superpower, simultaneously creating a Kraken of a competitor and closing a path to exit. Canva is also one of the worst companies for shareholder liquidity, which I know in at least a few cases led to fire-sale dispositions.


Do those events change the fundamentals though? Canva's bottom line is still excellent.

The previous valuation was just PE froth. Canva just raised money at the top of the market at an absurd valuation, but it's the cash flow that matters.


I really don't know what the person is on about.

> Canva is also one of the worst companies for shareholder liquidity, which I know in at least a few cases led to fire-sale dispositions.

"dispositions"? Makes no sense in the context. I worked at Canva and got to sell 20% in 2020 and 2021, the latter at the $40B valuation. So as a shareholder I'm relatively happy with the amount of liquidity in the stock.


> "dispositions"? Makes no sense in the context

You’re describing tenders. They’re fine for small-scale planned liquidity.

If a fund hits a risk limit, on the other hand, and must sell before the end of the year for compliance reasons, and the company is known to block transfers, the solution is a financially-engineered mechanism that throws off buyers and reduces the price. This liquidity discount gives management more control at the expense of shareholders. It’s also commonly priced into valuation expectations, at least by investors who aren’t all froth.


You're describing a theoretical situation.

While that can definitely occur, I strongly doubt that's what has happened to Canva.


> describing a theoretical situation

Nope.


The same thing that happened to the valuations of the rest of the companies on the list. There's been a massive haircut across private software companies in the last 12-18 months. Stripe has had a similar fall in valuation.


"Startups"


Yeah, all those companies except Nubank were founded at least 10 years ago (Nubank being nine). Epic was founded 32 years ago!


That was my first thought too. Looking at Epic on the list, I pictured Steve Buscemi’s meme: “How do you do, fellow startups?”


So it is about a single private company started by the founders of Apache Spark. Not sure I'd call it a mafia. Further, I see that a high-performance distributed data processing system implemented in Rust is being built around Apache Arrow. So Databricks' lead, and hence its valuation, will be quite difficult to maintain.


Databricks' unique selling point is the scalable data lake. The AI functionality is used; however, it is hyped with a lot of hot air in this article. I can't see Databricks hitting $100 billion anytime soon.


This reads like a pre-IPO puff piece, nothing more. Without getting into how obnoxious the term is, there's no "mafia" here. And I'm not buying the altruism so vainly on display. Goes to show how modern tech journalism is basically commissioned.


My bet is that Databricks will end up like MapR.


This sounds like an interesting contrarian take. Would you care to elaborate?


I was hoping the article would dive into some deeper insight as to why they started the business if they weren't interested.

I'm not "interested in starting a business" either. But I seem to keep doing it. I was wondering why the other day.

My core interest isn't business, but I did enjoy my time as a software engineer and I like technology. I also have a background in design, and love that part. I've worked in marketing and PR, and enjoy that. Finance I'm a bit meh on, but I know how to put together a forecast, worked in accounting for a bit, so I can do the job. I've never worked in HR, but I've managed a few teams.

Anybody else in a similar boat? What got you to start your business? Or why do you stick with it?


I feel similarly. My favorite things about running a business are:

* The variety of problems I get to work on. I write code, I do sales calls, I manage people, I negotiate contracts, I do taxes, you get the idea. I really like learning things and I never get bored.

* I like the intensity. I have to do everything as well as I possibly can, and so I feel like I'm running at close to 100% efficiency at all times. It is very satisfying to feel that you're putting in the best work you can.

Both of these things are rare at "normal" jobs. I was lucky enough to have similar experiences when I "worked for the man", but those experiences were never close in terms of quality or intensity.


>Users feed in their data and the AI makes predictions about the future. John Deere, for example, installs sensors in its farm equipment to measure things like engine temperature and hours of use. Databricks uses this raw data to predict when a tractor is likely to break down.

I'm pretty sure this is not what Databricks does. Isn't Databricks just a way to run queries on unstructured data? Or am I thinking of something else?


You can definitely compute predictions with Spark, for instance: https://www.databricks.com/blog/2020/11/16/how-to-train-xgbo...


Unless they have radically pivoted, Databricks is just managed Spark, plus supporting features.

By supporting features, I mean things like their storage functionality (data lake, delta lake, etc) and other libs + tooling.


>They wanted to replicate what the big tech companies were doing with neural networks, without the complex interface.

This is utterly wrong. What Databricks did has nothing to do with neural networks. Forbes probably meant Hadoop. I am happy for the team though.



Is this mostly just a story about Databricks?

PayPal mafia refers to how many other companies they founded after PayPal, not the fact that they founded PayPal:

https://en.wikipedia.org/wiki/PayPal_Mafia


+ Spark

+ Ray

+ Anyscale

"How Ray, a Distributed AI Framework, Helps Power ChatGPT" https://thenewstack.io/how-ray-a-distributed-ai-framework-he...

New frameworks to new companies.


Also, Opaque Systems


Submitted title was "There was PayPal mafia. Meet the Berkeley mafia". That broke the site guidelines ("Please use the original title, unless it is misleading or linkbait; don't editorialize.")

Since the article's own title is also linkbait, I've replaced it with a representative phrase from the main text.


Should the article title also include "(2021)"?


Added. Thanks!


True, but think of it this way:

If 1 of these 7 Berkeley CS Ph.D.'s decides to leave to found his own company, now that they all have exec experience - what VC would turn them down?

Every single one of them will have top tier startup experience, having been part of the founding team of a huge success story, AND the most technical of their first 10 hires at the same time.


But will it be successful? The PayPal Mafia were successful in their other pursuits. And not just successful: they all founded their own companies valued higher than Databricks (I think, based on rough memory of their valuations).


It's a mixture; most have not been anywhere near as highly valued as Databricks.

Databricks $28 billion

SpaceX & Tesla (~$120 billion & $421 billion)

LinkedIn sold for $26.5 billion

Yelp $2 billion

YouTube sold for $1.65 billion

Those are the major companies. Most of the PayPal mafia went on to do angel investing or work for VC firms.


Fair point, and I would love the Databricks guys to do something similar, because rising tides etc.

Still amazing a group all went on to so many things - maybe like the guys at PARC


At least two of them already had that background, though. They weren’t blank slates: “billionaire startup alumni founds second billionaire startup with new colleagues” would be a perfectly accurate title.

> Stoica was an exec at $300 million video streaming startup Conviva, while Shenker had been the first CEO at Nicira, a networking firm sold in 2012 to VMware for about $1.3 billion


> billionaire startup alumni

Only Stoica, Ghodsi, and Zaharia are listed on https://www.forbes.com/real-time-billionaires and I don't think any of them were billionaires before Databricks? Exec at a $300M company or CEO overseeing a $1.3B exit means rich, yes, but not billionaire rich.


I meant person from a startup worth $1 billion, not person worth a billion.


> a best-of-breed piece of future predicting code called Spark

You read it here first, folks: Apache Spark is software that predicts the future, not just a distributed job runner for in-memory datasets.


Yeah, let's see if it predicts Databricks' future.


Yes it does, and it's bleak. I don't understand the valuation, which rests on the assumption that even the smallest startup will need to write some fancy map-reduce Spark jobs to do analytics and AI. Most companies are best served by a warehouse like Snowflake and a realtime layer for analytics. I don't understand the value add of Databricks.


But you don't have to write map-reduce jobs at all? You can just write SQL queries or Pandas programs, and they automatically get parallelized by Databricks. Databricks is a data warehouse (just like Snowflake).

https://www.databricks.com/product/databricks-sql


In a twist, pandas programs don't get parallelized on Spark. Someone had to go and write a parallel layer that duplicated the pandas API, because otherwise you ended up with the entire pandas program executing on a single executor.


There is pandas on Spark, included in Spark itself (originally Koalas). The switch to it is very easy, and you get parallelization.


FWIW, what we see are whole different categories of workloads. For primarily API-driven microservice workloads, ETL of data stores into Snowflake makes sense. But for primarily batch or stream workloads (implemented literally as batches or as data streams with varying unit-of-work semantics, where the target data model isn't read-only analytics but read-write operational), something like Spark can make a lot of sense.


Damn, thanks for that comment. I was actually wondering if they were talking about the same "Spark" I already knew.


>Accidental Billionaires: How Seven Academics Who Didn’t Want To Make A Cent Are Now Worth Billions

We'll see for how long.


I worked for a competitor. This business is built on ignorance and exuberance and is as durable as eggshells.


Can you elaborate? I don't use Databricks, but it seems to me like a bread-and-butter SaaS infrastructure offering.


We use Databricks on a data processing pipeline. I don't know of anyone in love with it. It is by far the number 1 source of problems on that pipeline. Just in the last week:

* It deleted hundreds of log files without warning

* We had a failure starting a cluster; the web UI listed the cluster, but in fact it no longer existed - we had to recreate it.

* Log files that it didn't delete show that it is having problems pulling some internal metadata from an AWS IP address (we are on Azure).

If the directive from on high came down that we are to rip it out and replace it with something else, nobody would be surprised, or care.


Databricks founder here. Doesn’t seem like DMs work on HN anymore. Do you mind shooting me an email? Would love to follow up and understand the issues.

rxin@databricks.com


Ugh - the disappearing clusters are frustrating


As a data scientist, the greatest thing about Databricks is their marketing and sales departments. Not that they have a bad product, but they're not selling anything brilliantly unique.


Anything they do especially well? What’s their moat?


Most use cases currently served by Apache Spark clusters would run 10x faster on a laptop with a fast SSD (a MacBook, perhaps) and an ad hoc cat/grep/sed pipeline /s


There are DuckDB, Polars and similar tools now, which can finally outperform the venerable Unix tools. Once the data is on the laptop you can get an order of magnitude faster execution than Spark.

The unsolved problems are 1) what if the data and what you do with it suddenly doesn't fit on a laptop (giving everyone 64 GB RAM laptops for example seems like a waste) and 2) how do you deliver the relevant subset of the data from the petabyte place where you store it to the laptop.

If someone could solve that, Spark could finally go to hell.


The solution to that is called Snowflake


Go Bears!


[flagged]


Right, as if, oops, they just accidentally/on a whim filed all the paperwork and hired HR and accountants... and woke up one day with a billion-dollar company.


Boy wasn’t it fortunate that some of them had done all this paperwork and exec-ing before in successful tech startups! They might have really struggled without a shit-ton of business background!


Such a whacky coincidence!



