Why is Snowflake so Valuable? (freshpaint.io)
247 points by malisper on Sept 30, 2020 | 163 comments



I've used Snowflake a fair amount. It's a decent product, probably on par with Redshift / BigQuery. Obviously there's a lot of hype and free money floating around, but my take on why they are popular is that they are basically a replacement for large Hadoop installations that have become untenable to manage over the past decade. If a company is already using Redshift or BigQuery, I'm not sure why they would switch.

I would be apprehensive about investing in Snowflake long term, purely because their product is highly susceptible to being made obsolete in the next 5-10 years.


I was at a company that switched from Redshift to Snowflake. It was a night and day difference. Faster (orders of magnitude!), cheaper, and significantly easier to work with (since everyone had their own personal view of the data to mutate/work with).

As far as I can tell, it is a unique product in the database space. Extremely well executed ideas and design.


Snowflake seems like a unique product and I can only imagine the complex math they're doing under the hood to achieve these incredible query times. memsql is the only real competitor I know of. Redshift is a lot less user friendly (constant need to run vacuum queries). Parquet lakes / Delta lakes don't have anything close to the performance.

Predicate pushdown filtering enabled by the Snowflake Spark connector seems really promising. Lots of companies are currently running big data analyses on Parquet files in S3. Snowflake has the opportunity to grab a huge slice of the big data market.
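
(To make pushdown concrete, here's a rough sketch of the effect with a hypothetical events table. Without pushdown, the connector ships the whole table and Spark does the filtering; with pushdown, the Spark filter is folded into the SQL that Snowflake executes, so far less data crosses the wire:)

    -- What effectively runs without pushdown:
    -- ship everything, let Spark filter
    SELECT * FROM events;

    -- What effectively runs with pushdown:
    -- Snowflake prunes the data before transfer
    SELECT * FROM events
    WHERE event_date >= '2020-09-01' AND country = 'US';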


What kind of math is involved in building a faster database? Genuinely curious. I would guess maybe linear algebra, indirectly.


Not at all. I'd highly recommend CMU's 15-445/645 Intro to Database Systems course (sponsored by Snowflake lol) because they put all their lectures online on YouTube [1]! Here's what's involved in making fast databases, from the syllabus [2]:

This course is on the design and implementation of database management systems. Topics include data models (relational, document, key/value), storage models (n-ary, decomposition), query languages (SQL, stored procedures), storage architectures (heaps, log-structured), indexing (order preserving trees, hash tables), transaction processing (ACID, concurrency control), recovery (logging, checkpoints), query processing (joins, sorting, aggregation, optimization), and parallel architectures (multi-core, distributed). Case studies on open-source and commercial database systems are used to illustrate these techniques and trade-offs. The course is appropriate for students that are prepared to flex their strong systems programming skills.

[1] https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_...

[2] https://15445.courses.cs.cmu.edu/fall2020/syllabus.html


Oof... CMU courses directly sponsored by Snowflake. Gross.


Please elaborate? I can see a lot of ways a sponsored course could go badly, but I can't immediately see which ones apply here.


I'm not qualified to evaluate this particular course. But any time there is a corporate sponsor of a course, it provides, at a minimum, strong incentives for the professor not to harm that sponsor. If there's a methodology that the professor would like to teach, but that sidesteps, or calls into question, the sponsor's main offering, then that content is in jeopardy. The corruption will always take root given enough time, which is why editorial and advertising, or academic content and corporate sponsors, etc., should always be at arm's length. Snowflake should give money to CMU to fund "database-related research and teaching" and the university should decide what to do with it. There's still a possibility of improper influence, but it's harder to achieve. This is particularly bad because it's CMU and not the University of Phoenix... CMU is in the highest echelon of computer science universities, so it's sad to see it so debased.

What if Kodak sponsored an imaging class in 1990... what do you think they would have said about film vs. digital photography?


A lot of ML classes at CMU (and probably other prestigious campuses) are sponsored by AWS or GCP through cloud credit donations, including the popular Cloud Computing class. Is that any different?


Not really. Cloud computing has a lot of benefits, but a lot of risks and drawbacks. Who is sponsoring a class to teach about those? About keeping users’ data private by building your own infrastructure? CMU is actively tilting their students, who are the top CS students in the world, towards cloud computing, based on the choices of these sponsors.


Sounds kind of conspiratorial.

I think any increase in educational content is good, even if ‘bad actors’ are funding it.


Bad actors funding it always leads to bad actors writing it. Then it's hard to argue that an increase in its quantity is good.


>I can only imagine the complex math they're doing under the hood to achieve these incredible query times

Maybe it's cynical/paranoid, but in this age of Theranos I must ask: is it possible their algorithm excels at showing you a reasonable-looking number, rather than an accurate one?


It's SQL, if they were giving wrong answers people would notice.


It's not too terribly difficult to load test Snowflake to get a sense of scaling. Jmeter does the job well. Heck I can pass you along some sample projects I've done against them if you really wanted.


Yeah, Redshift is not at all comparable to Snowflake. BigQuery is much closer; it's ahead in some areas and in the last year has closed some of the gaps where it wasn't. BigQuery's biggest problem is that it's tied to GCP, which is a distant 3rd in cloud market share. They have BigQuery Omni coming, which is multi-cloud, but it'll probably be a while before it's comparable to BigQuery on GCP.


The other problem with BigQuery is that you can very easily write a query that's going to cost you a lot of money to run - with Snowflake you can let it run for an hour or so, and then realise it was a bad idea and you're only out a few credits, a handful of dollars.

The killer feature for me was the query profiler - you can see WHY a query is taking a long time and optimise it - BigQuery just felt like Google were brute forcing the performance, and then charging you accordingly.

When the project I was on switched, the micro-clusters (and the ability to recluster a table) as well as the MERGE semantics beat BigQuery hands down - although those features may be out of beta now (but I've moved on to a new gig).


That's also a problem that it'd be fairly straightforward for Google to solve by automatically spinning up smaller, entirely separate serving clusters for customers who are worried about such a blowout (for a fee, obvs). It's just the serving tree (+ whatever in-memory storage service they use to do distributed joins nowadays), no need to duplicate the rest of the service. The caveat is, a smaller cluster will favor query optimizations specific to that smaller cluster. Some of those "small cluster" optimizations could hurt query performance when deployed against BQ proper with its tens of thousands of workers.

Also, BQ does explain the query plan to some extent: https://cloud.google.com/bigquery/query-plan-explanation. Not quite at the level of a "regular" SQL DB, but it does give you some info to work with when optimizing queries. If you haven't used it in a while I'd give it another try.


I believe this is exactly what slot reservations in BigQuery achieve. Instead of paying on-demand pricing that is determined by data read, you purchase a fixed number of “slots” that are shared by queries running within that particular project.


Ah OK, after reading their docs I see they've changed what "slots" used to mean in Dremel (internal version of BQ). It used to be that slots _guaranteed_ capacity, but did not limit it. Meaning that you could rely on having a certain number of workers in the cluster when you issue a query, but if Dremel had more it'd give you all it's got. Obviously this is not viable when people have to pay per terabyte read, because a ton can be read.

What they have now strikes me as an even better solution to the problem of bankrupting someone with a query IMO. Not sure how pricing compares to redshift et al, but pricing is the easiest thing for Google to change.


Slots don't control how much data you consume, your query does.

If you need to read a terabyte of data to answer your query then more slots only gets it done faster.


BQ Slots lets you do essentially that (pre-commit to a particular cluster size)


I was hitting some rough edges / complexity with BigQuery's MERGE recently, but wasn't able to ascertain any significant difference with Snowflake by scanning their docs briefly -- what aspects of the MERGE semantics are better in Snowflake in your opinion?

Wondering if this is a somewhat new feature in BQ since you used it, or if there's still a feature gap here (e.g. see https://cloud.google.com/blog/products/gcp/performing-large-...).


BQ has per-project and per-user cost controls. Normally when running new large queries one would run them under a special user with a limit on costs.


I think the obsolescence issue is complicated.

I recently saw a criticism of Palantir which went: "The company has largely succeeded, they say, not because of its technological wizardry but because its interface is slicker and more user friendly than the alternatives created by defense contractors."

A lot of the most successful tech firms started post-dot-com are decent interfaces to not-particularly-revolutionary databases. In high-end consulting and investment banking, appearances are hugely important. You can't have trash decks. It's unsurprising to me that the same is true in defense and intelligence. You can get a roof over your head and breakfast at a trashy motel or the Ritz. Everybody knows the Ritz can command a much higher price because "its interface is slicker and more user friendly than the alternatives."

I think the same thing is true here.


The ritz has far better beds, cleaner & safer rooms, better food and is far more likely to deliver that consistently. It's not just the appearance.


A closer reading will reveal that I'm not talking about superficial appearances, but the interface. That's an important distinction.

When I start talking about the Ritz and high-end consultants, I'm discussing the interface, which of course includes the "far better beds, cleaner & safer rooms, better food..." and consistency you're trying to contrast with appearance. I would agree that those things are more than superficial and are extremely important to the experience of the user, because that's exactly the point I'm making.

The beds and concierge are nicer at the Ritz, and the interface (note: not appearance) and support are better at Palantir (or, as we're discussing here, at Snowflake).


Maybe your Ritz experiences have been different than mine, but IMHO all hotel rooms are concrete boxes with a facsimile of home stuffed inside them, copied and pasted as many times as local demand will merit.

Hotel restaurants are the same principle, except replace furnishing with food.


Stay at an aging Courtyard Marriott. Some boxes are nicer than others.


I've stayed at everything from a Motel 6, to Courtyards / Residence Inns / Sheratons between NYC and San Diego, to Four Seasons / Ritz Carltons.

I stand by my claim. The relative differentiation in niceness is swamped by their mass produced boxness.

Ironically, my favorite road chain tends to be Aloft. At least they're upfront about their capsule-esque nature, in a sort of ironic/not-ironic way?

Least favorite: Embassy Suites. shudders It's like every Disney vacationing family's fantasy about what a hotel should be... packed with every Disney vacationing family. Omelette?


The point of hotel chains, and chains in general, is the consistency of the mass-produced experience. I can walk into a DoubleTree hotel anywhere in the world and get the same welcome cookie. It's a positive, not a negative; people often enjoy knowing what they're going to get. If you prefer a more unique experience, which is perfectly understandable, then simply avoid chains perhaps?


That's my point, but extended: I feel like walking into any hotel chain (including different product tiers and luxury brands) gives effectively the same experience.

Don't get me wrong, there's a benefit to consistency of product (especially when you travel Su-F for consulting).

But that benefit, parent company consolidation, and economies of scale drive a net result of overwhelming homogeneity.


Totally get your point of view, and I share it in vacation contexts. As the hotel chains have consolidated, they slice pennies everywhere.

When I'm traveling for business or putting my head on a pillow on a road trip, consistency makes my life easier and less stressful. I'm a gorilla-sized person :), I would rather stay at a higher-end hotel that provides an actual bath sheet than a Marriott whatever where I have to call for 6 towels. Surprises aren't delightful at 10PM when you've been on the road for 15 hours.


I got eaten up by gnats (they claim not bed bugs) over a week at a particularly nice hotel. On the plus side, nothing came home with me, the bites healed, and they gave me enough "points" as compensation to cover a luxury hotel in Barcelona for 2 weeks. So... Future Self can look back on the experience with a smile.


Nothing ever gets obsolete once it gains a large foothold in the enterprise space. There's a reason why Oracle and IBM are worth what they are today.


> Nothing ever gets obsolete once it gains a large foothold in the enterprise space.

Lotus? Delphi?


Both still in very heavy use. In 2014, anyway, every single IBM employee had to keep a Lotus Notes window open. It was hellish.

Dunno if that's changed since the Red Hat acquisition.


Used Lotus Notes as recently as 2010; I am pretty sure it's going strong in my megacorp former employer.


Lotus is all over in government and insurance. As a mail client it is mostly dead, but the apps live on.


There is a reason, but it ain't because of their cloud databases...


Novell, Word Perfect


WordPerfect was used in certain industries (legal especially, I think) long after it started dying everywhere else. I don't think it's an exception to this rule.


Yes but it’s dead now.


There was a post maybe two weeks ago from Tavis Ormandy (a tweet) that made the HN front page, about how he uses WordPerfect:

Tavis Ormandy (@taviso) Tweeted: @mkolsek Funny you should mention that, I was recently curious if there are any console word processors. I discovered there's a community who still use WordPerfect 5.1 for DOS. They kinda sold me on it, got it working in DOSEMU. https://t.co/t6j0c1G3w1


WordPerfect still has some users.

Last year we recruited an attorney from a firm that still uses WordPerfect for all their documents.


My school district still runs on ZENWorks.


At the end of the day, all the data warehouses run on SQL, with a bit of customization around ingestion and export. Most of them are backed by object storage (S3/GCS) and those integrations look very similar.

I wouldn't be that worried about lock-in or being made obsolete. Business logic is going to be pretty easy to port between Redshift, BigQuery, Snowflake, or whatever comes next.


> going to be pretty easy to port between Redshift, BigQuery, Snowflake, or whatever comes next.

This isn't even remotely true. Each has unique SQL syntax, and once you have a few hundred or thousand queries written using vendor-specific SQL (be it date functions or JSON), it is non-trivial to migrate.
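
A quick illustration (hypothetical table and columns; the functions are the point): computing a time difference and extracting a JSON field look completely different in the two dialects.

    -- BigQuery standard SQL (payload is a JSON string)
    SELECT TIMESTAMP_DIFF(shipped_at, ordered_at, HOUR) AS hours_to_ship,
           JSON_EXTRACT_SCALAR(payload, '$.user.id') AS user_id
    FROM orders;

    -- Snowflake (payload is a VARIANT column)
    SELECT DATEDIFF('hour', ordered_at, shipped_at) AS hours_to_ship,
           payload:user.id::string AS user_id
    FROM orders;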


> Most of them are backed by object storage (S3/GCS)

Redshift is backed by worker instances that have their own stores in what's basically an EC2 instance. It's definitely not backed by S3 like Athena.

Bigquery and GCS are both built on top of Colossus, but they have different layers in between them.


With the newer Redshift RA3 instances you use S3-backed storage with local SSD caching:

https://aws.amazon.com/redshift/features/ra3/


Same applies to Teradata Vantage on cloud.


Sorry, probably should have been more precise. Meant to say: most users are going to interact with the warehouses via object storage for import and export of data.

Since the object store APIs are almost identical across platforms, it doesn't matter that much which warehouse you actually use for production work. It's something that does massive SQL, imports data from S3, and exports data to S3.


> most users are going to interact with the warehouses via object storage for import and export of data.

No, most are going to be using SQL IDEs to query and export data.


> I would be apprehensive in investing in Snowflake long term purely because their product is highly susceptible to being obsoleted in the next 5-10 years.

This can be said about most products and companies. What keeps them alive is how robustly they capture (and hold on to) the market, reduce costs through economies of scale, and innovate. This specific market is also very rapidly growing.


I would think it wouldn't be the same product in 5-10 years.


Lots of companies have built on top of Snowflake.


People are excited about Snowflake because it can completely disrupt the traditional data-warehouse market.

The legacy players like Teradata and Exadata (from Oracle) really don't scale. Teradata has ~2B in revenue, Exadata is probably in the same range. That's all up for grabs but that's only scratching the surface.

Historically, only transactional data was dumped into the warehouse. Snowflake is selling storage at S3 prices (plus you get compression, so it often ends up cheaper) while they make money off compute/queries. If they can provide all the right query abstractions (SQL, full-text search), in theory all data can be thrown into Snowflake. Yes, tech-savvy Bay Area companies can set up their own stack using Presto etc., but the rest of the world is not like that.


[Teradata employee here.]

> The legacy players like Teradata and Exadata (from Oracle) really don't scale.

I get why Teradata gets labelled "legacy", but one of Teradata's main differentiators is scale. Teradata engineers have been tackling incredibly interesting scale problems (on many dimensions of "scale") for 40 years. Teradata has many customers who routinely manage and perform analytics on many petabytes of data.

> Historically, only transactional data was dumped into the warehouse.

That was once true, because initially that was all the data that companies had. However, companies have long since used data warehouses for all kinds of data — sensor data, text, behavioral data, product info/BOMs, vendor info, contract info, etc. — whatever's necessary to run the business.

> Snowflake is selling storage at S3 price…

This is important, but not unique. For example, Teradata's current product has native support for S3 and S3-compatible object stores, and you can query them just like any other database table, join that data with data in high-performance native storage, etc.


Sorry, I didn't clarify well. I am sure it scales technically well but not on cost.

My experience of TD is > 10 yrs ago, and back then the multi-node version was substantially more expensive than the single-node version. Also, storage and compute were coupled, which meant I had to pay for nodes even if 99% of my data was cold. That's a problem with Redshift too, but not for Snowflake.

De-coupling storage and compute was a brilliant move by Snowflake. BigQuery completely abstracts compute - you don't provision compute and only pay for data scanned. However, it gives you a sense of insecurity around cost - a single bad cron job running a query every second can blow up your cost (real-life experience). Snowflake provides the best cost/performance tradeoff I have seen.


> I am sure it scales technically well but not on cost.

The honest answer is "it depends". Because Teradata is a different beast, per-query pricing can be significantly cheaper than Snowflake with high-volume workloads. It's worth trying both to evaluate cost and performance.

> Also, storage and compute was coupled which meant I had to pay for nodes even if 99% of my data was cold.

Yes, it used to be that everything had to go into Teradata's high-performance filesystem. These days, Vantage's native object storage support means that you can keep that cold data in S3.


>>However, it gives you a sense of insecurity around cost - A single bad cron job running a query every sec can blow up your cost (real-life experience).

Only if you're using on-demand. Instead, reserve some slots and pay a flat rate. The minimum quantity is very low, and the minimum time is 1 min.


> Teradata's current product has native support for S3 and S3-compatible object stores too, and you can query them just like any other database table, join that data with data in high-performance native storage, etc.

Storage costs for S3 (or any cloud-provider object storage) are only one dimension of the price. The other is interaction costs which can get prohibitively expensive, for example if you accidentally forget to provide a partition key in your query predicate. Snowflake absorbs this cost if you use internal storage (or just copy into tables).


Snowflake doesn't absorb the cost because there is no cost.

The benefit of native tables for all columnar databases is that it provides an optimized format with metadata for each column, which is then used to eliminate most of the data retrieval during query time. The more selective your query, the faster the results.


> Yes, tech-savvy Bay Area companies can set up their own stack using Presto etc., but the rest of the world is not like that.

My last company was an early adopter of Snowflake. We tried Presto first, circa 2016, and Presto was sloooow, much slower than the Vertica we were using at the time. Snowflake, on the other hand, was able to perform on the same order of latency as Vertica, which was pretty crazy to us.


That's interesting. I thought Vertica's pitch was real-time analytics, for which traditional disk-based data warehouses are too slow.


Vertica is a disk-based analytics database. It was very fast, but also very expensive. And hardware failures could be particularly difficult to recover from.


Vertica was very powerful for us, but the separation of compute from storage was a critical feature of SF that motivated us to switch. We evaluated Eon mode but SF was too easy. On the performance front: we were running nightly batch processing on a 22 node Vertica cluster that was taking 6 hours per night on highly optimized projections. We threw the same query at an SF 8XL cluster and it finished in about 30 minutes. The biggest difference, however, is cost: we are spending more on SF than we otherwise would have on Vertica.


SF prices may have risen. We had a 48 node cluster in AWS, which wasn't cheap obviously, and our license was also seven figures a year. And we got a good deal from Snowflake because we were an early large customer. One big advantage for us was that we could auto-scale Snowflake based on our load to save more money.


So why did you switch from Vertica (or did you)?


Vertica was too expensive, their licensing fees were terrible at our scale. Operations were also awful, if we had two nodes go down we were always in trouble. We built an EBS solution that made it a little better, but it still wasn’t tenable long term.


Good info -- thanks.

Their Eon mode product is very similar to Snowflake, with S3 storage and semi-dynamic compute nodes, but they may not be as slick at marketing it or providing a UI.


Yes Eon is similar BUT it is not nearly as turnkey. Agree it is not well marketed.


My previous company still uses Vertica (on-prem) but it still wasn't as fast as it should be. Trying to hire anyone for an operations team was an exercise in futility, since it's rather niche tech. Maybe it would've been better in the cloud, but here large companies are cautious about vendor lock-in and, after 2014, the potential impact of sanctions.


>>>Yes, tech-savvy Bay Area companies can set up their own stack using [insert open source tool here] etc., but the rest of the world is not like that.

It sounds like a ton of these cloud infra companies have this product strategy (datadog, snowflake, elastic, hashicorp, etc)


When you look at cloud infra companies like that, their competitive advantage is in quickly being able to ingest data and make it accessible, so an off-the-shelf solution likely doesn't exist for their particular use-case. Also, since that operation is your competitive advantage, you should look to in-source it rather than reach for a COTS solution.


Ironic that products named Exadata and Teradata (guess we skipped Petadata) don't scale.


Smaller companies with Presto won't get the same performance benefit.

Snowflake & BigQuery get the ability to have multiple customers on a large cluster.

It’d be cost prohibitive for a single smaller customer to have all that compute sitting idle for a few queries per minute.

Storage also benefits, as Snowflake/BQ can shard your data across a much larger array of disks, giving you better IO.

Think: is it faster to drive a car 100 ft starting at 0 mph and flooring it, or to drive 100 ft in a car that starts off doing 120 mph?


Almost all the big companies I worked for had a "database gang" -- a database group which, in the name of centralization, forced you to bow to them to get anything. New DB? bow to them. More nodes? bow to them. Reboot? bow to them. The internal budget "prices" would be off the charts unbelievable.

It makes sense to centralize, but only at a certain cost. Beyond that cost, it is better to just de-centralize because not every project can spend 4-5 months of meetings to spin up a DB.

The cloud changed this because it became an OpEx discussion and something you could spin up on your own. For non-production workloads, this becomes especially obvious.


The database gang at my company offers most things through self-service tooling and a sub-24-hour Slack queue for most others. Meanwhile the "spend the company's money" gang is way off in some ivory tower, and I'm pretty sure a purchase order would take 4-5 months of meetings (heavily involving the database gang). My director has a budget for headcount and a budget for travel & entertainment, which is divided among the Sr. Managers and Managers, but I think we'd have to reach the CTO or CTO + 1 to get the authority to spend money on products and services.

Baffled both by how bad your internal infrastructure is and by how easy it is for you to buy stuff.


This still in no way answers why Snowflake is so valuable, though. I completely understand your argument, and I agree with it; I just don't think the article's arguments are anything other than ex post facto rationalization. When they mentioned NPS I almost snorted my coffee; that metric can be gamed any way you want.


>> Almost all the big companies I worked for had a "database gang" -- a database group which, in the name of centralization, forced you to bow to them to get anything. New DB? bow to them. More nodes? bow to them. Reboot? bow to them. The internal budget "prices" would be off the charts unbelievable.

> This still in no way answers why Snowflake is so valuable, though.

It explains it to a T! You have something you want, but internal company politics and territoriality keep you from getting it the way you want. An outside provider lets everyone get it for a bit of cash. It's basically the same play as Salesforce. It's not some kind of technical moonshot. It has to do with a modicum of technology delivered by a 3rd party who can avoid all of the internal friction.

The next founder who can think of this kind of play, then execute on it, will be the next Salesforce/Snowflake, and will probably have the ear of the same investors!


What I meant was why Snowflake specifically is so valuable. BigQuery, Redshift or any other cloud db would fill this gap as well. Why Snowflake?


Snowflake fanboy here who can't really answer your question about why it's so valuable. Not sure I can rationalize the current value. Not sure I think it should be valued this much.

But I can probably answer why Snowflake instead of Redshift (sorry, not too familiar with BigQuery)...

First of all, it's cloud-provider agnostic, so you can set up Snowflake on any or all of the 3 major cloud providers, as well as set up replication between them directly or indirectly through their data exchange. Probably the most powerful feature is the way that Snowflake has the ability to scale (up or down) compute (vertically and horizontally) and storage independently of each other. Furthermore, you have the ability to scale compute down to nothing, and spin up "instantly" when the demand arrives. On top of all of this there is an incredible selection of functionality that I could go on and on about.
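
(A minimal sketch of the scale-to-zero part, with an invented warehouse name:)

    CREATE WAREHOUSE reporting_wh WITH
      WAREHOUSE_SIZE = 'SMALL'
      AUTO_SUSPEND = 60          -- suspend after 60s idle; compute billing stops
      AUTO_RESUME = TRUE         -- the next query wakes it back up
      INITIALLY_SUSPENDED = TRUE;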


Honestly, if you don’t mind, please do. I believe the decoupled elastic compute / storage advantage has been well described; what are the more granular or technical things you like?

Edit: seems you’ve already answered this :) https://news.ycombinator.com/item?id=24265856


> What I meant was why Snowflake specifically

Marketing. Vast amounts of marketing.


Not just that. Being the neutral 3rd party who can overcome intra-company roadblocks has real value. I'm not sure Redshift, etc, could be as well focused on being that.


It's a chance for investors to get in on the next big thing and invest specifically in data warehousing. No one can put his money directly into BigQuery or Redshift.

Edit: why are so many people downvoting this? Is there some other reason for Snowflake's valuation (aside from tech bubble playing a role)?


I find Snowflake much easier to use than BigQuery or Redshift. It is also cloud-service-provider agnostic. So your only hook at that point is ingest + any Snowflake-specific SQL (and obviously security, migration, etc.). So for retention, they compete on UX rather than walls.

W/r/t value, the idea is that a disproportionate share of the egress from Oracle, Teradata, etc. will end up at Snowflake, hence the huge TAM, SAM, and SOM.


> it is better to just de-centralize

Until you want to join, and then all of a sudden it's not your problem. Then you end up with a "gang" of cowboy analysts running ad-hoc data dumps against operational datastores, affecting production uptime and stability, only so that they can do a lookup between the multiple (source_table_column_count * source_table_row_count) sheets in their uber Excel document.

I'm all for decomposing the monolith as long as you have a plan for recomposing when it's necessary.


Yep, that's why I said centralizing makes sense -- up until some price point. Beyond that point, you might as well just spend the money on re-composing when you need to.


It's not just the technical costs of recomposing to achieve a join; it's also that diversity of this kind makes long-term maintenance a nightmare, and moving people/functionality/whatever between apps almost impossible.

If everybody cowboys their own storage, you're risking building up a legacy cruft that's very hard to work your way out of later.

The costs of recomposing later can be prohibitive, if a few particularly poor choices were made early on, and the hidden costs of inflexibility can bite too.

There's nothing wrong with outsourcing storage; that's not the issue - the issue is the culture in which it's easier to just not talk to the rest of the company (assuming the company is small enough to have any kind of cohesion in the first place). If it's too expensive to talk (or even pick from a few common defaults) beforehand, how are you ever going to interop later on? You're getting the downsides of a large organization without the upsides.


Fair point! Sorry I missed that.


> The internal budget "prices" would be off the charts unbelievable.

I find that this occurs when an infrastructure team considers itself a "platform". The only supplier of an asset that everyone else demands can set the "price" of the asset as high as they want.


This isn't actually how monopolies work. Theoretically, monopolies cannot just charge what they want for a product, unless demand is perfectly inelastic. i.e. At some point, people (consumers) will stop paying or move to a substitute.


I've seen multiple corporate re-orgs (Fortune 100) and you are right -- internal monopolies cannot just charge what they want. Eventually, internal customers get fed up and "revolt" -- then you have a corporate re-org where IT infra/services get distributed across business lines or lower.

Eventually that blows up too -- re-composing data at the parent level (e.g., for quarterly financial reporting) becomes too exhausting and the company decides to revert to centralized services.


Can someone summarize Snowflake's unique technical value? I'm quite familiar with both Redshift (I would summarize it as Postgres adapted to sharded, columnar OLAP functioning) and BigQuery (there is a famous paper explaining the architecture). Also with more traditional databases such as MySQL, PostgreSQL, SQL Server, and columnar OLAP databases like Vertica. I explored the website a little bit, but couldn't construe a clear statement of the technical architectural value. Some of the comments here are valuable, but I'm missing a clearer "big picture" overview. Thanks!


(I'm the author of the post.)

I've worked with a large Postgres cluster before (~1PB of data) and have been experimenting with Snowflake recently. I would say there are two clear technical advantages of Snowflake over Redshift. The first is that there's no maintenance when using Snowflake. You just sign up for a Snowflake account, upload a CSV, and you can start querying the data. This is in contrast to Redshift, where you have to manually provision a cluster, resize it as you add more data, etc.

The second is their pricing. Storing data in Snowflake costs the same as it would cost to store in S3. The tradeoff is you also have to pay based on how long your queries take. Depending on your workload this can result in a massive cost savings. If you access only small amounts of your data infrequently, it's like you're storing the data in S3 and you only have to pay a bit more when accessing the data. This is in contrast to Redshift where you have to pay for the full cost of the cluster regardless of whether you are actually querying the data or not.

Snowflake also has a ton of quality of life improvements compared to Redshift. One really nice thing is you can change the amount of compute used for any individual query. For example, if you have one specific slow query, you can allocate 4x the compute for that one query, pay 4x as much while the query is running, and get the query to run 4x faster (ultimately costing you the same amount as if you used 1x the compute).
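
(Concretely, that's just a couple of statements around the query; the warehouse name here is invented:)

    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';
    -- ... run the one slow query ...
    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'MEDIUM';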

One neat thing is there's ultimately only one "Snowflake instance" in each region. Everyone's tables are in the same instance, but you can only access the tables you have permission to access. This allows you to easily share data between different Snowflake accounts. You can store the data in one account and query it from another.
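
(A sketch of that sharing flow, with invented names; the consuming account then creates a database from the share:)

    CREATE SHARE sales_share;
    GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
    GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
    ALTER SHARE sales_share ADD ACCOUNTS = partner_account;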

So the core value proposition is really strong and it also has a bunch of extra features that are all pretty useful at the end of the day.

This post focused on Snowflake solely from a business point of view. I'm considering writing another one that focuses on it from a technical point of view.


Thanks for the details, very useful. Please write that post, I'm sure it will make it to the front page here :)


Thanks for the great clarity between the 2 tools. Do you have any thoughts on Dremio/AtScale and whether they complement or replace Snowflake?


Snowflake owes much of its performance benefits to "micro-partitions" [1]. BigQuery is a worthwhile comparison. MySQL, PostgreSQL, SQL Server, and Vertica are not close equivalents.

[1] https://www.infoq.com/presentations/snowflake-automatic-clus...
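
If you want to poke at micro-partitions yourself, clustering is the user-facing handle (table and column invented here):

    -- Tell Snowflake how to organize micro-partitions for pruning
    ALTER TABLE events CLUSTER BY (event_date);

    -- Inspect how prunable the table currently is
    SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date)');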


That's an interesting look at their internals; I wasn't aware of their dynamic sorting feature.

At read time, though, Snowflake's zone map is the same as Redshift's and Vertica's; you'll see similar pruning for many queries.

Redshift however doesn't prune during joins, which is a huge deficiency.

Snowflake looks more flexible about getting the data into its final ordering.


FWIW, if you're looking at PostgreSQL-at-humungo scale, there are a few options around. TimescaleDB, Citus (which Azure now offers as a service called Hyperscale DB) and there are others I always feel terrible for overlooking.

I work for VMware and get along well with folks who work on Greenplum. It's still doing massive workloads with massive amounts of data for lots of customers, has the ability to operate over blob stores with predicate pushdowns and recently merged up to parity with the PostgreSQL 12 upstream[0]. It's the fruit of a six year effort to return to the upstream from a heavily modified fork of 8.3. A truly monumental effort.

[0] https://github.com/greenplum-db/gpdb/commit/19cd1cf4b68faff2...


Snowflake’s value is that they provide the same technical products as amazon/google/etc, but are not amazon/google/etc. Some shops like buying into the google ecosystem, some are afraid of vendor lock in.

Probably other things, but many companies exist just to be alternatives to faang. If you’re good enough, you surpass that intention.


I saw them a year or two ago positioning themselves in contrast to Redshift and BigQuery. I thought "these guys are building something for Microsoft to acquire" (my thought was, something with a more modern OLAP architecture than SQL Server, which they could offer via Azure). Naive me, they were so much more business savvy than that...


The snowflake team came from Microsoft.


SQL Server had too much political pull within the org for such a deal to succeed.


The thing to keep in mind is that leadership in data management lasts about 5 years, definitely less than 10. An (incomplete but representative) timeline:

- late 90s: Early DWs like Redbrick

- early 2000s: Oracle, Teradata

- late 2000s: Shared-nothing Data Warehouses (Vertica, Aster Data, Greenplum) - bought up by Teradata, EMC, HP

- early 2010s: Hadoop and Hive

- late 2010s: Redshift and cloud DBs

- early 2020s: Snowflake

- late 2020s: probably something else...

All these technologies felt they were here to stay at the time, but they didn't. Will Snowflake be the exception? Maybe, but the odds are not nearly as great as their valuation implies.


I guess this quote from the article sums it up: "There was so much hype, my mom, who doesn't even know what Snowflake is, decided to invest in Snowflake."


"Taxi drivers told you what to buy. The shoeshine boy could give you a summary of the day's financial news as he worked with rag and polish. An old beggar who regularly patrolled the street in front of my office now gave me tips and, I suppose, spent the money I and others gave him in the market. My cook had a brokerage account and followed the ticker closely. Her paper profits were quickly blown away in the gale of 1929."


This article explains why Snowflake is a great company. And there's no doubt about that.

But it does not explain why it is valued at $60B. Or whether that value makes sense.

To put a price on a company, I need a projection of the income it will generate for shareholders over the next 5-10 years.

The fact that they are a great company does not guarantee they will generate income for shareholders that justifies a $60B valuation.

This is how a bubble starts. Overvalue a great company, then overvalue fair companies just not to miss out and "in comparison" with the great companies, and in the end, overvalue nothing (if Tesla is great, then Nikola was the nothing).


That assumes the stock market is logical and rational, but you know it isn't. Stock prices are mostly independent from the underlying company's actions and performance, and instead vary depending on investors. The bigger investors have so much money and influence (e.g. operating multiple large 'financial news' / stock news websites) they can sway the prices wherever they want.


So maybe the article should've been

Why is Snowflake so Valuable?

Stock market is not logical or rational.

Thanks for reading.


Because interest rates are super low and there’s a lot of uninvested capital sloshing around.


Because we're in a tech bubble. These valuations are absurd.


That, plus low interest rates and money looking for a place to go, plus the Warren Buffett multiplier.

There are only ~3,600 companies on the US markets [1], half of what was there in the 90s. There aren't many places to put lots of money.

The rise of private equity (PE) has really taken a lot of the growth out of the public markets, but since there are so few companies, and rates are so low, people are looking for returns. A lot of it is also pump-and-dump schemes loading up stocks, and then short-and-distort. The problem is that growth is tapped out on public markets now; PE drains it before it gets there, so more volatility games are being played. Couple that with less purchasing power in the lower/middle, which adds to the games, since investment in new consumer-focused companies doesn't work as well when purchasing power is drained. M2V has hit a precipice [2].

[1] https://www.wsj.com/articles/where-have-all-the-public-compa...

[2] https://fred.stlouisfed.org/series/M2V


With all the money the government is handing out, it might be partly inflation too.


This is some ridiculous overvaluation. I've used their product, and while it worked all right, I definitely noticed parts of their website and docs being half-assed by not very experienced web developers. Which didn't really conjure an impression of quality to me. More like, "let's do things in a hurry to get the money".

I don't know how you, as an investor, can rationalize the price by any reason other than speculative increase in share price. Which is again driven by some fundamentals but mostly by hype. If they fail to gain almost a 100% increase in revenue for the next year, that $60B is looking way too high.

Compare this with something like Splunk. They make a high-quality product and have over $2B in revenue (with a nice growth curve). Market cap: $30B. So is Snowflake as of now worth two Splunks? I don't think so. I could see it maybe being half of that, $15B, if I squint real hard.


If I want full ownership of my data, meaning I don't want it managed/hosted by Snowflake, doesn't that mean that Snowflake will reference my S3 bucket as an external table? How exactly am I going to be saving money? Snowflake will have to index my S3 bucket, most likely doing a poor job of it, all the while charging me for S3 requests just to create an index that I'm already maintaining.

My company has strict contractual obligations regarding data de-identification that I have written code to securely index. We use S3 to store de-identified data and the index to query data we have in S3, meaning that we run no risk of our customer data being identified even if attackers acquired access to both the index & S3 bucket. Snowflake may not be trying to handle our use case, but the corporate account executive spamming my inbox was dead certain that it was our panacea. I'm failing to see where they fit in, and I'm glad that my manager listens to me.

I also don't see the arguments that Snowflake is beneficial because it is cloud-agnostic. I can count the number of cloud-agnostic employers I've had over my career on 0 hands. If I can't set up Lambda triggers on puts to my S3 bucket, then Snowflake is a hard no from me.


Nerds are surprisingly susceptible to hard-sell tactics.


I think part of it is that many know that most companies underperform the market. So I imagine it's not hard to see someone justifying (correctly or incorrectly) that it's worth paying more for a company you think is more likely to be one of those outliers. And since there is a limited supply of these companies, they can shoot up in value fast.

I never used Snowflake, so it's hard for me to have a solid opinion of this company. I remember when Facebook IPO'ed and people were like "what? Worth 100 billion? OVERVALUED," and they were wrong in every way. So who knows? Though my gut doesn't tell me this company is the next Facebook. Coming out of the gate with a $70B market cap feels like all the growth is already priced in.

With Facebook, at their IPO, I felt their mobile revenue alone in 5 years' time would be worth their valuation. But I had weak hands. I think I bought them at 28 and sold at 22. I should have had more conviction, because I truly did believe they were worth a lot more.


I don't see it mentioned in the article but isn't one of the main selling points of Snowflake their data exchange? Companies upload their data to Snowflake in the hopes of someday monetizing it? If that's the case then I think it's just a matter of time until regulators become interested too.


The article talks about 'good things' but doesn't put them in context of valuation.

$60B is still too much.

It's odd that Buffett is in, it's a weird signal, because this is a weird era for markets: all other things being equal, we are looking at .com-ish situations here and the timing would be ideal for a true crash.

That the world economy is shrinking by 10% and governments and major industries are going insolvent should be scary.

Perhaps investors think they are preparing for the 'covid future' but this may be a weird kind of inflation whereby everything else (including cash) is crap so they are piling into winners.

There is an emotion to a lot of stocks these days that is probably making every analysts job a nightmare - if the CEO or company is popular, it really messes with valuation.


One thing to keep in mind is the price at which Buffett invested. I vaguely remember that he invested at about $70 per share (I may be wrong, just going by memory here). This means there is a lot of buffer room for correction at the current price of around $250.


I am guessing all of the Berkshire companies either are integrating with Snowflake or already have.


> It's odd that Buffett is in

More like Todd Combs and Ted Weschler...


Snowflake seems to be part of a segment of companies that are defying economic reality during covid: $TSLA, $ZM, $SNOW. These companies have huge P/S ratios and a higher volume of options activity relative to their underlying stock. A rational person may look at metrics, but we're not in a rational market. The strange behavior of the market may explain these valuations more than anything else. And maybe 3-digit P/S ratios are the new normal.

[1] https://www.ft.com/content/b330e091-2a59-4527-b958-9213731a5...


> For every $1 of revenue Snowflake received from their customers a year ago, that same pool of customers are now paying $1.58. That means Snowflake could acquire no new customers, and they would still be doubling revenue every 18 months.

No. That would require customers to grow their spend 58% every year. Might happen early on, especially as projects like this are usually staged in phases so the initial go-live usage is lower than when you get to full production. But projecting long term revenue growth on the basis of average annual 58% increases in per-customer revenue is simply ridiculous.
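
For what it's worth, the arithmetic in the quoted claim does check out; it's just compounding:

    1.58^t = 2  =>  t = ln(2) / ln(1.58) ≈ 0.693 / 0.457 ≈ 1.5 years ≈ 18 months

The dispute is whether 58% per-customer growth can persist, not the math.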


Agree. Generally it kind of stabilizes at lower than that. For example, for Cloudflare [1] it is around 110%, while this [2] points to 109% being kind of standard.

1. https://www.sec.gov/Archives/edgar/data/1477333/000119312519... ( see dollar based retention rate)

2. https://medium.com/@sammyabdullah/109-net-dollar-retention-i...


OK, it's a great product, but the valuation still does not make sense!


My experience with investing tells me that you will never get a good investment at a fair value, you always have to pay a premium for it. There is a pretty consistent pattern in my stock portfolio. The stocks that I overpaid for ended up being great investments while the stocks that I thought I was getting a deal on ended up being a disaster of an investment.


So, at the moment Snowflake beats all the cloud providers at usability, but what stops them from improving their offerings using tech that Snowflake can't match because it doesn't own the hardware? Like, as far as I understand, AWS Aurora could only be offered by Amazon because it uses interfaces available only internally. Also, it seems that the DWH/analytics landscape changes very rapidly; isn't being "long Snowflake" actually "short progress"?


People want to own the next Salesforce. Snowflake could expand into ERP/CRM/visualizations. There is over $50B of revenue across the world in those areas.


Those lines of business are very far away from Snowflake.


They were from Oracle too.


If you were a foreigner and had to invest your savings somewhere (because banks and govs are forcing negative rates), where would you invest?


I would invest all of it into Snowflake stock, of course.

The shoe shine boy told me about it....


AMT / DPZ 50/50 split


5G and pizza? Interesting combo.


I did an analysis of Snowflake's Facebook ads if you want to see how they message there and the kind of content they promote : https://www.rightpercent.com/b2b-guides/snowflake-just-went-...


I don't know anything about Snowflake, but in general they are fortunate to go public at this time. The tech market is in a bubble and investors are frothing at the mouth for tech stocks. Pessimism started to creep in for some of the already sky-high tech stocks. Snowflake IPO'd right at that time and people just flocked to it.


> Net promoter score (NPS) is a way of measuring customer satisfaction.

How easy/hard is it to fake an NPS score? Is this somehow regulated? Can the company only provide its most satisfied customers (which it knows beforehand) and only have them participate to get a good NPS?


Likely this can be easily gamed. But in the context of Snowflake's value, NPS manifests in Net Retention, which is likely to be more difficult to fudge:

> For every $1 of revenue Snowflake received from their customers a year ago, that same pool of customers are now paying $1.58.

Net Retention is more important, but in this case it also gives credence to the NPS number.


I also have some questions/concerns over the NPS. Online surveys, which is effectively the instrument of NPS, typically yield statistically incorrect data due to, often, some flavor of self-selection bias. If you think of an online survey as an experiment, they rarely allow enough control over the sample to mean much. However, that's not to say it's impossible to properly conduct an NPS, just that it's probably very easy to get wrong which may paint a false picture.


NPS originally came from car manufacturers and industries that produce easily comparable products, i.e. how happy are you with the car? Would you recommend BMW to friends?

I don’t think it’s that great for software, but it’s very trendy. It can be gamed, like anything else, usually by sending NPS surveys to decision makers who aren’t usually the actual users.

A slightly better metric is Customer Effort Score (CES), which shows how easy it is to do business with a company.


NPS can be gamed any way you want. If you want to use it as a real metric to improve your product, it's a great metric. But if you want to use it to convince shareholders your company is very valuable, it's extremely easy to do so by exploiting certain biases.


In the article they list AT&T as having a positive 20 NPS score, which is pretty hard to believe.


In order to grow into this valuation, snowflake will need to become the biggest data warehouse company ever. They have a great product and that will probably happen! But you need to be highly confident of the best case scenario happening to buy at the current price.


Yeah, the article seems to be contradictory. First it says “even my Mum, who knows nothing about Snowflake, bought shares.” Then it gives complex technical analysis to explain why the share price is high. Um, I think the first explanation works better....

In general the article persuaded me that Snowflake is a great company. But without some maths relating all these numbers to an estimate of future revenue, there’s no way to tell if it’s a great stock — at the current price. It’s the same problem that Tesla bulls have.


Am I wrong if I say that they are so valuable (in monetary terms) just because they are the only (or one of the few) non-free database management systems (or whatever they are)?


Not really, they’re largely targeting the same kind of use cases as redshift and bigquery.


I ask people if they've ever heard of Redshift and BigQuery, and most, even many in IT, haven't.

But everyone has heard of Snowflake.


Sure, I’m not arguing that they’re equally good or equally good at marketing. But they’re far from the only cloud OLAP database.


As someone who spends a lot of time in this space: their only "killer app" is automated workload / data distribution management. Which is cool, and hard to get right, but clearly something the cloud vendors and other data players have taken steps towards / offer more or less the same outcomes.

And in contrast, their Silicon Valley roots mean a lot of their tooling/UX/data capabilities are ... undercooked. Their Web IDE feels like a throwback to 2003 Hadoop, their ETL capabilities are a joke, they don't support joins in views ...

And they've also squandered some opportunities to actually offer a differentiating "all in one" data processing experience for ad hoc/exploratory, BI/aggregated, and Big Data/AI/ML model crunching. For example, here's their garbage blog post on Spark SQL - https://www.snowflake.com/blog/snowflake-spark-part-2-pushin...

tl;dr when someone writes a Spark job that includes a filter against data in Snowflake, it's more efficient to let Snowflake filter the data before shipping it off to the (much more performant) Spark engine to do the actual analytical pieces of the query plan, instead of just shipping all the data over and letting Spark do the filtering.

Like ... wow, predicate pushdown is your answer?

Contrast with Azure Synapse providing Spark and SQL Server compute in the same environment; Databricks adding Delta Lake capabilities to be more schema-on-write friendly; Dremio building AI into their caching, and Starburst into their workload management ...

Anyway, I don't see any secret sauce, which means it's still just traditional enterprise sales cycles...


> they don't support joins in views

Perhaps you mean they don't support joins in Materialized Views (uppercase M)? We use Snowflake views with joins all over the place. Furthermore, if views don't cut it for you, you can always use joins in UDTFs. Or if you really need joins in a materialized view (lowercase M), you can use change streams in combination with joins to maintain your own materialized view (table).
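
(A sketch of that stream-based approach, insert-only and with invented names, as a starting point rather than production code:)

    CREATE OR REPLACE STREAM orders_delta ON TABLE orders;

    -- Run periodically: fold new rows into the hand-rolled "materialized view"
    MERGE INTO order_totals t
    USING (
        SELECT customer_id, SUM(amount) AS amount
        FROM orders_delta
        WHERE METADATA$ACTION = 'INSERT'   -- this sketch ignores deletes/updates
        GROUP BY customer_id
    ) s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET t.total = t.total + s.amount
    WHEN NOT MATCHED THEN INSERT (customer_id, total) VALUES (s.customer_id, s.amount);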

> instead of just shipping all the data over and letting Spark do the filtering

Forgive my ignorance, but in what capacity would this be less efficient? Doesn't it make more sense to reduce your result set before shipping it off to external compute?


To the first point, yes, in materialized views, thanks for correcting me. I perceive it as a limitation of their micropartitioning working against the actual goals of such a system, i.e. pushing even more work on data engineers to prep data.

And we're in agreement on the second part: my point is not that their solution isn't more efficient - it is - but that they're treating predicate pushdown as some kind of deep synergy between Spark and Snowflake.

That is, it's more the marketing aspect - clueless execs see "Snowflake and Spark work great together" and a box is checked.


Are you familiar with Alteryx and if so, do you see Snowflake eventually replacing them?


Alteryx will continue to exist as long as people use spreadsheets.


How does Snowflake compare to Databricks?


1) What is Snowflake?

2) What is their stack?


Buffett effect. It's insane.


[flagged]


There are many things that show up on HN that have names I don't recognize. When this happens, I'm excited! I get to learn about something I didn't know about before.

A simple Google search for "Snowflake" will immediately answer your question - both by the company being the top result, and by Google conveniently including a card with an overview of the company.

There's also plenty of stuff that shows up on HN that I don't care about because it's not relevant to me, but that doesn't mean it isn't relevant to the rest of the community.


If you don't know and don't care, you could always not comment...


The point of the comment is that it would take like 3 words to identify what the thing is, so that you could save everybody who doesn't know what it is from having to do a bunch of research to figure out whether it's interesting to them.

It's sloppy writing with no respect for the audience's time.



