
I can't believe that they will succeed in the long run as an independent player IN the cloud.

They are always going to be less integrated and less infrastructure-cost-efficient than the native options (Redshift and BigQuery), without the R&D budgets and with incremental friction (sales) and risk (data privacy and cybersecurity).

AWS really should get around to buying them, like they should have bought Looker or Tableau or Mode or Fivetran or dbt, etc., etc.



Snowflake is wildly better than Redshift, no matter how you want to look at it -- integrations, cost, performance, etc.

Like, in a sane world I agree with you -- Redshift SHOULD have a crazy competitive advantage. But somehow they've been unable to execute on that goal for half a decade, and I don't see that changing quickly, given Snowflake's mindshare and growth.


I agree with you.

Snowflake is better. Redshift has been really slow to execute. AWS is doing the world's worst job of articulating whatever vision they have for analytics. AWS's message is laser-focused on infrastructure folks and machine learning engineers (not analysts, not data scientists, not anyone else).

The higher you go up the stack, the slower and less meaningful AWS's solutions feel. There is a fantastic job opportunity out there for someone to reconcile AWS's data analytics offerings. They have so much upside.

I'm still not betting on Snowflake winning a direct competition with their primary supplier. For the enterprise and the highly regulated: Redshift is good enough, already there, and they don't NEED the efficiencies that Snowflake makes available.


Redshift is an on-premises piece of software that was converted into a cloud platform (AWS licensed the engine from ParAccel). Snowflake was built from day 1 as a cloud platform with awesome big data frameworks as its internal architecture. It's very hard for Redshift to rearchitect itself the way Snowflake was designed from the start, because they need to continue supporting existing instances while creating an entirely new product.


You don't need to own the public cloud infrastructure to build a better product.

Example: you can play inside ball on storage infrastructure costs to get a 2x cost benefit at the expense of a lot of extra engineering. Better DBMS storage organization, which is available to any implementation, gets you 10x (or greater) improvement. Which would you rather have?

In fact, products like Redshift don't even really game the infrastructure prices. Costs to customers are comparable with Snowflake for equivalent resources as far as I can tell. They both charge what the market will bear.


Hi, what you are saying is cryptic to me but I would love to understand. Would you mind breaking it down for a financially literate but tech-handicapped person like me? Thanks much!!


Sure! Sorry to be so obscure, it was not a good explanation. To take the above example, let's say you have a database with 1TB of tabular data in Amazon.

1. You start out storing it on Amazon gp2 Elastic Block Store, which is fast block storage available on the network. It costs about $0.10 US per month per GB, so that's $102.40 per month.

2. Data (sadly) has a habit of getting destroyed in accidents so we normally replicate to at least one other location. Let's say we just replicate once. You are now up to $204.80 per month.

Now we have a couple of ways of reducing costs.

1. We could make the block storage itself cheaper thanks to inside knowledge of how it works plus clever financial engineering. However, the _most_ that can get us is about 5x savings, because prices for similar classes of storage are not that different. The real discount is more like 2x if we want to make money and be reasonably speedy. You likely have to do engineering work--like implementing blended storage--for this latter approach, so it's not free. So, we're back to $102.40 per month.

2. Or, we could build a better database.

2a.) Let's first build a database that can store data in S3 object storage instead of block storage. Now our storage costs about $0.02 per GB per month. Plus S3 is replicated, so we can maybe just keep a single copy. We're down to $20.48 per month, but we had to rewrite the database to get it, because S3 behaves very differently from block storage and we have to build clever caches to work on it.

2b.) But wait! There's more. We could also arrange tabular data in columns rather than rows, which allows us to apply very efficient compression. Let's say the compression reduces size by 90% overall. We're now down to just $2.05 per month. Again, we had to rewrite the database, but we got a huge savings in return, like 100x.
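To see why columnar layout compresses so well, here's a toy sketch (synthetic data, with zlib standing in for a real columnar codec; this is not how Snowflake or Redshift actually encode data). The same records are serialized row-by-row and column-by-column, then compressed:

```python
import zlib

# Hypothetical table: incrementing id, low-cardinality category,
# and a cyclic measurement value.
n = 100_000
ids = [str(i) for i in range(n)]
cats = [["red", "green", "blue", "amber", "teal"][i % 5] for i in range(n)]
vals = [str(i % 97) for i in range(n)]

# Row-major: one line per record, as a row store would lay it out.
row_major = "\n".join(f"{i},{c},{v}" for i, c, v in zip(ids, cats, vals)).encode()

# Column-major: each column stored contiguously, as a column store would.
col_major = "\n".join(["\n".join(ids), "\n".join(cats), "\n".join(vals)]).encode()

row_size = len(zlib.compress(row_major))
col_size = len(zlib.compress(col_major))
print(f"row-major compressed:    {row_size} bytes")
print(f"column-major compressed: {col_size} bytes")
```

The column layout wins because similar values sit next to each other: the category column is a short repeating pattern the compressor nearly eliminates, while in the row layout every line interleaves an ever-changing id between those repeats.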

The moral is that clever arrangement of data just about always beats financial shenanigans, usually by a wide margin. The primary reason that Amazon has done well in data services like Redshift and Aurora is partly that they have been extremely smart about data services, not any inherent advantage as platform owners.
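The cost progression above can be sketched in a few lines (prices are the rough figures from the example — gp2 EBS at ~$0.10/GB-month, S3 at ~$0.02/GB-month; real pricing varies by region and tier):

```python
# Monthly storage cost for 1 TB of tabular data under each scheme.
GB = 1024  # 1 TB

ebs_single = GB * 0.10            # fast block storage
ebs_replicated = ebs_single * 2   # plus one replica for durability
s3_single = GB * 0.02             # S3 replicates internally, one copy suffices
s3_compressed = s3_single * 0.10  # columnar layout, ~90% compression

print(f"EBS, replicated:           ${ebs_replicated:.2f}/mo")
print(f"S3, single copy:           ${s3_single:.2f}/mo")
print(f"S3 + columnar compression: ${s3_compressed:.2f}/mo")
print(f"overall improvement:       {ebs_replicated / s3_compressed:.0f}x")
```

The engineering-driven savings compound multiplicatively (cheaper tier × fewer copies × compression), which is why they dwarf anything a platform owner can squeeze out of infrastructure pricing alone.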

Edit: fixed math error


For them to be an attractive acquisition target means they are succeeding, otherwise what would a cloud vendor gain from buying them?


talent, patents, less competition




