In 2017 Cloudflare had an HTML parser bug that caused encrypted HTTP traffic to be leaked. Any website served by Cloudflare was vulnerable to having its traffic leaked into the HTML response bodies of other Cloudflare-proxied sites. Given that Cloudflare is the proxy service for 80% of websites that use proxies, this affected a significant portion of the internet.
Because Cloudflare served this private HTTP traffic inside response bodies, the leaked results contained cookies, session data, and traffic from encrypted sessions, all personally identifiable, and because it appeared in ordinary response bodies, it was *indexed by search engines*, not to mention captured by anyone scraping websites during the incident. It included credit card information, frames from videos, PII, the works, all linked to individual users.
This was ongoing for *months.*
Anyone savvy could use this information to hijack accounts, scrape personal information, and view private browsing habits. Even when Cloudflare publicly announced it (and tried to blame others), after they thought they had cleaned up most of the data, you could still easily use search engines to find people's personal information by searching for the Cloudflare header strings that preceded the leaked session data.
Many countries have legal policies around data breaches, including required disclosure policies and penalties. In the greatest blind-eye turn in the history of the internet, Cloudflare managed to get away with a single blog post and no other penalties. https://blog.cloudflare.com/incident-report-on-memory-leak-c...
"More importantly, AWS itself is locked-in to its integrated approach: the entire service is architected both technically and economically to be an all-encompassing offering; to modularize itself in response to Cloudflare would be suicidal."
Eh, somewhat. AWS is already modular in a lot of ways. You want S3? You got it, no matter where you are. (We're talking after them doing some sort of fee drop here.) You want to run exactly one EC2 instance? No problem. You want a message queue? You don't need anything else. You can integrate it with the notification service but it's optional.
Sure, some of their services are integrated, but a lot of that integration is just "this service pulls from S3 and writes to S3", not massive integration at every level.
There is some stuff that is deeply tied in, yeah. But it's not like every single AWS service is deeply tied into half the other ones and the moment you open an EC2 instance you also are buying into a dozen other services. (It may feel like it if you put together a network and override the default block storage, but that's really just giving you knobs that are simply preset elsewhere, not really "lockin".) A lot of it is already pretty modular.
I think you are missing the point. AWS is modular WITHIN AWS. It's not a decentralised modular system, i.e., it can't play well with other companies' services because of the hefty egress prices they charge. The point in the article is that, maybe, if you take away the egress fees, that opens up a new world where services from different companies can play together nicely (instead of waiting for AWS to implement something), and that creates a new form of innovation we can't quite predict, which could possibly compete with AWS.
You forgot about the egress fees. Try running BigQuery on a (big) dataset stored in S3. You probably wouldn't even think of that because of how stupid that is at the moment.
The big 3 have gotten away with crazy egress pricing for too long - I'm really hoping that Cloudflare's R2 puts a huge spotlight on egress bandwidth price gouging by AWS, Azure and GCP, and further hoping that they reduce pricing in response.
With the huge margins they must have on egress bandwidth, I'm not holding my breath though.
Like others have pointed out, it’s not about the margins on egress. Cloudflare claims that AWS marks up egress by 4-17x depending on the region, but this isn’t about making money off egress directly. It’s about creating a moat around all AWS services. If you’re already using some services inside the moat, you’re strongly encouraged to use the other services within the moat.
If your data is in S3 or DynamoDB, egress fees will encourage you to process the data within AWS. You’re not going to use BigQuery for that. If you want to add a search index on that, you won’t go shopping around for search products on other clouds, you’ll likely use AWS elasticsearch. That’s where AWS makes money - through this soft lock-in.
They’ll survive and maybe even thrive if they lose the revenue from egress. But who knows what’ll happen if the moat is destroyed? Every AWS service would need to compete on merit with every service on other clouds. They won’t be chosen simply by default.
Have you ever had to pay for your own business internet connection that is offered by ISPs for the purpose of serving unlimited requests? It is not cheap, at all. Your home and/or business internet connection is a joke compared to that type of service.
I don't think they have huge margins on egress at all. There needs to be some incentive for customers of cloud services to minimize bandwidth usage. It is a limited resource.
There are, literally, dozens of hosting providers that bake unlimited egress into their offerings - still. Bandwidth is cheap and has only gotten cheaper. Remember that Amazon doesn't always need to pay for Internet access. It's very likely they participate in Internet exchanges tied to other large providers. Given Amazon has a massive footprint they likely participate in these all over the globe to reduce congestion on their Internet connectivity.
They very much have massive margins on egress. Given some of the cost comparisons floated today between R2 and S3 egress, AWS is likely making back thousands of times (likely more) the actual bandwidth cost it pays month over month.
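To put rough numbers on that claim (illustrative figures only: an assumed $0.50/Mbps/month wholesale transit price, which a network of Amazon's size with settlement-free peering would beat by a wide margin, against S3's first public egress tier):

```python
# Back-of-envelope egress markup. The transit price is an assumption; at
# Amazon's scale, free peering pushes the real cost far lower still.
transit_usd_per_mbps_month = 0.50   # assumed wholesale transit price
s3_egress_usd_per_gb = 0.09         # S3's first public egress tier

seconds_per_month = 30 * 24 * 3600
# A fully utilized 1 Mbps link moves ~324 GB in a month:
gb_per_mbps_month = (1e6 / 8) * seconds_per_month / 1e9

transit_cost_per_gb = transit_usd_per_mbps_month / gb_per_mbps_month
markup = s3_egress_usd_per_gb / transit_cost_per_gb
print(f"transit ≈ ${transit_cost_per_gb:.4f}/GB, markup ≈ {markup:.0f}x")
```

Even with paid transit that works out to a markup in the dozens; with free peering and traffic that would flow over existing links anyway, the multiple climbs into the thousands.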
> Have you ever had to pay for your own business internet connection that is offered by ISPs for the purpose of serving unlimited requests?
Come on, Amazon is not serving traffic through a Comcast business connection, they're peering directly with other large operators for free or for next to nothing.
AWS bandwidth is definitely overpriced, however the comparison to an unmetered colo isn't quite fair. What happens when that unmetered colo gets hit by a 100G DDoS attack? Everything you have on it goes down?
but regardless, the EGRESS charges are what are absurd. The only logical reason for charging so much for data exiting is to make sure that you can't practically leave the AWS ecosystem.
We have a customer on Azure running a bunch of medium-sized websites. Their mean outbound throughput is just 25 Mbps, but they're paying on the order of $3K/month all up for that Internet egress. (Not just bandwidth but some additional overpriced services on top like App Gateway.)
I have had to. I pay $400/month for the following:
- A 42U cabinet
- A 15A 120V circuit
- An unmetered 1Gbps IP transit link
All at a proper datacenter, namely Hurricane Electric's fmt2 facility. Includes a /29 of IPv4 and a /48 of IPv6, allows me to announce the /24 and /36 I own, and there's a free internet exchange onsite which I have a 1Gbps port at.
1800 Watts for a full rack? That’s bordering on useless. You can run maybe two real servers on 1800 W.
You need to provision enough power for simultaneous startup after a power outage, unless you have some really smart PDUs and automation. We have a 10 year old DC with 30A@208V per rack and we have to leave racks half full because modern servers are so power-dense.
Gatekeeping server power!? We run 1U on 250-350w (230v).
So yes, odd power allocation, but servers are "real" way before 900 W. We're running dual Xeon Gold low core count machines (because of Microsoft licensing) and they're pretty decent.
Absolutely, I'm with you. I just meant that 900w isn't where "real servers" start.
If you're at the point where you're running into power limits on simultaneous start, you could disable automatic power-on and script ipmitool to power the servers on in a queue once IPMI is reachable. Could help you eke out a bit more density.
You could possibly run this fictional daemon on your management switch :)
My 900W per server number comes from experience with “hyper-converged” infrastructure where each 2U node has two filled CPU sockets, gobs of RDIMMs, and is stuffed with flash-based storage.
I think this is the most common “enterprise” datacenter server type in 2021, mostly due to licensing constraints from VMware/Microsoft/Red Hat/Oracle/etc. Such servers give the most “bang” per dollar when licensing costs are included.
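Rough math on why such racks end up half full, using the figures from this thread (the 30A@208V feed and ~900 W node draw mentioned above; the 80% continuous-load derating is the usual NEC rule of thumb):

```python
# Usable rack power vs. per-node draw, with figures from this thread.
amps, volts = 30, 208               # per-rack feed mentioned above
breaker_derate = 0.8                # NEC 80% rule for continuous loads
usable_watts = amps * volts * breaker_derate   # 4992 W usable

node_peak_watts = 900               # dense 2U hyper-converged node
nodes_per_rack = int(usable_watts // node_peak_watts)
print(nodes_per_rack)               # 5 nodes -- most of a 42U rack sits empty
```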
We're running VMware with vSAN on 1u nodes with 2x Xeon Gold 8 core cpus @ 3.6ghz base, something about the Microsoft SPLA licensing as we're mainly virtualizing Windows machines. 384 GB RAM per node, and we're just pulling about 400.
To move bytes out of a network, you need more than a transit contract. You need routers, and you need people to operate them. All of this is absent from Cloudflare's blog post. Given the South Korea example provided, the conclusion should be that egress fees are only marginally influenced by transit cost.
The blog post on egress speaks to that - but the true scale at which the major clouds buy hardware and deploy it - just changes the dynamics here. We're (collectively) not used to seeing products marked up to such an extent.
The $/Mbps prices there - about $6k/Tbps in the US - are based in reality and are absolutely reflective of what it costs, hardware, software, redundancy and all - for an effectively 1Tbps pipe.
If you're pricing as $/GB on top of that capacity and keep it reasonably heavily utilized—which can be hard given diurnal demand—the margins only get better! Products like Glacier (S3) exist to fill exactly those gaps.
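Taking those figures at face value (reading the thread's ~$6k as the monthly all-in cost of an effectively 1 Tbps pipe, an assumed 50% average utilization, and S3's $0.09/GB list price as the retail rate), the gap between cost and retail revenue is stark:

```python
# What a 1 Tbps pipe earns at retail egress prices vs. what it costs.
# All three inputs are assumptions taken from this thread.
pipe_cost_usd_month = 6_000         # all-in monthly cost, ~1 Tbps (thread figure)
utilization = 0.5                   # diurnal demand keeps it roughly half full
retail_usd_per_gb = 0.09            # S3-style first-tier list price

seconds_per_month = 30 * 24 * 3600
gb_moved = (1e12 / 8) * utilization * seconds_per_month / 1e9  # ~162 million GB
revenue = gb_moved * retail_usd_per_gb
print(f"~${revenue/1e6:.0f}M/month of retail egress vs ~$6k of cost")
```

Nobody sells all of that capacity at the first pricing tier, of course, but it shows why filling the off-peak troughs with products like Glacier is pure upside.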
(Note: currently work at Cloudflare, but wasn't part of this blog and I've been around a bit...)
Yet CloudFlare is able to do it for free up to a point, and at a lower-than-AWS fee after that. It might not be the exact figure but there is most definitely a huge markup there.
>Try running BigQuery on a (big) dataset stored in S3.
Seems like a pathological use case to run a query engine on one cloud provider datacenter (Google) against the disk storage at another cloud provider (Amazon).
Even if egress were $0, I still wouldn't want to do that. I want queries to run as fast as they can and the WAN link bandwidth is opposed to that.
Is there anything about BigQuery that would compel anyone to do that instead of just using AWS RedShift?
My experience in most enterprises is that we don't get to pick all the tools and sometimes we don't even get asked our opinion. Recent case: The data team picked S3 for storage, and they picked Power BI for analysis. Don't ask me why they didn't ask my opinion at the time (What would I know, I'm only the principal cloud architect here).
Things like operational overhead don't always get a look in when a team has convinced someone with the purchasing authority that tool X is going to solve all their problems. Even if the entire org has zero experience with it and it's going to have flow on effects.
A recent example at one of my customers was a team deciding to outsource a platform to the provider (outsource, not SaaS: it's a managed service hosted in AWS). I told them the network design and the AWS build on our side to join the two would require significant effort, and they said that's fine. Now we've spent almost their entire budget for the move just on working out how to connect their VPC to ours (there were some legislated controls we had to put in place, and the vendor architects were less than helpful). Of course it's all my team's fault, because we are the ones who say "you can't just plug the two together", and it would be much better if we had a "can-do attitude like the other team instead of naysaying all the time."
Keep in mind that most of the big three regions are located in the same metro area, often times right across the street from one another. They have private network peering that circumvents WAN circuits, your data is literally transiting between ethernet ports on the same switch.
So generally speaking, latency and bandwidth between services is not a significant concern. It's all about egress billing.
It can make a lot of sense. Store data at the provider that can do it best, analyze your data at another provider with a superior product.
You'll find data centers of all major providers within a few miles of each other in at least 10 locations around the world. Latencies are <5ms and the links between those data centers are cheap and can provide huge bandwidths (even though they don't want you to know this and still charge huge prices).
So from both a technical and an economical perspective there should be no reason why you can't shift terabytes of data daily between GCS, AWS, Azure, Hetzner, OVH, Cloudflare, etc. Only artificially high egress pricing set by the three biggest providers keeps people from doing that.
In Europe with OVH and Hetzner that's exactly what a lot of people already do (from my experience), also nicely visualized in their Weathermap (look at Frankfurt) [1]. These two providers are tiny compared to AWS and yet the 400gbit/s links between them are utilized ~50% pretty much 24/7. And from my experience I can say that there are quite a few use cases where mixing these providers is cheap and easy.
Redshift isn't even competitive within AWS! Snowflake runs on top of AWS and provides a better product than Redshift as a third-party offering, which means they don't even get the hardware advantages Redshift gets. If you use Redshift to any real degree, you owe it to your internal users to talk to one of their salespeople.
Heh ... I did something similar without thinking about bandwidth.
I had a huge corpus of data and ran some instances that were basically downloading+parsing+summarizing data for a couple days. Again, I didn't think of bandwidth as they were EC2 instances querying S3 objects, what could go wrong, it's Amazon to Amazon, right? Wrong.
Bill came up in the 10,000s USD range ... fortunately, it happened during a trial period where I could spend a lot of money for ~three months.
I moved my stuff into a small-ish cluster of unmetered VPS and the whole thing is 100x cheaper.
They don’t explain this, but the naming of AWS services indicates how modular they are. Services prefixed with “Amazon” are supposed to be usable independently, while services with the “AWS” prefix need to be used inside the ecosystem.
Not on their backend. If S3 goes down, nearly everything else does. We found out last year that if Kinesis has issues, so do a bunch of other internal AWS services.
I was with the article until this final point about integrated solutions. I see the argument around data lock-in, but integration is typically non-trivial unless it's coming from the same provider. Plus, I'm certain cloudflare would equally love to provide more services in an integrated way.
AWS Polly is an example of a service that is tied into others.
To do anything more than a demo, Polly requires S3. To get a notification of a completed TTS synthesis, you must use AWS SNS. To get logs you have to use their logging service.
I suspect the primitives, like these required ancillary services, are the ones not tied in to each other. But how could they? (AWS probably has found ways)
For my project using Polly, I didn’t care. It was kind of interesting to explore AWS some.
But I’d guess a lot of folks might not want cloud service primitives. They want cloud products.
If they must use primitives to enjoy cloud products, they should at least want the flexibility to not be stuck with the pricing and feature set of primitives that don't have to compete.
I've had relatively bad experiences with Cloudflare's DNS solution. Here are a couple of example pain points:
1. You can't set NS records for an apex domain registered through Cloudflare (even at more expensive service tiers). You can delegate management of subdomains with NS records, but only if you shell out for an expensive plan $$$.
2. Cloudflare performs CNAME flattening for CNAMEs by default. This prevents you from using CNAME-based DNS validation for third parties. One such example is certificates in AWS Certificate Manager, which performs CNAME lookups. You can disable CNAME flattening, but you have to shell out for a more expensive plan $$$$.
Thanks for the feedback. I don't think we're intentionally charging to turn off CNAME flattening. May be that we just don't expose that to lower plans because we're worried it'll confuse people. Raised the feedback with the team. In most cases, CNAME flattening is a significant win on performance. But understand when you'd want to do it in some cases.
Will also check on NS delegation. Again, my hunch is that we only charge for it because it's something that less sophisticated users we worry would get themselves in trouble messing with.
> May be that we just don't expose that to lower plans because we're worried it'll confuse people.
It does. IIRC GitLab Pages used to have docs saying to set a CNAME for your domain without warning about doing it on the apex domain. I'm not sure if GitHub Pages was any better either.
It's only confusing the first time you mask your MX records. LMAO.
To me, the biggest pain point is that I cannot set an NS record for a subdomain.
My use case is this: I have certain subdomains where I want to use Let's Encrypt with DNS validation, but I don't want to give the whole domain to the auto-renewal script. With another DNS provider, AWS Route 53 for example, I can easily create another zone for a subdomain, say blog.domain.com, and set an NS record on blog pointing to that zone. Then I create an API key with privileges to manage only that sub-zone. I cannot do that with Cloudflare, though.
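For the curious, the Route 53-style setup described here boils down to two NS records in the parent zone plus a narrowly scoped API key; the nameserver hostnames below are hypothetical placeholders:

```
; In the parent zone (domain.com), delegate the subdomain to its own zone:
blog.domain.com.   3600  IN  NS  ns-101.awsdns-12.org.
blog.domain.com.   3600  IN  NS  ns-202.awsdns-25.co.uk.

; The renewal script's API key is scoped to the blog.domain.com zone only,
; which is where the ACME client writes its DNS-01 validation record:
_acme-challenge.blog.domain.com.  60  IN  TXT  "<token issued by Let's Encrypt>"
```

A compromised renewal script can then only tamper with that one subdomain's records, never the apex.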
Agree on CNAME flattening. It caused some issues for my email forwarding service in the past.
We had a customer using GitHub Pages who set a CNAME on the apex domain, then added an MX record for the apex. The Cloudflare UI allowed them to do that, but resolution wouldn't return MX records for the apex, so the customer wasn't able to finish setup. Eventually we had to set the A record on the apex to an IP.
> You can delegate management of subdomains with NS records but only if you shell out for an expensive plan $$$.
Are you sure? I have a free site where `in.example.com` is delegated to Namecheap's FreeDNS so I can use it for DynDNS on some devices that don't support Cloudflare's newer (fine grained) API tokens.
Aren't Cloudflare Workers a very specialized kind of lambda that's severely resource constrained and whose runtime is capped at 15ms?
If anything Cloudflare Workers compete with Lambda@edge, but it's very disingenuous to compare them with AWS Lambdas and it's completely absurd to claim they eat anyone's lunch.
Cloudflare Workers' use case is extremely limited and specialized: run a script comprised of a couple of lines of code that don't do much at all, right at the edge. We're talking about things like adding a response header. Even then, they are killed if they don't exit almost immediately.
The all-inclusive Lambda workers are limited to 50ms of actual CPU runtime and can execute forever (i.e. hours) for IO bound workloads, as long as you stay below the 50 network requests per execution. And for that they cost $0.50/million, have unlimited in/egress bandwidth and free in-DC caching.
But they also have a more AWS-like pricing option that’s about 20% cheaper and charges per request, per GB-hour (for runtime, not actual CPU usage) and for bandwidth with a maximum runtime of 15 minutes.
They also have Durable Objects, which give you global singleton persistent objects for stateful architectures.
If you haven’t had a look at CF’s serverless stuff for a while, it’s worth a look again.
> If you haven’t had a look at CF’s serverless stuff for a while, it’s worth a look again.
That's especially true if your workload makes sense for the all-inclusive Workers. I evaluated only the pricing a while back and Cloudflare Workers are far more attractive than anything else in the market IMO.
With AWS and Azure, it's really hard to calculate just how expensive things are going to be. I'd say it borders on impossible without just running your workload for a bit and waiting for the bill.
With Cloudflare Workers, it's dead simple. As long as your Worker runs in <50ms, it costs $0.0000005 per run. I can tie that directly to (page) hit counts and calculate costs with very little effort.
For my own reference point, I ignored the fixed cost per run, which is actually more expensive for Lambda@Edge, ignored the variable cost per run, which actually has minimums for Azure Functions, and calculated the egress cost per byte.
Assuming they use GB and not GiB for egress, it's $.09 / 1,000,000,000 = $0.00000000009 per byte. Now take your Worker cost and divide it by the per-byte egress cost and that's 0.0000005 / 0.00000000009 = 5,555.
That's <6KB of egress for the same price as a Worker run on Cloudflare. Even if AWS and Azure started offering free Lambdas / Functions, it would still be a bad deal if Cloudflare Workers meet your technical needs.
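Spelling that arithmetic out:

```python
# Break-even: the cost of one Bundled Worker run vs. big-cloud egress per byte.
worker_usd_per_run = 0.50 / 1_000_000        # $0.50 per million runs
egress_usd_per_byte = 0.09 / 1_000_000_000   # $0.09/GB, decimal GB assumed

breakeven_bytes = worker_usd_per_run / egress_usd_per_byte
print(round(breakeven_bytes))   # ~5556 bytes: under 6 KB of egress costs the
                                # same as an entire Worker invocation
```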
> With Cloudflare Workers, it's dead simple. As long as your Worker runs in <50ms, it costs $0.0000005 per run.
I'm not sure you're looking at the problem right.
I mean, if you write your AWS Lambda code to do the same thing Cloudflare does to their workers in their Bundled Requests pricing model and automatically kill them if they reach 50ms, you also get a dead simple way to know what you pay per request.
However, Cloudflare also charges per request and per execution time in their Unbound pricing model, which brings us pretty much back to AWS Lambda's pricing model.
Cost for log processing is $0.50 per GB and the minimum size of logs (just for the START/END/REPORT lines output by Lambda itself, before you start logging any of your data) is about 260 characters (or 260MB/million == $0.13/million).
Honestly, unless you have gone out of your way to implement something that isn't CloudWatch for logs (and there's almost no documentation on how to do that), it's not hard to get an extra 5-10KB ($2.50-$5.00/million to process by CloudWatch) of logs per request.
Like AWS egress charging, CloudWatch Logs can quickly dwarf the cost of using the services themselves.
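The numbers above work out as follows (the per-invocation log sizes are this commenter's estimates):

```python
# CloudWatch Logs ingestion cost per million Lambda invocations.
ingest_usd_per_gb = 0.50

def cost_per_million_usd(bytes_per_invocation):
    gb = bytes_per_invocation * 1_000_000 / 1e9
    return gb * ingest_usd_per_gb

print(cost_per_million_usd(260))     # $0.13 -- just the START/END/REPORT lines
print(cost_per_million_usd(5_000))   # $2.50 -- a modest 5 KB of app logs
print(cost_per_million_usd(10_000))  # $5.00 -- dwarfs the $0.50/M run cost
```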
They really are. By Cloudflare's own docs, Cloudflare Workers are just scripts that are designed to be executed before a request hits the cache.
Quite the far cry from what AWS Lambda offers, which is a generic compute platform that handles both long-running batch jobs and handles events, and can be invoked any way that suits your fancy (HTTP request, events from other AWS services, AWS SDK).
At most, Cloudflare workers are comparable with Lambda@edge.
> The all-inclusive Lambda workers are limited to 50ms of actual CPU runtime and can execute forever (i.e. hours) for IO bound workloads, as long as you stay below the 50 network requests per execution.
You are right. According to Cloudflare's docs, Cloudflare Workers are capped at 10ms CPU time on their free tier, but their Bundled Usage Model plan bumps the CPU time limit to 50ms. There's also Cloudflare's Unbound plan which not only charges per request but also adds charges for duration instead of CPU time (i.e., also charges for idling time when waiting for responses) and that's bumped up to 30s.
> But they also have a more AWS-like pricing option that’s about 20% cheaper and charges per request, per GB-hour (for runtime, not actual CPU usage) and for bandwidth with a maximum runtime of 15 minutes.
No, not really. Cloudflare announced a private beta for their Cloudflare Workers Unbound Cron Triggers a couple of months ago, but that's about it.
I'm not sure how familiar you are with AWS Lambdas, but if you check their docs you'll notice that, unlike Cloudflare's Workers offering, they are general purpose and can even be invoked from all kinds of events, including directly from HTTP requests. So they are neither generally available nor comparable to AWS Lambdas. Thus I'm not really sure why you brought up something that's only available through a private beta and is very limited in its capabilities to compare with AWS Lambda, which has been production-ready for years.
They're very limited, yes, but they really solve some huge problems.
I run datacenter(s) with thousands of autonomous machines. These machines run a small binary daemon. That daemon needs to check for a new version of itself, which is built/released as a CI push job on GitHub (after all the tests pass).
A super simple CF worker serves as a reverse proxy to the GH API + the download of the binary. For $5/month, I've worked around the GH API limitations, in a massively scalable way.
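A sketch of the client side of that pattern (in Python for illustration; the Worker itself is a few lines of JavaScript forwarding these paths to api.github.com with an auth token, sidestepping per-IP rate limits; the proxy hostname and path scheme are hypothetical placeholders):

```python
# Daemon-side self-update check against a Worker proxying GitHub releases.
import json
import urllib.request

PROXY = "https://updates.example.com"   # hypothetical CF Worker hostname

def latest_tag() -> str:
    # The Worker forwards this to GET /repos/<org>/<repo>/releases/latest
    with urllib.request.urlopen(f"{PROXY}/releases/latest") as resp:
        return json.load(resp)["tag_name"]

def needs_update(current: str, latest: str) -> bool:
    # Compare dotted versions numerically, so "1.10.0" beats "1.9.2"
    parse = lambda v: [int(p) for p in v.lstrip("v").split(".")]
    return parse(latest) > parse(current)
```

The daemon polls, compares tags, and downloads the binary through the same proxy only when `needs_update` says so.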
Why not use a package publishing tool like packagecloud.io (or set up your own private reprepro or dak variant) with unattended-upgrades configured for the repos and update frequency you need? Was the OS-layer tooling not adequate for this kind of setup?
You are running a DC with thousands of machines, while $700 is not insignificant, it shouldn't be a decision making factor at that size. You could run package manager locally and save on that managed cost if you really wanted to.
As developers we constantly discount the value of our time. Your time in developing/maintaining these scripts is not free.
From an org perspective, to maintain a home grown solution is not free either, inevitably someone would take over this part of your role, they would need skills they otherwise probably won't need to have (making hiring harder/costlier) and they will need more training and also will have to spend time maintaining it.
This is true even if you are the founder of the organization. As a founder early on, I built solutions like yours, thinking that's where I was going to stay. It turns out that doesn't make any difference: the role always changes. As employees we get older, acquire more skills/knowledge/experience, and the role changes, or we simply leave the org. As founders we grow the company and have to hire more people to do what we once did.
I get what you're saying and this isn't my first rodeo. =)
FYI, running it locally would involve buying hardware and maintaining that hardware, at each datacenter (there are multiple). We don't have any server hardware at the data centers.
> Aren't Cloudflare Workers more comparable to AWS Lambda@Edge than regular Lambda?
Yes, that's my point. Unlike AWS Lambda, the usefulness of Cloudflare Workers is very specialized and narrow, like adding response headers or updating a response document.
AWS Lambdas, on the other hand, can run for up to 15 min, can be configured with up to 10 GB of RAM, and can be pushed as a Docker image with a max size of 10 GB.
If that is not enough, AWS Lambdas can be tied together into workflows with AWS step function.
Therefore, for anyone to claim that Cloudflare Workers win over AWS Lambdas, either they have no idea what AWS Lambdas are or have no idea what Cloudflare Workers are.
Here is my use case: I have a static site with form and referral processing. It used to run on AWS Lambda; I migrated it to Cloudflare Workers. Deployment, code editing, etc. are much easier.
And no, it fully acts as a standalone app. I define routes to send part of the traffic to the worker and other parts to our Pages app.
For me, it works great and replaces my AWS Lambda usage.
It feels like a lot of people singing Workers’ praises haven’t really used them in customer-facing scenarios at large scale. They are useful but there’s a lot missing compared to Lambda.
I really like the promise of Pages but Netlify is easier to use at the moment. (I have major performance problems with their CDN though, so we're sort of half on CF Pages and half on Netlify)
Wow never knew cloudflare did this. Now if they only started offering a way to forward emails to a URL that would be truly amazing.
Right now I have to use Amazon SES + SNS for this but seeing how cloudflare already has workers, this would be a killer feature for a lot of companies including mine.
For me it's their Achilles heel. I tried signing up earlier this year; the UI is a clusterfuck of products and services. I spent more time on Google trying to find the right pages and the right descriptions, comparisons, and documentation of their products.
Building a DB. Several different flavors, actually. But one needs to be a distributed, multi-tenant, ACID compliant SQL database. We also need to have hooks to allow you to connect to any third party database that makes sense for your application. So… stay tuned.
If Cloudflare is able to do this now, why wasn't Akamai able to do exactly the same thing when AWS was still a baby? Serious question. Was it lack of vision? Poor execution? Technology or market just not ready yet? Without such an answer, we might have to consider the possibility that Cloudflare isn't any more able to do this than Akamai was.
I think the key here is Cloudflare's approach of mainly working in the open. Akamai works kind of "behind the scenes"; I don't think a developer working mainly with SMBs is even able to evaluate their services. Besides, just look at their website: it screams "big corp, talk to a representative to learn about pricing".
I think the big disadvantage of that approach is that they don't build "mindshare". In contrast, people can use Cloudflare's services even for themselves, and as they grow professionally, CF's constantly expanding set of solutions is there as something familiar, approachable, and ready to use.
It’s important to understand that Akamai has always been an enterprise company. They are not developer friendly, and target the 1000 largest websites on earth. They were never going to compete with a ground-up cloud offering.
As an ex-AWS employee (although not an exec, but I think most Amazon execs would agree), I see a lot of parallels: Cloudflare is run the exact same way as Amazon is. They believe in Clayton Christensen's theory of disruptive innovation and hence continue to disrupt themselves.
> The service will be called R2 — “one less than S3,” quipped Cloudflare CEO Matthew Prince in an interview with Protocol ahead of Cloudflare’s announcement
Oh I never thought of that. So the next one is Q1 and final one would be P0.
I was skeptical that it was a joke since it's quite a big coincidence, but according to Arthur C. Clarke:
> ...about once a week some character spots the fact that HAL is one letter ahead of IBM, and promptly assumes that Stanley and I were taking a crack at the estimable institution ... As it happened, IBM had given us a good deal of help, so we were quite embarrassed by this, and would have changed the name had we spotted the coincidence.
I was referring more to the context that the CEO is “one of us”, he is regularly on HN, and is available via Twitter/email to constructive criticism. I interviewed with CF earlier this year and ultimately decided the timing wasn’t right.
Love the company but the touch that the CEO gives a ____ really matters to me.
The R language is an open source implementation of the original S language from Bell Labs. Its name was a play on S and also based on the fact that both authors of the R language have first names starting with R, Ross and Robert. I suppose R2 may have been a good choice as well (it would have helped with Google searches).
How is it even a quip? S3 already has "one less than S3", which is called Reduced Redundancy (R2). Just for whatever dumbass reason they branded it RRS :P
Without egress charges, assets in an object store can be backed at other providers for disaster/contingency/fault tolerance.
Add the excellent rclone[1] tool which, I assume, will work immediately with (just another S3 compatible store) and there's a nice and easy workflow that adds some diversity to your infrastructure.
They are? I've seen more cases, I think, where it drives folks all-in on the cloud. A few units, tired of dealing with the on-prem infra group, spin stuff up in the cloud. Before you know it, big workloads are lifting and shifting, because it's easier to bring the data to the new stuff in the cloud than it is to bring the cloud stuff on-prem.
That said, if CIOs and CFOs are (truly) pissed off, then that is going to be a huge revenue swing for AWS shortly.
I personally doubt it. Azure and GCP are not that much cheaper here.
And AWS is actually offering some great pricing on things like ECS Anywhere, so now, if you want, it is (a bit) easier to bring a workload local to your data lake. I think that is not a great short-term move by AWS, but long term it helps with goodwill.
People are always saying stuff like this, and I don't think they understand: if you wait for a stock price to reach a smaller multiple of revenue, you're probably too late to the party to make any big gains, and it will be no better than throwing your money into some ETF. AMZN used to be what, 3000x revenue? Would you have advised people not to invest in it back then? How would that have turned out?
At the end of the day, you can either sit on the side lines and criticize the prices, or you can jump in and make money.
It feels like if they released a serverless/Lambda equivalent they would start taking a lot of business from the big 3. Workers are somewhat close, but the v8/isolate pattern limits them to narrower use cases. A more traditional serverless that could sit at the center and be optionally fronted by Workers would be nice.
I definitely understand that Cloudflare Workers are likely to be unsatisfying as a Lambda replacement for people who want to use a language like Python.
I think there is a decent reason why Cloudflare, at least initially, went the route it did. V8 isolates allow them to run code from many different people without many of the cold-start, memory, and performance issues of offering a fuller environment, which makes them a lot more efficient than something like Lambda. It does come at the cost of being more limited in things like language support.
I think it's a pretty good bet. Lots of people are comfortable with JavaScript/TypeScript (even if you or I don't love it) and WebAssembly is likely to become a decently supported compilation target over the next 5 years from a lot more languages. Microsoft has done a lot for C#/.NET support of WebAssembly and it should be quite good with .NET 6 coming in 2 months. Python, Go, and many other languages have at least some support for WebAssembly and it seems like that will only get better over time.
I definitely understand it being a turn-off. If you haven't read their post introducing them, I'd read it: https://blog.cloudflare.com/cloud-computing-without-containe.... It doesn't solve your problem, but I think it's a good read on why they made that trade-off.
Cloudflare acquiring Lumen and Fly would be interesting to see.
All of Lumen’s value is in the nationwide backbone fiber, although they are trying to make a play in edge computing. IMO, those fiber assets in the hands of a company like CF, combined with a service like Fly, would be pretty incredible.
When can we see some real competition with CF? As in, offering a modular CDN, WAF, etc.? Your build-your-CDN blog post left me hoping you’ll build one for us :)
I think we'll solve some of this; most people who want a WAF just want to check a box, and it makes sense to sell them a WAF. It's not where we're most valuable, though: our customers are pushing us toward much more interesting infrastructure. :)
Sometimes it's a negotiation stance. Sometimes it just means "getting acquired sucks and we would rather do our own thing in the way that we think is best". Like this time!
100% agree with this. Running Docker images on serverless is really great and lets you deploy your environment to any provider with minimal effort. It would be cool if they could add something like this, even if it doesn't provide 0s startup time.
To me, that's like saying that web browsers should be able to run existing native applications, so we can lift and shift more existing code. But just as JavaScript-based web applications enabled frictionless code distribution on the client side, JavaScript-based Cloudflare Workers is doing the same on the server side. Sometimes progress requires breaking backward compatibility, and I think this is one of those cases. And I'm confident that there will be other runtime environments that emulate Cloudflare Workers, mitigating the risk of vendor lock-in.
I'm skeptical that JavaScript and WASM are anywhere near being suitable for any and all backend services.
For example, Cloudflare Workers can't even talk to the outside world with anything other than fetch(). There are WebSockets, but only as a pair to talk to a browser that connected to you.
I'm a fan of Workers, but they do have limitations.
Likewise, there was a time when microcomputers weren't suitable for many applications. Sometimes a new approach (edit: and ultimately a better one) has to start at the low end and work its way up.
How was it ever possible for S3 to take such market share? Or does this market share not really exist? Coming from the '90s, I could never imagine paying for outgoing traffic when already paying for a server with an internet connection. There was an early time when you would get throttled to 100 Mbit/s (and much earlier, to 10 Mbit/s), but that is long gone. What do you do with S3 that makes such prices seem fair for anything other than rarely accessed files?
I know why; I was there! (first AWS employee in EU, 2008, stayed there until 2014).
Back then, using traditional IT providers or internal IT services, you needed weeks, paperwork, etc, to get any storage or compute.
Then AWS arrives, and you could have infinite storage, or tens of EC2 instances, within a few minutes. And pay with a freaking credit card!
It didn't matter that AWS' performance was abysmal at the time. Or that AWS was expensive. AWS solved a huge pain point for millions of people, and that's why it became a market leader.
For one, you only get billed for outgoing traffic from AWS, so if all your infra is on AWS, you're not paying for that. Secondly, the ease of use was a big step up compared to back when you had to buy a bunch of servers to put hard drives in; remember, S3 was one of the first AWS services, alongside EC2.
If your load wasn't high, because you're a startup or whatever, then paying the extra premium on storage to save the engineering effort of building your own storage cluster worked out. Then, when you were big, those egress costs had you locked in.
Plus, add in the thought of maintaining infra yourself versus AWS doing it for you, and you had a lot of developer blogs, tech-company marketing sites, and the like just serve from S3, since it was easy to use and the absolute cost for such a product was low enough that they didn't care about the relative cost of S3 versus other services.
Unless things have changed I'm not sure it's accurate that you only get charged for external traffic on AWS: I recall having pretty substantial charges internal to AWS just for traffic between different AZs in the same region, for example.
I think the parent is talking specifically about S3 rather than all of AWS -- S3 is a regional service and you can access it from any AZ within that region without additional fees. Cross-AZ charges are more commonly an issue with things like load balancers and EC2 instances.
Cross-region access is a common use case, which makes S3 access from another region (or during a migration) painfully expensive when compute is spread across multiple regions.
Cross-region replication is a typical setup for many DR use cases, and S3 will charge you inter-region transfer costs for that replication as well.
I suspect Cloudflare and any clones could set up their peering agreements with Cloud providers in such a way that they're exposed as a feature. First tier Cloud providers probably won't bite because it would open the door to people migrating out of their data centers.
Second tier Cloud providers would eat that up, since it would democratize things more. Even if a competitor gets the customer, at least it's not the guy who is waging a war of attrition against you.
You are only not charged if all your infra is in the same location, the same availability zone, and the same VPC.[1]
Egress between anything beyond that will cost you a ton of money.
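To make the scale of those charges concrete, here is a back-of-the-envelope calculation. The rates below are illustrative assumptions based on approximate 2021-era AWS list prices, not authoritative pricing; check the current pricing page before relying on them.

```python
# Rough 2021-era AWS data-transfer rates, in USD per GB.
# These numbers are assumptions for illustration only.
INTERNET_EGRESS_PER_GB = 0.09  # first-tier egress to the internet
CROSS_REGION_PER_GB = 0.02     # typical inter-region transfer
SAME_AZ_PER_GB = 0.00          # same-AZ traffic within a VPC is free

def monthly_transfer_cost(gb_moved: float, rate_per_gb: float) -> float:
    """Flat-rate cost of moving `gb_moved` GB in a month."""
    return gb_moved * rate_per_gb

# Serving 50 TB/month straight to the internet:
print(round(monthly_transfer_cost(50_000, INTERNET_EGRESS_PER_GB), 2))  # 4500.0
```

At that hypothetical rate, a modest 50 TB/month of internet egress costs more per month than the underlying storage often does, which is the lock-in dynamic the thread is describing.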
Virtual servers have been a thing since... forever on the internet. At least since the mid-nineties, you haven't had to think about getting hard drives into servers if you didn't want to.
Yes, and that's like saying you don't understand why people eat at restaurants when there's a field full of cauliflower nearby. If you're comparing it to S3, using virtual servers means you're now taking on responsibility for configuring, operating, and securing replicated file storage in at least 3 geographically separated regions, scaling it when you start to fill up those local disks, building an API on top of that, and providing web-based access. Don't forget things like bitrot detection and prevention, storage encryption, centralized logging, event-based triggers, lifecycle management policies, and tiering onto cheaper storage either by policy or automatically.
For many organizations, the 24x7 staffing needed to provide an equivalent service alone would pay for their entire storage cost multiple times over. Even if your scale is sufficient to allow beating that, you are likely to have more compelling problems for that time to be spent on.
(This is not saying that the egress charges are great, only that I completely understand why many, many people decided it was an acceptable tradeoff)
VPSes were usually not cost-effective either if what you wanted was a big pool of storage space. The big growth opportunity for S3 was the companies that would otherwise run their own SANs, not the startup that would otherwise run on a couple of VPSes or shared hosting.
Even the startups that grew, they might start on a VPS provider, but outgrow them. S3 managed to scale with them and retain them as customers.
They were first, and from a feature-set and reliability perspective, S3 is still unparalleled.
That coupled with storage costs that were always very competitive and the fact you had unlimited scale and PAYG pricing got a lot of people hooked.
It’s going to take a long time for S3 customers who have experienced pretty amazing uptime and reliability for the entire life of the service to put the same level of trust in something else.
CF did a really smart thing by making R2 be able to operate as a transparent caching-replica of S3.
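That transparent caching-replica behavior is essentially a read-through cache. The sketch below is a duck-typed illustration of the access pattern only (it is not R2's actual mechanism or a real boto3 client; `MemoryStore` is a hypothetical stand-in for two object stores):

```python
def read_through(replica, origin, bucket, key):
    """Serve reads from the replica (e.g. R2); on a miss, fall back to
    the origin (e.g. S3) and backfill the replica for future reads.

    `replica` and `origin` are any objects exposing get/put on
    (bucket, key) pairs -- thin stand-ins for real storage clients.
    """
    data = replica.get(bucket, key)
    if data is None:
        data = origin.get(bucket, key)
        if data is not None:
            replica.put(bucket, key, data)  # next read is served locally
    return data

class MemoryStore:
    """Minimal in-memory object store used to demo the pattern."""
    def __init__(self):
        self._objects = {}
    def get(self, bucket, key):
        return self._objects.get((bucket, key))
    def put(self, bucket, key, data):
        self._objects[(bucket, key)] = data
```

The appeal for migration is visible in the pattern itself: the first read of each object pays origin egress once, and every subsequent read is free of it.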
Consolidated billing. AWS could in fact raise prices across the board and all my previous companies would continue to pay them and not care one bit. Finance departments love AWS bills and the simple annual negotiations that come with them.
I'm a big fan of just renting real hardware and running stuff there for a fraction of the price of the Cloud, if it fits your use case.
But doing durable storage yourself, especially once the amounts get a bit unwieldy is scary. For low amounts of data you can get away with just making plenty of copies in different places, but that gets much more difficult once it's a serious amount of data. Object storage is the most appealing service even if you want to do most of the stuff yourself. And this is an area where an established track record is important, you don't want to store your data if you're not sure the service is reliable.
A great example of counter-positioning. Cloudflare is positioning itself in the market in a way that its competitor (AWS) cannot replicate — their lock-in is predicated on egress fees.
Sounds like they want to capture the market standalone VPS providers have eaten, and I don't blame them. Not acquiring users for small-scale usage means they're less likely to consider you for large-scale usage. This is a large part of Cloudflare's success - the free tier is marketing for their Enterprise tier, and it really works.
I regret not investing back when durable objects was announced. From a technical perspective, it’s a very unique capability.
I had a similar experience with Shopify. I had interactions with them in a company I worked at back in 2015 and regarded them very highly among the e-commerce platforms but didn’t buy the stock…
Same. I'm already long since shortly after the IPO, obviously I wish I invested much more. The passage about fixed bandwidth costs due to relationships with ISPs really resonated with me.
Wish I worked for them and was getting those sweet, sweet RSUs.
Kind of. It's undervalued in the sense that it's a growing business that will probably track "the growth of the internet" (so, will continue growing for a long, long time).
It's not undervalued in that quarterly revenue is $152.4 million, on a market cap of $35 billion, for a staggering multiple of 57 times annualized revenue.
By earnings I take it you mean revenue or sales, based on the rest of the text about the revenue multiple. Typically earnings refers to profit in one form or another (often it means after-tax profit, sometimes operating profit).
Cloudflare of course has no earnings. Their operating loss last quarter was $28.8 million. Their operating loss for the past four quarters was $106 million.
Wow, that pricing and the amount of hype around the co really makes me want to short it, looks very asymmetric (good case already priced in with a huuuge range of possible outcomes). Not gonna do it though, learned my lesson with Tesla, there is always a better than best case waiting to get priced in.
I thought I could save money by hosting some backend services in-house but soon realized it ended up being more expensive than EC2 solely because of the egress fees.
So whether or not Amazon intended it that way, it functions as something that’s anti-competitive because it forces you to go all-in with AWS.
I like this. AWS feels like a proprietary mainframe system (will get downvoted for saying this).
Anytime a majority of developer job postings mention specific product/company certifications (think PMP, or Microsoft developer certs), it's time to pivot your skill set.
No: my first job was as an intern at IBM, my first non-intern computer job was in an AS/400 shop, and for my last 2 years I've been an 'AWS Architect'. It's exactly the same thing: the guy with a good memory who knew the entire IBM product line now knows all the obscure and bizarre AWS products. It's not just product certifications; you even end up building around arbitrary limits in the big company's implementation rather than just modifying the open-source code or adding glue layers to get what you want.
I have had to write weird proxies just because of some missing AWS features, and it made sense because, unlike the last twenty years of my computing career, I don't have the source code to all of the code running in the environment.
But yeah, if I were recommending options to a web group, I'd 100% tell them to use R2 and populate it object by object from S3. Although I suppose all the "it's ok to put this data into S3 with these controls" agreements will need to be redone for the new use case.
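Populating R2 object by object from S3 boils down to list, get, put. The sketch below is duck-typed against the boto3-style S3 client API (get_object/put_object/paginators); for R2 you would presumably construct one client with `endpoint_url` pointed at your account's S3-compatible R2 endpoint, but the endpoint and bucket names here are hypothetical:

```python
def copy_object(src, dst, src_bucket, dst_bucket, key):
    """Copy one object between two S3-compatible clients.

    `src` and `dst` follow the boto3 S3 client interface. Note this
    streams each object through the machine running the script.
    """
    body = src.get_object(Bucket=src_bucket, Key=key)["Body"].read()
    dst.put_object(Bucket=dst_bucket, Key=key, Body=body)

def copy_bucket(src, dst, src_bucket, dst_bucket):
    """Walk src_bucket page by page and copy every key into dst_bucket."""
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket):
        for obj in page.get("Contents", []):
            copy_object(src, dst, src_bucket, dst_bucket, obj["Key"])
```

A real migration would also want to carry over content types and metadata, retry failures, and use multipart uploads for large objects; tools like rclone handle all of that for you.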
I'm a fan of the fledgling CF stack, but it's arguably more proprietary. You can't run containers, you can't run a normal database; you have to architect the application exactly for their system.
The post right above this post on HN's front page is titled "Slack is experiencing a service disruption". So for a second I thought CF was having a disruption (outage) which caused Slack to go down.
> It’s impossible to overstate the extent to which AWS changed the world, particularly Silicon Valley. Without the need to buy servers, companies could be started in a bedroom, creating the conditions for the entire angel ecosystem and the shift of traditional venture capital to funding customer acquisition for already proven products, instead of Sun servers for ideas in Powerpoints.
So the author thinks that shared hosting or servers-for-rent did not exist before AWS' popularity?
Since you're here, would love for Cloudflare to disrupt the DBaaS marketplace.
I already run my entire business on Cloudflare (for services you have) but there is a significant portion of my infrastructure (>50%) I haven't moved over that is dependent upon needing a DBaaS offering. With a DBaaS offering, I could run near 100% of my infrastructure on Cloudflare.
(Workers KV is great, btw, but there are so many times when a traditional RDBMS is needed and a key-value store just doesn't fill that gap).
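The gap the parent describes can be shown concretely: a KV store only supports lookup by key, so any secondary-attribute query becomes a full scan, while a relational store answers it with an indexed query. A minimal sketch, with a plain dict standing in for a KV store and sqlite3 for "a traditional RDBMS" (the schema and data are made up for illustration):

```python
import sqlite3

# KV store: only fetch-by-key; filtering on "plan" means scanning everything.
kv = {"user:1": {"name": "Ada", "plan": "pro"},
      "user:2": {"name": "Bob", "plan": "free"}}
pro_users_kv = [v["name"] for v in kv.values() if v["plan"] == "pro"]  # full scan

# Relational store: the same question is an indexed, declarative query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, plan TEXT)")
db.execute("CREATE INDEX idx_plan ON users(plan)")
db.executemany("INSERT INTO users (name, plan) VALUES (?, ?)",
               [("Ada", "pro"), ("Bob", "free")])
pro_users_sql = [row[0] for row in
                 db.execute("SELECT name FROM users WHERE plan = ?", ("pro",))]
print(pro_users_kv, pro_users_sql)  # ['Ada'] ['Ada']
```

Joins, aggregations, and transactions widen that gap further, which is why a hosted relational offering would complement Workers KV rather than compete with it.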
Can we use R2 for video? Workers KV prohibits use for video. Video streaming is the #1 growth area since the pandemic. Why is it that we can use R2 and Workers KV to store images but not (chunked) video?
5G has lots of latency compared to fiber or even good cable, if you want the same latency as someone on fiber to a regionally hosted server, you need edge compute stuff.
Cloudflare could really shake things up on the ML side of things. The egress costs and GPU prices on AWS and GCP make them a nonstarter for most companies, forcing people to rack their own hardware.
That sounds right about ML startups. But ML startups, or even venture-funded companies, are a very small percentage of companies, especially when it comes to spending significant dollars.
My original comment was referring to companies who train their own machine learning models. They might not be spending the way large slow corporations and the government do but there's a lot of investment in the space and a ton of room for growth.
Yes, we use GCP for production workloads and enjoy the benefits of being able to scale at will. I'm strictly speaking about hosting large multi-TB datasets and running machine-learning training jobs that end up costing thousands of dollars each.
>> "The most familiar API for Object Storage, and the API R2 implements, is Amazon’s Simple Storage Service (S3)."
Ugh - a clone of S3's functionality - that's not competing.
There's been zero innovation in cloud storage beyond S3's primitive capabilities. None of the competing services have gone beyond S3's stunted functionality.
Online storage should provide:
* An SFTP interface (and no, Amazon's "charge by the hour SFTP interface to S3" doesn't count)
* The ability to query objects and apply filters to those queries, PLEASE! For goodness' sake, it's 2021.
* A WebDAV interface
* The ability to incorporate object metadata into filtering queries
Why is there zero competitive drive in this space?
API-compatible does not mean it's a clone. Cloudflare's pitch is "multi-region storage that automatically replicates objects to the locations they’re frequently requested from."
This is big and interesting and useful even if you ignore the bandwidth savings. If it were available already, we'd be trying to use it for Fly.io users.
I find it deliciously ironic that CloudFlare is eating AWS' lunch with their launch of R2, after Amazon did basically the same thing with a bunch of their services built upon open source projects.
I suppose it's now corporations stealing market share from each other...
"Eating AWS's lunch" seems quite speculative. R2 is a blog post. S3 is the industry standard, to the extent that everyone else creates products based on a subset of the S3 API. S3 is also a cornerstone of the AWS ecosystem, which has enormous gravitational pull.
Cloudflare is an exciting company with a lot of great products. But they're less than 1% the size of AWS or Azure. Let's see what happens.
THAT is Cloudflare's disruption.