
Three reasons -

First, to allow them to shard more effectively. With different subdomains, they can route requests to different servers with DNS.

Second, it allows them to route you directly to the correct region the bucket lives in, rather than having to accept you in any region and re-route.

Third, to ensure proper separation between websites by making sure their origins are separate. This is less AWS's direct concern and more of a best practice, but doesn't hurt.

I'd say #2 is probably the key reason, and perhaps #1 to a lesser extent. It actively costs them money to have to proxy the traffic along.
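As a rough illustration (the bucket names here are made up), resolving two virtual-hosted bucket hostnames shows DNS handing back different, region-specific endpoints before a single request ever reaches S3:

    # Sketch: see what DNS returns for two hypothetical buckets in
    # different regions, using only the Python standard library.
    import socket

    for host in ("my-eu-bucket.s3.amazonaws.com",
                 "my-us-bucket.s3.amazonaws.com"):
        # gethostbyname_ex returns (canonical name, aliases, IP list);
        # the canonical name is typically a region-specific S3 endpoint.
        cname, _aliases, ips = socket.gethostbyname_ex(host)
        print(host, "->", cname, ips)

With path-style URLs every request hits the same generic endpoint, and something behind it has to work out which region the bucket actually lives in.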




I think they should explain this a bit better. That said:

For core services like compute and storage a lot of the price to consumers is based on the cost of providing the raw infrastructure. If these path style requests cost more money, everyone else ends up paying. It seems likely any genuine cost saving will be at least partly passed through.

I wouldn't underestimate #1, not just for availability but for scalability. The challenge of building a system that knows about every bucket (as whatever sits behind these requests must) isn't going to get any easier over time.

Makes me wonder when/if DynamoDB will do something similar.
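For what it's worth, the request style is already a client-side knob; with boto3, for example, you can opt into virtual-hosted addressing today. A minimal sketch (the bucket name is made up, and it assumes credentials and a default region are already configured):

    # Sketch: force virtual-hosted-style (bucket-in-the-hostname) requests
    # with boto3 instead of the legacy path-style.
    import boto3
    from botocore.client import Config

    s3 = boto3.client(
        "s3",
        config=Config(s3={"addressing_style": "virtual"}),  # or "path"
    )
    # Requests now go to https://<bucket>.s3.<region>.amazonaws.com/<key>
    s3.list_objects_v2(Bucket="my-example-bucket", MaxKeys=1)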


So "improving customer experience" is really Amazon speak for "saving us money"


Makes it faster, reduces complexity and would allow them to reduce prices too


Pricing is set by markets based on competitors' offerings. Reduced costs could simply result in monopoly rents.


reduces incentive for them to raise prices


And reduces chances of outages... which is good for both customers and AWS.


Do they not charge for network costs anyway?

A more optimistic view is that this allows them to provide a better service.


They charge for data transfer. They don't charge based on the level of complexity needed for their internal network operations.


Everything is a tradeoff.


With software-defined networking you don't need the subdomain to do that.


Yeah, you basically do. Sure, you can reroute the traffic internally over the private global network to the relevant server, but that's going to use unnecessary bandwidth and add cost.

By sharding/routing with DNS, the client and public internet handle that, which lets AWS save some cash.

Bear in mind, S3 is not a CDN. It doesn't have anycast, PoPs, etc.

In fact, even _with_ the subdomain setup, you'll notice that before the bucket has fully propagated into their DNS servers, it will initially return 307 redirects to https://<bucket>.s3-<region>.amazonaws.com

This is for exactly the same reason - S3 doesn't want to be your CDN and it saves them money. See: https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosti...
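You can see that for yourself right after creating a bucket. A minimal check (standard library only; bucket and key names are placeholders) that doesn't auto-follow the redirect:

    # Sketch: inspect the temporary 307 S3 returns for a freshly created
    # bucket before its DNS entry has propagated. http.client never
    # follows redirects, so the Location header stays visible.
    import http.client

    conn = http.client.HTTPSConnection("my-new-bucket.s3.amazonaws.com")
    conn.request("GET", "/some-object-key")
    resp = conn.getresponse()
    print(resp.status, resp.getheader("Location"))
    # Expect a 307 with a region-specific Location shortly after bucket
    # creation, and a direct 200/403/404 once DNS has caught up.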


I'm not sure you understand how anycast works. It would be very shocking if Amazon didn't make use of it and it's likely the reason they do need to split into subdomains.

Anycast will pull in traffic to the closest (hop distance) datacenter for a client, which won't be the right datacenter a lot of the time if everything lives under one domain. In that case they will have to route it over their backbone or re-egress it over the internet, which does cost them money.


AWS in general are not fans of Anycast. Interesting thread from one of their principal engineers on the topic.

https://twitter.com/colmmacc/status/1067265693681311744

Google Cloud took a different approach based on their existing GFE infrastructure. It does not really seem to have worked out: there have been a couple of global outages due to bad changes to this single point of failure, and they have since introduced a cheaper networking tier that works more like AWS's.


> AWS in general are not fans of Anycast.

I don't think that's true. Route53 has been using Anycast since its inception [0].

The Twitter thread you linked simply points out that fault isolation is tricky with Anycast, and so I am not sure how you arrived at the conclusion that you did.

[0] https://aws.amazon.com/blogs/architecture/a-case-study-in-gl...


Route53 is the exception, compared to Google Cloud, where the vast majority of APIs are anycast through googleapis.com

It's a good choice for DNS because DNS is a single point of failure anyway; see yesterday's multi-hour Azure/Microsoft outage!


Got it, thanks. Are there research papers or blog posts by Google that reveal how they resume transport layer connections when network layer routing changes underneath it (a problem inherent to Anycast)?


I do understand how it works and can confirm that AWS does not use it for the IPs served for the subdomain-style S3 hostnames.

Their DNS nameservers which resolve those subdomains do of course.

S3 isn't designed to be super low latency. It doesn't need to be the closest distance to client - all that would do is cost AWS more to handle the traffic. (Since the actual content only lives in specific regions.)


Huh? If the DNS doesn't see the bucket name, how can it hand back the right IP for where the bucket lives?


How does that work? My browser is going to send all requests to the same domain to the same place.


Anycast IP.

You have a single IP address. All traffic is routed to the nearest PoP. The PoP makes the call on where and how to route the request.

Look up the Google Front End (GFE) whitepaper, or the Google Cloud global load balancer.

That front-end server in the PoP can also inspect the HTTP packets for layer-7 load balancing.

https://cloud.google.com/load-balancing/docs/load-balancing-...
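Very roughly, the layer-7 part is just the front end in the PoP reading the Host header and picking a backend. A toy sketch (the hostnames and backend map are invented, nothing like the real GFE):

    # Toy sketch of Host-header (layer-7) routing at an anycast front end.
    # The hostnames and backend addresses are invented for illustration.
    BACKENDS = {
        "storage.example.com": "10.0.1.10",  # nearest storage cluster
        "api.example.com": "10.0.2.10",      # nearest API cluster
    }

    def pick_backend(host_header: str) -> str:
        # Strip any port and route on the hostname alone.
        host = host_header.split(":")[0].lower()
        return BACKENDS.get(host, "10.0.0.1")  # default backend

    print(pick_backend("api.example.com:443"))  # -> 10.0.2.10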


Added to my comment, but basically S3 is not a CDN - it doesn't have PoPs/anycast.

They _do_ use anycast and PoPs for the DNS services though. So that's basically how they handle the routing for buckets - but relies entirely on having separate subdomains.

What you're saying is correct for CloudFront though.


With SDN, the PoP would only need to receive the TCP request and proxy the TCP ACKs.

Raw data could flow from a different PoP that's closer to the DC.

I.e. user -> closest PoP -> backhaul fiber -> DC -> user.


Presumably Amazon has PoPs for CloudFront; why couldn't S3 share the same infrastructure?


They could do that, but they have absolutely no incentive to do so - all it would do is cost them more. S3 isn't a CDN and isn't designed to work like one.


It means two hops, not one. S3 GETs can be cached, but then you have a whole host of issues. Better to go straight to the origin.



