Image resizer scaling is one of the more interesting problems I have worked on in the last 10 years or so. I was part of a small team that designed and built the resizer that powers the nine.com.au network of sites. Modest by US standards, it gets close to hundreds of millions of views a day across the whole network.
We ended up using a shared-nothing architecture. The whole thing ran on six t2.large AWS instances using a slightly modified version of Thumbor in which the disk cache was shared across key rotations, so rotating the key didn't trigger a large-scale cache invalidation. It worked quite well and we rotated the key every few weeks.
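For anyone wondering why key rotation matters here: Thumbor-style URLs embed an HMAC signature computed with the security key, so a new key changes every URL and, by default, every URL-keyed cache entry. A rough Python sketch of that style of signing (the path layout is illustrative, not our exact setup):

    import base64, hashlib, hmac

    def sign_path(security_key: str, path: str) -> str:
        # HMAC-SHA1 over the unsigned path, urlsafe-base64 encoded --
        # roughly how Thumbor-style URL signing works.
        digest = hmac.new(security_key.encode(), path.encode(), hashlib.sha1).digest()
        signature = base64.urlsafe_b64encode(digest).decode()
        return f"/{signature}/{path}"

    # Same image, different key => different URL, so anything cached by URL
    # is effectively invalidated on rotation unless you share the disk cache.
    print(sign_path("old-key", "300x200/smart/example.jpg"))
    print(sign_path("new-key", "300x200/smart/example.jpg"))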
Things I learnt.
Pretty much all image resizers have the same performance, as all the good ones call out to C libraries in the end. Akamai (the CDN we used), despite having Site Shield on, would still hit the back end ~100 times for the same image on occasion; I suspect all of the whitelisted machines could request the same image if their internal sharing didn't kick in fast enough.
Long-tail images were the ones that brought the resizer to its knees. The hot images would quickly enter the local disk cache and were not an issue. Purge the whole cache, though, and the long-tail images would quickly overwhelm the instances.
The last thing I learnt was to have a backup CloudFront distribution ready to flip over to. At one point Akamai had issues and the resizer was taking the origin load directly. It capped out at about 300 RPS, which couldn't keep up with what was expected. It got even worse when the T2 instances ran out of credit. Spinning up CloudFront solved that issue once the DNS flip kicked in.
One good thing to come out of it was that I helped write the C# Thumbor library, as we had one site that was using C# and nobody could move over to the new resizer without it.
Around 2011, I once tried to pitch a "mobile ecommerce website as a service" company in Vancouver on a GPU-based image rescaler.
A very dumb proposal: no caching, resize on the fly; a GPU has gigabits per second of resizing throughput as long as JPEG is involved. One GPU handles decoding with VDPAU, another handles encoding with CUDA.
That beat any Google App Engine based "elastic" service on an economic basis, but the catch is that you have to ship that GPU resizer to every colo. That did not work out, because with Google App Engine you were getting access to Google's POP network for almost free, and they were already paying for a gigantic amount of CDN traffic.
----
When I worked as a sub-subcontractor on Alibaba's RDMA-wired DC project, there was one demo by another team where they got DSP devs involved and built a 10 GB/s JPEG transcoder for under 100 W. I think most of the power budget was going to the FPGA that linked it all with the NIC :/
An expensive toy, but it again demonstrated to me just how powerful the "lockdown" of all those "cloud" companies is. You cannot buy anything like this on the open market.
Imagine what it could have been if they had offered something more cash-worthy over RDMA there.
I said long ago that the killer product during the Bitcoin boom was not the mining itself, but leasing and renting out the rigs. Your capital costs get covered near instantly, and you can cash out the next week. I believe that the whole "cloud" thing will eventually follow this path.
Six t2.large instances sounds pretty efficient. For high-volume image resizing on mobile devices we have access to GPU libraries. I wonder if something CUDA- or OpenCL-powered would help increase efficiency in a cloud-based service.
A trick I've seen at least one large site (Feedly) use is piggybacking off Google's image-serving infrastructure. Their ggpht/googleusercontent system gives you access to an image manipulation platform with more features than many open-source solutions (width, height, blur, rotate, frame, invert, etc). The only legitimate way to use it is through an application on their App Engine platform, and I'm not sure why they don't offer it as part of the Google Cloud suite. Feedly seems to take the URL in their App Engine instance (seemingly dedicated solely to this) and redirect to a Google URL which can then use the image manipulation features. Does anyone else here do something similar?
Edit: forgot to mention, the App Engine documentation is very limited and only mentions width/height resizing. Searching Stack Overflow and other sites, however, reveals many other available modifiers.
Edit 2: Also worth mentioning is that the (ab)use of this service is quite popular with illegal sites. Who doesn't love offloading your image bandwidth to Google's image proxies?
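For reference, the documented entry point is the App Engine Images API's get_serving_url; the modifiers get appended to the URL it returns. A minimal sketch (first-generation Python App Engine runtime; blob_key is whichever uploaded image you want to serve, and only the =sNNN / -c suffixes shown here are ones I'd vouch for):

    # Inside an App Engine standard-environment handler.
    from google.appengine.api import images

    base_url = images.get_serving_url(blob_key, secure_url=True)

    # The returned googleusercontent.com URL accepts modifiers after '=',
    # e.g. '=s400' (400px on the longest edge) or '=s400-c' (400px, cropped).
    # Other modifiers are reportedly available but undocumented.
    thumb_url = base_url + "=s400-c"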
More interestingly, the GitHub repo is just a wonderful example of a fully built application using modern techniques in a microservices architecture: https://github.com/DMarby/picsum-photos
It's so hard to find examples of how all the pieces fit together, and this repo has it all. Really impressive.
Is it really a microservices architecture? It looks like a web application with a frontend, a backend API, and normal modules/packages for the functionality.
Well let's limit the possibilities to any image between 1x1 and 1920x1080.
Calculating the number of possible images is simple enough. It's just 1920*1080, or 2073600 images.
Now what's the average size of each image? Well, the average of each dimension is half of the full size, so the average area should be 1/4th of a 1920x1080 image, or 518400 pixels per image.
So in total, we need to save 1074954240000 pixels. Now how many bytes does each pixel take to store? I really don't know. You could save them as 8-bit RGB PNG images and assume each pixel will use 24 bits, i.e. 3 bytes. You'd add some for headers, and remove some to account for compression, but let's ignore that for now. Maybe someone else can chime in. But for now we need to store about 3224862720000 bytes, or roughly 3TB.
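Here's the back-of-the-envelope in Python, under the same assumptions (every size from 1x1 up to 1920x1080, 3 bytes per pixel, ignoring headers and compression):

    # Every possible width/height combination from 1x1 up to 1920x1080.
    num_images = 1920 * 1080                 # 2,073,600 images

    # Average area: mean width * mean height ~= (1920/2) * (1080/2).
    avg_pixels = (1920 * 1080) // 4          # 518,400 pixels per image

    total_pixels = num_images * avg_pixels   # ~1.07e12 pixels
    total_bytes = total_pixels * 3           # 24-bit RGB, 3 bytes per pixel

    print(f"{total_bytes / 1e12:.1f} TB")    # roughly 3.2 TB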
It's also possible to use the same base image and cut a portion out of it, so you wouldn't need to store an original image for each possibility, just cache the results of cropping as needed.
I'll bet that most image requests will be within certain parameters, e.g. 2^x by 2^y. So you could probably pre-cache most real-world image sizes, and leave dynamic generation for one-offs.
AND you could try to figure out how to do that beforehand, but some LRU-type cache solves the problem without any prior knowledge of what those sizes are.
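Something like this, with resize_image() standing in as a hypothetical function for whatever does the actual work:

    from functools import lru_cache

    def resize_image(path: str, width: int, height: int) -> bytes:
        # Placeholder: in practice this would call Pillow, libvips, etc.
        raise NotImplementedError

    @lru_cache(maxsize=4096)   # keep the 4096 most recently used variants
    def resized(path: str, width: int, height: int) -> bytes:
        return resize_image(path, width, height)

    # The first request for a given (path, width, height) pays the resize
    # cost; repeats come straight from memory, and rarely used sizes age out
    # on their own -- no need to predict popular dimensions up front.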
What's the point? You'd create dimensions never used by users (CPU cost), you'd store files not used by users (storage cost), and you'd still miss some configurations for sure. They're creating files once and storing them in a cache (CDN and local cache). Basically, they are doing something similar to your suggestion, minus creating unnecessary files.
This is what I use in production right now. It works well once you figure it out, but backwards compatibility with the previous Thumbor-based version isn't as good (or at least as correctly documented) as they say. And the previous Thumbor-based processor just stopped working one day. It's not as robust a system as I expected.
I run a similar image host with many times the traffic of Picsum, as well as daily DDoS attempts. This is pretty much the approach I've chosen. (I don't use S3, but a similar setup.)
My infrastructure is much simpler than this (CDN in front of Varnish? What the hell software do you think they're using for the CDN?) and my total hosting costs are about $150 a month since recently upgrading the server.
Seeing articles like this just reaffirms for me: The people who write these articles are not necessarily experts.
Two layers of the same cache can be beneficial, even if they're both using Varnish. Let's walk through a couple of request scenarios. I'll assume I'm both running the application/inner cache/load balancers and testing the request flows myself, for simplicity of pronouns.
I request image42 and it's in the outer cache. I get served from the outer cache.
I request image127 and it's a cache miss on this server. It asks its backend, which is another cache, and this time it's a cache hit since it hadn't timed out there yet.
I request image128 and my browser requests the same image again from the same backend; it doesn't even have to hit my load balancer the second time.
I request image2049. It's a miss on the outer cache. It's a miss on the inner cache. It gets generated by the primary application. I then request it again, and I hit a different frontend cache. It's a miss in this frontend, but this cache is hopefully refreshed from that inner layer of cache rather than going all the way back to the application. If the load balancer pins traffic based on the ultimate end-user's IP to a particular inner-circle Varnish box via MRU, then the chances are quite high that's what happens.
I request image4095 and it has expired from the inner cache, but is still unexpired in the outer cache so it never gets beyond the CDN.
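If it helps to make the layering concrete, here's a toy two-layer TTL cache in Python; the dictionaries stand in for the CDN and the inner Varnish, and the TTLs and the render() callback are made up:

    import time

    class TTLCache:
        def __init__(self, ttl):
            self.ttl, self.store = ttl, {}
        def get(self, key):
            hit = self.store.get(key)
            if hit and time.time() - hit[1] < self.ttl:
                return hit[0]
            return None
        def put(self, key, value):
            self.store[key] = (value, time.time())

    cdn = TTLCache(ttl=60)        # outer cache, shorter TTL
    varnish = TTLCache(ttl=3600)  # inner cache, longer TTL

    def fetch(key, render):
        value = cdn.get(key)
        if value is None:                 # miss at the edge...
            value = varnish.get(key)      # ...may still be a hit one layer in
            if value is None:
                value = render(key)       # only now touch the application
                varnish.put(key, value)
            cdn.put(key, value)
        return value

    fetch("image127", render=lambda k: f"rendered {k}")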
I understand what's happening -- there's no need to explain. Running a dedicated varnish instance for the handful of requests that have a cache miss is pointless and I'd be willing to bet he didn't benchmark it.
In 99% of workflows, what's going to happen on a cache miss at the CDN is that you'll hit Varnish, which will also suffer a cache miss, since it's a rarely-requested resource that's being requested. The 1% of cases it helps with are the few that were requested recently enough to still be sitting in the Varnish server, but long enough ago to have been evicted from the CDN. It's a vanishingly small amount of traffic. Most of what your Varnish server will be doing is making requests to your main server while doubling your bandwidth and server costs. And latency.
Not practical at all.
That infrastructure would've been better spent on another server in the load-balancer rotation -- which is also unnecessary, since I run a nearly identical offering with many times the traffic and do it off of a single server + CDN, so I speak from experience.
Not to mention the most ridiculous turtle in this stack: Spaces itself is a CDN. That means on every cache miss the traffic gets bounced from a CDN (Cache #1) to a load balancer (does that imply multiple Varnish instances?) which bounces it to Varnish (aka Cache #2) to a server which makes one request to Postgres, then another to Redis (Cache #3) and if it finally finds its file it redirects to Spaces (Cache #4). Your real traffic coming in is almost all going to get served by the CDN -- and when it's not on the CDN, it's going to be from a page that only gets hit once a week or once a month or less. That means if it's not in your outer cache it's not going to be in any of your inner caches, since it's long-tail traffic. And the long tail is quite a lot of traffic.
Again: I have an image site that gets much more traffic than picsum and I run it off of a single server + CDN. My biggest cost by far is bandwidth. He's not doing himself any favors with all this over-engineering. My service has a CDN which -- upon cache miss -- serves a flat file from my server. Done. 4TB of data transfer monthly and .75TB of flat files stored across multiple volumes. New files are processed / generated at upload and that's the end of the story. I'm just some random shmuck on the internet so you don't have to believe me but I've just had an epiphany in reading this story by some guy who happens to do exactly what I do and not as well but with many, many more steps and I'm realizing I'm an expert on shit I don't even think about being an expert on while other people who think they're experts -- aren't.
You do make some solid points. You assume, however, that there's a bandwidth cost between the load balancer and the backend, which won't be true for everyone. You also don't consider that the cache behind the load balancer might be much larger and have a much longer TTL than the CDN's cache. The economics of this sort of setup are entirely different if you're putting every piece on a cloud instance rather than having a rack somewhere with your private data flowing for free over your own switch.
Author of the post here, figured I'd clarify some things since there seem to be some major misconceptions present.
First off, I don't claim to be an expert, I find that a pretty arrogant title for anyone to use.
I'd like to think I know a thing or two about building highly scalable webservices however, and of course I'm always open to the opportunity to learn if I'm doing things incorrectly.
That said, Picsum is what I use to play around with new technologies and try new things since it's high-traffic enough that I can get some real data on how things perform.
Is it very over-engineered? Absolutely, but that's part of the fun.
When it comes to Picsum, the reason for not pre-processing all the images is that there are simply too many combinations of sizes and options you can request through the API. For every image, there are 5001 * 5001 * 22 variations that can be requested (over 550 million per image), and in total we have just under a thousand source images.
As for running Varnish behind our CDN, this is done for a couple of reasons:
- We can make sure that an image is only processed once at a time, even though the CDN might request it multiple times before its cache has been filled.
- We can apply optimizations, such as sorting and filtering the query parameters for variations, to achieve a better cache hit rate (roughly sketched below). This is not possible to do with the CDN provider we use.
The resources it uses are negligible, the extra latency within the cluster is vanishingly small, and it saves us a lot of extra processing.
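To give a rough idea of the query normalization (the parameter names below are just examples, not Picsum's actual API):

    from urllib.parse import urlencode, urlsplit, parse_qsl

    ALLOWED = {"width", "height", "blur", "grayscale"}   # example whitelist

    def normalize(url: str) -> str:
        parts = urlsplit(url)
        params = [(k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED]
        # Sort and drop unknown parameters so equivalent requests collapse
        # onto a single cache key.
        return parts.path + "?" + urlencode(sorted(params))

    # Both of these now map to the same cache entry:
    normalize("/id/42/300/200?blur=2&width=300&utm_source=x")
    normalize("/id/42/300/200?width=300&blur=2")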
Every service within the Kubernetes cluster runs at least two replicas, varnish included, for redundancy and to distribute the load. We're not using separate servers for each layer/component, that'd be wasteful.
As for bandwidth costs, there's no cost for the bandwidth between the CDN and the load balancer, as DigitalOcean does not charge for load balancer bandwidth. There's also no cost for anything behind the load balancer, as this is all internal traffic, either within DigitalOcean or within the Kubernetes cluster itself.
Talking about Spaces, I think you might be confused. Spaces is object storage, which also happens to have an optional CDN capability built in. Picsum only uses the object storage part, for storing the source images that are used for processing. The reason we use Redis to cache said source images is to avoid having to fetch them from Spaces on every request, as this is rather slow comparatively.
An important distinction here is that Spaces/Redis stores and caches the source images, not the processed ones, which are cached by Varnish and the CDN.
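As a sketch of that read-through pattern (not our actual code; the bucket name and endpoint are placeholders, and it assumes redis-py plus an S3-compatible client, since Spaces speaks the S3 API and credentials are already configured):

    import boto3
    import redis

    r = redis.Redis()
    s3 = boto3.client("s3", endpoint_url="https://nyc3.digitaloceanspaces.com")

    def source_image(key: str, ttl: int = 3600) -> bytes:
        # Read-through cache: serve the source image from Redis if present,
        # otherwise fetch it from the Spaces bucket and cache it for next time.
        cached = r.get(key)
        if cached is not None:
            return cached
        body = s3.get_object(Bucket="source-images", Key=key)["Body"].read()
        r.set(key, body, ex=ttl)
        return body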
As an aside, since you seem to think that comparing numbers for services with vastly different needs and use cases is worthwhile: Picsum serves a bit over 8TB of traffic a month, and costs less than your setup to run.
DO has sometimes been accused of overselling capacity and then terminating services when they are fully used. I have never actually heard a first-hand account of this, but I assume that is what the parent is talking about.
I'm not sure if this is what you're talking about, or if the scale just wasn't a problem, but anecdotally I've (accidentally) pegged a $5/mo instance for at least a month without any issues.
EDIT: I do know that DO/Linode/(maybe Vultr) have blackholed IPs during large-scale DDoS attacks, simply because they don't have the infrastructure to mitigate them and you'd affect their other customers.
I run https://dummyimage.com, the first placeholder image service which has been online since 2007. I use a 1GB, 1vCPU instance from DreamCompute costing $6.00 per month. That's it.
The article mentions Digital Ocean provides the infrastructure, so it seems the only cost is dev time. It reads almost like a promotional piece for DO, with all of their services mentioned, which is probably why they support it to begin with.
I thought about that too, but that's not any different from using AWS-specific services. I actually just realized why they use those bloody annoying distinct names: it's free marketing when someone writes about how they implemented stuff, instead of referring to it by a generic name.
Different names are also useful for negatives. Someone writing about an issue with "their CDN" is far less useful than saying the issue occurs with Akamai.
In general I like to see brand signals: if someone I respect mentions that they use Cloudflare, that is useful information, even without further details.
Unsolicited mentions are usually useful; it's just that sneaky paid advertising is bad.