Why don't more companies resize images client-side first using <canvas> and then save the server some work by only asking it to verify the result by:
- resizing to the same size
- removing metadata
This results in much faster transfers (often 10x less bandwidth for mobile uploads) and reduces server load by "farming out" the work to the clients.
https://developer.mozilla.org/en-US/docs/Web/API/CanvasRende...
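Roughly the flow I have in mind, as a sketch (the function name, the 2048px max edge, and the JPEG quality are arbitrary choices for illustration; the server still re-verifies and re-encodes whatever it receives):

  async function resizeBeforeUpload(file: File, maxEdge = 2048): Promise<Blob> {
    // Decode the uploaded file into a bitmap.
    const bitmap = await createImageBitmap(file);
    const scale = Math.min(1, maxEdge / Math.max(bitmap.width, bitmap.height));

    // Draw the scaled-down image onto a canvas.
    const canvas = document.createElement("canvas");
    canvas.width = Math.round(bitmap.width * scale);
    canvas.height = Math.round(bitmap.height * scale);
    canvas.getContext("2d")!.drawImage(bitmap, 0, 0, canvas.width, canvas.height);

    // Re-encoding via toBlob drops EXIF and other metadata as a side effect.
    return new Promise((resolve, reject) =>
      canvas.toBlob(
        (blob) => (blob ? resolve(blob) : reject(new Error("encode failed"))),
        "image/jpeg",
        0.85
      )
    );
  }

One caveat: as noted further down in the thread, drawImage() ignores EXIF orientation, so you would need to handle rotation yourself before uploading.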
Edit (on keeping full-resolution images): Some people mention that keeping the original, highest-resolution images is important. I don't think that is true for most applications.
Most apps don't need high-resolution history as much as current, live engagement, so older photos being smaller isn't a big deal. As technology moves on, you simply start allowing higher-res uploads. YouTube, Facebook, and others have done this fine as the older stuff is replaced with the new/current/now() content.
In fact, even our highest resolution images are still low-quality for the future. Pick a good max size for your site (4k?) and resize everything down to that. In a year, bump it up to 6k, then 10k, etc...
Keeping costs low has its benefits, especially for us startups. Now if you have massive collateral, then knock yourself out.
1) Although the site serves up images at 1024 pixels (or whatever) today, in the future they may want larger images. When everyone is rocking 10K monitors and 6K phone displays, those small images are going to look pretty bad.
2) The original image has some metadata that they want to keep (geolocation, etc).
3) They think they can do a better and more consistent job resizing than the various browsers, which is probably true.
Agree on 3): most browsers just use linear interpolation when resizing images, which makes sense from a performance point of view but looks terrible.
It's better to use a bilinear or bicubic resize: more computation up front, but better images. That's probably the reason they do it.
If you resize the image in steps, with each step at least 50% of the size of the previous one, you can do a pretty decent approximation of a cubic resize using the canvas. We've been doing this for a year now, we've gotten no complaints, and we have designers as clients :)
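Something like this, as a rough sketch of the stepped approach (the function name and exact loop condition are just illustration; the point is never to shrink by more than ~50% per drawImage call):

  function stepDownResize(
    source: HTMLImageElement | HTMLCanvasElement,
    targetWidth: number,
    targetHeight: number
  ): HTMLCanvasElement {
    let current: CanvasImageSource = source;
    let width = source.width;
    let height = source.height;

    // Halve repeatedly while another halving still stays above the target size.
    while (width / 2 >= targetWidth && height / 2 >= targetHeight) {
      width = Math.floor(width / 2);
      height = Math.floor(height / 2);
      const step = document.createElement("canvas");
      step.width = width;
      step.height = height;
      step.getContext("2d")!.drawImage(current, 0, 0, width, height);
      current = step;
    }

    // One final pass to the exact requested dimensions.
    const out = document.createElement("canvas");
    out.width = targetWidth;
    out.height = targetHeight;
    out.getContext("2d")!.drawImage(current, 0, 0, targetWidth, targetHeight);
    return out;
  }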
The point is, their statement does not preclude them from using data for marketing purposes. Some people are content with that, but others take it as a sign that they are using (or will use) the data to build dossiers on users.
Our site would have been happy with full-res images from the start. As it is now, we are stuck with 80x80 images that need replacing with higher-res images, since the originals were not kept in any sorted order.
Long story, but from the start we kept originals organized. Then we restructured, and the new people couldn't care less and threw away originals or left them named 1, 2, 3 and so on. All useless now.
This is for proxying images that users link in chat, not for when users upload images to the service. It doesn't make sense to talk about doing this resize on the client, as the client doesn't have the image.
That is a great point. It could still be feasible to cache the image on the client, and have it do the resize. Although I probably wouldn't accept it as a client, especially if my internet cuts out halfway through.
Yes, the use case of proxying images is a different matter. I was talking about client uploads, since so many companies seem determined to waste my bandwidth and time by uploading without resizing first.
As mentioned in the post, one of our core product features is preventing your IP from being shared. Given that requirement, images shared in chat have to be proxied through our infrastructure. When doing this we save a lot of money and improve client performance by reducing image sizes.
You should seriously consider doing this for your mobile client; the worst thing about Discord is that it eats mobile data if you're uploading lots of images.
Data aside, my phone takes pictures at ridiculously high resolutions. Whenever I go to send pictures with discord, it takes a good 30 seconds and half the time it'll just break.
As an aside, I wish the "share" button would share a lower resolution image instead. I don't mind storing the full quality picture, but handling a 10mb image is seriously silly.
The other side of this is if you're on mobile data and they decide to resize the images on the client then people will complain about the app eating up battery life because it's using so much CPU to resize the images. Plus, they won't have the full size image to share with people you may be chatting with on desktop. If you want to not use so much data while uploading images then you should probably resize them yourself or just not upload images unless you're on wifi.
There's no reason this couldn't be a two-step process, resizing to something reasonable on the client then fine-tuning it on the server. I'm presuming you don't see the need to start with multi-megapixel images.
> I'm presuming you don't see the need to start with multi-megapixel images.
Might be a fair presumption today, but it might not be in the future with hiDPI screens, VR, etc. For the relatively small storage cost, it'd be better to keep the original; then you can work from it programmatically.
Today's hiDPI screens can already display more detail than your eye can perceive. The issue wasn't about storage costs, it was about transmission costs which still matter for the foreseeable future.
Perhaps, but I think you can compress current-day phone images considerably without losing any actual image fidelity (because the sensor pitch significantly exceeds the lens resolution, and .. noise).
"drawImage() will ignore all EXIF metadata in images, including the Orientation. This behavior is espacially troublesome on iOS devices. You should detect the Orientation yourself and use rotate() to make it right.
"
If the origin of the image is the client and you got the client-side resize wrong, then you might introduce artifacts when trying to fix it on the server because of the data loss. Also, if clients are mobile, you might want to optimize for client battery rather than server compute time.
The question now is what drains more power: sending the image as-is, or resizing it before sending. If the user has a good WiFi or 4G connection, sending the file as-is should be quicker and more energy efficient. With a 2G or 3G connection, uploading a photograph can take significantly longer (a minute on poor/average 3G, which means the antenna is working for that whole duration and draws a lot of battery). Converting it should not take more than a second. Furthermore, I would rather use less data at the cost of a tiny amount of battery.
Even several years ago there were libraries on github that accounted for iOS defects. However, that aside, just skip the resize on iOS and send as-is. The server still has to verify the result anyway.
First: Please don't use "Edit" for responding to responses to your comment; it makes following threads much, much harder.
On Topic:
> Some people mention that keeping the original, highest-resolution images is important. I don't think that is true for most applications.
It is true for every application when the next generation of displays hits the market. The question is not the long-term usability of our current low-res images, but just the migration to the next step. The moment Acorn announces their new APhone and has a million handsets sold by tomorrow, you want your service to deliver at least viewable images. It's not always the app that sets the bar; sometimes it is the device.
Edit: As someone who regularly travels to rather remote places on this planet, I'm grateful for every app that does not put the burden on the client. My battery packs only last so long.
As I understand it, they needed to download images from arbitrary external servers given a URL. I am not sure that is possible even with CORS.
Also, as I understood it, they don't save preview images but generate them when needed. So what you are suggesting would require a lot of disk space to keep thumbnails that might never be needed later.
And if you don't have millions of uploads per day then it makes no sense trying to save some seconds of CPU time by unnecessarily complicating the system. In most languages there already are libraries for resizing images.
Many do resize initially, but even then you still need to resize images for different purposes, such as thumbnails. So what you do is resize on the client down to the smallest size you are willing to accept, and then upload that. But you still need to resize server-side for the other needs; you don't want the client doing multiple resizes and uploads for that.
Agreed. I wish the posts contained an "it cost X developer hours (or $$$ total) to recreate thumbor, and we saved Y dollars per month", meaning in approximately 15 years we'll have broken even on this investment. Oh yeah, and we don't even do intelligent resizing like thumbor does.
It looks like thumbor is built on a Python stack similar to what Discord was using in their original service. What makes you say they didn't consider it or benchmark it against their previous service and make the decision that it wasn't as good as they needed?
But it also means you need to know how you will display the image at the time you save it. Layouts change, screens change; how do you anticipate the future dimensions/resolution you will need out of the original?
Isn't it a meme at this point, pushing computational work to the client side? I have a laptop or mobile device; please don't hog my limited CPU and battery life by forcing my device to resize images.
Also, there's page load time. If it's an intensive calculation, then the overhead of sending the results over HTTP is still less than the in-browser computation time.
There is already an (unofficial Google) image proxy written in Go that is quite fast, does caching (local or backed by S3/GCS), and does other nice things like smart cropping: https://github.com/willnorris/imageproxy
Seemed like a lot of unnecessary work for them to reimplement a service from scratch without gaining any major perf benefits over their existing one and without leaning on an existing well-known and well-built foundation.
Author of the blog post here - it looks like what you linked does its image resizing in pure Go. In our testing we found these libraries are significantly slower than the C++ resize libraries. I would guess we would need at least 10x as many instances if we used that resizer, though probably a lot more
The one thing these don't support, though, is smarter cropping that takes image contents into account, which takes enough CPU power to require preprocessing.
I’d be very worried about a security issue with the unsafe C++ code.
You really have to run this kind of complex parsing in a disposable containerized environment to do it safely. Or do everything carefully and in a memory safe language.
I'm not sure why this is being downvoted - image processing is one of the most dangerous parts of a common consumer-facing web software stack. By and large this is because image container formats are poorly documented, overly broad, and rely on a lot of tricky binary parsing that's easy to mess up in an unsafe programming language. It's also one of the most obvious ingress points for untrusted binary data uploaded by an end-user, which is always going to be dangerous.
See the persistent, years-long trend where mobile devices and game consoles get exploited via some combination of libtiff and libpng.
The downvotes are also because it's a somewhat cliche comment on HN now. Anytime anyone is doing anything with C or C++ that is even indirectly web-facing, "this could be unsafe!!!" is an obligatory comment, even though all major tech companies have core components written in C++, and there are big web apps that have been running for years that are mostly written in C or C++. Security is definitely a concern, but these kinds of comments can derail interesting discussion, in the same way complaining about font readability or template choice in an otherwise interesting article can.
To be fair, almost everything under the hood passes through to these libraries. So even sticking with Python means passing unvalidated blobs through to libpng/libjpeg/libtiff or some other low-level code.
That's the entire reason Python is generally fast enough: anything that's slow generally uses a C lib under the hood anyway.
True (and I didn't downvote, by the way), but a "memory safe" language might not be as helpful as people think. Most memory-managed languages still rely on native libraries to perform image processing; if at the end of the day you are using libpng and there is an exploit in it, it doesn't matter whether you are using Python or C++. Both codebases would have the same exploit if it is not explicitly mitigated in the logic.
The downvote is probably because the comment implied that the issue is that the image processing is done in "unsafe" C++ and that another language should have been used.
However, there isn't much choice. Performance is very important in image processing, so much that many libraries contain hand-written assembly. In the article, it says that 90% of processing power is dedicated to it. Using a safer language in a safe way could completely kill performance and significantly increase the costs.
How much does a hack of all your data and/or a major outage cost?
I also recommended a mitigation strategy for unsafe code. Complaining that security is too hard is the reason for the situation we find ourselves in as an industry.
I'd love to be pointed at any resource where somebody who has spent the time walks through the best way to do this safely. Is the only way to do it safely inside a container via some networked connection? Are there other ways to lock down ImageMagick etc such that you can resize safely?
How is the security? Any sort of image processing is a potential exploitation point. I see it says it uses the 'mature' libjpeg-turbo and libpng libraries, along with giflib for .gifs, but even with full trust of those, the C code, patches, and changes on top could be more exploitation points. You can look through ImageMagick alone to see all the fun things possible when seemingly basic processing turns into exploits. https://www.cvedetails.com/vulnerability-list/vendor_id-1749...
ImageMagick is notoriously questionable. It was originally written, I believe, as a local command-line tool for users to work with their own images, so security and untrusted input were not primary concerns.
Additionally, image manipulation is inherently challenging - not even due to the actual manipulation of image pixel data, but due to the proliferation of complex image container formats which require binary data manipulation and byte copying in performance-critical code. This is a minefield for secure programming practices because it puts performance and sanity checking directly at odds, as well as encouraging pointer and memory arithmetic and unsafe access.
Seems to me that there is no limit to the available room. Well, I suppose we're capped by the collective capacity of local storage and storage service providers.
ImageMagick is a particularly poor choice because it will try parsing a thousand formats your users will never upload. That's a lot of code to leave exposed to the internet.
> Today, Media Proxy operates with a median per-image resize of 25ms and a median total response latency of 85ms. It resizes more than 150 million images every day. Media Proxy runs on an autoscaled GCE group of n1-standard-16 host type, peaking at 12 instances on a typical day.
I believe this little piece answers your question:
> We likely could have addressed this behavior in Image Proxy, but we had been experimenting with using more Go, and it seemed like a good place to try Go out.
At the heart of it, they were looking for opportunities to use more Go in their stack, and they deemed this situation a fit.
3. More employees knowledgeable about Go than Python
4. More enthusiasm (and therefore faster velocity) around Go development.
The blog post was about the engineering challenges they faced and how they solved them and I think it was a great write-up in that regard. The post wasn't about why they switched this service from Python to Go.
It might be, then again I see a lot of wheel reinvention in tech / NIH syndrome.
I'm the kind of hacker who, if a service runs out of memory every 2 hours, writes a crontab to restart it every hour, offset by X random minutes so they don't all restart at the same time. It gets a lot of eye rolls from the other engineers searching for perfection, but it tends to produce services quickly that are highly reliable.
And look, now the engineers who like Chaos Monkey don't even have to set that up; it's built in.
Part of it is just Discord’s operating scale. They are already leveraging Elixir clustering to an extremely high rate of concurrency and when you start thinking about problems from that standpoint Go becomes a much more natural fit within the stack for low level micro services.
I agree that tech in general and Silicon Valley in particular has a lot of NIH, but I also think this isn't really the case here. In particular, we're discussing a Python service that performs slow image resize calls. They would have (probably, speculation on my part/experience) had to do 2 things:
1. Add profiling and telemetry to their Python code. Refactor the codebase based on insights from this.
2. Write a C<->Python interop for their image libraries.
I can't see the cost of #2 being any different than the cost they paid on writing it in Go. As for #1, depending on how the code is structured, a rewrite may have been less time than profiling spaghetti code. At that point, it depends on how much Go experience the team has.
Yeah, either a good Python JIT or Cython would have been fine honestly. I never understood the obsession with "python is slow" when you can recover almost all of the performance with a good JIT or Cython (in many/most cases).
Yes. Or simply profiling the app and optimizing sore spots would have helped too. It seems to me there was no real reason to move from Python to Go, apart from preference.
I don't think the article gives us the data to know this. Where did the latency spikes in the original implementation come from? Would fixing them have required a complete rewrite of the Python parts anyways?
I understand this is a personal preference, but having spent a good amount time with both Python and Go, FWIW I would also choose Go if I were solving the same problem.
From reading this, it seems HTTP handling speed was important to them, which Go is probably better for. Also, interfacing Python to C/C++ is pretty unpleasant.
vips (the Go binding) is included in the benchmarks mentioned in the post, but at the time of running them (~10 months ago) vips pulled 51482954 ns/op on a 1024x1024 test image, whereas pillow-simd managed 3324135.3035 ns/op (roughly 51 ms vs 3.3 ms per resize, about a 15x difference).
Nice, but why? https://cloudinary.com, https://www.imgix.com, or https://www.filestack.com already exist and are well worth it for 99% of apps. Even at scale, it really doesn't cost that much to have someone else do it. You can use a thin proxy through your existing CDN if you want to save on their bandwidth fees.
Also http://thumbor.org and https://imageresizing.net if you want a library to host yourself which are already very fast and well tested. Put them in a docker container on a kubernetes cluster and it's all done in an hour.
I agree. Offloading this type of work to a third party who does it really well is a smart move. Why manage additional code when it's not even core to what you do?
In this case, it was perhaps cheaper for them to do in-house, and it's not rocket science? They wrote a bleeding edge library for it - sounds like they have the expertise just fine. Minimizing external dependencies can be a big deal if you have the developers to manage it.
Also, it is totally core to what they do. Images are a huge part of the Discord UX.
At 150m images per day, not counting bandwidth, imgix would cost ~135k/month. Running 12 n1-standard-16 instances (peak load according to the article) is ~$5k/month. It's not hard to see why we wrote it in house when you consider that cost.
Ok, so why a new library and associated dev time when thumbor and other libraries already exist, especially if you're willing to spend 5k/month on instances just for this?
That was pretty clear in the post - they didn't find a Golang lib that could compete with their pillow-simd on resizing, which was the main performance bottleneck.
Why was a Go version needed if performance was paramount? There are libraries already that can handle this performance just fine.
If they're going to spend 60k/year on instances, the dev time definitely wasn't worth it for this. They just wanted to use that language because this is a NIH situation, not really an engineering priority.
I'm not saying that images are not core to what they do (I use Discord a lot) but processing them is almost certainly not. Dev time is expensive enough already so spending time building and maintaining a library could end up being a waste.
This post reminded me of a very old article from Yahoo/Tumblr explaining how they were (ab)using Ceph to generate thumbnails on the fly as pictures were uploaded using the Ceph OSD plugin interface.
Unfortunately the post seems to have disappeared from the internet (it was probably around 6 years ago), so here are some other teasers:
I have built an image resizing service around this with Go and libvips. With the Go libvips binding and s3gof3r, you can load S3 images directly into a buffer, pass them to libvips, and serve the result without writing to disk. Basically, you can use edge functions with the above Go service as your origin.
How much would you pay for an image resizing service? I'd been thinking for a while of putting a fleet of autoscaled thumbor boxes behind cloudfront and making a billing API for it.
Imgix's $10 minimum is so much for a personal site with maybe 500 uniques a month. If you're going for a service like that, think of people like me who host on s3/cloudfront for $.20/month. But let people scale up to millions of pageviews a month.
Don't need anything fancy. Just w=? and h=? would be great; developers can handle the DPI stuff with srcset.
PCI Express is ~100 Gbit/s, much faster than any network interface. Internally, a GPU can resize these images an order of magnitude faster than that; see the fill-rate columns in the GPU specs.
This isn't just resampling an image: it means decoding a variety of image (and even video) formats, decompressing the selected frame, performing the actual resize, and then compressing the result. If the resample doesn't save more than the setup overhead, it'd be an immediate loss. Even if it does, there's an engineering cost, since you now need to make sure that all of your servers have GPUs available, your chosen implementation supports all of them with acceptable quality and error handling, etc.
Since the GPU hardware has become commonplace, there's definitely a lot more attention on using it in the server space and I think it'll become common in the next few years but that has a migration cost for early adopters since you're hitting less mature projects for critical functions. Internet-facing image processing has a bunch of tedious but important work handling format variations and errors (it'll be reported as a bug in your software if the image opens in a browser and/or photoshop), making sure that you handle gamma/colorspace consistently, etc.
If you're trying to get a production-ready server out the door, it's really tempting not to deal with any of that once you hit the point where it's fast enough that engineering time costs more than the server savings.
> you now need to make sure that all of your servers have GPUs available
OP is running on Google's cloud: "n1-standard-16 host type, peaking at 12 instances on a typical day." That instance costs $0.76/hour. Adding an NVIDIA Tesla K80 is $0.70/hour extra.
> it's really tempting not to deal with any of that
Yeah, that's understandable. But the original article dealt with a lot of strange technologies to get the performance they wanted, and they ended up with something much slower, performance-wise, than what's possible with a GPU.
Agreed - but for how many different formats, and how well do those implementations support all of the various format options for things like bit depth or palettes, compression variants, etc.? That's not just things like compliance testing – itself a big problem – but also handling all of the slightly non-compliant data in the wild which users will inevitably expect to work.
(I'm somewhat biased having spent time dealing with JPEG 2000 imagery where various lapses on the standards side meant that it's still common to find images which don't display correctly in one or more implementations but are silently reported as correct in others)
Again, I'm not arguing that doing this on a GPU isn't a good idea — the hardware has become common enough that it's reasonable to assume availability for anyone who cares — but just that there's significant overhead cost for anyone who needs to handle images from unconstrained sources. It'll happen but this kind of thing always takes longer than it seems like it should.
We did consider doing GPU, but it seems like you have fewer options there. We were really picky about the resize kernel used and it seems like with GPU you may not always get the same kernels available. Also presumably that only handles resizing, not compressing/decompressing, which make up a pretty sizeable portion of the workload.
> with GPU you may not always get the same kernels available
No kernels are available _out of the box_. You code a pixel shader, implement any kernel, or any other resizing method besides kernels: https://stackoverflow.com/a/42179924/126995
> that only handles resizing, not compressing/decompressing
In my previous comment there’s a link to a commercially available JPEG codec, 100% compliant with JPEG Baseline Standard, that does both compression and decompression.
Yikes. If we had had to write our own image resizing kernel, this would have taken much longer. And ok, it can do JPEG but what about PNG, GIF, and WEBP?
As far as I understand, your goal was to cut server costs, right?
I assume the majority of pictures on the Internet are JPEGs. If you process those on the GPU, that leaves the 16 virtual CPUs you've already paid for sitting idle, waiting for the GPU to finish the job. There's no need to do everything on the GPU.
Sorry to be confusing, I am not resizing images, just working with data sets as large as what I imagine 150M images would be. The software I am working on takes point-in-time backups of computers and uploads them to "the cloud" (I mean servers in a data center). There they can be virtualized with the click of a button, en masse or one at a time, and near instantly.
This involves transferring, encrypting, compressing, and checksumming terabytes of data an hour (per node). While not exactly resizing images, I would imagine the computational load is on par with the service described. The entire system has about 4 PB or 8 PB in it right now, as backups are pruned (based on what people will pay for storage).
My software has a ton of room to grow and become better, but I think a better story would have been how Discord handles 150M images an hour. If anything, the bandwidth for acquiring the source image would be what I would consider the largest problem, not the CPU time to resize. In fact, as long as your resize code is slightly faster than the download, streaming it in and out would put your bottleneck entirely on bandwidth.
I will also note I am not a fan of libraries :p but that is not what this is about.
EDIT:
Also kudos to you, somebody criticized your post and you had the best response one could have. Inquiring minds are awesome.
Assuming the average image size is 3 MB which seems conservative, especially if they're handling GIFs as well, this is 450 TB per day. If you're handling that much data on one beefy machine then kudos.
People have just drunk so much "cheap commodity hardware" Kool-Aid by now that they don't realize there are cheaper and easier ways of doing things, assuming you have devs who can code and tune for performance. Same with "big data": most people have sub-1 TB datasets. You simply don't need Spark or anything custom for that.