Why don't more companies resize images client-side first using <canvas> and then save the server some work by only asking it to verify the result by:
- resizing to the same size
- removing metadata
This results in much faster transfers (often 10x less bandwidth for mobile uploads) and reduces server load by "farming out" the work to the clients.
https://developer.mozilla.org/en-US/docs/Web/API/CanvasRende...
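Roughly the flow I have in mind, as a sketch (the function name, the 2048px max edge, and the JPEG quality are arbitrary choices for illustration; the server still re-verifies and re-encodes whatever it receives):

  async function resizeBeforeUpload(file: File, maxEdge = 2048): Promise<Blob> {
    // Decode the uploaded file into a bitmap.
    const bitmap = await createImageBitmap(file);
    const scale = Math.min(1, maxEdge / Math.max(bitmap.width, bitmap.height));

    // Draw the scaled-down image onto a canvas.
    const canvas = document.createElement("canvas");
    canvas.width = Math.round(bitmap.width * scale);
    canvas.height = Math.round(bitmap.height * scale);
    canvas.getContext("2d")!.drawImage(bitmap, 0, 0, canvas.width, canvas.height);

    // Re-encoding via toBlob drops EXIF and other metadata as a side effect.
    return new Promise((resolve, reject) =>
      canvas.toBlob(
        (blob) => (blob ? resolve(blob) : reject(new Error("encode failed"))),
        "image/jpeg",
        0.85
      )
    );
  }

One caveat: as noted further down in the thread, drawImage() ignores EXIF orientation, so you would need to handle rotation yourself before uploading.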
Edit (on keeping full-resolution images): Some people mention that keeping the original, highest-resolution images is important. I don't think that is true for most applications.
Most apps don't need high-resolution history as much as current, live engagement, so older photos being smaller isn't a big deal. As technology moves on, you simply start allowing higher-res uploads. YouTube, Facebook, and others have done this fine as the older stuff is replaced with the new/current/now() content.
In fact, even our highest resolution images are still low-quality for the future. Pick a good max size for your site (4k?) and resize everything down to that. In a year, bump it up to 6k, then 10k, etc...
Keeping costs low has its benefits, especially for us startups. Now if you have massive collateral, then knock yourself out.
1) Although the site serves up images at 1024 pixels (or whatever) today, in the future they may want larger images. When everyone is rocking 10K monitors and 6K phone displays, those small images are going to look pretty bad.
2) The original image has some metadata that they want to keep (geolocation, etc).
3) They think they can do a better and more consistent job resizing than the various browsers, which is probably true.
Agree on 3): most browsers just use linear interpolation when resizing images, which makes sense from a performance point of view but looks terrible.
It's better to use a bilinear or bicubic resize: more computation up front, but better images. That's probably the reason they do it.
If you resize the image in steps, with each step at least 50% of the size of the previous one, you can do a pretty decent approximation of a cubic resize using the canvas. We've been doing this for a year now, we've gotten no complaints, and we have designers as clients :)
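Something like this, as a rough sketch of the stepped approach (the function name and exact loop condition are just illustration; the point is never to shrink by more than ~50% per drawImage call):

  function stepDownResize(
    source: HTMLImageElement | HTMLCanvasElement,
    targetWidth: number,
    targetHeight: number
  ): HTMLCanvasElement {
    let current: CanvasImageSource = source;
    let width = source.width;
    let height = source.height;

    // Halve repeatedly while another halving still stays above the target size.
    while (width / 2 >= targetWidth && height / 2 >= targetHeight) {
      width = Math.floor(width / 2);
      height = Math.floor(height / 2);
      const step = document.createElement("canvas");
      step.width = width;
      step.height = height;
      step.getContext("2d")!.drawImage(current, 0, 0, width, height);
      current = step;
    }

    // One final pass to the exact requested dimensions.
    const out = document.createElement("canvas");
    out.width = targetWidth;
    out.height = targetHeight;
    out.getContext("2d")!.drawImage(current, 0, 0, targetWidth, targetHeight);
    return out;
  }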
The point is, their statement does not preclude them from using data for marketing purposes. Some people are content with that, but others take it as a sign that they are using (or will use) the data to build dossiers on users.
Our site would have been happy with full-res images from the start. As it is now, we are stuck with 80x80 images that need replacing with higher-res images, since the originals were not kept in any sorted order.
Long story, but from the start we kept originals organized. Then we restructured, and the new people couldn't care less and threw away originals or left them named 1, 2, 3 and so on. All useless now.
This is for proxying images that users link in chat, not for when users upload images to the service. It doesn't make sense to talk about doing this resize on the client, as the client doesn't have the image.
That is a great point. It could still be feasible to cache the image on the client, and have it do the resize. Although I probably wouldn't accept it as a client, especially if my internet cuts out halfway through.
Yes, the use case of proxying images is a different matter. I was talking about client uploads, since so many companies seem determined to waste my bandwidth and time by uploading without resizing first.
As mentioned in the post, one of our core product features is preventing your IP from being shared. Given that requirement, images shared in chat have to be proxied through our infrastructure. When doing this we save a lot of money and improve client performance by reducing image sizes.
You should seriously consider doing this for your mobile client; the worst thing about Discord is that it eats mobile data if you're uploading lots of images.
Data aside, my phone takes pictures at ridiculously high resolutions. Whenever I go to send pictures with discord, it takes a good 30 seconds and half the time it'll just break.
As an aside, I wish the "share" button would share a lower resolution image instead. I don't mind storing the full quality picture, but handling a 10mb image is seriously silly.
The other side of this is if you're on mobile data and they decide to resize the images on the client then people will complain about the app eating up battery life because it's using so much CPU to resize the images. Plus, they won't have the full size image to share with people you may be chatting with on desktop. If you want to not use so much data while uploading images then you should probably resize them yourself or just not upload images unless you're on wifi.
There's no reason this couldn't be a two-step process, resizing to something reasonable on the client then fine-tuning it on the server. I'm presuming you don't see the need to start with multi-megapixel images.
> I'm presuming you don't see the need to start with multi-megapixel images.
Might be a fair presumption today, but it might not be in the future with hiDPI screens, VR, etc. For the relatively small storage cost, it'd be better to keep the original; then you can work from it programmatically.
Today's hiDPI screens can already display more detail than your eye can perceive. The issue wasn't about storage costs, it was about transmission costs which still matter for the foreseeable future.
Perhaps, but I think you can compress current-day phone images considerably without losing any actual image fidelity (because the sensor pitch significantly exceeds the lens resolution, and .. noise).
"drawImage() will ignore all EXIF metadata in images, including the Orientation. This behavior is espacially troublesome on iOS devices. You should detect the Orientation yourself and use rotate() to make it right.
"
If the origin of the image is the client and you got the client-side resize wrong, then you might introduce artifacts when trying to fix it on the server because of the data loss. Also, if clients are mobile, you might want to optimize for client battery rather than server compute time.
The question now is what drains more power: sending the image as-is, or resizing it before sending. If the user has a good WiFi or 4G connection, sending the file as-is should be quicker and more energy efficient. With a 2G or 3G connection, uploading a photograph can take significantly longer (a minute on poor/average 3G, which means the antenna is working for that whole duration and draws a lot of battery). Converting it should not take more than a second. Furthermore, I would rather use less data at the cost of a tiny amount of battery.
Even several years ago there were libraries on github that accounted for iOS defects. However, that aside, just skip the resize on iOS and send as-is. The server still has to verify the result anyway.
First: Please don't use "Edit" for responding to responses to your comment; it makes following threads much, much harder.
On Topic:
> Some people mention that keeping the original, highest-resolution images is important. I don't think that is true for most applications.
It is true for every application when the next generation of displays hits the market. The question is not the long-term usability of our current low-res images, but just the migration to the next step. The moment Acorn announces their new APhone and has a million handsets sold by tomorrow, you want your service to deliver at least viewable images. It's not always the app that sets the bar; sometimes it is the device.
Edit: As someone who regularly travels to rather remote places on this planet, I'm grateful for every app that does not put the burden on the client. My battery packs only last so long.
As I understand it, they needed to download images from arbitrary external servers given a URL. I am not sure that is possible even with CORS.
Also, as I understood it, they don't save preview images but generate them when needed. So what you are suggesting would require a lot of disk space to keep thumbnails that might never be needed later.
And if you don't have millions of uploads per day then it makes no sense trying to save some seconds of CPU time by unnecessarily complicating the system. In most languages there already are libraries for resizing images.
Many do resize initially, but even then you still need to resize images for different purposes, such as thumbnails. So what you do is resize on the client down to the smallest size you are willing to accept, and then upload that. But you still need to resize server-side for the other needs; you don't want the client doing multiple resizes and uploads for that.
Agreed. I wish the posts contained an "it cost X developer hours (or $$$ total) to recreate thumbor, and we saved Y dollars per month", meaning in approximately 15 years we'll have broken even on this investment. Oh yeah, and we don't even do intelligent resizing like thumbor does.
It looks like thumbor is built on a Python stack similar to what Discord was using in their original service. What makes you say they didn't consider it or benchmark it against their previous service and make the decision that it wasn't as good as they needed?
But it also means you need to know how you will display the image at the time you save it. Layouts change, screens change; how do you anticipate the future dimensions/resolution you will need out of the original?
Isn't it a meme at this point, pushing computational work to the client side? I have a laptop or mobile device; please don't hog my limited CPU and battery life by forcing my device to resize images.
Also, there's page load time. If it's an intensive calculation, then the overhead of sending the results over HTTP is still less than the in-browser computation time.
There is already an (unofficial Google) image proxy written in Go that is quite fast, does caching (local or backed by S3/GCS), and does other nice things like smart cropping: https://github.com/willnorris/imageproxy
Seemed like a lot of unnecessary work for them to reimplement a service from scratch without gaining any major perf benefits over their existing one and without leaning on an existing well-known and well-built foundation.
Author of the blog post here - it looks like what you linked does its image resizing in pure Go. In our testing we found these libraries are significantly slower than the C++ resize libraries. I would guess we would need at least 10x as many instances if we used that resizer, though probably a lot more
The one thing these don't support, though, is smarter cropping that takes image contents into account, which takes enough CPU power to require preprocessing.
I’d be very worried about a security issue with the unsafe C++ code.
You really have to run this kind of complex parsing in a disposable containerized environment to do it safely. Or do everything carefully and in a memory safe language.
I'm not sure why this is being downvoted - image processing is one of the most dangerous parts of a common consumer-facing web software stack. By and large this is because image container formats are poorly documented, overly broad, and rely on a lot of tricky binary parsing that's easy to mess up in an unsafe programming language. It's also one of the most obvious ingress points for untrusted binary data uploaded by an end-user, which is always going to be dangerous.
See the persistent, years-long trend where mobile devices and game consoles get exploited via some combination of libtiff and libpng.
The downvotes are also because it's a somewhat cliche comment on HN now. Anytime anyone is doing anything with C or C++ that is even indirectly web-facing, "this could be unsafe!!!" is an obligatory comment, even though all major tech companies have core components written in C++, and there are big web apps that have been running for years that are mostly written in C or C++. Security is definitely a concern, but these kinds of comments can derail interesting discussion, in the same way complaining about font readability or template choice in an otherwise interesting article can.
To be fair, almost everything under the hood passes through to these libraries. So even sticking with Python means passing unvalidated blobs through to libpng/libjpeg/libtiff or some other low-level code.
That's the entire reason Python is generally fast enough: anything that's slow generally uses a C lib under the hood anyway.
True (and I didn't downvote, by the way), but a "memory safe" language might not be as helpful as people think. Most memory-managed languages still rely on native libraries to perform image processing; if at the end of the day you are using libpng and there is an exploit in it, it doesn't matter whether you are using Python or C++. Both codebases would have the same exploit if it is not explicitly mitigated in the logic.
The downvote is probably because the comment implied that the issue is that the image processing is done in "unsafe" C++ and that another language should have been used.
However, there isn't much choice. Performance is very important in image processing, so much that many libraries contain hand-written assembly. In the article, it says that 90% of processing power is dedicated to it. Using a safer language in a safe way could completely kill performance and significantly increase the costs.
How much does a hack of all your data and/or a major outage cost?
I also recommended a mitigation strategy for unsafe code. Complaining that security is too hard is the reason for the situation we find ourselves in as an industry.
I'd love to be pointed at any resource where somebody who has spent the time walks through the best way to do this safely. Is the only way to do it safely inside a container via some networked connection? Are there other ways to lock down ImageMagick etc such that you can resize safely?
How is the security? Any sort of image processing is a potential exploitation point. I see it says it uses the 'mature' libjpeg-turbo and libpng libraries, along with giflib for .gifs, but even with full trust of those, the C code, patches, and changes on top could be more exploitation points. You can look through ImageMagick alone to see all the fun things possible when seemingly basic processing turns into exploits. https://www.cvedetails.com/vulnerability-list/vendor_id-1749...
ImageMagick is notoriously questionable. It was originally written, I believe, as a local command-line tool for users to work with their own images, so security and untrusted input were not primary concerns.
Additionally, image manipulation is inherently challenging - not even due to the actual manipulation of image pixel data, but due to the proliferation of complex image container formats which require binary data manipulation and byte copying in performance-critical code. This is a minefield for secure programming practices because it puts performance and sanity checking directly at odds, as well as encouraging pointer and memory arithmetic and unsafe access.
Seems to me that there is no limit to the available room. Well, I suppose we're capped by the collective capacity of local storage and storage service providers.
ImageMagick is a particularly poor choice because it will try parsing a thousand formats your users will never upload. That's a lot of code to leave exposed to the internet.
> Today, Media Proxy operates with a median per-image resize of 25ms and a median total response latency of 85ms. It resizes more than 150 million images every day. Media Proxy runs on an autoscaled GCE group of n1-standard-16 host type, peaking at 12 instances on a typical day.
I believe this little piece answers your question:
> We likely could have addressed this behavior in Image Proxy, but we had been experimenting with using more Go, and it seemed like a good place to try Go out.
At the heart of it, they were looking for opportunities to use more Go in their stack, and they deemed this situation a fit.
3. More employees knowledgeable about Go than Python
4. More enthusiasm (and therefore faster velocity) around Go development.
The blog post was about the engineering challenges they faced and how they solved them and I think it was a great write-up in that regard. The post wasn't about why they switched this service from Python to Go.
It might be, then again I see a lot of wheel reinvention in tech / NIH syndrome.
I'm the kind of hacker who, if a service runs out of memory every 2 hours, writes a crontab to restart it every hour, offset by X random minutes so they don't all restart at the same time. It gets a lot of eye rolls from the other engineers searching for perfection, but it tends to produce services quickly that are highly reliable.
And look, now the engineers who like Chaos Monkey don't even have to set that up; it's built in.
Part of it is just Discord’s operating scale. They are already leveraging Elixir clustering to an extremely high rate of concurrency and when you start thinking about problems from that standpoint Go becomes a much more natural fit within the stack for low level micro services.
I agree that tech in general and Silicon Valley in particular has a lot of NIH, but I also think this isn't really the case here. In particular, we're discussing a Python service that performs slow image resize calls. They would have (probably, speculation on my part/experience) had to do 2 things:
1. Add profiling and telemetry to their Python code. Refactor the codebase based on insights from this.
2. Write a C<->Python interop for their image libraries.
I can't see the cost of #2 being any different than the cost they paid on writing it in Go. As for #1, depending on how the code is structured, a rewrite may have been less time than profiling spaghetti code. At that point, it depends on how much Go experience the team has.
Yeah, either a good Python JIT or Cython would have been fine honestly. I never understood the obsession with "python is slow" when you can recover almost all of the performance with a good JIT or Cython (in many/most cases).
Yes. Or simply profiling the app and optimizing sore spots would have helped too. It seems to me there was no real reason to move from Python to Go, apart from preference.
I don't think the article gives us the data to know this. Where did the latency spikes in the original implementation come from? Would fixing them have required a complete rewrite of the Python parts anyways?
I understand this is a personal preference, but having spent a good amount time with both Python and Go, FWIW I would also choose Go if I were solving the same problem.
From reading this, it seems HTTP handling speed was important to them, which Go is probably better for. Also, interfacing Python to C/C++ is pretty unpleasant.
vips (the Go binding) is included in the benchmarks mentioned in the post, but at the time of running them (~10 months ago) vips pulled 51482954 ns/op on a 1024x1024 test image, whereas pillow-simd managed 3324135.3035 ns/op (roughly 51 ms vs 3.3 ms per resize, about a 15x difference).
Nice, but why? https://cloudinary.com, https://www.imgix.com, or https://www.filestack.com already exist and are well worth it for 99% of apps. Even at scale, it really doesn't cost that much to have someone else do it. You can use a thin proxy through your existing CDN if you want to save on their bandwidth fees.
Also http://thumbor.org and https://imageresizing.net if you want a library to host yourself which are already very fast and well tested. Put them in a docker container on a kubernetes cluster and it's all done in an hour.
I agree. Offloading this type of work to a third party who does it really well is a smart move. Why manage additional code when it's not even core to what you do?
In this case, it was perhaps cheaper for them to do in-house, and it's not rocket science? They wrote a bleeding edge library for it - sounds like they have the expertise just fine. Minimizing external dependencies can be a big deal if you have the developers to manage it.
Also, it is totally core to what they do. Images are a huge part of the Discord UX.
At 150m images per day, not counting bandwidth, imgix would cost ~135k/month. Running 12 n1-standard-16 instances (peak load according to the article) is ~$5k/month. It's not hard to see why we wrote it in house when you consider that cost.
Ok, so why a new library and associated dev time when thumbor and other libraries already exist, especially if you're willing to spend 5k/month on instances just for this?
That was pretty clear in the post - they didn't find a Golang lib that could compete with their pillow-simd on resizing, which was the main performance bottleneck.
Why was a Go version needed if performance was paramount? There are libraries already that can handle this performance just fine.
If they're going to spend 60k/year on instances, the dev time definitely wasn't worth it for this. They just wanted to use that language because this is a NIH situation, not really an engineering priority.
I'm not saying that images are not core to what they do (I use Discord a lot) but processing them is almost certainly not. Dev time is expensive enough already so spending time building and maintaining a library could end up being a waste.
This post reminded me of a very old article from Yahoo/Tumblr explaining how they were (ab)using Ceph to generate thumbnails on the fly as pictures were uploaded using the Ceph OSD plugin interface.
Unfortunately the post seems to have disappeared from the internet (it was probably around 6 years ago), so here are some other teasers:
I have built an image resizing service around this with Go and libvips. With the Go libvips binding and s3gof3r, you can load S3 images directly into a buffer, pass them to libvips, and serve the result without writing to disk. Basically, you can use edge functions with the above Go service as your origin.
How much would you pay for an image resizing service? I'd been thinking for a while of putting a fleet of autoscaled thumbor boxes behind cloudfront and making a billing API for it.
Imgix's $10 minimum is so much for a personal site with maybe 500 uniques a month. If you're going for a service like that, think of people like me who host on s3/cloudfront for $.20/month. But let people scale up to millions of pageviews a month.
Don't need anything fancy. Just w=? and h=? would be great; developers can handle the DPI stuff with srcset.
PCI Express is ~100 Gbit/s, much faster than any network interface. Internally, a GPU can resize these images an order of magnitude faster than that; see the fill-rate columns in the GPU specs.
This isn't just resampling an image: it means decoding a variety of image (and even video) formats, decompressing the selected frame, performing the actual resize, and then compressing the result. If the resample doesn't save more than the setup overhead, it'd be an immediate loss. Even if it does, there's an engineering cost, since you now need to make sure that all of your servers have GPUs available, your chosen implementation supports all of them with acceptable quality and error handling, etc.
Since the GPU hardware has become commonplace, there's definitely a lot more attention on using it in the server space and I think it'll become common in the next few years but that has a migration cost for early adopters since you're hitting less mature projects for critical functions. Internet-facing image processing has a bunch of tedious but important work handling format variations and errors (it'll be reported as a bug in your software if the image opens in a browser and/or photoshop), making sure that you handle gamma/colorspace consistently, etc.
If you're trying to get a production-ready server out the door, it's really tempting not to deal with any of that once you hit the point where it's fast enough that engineering time costs more than the server savings.
> you now need to make sure that all of your servers have GPUs available
OP is running on Google's cloud: "n1-standard-16 host type, peaking at 12 instances on a typical day." That instance costs $0.76/hour. Adding an NVIDIA Tesla K80 is $0.70/hour extra.
> it's really tempting not to deal with any of that
Yeah, that's understandable. But the original article dealt with a lot of strange technologies to get the performance they wanted, and they ended up with something much slower, performance-wise, than what's possible with a GPU.
Agreed - but for how many different formats, and how well do those implementations support all of the various format options for things like bit depth or palettes, compression variants, etc.? That's not just things like compliance testing – itself a big problem – but also handling all of the slightly non-compliant data in the wild which users will inevitably expect to work.
(I'm somewhat biased having spent time dealing with JPEG 2000 imagery where various lapses on the standards side meant that it's still common to find images which don't display correctly in one or more implementations but are silently reported as correct in others)
Again, I'm not arguing that doing this on a GPU isn't a good idea — the hardware has become common enough that it's reasonable to assume availability for anyone who cares — but just that there's significant overhead cost for anyone who needs to handle images from unconstrained sources. It'll happen but this kind of thing always takes longer than it seems like it should.
We did consider doing GPU, but it seems like you have fewer options there. We were really picky about the resize kernel used and it seems like with GPU you may not always get the same kernels available. Also presumably that only handles resizing, not compressing/decompressing, which make up a pretty sizeable portion of the workload.
> with GPU you may not always get the same kernels available
No kernels are available _out of the box_. You code a pixel shader, implement any kernel, or any other resizing method besides kernels: https://stackoverflow.com/a/42179924/126995
> that only handles resizing, not compressing/decompressing
In my previous comment there’s a link to a commercially available JPEG codec, 100% compliant with JPEG Baseline Standard, that does both compression and decompression.
Yikes. If we had had to write our own image resizing kernel, this would have taken much longer. And ok, it can do JPEG but what about PNG, GIF, and WEBP?
As far as I understand, your goal was to cut server costs, right?
I assume the majority of pictures on the Internet are JPEGs. If you process those on the GPU, that leaves the 16 virtual CPUs you've already paid for sitting idle, waiting for the GPU to finish the job. There's no need to do everything on the GPU.
Sorry to be confusing, I am not resizing images, just working with data sets as large as what I imagine 150M images would be. The software I am working on takes point-in-time backups of computers and uploads them to "the cloud" (I mean servers in a data center). There they can be virtualized with the click of a button, en masse or one at a time, and near instantly.
This involves transferring, encrypting, compressing, and checksumming terabytes of data an hour (per node). While not exactly resizing images, I would imagine the computational load is on par with the service described. The entire system has about 4 PB or 8 PB in it right now, as backups are pruned (based on what people will pay for storage).
My software has a ton of room to grow and become better, but I think a better story would have been how Discord handles 150M images an hour. If anything, the bandwidth for acquiring the source image would be what I would consider the largest problem, not the CPU time to resize. In fact, as long as your resize code is slightly faster than the download, streaming it in and out would put your bottleneck entirely on bandwidth.
I will also note I am not a fan of libraries :p but that is not what this is about.
EDIT:
Also kudos to you, somebody criticized your post and you had the best response one could have. Inquiring minds are awesome.
Assuming the average image size is 3 MB which seems conservative, especially if they're handling GIFs as well, this is 450 TB per day. If you're handling that much data on one beefy machine then kudos.
People have just drunk so much "cheap commodity hardware" Kool-Aid by now that they don't realize there are cheaper and easier ways of doing things, assuming you have devs who can code and tune for performance. Same with "big data": most people have sub-1 TB datasets. You simply don't need Spark or anything custom for that.