How We Built r/Place

ChicagoBoy11 · on April 13, 2017

I love write-ups like this because they are such a nice contrast to the too-common comments on Reddit and HN where people claim that they could rebuild FB or Uber as a side project. Something as superficially trivial as the r/Place requires a tremendous amount of effort to run smoothly and there are countless gotchas and issues that you'd never even begin to consider unless you really tried to implement it yourself. Big thanks to Reddit for the fun experiment and for sharing with us some of the engineering lessons behind it.

wishinghand · on April 14, 2017

> the too-common comments on Reddit and HN where people claim that they could rebuild FB or Uber as a side project.

I'm pretty active on both. What threads/subreddits are you reading?

easytiger · on April 14, 2017

you should try r/unitedkingdom where all posters are pretty sure they could flawlessly redesign modern civilization

nathancahill · on April 14, 2017

http://www.theonion.com/article/there-are-no-good-options-sy...

travmatt · on April 14, 2017

To be fair that's kinda the history of the United Kingdom.

angusp · on April 14, 2017

You ain't seen nothing yet

- /r/Libertarian

Ajedi32 · on April 14, 2017

Here's one: https://www.reddit.com/r/programming/comments/656z8t/how_we_...

Pretty funny considering the content of the post he wrote that on.

josephg · on April 14, 2017

> the too-common comments on Reddit and HN where people claim that they could rebuild FB or Uber as a side project.

I do and don't agree with you. Whats really going on here is that development time scales linearly with the number of decisions you need to make. Decisions can take the form of a product questions - "what are we making?" and development questions - "how should we implement that?".

There are three reasons why people feel confident saying "I could copy that in a weekend":

- When looking at an existing thing (like r/place), most of the product decisions have already been made. You don't need to re-make them.

- If you have a lot of development experience in a given domain, you don't need to make as many decisions to come up with a good structure - your intuition (tuned by experience) will do the hard work for you.

- For most products, most of the actual work is in little features and polish thats 'below the water', and not at all obvious to an outsider. Check out this 'should we send a notification in slack' flowchart: https://twitter.com/mathowie/status/837807513982525440?lang=... . I'm sure Uber and Facebook have hundreds of little features like that. When people say "I could copy that in a weekend", they're usually only thinking about the core functionality. Not all the polish and testing you'd actually need to launch a real product.

With all that said, I bet I could implement r/place in less than a weekend with the requirements stated at the top of that blog post, so long as I don't need to think about mobile support and notifications. Thats possible not because I'm special, and not because I'm full of it but because most of the hard decisions have already been made. The product decisions are specified in the post (image size, performance requirements) and for the technical decisions I can rely on experience. (I'd do it on top of kafka. Use CQRS, then have caching nodes I could scale out and strong versions using the event stream numbers. Tie the versions into a rendered image URL and use nginx to cache ... Etc.)

But I couldn't implement r/place that quickly if reddit didn't already do all the work deciding on the scope of the problem, and what the end result should look like.

In a sense, I think programming is slow and unpredictable because of the learning we need to constantly do. Programming without needing to learn is just typing; and typing is fast. I rule out doing the mobile notifications not because I think it would be hard, but because I haven't done it before. So I'd need to learn the right way to implement it. - And hence, thats where the scope might blowout for me. Thats where I would have to make decisions, and deciding correctly takes time.

ChicagoBoy11 · on April 14, 2017

First off, it's a bit unfair to me that you were the one to make this comment, since your very extensive experience and deep knowledge of real-time systems makes you uniquely qualified to disprove my point!! hehe

But in all seriousness, I agree with most of what you said - I think I'm just more bearish on people's ability to infer those many decision points without being given the blueprint like we were in this article.

If you are a senior engineer at FB and you decide to make Twitter in your spare time, I buy that lots of experience and knowledge gained at your job can probably get you going fairly quickly. But I have never seen an example of engineers discussing sophisticated systems like these where crucial aspects of its success in terms of implementation didn't rely on some very specific knowledge/study of the particular problem being solved that could only be gleaned after trying things out or studying it very carefully. The representation of the pixels is a great example in this case -- they go into wonderful detail about why they decided to represent it the way they did, which in turn informs and impacts how the rest of the stack looks like.

I think at one point Firebase had it as one of their example apps something which very closely mirrored what they did with r/Place, so I agree that one could probably build some "roughly" like it somewhat quickly. I agree that in general knowledgeable individuals could probably grok things which would in some form resemble popular services we know today. The devil is in the "roughly," though. I think that often what makes them be THE giant services we know are things that let them have the scale which very few of us have ever needed or know how to deal with, or because they have combined tons of "below the water" features and polish like you mentioned. When really basically most web apps that we use are CRUD apps which we all know how to build, I think maybe we need to give more weight to these "below the water" features in terms of how much they actually contribute to the success of the applications we use.

tripzilch · on April 16, 2017

> The representation of the pixels is a great example in this case

That's funny because I thought (assuming you're referring to the bit-packing of the pixels) that seemed to me one of the more obvious choices (to do something else would have been more remarkable to me).

Beyond that though, I have zero experience with real-time networked systems. Especially with a gazillion users and that everybody gets to see the same image, that seems hard.

The cleverest solution that I read in the article, that I really liked but probably would never have thought of myself (kinda wish I would, though) was the part where the app starts receiving and buffering tile-changes right away, and only then requests the full image state, which will always be somewhat stale because of its size, but it has a timestamp so they can use the buffered tile-changes to quickly bring its state up to time=now. Maybe this is a common technique in real-time networked systems, I don't know, but it was cool to learn about it.

ianamartin · on April 14, 2017

Thank you for saying this. I was going to say the same thing. The difference between facebook and my side project is not the core functionality. Anyone competent can build something that seems to look and act like facebook in a weekend or two in the general sense of "look and feel." Which is also perhaps a dangerous way to go, lest you get sued.

It's getting the nitty-gritty details hammered out and then implemented that takes time and people. I'm currently trying to do something similar but directed in a very different way. Facebook feels like a CRUD app. But there's a lot of different moving parts that you have to deal with on top of the CRUD pieces.

And every single one of those things--while not necessarily difficult to implement individually--add up to tons of decision time.

When the expected outcome is already decided, the problem becomes much much easier to solve for a developer.

Assuming that you're dealing with some kind of a reasonable web stack, implementing any individual feature is often not that big a deal. Deciding to do it, and also perhaps making the choice to have a reasonable web stack to begin with, are potentially problematic.

And of course, building something that can massively scale is hard as well.

But from my experience with side projects and also my day job, the hard part is making decisions.

firebones · on April 14, 2017

I wish I had enough spare cash to take you up on a bet that you could implement this in less than a weekend. Maybe HN can crowdfund paying you for two days at $1000/day to replicate, versus your time and a $3000 rebate if you can't.

josephg · on April 14, 2017

My loved ones are out of town, so you're on! I'll do it for the fun of it and the experience. And because I still haven't built anything on top of kafka, but there's no time like now to fix that.

The challenge is: Meet the main requirements of r/place: 1000x1000 image, a web based editor, 333+ edits / second, with an architecture that can scale to 100k simultaneous users (although that part will be hard to actually test). I won't implement mobile support or notifications, and I won't implement any sort of user access control (thats out of scope). The challenge is to do it & get it hosted online before I go to bed on Saturday (so, I'm allowing myself some slop there). But its already 1:30pm on Friday, so I think that easily qualifies as "less than a weekend".

As a stretch I'm going to write all the actual code in nodejs while aiming for well into the thousands / tens of thousands of writes per second territory.

I'm willing to accept some fun stakes if anyone wants to propose them (ice bucket challenge level stuff). I'll live-tweet progress here - https://twitter.com/josephgentle

(Edit: fixed days)

josephg · on April 14, 2017

12 hours in and the site is up: https://josephg.com/sp/

Polish and perf tuning tomorrow.

Code: https://github.com/josephg/sephsplace (warning: contains evil)

josephg · on April 15, 2017

Complete and working. I haven't tested the load for real, but judging by the performance numbers it should be able to handle 100k connections fine with 2 beefy backend servers. (40% of one of my laptop's cores handles 15k connections with 400 edits per second. So about 8 cores should handle 100k users no sweat.) The bottleneck is fanout, so until kafka starts hitting subscribe limits it should scale linearly with frontend servers. (And kafka is a long way from hitting that limit because I've added write coalescing.)

I'll post a writeup about it all once I've eaten some real food and had a rest. But for now, enjoy!

socmag · on April 15, 2017

I like what you are doing a lot. It doesn't matter that it isn't a one to one comparison, the point is you are taking on the challenge and enjoying it. Probably actually you are learning some stuff along the way as well.

It would be kind of neat if the place experiment could become something of a micro-benchmark for online customer facing distributed platforms.

Keep up the good work man, it's looking good.

Don't forget to hydrate and consume pizza.

renlo · on April 14, 2017

Very cool man. For the real-time updates you keep the browser-client requests open, right? Any reason you did it that way?

josephg · on April 14, 2017

Well, the server needs to message the client with updates somehow. Either the client keeps a connection open or it has to periodically poll the client. Keeping a connection open is faster and it uses less system resources.

always_good · on April 14, 2017

12 hour amphetamine binge can produce some good work!

developer2 · on April 14, 2017

>> I won't implement mobile support

Mobile support was a huge factor for /r/place. To reimplement the project while completely ignoring mobile is basically taking out maybe 30% of the effort required. It's a bit of a copout to claim "can implement in a weekend", while discounting one of the main reasons why it is nearly impossible to hammer out solo in a single weekend.

The other "big ticket item" is integrating such a system into an existing live stack. It's very convenient that you get to pick the tools you think are most efficient for the job from scratch; it's quite another to restrict yourself to an existing set of tech.

>> an architecture that can scale to 100k simultaneous users (although that part will be hard to actually test)

Also this. Reddit had a single opportunity to deploy live. You can't throw out many days of required dev/staging benchmarking for a system that must work on first deploy. You're not "done" after a weekend unless your first deploy both works 100% and scales as required.

tldr; Were I to do what you are doing, it would be out of interest and to put it live for others to use. What seems disingenuous is to do so with the primary goal of proving that "see, child's play - it's not difficult at all!". The goal should be to build something that works, not to rush the job as some way to show off and take something away from Reddit's efforts.

josephg · on April 14, 2017

Absolutely! And I'm not going to claim otherwise. I said above I think I can only implement it in a weekend if I can work to my technical strengths. Doing otherwise would mean learning, which is super slow compared to typing. Just implementing to a closed spec cuts out maybe half of the work of something like this, because you don't need to figure out the requirements or waste time going down blind alleys.

From my post above:

> But I couldn't implement r/place that quickly if reddit didn't already do all the work deciding on the scope of the problem, and what the end result should look like.

developer2 · on April 14, 2017

Upvoted you. I'm a very cynical person, and thus I focused on the "weekend" aspect as being more of an attempt to refute the claim that Reddit had to put in quite a bit of effort to accomplish what they did, rather than you simply limiting how much time you're willing to sink into it.

If anything, this only increases my motivation to replicate the project myself, whether it's during a weekend or two full weeks. It's interesting enough and at the right level of complexity - kind of simple, but not too simple - to make it a fun side project.

didibus · on April 14, 2017

Yes!! Love me a good show. I'll be following your tweets.

SomeStupidPoint · on April 14, 2017

Wait... Where are you that it's Saturday already?

I thought I knew timezones... But that's 40+ hours ahead of me.

josephg · on April 14, 2017

Errr... oh, that thing is happening where its a public holiday so I'm a bit unmoored from reality. I'll get it done by Saturday night, not Sunday night.

employee8000 · on April 14, 2017

Unless you stick with a single Kafka partition (which might not withstand the load) I don't think you can consume the events in exact chronological order.

josephg · on April 14, 2017

A single kafka partition should be able to handle the load - it only needs to write 333 ops/second, and I could coalesce several edits together over half second windows if I wanted to make that more efficient. And I'll only have as many kafka consumers as I have frontend servers. So we're talking about 333 ops/second and like, 5 consumers of those operations. Somehow I think a single kafka partition should be able to manage the load :)

But there's no reason I couldn't use multiple partitions if I had to. To make it fully consistent I'd just need a consistent way to resolve conflicting concurrent writes on different partitions. Either adding a server timestamp to the messages, or using the partition number to pick a winner would work fine for this. It would just make the server logic a little more complicated.

xapata · on April 14, 2017

You can simulate the users for load testing. Have them random walk from an arbitrary start point, changing a pixel every 10-20 minutes.

josephg · on April 14, 2017

Simulating hundreds to thousands of writes per second via post requests will be easy. Simulating 100k simultaneous readers of an eventstream will be much harder. I don't know of any tools which can load test that many readers. Do you have any ideas?

scrollaway · on April 14, 2017

locust can spawn arbitrary scripts; you'd have to write your own eventstream reader, and simulating 100k of them will take a few machines concurrently load-testing, but i think this should be the easiest way to do it.

CryoLogic · on April 14, 2017

The core functionality that Facebook relies on exists in most open source web forums. A good developer could have easily built it on top of PHPBB back in the day - and probably had a more maintainable codebase.

Yes - today's Facebook could not be built by a dev in a weekend, but Facebook 1.0? Not so hard.

perfmode · on April 14, 2017

The original Facebook was a one man operation

bhargav · on April 14, 2017

Regarding Uber braggers (pun intended): Who cares! If they build it they will not come.

conradfr · on April 14, 2017

With a polished front-end maybe not in a week-end but I was thinking that it was a good use case for a fun Elixir/Phoenix exercise anyway.

soup10 · on April 14, 2017

To me it seems a little over-engineered but it held up so props to them.

joezydeco · on April 14, 2017

I have this hunch that Reddit engineers know what they're up against when they launch something like this. It can get out of control extremely fast when something catches on.

nojvek · on April 14, 2017

Why do you think it's over engineered. The scaling issues made sense. Cdn was a great move to limit throughput of requests. Redis made sense, javascript canvas and typedarrays made sense. How else would you have done it?

soup10 · on April 14, 2017

For the core application of writing pixels to the board, I would of used a more efficient monolithic design with less points of failure. One big server could handle 333 pixels/s(their expected volume). Not sure that using cassandra and redis is really necessary or makes things easier.

The secondary goal of pushing pixel updates to lots of users would probably want separate hardware. I would probably fake it so its not really realtime. If you send the web client batches of updates to replay on the board it looks realtime and doesn't require persistent connections.

dmihal · on April 14, 2017

Reddit is like the 7th most popular website, features like this need to be able to handle significant load

dstroot · on April 14, 2017

I came to write exactly what you wrote. Here's an upvote brother from another mother.

seankimdesign · on April 13, 2017

What a fantastic writeup. I had some vague ideas regarding the challenges involved to build an application of such scale, but the article really makes it clear for everyone the amount of decision points encountered as well as why certain solutions were selected.

I also like the way the article is broken down into the backend, API, frontend and mobile. This isolated approach really highlights the different struggles each aspects of the product has, while dealing with what is essentially a shared concern: performance.

What I also found interesting is the fact that they were able to come up with a pretty accurate guess in terms of the expected traffic.

> "We experienced a maximum tile placement rate of almost 200/s. This was below our calculated maximum rate of 333/s (average of 100,000 users placing a tile every 5 minutes)."

Their guess ended up being a good amount above the actual maximum usage, but it was probably padded against the worst case scenario. The company that I work for consistently fails to come up with accurate guesses even with our very rigid user base, so it's pretty impressive that Reddit could accommodate the unpredictable user base that is the entire Reddit community.

eblanshey · on April 13, 2017

Agreed, great write-up. Anyone have other recommended links to write-ups about specific challenges and how to make them scale?

petercooper · on April 13, 2017

Here are some recent ones that have been popular:

http://highscalability.com/blog/2016/4/20/how-twitter-handle...

https://redditblog.com/2017/1/17/caching-at-reddit/

https://blog.twitter.com/2017/the-infrastructure-behind-twit...

https://www.netlify.com/blog/2017/03/16/smashing-magazine-ju...

https://nickcraver.com/blog/2016/02/03/stack-overflow-a-tech...

https://engineering.linkedin.com/blog/2016/10/instant-messag...

http://techblog.netflix.com/2016/08/building-fastcom.html

http://highscalability.com/blog/2016/9/28/how-uber-manages-a...

Or a recent writeup on a challenge building IMDB's forums in 2001(!) https://www.beatworm.co.uk/blog/internet/imdb-boards-no-more .. fun and scary in equal measures.

(I'm plucking these from things were popular recently in our http://webopsweekly.com/ :-))

josephg · on April 13, 2017

http://highscalability.com/ frequently features content like this. The archives are a treasure trove of practical software architecture wisdom.

ra7 · on April 13, 2017

I really like the "Big Data in Real-Time at Twitter" slides at https://www.slideshare.net/nkallen/q-con-3770885. They explain their original implementation and why it didn't scale, possible new implementations and their current one (it's a bit old though). It's clear and easy to understand.

samfisher83 · on April 13, 2017

That writeup may have taken more time than building it out. It seems quite thorough.

Edit: why am I getting down voted for praising a write-up?

tabeth · on April 13, 2017

Because despite being thorough, there's no way this write-up could be completed slower than building the actual thing they're describing.

specialist · on April 17, 2017

Belated response:

I've done a fair amount of tech writing.

Documentation always takes me as long or longer than designing, coding.

Most recently, when I released a novel layout manager, the docs, examples, screenshots, etc took roughly twice as much effort as everything else combined.

virtuabhi · on April 14, 2017

People on HN can be quite uptight. One response takes your comment literally and other downvoters do not approve of your cheeky comment.

eatitraw · on April 13, 2017

> We used our websocket service to publish updates to all the clients.

I used /r/place from a few different browsers with a few different accounts, and they all seemed to have slightly different view of the same pixels. Was I the only one who experienced this problem?

When /r/place experiment was still going, I assumed that they grouped updates in some sort of batches, but now it seems like they intended all users to receive all updates more or less immediately.

d23 · on April 13, 2017

Yeah, we went into it a bit in the "What We Learned" section, but that was most likely during the time we were having issues with RabbitMQ. I believe it was mostly fixed later on, but either way, we found a new pain point in our system we can now work on.

atombender · on April 13, 2017

Surprised you're using RabbitMQ. It's one of those things which work great until they don't (clustering is particularly bad), and then you have almost zero insight into the issue, and have to resort to the Pivotal mailing list.

Have you looked at NATS at all? We're using it as a message bus for one app and it's been fantastic. It is, however, an in-memory queue, and the current version cannot replace Rabbit for queues that require durability.

fernandotakai · on April 13, 2017

i've been using rabbitmq heavily (as in, the whole infrastructure is based on two rabbitmq servers) for a long time and i've never seen it fail.

tbh, i never used clustering (because it's one of the shittiest clustering implementations i've ever seen) but we do use two servers (publishers connect to one randomly and consumers connect to both) and it seems to handle millions of messages without any issues.

of all servers i've ever used, rabbitmq is by far the most stable (together with ejabberd).

atombender · on April 13, 2017

RabbitMQ is decent if you don't use clustering (which, I agree, is shitty). I have some quibbles with the non-clustered parts, but nothing big.

Right now, the main annoyance is that it's impossible, as far as I understand, to limit its memory usage. You can set a "VM high watermark" and some other things, but beyond that, it will — much like, say, Elasticsearch — use a large amount of mysterious memory that you have no control over. You can't just say "use 1GB and nothing more", which is problematic on Kubernetes where you want to pack things a bit tightly. This happens even if all the queues are marked as durable.

fernandotakai · on April 13, 2017

yeah we have dedicated machines to rabbitmq because it's basically memory hungry. but i like it that way because it's only going to crash if the machine crashes.

d23 · on April 14, 2017

Hmm, I'll take a look at that. For websockets, non-durability seems like a fine tradeoff, so it sounds interesting. Thanks!

atombender · on April 14, 2017

Note that NATS is currently pub/sub, which is a "if a tree falls in the forest" situation. Messages don't go anywhere if nobody is subscribing.

So it's awesome for realtime firehose-type use cases where a websocket client connects, receives messages (every client gets all the messages, although NATS also supports load-balanced fanout) for a while, then eventually disconnects.

NATS is ridiculously fast [1], too.

There's an add-on currently in beta, NATS Streaming [1], which [2] has durability, acking/redelivery and replay, so covers most of what you get from both RabbitMQ and Kafka. It looks very promising.

[1] http://bravenewgeek.com/tag/nats/

[2] https://nats.io/documentation/streaming/nats-streaming-intro...

manigandham · on April 15, 2017

NATS is only a pub/sub system. NATS Streaming uses an embedded NATS server while building queuing and persistence on top. It works well.

atombender · on April 15, 2017

https://news.ycombinator.com/item?id=14111856

manigandham · on April 15, 2017

RabbitMQ is absolute crap. Surprised anyone uses it in production.

If you already have a good Redis infrastructure then you can just use the pub/sub features built into it for your websockets communication.

eatitraw · on April 13, 2017

Ah okay then!

BTW, thanks for the great post, it was a very interesting read.

not_kurt_godel · on April 14, 2017

> RabbitMQ

You're already on AWS, why not use SQS instead?

programbreeding · on April 13, 2017

I experienced this as well. I have a different account logged in on mobile than what is logged in on my desktop. I wouldn't say things were drastically different, but when there was a location with a ton of activity (like OSU or the American flag towards the end), I saw different views between them.

Cthulhu_ · on April 14, 2017

I had the same thing, my client running out of sync (it seemed to be way behind) the "truth"; had to do a refresh to get it back up.

writeslowly · on April 13, 2017

I thought it was interesting that one of their requirements was to provide an API that was easy to use for both bots and visualization tools. I remember reading some speculation when this was running that r/place was intentionally easy to interface with bots, while there were also complaints that the whole thing had been taken over by bots near the end.

gramstrong · on April 13, 2017

Without bots, I doubt that /r/place would have been very interesting. It's a nice thought that a million random strangers can be cohesive without automation, but for some reason I don't find that to be particularly realistic..

saulrh · on April 13, 2017

As a concrete example, as far as I can tell the entire Puella Magi Madoka Magica section, starting from Homura Did Nothing Wrong next to Darth Plagueis The Wise, was hand-crafted and hand-maintained. On their discord they were actively discouraging community members that wanted to use bots.

kuschku · on April 13, 2017

Oh it is very easy. You just need subreddits with lots of very loyal people who even do frequent meetups, and lots of those subreddits.

And soon you get exactly what /r/place was.

jobigoud · on April 13, 2017

I remember taking part in a big drawing canvas exactly like this about twelve years ago between several Art/Photoshop communities, Worth1000.com, SomethingAwful, Fark.com etc. There wasn't bots at the time but it was still very socially interesting.

ralfd · on April 16, 2017

But reddit itself is a tool to bring cohesion out of a million random strangers.

As others have said subcommunities quickly formed or some subreddits themselves had a orga-thread to paint an iconic logo relevant to their niche.

ag_47 · on April 13, 2017

Now I'm curious, Are there any websites that do something similar to /r/place? (hackathon idea?)

Also, reminds be of the million dollar front page [1].

[1] https://en.wikipedia.org/wiki/The_Million_Dollar_Homepage

DanHulton · on April 13, 2017

A long time ago, I built http://www.ipaidthemost.com/, which is kinda related, at least to TMDH anyhow. Far, far less collaborative than /r/place, but similar in terms of staking out ownership.

jacquesm · on April 13, 2017

Hilarious. That's got to be the most direct monetization strategy since tmdhp.

Liron · on April 14, 2017

Similar to https://flashcash.money

joenot443 · on April 13, 2017

I love it. Wish this had been my idea.

Twirrim · on April 13, 2017

Does that actually make money?

zepolen · on April 13, 2017

Looks like it's made $23.25 so far @ 1.50$ for the message.

If it reached 5$, he'd have made 25,000$

loeber · on April 14, 2017

Your math is off -- if it reached $5, he'd have made $250. I think you were counting in cents. To convince yourself:

>>> total = 0

>>> for i in range(5, 500, 5):

... total += i

>>> total

24750

zepolen · on April 16, 2017

Hey you're right, my bad.

Twirrim · on April 17, 2017

But there's all the hosting costs to consider.

bsimpson63 · on April 13, 2017

http://www.drawball.com/

theklub · on April 13, 2017

Actually this idea has been around for years, and sadly isn't new at all. I just checked and there is one that's been around since at least 2006, http://da-archive.com/index.php?showtopic=42405

I remember lueshi

ag_47 · on April 13, 2017

Right, true. Collaborative drawing was basically the "hello world" of real-time platforms back in the day (Firebase, Parse etc). But I think most of those were ephemeral canvases.

AldousHaxley · on April 13, 2017

Yeah, collaborative drawing is kind of old hat, but when you can use the context of the modern, social web to provide some new modes of interaction around it, it can be interesting again. Same applies to more mundane things like text.

Karawebnetwork · on April 13, 2017

/r/place spawned this clone: http://placepx.com

AldousHaxley · on April 13, 2017

Not quite the same, but my startup, Formgraph[0], also does public, real-time collaborative drawing. I even did a similar write-up about the stack behind it the other day[1]. Looks like they're relying on Cassandra and Redis. I went with RethinkDB. Should probably do a write-up about the front end in the future.

[0] https://www.formgraph.com

[1] https://medium.com/@JohnWatson/real-time-collaboration-with-...

criley2 · on April 13, 2017

Some redditors have created /r/place derivatives already. I'm not aware of one prior to /r/place but it seems impossible that it hasn't been done before

splintercell · on April 13, 2017

http://8192px.co/

caspervonb · on April 13, 2017

<shameless-plug> Also available on GitHub (https://github.com/8192px/8192px) ;-] </shameless-plug>

Ajedi32 · on April 13, 2017

There were definitely a few that popped up after /r/place closed down. (pxls.space being the most popular) They were nowhere near as successful as /r/place though.

est · on April 14, 2017

4chan's moot's previous startup was something called canvas.

wingerlang · on April 14, 2017

That was not realtime nor collaborative on a single pane. It was a "remix" platform where one user created the initial drawing, then people used it as a base and drew around/over it to change it - in a new picture.

It was quite interesting for a little while.

wyager · on April 13, 2017

Why use Redis and multiple machines instead of keeping it in RAM on a single machine? I'm not claiming the Reddit people did anything wrong; they have a lot more experience than me here obviously. I'm just trying to figure out why they couldn't do something simpler. 333 updates/sec to a 500kB packed array, coupled with cooldown logic, should have a negligible performance cost and can easily be done on a single thread. That thread could interact via atomic channels with the CDN (make a copy of the array 33 times a second and send it away, no problem) and websockets (send updates out via a broadcast channel, other cores re-broadcast to the 100K websockets). Again, I'm not saying this is actually a better idea, this is just what I would do naively and I'm curious where it would fall apart.

bsimpson63 · on April 13, 2017

> keeping it in RAM on a single machine

> 500kB packed array, coupled with cooldown logic, should have a negligible performance cost and can easily be done on a single thread. That thread could interact via atomic channels with the CDN

That's not simpler.

We used tools that we're already using heavily in production and are comfortable with.

wyager · on April 13, 2017

> That's not simpler.

With respect to your experience in the matter, I strongly disagree. What I described is complicated to say, easy to implement. What the OP describes ("use redis") is easy to say, complicated to implement. Not just in terms of human work time (setting up the redis machine and instance, connecting everything together), but also in terms of number of moving parts (more machines, more programs, etc.).

> We used tools that we're already using heavily in production and are comfortable with.

That's entirely fair, and what I figured was the most likely explanation.

eric_h · on April 13, 2017

> Why use Redis and multiple machines instead of keeping it in RAM on a single machine?

Because machines go down. If you don't expect your hardware to fail at the most inopportune time, you'll be screwed when (not if) it happens.

andrewvc · on April 13, 2017

I agree that machines go down, but there are sane (and safe!) ways to build this sort of thing without adding in cassandra and Redis. Additionally, the max placement rate of 333/s is reaaaaally slow! Maybe that's due to the websocket frontends, not the DB, but, that doesn't mean that's the most obvious way to build it.

The crux of the problem is that they need to mutate a relatively tiny amount of memory and have a rolling log of events for which only the last 5 minutes needs fast access. Also, if you can put all your state on one machine its far less likely that the one machine will die, than it is that at least one will die in a cluster of machines. Given the nature of the problem keeping all state on one machine seems pretty rational to me, so long as you have the ability to switch to a hot spare within a few minutes or so.

If I were to architect this for speed I would have two tiers: a websocket tier, and secondly a 'database' tier. The database would be a custom program that would:

0. Provide a simple Websocket API that would receive a write request and return either success if the user's write timer allowed it to write or failure if it didn't. This would also broadcast the state + deltas. 1. Keep the image in memory as a bitmap 2. Use rocksdb for tracking last user writes to enforce the 5m constraint. You could use an in memory map, but the nice thing about rocksdb is that it shouldn't blow up your heap. 3. Periodically flush the bitmap out to disk to timestamped files for snapshots 4. Keep the hashmap size small by evicting any keys past their time limit 5. Write rotating log files rotated every 5m or so to record the history of events for DR and also later analysis

Backing this sort of thing up is very simple. You just replicate the files using rsync or something like it. You may have some corruption on files that are partially written, but since we're opening and closing new files often you can choose how much data-loss you want to tolerate.

Restoration is as simple as re-reading the bitmap and reading the log files in reverse up to 5 minutes ago to see who still isn't allowed to write yet (thus reconstituting the hashmap). Let's remember, redis replication is async, so this has the same tradeoffs.

gooeyblob · on April 13, 2017

> The database would be a custom program that would:

Creating a "custom database program" is not a small task.

We like to use boring technologies that we know work well. We were already using Cassandra, had some experience with Redis, and had a lot of confidence in our CDN.

andrewvc · on April 14, 2017

Well, in your article you mentioned that you tried to use Cassandra for one task and had to jettison it because of unexpected performance problems. You had to re-approach the problem with a whole other DB. I would say contradicts the point you're making.

I'm not arguing that most problems need a custom database, only a minority do. I'd say that this problem is borderline on which direction to go.

Databases are very leaky abstractions as you all discovered. The nice thing about custom code is that you don't have leaky abstractions. The bad thing about custom code is that you have a large new untested surface area.

In the case of your application the requirements are so minimal, a bitfield plus a log, I'd say its a wash.

Programmers today forget that things like flat files exist and are useful. It's a shame, because you wind up with situations where people just assume they need a giant distributed datastore for everything.

What you're doing in that case is trading architectural complexity for code complexity. Now, if its the case that all data in your org goes in one data store to keep things consistent, great, that makes sense. But for a one off app I just don't buy it.

Ono-Sendai · on April 13, 2017

Updating a bitmap 333 times a second is trivial with one core. Handling 100K websocket connections is the tricky bit I think!

repsilat · on April 14, 2017

And there's only half a meg of data! Serving the readers is a much more interesting problem than managing the writes, and tbh I'd just keep a rolling "pixels that have changed in the last 10 seconds" diff going, pushed out every 100ms, compressed and cached, and have clients poll for it. Easy peasy, websockets just complicate life.

vmasto · on April 13, 2017

> Users can place one tile every 5 minutes, so we must support an average update rate of 100,000 tiles per 5 minutes (333 updates/s).

It only takes a couple of outliers to bring everything down. I'm not exactly well-versed in defining specs for large scale backend apps (not a back-end engineer) but it seems to me that preparing for the average would not be a wise decision?

For example, designing with an average of a million requests per day in mind would probably fail, since you get most of that traffic during daytime and far more less at the nightly hours.

Could anyone more experienced shed some light?

tyrust · on April 13, 2017

They might have phrased this poorly.

>We should support at least 100,000 simultaneous users.

This line makes me think that this is what they expected the peak (or near peak) to be.

>Users can place one tile every 5 minutes, so we must support an average update rate of 100,000 tiles per 5 minutes (333 updates/s).

So assuming that they mean that 100k is the peak and that clients are limited to 1 update per 5 minutes, they can expect 333 updates per second on average. The "average" is taken over this 5 minute period. This average represents the number of queries per second they will get if everyone's 5 minute cooldown is spread out evenly over each 5 minute period.

It is possible, for example, for half of the peak population's cooldown to expire at 1300 and the other half to expire at 1305. In this case the average updates/s over the 5 minute period from 1300-1305 would still be 333 updates/s even though there were really 2 bursts of 50k a second at 1300 and 1305. It's far more likely that cooldowns are not excessively stacked in this way, so you prepare for the average and hope for the best.

foota · on April 13, 2017

Prepare for the average and hope for the best seems like a great way for things to fail.

tyrust · on April 13, 2017

They prepared for the worst (peak 100k users). The "hope" that those 100k would be spread out was based on the statistical likelihood that these 100k wouldn't line up too much over a 5 minute period.

I didn't follow /r/place that much, but I haven't read any complaints about latency or failures so it looks like they did just fine.

foota · on April 13, 2017

They say elsewhere that they were prepared for 100k updates all at once. Which is the worst.

Edit: you're an sre, you probably have more experience planning these things than I ever will. I just can't help but think about what would happen if some trolls realized they could synchronize their updates and bring down the service. (Although based on the infrastructure it doesn't seem possible to cause lasting damage.)

tyrust · on April 13, 2017

Yeah, I understand what you're saying. It is conceivable that a bunch of people would coordinate their updates. To prevent new-comers from spoiling the fun, they only allowed users with accounts created before /r/place was launched to send updates.

tummybug · on April 13, 2017

I agree, but if you do fail it means you are doing something right. In most cases I think YAGNI suffices for how much engineering to do.

bsimpson63 · on April 13, 2017

This was just a back of the napkin estimation. I think at one point we calculated that we'd be able to support one tile placement per user per second.

sangnoir · on April 13, 2017

> It only takes a couple of outliers to bring everything down. I'm not exactly well-versed in defining specs for large scale backend apps (not a back-end engineer) but it seems to me that preparing for the average would not be a wise decision?

It's not an average: Reddit controlled the 5 minute user cool-down period, a.k.a. request throttling. 333 updates/s was the capacity. As alluded to in the article, the 5 minutes was dynamically configurable: if more than 100,000 had users showed up, they would have increased the cool-down period to a value that would yield 333 updates per second at most.

raziel2p · on April 13, 2017

With 100K users and a minimum cooldown time of 5 minutes, there's 100K tiles per 5 minutes would be an upper limit. Only bots would hit that 5 minute cooldown every single time, and bots make up a very small minority of the users. They even point out later in the article that they only peaked at 200 updates per second.

> For example, designing with an average of a million requests per day in mind would probably fail, since you get most of that traffic during daytime and far more less at the nightly hours.

That depends. If you're talking about a public website, then yes. But this is just a single API endpoint with a very fixed time-based rate limit.

avip · on April 13, 2017

I can shed that 333 [something/s], for any uncomplicated something, is so little that a single-core 10YO laptop running a single-process non-blocking webserver (a-la node) could probably handle it, and likely x10 it.

tummybug · on April 13, 2017

Sure in a perfect scenario with a well behaving load test client. In production you often encounter scenarios which place unexpected load on your servers.

eric_h · on April 13, 2017

Ah yes, the self-inflicted unintentional DDoS attack that doesn't appear when you do a semi-idealized load test. I may or may not have been responsible for one of those once or twice in my life.

andoon · on April 13, 2017

The entire reddit website goes down every night, especially during weekends, sport matches, etc, so there you have your answer.

celticninja · on April 13, 2017

Is that hyperbole or are you really experiencing that much downtime of Reddit? I see it occasionally but it's never down for long, the odd "servers are busy" message usually disappears after a single refresh.

ClassyJacket · on April 13, 2017

The search is consistently broken, but the rest of the site seems to have good uptime now, much better than it once was.

celticninja · on April 13, 2017

Search has never worked. I always use Google site: modifier and have never had an issue.

Macha · on April 13, 2017

Search worked pretty reliably up until 7-8 months ago for me

andoon · on April 13, 2017

Pages take a long time to generate all day, but during peak hours they take a minimum of 4 seconds each (depends on what page you're loading, if it's got lots of comments, etc), and many times they simply timeout. The engineers at reddit have been unable thus far to fix it.

rhizome · on April 13, 2017

This isn't my experience.

freehunter · on April 13, 2017

I'm not the same guy you were just talking to but in my experience on mobile Safari, I get "something went wrong, visit the homepage" when I navigate around reddit far too often. What's funny is it actually tends to resolve itself if I wait a second or two, but yeah, reddit engineering and infrastructure is not well equipped to handle the amount of traffic they receive. It's gotten better, but it's still not what you'd expect from the internet's front page.

noir_lord · on April 13, 2017

The main website is OK as is i.reddit.com but their mobile client is hilariously bad in a "I can't believe they thought this was ready" way, it takes an age to load, it stalls out all the time, the tap targets are too small, it has the classic "oh you hit this when you meant that and then clicked that, lets spin the wheel on where you really end up" problem that slow mobile apps have.

I know they deal with insane scale yadayada but it's simply not ready.

That and the horrific dark pattern on the "We want you to have the best, massive red/orange button marked continue that takes you to the app store and the tiny weeny little "continue to mobile site" underneath".

I don't want your damn app, stop asking me.

If I was cynical I'd think they didn't care about the mobile site been awful as it drives people to the app.

francoisfeugeas · on April 13, 2017

Last time I tried it, their app was even worse than the mobile website though.

tummybug · on April 13, 2017

I can't agree with you. The sheer fact that I know well what the reddit error page is refutes this. In fact it's one of the few sites I frequent that I know even have an error page.

rhizome · on April 13, 2017

Maybe it's regional.

andoon · on April 13, 2017

It probably is regional, yes, because I simply can't believe every time I write a comment like this here or somewhere else I'm told reddit works fine. It gets on my nerves every single time I visit at night. (Western Europe)

9935c101ab17a66 · on April 13, 2017

I almost never have a problem on desktop, but using iOS safari (and the mobile reddit site), I often see the error page. I think it's a view/rendering issue, and ends up being displayed rather than a loading modal, but I could be wrong.

vertex-four · on April 13, 2017

Nor mine - although search essentially never works on the first try, and sometimes the second and third tries.

fernandotakai · on April 13, 2017

same. reddit was really bad some good years ago, but it's quite stable now.

notatoad · on April 13, 2017

maybe three years ago it did, but reddit has gotten drastically more stable since then. It still has the occasional downtime, but now it's more like every couple months than every couple days.

SquareWheel · on April 13, 2017

Reddit itself has been very stable these last few years. Reddit search on the other hand is a complete crapshoot even to this day.

amyjess · on April 13, 2017

So I got hit by an unfortunate bug on the first day of /r/place.

I was trying to draw something, one pixel at a time, and all of a sudden, after a bunch of pixels, it stopped rate-limiting me! I could place as many as I wanted! So I just figured that they periodically gave people short bursts where they can do anything. This was backed up by my boss, who was also playing with /r/place, saying that the same thing happened to him not long before that (yes, my whole team at work was preoccupied with /r/place that Friday). So I quickly rushed to finish my drawing.

And then I reloaded my browser... and it wasn't there. Turns out that what I thought was a short burst of no rate limiting was just my client totally desyncing from Reddit's servers. Nothing was submitted at all.

Not too long after that, another guy on my team got hit by the same bug. But I told him what happened with me, so he didn't get his hopes up.

mwfj · on April 13, 2017

It happened to me as well. I did verify that my changes actually made change (from the same IP, but in incognito mode). Didn't bother to check if the changes stayed.

nathan_f77 · on April 14, 2017

This is fantastic. I learned a lot, and it seems like they nailed everything.

I really enjoyed the part about TypedArray and ArrayBuffer. And this might be a common thing to do, but I've never thought about using a CDN with an expiry time of 1 second, just to buffer lots of requests while still being close to real-time. That's brilliant.

bhargav · on April 14, 2017

Startup tech writers, take note. This write up has been more helpful than many in the past. Thank you very much Reddit developers

nitwit005 · on April 13, 2017

Given the scale described, it sounds like they could have had a single machine that held the data in memory and periodically flushed to disk/DB to support failing over to a standby.

bsimpson63 · on April 13, 2017

You're basically describing how we used redis for this project.

nitwit005 · on April 14, 2017

I suppose so, but then what did you gain from the extra hop to redis?

bsimpson63 · on April 14, 2017

Not having to implement redis ourselves.

nullbyte · on April 14, 2017

Did you even bother reading the first few paragraphs? They talked about their usage of Redis for this. Next time please read the article before replying.

nitwit005 · on April 14, 2017

Perhaps, rather than posting an insult, you should consider the possibility that you misinterpreted my comment?

Matheus28 · on April 14, 2017

I would be slightly more careful and just use a cluster of servers with a simple consensus algorithm (like raft). A simple C++ server with a raft library plus uWebSockets should be able to handle a lot of load.

foota · on April 13, 2017

That was my thought as well.

tupshin · on April 13, 2017

Our initial approach was to store the full board in a single row in Cassandra and each request for the full board would read that entire row.

This is the epitome of an anti-pattern .I sincerely hope that this approach was floated by somebody who had never used Cassandra before.

Even if individual requests were reasonably fast, you are sticking all of your data in a single partition, creating the hottest of hot spots and failing to leverage the scale out nature of your database.

spyspy · on April 13, 2017

This entire project is just an elaborate hack day project. There's no reason to fault them for trying new and interesting hacks to get it off the ground. They realized it wasn't the right method and moved on. End of story.

er8h · on April 14, 2017

They're using (timestamp, user) as their compound key, which would partition rows by timestamp, no?

matsemann · on April 13, 2017

Interesting how big the Norwegian and Swedish flags got, given our small populations.

77pt77 · on April 13, 2017

Very easy flags to maintain and extend from a smaller model.

kzrdude · on April 13, 2017

Populations are not so small in reddit terms. Sweden probably has an equal population to Italy or even more, on reddit.

stevekemp · on April 13, 2017

I thought the exact same thing, about the Finnish flag - complete with Moomins.

pimeys · on April 13, 2017

And the almost a dictator of a president Urho Kekkonen. I kind of understand this, I was born in that country...

nerfhammer · on April 13, 2017

I also notice some kind of Finland-Brazil-Argentina alliance

pitaj · on April 13, 2017

r/Place is really awesome. This is how you grow the community. The 2D and 3D timelapses are super cool to watch, as well. Glad Reddit decided to make this a full-time thing.

kzrdude · on April 13, 2017

Full time? I haven't heard that anywhere.

ThomPete · on April 13, 2017

This is why experimentation is so important and why I always love when people do things just to do them.

It's literally like exploring the digital universe and reporting on some of your findings.

Great writeup!

blurrywh · on April 14, 2017

FULL 72h (90fps) TIMELAPSE: https://www.youtube.com/watch?v=XnRCZK3KjUY

archagon · on April 13, 2017

Thank you for the fascinating writeup! How long did the whole thing take to put together?

tyrust · on April 13, 2017

First commit was on Jan 20 [0], so that provides a lower-bound for how long they spent on it.

[0] - https://github.com/reddit/reddit-plugin-place-opensource/com...

matsemann · on April 13, 2017

> seems to be pretty simple, I feel like it shouldn't take more than a day to code

From a reddit "expert", so I guess that answers that ;)

hopfog · on April 13, 2017

This is amazing and I got so many ideas on how to tackle the scaling issues I have with my own multiplayer drawing website. In the aftermath of r/Place I went into some of the factions' Discord servers and posted my site, getting 50-100 concurrent users which caused a meltdown on my server. It was a good stress test but also a wake-up call.

Again, amazing write-up. Thank you!

antoniuschan99 · on April 13, 2017

Looks impressive!

How big was the team? How long did it take to complete this project? Is the code going to be open sourced?

aw3c2 · on April 13, 2017

https://github.com/reddit/reddit-plugin-place-opensource

biot · on April 13, 2017

  > At the peak of r/place the websocket service was
  > transmitting over 4 gbps (150 Mbps per instance
  > and 24 instances).

What does Reddit use for serving up this much websocket traffic? Something open source, or is it custom built?

dkasper · on April 13, 2017

It's open source and custom built. https://github.com/reddit/reddit-service-websockets

77pt77 · on April 13, 2017

Now where can we get a dump of all the data.

Like

timestamp, x, y, color, username

calosa · on April 13, 2017

One of the reddit data scientists dumped it here... https://data.world/justintbassett/place-events

aw3c2 · on April 13, 2017

> Oops! We can't find that page.

justintbassett · on April 13, 2017

my fault :). I have to get a few things ready for a public release of more data

77pt77 · on April 13, 2017

Is this data:

https://www.reddit.com/r/place/comments/6396u5/rplace_archiv...

Complete?

I've been working with that.

Ajedi32 · on April 14, 2017

That data was recorded by the community. It took everyone a while to figure out how to grab a snapshot of the canvas though, so a few hours of data near the start are missing.

socmag · on April 15, 2017

Looking forward to that, it will be a very nice data set

calosa · on April 13, 2017

Oops. Sorry about that. Look forward to it being back up :)

Paul_S · on April 13, 2017

You can't have it because it would show the extent of moderation.

77pt77 · on April 13, 2017

I don't get it.

Please explain.

a_bonobo · on April 14, 2017

Weird question - why does that bash script use absolute paths for standard tools like awk and grep (/usr/bin/awk instead of just awk)? Is this some best practice I know nothing about?

Cofike · on April 14, 2017

As someone looking to expand their knowledge of big systems and building at scale this kind of resource is invaluable!

calosa · on April 13, 2017

If any data science-y folks want to work with the raw data, you can find it here... https://data.world/justintbassett/place-events

Ajedi32 · on April 13, 2017

"Oops! We can't find that page."

Viper007Bond · on April 13, 2017

> my fault :). I have to get a few things ready for a public release of more data

https://news.ycombinator.com/item?id=14110094

calosa · on April 13, 2017

Looks like the owner changed it to be private :/ Hopefully they'll open it up again.

ReverseCold · on April 13, 2017

Did anyone download it already? Reshare?

svarrall · on April 15, 2017

Anyone with any insight into how much something like this 'cost' Reddit, resource wise. Is the main outlay in time and the server costs already covered by their infrastructure or does the high traffic add enough to make a difference?

_hamilton · on April 13, 2017

/r/place is probably the coolest project that happened this year so far.

taftster · on April 13, 2017

This year? I'm thinking more like this decade. It's gotta be up in the top 10 of ever. On so many levels, /r/place was fascinating; and I didn't even come across it until after it had finished!

replface · on April 14, 2017

Similar to this? https://www.youtube.com/watch?v=9_uX5yXSOwU

rohankshir · on April 14, 2017

anyone know what framework they used to do visualizations?

bsimpson63 · on April 14, 2017

https://grafana.com/ for the graphs and https://www.draw.io/ and http://www.fiftythree.com/ for the diagrams.

the_arun · on April 14, 2017

Good article. One security issue I see is - showing Nginx version in error page - nginx/1.8.0. No need to show details of webserver or its version!

huangc10 · on April 13, 2017

Can someone link to the full resolution final image? Been trying to find it. Thanks!

testudovictoria · on April 13, 2017

If you ever want to revisit, it's linked in the right side of the sub.

https://www.reddit.com/place?webview=true

huangc10 · on April 13, 2017

For some reason it leads to 403...weird

dan1234 · on April 13, 2017

http://i.imgur.com/dYdHDYe.png

I've exported it to imgur for you.

LeoPanthera · on April 13, 2017

This is the final bitmap. http://i.imgur.com/ajWiAYi.png

Ajedi32 · on April 13, 2017

Here you go: https://i.imgur.com/ajWiAYi.png.

And for those interested, here's some additional stats:

- The original announcement about /r/place: https://www.reddit.com/r/announcements/comments/62mesr/place...

- Full timelapse of the canvas over the course all 72 hours: https://www.youtube.com/watch?v=XnRCZK3KjUY

- Heatmap of all activity on the canvas over the full 72 hours: https://i.redd.it/20mghgkfwppy.png by /u/mustafaihssan

- Timelapse heatmap of activity on the canvas: https://i.imgur.com/a95XXDz.gifv by /u/jampekka

- Entropy map of the canvas over the full 72 hours: https://i.imgur.com/NnjFoHt.jpg by /u/howaboot (explanation: https://www.reddit.com/r/dataisbeautiful/comments/63kuy6/oc_...)

- Map of all white pixels that were never touched throughout the event: https://i.imgur.com/SEHaUSJ.png by /u/alternateme

- Most common color of each pixel over the last...

  - 72 hours**: https://i.imgur.com/C5jOtl1.png by /u/howaboot

  - 24 Hours: http://aperiodic.net/phil/tmp/place-mode-24h.png by /u/phil_g

  - 12 Hours: http://aperiodic.net/phil/tmp/place-mode-12h.png by /u/phil_g

  - 6 Hours: http://aperiodic.net/phil/tmp/place-mode-6h.png by /u/phil_g

  - 2 Hours: http://aperiodic.net/phil/tmp/place-mode-2h.png by /u/phil_g

- Average color of each pixel over the course of the experiment: https://i.imgur.com/IkPOwIh.png

- Atlas of the Final Image: https://draemm.li/various/place-atlas/ by /r/placeAtlas/ (source code: https://github.com/RolandR/place-atlas)

- Torrents of various canvas snapshots and image data: https://www.reddit.com/r/place/comments/6396u5/rplace_archiv...

- The post announcing the end of /r/place: https://www.reddit.com/r/place/comments/6382bb/place_has_end...

It took a while for members of the community to realize what was happening and start recording snapshots of the canvas, so there are a few time periods early on that got skipped

19eightyfour · on April 14, 2017

This was such an amazing demonstration of human collective collaboration. It sort of makes me fee like humans could do anything, even tho the result is sort of in some sense trivial. As well as simply being enjoyed, this could be studied in so many ways. Competition of memes and cultural representations. 'Evolutionary' convergence upon some optimum. Mainstream vs Fringe. Accepted vs Taboo concepts. Implicit spontaneous emergence of behaviour norms for participants: self regulating systems. I also like this timelapse which contains an overview, and then a close up of each of 12 sections ( 333 x 250 ) - https://www.youtube.com/watch?v=RCAsY8kjE3w

huangc10 · on April 13, 2017

awesome resources! Thanks!

webdwarf · on April 13, 2017

It's awesome project!

mozumder · on April 13, 2017

> We actually had a race condition here that allowed users to place multiple tiles at once. There was no locking around the steps 1-3 so simultaneous tile draw attempts could all pass the check at step 1 and then draw multiple tiles at step 2.

This is why you use a proper database.

I'd probably add a Postgres table to record all user activity, and use that to lock out users for 5 minutes as an initial filter. Have triggers on updates to then feed the rest of the application.

d23 · on April 13, 2017

So in that case, each pixel would be stored as a separate row in a relational database? And to query the whole canvas you'd query a million rows on every read?

I lean towards just using the ratelimiting stuff we already have in place (via memcached, which we talked about in a previous post). We just overlooked it.