Random routing seems okay if you think only about busy servers under light load. However, one of the problems is that it leaves some servers lightly loaded or idle under heavy overall load. Any time a server sits idle during a period of high overall demand is a work opportunity that is irreversibly lost. I made this argument in the single-threaded 2-server case here http://www.win-vector.com/blog/2013/02/randomized-algorithms... , but the principle applies much more generally.
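To make that concrete, here's a quick throwaway sketch (not the model from the post, just an improvised Monte Carlo): two single-threaded servers, a burst of 8 equal jobs, random assignment versus always picking the shorter queue. The random variant finishes noticeably later on average, and the gap is exactly the time one server sat idle while the other still had work queued.

    # Rough Monte Carlo: two single-threaded servers, a burst of 8 unit-time jobs.
    # Compare the finish time of the whole burst under random routing vs.
    # join-the-shortest-queue. Any gap is time one server spent idle while the
    # other still had a queue; that idle time is work that can never be recovered.
    TRIALS = 100_000
    JOBS   = 8

    def makespan_random
      queues = [0, 0]
      JOBS.times { queues[rand(2)] += 1 }        # each job lands on a random server
      queues.max                                 # the burst finishes with the longer queue
    end

    def makespan_shortest_queue
      queues = [0, 0]
      JOBS.times { queues[queues.index(queues.min)] += 1 }
      queues.max
    end

    avg_random   = TRIALS.times.sum { makespan_random }.to_f / TRIALS
    avg_balanced = TRIALS.times.sum { makespan_shortest_queue }.to_f / TRIALS
    puts "random routing:  average finish time #{avg_random.round(2)}"
    puts "shortest queue:  average finish time #{avg_balanced.round(2)}"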
Why? Do you have any proof/calculations? What you're stating seems unintuitive to me. Wouldn't you expect intelligent routing and random routing to converge in the limit of many concurrent processes?
The issue is that while the average time is going to converge - users aren't interested in the average time!
Consider the following scenario:
- Two types of requests, A and B
- Request A takes 200ms to process
- Request B takes 10 seconds to process
- Each dyno can take 4 concurrent requests
If you are receiving 1000s of requests per minute, it is very likely that you will eventually route 4 or more request Bs to a single dyno at the same time. Once this has happened, that dyno is locked for up to 10 seconds, and every request A routed to it in that window waits behind the Bs instead of returning in 200ms.
If request A is a credit card transaction, and request B is just some data lookup with a loading UI spinner, then every time this occurs our poor app loses money, because the user navigates away before the transaction (request A) can complete. Ouch!
Intelligent routing solves this by ensuring that request A only goes to a dyno with a free slot, so no user has to wait 10 seconds for a quick credit card transaction.
The takeaway here is that the important measure is the slowest user-facing request, not the average request time across all requests.
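Here's a rough discrete-event sketch of exactly this (everything beyond the 200ms/10s/4-slot numbers above is invented: 10 dynos, ~5% slow traffic, ~3000 requests per minute), tracking what the fast requests actually experience under random routing:

    # Very rough sketch of the scenario above: 10 dynos, 4 concurrent slots each,
    # random routing, 5% of requests are the slow 10s kind, the rest take 200ms.
    # We track how long the fast requests take end to end.
    DYNOS  = 10
    SLOTS  = 4
    FAST   = 0.2      # seconds
    SLOW   = 10.0
    P_SLOW = 0.05
    N      = 50_000   # requests
    ARRIVAL_GAP = 0.02  # seconds between arrivals (~3000 req/min)

    # free_at[d] holds the times at which each of dyno d's slots frees up
    free_at = Array.new(DYNOS) { Array.new(SLOTS, 0.0) }
    fast_latencies = []

    N.times do |i|
      now     = i * ARRIVAL_GAP
      service = rand < P_SLOW ? SLOW : FAST
      slots   = free_at[rand(DYNOS)]          # random routing: pick any dyno
      soonest = slots.index(slots.min)        # wait for that dyno's first free slot
      start   = [now, slots[soonest]].max
      slots[soonest] = start + service
      fast_latencies << (start + service - now) if service == FAST
    end

    fast_latencies.sort!
    p50 = fast_latencies[fast_latencies.size / 2]
    p99 = fast_latencies[(fast_latencies.size * 0.99).to_i]
    puts "fast requests: median #{p50.round(2)}s, 99th percentile #{p99.round(2)}s"

The median fast request is fine; the high percentiles are the request As that landed behind a pile of request Bs, which is the whole point about the slowest user-facing request.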
How about factoring your A-type requests and B-type requests into two separate apps?
Oh, wait, Heroku already tells you to do this: the B-type requests are why they introduced "workers." Heroku's recommendation has always been that you're supposed to write your "app servers" to serve A-type requests directly, while asynchronously queuing your B-type requests to be consumed by workers. You then either poll the app server for the completion state of the long requests, or have the worker queue a return-value paired to the request-ID, that the app server can dequeue and return along with the next request. (Erlang's process-inbox-based message-passing semantics, basically.)
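As a toy sketch of that flow (in-process threads and a plain Ruby Queue standing in for Redis plus a separate worker dyno, just to show the shape):

    require 'securerandom'

    # Toy, in-process stand-in for the pattern described above: the "web" side
    # enqueues slow B-type work and returns immediately; a "worker" drains the
    # queue; completion is looked up later by job id. In production the queue and
    # results store would be something shared between web and worker dynos
    # (Redis, a database), not in-memory objects.
    JOB_QUEUE = Queue.new
    RESULTS   = {}

    # worker dyno stand-in: consumes jobs forever
    worker = Thread.new do
      while (job = JOB_QUEUE.pop)
        sleep 2                                   # the slow B-type work
        RESULTS[job[:id]] = "done: #{job[:payload]}"
      end
    end

    # web dyno stand-in: enqueue and hand back a job id right away (fast A-type path)
    def enqueue_slow_job(payload)
      id = SecureRandom.uuid
      JOB_QUEUE << { id: id, payload: payload }
      id
    end

    id = enqueue_slow_job("reindex user 42")
    puts "accepted job #{id}, returning to the client immediately"
    sleep 0.1 until RESULTS.key?(id)              # the client (or app) polls for completion
    puts RESULTS[id]
    worker.kill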
To put it another way, it's the old adage of "don't do long calculations on the UI thread." In this case we have a Service-Oriented Architecture, so we've got a UI service--but we still don't want it to block. By default, Heroku basically exposes Unix platform semantics; unless you wrap those with Node or Erlang, you have to deal with how Unix does SOA: multiple daemons, and passing messages over sockets. Heroku could "intelligent-route" all they want, but there's no level of magic that can overcome applications designed in ignorance of how concurrent SOA architecture works on the platform you're designing for.
However, the more concurrency your stack can do, the less this matters. Assuming the probability of a node getting assigned a long request is 1/5, the probability of a node getting assigned 4 of them is (1/5)^4 = 1/625. If your stack can do 20 concurrent requests, it's (1/5)^20 = 1/95367431640625.[1]
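Or, spelled out as code (same back-of-envelope assumption that each in-flight request is independently a long one with probability 1/5):

    # If a node runs C requests at once and each in-flight request is
    # independently long with probability 1/5, the chance that every slot on
    # that node is occupied by a long request is (1/5)**C.
    p_slow = 1.0 / 5
    [4, 20].each do |c|
      printf("concurrency %2d: P(all slots slow) = %.3e\n", c, p_slow**c)
    end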
Note that this overlooks the fact that a node that has requests queued only drains them at a certain rate, as long requests tend to pile up. However, if your framework is truly concurrent and your nodes are not CPU bound, but rather waiting for other services, it's possible to get much higher concurrency than just 20.
You misunderstood my reply. It means that each dyno does more work by being concurrent, not that you buy more dynos.
It's possible to do several hundred requests per second on a single dyno; just because Rails doesn't allow you to do that, due to not being concurrent, doesn't mean that other stacks can't.
Rails isn't "not concurrent", it simply uses multiprocess concurrency rather than multithreaded concurrency. Unicorn, for instance, uses a forking model, which is why Heroku recommends it. Ruby 2.0 also improves the runtime's copy-on-write performance, which makes it cheaper to fork processes.
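For reference, a Unicorn config in roughly the shape Heroku's guides suggest (details vary by app; the ActiveRecord hooks only matter if you preload a Rails app):

    # config/unicorn.rb: several forked workers per dyno give you multiprocess
    # concurrency, and preload_app lets the forks share memory via copy-on-write.
    worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
    timeout 30
    preload_app true

    before_fork do |server, worker|
      # disconnect shared connections in the master before forking
      ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
    end

    after_fork do |server, worker|
      # each worker re-establishes its own database connection
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
    end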
Still, there's a resource limit to how much concurrency you can achieve within a dyno as opposed to by buying more dynos. You shouldn't have to scale horizontally just because the routing scheme is inefficient with the width that it has.
If Heroku really wanted to abandon Rails because other web stacks are easier for them to scale with, they should have let us know rather than turning around and shitting on the platform that made them as a business.
Rails is not concurrent. Forking the process does not make it concurrent, either. Parallel, yes, but not concurrent.
Concurrency is the ability to do work on multiple things at once, while parallelism is the ability to execute multiple things at once. For example, an operating system running on a uniprocessor is concurrent, but not parallel. Another example is HAProxy, which is highly concurrent but not parallel (it can handle several thousand connections, but is single-threaded).
The distinction is important when you're trying to scale. Adding more threads/processes does not help, as you quickly reach OS limits (e.g. the C10k problem). Having a concurrent web stack (nodejs, EventMachine, Play, Xitrum, Twisted, Yaws, etc.) does, as it allows extremely high concurrency with a limited resource impact, whereas adding more processes quickly hits memory limits (at least on Heroku).
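To illustrate concurrency-without-parallelism with nothing but the Ruby stdlib (a toy echo server on an arbitrary port; one process, one thread, any number of simultaneous slow clients multiplexed with IO.select):

    require 'socket'

    # One process, one thread: IO.select multiplexes any number of slow clients
    # without a thread or process per connection. Concurrent, not parallel.
    server  = TCPServer.new(9000)
    clients = []

    loop do
      readable, = IO.select([server] + clients)
      readable.each do |io|
        if io == server
          clients << server.accept           # new connection, no new thread
        else
          line = io.gets
          if line
            io.write("echo: #{line}")
          else
            clients.delete(io)               # client hung up
            io.close
          end
        end
      end
    end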
By that definition, it's more typical for databases and load balancers to be highly concurrent. If you have that, then all you need from your app servers is parallelism. Unless you have an especially bad load balancer that does something stupid like distribute requests randomly, at which point your app servers have to do some of their own load balancing.
And even at that point, you're still using more dynos than you would with intelligent routing.
Some web stacks are highly concurrent too (we're running one on Heroku). On a concurrent stack, it doesn't matter if one request takes longer because it does not prevent other requests from completing, so random routing is optimal.
Leastconns routing falls flat in many cases, such as long polling/streaming/websockets, which count as connections but barely take any resources. If you had a load balancer doing leastconns, the servers holding many long-poll requests would end up underloaded (i.e. doing nothing) while the ones with few connections would end up serving most of the application.
Random load balancing still wouldn't be optimal because, on aggregate, the dynos that happen to receive more expensive requests still get overutilized and the dynos that receive less expensive requests still get underutilized. This is assuming, probably validly, that there's a power-law distribution to how expensive requests are.
Maybe you need entirely different load balancing strategies for different designs of web application, which means Heroku's promise of a single infrastructure stack for everything is bogus. But I'm skeptical that they deliberately chose to favor evented frameworks rather than choosing what was easiest for them to implement.
Your front end should not be doing anything "expensive," ever. If you're doing something like transcoding video in your frontend, your performance will go down the drain, for obvious reasons.
If your web app can only process one thing at a time, everything is expensive. Someone made a database query in the admin interface that runs for 20 seconds? Oops, hope you have other servers. Ten 100ms requests queued? The last guy has to wait 1000ms for his reply.
If it can process things concurrently, it doesn't matter, as long as you're not doing something that uses all the CPU (which shouldn't happen, assuming a sane design). Someone made a database query in the admin interface that runs for 20 seconds? Doesn't matter, you still have n-1 database connections to process the other queries. Ten 100ms requests hit your server? That's fine, the last guy will have to wait maybe 120ms for his reply.
If your app cannot process things concurrently, it should not accept connections for further work if it is working on something, period. This shifts the load balancing back onto the load balancer, because it has to find a worker that can process a request. And guess what, random assignment works fine in that case.
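The arithmetic above, spelled out (the 2ms-per-request CPU slice is an invented number, just to land near the "maybe 120ms" figure):

    # Ten 100ms requests arrive at once.
    service_ms = 100
    n          = 10

    # one-at-a-time server: the last reply comes back after the other nine finish
    serial_last = n * service_ms                             # 1000ms

    # concurrent server: requests overlap on I/O; assume ~2ms of CPU-only work
    # per request that cannot overlap
    cpu_slice_ms    = 2
    concurrent_last = service_ms + (n - 1) * cpu_slice_ms    # ~118ms

    puts "serial:     last reply after #{serial_last}ms"
    puts "concurrent: last reply after roughly #{concurrent_last}ms"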
If only Heroku allowed dynos to stop accepting further requests rather than queuing at the dyno level!
I don't see how forking is insufficient concurrency for the cases you mention, either. Even the Ruby GIL will allow threads to switch while blocking on I/O, so your ten second DB query is covered.
But there's no free lunch--if 2000ms of CPU ends up on one of your servers, and your median request is closer to 100ms, the unlucky server that gets the 2000ms request is still a little fucked without any mechanism to counterbalance. Even letting the server stop accepting new requests from the LB, as you suggested, would be sufficient.
> If only Heroku allowed dynos to stop accepting further requests rather than queuing at the dyno level!
It does and it's in the configuration for your app, not on the Heroku side of things. For example, if you're using Unicorn, check the documentation for :backlog, which controls the connection queue depth of the web server.[1]
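A sketch of where that knob lives (the number here is only illustrative; the right value is very app-specific):

    # config/unicorn.rb: the :backlog option mentioned above. A small listen
    # backlog makes a saturated dyno refuse new connections instead of quietly
    # queueing them behind whatever it's already chewing on.
    listen Integer(ENV["PORT"] || 3000), backlog: 16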
> I don't see how forking is insufficient concurrency for the cases you mention, either. Even the Ruby GIL will allow threads to switch while blocking on I/O, so your ten second DB query is covered.
Can it handle 10, 20 or 100 long database connections/web service queries? I admit I don't know much about Ruby threading and per-thread overheads, and in that particular case green threads are perfectly fine.
> But there's no free lunch--if 2000ms of CPU ends up on one of your servers, and your median request is closer to 100ms, the unlucky server that gets the 2000ms request is still a little fucked without any mechanism to counterbalance. Even letting the server stop accepting new requests from the LB, as you suggested, would be sufficient.
There shouldn't be any requests that take that much CPU time, unless you're doing something ridiculous like sorting millions of integers or whatnot. If, for some reason, your application requires sorting millions of integers or other computationally expensive work, it should be handled in a backend queue and the result sent to the client using long polling/comet/websockets/meta refresh, so that your front end can worry about delivering pages quickly and shoveling data between the backend and the client instead of crunching numbers.
In the limit of many concurrent processes, yes, this does converge. However, not even concurrent servers have truly independent response times: unbalanced load will slow a server down. Balanced routing is almost always better, if you can pay the cost of maintaining that state. The problem is only aggravated by servers that can only do one thing at a time.
“Intelligent routing” is round robin and requires knowledge of the number of request handlers. It's not intelligent at all; it just means that you are ruling out highly concurrent servers.
If you extended the whole thing to have a negotiation protocol that lets the load balancer know how loaded the individual backend processes are, then you could start talking about intelligent routing. That, however, currently does not exist in the Ruby world.
Yes, memory bandwidth is one of the biggest slowdowns in ray tracing. Rasterization accesses memory very linearly, which makes it very easy to cache and easy to optimize in hardware on GPUs. But ray tracing accesses memory very randomly: two rays originating from adjacent pixels easily diverge and hit objects far from each other in the scene. This means that ray tracing spends most of its time waiting for memory.
Yes. But the "constant factor" will be very large, even larger than for traditional ray tracing, for some global illumination algorithms, even when you choose to only render a small part of the scene. For example, photon mapping requires a pre-pass where you trace photons from the light source onto the scene, which means that reducing the pixel count doesn't decrease the time for the pre-pass. http://en.wikipedia.org/wiki/Photon_mapping
Purity has nothing to do with green threads. Many languages that don't have the concept of purity have green threads, like Erlang or Go. In fact, green threads are not part of the Haskell language itself, but of the runtime. Purity is just a nice thing to have when working with threads or concurrent programs.