Windows 3.1 All Over Again

Animats · on Sept 25, 2017

The real "Windows 3.1 all over again" experience is mixing functions which block with ones that are "async". If nothing blocks for long, as with Javascript, that's fine. If everything is threaded, and threads block, that's fine too. What works badly is mixing the two. If you block something that's doing cooperative multitasking, the whole system stalls. This is still a big problem inside browsers.

Python has this problem with "async". If you make a blocking call, you stall out the "async" stuff. Go gets around this by having a "fiber" or "green thread" mechanism underneath. If you block on a lock, that doesn't tie up a resource needed to run other goroutines. This is why retrofitting async to a language has problems.

vog · on Sept 25, 2017

Note that Python has greenlet and gevent, though. It works by monkeypatching all blocking calls in the standard library (and common external libraries such as database client libs like psycopg2).

Animats · on Sept 25, 2017

That's not a standard part of the new "async" feature. It's a leftover from the Stackless spinoff of Python.

gizzlon · on Sept 25, 2017

This feels like Elixir propaganda =/

Starts off nicely with a good premise, but instead of digging into the interesting details it glosses over them and starts to preach Elixir.

I don't have any stake in this, I use neither js or Elixir, but this article is too shallow to be really useful.

ENGNR · on Sept 25, 2017

Yeah he stakes a bit of a stab at scala play, which as far as I can tell in the latest release uses the same concurrency mode (actors) as Erlang anyway

Lerc · on Sept 25, 2017

I tend to think of it as a pre-emption issue more than a concurrency or parralellism issue. Endless loops cause hangs. Being able to pre-empt things would solve a lot of the problem.

    10 print "mike is cool"
    20 goto 10

simply doesn't have a good analogue in javascript. Even in Web Workers a loop like that causes the worker to be unresponsive to events. The Shared Memory addition alows a rudimentary form of this by allowing a flag to make an aware program exit. It wouldn't be too hard to have a BASIC to worker compiler that compiled the above code to

    while (true) {
       print("mike is cool");
       if (shared_global_flag === true) break;
    }

There are a few javascript livecoding websites out there that update the code as you type. There is a problem with this. Broken (as half written code tends to be) javascript doesn't just fail to work, it breaks other things

for example

    var x=1;
    while (x<10) {
      output(x);
      x+=1;
    }

Typing this into a livecoding environment breaks because the while loop goes on forever until you type the code to increment x. It breaks everything because the while loop blocks you from being able to type the code that would get you out of the situation.

I don't see the wisdom of disaallowing something due to the specter of race conditions when its absence seems to cause a much easier to cause debilitating problem.

greggman · on Sept 25, 2017

apparently codepen runs the code through a parser and inserts exit checks in all detectsble loops.

https://blog.codepen.io/2016/06/08/can-adjust-infinite-loop-...

greggman · on Sept 25, 2017

as for why no races I think because it's basically impossible (or way too non performant) to write a JavaScript engine that can handle multiple threads deleting properties on objects. You'd have to lock on every delete.

Lerc · on Sept 25, 2017

That's the nature of JIT though, non-performant but working is ok for the bits that might cause trouble. JavaScript should just make fast the things it can.

I'd be ok for a light pre-emtion that would happen between javascript statements. Instead of genuain preemtion have the Javascript engine include break checks. A check of a flag and conditional branch (usually not taken) is not a huge expence. The cost can be especially ameliorated if you do the check only on statements that jump to earlier statements.

That fixes the problem of long running code stopping other things. It doen't need system level threads or parallelism, just a js engine doing spot checks.

BillinghamJ · on Sept 25, 2017

Author seems to slightly miss the point of Node. It’s designed for IO-bound work, and essentially nothing else.

If you have a long-running synchronous algorithm, you should not be running it in Node, or alternatively you could dispatch to another process/C lib and have it run in a true thread and asynchronously wait for the result.

jcelerier · on Sept 25, 2017

> Author seems to slightly miss the point of Node. It’s designed for IO-bound work, and essentially nothing else.

No remotely relevant app is only ever IO-bound. There are always some long lists to sort, some graphs to walk, some intricate canvas renderings to do.

romanovcode · on Sept 25, 2017

> long lists to sort, some graphs to walk, some intricate canvas renderings to do

This is probably one of the worst usecases for node.js

zbentley · on Sept 25, 2017

I'm not sure about that assertion.

I work on a giant [Health|Fin|Defense]Tech monolith which has been around forever, has do to everything for everyone, and has been worked on by hundreds or thousands of developers with radically different skill levels. It connects to many databases, external services, etc., and does some immensely complex data munging just to render what you'd think are simple pages (since the inflexibility of the backing model and limited space to make denormalizations of really big data mean everyone has to do super complex aggregations in the app server, across data sources X, Y, and Z, all to show the user a 10 row table).

In short, it's huge, ugly, and computationally expensive.

I was asked to quickly research the benefits of switching its platform (a single-threaded scripting language) to NodeJS (I wasn't told to research anything other than NodeJS, despite my objections).

I figured the savings would be minimal, since all our application servers are constantly running out of CPU (page loads crazy expensive, see above) or mem (aggregations crazy expensive, see above).

So I broke down what the app was doing on some representative servers, working both from the coarse level (dtrace/system resource usage) to the fine level (flame graphs of calls/wait time/yield events within the application runtime itself). I didn't profile the batch processing services; they were RPC'd to via the renderers, and used more appropriate languages/patterns for huge-data manipulations. As far as my profiling was concerned, they were functionally databases.

The result? 88% on average time spent waiting on IO or blocking non-block-file system calls. P90 was 99% blocking.

That went totally against my assumptions.

Sure, our webservers were overloaded with non-IO load, but if we were to switch to non-blocking IO and buy more webservers, we'd have gotten a massive performance increase without having to change the fundamental architecture of our webapp.

That was when I started seriously considering the benefits of reactive-style programming in a single thread, a la NodeJS; it hits a nice balance between "programmers that aren't necessarily super skilled having to engage with a full/real concurrency system" and "do everything blocking one per process".

There are tons of downsides, of course. Switching to nonblocking IO after spending so long in a blocking world would require both massive technical expenditure, and would probably also require reorganizing the capacity planning of all the other services/databases the app servers talked to, since they'd be fielding a lot more requests. Basically, the blocking nature of the render loops was an informal rate limiter on database queries. Parallelizing the render loops via processes gave much more direct control of changes in resource utilization, which is nice for proactive scaling. Additionally, node/callback style is still harder to learn (even with async/await sugar) than plain 'ol top-to-bottom sequential code. All that said, we'd be rewriting code in a new platform that looked different, but the code could still do the same things, per render, in the same order, which is a huge benefit.

A platform that hides preemption/concurrency while allowing people to program in the sequential style (e.g. Erlang) might have been a better fit, but . . . we were already using one of the best M:N resource schedulers in the world, the Linux process scheduler, to multiplex concurrent sequential processes that were just . . . linux processes. At the end of the day, I gained a lot of respect for the power and balance struck by single-thread/event-loop-driven reactive runtimes like Node.

Edits: grammr.

le-mark · on Sept 25, 2017

So I broke down what the app was doing on some representative servers, working both from the coarse level (dtrace/system resource usage) to the fine level (flame graphs of calls/wait time/yield events within the application runtime itself). I didn't profile the batch processing services; they were RPC'd to via the renderers, and used more appropriate languages/patterns for huge-data manipulations.

That's very interesting; you were given a huge old legacy app that had scaling issues (cpu and memory). Presumably the business was tired of throwing hardware at it? Or they hadn't gotten to that stage yet? Continuing, and tasked with diagnosing the performance problems you looked at the system and application. I can read up on dtrace, but how did you profile the application level stuff (time/yield)? Was it some functionality provided by the run time? Ie java visual vm for example?

This is a problem many, many companies have; ill performing legacy apps that the "legacy" staff aren't capable of handling (because the talented people left long ago ie "don't move my cheese"). It'd be really educational to see a write up of this!

zbentley · on Sept 25, 2017

It broke down roughly like this. I can't write it up, and am being vague, because I don't wanna get yelled at, sorry. Googling the below techniques will get you started, though.

1. Simple resource usage (system time vs. user time, memory, etc.) got the metrics for how long the OS thought the app was spending waiting on IO.

2. Dtrace was able to slice those up by where/how they were being called, which syscalls were being made, and what was being passed as arguments. This was important for filtering out syscalls that would remain a constant, high cost (e.g. block local file operations on old versions of Linux, which we have, get farmed out to a thread pool in NodeJS, so I pessimistically budgeted as if that thread pool were constantly exhausted due to volume + filesystem overuse).

3. In-runtime profiling. We have the equivalent of java visual VM (well, more primitive; more like jstack plus some additional in-house nice features we built, like speculative replay), but for our scripting language platform. That generated flame graphs and call charts for processes. Those were somewhat useful, but tended to fall into black boxes where things called into native code libraries, which was where the dtrace-based filtering data was able to help disambiguate. Using this we got a comprehensive map of "time spent actually waiting for IO".

There was a lot more to it than that, though:

Since different syscalls both had different call overhead (and different call overhead depending on arguments supplied) and different blocking times, all 3 steps were necessary.

For example, an old monolithic chunk of code that did ten sequential queries to already-open database connections is going to issue select(2) (or epoll or whatever) at least ten times. Conversion to Node, and it's single-poll-per-tick model, would make that cost vastly reduced, making it move the performance needle a lot. Of course, that's only true if the ten queries in question can actually be parallelized, which typically requires understanding the code . . . if it can be understood, which is not a given.

However, a page render that called ten different HTTP services would make ten full-cost connect(2) calls in the worst case, ten low-cost (keepalive'd) connect calls in the best case. Node would still have to make those same ten calls, making it a less needle-moving thing to move into nonblocking IO (though the time spent waiting for connect to complete or time out would still not be paid directly in the render, which had to be accounted for as a positive). And it goes deeper: depending on the services being hit, keepalive window, and rate at which they were called during a typical server's render operations, we had to calculate how often, say, a 50-process appserver worker pool would be redundantly connecting to those services (because separate sibling processes can't share the sockets if they're initiating the connection, and before you ask, I would not like to add to the chaos by passing open file descriptors over unix sockets between uncoordinated processes thank you very much). If the redundant connect rate was high, Node might offer significant savings by allowing keepalive'd sharing of connections within a single node process (we'd need many fewer of those per server than appserver workers). If it was low, fewer savings.

TL;DR it's complicated but possible to measure this data using established techniques. You don't have to get super far down the rabbit hole to get a decent guess as to whether it will be beneficial to performance, but transforming decent into good requires a fairly thorough tour of Linux internals.

And, as always, the decision of whether or not to switch hinged primarily on the engineering time required, not the benefits of switching. C'est la vie.

throwme211345 · on Sept 25, 2017

VA and mumps is what it sounds like to me. It's broken and walk away.

zbentley · on Sept 25, 2017

Good guess, but no cigar.

creatonez · on Sept 25, 2017

That's what other languages are for. Logic handling IO goes in Node.js, and everything else can go in native modules, child node processes, or other services that fulfill requests.

If you can design an application where this kind of separation won't overcomplicate everything, Node.js is probably pleasant to deal with.

golergka · on Sept 25, 2017

> No remotely relevant app is only ever IO-bound.

A lot of microservices that these apps could be split to certainly are.

BillinghamJ · on Sept 25, 2017

I run an motor insurance company. The vast majority of our services are indeed exclusively IO-bound. The very small amount of non-IO-bound work happens mostly in Go.

ubernostrum · on Sept 25, 2017

No remotely relevant app

Define "relevant".

jstewartmobile · on Sept 25, 2017

Not sure that's fair. I think his point was that instead of improving performance by switching to async and navigating callback soup and task unfairness, we should really be improving the performance of context switching, and he probably has a point.

Just a few weeks ago there was an interview with Ryan Dahl where he said:

"...I think Node is not the best system to build a massive server web. I would definitely use Go for that. And honestly, that’s basically the reason why I left Node. It was the realization that: oh, actually, this is not the best server side system ever."[0]

[0]https://www.mappingthejourney.com/single-post/2017/08/31/epi...

nine_k · on Sept 25, 2017

Erlang was also designed for IO-bound work, and Elixir builds on the Erlang VM and stdlib.

What would be the point of saying that Elixir is superior in this situation, were it not?

baybal2 · on Sept 25, 2017

The author seem to loose the point of FP, as is the case with many web programmers

Async != FP

Back in my high school days, FP was almost strictly about usage of pure functions above anything else.

The late comers to the party, the React guys, then bunched FP with async stuff, and function polymorphism. That was wholly apocryphal to the original FP idea.

I can still cite my high school teacher citing Niklaus Wirth: "If a function does not interact with external data in a any way other than reading them from its inputs, the whole program can be represented as a spreadsheet"

The original advantages were states as:

1. a program written in this manner is very easy to optimize. 2. an interpreter or a compiler executing such program is easier to program, and is less error prone 3. for as long as a kind of middleware sits in between the "spreadsheet function," and the data, it is possible to write a continuous execution log with ease, as well as to do tricks with getters/setters to start execution from arbitrary points in code.

All of that is stuff from times when Oberon and Modula were "hip" (seventies)

pjmlp · on Sept 25, 2017

Oberon was designed in 1986 and it was hip up to around 1998. :)

andrewguenther · on Sept 24, 2017

> “what else am I supposed to do when I’m waiting for a result from the database”? I can’t do anything else until I have that result. I need the result to make give to a template to give to the renderer, etc.

I dunno, maybe you could serve one of your 100 other concurrent requests?

> You can do something useful in the meanwhile and get the result later

and then immediately...

> It has nothing to do with hogging or stealing resources.

It has everything to do with this. If you're blocking on something waiting for a result while you could do something useful in the meantime, that is the definition of hogging resources.

rdtsc · on Sept 25, 2017

> maybe you could serve one of your 100 other concurrent requests?

That's exactly the problem with node.js and other promise/callback/deferred/futures mechanism - it mixes concerns. You business algorithm probably doesn't look like "1) authenticate & authorize 2) return a callback then go to select loop to process 100 other connections 3) add book to cart etc". It probably doesn't have 2) in there, and it shouldn't. The point was that that particular request doesn't and shouldn't have anything else to do besides wait for the connection from database to return a result.

Sure there maybe there are 2 million other (https://blog.whatsapp.com/196/1-million-is-so-2011) connected clients wanting to buy a book, but the logic for a single doesn't have to care about it. That's the beauty of it.

> If you're blocking on something waiting for a result while you could do something useful in the meantime, that is the definition of hogging resources.

That's not something you should be doing that high up the abstraction levels. If you have to worry about it that high up there is something broken with the framework / language / library.

Like the OP says, it is basically the equivalent of Windows 3.1. Nobody sane today would put their crown jewel, latest production systems running on Windows 3.1 but yet we do with when it comes to our choice of frameworks and libraries.

christophilus · on Sept 25, 2017

His point is that blocking while waiting should not be an expensive thing. In preemptive systems like BEAM, it's not. Blocking a process is not an issue. In Node/Windows 3.1, it is a big deal, as it means starving other work.

pas · on Sept 25, 2017

Await works the same way as BEAM, no?

In theory their performance and behavior should be the same. BEAM and other N:M green thread systems being maybe faster if they can efficiently use multiple threads. But since they use message passing anyway (needed for isolation), they are again almost equivalent to running multiple Node/reactive request handlers.

N:M is probably useful for desktop apps. But for that single thread is (should be) always enough, so same behavior.

jerf · on Sept 25, 2017

"In theory their performance and behavior should be the same."

This isn't about performance, it's about how you write code. In Erlang/Elixir (and also Go and Haskell), you aren't sitting there constantly explaining to the runtime over and over again how to plumb together bits of "async" code. Essentially, all code is simply presumed async, and the plumbing is always already there for you; the act of calling a function includes, for free, all the "async" plumbing.

I expect anyone who has been doing this for a couple of months should have noticed it's the same plumbing, over and over again. Here's the long-running IO task. Here's what to do with it when it's done. Here's what to do with errors. Here's the next long-running IO task. Here's what to do with it when it's done. Here's what to do with errors, which is pretty similar to the last thing. Here's the next long-running IO task. Here's what to do with it when it's done. Here's what to do with errors; whoops, forget that one, which is gonna cost me during debugging but nothing in the code, runtime, or syntax affordances particularly cares that I forgot. Here's the next long-running IO task....

Yes, they do come out roughly the same in speed for Erlang/Elixir and JS, and Go can quite easily be substantially faster... and easier to write.

And if we want to discuss performance, Go still isn't as fast as it could be. There's a solid reason why N:M threading can't be the fastest way to run code in general, but I don't see much reason why it couldn't be something more like 20% away from C speed in general, with some pathological cases where it's much slower and out of the question (very fast packet handling with those fancy software TCP cards, for instance).

WorldMaker · on Sept 25, 2017

async/await as mentioned by the previous poster is "do-notation" for the asynchronous monad Promise<T> in modern ES/JS(/TS). It's not terribly different from the languages you mentioned, even if it is a different approach because the language started synchronous first. You don't have to do that much plumbing as with the bad old JS world of Node callbacks, just sprinkle the async/await as appropriate, make sure things stay in Promise<T> and write things do-notation style just about like you would write the equivalent synchronous code.

From what I've done async/await in ES (and C#) is easier to write than Go's equivalent, but we're hitting your mileage may vary territory.

kabes · on Sept 25, 2017

Well, as the article points out, await is not pre-emptive, so it doesn't have the same fairness guarantees as BEAM. But IMO this argument is mostly theoretical as most web applications deal with simple request/response kind of things and have few long running tasks. Although a non pre-emptive system might be easier to DOS as soon as you find one input that takes a long time to process.

pas · on Sept 25, 2017

Await doesn't have to be as everything (im theory) is non-blocking.

Yes, I see, if you have data crunching in the event loop, you have to "manage" it, by either delegating it outside of it, or chunk it up into small pieces, and yield between.

But that should result in the same performance as BEAM, because you can run 1 big runtime or many small event loops on each box, and put a load balancer in front (which is needed even for BEAM, because if you have a more threads/processes than cores throughput AND latency will suffer).

Agreed on the DoS thing, but nothing, sort of a proper timeout (and/or hard RT) will save you. Yes, with Node et al. you can lose a lot of requests when you inevitably kill the runtime. But it's just as easy to DoS a different runtime. (BEAM just gives up a few seconds later.)

noncoml · on Sept 24, 2017

Can we stop with the erlang/Elixir is slow? For most web services it is as fast as go.

For my last side project I was evaluating go and erlang. Just to test out how faster go is, I quickly coded a simple HTTP request in both language(I am not an expert in either).

The server would get a request with large body, cache it to a file on the disk, then load it in 128k chunks, get the hash of the chunk and save both the digest and the 128k data to a database.

To my surprise, erlang was consistently slightly faster for each request. Happy to share the code snips of both if anyone want to audit/scrutinize.

Edit: Code for both: https://pastebin.com/aBQWqkG3

On my laptop a single request(1.2MB) takes around 230ms for Erlang and 280ms for Go. Not a scientific test, but it gives an idea.

atombender · on Sept 25, 2017

Your Go code uses "defer" in a for loop, which could be a performance bottleneck. The deferred function call doesn't run until the saveToDb function returns. A simple fix would be to reuse the rows variable and put the defer outside the loop.

I believe it might, depending on the database driver, also cause another issue. Since you're not closing the rows right away, and not even consuming the rows, this may force the driver to mark the internal connection as busy until the defer runs, meaning the next database call would have to open a new connection. Try closing the rows as soon as you can.

If you don't expect any results, you can also use Exec instead. Then there's nothing to close.

noncoml · on Sept 25, 2017

Just to update that you where right. Without the `defer` now golang is as fast as erlang.

Exec vs Query didn't make a difference.

Yet another update: things get really interesting when I run wrk. I don’t want to spoil it for you. I can let you try it ;)

egisspegis · on Sept 25, 2017

Please spoil it for us. I would love to see results, but can't run wrk myself at the moment.

noncoml · on Sept 25, 2017

I get 2x more requests on erlang

noncoml · on Sept 26, 2017

OK, I was unfair to golang again. Turns out it needs a bit more tuning "db.SetMaxOpenConns(10)" and now golang and erlang are on par.

noncoml · on Sept 25, 2017

I keep the habit of deferring immediately so I don’t forget later if the fucnction gets long. But I see what you mean, on this case, inside the loop, is really bad idea.

Will try with Exec and without the defer and see if it makes a noticeable difference.

The driver is the MySQL one for what’s worth it. Postgres was a bit slower.

kenhwang · on Sept 25, 2017

We did this experiment too! When we did our initial benchmarks, Go had a healthy performance lead on pure CPU or IO workloads over Elixir and Scala.

Then we went through with the full POC of our toy webserver project and the performance difference between Go, Elixir, and Scala was pretty much insignificant (within 5% of each other).

kjksf · on Sept 25, 2017

The difference is that in Go sha256.Sum256() is implemented in Go and I'm pretty sure in Erlang crypto::hash() is implemented in C.

The program is mostly i/o with sha256 being the only CPU intensive part. If sha256 (or any compute-intensive part) was implemented in Erlang, it would be at least 10x slower than Go version.

Go version probably has a bug: file.Read() doesn't guarantee to fill the buffer, you need to use io.ReadFull() for that.

Both versions unnecessarily write a copy of the file to disk (and don't seem to delete those randomly-named files from disk).

In Go saveToDb() would take io.Reader as an argument and you would pass r.Body directly.

Go is much faster than Erlang, it just doesn't show up in every toy program.

noncoml · on Sept 25, 2017

Just to update that getting rid of the part of saving to the disk first, and getting rid of the defer as another person suggested, the number now are around 200ms for golang and 210 for erlang.

IMHO it is not as huge as everybody makes it look like.

Again, as you very correctly pointed out, this is of course only for IO intensive applications.

noncoml · on Sept 25, 2017

Did you miss the: “For most web services it is as fast as go.”?

I think the request in my test is quite representative of what an actual web server will be like: mostly IO.

As a general programming language, go is much faster, I completely agree with you.

Btw: I don’t think deleting the files would make a difference. Also I save the body to file first because with cowboy I had trouble splitting it to the exact chick sizes. Will try without it in go and update.

microcolonel · on Sept 25, 2017

Elixir is slow enough to get in the way, if we're talking about complex processing tasks. I (along with a team) was tasked with writing a fairly complex accounting/numerics application in Elixir, and one day I had to basically call the whole thing a thousand times more often per request. It took me weeks to accumulate enough microoptimizations to make the thing function (because it was already reasonably well designed).

It's a sort of a "it's fast enough until it isn't" type thing. Python is fast in the same way (and faster with each release!).

For anything that isn't dominated by network and database time, language and runtime performance matters, and OTP/Elixir is slow in that way. In terms of "real world performance", it can sometimes matter more that the incremental cost of preempting your beam process is low, more than whether your process completes in more or less time.

noncoml · on Sept 25, 2017

Completely agree.

What saddens me is that only Erlang/Elixir get the “it’s slow” argument although they are in the same class as python and ruby.

dahauns · on Sept 25, 2017

I know what you mean. I don't know how often I've read "I prefer python/ruby/etc. because its not as bloated and slow as Java"...sigh

kpil · on Sept 25, 2017

The only thing slow about java is the time to start the jvm.

For everything else to be faster you need asm, c, c++, fortran, or rust.

Loading an applet over a modem in 1998 doesn't count...

sz4kerto · on Sept 25, 2017

Starting the JVM is <0.5 seconds. On my box it's ~0.1 sec.

majewsky · on Sept 25, 2017

I have programs doing nontrivial things that run in 5 ms, from execve() to exit(). That's milliseconds, not seconds.

kpil · on Sept 25, 2017

Exactly, even fairly complex script languages manages to boot, execute and exit in just a few milliseconds.

Maybe jigsaw is one piece in the puzzle to reduce the time it takes to start.

I doubt it's so bloody complicated, except AOT compilation of all system jars, I suspect it's mainly a matter of getting the time and resources to look at what happens when hello.world is executed.nu

That and relabling executable jars to jxe and the desktop is Java's to grab (that last part was a joke)

dahauns · on Sept 26, 2017

Exactly what I was going for.

le-mark · on Sept 25, 2017

This articles is a fantastic, real world overview of cooperative and preemptive multitasking. Note that coroutines and fibers are essentially equivalent because they can be implemented, each with the other[1]. Unfortunately this thread (up to now) has been simply language tribal-ism. Helps if everyone is on the same page with respect to technical terms before that all starts.

[1] https://stackoverflow.com/a/3325985/2561675

spc476 · on Sept 25, 2017

There are other concepts, like coroutines. At work, I use Lua coroutines to handle network requests. Basically, any function that can block instead starts the IO request, then yields the current coroutine. Yeah, the management code was a bit of a pain to write, but you can write (using the BASIC example from the article and not Lua, but the idea is similar):

    10 A = 1
    20 A = A + 1
    30 PRINT "HELLO " + A
    40 PRINT "YES"
    50 GOTO 20

where `PRINT` will do a non-blocking write, if there's anything left over, set the IO descriptor to trigger when ready for writing, and yield(). It's in the poll loop (select(), poll(), epoll(), kevent(), whatever the API is) to schedule the proper coroutine.

Yes, this requires making sure the code doesn't do anything CPU intensive, but for the application I have running, that is the case. But for me, using Lua coroutines means the main business logic is still sequential, even in an event driving environment.

Also, because you call a function to preempt execution, you don't need to save the entire CPU state. For C on the x86-64 bit, this is just six registers: https://github.com/spc476/C-Coroutines/blob/314607cf058352be...

ComodoHacker · on Sept 25, 2017

>In Elixir (thanks to Erlang) yo have true pre-emptive multi-tasking

>An Erlang process is not a system process. An Erlang process is a lightweight process. But here’s a key difference: there is no way in Erlang for processes to share memory

I'm confused. Is there real concurrency in Erlang? If yes, what Erlang processes use under the hood: system processes or threads? If threads, why they aren't sharing memory?

ramchip · on Sept 25, 2017

The VM has a set of threads called schedulers, all running in parallel. Each scheduler has in turn a set of Erlang processes (possibly thousands) to run. Every process has a pointer to its own separate chunk of heap where it allocates its stuff.

The language has no concept of pointer, so there is no way to create a reference to memory owned by another process. You can send a message to another process, but internally the VM will copy the message to that process’ heap.

devbug · on Sept 25, 2017

Mapped to system threads. Separate heaps to simplify garbage collection (and improve performance.)

AndrewCHM · on Sept 25, 2017

"I’ve heard a few reasons:" ... that you've considered, then cherry picked two of them, so you can confirm your views?

To properly reason why the approach is wrong, shouldn't you consider all significant reasons, including backwards compatibility as probably the biggest one?

"New language that learns from the mistakes of languages before it" would generally be better than "language and runtime that is keeping compatibility with a programming language that was heavily rushed just to fill a feature point for a web browser"

One might as well state that the grass is green and sky is blue, no?

norswap · on Sept 25, 2017

With all due respect, "the reasons he heard" are bogus.

The only reasons there is no preemptive multithreading in Node is just that the V8 interpreter is under GIL.

qualitytime · on Sept 25, 2017

" 10 A = 1 20 A = A + 1 30 PRINT_NON_BLOCKING "HELLO" + "A", when done callback { LINE 40 } 40 PRINT "YES" 50 GOTO 20 A common quiz for Javascript is to ask: which is printed first? “YES” or “HELLO3”? "

Is that a mistake with "HELLO3"? Should it not be HELLO2? Or am I stupid?

vog · on Sept 25, 2017

Please note that your formatting is totally screwed, making your fine comment almost unreadable.

See also the "help" link next to the edit box: https://news.ycombinator.com/formatdoc

z3t4 · on Sept 25, 2017

With nodeJS using fs.readFile is basically the same as scaling across multiple machines. NodeJS kinda force you to learn how to manage concurrency, eg what people call callback/promise/future hell.

chaostheory · on Sept 24, 2017

> Concurrency is hard. If you want concurrency you have two choices - processes or threads, take your pick.

Sometimes there's a 3rd choice: actors. imo it's much easier to manage than threads

atombender · on Sept 25, 2017

In Erlang, actors are called "processes".

syrrim · on Sept 24, 2017

Actors the same as what the OP is describing as processes I believe.

annon23 · on Sept 24, 2017

yup... I think browser need to come with some type of bytecode or vm that supports concurrency so we can finally have a true app platform.

EdSharkey · on Sept 25, 2017

Assuming you're not being snarky, hold on to your pants ...

As a baseline, you have Asm.js[1] + Web Workers[2] today in most browser places (actually, all places, if you creatively polyfill.) For newer-fangled browsers only, you have Web Assembly[3] + Web Workers, which takes in-browser performance to a whole 'nother level.

Now, the Web Assembly spec isn't stopping at replacing Asm.js. There's going to be support added for SIMD and vectors and whatever cool stuff newer processors can do. The big deal will be once Web Assembly gets the ability to do syscall-like thunks to DOM API and the other myriad JavaScript API. At that point, the browser's a full OS platform capable of hosting most applications. This new thing will then be able to go beyond any previous application platform in terms of reach and ultimately, capability.

Just forget about the term "browser" when the thunks appear. This will be a new "Platform 1.0" that will span all modern operating systems and devices. Node.JS will implement a compatible "Platform 1.0" subset for server-side platforming where client and server will be able to share binary libraries.

This is coming soon. If you've got some great idea for a next generation online app, it's time to start working hard and have it target "Platform 1.0"!

[1] http://asmjs.org/

[2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers...

[3] http://webassembly.org/

rixed · on Sept 25, 2017

Browser need to come with a bytecode that would run into the local container on the local operating system running on the VM on another operating system running on top of real hardware, so we can finally have a true app platform.

icebraining · on Sept 25, 2017

What's your point, exactly? Shipping bytecode is essentially the leanest way you can achieve platform-independent code execution, no?

rixed · on Sept 26, 2017

My point is that we could ship bytecode long before browser were even invented, therefore I doubt that the explosion of complexity is caused by us trying to solve that problem.

This is easy to test: we just have to wait until we can ship byte code through the browser, and see how long it takes for another layer of abstraction missing some important feature to pop up on top of it.

andrewguenther · on Sept 24, 2017

Don't you already have this with WebWorkers? This isn't a limitation of the environment per se as much as it is a limitation of the DOM.

pmontra · on Sept 25, 2017

The DOM wants to be accessed sequentially by a single thread, then we ended up to say single threading is good because we don't have alternatives and because we're using the same frontend technology for backend jobs (v8.) It's a kind of Stockholm syndrome.

Sequential access to the DOM can be ok because we are the only user of the browser. Single processing is not so ok on the backend because there could be thousands of users there. We scale Ruby, Python and Node with multiple processes (I'm doing it.) I'm also developing an Elixir application using Phoenix. The approach is similar to writing Rails or Django code (I never sent a single message in all the application) with the convenience of not having to manage sidekiq or celery for background jobs (they're in the language) and autoscaling.

annon23 · on Sept 24, 2017

we should develop software using different languages as well, web workers are part of the solution

solarengineer · on Sept 25, 2017

Java Applets - run bytecode in a VM (called the JVM) in the browser, and use multi-threading supported by the JVM.

chmod775 · on Sept 25, 2017

Can we stop repeating the meme that Node.JS is not multi-threaded?

True, your JS code all runs in a single thread, but all the heavy lifting is behind asynchronous interfaces, delegating the work to a threadpool in the background.

What a Node.JS application really is, is a supervisor thread (your JS code) giving work for a bunch of worker threads. Doesn't sound very single threaded to me.

Mithaldu · on Sept 25, 2017

Can you start a thread with a JS loop in node so you can have two loops running at the same time doing CPU-heavy tasks?

If not, then calling it multi-threaded is misleading to a programmer who needs such things, and instead you'll need to find a more accurate word.

chmod775 · on Sept 25, 2017

This will repeatedly run 100000 iterations of PKBDF2 in 10 threads:

    process.env.UV_THREADPOOL_SIZE = 10;

    function startOne() {
        const start = Date.now()
        require('crypto').pbkdf2(String(start), 'secretSalt', 100000, 512, 'sha512', (error, result) => {
            console.log(`${result.length}b took ${Date.now() - start}ms`);
            startOne();
        });
    }

   for (let i = 0; i < 10; i++) startOne();

You can use htop or another tool of your choice to confirm this is the case.

Mithaldu · on Sept 26, 2017

Are the threads started by way of external process, thread started within a C library, or started with an actual Node.js call i can write directly in node.js code and also use to run literally arbitrary node.js code?

chmod775 · on Sept 27, 2017

If I'm going to assume that by "node.js" code, you actually mean JavaScript code running within the V8 engine as a part of the whole that is Node.JS, then: The answer is no, you cannot do that using pure JS and without using (external) stuff like WebWorkers.

If you are talking more generally about anything we can run within Node.JS, is using Node.JS user-facing APIs and works without doing any modifications to Node.JS, the answer is yes. You can easily achieve that with node's native module support, and I would encourage you to do so over committing the folly of doing anything CPU intensive in JS.

That is if you manage to find any CPU intensive task that isn't already handled by node built-ins or some library out there.

Edit:

Now you may say: "Ahah! You can't use threads using [limited usage of node]. Also threads must be usable for arbitrary workloads in order to consider a thing multi-threaded. Node.js is single-threaded!"

I just can't argue with that. It's just a matter of opinions now about what makes a thing/environment single-threaded, and what makes it multi-threaded.

To me it's simple: [x] Thing uses multiple threads to perform work.

romanovcode · on Sept 25, 2017

> delegating the work to a threadpool in the background.

Yes, delegating to the C++ modules in the background, which do support multi-threading.

These modules have nothing to do with Node directly so meme stays - node itself is single-threaded no matter how you spin this off.

chmod775 · on Sept 25, 2017

> delegating to the C++ modules in the background

Eh, what are these if not part of node?

What kind of definition are you applying so that a thing is suddenly no longer the sum of its parts?

Edit: JavaScript may be single threaded. Node.JS is not. JavaScript is but a part of Node.JS.