I know it's a trade-off, but IMHO it's a very poor one.
That's premature optimisation. The API is forever. This decision sacrificed easy internationalisation and data correctness for a minor performance benefit in the current implementation.
It's a big deal, because node.js isn't merely encoding-ignorant (like PHP), it actually removes the high bits. If you forget to specify an encoding somewhere, your text will be malformed.
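A quick sketch of that mangling, using the current Buffer API (older Node spells it new Buffer(), but the behaviour is the same):

    // UTF-8 bytes for "café": 63 61 66 c3 a9
    var buf = Buffer.from('café', 'utf8');

    // Decoding those bytes as 'ascii' drops the high bit of each byte,
    // so the two bytes of "é" become two unrelated ASCII characters.
    console.log(buf.toString('ascii')); // "cafC)" (the accent is lost for good)
    console.log(buf.toString('utf8'));  // "café"  (round-trips correctly)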
ASCII is 7-bit (encoded in 8 bits - the high bit is ignored) and UTF-8 takes 8 bits for most characters, but can take 16+ bits for some characters.
Node is built for massive scalability in applications that (mostly) pass text from one source to another. Thus, having to convert the encoding of every string that passes through node can be a bottleneck.
It should be noted that "most" here presumably means "most characters in an average English or western/central European language text" as out of the ~2^21 (~2 million) Unicode code points, only 128 are represented using 8 bits in UTF-8.
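For a rough feel of the spread (just an illustrative snippet):

    // Only the 128 ASCII code points fit in a single UTF-8 byte;
    // everything else takes two to four bytes.
    console.log(Buffer.byteLength('a', 'utf8')); // 1 (ASCII)
    console.log(Buffer.byteLength('é', 'utf8')); // 2 (Latin-1 supplement)
    console.log(Buffer.byteLength('€', 'utf8')); // 3 (BMP above U+07FF)
    console.log(Buffer.byteLength('𐍈', 'utf8')); // 4 (outside the BMP)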
It doesn't matter. Whenever ASCII is an option, UTF-8 is optimal too.
ASCII is only an option for plain English with poor typography and no way to handle foreign names and addresses (e.g. LinkedIn made the horrible mistake of using Latin-1 initially; I still have contacts with &xxxx; visible in their names).
I think node.js should use UTF-8 by default, and require users to consciously switch bottleneck parts of their apps to ASCII.
I wasn't stating my opinion in my last post, just facts/clarifications.
But yes, I agree that UTF-8 would be a better default than ASCII unless someone provides hard evidence that encoding/decoding is a severe performance bottleneck in most real applications. (Even then, I'd default to the correct option, not the fastest.)
Felix doesn't seem to realize that JavaScript already has native functions for this. All of his code can be simplified to decodeURIComponent(escape(utf8ByteString)).
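For anyone who hasn't seen that trick: escape() percent-encodes each byte of the byte string, and decodeURIComponent() then reads those %XX escapes back as UTF-8. Roughly:

    // A JS "byte string": each char code is one raw UTF-8 byte (here, "café").
    var utf8ByteString = '\x63\x61\x66\xc3\xa9';

    // escape() -> "caf%C3%A9", then decodeURIComponent() decodes that as UTF-8.
    var decoded = decodeURIComponent(escape(utf8ByteString));
    console.log(decoded); // "café"

    // The reverse direction works the same way:
    var encoded = unescape(encodeURIComponent('café'));
    console.log(encoded === utf8ByteString); // true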
It's not the size, it's the work needed to decode and encode data.
Node.js is really, really good at shunting I/O around - it's ideal for writing things like proxies and file upload handlers. With ASCII, the bytes that come in are the bytes that go out again. If you're dealing with UTF-8 and Unicode strings, then every time some data comes in you need to decode it as UTF-8, pass the Unicode string around within Node, then encode it back to bytes before you send it off again.
That makes a lot of sense for a web framework like Django (in fact it's what Django does) but Node is more of an I/O toolkit, so that performance overhead isn't welcome unless it's explicitly needed.
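Here's a hypothetical proxy-ish sketch of the difference (example.org and the handler shape are made up): in the first form the chunks stay as raw bytes; in the second, every chunk gets decoded coming in and re-encoded going out.

    var http = require('http');

    // Pass-through: chunks stay as raw Buffers, no decode/encode work at all.
    http.createServer(function (req, res) {
      var upstream = http.request(
        { host: 'example.org', path: req.url, method: req.method, headers: req.headers },
        function (proxied) {
          res.writeHead(proxied.statusCode, proxied.headers);
          proxied.pipe(res); // bytes in, bytes out
        }
      );
      req.pipe(upstream);
    }).listen(8080);

    // The "strings everywhere" style pays for a UTF-8 decode on every chunk
    // in and an encode on every chunk out:
    //   proxied.setEncoding('utf8');
    //   proxied.on('data', function (chunk) { res.write(chunk, 'utf8'); });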
Modern CPUs are constrained by memory speed, and the amount of computation you do on each byte doesn't matter that much. Node.js already takes the hit by copying memory to convert UTF-16 to ASCII.
That should really be in big red text in the docs, considering it actually destroys bits. The API also seems inconsistent: net.Stream writes are encoded as ASCII by default, but plain writable streams default to utf8: stream.write(string, encoding='utf8', [fd])
If you want to keep track of hot, fresh node-y goodness independently of the Ubuntu release cycle (as I do), then please enjoy my nodejs PPA builds.
They're built for lucid, but run fine on maverick, and like everything else in that PPA, are used in production (thus I have an incentive to maintain them well).
https://launchpad.net/~jdub/+archive/ppa
Enjoy!
(Note: I build a static version of node, against the internal copies of the libraries it ships with, rather than the dynamic build used by the main Debian and Ubuntu node packages. I really only do this to avoid maintaining those libraries in my PPA as well, and ryah keeps up with their updates anyway.)