Interactive Word Clouds in D3.js

mbostock · on Feb 9, 2012

This is a truly impressive implementation, based on previous work by Jonathan Feinberg [1]. The display uses SVG, but the character outlines are computed from canvas bitmaps. Jason then implemented hierarchical bounding boxes to accelerate the intersection checks. Be sure to play with the draggable "0" and "1" letters beneath the demo!

Try clicking on words to navigate between word clouds. The cross-fade transition is beautiful, and words that appear in both clouds transition to their new positions.

Non-English word clouds look amazing, too:

http://www.jasondavies.com/wordcloud/#http%3A%2F%2Fsearch.tw...

[1] http://static.mrfeinberg.com/bv_ch03.pdf

jasondavies · on Feb 9, 2012

Thanks!

Minor correction: I implemented hierarchical bounding boxes separately, as per Jonathan Feinberg's paper, but they turned out to be slower in all the cases I tried (even large words and areas). I was also using a quadtree to cut down the number of comparisons with previously-placed words.

In my version, once a word is placed, I copy it to the relevant position in a large sprite representing all the words placed so far. So placing a new word means it only needs to be compared with the candidate area of the large sprite, rather than multiple comparisons with all previously-placed words.
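A minimal sketch of that single-sprite idea, assuming one byte per pixel for readability rather than the packed representation the real code uses. The names `createBoard`, `collides`, and `stamp` are hypothetical and not taken from d3-cloud:

```javascript
// Hypothetical helpers; one byte per pixel keeps the indexing obvious.
function createBoard(width, height) {
  return { width, height, pixels: new Uint8Array(width * height) };
}

// A word sprite is { width, height, pixels }, with 1 wherever the word has ink.
// Returns true if placing the sprite at (x, y) would hit an occupied pixel.
function collides(board, sprite, x, y) {
  for (let sy = 0; sy < sprite.height; sy++) {
    for (let sx = 0; sx < sprite.width; sx++) {
      if (!sprite.pixels[sy * sprite.width + sx]) continue;
      const bx = x + sx;
      const by = y + sy;
      // Assumption: anything outside the board counts as occupied.
      if (bx < 0 || by < 0 || bx >= board.width || by >= board.height) return true;
      if (board.pixels[by * board.width + bx]) return true;
    }
  }
  return false;
}

// Copy a placed word into the board. Call only after collides() returned false,
// so every inked pixel is known to be inside the board.
function stamp(board, sprite, x, y) {
  for (let sy = 0; sy < sprite.height; sy++) {
    for (let sx = 0; sx < sprite.width; sx++) {
      if (sprite.pixels[sy * sprite.width + sx]) {
        board.pixels[(y + sy) * board.width + (x + sx)] = 1;
      }
    }
  }
}
```

The cost of checking a candidate position depends only on the size of the new word's sprite, not on how many words have already been placed.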

I'd like to try a hierarchical sprite version, where you compare against coarse-grained sprites first of all. This would essentially be a quadtree. The implementation would be a bit trickier because I'm also compressing blocks of 32 1-bit pixels into 32-bit integers, which also helped with performance.
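To illustrate the bit-packing, here is a rough sketch that packs a row of 1-bit pixels into 32-bit integers and tests two rows for overlap with a bitwise AND. It assumes both rows start at the same x offset; arbitrary offsets would need extra shifting, which is left out. `packRow` and `rowsOverlap` are hypothetical names:

```javascript
// Pack an array of 0/1 pixels into 32-bit words, leftmost pixel in the highest bit.
function packRow(bits) {
  const words = new Uint32Array(Math.ceil(bits.length / 32));
  for (let i = 0; i < bits.length; i++) {
    if (bits[i]) words[i >> 5] |= 1 << (31 - (i & 31));
  }
  return words;
}

// Two aligned rows overlap if any pair of corresponding words shares a set bit.
// Each AND compares 32 pixels at once.
function rowsOverlap(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] & b[i]) return true;
  }
  return false;
}
```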

brianstaats · on Feb 9, 2012

Well done Jason! Finally, a tag cloud written in an accessible technology. Thoughts on using an invisible bounding container for the words? Example: silhouettes, shapes, words. What other features have you contemplated but have not implemented?

jasondavies · on Feb 9, 2012

Thanks!

Yes, I've thought about using invisible silhouettes, which presumably is how http://www.tagxedo.com/ works. The placement algorithm might be a bit different though, perhaps starting at the centroid of available areas and so on. I think it would work better to have a smaller pool of words and allow reuse, or alternatively they could be sized randomly (perhaps with some weighting).

The sprite collision code could certainly be reused for this, though!
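One way this reuse might look, as a sketch only: pre-fill the board so that everything outside the silhouette is already "occupied", then run the usual collision test. The mask format (strings with "#" inside the shape) and the names `boardFromMask` and `fits` are made up for illustration:

```javascript
// mask: array of strings; "#" marks pixels inside the silhouette (hypothetical format).
// Pixels outside the shape start out occupied (1); pixels inside start free (0).
function boardFromMask(mask) {
  return mask.map(row => Array.from(row, ch => (ch === "#" ? 0 : 1)));
}

// sprite: rows of 0/1. True if every inked pixel lands on a free board pixel.
// Positions outside the board never compare equal to 0, so they are rejected.
function fits(board, sprite, x, y) {
  return sprite.every((row, sy) =>
    row.every((ink, sx) => {
      if (!ink) return true;
      const boardRow = board[y + sy];
      return boardRow !== undefined && boardRow[x + sx] === 0;
    })
  );
}
```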

bazitov · on Feb 12, 2012

Great work Jason!

I was amazed by the beauty of Wordle, but then I loved Darth Vader (Figure 3-11, "Do not underestimate the power of the randomized greedy algorithm"). Word reuse should definitely be allowed, and the result will be very beautiful when applied to a smoothly changing contour (shape).

Is there a problem if I try to reimplement your code in C++ (especially openFrameworks)?

jasondavies · on Feb 14, 2012

Thanks!

Yeah, I'll definitely try the randomised greedy algorithm when I get time. I think it should be fairly straightforward now that I have the sprite collision primitives in there.

No problem at all if you want to reimplement in another language; the license is BSD: https://github.com/jasondavies/d3-cloud/blob/master/LICENSE

I'm guessing it will be blazingly fast using the bitwise operations in C/C++. :)

NelsonMinar · on Feb 9, 2012

So many details to love about this. The little angle control at bottom middle is beautifully interactive. Also the transitions between clouds as you change parameters or choose new words.

indubitably · on Feb 14, 2012

This is awesome.

Curious detail: any idea why searching for "Islam" seems to make things go haywire?

http://www.jasondavies.com/wordcloud/#http%3A%2F%2Fen.wikipe...

Perhaps it's to do with the Arabic script corrupting the SVG somehow?

jasondavies · on Feb 15, 2012

Seems to be working fine for me.

I occasionally get gzipped data back from Wikipedia (even though I explicitly set Accept-Encoding in the proxy) - I think it's due to their intermediate caches not respecting the Vary headers. I plan on adding gzip support soon to rectify this, but until then, you can get around it by adding ?foo to the end of the custom URL.
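For example, a link with the workaround applied could be built roughly like this. The sketch assumes the part after `#` is simply the custom URL passed through `encodeURIComponent`, which is inferred from the links in this thread rather than confirmed; `wordCloudLink` is a hypothetical helper and `example.org` is a placeholder:

```javascript
// Append a dummy "foo" query parameter to the custom URL, then build the link.
function wordCloudLink(customUrl) {
  const separator = customUrl.includes("?") ? "&" : "?";
  const withDummyQuery = customUrl + separator + "foo";
  // Assumption: the fragment is the encodeURIComponent form of the custom URL.
  return "http://www.jasondavies.com/wordcloud/#" + encodeURIComponent(withDummyQuery);
}
```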