I wrote some similar code in python recently -- I think color visualizations like this are both fun and potentially useful for certain image manipulations.
This includes a readable 36-line implementation of k-means clustering that could be shorter if one wanted to play some code golf :) I used a pie chart layout, with pie slices proportional to their corresponding cluster sizes.
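For anyone who wants the gist without opening the repo, here's a minimal sketch of Lloyd's-algorithm k-means over RGB tuples. This is not the linked implementation, just the general idea in plain Python:

```python
import random

def kmeans(points, k, iters=20):
    """Cluster tuples (e.g. RGB colors) into k groups with Lloyd's algorithm."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            best = min(range(k),
                       key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[best].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, clusters
```

A fixed iteration count keeps it short; a real version would stop when the assignments no longer change.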
I also did something similar recently, but in PHP. I grouped similar colors (although I rendered mine as tiled squares), and tried a few other things like grouping by brightness. Some neat results, for sure. I can put the code up for it if anyone is interested.
Here's the quick k-means implementation I threw together, if anyone wants to play with it (my whole library is licensed under the GPL).
It could definitely use some serious cleaning up (and I will probably OO-ize it when I get a chance -- or I'll take pull requests), but it definitely works.
In general, you can toss any set of numbers into a clustering algorithm, and it's kind of interesting to puzzle over the structures that come out. The more you know about the domain, the more interesting it tends to be.
PCA can be the same way. You toss images or whatever in, and out come either eigenvectors or principal components of the images. Either way it's often interesting to domain experts.
Yes, k-means clustering is well known for its use in color quantization (for instance, reducing a 24-bit image to an 8-bit paletted representation that most faithfully captures the original). Another popular algorithm is median cut, which uses a k-d tree to recursively subdivide the color space based on the median color values of the pixels in the source image. Just about any image manipulation program that can output paletted images probably uses one of these algorithms.
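The median-cut idea fits in a few lines. This toy sketch splits along the channel with the widest range rather than building an explicit k-d tree, but the recursive subdivision is the same spirit:

```python
def median_cut(pixels, depth):
    """Recursively split a list of RGB tuples into 2**depth boxes; each box
    is represented by its average color, giving a palette of up to 2**depth entries."""
    if depth == 0 or len(pixels) <= 1:
        return [tuple(sum(ch) // len(pixels) for ch in zip(*pixels))]
    # Split along the channel (R, G, or B) with the greatest range.
    ranges = [max(p[c] for p in pixels) - min(p[c] for p in pixels) for c in range(3)]
    ch = ranges.index(max(ranges))
    pixels = sorted(pixels, key=lambda p: p[ch])
    mid = len(pixels) // 2  # the median pixel along that channel
    return median_cut(pixels[:mid], depth - 1) + median_cut(pixels[mid:], depth - 1)
```

Quantizing an image is then a matter of mapping each pixel to the nearest palette entry.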
OK, so I don't have much problem domain knowledge here, but couldn't you optimize the cluster size based on algorithmic bounds on variation within the cluster?
I could be wrong on this (we only covered k-means clustering this year in college), but as far as I know, using K random exemplars for the k-means clusters reduces the number of colors in the image to some N (N <= K). That would mean any live manipulation of the K colors would actually only be modifying an image that consists of just N colors.
The net result would be a live image like you suggest, but one with much less detail. Still very interesting though.
K-means only groups all of the pixels into K disjoint clusters; it doesn't actually alter the value of any pixel. Taking an entire cluster of pixels and shifting their values in RGB space could produce some very interesting images. For example, you could easily make all of the almost-white pixels slightly blue. There would be no loss of variety among the pixels; they would all just have their blue values bumped up.
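As a concrete sketch: given per-pixel cluster labels from some earlier k-means pass (the `labels` list here is assumed, not computed), shifting one cluster is just adding a constant offset to its members, which preserves all the within-cluster variation:

```python
def shift_cluster(pixels, labels, cluster_id, offset):
    """Add `offset` (dr, dg, db) to every pixel assigned to `cluster_id`,
    clamping to 0..255. Relative differences inside the cluster are kept,
    so no detail is lost."""
    clamp = lambda v: max(0, min(255, v))
    return [tuple(clamp(ch + d) for ch, d in zip(p, offset)) if lab == cluster_id else p
            for p, lab in zip(pixels, labels)]
```

Making the near-white cluster slightly blue would then be `shift_cluster(pixels, labels, white_cluster, (0, 0, 25))`, assuming you know which cluster id the near-whites landed in.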
You compute 'centroids' which denote the centers of the clusters, but you don't have to change the values of all points to the centers of the clusters.
In other words, you can maintain the detail in RGB space (as this author has) while reorganizing things in location space by their k-means clusters.
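A quick sketch of that distinction: assigning pixels to their nearest centroid is purely a labeling operation you can reorganize or count by (e.g. for proportional pie slices), while the pixel values themselves are never touched:

```python
def assign_to_centroids(pixels, centroids):
    """Label each pixel with the index of its nearest centroid (squared
    Euclidean distance in RGB). The pixels themselves are left untouched."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
    return [nearest(p) for p in pixels]
```

The labels give you cluster sizes and groupings for the layout; the original RGB detail survives intact.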
I understand this, but does that mean it's bi-directional, i.e. that a change to one of the palette colors would be reflected in the image? That's what I understood from the comment. If so, how does that work? Sorry for any misinformation in my comment.
Code: https://github.com/tylerneylon/imghist/blob/master/imghist.p...
Sample images: http://blog.zillabyte.com/post/11193458776/color-as-data http://blog.zillabyte.com/post/13141231882/hue-histograms
If anyone else is interested in this stuff, Austin A made a great suggestion on the original post to use the Lab colorspace.