
For now we only detect dying clusters while keeping k constant. Dynamic values of k would be a nice feature to add in a later release :)
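The comment doesn't say how dying clusters are detected, so here is a minimal sketch under assumptions. It assumes Lloyd's k-means, treats a cluster as "dying" when its centroid attracts no points in an iteration, and keeps k constant by reseeding each dying centroid at the point that currently fits its own cluster worst. This is a common heuristic and not necessarily what the release above does. The names `sq_dist` and `lloyd_step_keep_k` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_step_keep_k(points, centroids):
    """One Lloyd (k-means) iteration that never lets k shrink (hypothetical helper).

    Returns (new_centroids, dying), where `dying` lists the indices of clusters
    that received no points this round. Each dying cluster is reseeded at a
    distinct point lying farthest from its own cluster's centroid.
    Assumes len(points) >= len(centroids).
    """
    k = len(centroids)
    # Assignment step: index of the nearest centroid for every point.
    assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]

    members = [[] for _ in range(k)]
    for p, j in zip(points, assign):
        members[j].append(p)

    # Update step: mean of each non-empty cluster; record the empty ("dying") ones.
    new_centroids, dying = [], []
    for j, m in enumerate(members):
        if m:
            new_centroids.append(tuple(sum(col) / len(m) for col in zip(*m)))
        else:
            dying.append(j)
            new_centroids.append(centroids[j])  # placeholder, replaced below

    # Reseed: give each dying cluster the worst-fitting point not already used.
    used = set()
    for j in dying:
        far = max(
            (i for i in range(len(points)) if i not in used),
            key=lambda i: sq_dist(points[i], new_centroids[assign[i]]),
        )
        used.add(far)
        new_centroids[j] = points[far]
    return new_centroids, dying
```

For example, with 1-D points `(0,), (1,), (5,), (10,)` and centroids at 1, 10 and 100, the centroid at 100 gets no points, so it is reported as dying and moved to the point 5, the worst fit for the cluster centred at 2.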



This is a common question for me: how do you dynamically determine the number of clusters (including splitting and merging)? I've looked into Jenks natural breaks, but it also seems to require the number of clusters as an input.

http://en.wikipedia.org/wiki/Jenks_natural_breaks_optimizati...

Do you have any advice or ideas for automatically picking the number of clusters in an unknown data set?


One common approach is to look for the elbow in the curve of <metric> vs. K (number of clusters), i.e. the number of clusters after which the rate of information gained / variance explained / <metric> slows down. I believe it's possible to binary search for this point if you can assume the curve is convex.
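A rough sketch of the binary-search idea, under assumptions: the metric here is k-means within-cluster sum of squares, and "the elbow" is taken to be the k whose point lies farthest below the straight line joining the curve's endpoints (one of several elbow definitions). If the metric is convex in k, that gap is concave, so its peak can be found by binary search on the sign of `gap(k + 1) - gap(k)` instead of evaluating every k. Both `kmeans_sse` (plain Lloyd's algorithm with deterministic farthest-first seeding) and `elbow_by_binary_search` are hypothetical helpers for illustration; real SSE curves are not always convex, and the search only finds the true peak when they are.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, iters=50):
    """Within-cluster sum of squares after a simple Lloyd's k-means (hypothetical helper).

    Farthest-first seeding keeps the result deterministic for this sketch;
    k-means++ with several restarts is the more usual choice in practice.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for p in points:
            members[min(range(k), key=lambda j: sq_dist(p, centroids[j]))].append(p)
        # Keep the old centroid if a cluster happens to end up empty.
        new = [tuple(sum(col) / len(m) for col in zip(*m)) if m else centroids[j]
               for j, m in enumerate(members)]
        if new == centroids:
            break
        centroids = new
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_by_binary_search(metric, k_min, k_max):
    """Peak of the gap below the chord from (k_min, metric(k_min)) to (k_max, metric(k_max)).

    Assumes metric(k) is convex, so the gap is concave and unimodal in k.
    """
    cache = {}

    def m(k):
        if k not in cache:
            cache[k] = metric(k)
        return cache[k]

    y0, y1 = m(k_min), m(k_max)
    slope = (y1 - y0) / (k_max - k_min)

    def gap(k):
        return (y0 + slope * (k - k_min)) - m(k)

    lo, hi = k_min, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if gap(mid + 1) > gap(mid):
            lo = mid + 1  # still climbing toward the peak
        else:
            hi = mid      # peak is at mid or to its left
    return lo


curve = [100, 40, 20, 15, 12, 10]  # metric for k = 1..6
print(elbow_by_binary_search(lambda k: curve[k - 1], 1, 6))  # 3

blobs = [(x,) for x in (0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1, 20.2)]
print(elbow_by_binary_search(lambda k: kmeans_sse(blobs, k), 1, 6))  # 3
```

Caching matters because each metric evaluation may mean a full clustering run; the search touches only the two endpoints plus O(log K) interior values of k.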



