The k-means objective is exactly the metric you would want to optimize for the performance of an algorithm like [bolt](https://arxiv.org/abs/1706.10283). In that and other discretization routines, the value of k is a parameter tied to compression ratio, efficiency, and other metrics that are more predictable than some ethereal notion of how many clusters the data "naturally" has.
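As a rough illustration (not Bolt's actual algorithm, just plain k-means used as a vector quantizer on made-up data), here's how k trades bits per vector against reconstruction error, where the reconstruction error is exactly the k-means objective:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy example: quantize 10,000 16-dimensional vectors down to k centroid IDs.
# Larger k means more bits per vector but lower reconstruction error.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 16)).astype(np.float32)

for k in (16, 64, 256):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    codes = km.labels_                  # one small integer per vector
    recon = km.cluster_centers_[codes]  # decode by centroid table lookup
    mse = np.mean((X - recon) ** 2)     # the k-means objective, normalized per element
    bits = np.log2(k)                   # storage cost per vector
    print(f"k={k:4d}  bits/vector={bits:4.1f}  reconstruction MSE={mse:.4f}")
```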
I've used something similar for tissue segmentation from hyperspectral images of animals where I know there should be K different tissue types I care about.
Google's Material You uses this to generate its color theming.
(n.b. use Celebi's variant; work in Lab, not HSL; respect the Cartesian / polar nature of the inputs / outputs; and ask for a high K. 128 is what we went with, but it's arbitrary; you can get away with as few as 32 if you're e.g. doing brand color from a favicon.)
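A minimal sketch of that kind of palette extraction, assuming scikit-image for the Lab conversion and scikit-learn for the clustering (the image path, K=128, and the "keep the biggest clusters" post-processing are illustrative, not Material You's actual pipeline):

```python
import numpy as np
from skimage import io, color
from sklearn.cluster import KMeans

# Hypothetical input; any RGB image (e.g. a wallpaper or favicon) works.
rgb = io.imread("wallpaper.png")[..., :3] / 255.0

# Cluster in Lab, which is roughly perceptually uniform, rather than HSL.
lab = color.rgb2lab(rgb).reshape(-1, 3)

k = 128  # deliberately high K; only the dominant clusters are kept afterwards
km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(lab)

# Rank centroids by how many pixels they cover and convert back to RGB.
counts = np.bincount(km.labels_, minlength=k)
order = np.argsort(counts)[::-1]
palette_lab = km.cluster_centers_[order]
palette_rgb = color.lab2rgb(palette_lab[np.newaxis, :, :])[0]
print(palette_rgb[:8])  # top 8 theme color candidates
```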
I did a modified version of this once for a map of auto dealerships, although rather than working with a fixed k, I used a fixed threshold for cluster distance. The algorithm I was working with had O(n³) complexity, so to keep the pregeneration of clusters manageable, I partitioned the data by state. The other fun part was finding the right metric formula for measuring distances. Because clusters needed to correspond to the rectangular view window on the map, rather than a standard Euclidean distance I used d = max(|Δx|, |Δy|), which gives square neighborhoods rather than round ones.
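A rough sketch of that threshold-based approach using SciPy's hierarchical clustering with the Chebyshev metric (the coordinates and threshold are invented, and the original O(n³) algorithm may well have differed):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical dealership coordinates for one state (projected x/y, not lat/lon).
points = np.array([
    [0.0, 0.0], [0.5, 0.3], [0.2, 0.6],
    [5.0, 5.1], [5.4, 4.9],
    [9.0, 1.0],
])

# Chebyshev distance: d = max(|dx|, |dy|), so clusters fit in square windows.
dists = pdist(points, metric="chebyshev")
Z = linkage(dists, method="complete")

# No fixed k: cut the tree wherever the cluster diameter exceeds the threshold,
# so every cluster fits inside a threshold-sized square viewport.
threshold = 1.0
labels = fcluster(Z, t=threshold, criterion="distance")
print(labels)  # three clusters here, e.g. [1 1 1 2 2 3]
```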
I implemented an algorithm which used k-means to reduce noise in a path tracer.
For each pixel, instead of a single color value, it generated k mean color values using an online algorithm. These were then combined to produce the final pixel color.
The idea was that a pixel might have several distinct contributions (e.g. from different light sources), but due to the random sampling used in path tracing, the variance of the sample values is usually large.
The value of k was then chosen based on scene complexity. There was also a memory trade-off, of course, as memory usage was linear in k.
Online means it processes the items as they come[1]. This means the algorithm can't consider all the items at once, and has to try to be clever on the spot.
In my case the algorithm would use the first k samples as the initial means, and would then find which of the k means was closest to the current sample and update that mean[2].
Given that in path tracing one would typically use a fairly large number of samples per pixel relative to k, this approach did a reasonable job of approximating the k means.
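A minimal sketch of that kind of online update, as I read the description above (not the actual renderer code): the first k samples seed the means, and each later sample nudges its nearest mean via a running average.

```python
import numpy as np

class OnlinePixelKMeans:
    """Keep k running mean colors for one pixel, updated one sample at a time."""

    def __init__(self, k):
        self.k = k
        self.means = []   # list of RGB vectors
        self.counts = []  # number of samples assigned to each mean

    def add_sample(self, rgb):
        rgb = np.asarray(rgb, dtype=np.float64)
        if len(self.means) < self.k:
            # Use the first k samples as the initial means.
            self.means.append(rgb.copy())
            self.counts.append(1)
            return
        # Find the nearest mean and update it incrementally (running average).
        dists = [np.sum((m - rgb) ** 2) for m in self.means]
        i = int(np.argmin(dists))
        self.counts[i] += 1
        self.means[i] += (rgb - self.means[i]) / self.counts[i]

    def pixel_color(self):
        # One simple way to combine the k means (count-weighted average);
        # the comment above doesn't specify how they were actually combined.
        w = np.asarray(self.counts, dtype=np.float64)
        return (w[:, None] * np.asarray(self.means)).sum(axis=0) / w.sum()

# Usage: feed path-tracer samples for one pixel as they arrive.
px = OnlinePixelKMeans(k=4)
for sample in np.random.default_rng(1).random((256, 3)):
    px.add_sample(sample)
print(px.pixel_color())
```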
If you're measuring multiple physical objects with the same sensor, you can use k-means to separate out the measurements from each object, given that you know how many objects are being sensed. I can't get more specific than that.
So after re-reading your comment a few times, I am left with this thought: either you don't understand what k-means clustering is, or I don't understand what k-means clustering is. I wouldn't describe myself as a machine learning expert, but I have taken some grad level classes in statistics, analytics, and methods like this/related to this.
Not GP, but I understood their question as follows.
Assume you gather some kindergartners and top NBA players into a room and collect their heights. Now say you pass these measurements to two hapless grad students and ask them to perform K-means clustering.
Suppose one of the grad students knew the composition of the people you measured and can guess that these heights should clump into 2 nice clusters. The other student, who doesn't know the composition of the class - what should they guess K to be?
I understood the GP's comment to refer to the state of the second grad student. How useful is K-means clustering without knowing K in advance?
>I understood the GP's comment to refer to the state of the second grad student. How useful is K-means clustering without knowing K in advance?
There are several heuristics for this. Googling, I see that the elbow method, the average silhouette method, and the gap statistic method are the most used.
I think you could play around with your own heuristics as well. Simple KDE plots showing the number of peaks, for example. Or maybe requiring that the variance between clusters be greater than the variance inside any cluster could work. (Edit: this seems to be the main point of the average silhouette method.)
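For example, a quick sketch of the average silhouette heuristic with scikit-learn, on synthetic blob data (swap in your own features in practice):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 3 "true" clusters, which the heuristic should recover.
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette compares separation between clusters to spread within them.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k by average silhouette:", best_k)  # expect 3 here
```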
The first problem is picking k. The second problem is the definition of distance or, equivalently, the uniformity of the space. Naively using the Euclidean distance in an embedding where similarity is non-uniform leads to bad outcomes. This problem is solved by learning a uniform embedding, and this is much harder than running k-means.
k-means assumes these hard parts of the problem are taken care of and offers a trivial solution to the rest. Thanks for the help, I'll cluster it myself.
You have to choose the number of clusters before using k-means.
Imagine that you have a dataset where you think there are likely meaningful clusters, but you don't know how many, especially if it's high-dimensional.
If you pick a k that is too small, you lump unrelated points together.
If k is too large, your meaningful clusters will be fragmented/overfitted.
There are some algorithms that try to estimate the number of clusters or try to find the k with the best fit to the data to make up for this.
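A minimal sketch of one such heuristic, the elbow method: run k-means over a range of k and look for where the within-cluster sum of squares (`inertia_` in scikit-learn) stops dropping sharply. Toy data, purely illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with 4 underlying clusters.
X, _ = make_blobs(n_samples=800, centers=4, cluster_std=1.2, random_state=42)

inertias = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_  # k-means objective: within-cluster sum of squares

for k, val in inertias.items():
    print(f"k={k:2d}  inertia={val:10.1f}")
# The curve typically drops steeply until k reaches the "true" number of
# clusters (4 here) and flattens afterwards; the bend is the "elbow".
```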
Couldn’t you make some educated guesses and then stop when you arrive at a K that gives you meaningful clusters that are neither too high-level nor too atomized?