"... however, that none of these balancing options are resource-aware: there is no “balance shards across nodes by the size of the shards” flag to set or knob to turn."
I was under the impression that shards are kept uniform in size, since ES tries to spread data equally across all shards in an index so that imbalances don't arise.
You can manually route data to a specific shard when reading/writing, which would cause that shard to be much larger and use more resources, but there are very few cases (0.01%) where this is a good idea.
Again, I'm not sure how the shards are so imbalanced.
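For context, the shard-selection step can be sketched like this. This is my own toy model, not ES internals: ES actually hashes the routing value with Murmur3, so CRC32 and the shard count of 5 here are stand-ins for illustration.

```python
import zlib

NUM_SHARDS = 5

def shard_for(routing_key: str) -> int:
    # ES computes: shard = hash(_routing) % number_of_primary_shards.
    # The real hash is Murmur3; CRC32 stands in here for determinism.
    return zlib.crc32(routing_key.encode()) % NUM_SHARDS

# Default behavior: the routing key is the document _id, so documents
# spread roughly evenly across all shards.
default_counts = [0] * NUM_SHARDS
for doc_id in range(10_000):
    default_counts[shard_for(str(doc_id))] += 1

# Custom routing: every document uses the same key (e.g. a tenant id),
# so they all land on a single shard.
custom_counts = [0] * NUM_SHARDS
for _ in range(10_000):
    custom_counts[shard_for("tenant-42")] += 1

print(default_counts)  # roughly uniform
print(custom_counts)   # every document on one shard
```

This is why custom routing is the usual way shards end up wildly different in size: the balancer still sees one shard, but that shard holds far more data than its siblings.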
"Node A will degrade faster than the rest of the cluster due to extra CPU use, memory writes, and disk read/writes."
Also, remember that you can scale reads using replicas. Writes initially only happen on primary shards, and then propagate to replicas.
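To make that concrete, here's a toy model (my own sketch, not ES internals) of why replicas scale reads but not writes: any copy can serve a read, but every write must be indexed on the primary and then replayed on each replica.

```python
from itertools import cycle

# Hypothetical setup: one shard with 1 primary + 2 replicas.
copies = ["primary", "replica-1", "replica-2"]
load = {c: 0 for c in copies}

# Reads can be served by any copy; round-robin them here.
read_target = cycle(copies)
for _ in range(900):
    load[next(read_target)] += 1

# Writes hit the primary first, then propagate to every replica,
# so each write costs work on every copy.
for _ in range(100):
    for copy in copies:
        load[copy] += 1

print(load)  # reads were spread 3 ways; writes landed on all copies
```

Adding replicas divides the read load further, but the write load per copy stays the same, which is why write-heavy clusters can't be fixed with replicas alone.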
Not trying to be nit-picky. I'm just not sure why the shards are so imbalanced in your example. This seems more like a hack for a poor model/architecture.
At some point you'll have to redesign your ES architecture.
I'm a Sr Dev here at Datarank in charge of our ES architecture. Maybe I can shed some light on your points.
"I was under the impression that shards are kept uniform in size as ES will try and equally spread data to all shards in an index, so that there aren't imbalances."
This is true by default, and it's fine for simple document retrieval, but it doesn't scale well when you want to do complex aggregations over arbitrary filters of large datasets. For that you need document clustering. In our case (and probably others), the clustering can't be done uniformly; see http://engineering.datarank.com/2015/06/30/analysis-of-hotsp....
"Writes initially only happen on primary shards, and then propagate to replicas." - also true, but you still want writes to be as distributed as possible. In some cases we bulk-load millions of documents, and we want that load spread as equally as possible while still allowing for the clustering mentioned above. We also want to minimize heap usage per node for GC and buffering performance.
"At some point you'll have to redesign your ES architecture." - Perhaps, but this system scales easily to 100s of nodes, and maybe 1000s. We allow our customers to perform very complex aggregations on some pretty large datasets with 50ms response times. All other architectures we tried failed to scale well.
Clustering of comments is vital to performance in our use case. Our data is distributed log-normally, and we must be able to scale quickly and easily. The default setup didn't scale well with our volume. All of these factors led us to this solution.
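To illustrate the log-normal point (a toy sketch with made-up parameters, not our actual pipeline): if each heavy-tailed cluster of documents must be routed whole to a single shard, shard sizes drift apart even when every shard holds the same number of clusters.

```python
import random

random.seed(7)

NUM_SHARDS = 5
shard_docs = [0] * NUM_SHARDS

# Hypothetical cluster sizes drawn from a log-normal (heavy-tailed)
# distribution; the mu/sigma values here are illustrative only.
for cluster_id in range(1000):
    size = int(random.lognormvariate(mu=3, sigma=1.5)) + 1
    # Keep each cluster together: the whole cluster goes to one shard.
    shard_docs[cluster_id % NUM_SHARDS] += size

print(shard_docs)  # document counts diverge despite equal cluster counts
```

Each shard receives exactly 200 clusters, but because a few clusters are enormous, the shards end up with different document counts, and the default count-based balancer never corrects for that.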
aewhite covered most points, but I'd also point out that a simple use case of a cluster with multiple indices of varying sizes (such as using ES as part of the ELK stack to store logs, where a new index is created every day) will run into many of the same problems with the default balancer. Since the number of shards per index can't be changed after index creation, growth in shard size and disparity between shards is unavoidable.
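A quick sketch of that problem (hypothetical daily sizes; the real balancer weighs a few factors, but shard byte size isn't one of them): a count-based placement keeps shard counts even across nodes while letting on-disk usage diverge badly.

```python
# Toy model of count-based balancing: each daily log index has 1 shard,
# placed on whichever node currently holds the fewest shards.
daily_index_sizes_gb = [5, 80, 7, 120, 6, 95, 4]  # traffic varies by day

nodes = {"node-a": [], "node-b": []}
for size in daily_index_sizes_gb:
    # The balancer only looks at shard counts, never at bytes.
    target = min(nodes, key=lambda n: len(nodes[n]))
    nodes[target].append(size)

for name, shards in nodes.items():
    print(name, len(shards), "shards,", sum(shards), "GB")
```

The shard counts end up within one of each other, yet one node can hold an order of magnitude more data than the other, which is exactly the resource imbalance the article is describing.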