Author of the article here; I suppose I should confess that I did consider autom...

tlrobinson · on April 18, 2019

> the sorting was the time-consuming part

I'm not sure I understand why sorting is necessary.

> Now consider how much more unpleasant that manual counting is, since the Skittles are haphazardly un-arranged in the image.

It's actually very easy: my program outputs images annotated with each recognized Skittle circled with the guessed color: https://imgur.com/a/jlPWXRf You don't need to manually count, just make sure all the circles are the correct colors.

I'm also pretty confident you could improve the accuracy quite a bit with better lighting when taking the photos, and/or by incorporating something more sophisticated like a neural network (probably trained with Skittles from several bags, separated by color plus rejects, with many photos in different random arrangements).

possiblywrong · on April 18, 2019

> You don't need to manually count, just make sure all the circles are the correct colors.

I guess this takes me a while, and seems significantly more error-prone to me than the corresponding re-count when they are sorted. Granted, it's pretty easy when the Skittles are sorted as they are in these images. But how quickly can you scan this image: https://imgur.com/a/KpddGdH and check whether there are any errors? And how confident are you in that visual check?

You are right that this approach may be good enough to find a duplicate, which was the primary objective of the experiment. But I had hoped that this might also serve as a useful dataset for future student exercises, in probability, or even in just this sort of computer vision project... but I wanted to have accurate ground truth, so to speak. Inspecting your spreadsheet, it looks like this algorithm is still less than 95% accurate, even if we only evaluate the "clean" images with Uncounted=0.
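
For anyone who wants to repeat that accuracy check, here is a minimal sketch of one way to score it. It assumes each spreadsheet row has been loaded into a dict with the algorithm's per-color counts, the hand-counted ground truth, and the Uncounted value; the key names (`predicted`, `truth`, `uncounted`) and the `count_accuracy` helper are hypothetical, not the actual spreadsheet columns. It only compares counts, so a red misread as orange that is offset by an orange misread as red goes unnoticed, and the result is an upper bound on true per-Skittle accuracy.

```python
# Hypothetical layout: one dict per image, with per-color counts from the
# algorithm ("predicted"), the manual ground truth ("truth"), and the number
# of Skittles the algorithm could not classify ("uncounted").
COLORS = ("red", "orange", "yellow", "green", "purple")


def count_accuracy(rows):
    """Fraction of ground-truth Skittles matched by count, over clean images.

    Only rows with uncounted == 0 are evaluated. For each color, at most
    min(predicted, truth) Skittles can have been labeled correctly, so the
    returned value is an upper bound on per-Skittle accuracy.
    """
    matched = 0
    total = 0
    for row in rows:
        if row["uncounted"] != 0:
            continue  # skip images where some Skittles were not classified
        for color in COLORS:
            matched += min(row["predicted"][color], row["truth"][color])
            total += row["truth"][color]
    return matched / total if total else float("nan")


# Made-up example data, for illustration only (not taken from the spreadsheet).
rows = [
    {
        "uncounted": 0,
        "predicted": {"red": 10, "orange": 12, "yellow": 11, "green": 13, "purple": 12},
        "truth": {"red": 11, "orange": 12, "yellow": 11, "green": 12, "purple": 12},
    },
    {
        "uncounted": 2,  # not a clean image, so it is excluded
        "predicted": {"red": 9, "orange": 12, "yellow": 13, "green": 11, "purple": 12},
        "truth": {"red": 10, "orange": 12, "yellow": 14, "green": 11, "purple": 12},
    },
]

accuracy = count_accuracy(rows)
```

With the made-up rows above, only the first image is evaluated: one red Skittle was counted as green, so 57 of the 58 Skittles match by count.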