
Author of the article here. I should confess that I did consider automating the counting, which seemed like an interesting problem, but I rejected the idea for what I think is a good reason: the sorting was the time-consuming part. Suppose we skip the sorting and depend on the code to get the counting right, and assume that we are at worst off by one (or, more realistically, off by two, to account for the questionably identifiable chunks of paste and such). Then how often do we need to go back for manual verification?

With hindsight (although up-front simulation bore this out as well), we see that we would have to go back and manually verify anywhere from 8% of packs (off-by-one) to 25% of packs (off-by-two); that's every fourth pack. Now consider how much more unpleasant that manual counting is, since the Skittles are haphazardly un-arranged in the image. In short, I'm unconvinced that an automated accounting, even a similarly accurate one, would be that much more efficient.
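
As a rough illustration of the kind of up-front simulation mentioned above, here is a minimal Python sketch. Everything in it is an assumption rather than something taken from the article: five equally likely colors, pack sizes uniform around a hypothetical mean of 59, 300 packs, and "off by k" read as an L1 distance of at most k between a pack's color counts and those of some earlier pack. It is not meant to reproduce the 8% and 25% figures, only to show how such a recheck rate could be estimated.

```python
import random

COLORS = 5  # assumption: five flavors per pack


def random_pack(rng, mean_size=59, spread=3):
    """Return per-color counts for one simulated pack.

    Assumptions: pack size is uniform in [mean_size - spread, mean_size + spread]
    and each Skittle's color is independent and uniform.
    """
    size = rng.randint(mean_size - spread, mean_size + spread)
    counts = [0] * COLORS
    for _ in range(size):
        counts[rng.randrange(COLORS)] += 1
    return tuple(counts)


def l1_distance(a, b):
    """Sum of absolute per-color differences between two packs."""
    return sum(abs(x - y) for x, y in zip(a, b))


def fraction_needing_recheck(packs, tolerance):
    """Fraction of packs within `tolerance` (L1) of any earlier pack.

    Such a pack could be a true duplicate hidden by a counting error,
    so under this reading it would need a manual recount.
    """
    flagged = 0
    for i, pack in enumerate(packs):
        if any(l1_distance(pack, earlier) <= tolerance for earlier in packs[:i]):
            flagged += 1
    return flagged / len(packs)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    packs = [random_pack(rng) for _ in range(300)]  # 300 is a hypothetical count
    for tolerance in (1, 2):
        rate = fraction_needing_recheck(packs, tolerance)
        print(f"tolerance {tolerance}: {rate:.1%} of packs flagged")
```

Because both tolerances are evaluated on the same simulated packs, the off-by-two rate can never be lower than the off-by-one rate; how far apart they are depends heavily on the assumed pack-size distribution.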




> the sorting was the time-consuming part

I'm not sure I understand why sorting is necessary.

> Now consider how much more unpleasant that manual counting is, since the Skittles are haphazardly un-arranged in the image.

It's actually very easy: my program outputs annotated images in which each recognized Skittle is circled in its guessed color: https://imgur.com/a/jlPWXRf You don't need to manually count, just make sure all the circles are the correct colors.

I'm also pretty confident you could improve the accuracy quite a bit by improving the lighting when taking the photos, and/or incorporating something more sophisticated like a neural network (probably trained with Skittles from several bags, separated by color+rejects, with many photos in different random arrangements).
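
For a sense of what a simple baseline short of a neural network might look like, here is a minimal sketch of nearest-reference-color classification. It is not the program described above: the reference RGB values, the `max_distance` threshold, and the `"uncounted"` label for rejects are all hypothetical, and real reference colors would have to be measured under the actual lighting.

```python
import math

# Hypothetical reference colors (RGB, 0-255). These are illustrative guesses,
# not measured values; they would need calibrating against real photos.
REFERENCE_RGB = {
    "red": (200, 30, 40),
    "orange": (240, 130, 30),
    "yellow": (240, 220, 50),
    "green": (60, 170, 60),
    "purple": (90, 40, 110),
}


def classify(rgb, max_distance=80.0):
    """Return the nearest reference color name for a mean RGB value.

    If even the nearest reference is farther than `max_distance`
    (Euclidean distance in RGB space), return "uncounted" so the
    detection can be set aside for manual review.
    """
    best_name, best_dist = min(
        ((name, math.dist(rgb, ref)) for name, ref in REFERENCE_RGB.items()),
        key=lambda pair: pair[1],
    )
    return best_name if best_dist <= max_distance else "uncounted"


def count_colors(mean_rgbs):
    """Tally classified colors for a list of per-Skittle mean RGB values."""
    counts = {}
    for rgb in mean_rgbs:
        label = classify(rgb)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Plain RGB distance is sensitive to exposure and shadows, which is one reason better lighting would likely help; a rejection threshold at least makes doubtful detections visible instead of silently miscounting them.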


> You don't need to manually count, just make sure all the circles are the correct colors.

I guess this takes me a while, and seems significantly more error-prone to me than the corresponding re-count when they are sorted. Granted, it's pretty easy when the Skittles are sorted as they are in these images. But how quickly can you scan this image: https://imgur.com/a/KpddGdH and check whether there are any errors? And how confident are you in that visual check?

You are right that this approach may be good enough to find a duplicate, which was the primary objective of the experiment. But I had hoped that this might also serve as a useful dataset for future student exercises, in probability, or even in just this sort of computer vision project... but I wanted to have accurate ground truth, so to speak. Inspecting your spreadsheet, it looks like this algorithm is still less than 95% accurate, even if we only evaluate the "clean" images with Uncounted=0.
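
To make "accurate" concrete, one possible reading is the fraction of clean packs whose predicted per-color counts match the hand-counted ground truth exactly. The sketch below assumes that reading; the field names (`uncounted`, `predicted`, `truth`) and the sample rows are made up for illustration and are not taken from the spreadsheet.

```python
def exact_match_accuracy(rows):
    """Fraction of clean rows whose predicted counts equal the ground truth.

    A row is "clean" when its `uncounted` field is 0. Returns None when
    there are no clean rows, since the accuracy is undefined in that case.
    """
    clean = [row for row in rows if row["uncounted"] == 0]
    if not clean:
        return None
    hits = sum(1 for row in clean if row["predicted"] == row["truth"])
    return hits / len(clean)


if __name__ == "__main__":
    # Made-up rows purely for illustration; counts are per color.
    sample_rows = [
        {"uncounted": 0, "predicted": (12, 11, 13, 10, 14), "truth": (12, 11, 13, 10, 14)},
        {"uncounted": 0, "predicted": (9, 15, 12, 11, 12), "truth": (9, 14, 13, 11, 12)},
        {"uncounted": 2, "predicted": (10, 10, 12, 11, 13), "truth": (11, 10, 12, 12, 13)},
    ]
    print(exact_match_accuracy(sample_rows))
```

A per-Skittle accuracy would give a much higher number for the same data, so it matters which definition a reported figure refers to.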



