I'm not very familiar with this style of programming puzzle, but I wonder how it is less arbitrary than whiteboard coding, Topcoder, or Kaggle. I know people who were hired based on all of these, but the latter two differ from Starfighter in that they exist for their own reasons (competition for its own sake, and getting answers to data science questions), with the signal to employers being a side effect. I wonder whether the selection bias in a system set up specifically for employment will be better or worse.