I cannot come up with a convincing reason why the 2% would be missing, however, or whether the 2% is instead additional data, which would raise even weirder questions.
edit: [#] that seems a bit aggressive, but I am not aiming at the parent post here; apologies if it reads badly. What I mean is that these guys would make it onto an "America's Dumbest Hackers" TV special if that were the case.
The md5 sum of the UDIDs list contains 1337 on purpose.
It seems to me that the easiest way to achieve this is either to randomly reorder the lines, which they didn't do, or to add or remove some bytes.
That could explain the 2% diff between the two files: brute-forcing the MD5 hash by removing random lines until they got a digest containing 1337.
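For what it's worth, here is a rough Python sketch of that kind of brute force. This is only a guess at the approach, not anything known about how the file was actually produced: the function name, the fake IDs, the number of dropped lines and the seed are all made up for illustration. A 32-digit hex digest has 29 places where "1337" could start, each with a 1 in 65536 chance, so roughly a couple of thousand attempts should be enough on average.

```python
import hashlib
import random


def drop_lines_until_marker(lines, marker="1337", drop=4, max_tries=100_000, seed=0):
    """Drop `drop` random lines per attempt until the MD5 hex digest contains `marker`.

    Hypothetical helper for illustration only.
    Returns (kept_lines, hex_digest, attempts), or None if no attempt succeeded.
    """
    rng = random.Random(seed)  # seeded so a run is repeatable
    for attempt in range(1, max_tries + 1):
        # Pick a fresh random set of line indexes to leave out of this attempt.
        dropped = set(rng.sample(range(len(lines)), drop))
        kept = [line for i, line in enumerate(lines) if i not in dropped]
        digest = hashlib.md5("\n".join(kept).encode("utf-8")).hexdigest()
        if marker in digest:
            return kept, digest, attempt
    return None


if __name__ == "__main__":
    # Made-up stand-ins for UDIDs: 200 SHA-1 hex strings, not real device IDs.
    fake_ids = [hashlib.sha1(str(n).encode()).hexdigest() for n in range(200)]
    result = drop_lines_until_marker(fake_ids)
    if result is not None:
        kept, digest, attempts = result
        print(f"kept {len(kept)} of {len(fake_ids)} lines after {attempts} tries: {digest}")
```

Dropping 4 of 200 lines is just there to mirror the 2% figure. Each attempt starts again from the full list, so the number of removed lines stays fixed no matter how many tries it takes.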
They also said the data was taken "in the past 2 weeks". Assuming that Blue Toad's data changes daily or so, getting a 98% match to their "current" data seems reasonable. If the data changes "real time" then it might be impossible to find the exact time when the data matches 100%.