
I'm not an expert on file formats, so I looked it up on Wikipedia. Here's what it says about PNG[1]:

  A PNG file starts with an 8-byte signature.
  The hexadecimal byte values are 89 50 4E 47 0D 0A 1A 0A;
  the decimal values are 137 80 78 71 13 10 26 10. 
So if a file starts with 89 50 4E 47 0D 0A 1A 0A, it may be a valid PNG; otherwise, you know it's not.
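A minimal sketch of that check in Python. The name `may_be_png` is just mine for illustration, not from any library. Note that it can only say "maybe": a signature match is necessary, not sufficient.

```python
# The 8-byte PNG signature quoted above: 89 50 4E 47 0D 0A 1A 0A
# (decimal 137 80 78 71 13 10 26 10).
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def may_be_png(data: bytes) -> bool:
    """Return True if data starts with the PNG signature.

    Passing this check does not make the file a valid PNG; the rest
    of the bytes still have to parse as PNG chunks.
    """
    return data[:8] == PNG_SIGNATURE


# Signature followed by junk still passes: the check is necessary, not sufficient.
print(may_be_png(PNG_SIGNATURE + b"junk"))  # True
print(may_be_png(b"GIF89a..."))             # False
```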

GIF files start with a different marker at offset zero ("GIF87a" or "GIF89a"), so no valid GIF is a valid PNG, and vice versa.

Some formats are mutually exclusive because they “fight” over the contents of the first several bytes.
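A rough sketch of that "fight" in Python: two signatures anchored at offset 0 are mutually exclusive exactly when neither one is a prefix of the other. Only PNG and the two GIF variants are listed here; the helper names are hypothetical.

```python
# Signatures that all sit at offset 0. The GIF ones are ASCII "GIF87a"/"GIF89a".
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "gif87a": b"GIF87a",
    "gif89a": b"GIF89a",
}


def matching_formats(data: bytes) -> list[str]:
    """Return every format whose signature is a prefix of data."""
    return [name for name, sig in SIGNATURES.items() if data.startswith(sig)]


def mutually_exclusive(sig_a: bytes, sig_b: bytes) -> bool:
    """Two offset-0 signatures conflict unless one is a prefix of the other."""
    return not (sig_a.startswith(sig_b) or sig_b.startswith(sig_a))


# PNG begins with 0x89, GIF with 0x47 ('G'): no file can match both.
print(matching_formats(b"GIF89a\x01\x00"))                            # ['gif89a']
print(matching_formats(b"\x89PNG\r\n\x1a\n..."))                      # ['png']
print(mutually_exclusive(SIGNATURES["png"], SIGNATURES["gif89a"]))    # True
```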

Other formats are more relaxed, which opens up the possibility, already exploited in practice, of carefully engineered ambiguity.
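To make "relaxed" concrete, here is a sketch that assumes ZIP as the example (the comment above doesn't name a format). ZIP readers locate the end-of-central-directory record near the end of the file, so arbitrary bytes can sit in front of the archive. Python's `zipfile` tolerates such prepended data. The leading bytes below are only a GIF-looking stand-in, not a decodable image.

```python
import io
import zipfile

# Build a small ZIP archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello from the zip half")
zip_bytes = buf.getvalue()

# Prepend a GIF-style header plus padding (a stand-in, not a real GIF image).
polyglot = b"GIF89a" + b"\x00" * 32 + zip_bytes

# A signature check at offset 0 sees a GIF...
print(polyglot.startswith(b"GIF89a"))  # True

# ...while zipfile finds the end-of-central-directory record at the end
# and still reads the archive despite the leading bytes.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    print(zf.read("hello.txt").decode())  # hello from the zip half
```

The same bytes are "a GIF" to one parser and "a ZIP" to another, which is the kind of ambiguity a strict offset-0 format like PNG rules out.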

edit: removed a section that was utterly wrong

[1]: http://en.wikipedia.org/wiki/Portable_Network_Graphics




It's a little more complicated than that, actually. A given application of a file format may use various obfuscation techniques on the file's header or contents that make the file invalid according to the published standard (if there is one; in these cases it is also common to change the file extension to further disguise the actual format). Programs that do this may or may not de-obfuscate the file before use, depending largely on how and why it was obfuscated.
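As one hypothetical illustration of header obfuscation, here is a sketch that XORs the first 8 bytes with a fixed key. The key `0x5A`, the 8-byte length, and the function names are all assumptions for the example; real programs use all sorts of schemes.

```python
# Hypothetical single-byte XOR key and header length; not from any real program.
KEY = 0x5A
HEADER_LEN = 8  # obfuscate only the 8-byte signature in this sketch


def obfuscate(data: bytes, key: int = KEY, n: int = HEADER_LEN) -> bytes:
    """XOR the first n bytes with key and leave the rest untouched."""
    head = bytes(b ^ key for b in data[:n])
    return head + data[n:]


def deobfuscate(data: bytes, key: int = KEY, n: int = HEADER_LEN) -> bytes:
    # XOR is its own inverse, so applying it again restores the original.
    return obfuscate(data, key, n)


png_like = b"\x89PNG\r\n\x1a\n" + b"rest of file"
hidden = obfuscate(png_like)
print(hidden.startswith(b"\x89PNG"))   # False: a standard signature check fails
print(deobfuscate(hidden) == png_like)  # True: the program can undo it before use
```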

For instance, a common obfuscation method is simply to remove the magic number from the file; in that case, the program may just try to read the file as the given format and return an error (or crash; we are talking largely about proprietary software here, after all) if it can't be parsed.
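A sketch of such a loader for a PNG with its 8-byte signature stripped, using `struct` and `zlib` from the standard library. It assumes the stripped file starts directly at the first chunk (IHDR, per the PNG layout of length, type, data, CRC). `read_chunk`, `make_chunk` and `load_stripped_png` are hypothetical names, and a real loader would go on to parse the rest of the chunks.

```python
import struct
import zlib


def read_chunk(data: bytes) -> tuple[bytes, bytes]:
    """Parse one PNG chunk (length, type, data, CRC) at offset 0."""
    if len(data) < 12:
        raise ValueError("too short for a chunk")
    (length,) = struct.unpack(">I", data[:4])
    ctype = data[4:8]
    end = 8 + length
    if len(data) < end + 4:
        raise ValueError("truncated chunk")
    body = data[8:end]
    (crc,) = struct.unpack(">I", data[end:end + 4])
    # The PNG CRC covers the chunk type and data, not the length field.
    if zlib.crc32(ctype + body) != crc:
        raise ValueError("bad CRC")
    return ctype, body


def load_stripped_png(data: bytes) -> tuple[int, int]:
    """Never checks a magic number: just tries to parse IHDR, else errors."""
    ctype, body = read_chunk(data)
    if ctype != b"IHDR" or len(body) != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height = struct.unpack(">II", body[:8])
    return width, height


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build a chunk with a correct CRC (used only to create test input)."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


# 1x1, 8-bit grayscale IHDR with no signature in front of it.
stripped = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
print(load_stripped_png(stripped))  # (1, 1)
```

Feeding it arbitrary bytes raises `ValueError` instead of a format mismatch being caught up front, which matches the "try it and error out" behaviour described above.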



