When I was working as an AIX kernel programmer in 1985, I set registers to a unique value so it would be easy to spot code that tried to use an uninitialized value. My choice: 0xdeadbeef. Good to see that constant is still in use.
Whenever I find myself having to change a MAC address, I end up using DEADBEEFCAFE.
I hope I never forget to change them back and end up having to debug two different machines with the same MAC (which has actually happened to me in the wild, with two machines coming out of the factory with the same MAC; talk about bad luck and shitty quality control).
I've seen duplicate MACs twice in the last few years, on two different lines of embedded/consumer electronics boards from two different factories. There was a kind of Abbott & Costello routine that went on the first time, when a Taiwanese colleague with limited English reported the problem to me.
This technique is useful in many different places. Using enums/defines that don't start at 0 in C/C++, for instance, helps debugging when you're dealing with possible memory corruption. Likewise, making sure related enums don't overlap in values helps disambiguate logic errors and other potential bugs when those enums are used in data structures.
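The tip above is about C/C++, but here's a minimal sketch of the same idea in Python's enum module (the enum names and values are made up for illustration): give each related enum its own distinct, nonzero range, so a bare integer seen in a log or a binary dump can only belong to one of them, and 0 (the usual "zeroed memory" value) is never legal.

    from enum import IntEnum, unique

    # Hypothetical enums: each gets its own nonzero range, so the two sets
    # never overlap and zeroed memory never looks like a valid state.
    @unique
    class ParserState(IntEnum):
        IDLE = 0x100
        HEADER = 0x101
        BODY = 0x102

    @unique
    class SocketState(IntEnum):
        CLOSED = 0x200
        LISTENING = 0x201
        CONNECTED = 0x202

    def describe(value: int) -> str:
        """Attribute a raw integer to whichever enum claims it, if any."""
        for enum_type in (ParserState, SocketState):
            try:
                return repr(enum_type(value))
            except ValueError:
                continue
        return f"0x{value:x}: not a valid state (corruption or logic error?)"

    print(describe(0x101))  # <ParserState.HEADER: 257>
    print(describe(0))      # 0x0: not a valid state (corruption or logic error?)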
This adds the original word to the left of the hex. It makes the list much more scannable.
Tweaking the dots to count 10 shows that there are apparently 33 choices for WEP codes, if you are so inclined. (Which of course you shouldn't be, but, well...) And, alas, there do not appear to be any 64-bit constants according to my dictionary, though there are enough 32-bit choices to have some fun with phrases ("collated catcalls", "sadistic sabotage", "fattiest feedbags", "besotted ascetics", etc.). And that's just the even 8-8 phrases; 9-7 has almost as many ("godless geodesics", "falsest statistic", 0x7a55e11edb00b1e5).
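For anyone who wants to reproduce the word-plus-hex listing without the shell pipeline, here's a rough Python equivalent. The word-list path and the letter-to-digit substitutions (o→0, i/l→1, s→5, t→7, g→9) are my assumptions, not anything taken from the commands above.

    # Map a handful of letters to look-alike digits; everything else must
    # already be a hex digit (a-f) for the word to qualify.
    LEET = str.maketrans("olistg", "011579")
    HEX_DIGITS = set("0123456789abcdef")

    def as_hex(word):
        candidate = word.lower().translate(LEET)
        return candidate if set(candidate) <= HEX_DIGITS else None

    with open("/usr/share/dict/words") as f:          # assumed word-list location
        words = sorted({w.strip() for w in f if w.strip().isalpha()})

    for word in words:
        h = as_hex(word)
        if h and len(h) == 10:     # 10 hex digits = 40 bits, WEP-key sized
            print(f"{word:<14} 0x{h}")

Change the length filter to 8 for the 32-bit constants or 16 for the (apparently nonexistent) 64-bit ones.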
Thanks to dhart for the introduction to the "< /file program" idiom that puts the file first without resorting to cat, and for column, for that matter. Neat. In return I'll offer Perl's new /r flag, which "returns" the result of tr/// or s/// rather than a count.
I'm not seeing any meaningful exploits coming from this. You can maybe send a request that will fail but I can't see any sort of injection taking place.
Modern browsers no longer support UTF-7, after a number of XSS attacks that relied on inserting UTF-7-encoded script elements which then caused the document to be sniffed as UTF-7.
The only place UTF-7 is still widely used is in email clients.
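For the curious, here's roughly what such a payload looks like, built by hand in Python. The markup is just an illustration; the stdlib utf_7 codec is only used at the end to show what a browser sniffing the page as UTF-7 would have seen.

    import base64

    def utf7_escape(text):
        # A UTF-7 "+...-" run is base64 of the UTF-16 big-endian bytes,
        # with the trailing '=' padding stripped.
        b64 = base64.b64encode(text.encode("utf-16-be")).decode("ascii").rstrip("=")
        return "+" + b64 + "-"

    # Angle brackets hidden inside base64 runs, so a naive filter never sees '<'.
    payload = utf7_escape("<") + "script" + utf7_escape(">")
    print(payload)                                   # +ADw-script+AD4-
    print(payload.encode("ascii").decode("utf-7"))   # <script>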
When UTF-8 was first defined, they didn't know how big the Unicode range was going to be, so they defined it as a 1-6 byte encoding that could encode any 31-bit codepoint (up to U+7FFFFFFF).
When Unicode was deemed to end at U+10FFFF (because that's the largest value that UTF-16 can encode), UTF-8 was revised to be a 1-4 byte encoding that ends in the same place.
Python clearly implements UTF-8 in a way that uses at most four bytes per codepoint (why support five- and six-byte sequences if they'll never be used?). I think what we're seeing in '\xfb\x9b\xbb\xaf' is four bytes of what the original definition would have treated as a five-byte sequence: 0xFB is a five-byte lead byte.
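A quick Python 3 sanity check of those boundaries (this just demonstrates the limits themselves, not the bug under discussion):

    # The largest codepoint, U+10FFFF, still fits in four UTF-8 bytes.
    print(chr(0x10FFFF).encode("utf-8"))           # b'\xf4\x8f\xbf\xbf' -- 4 bytes

    # Anything past the UTF-16 ceiling is rejected before encoding even starts.
    try:
        chr(0x110000)
    except ValueError as exc:
        print(exc)                                 # chr() arg not in range(0x110000)

    # 0xFB was a lead byte for five-byte sequences under the original 1-6 byte
    # definition; a modern UTF-8 decoder refuses it outright.
    try:
        b"\xfb\x9b\xbb\xaf".decode("utf-8")
    except UnicodeDecodeError as exc:
        print(exc)                                 # ... invalid start byte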
That's really cool! Character encoding issues are something we wrestle with all the time, and it is surprisingly hard to reason about all the ways supposedly "string" data are handled in the course of a typical workflow. I cringe; I hadn't even considered bugs in the encoding and decoding process itself.
This reminds me of Gödel's incompleteness theorem, which I'll poorly present as: any system that is sufficiently complex and complete will contain legal assertions that will disprove or destroy the system. (Those that do not are not complete.)
Neither throwing an exception nor exhibiting perfectly deterministic buggy behavior is what Gödel was referring to. This shouldn't remind you of the incompleteness theorem, because it's completely unrelated.
I don't want to be condescending, but that isn't what the theorem says. (I'm not even sure it's true.) Incompleteness means there is a true statement that cannot be proved true inside the system.
http://hg.python.org/cpython/file/tip/Objects/unicodeobject....
CPython supports several internal representations from one to four bytes per character to optimize for space and performance. There's also a nifty sort of Bloom filter for quick discrimination of strings that might contain characters of interest.
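You can see the per-character width choice from pure Python via sys.getsizeof, and the character-mask trick is easy to sketch too. This is just the idea, not CPython's actual code, and the strings below are arbitrary examples.

    import sys

    # PEP 393 flexible string representation: CPython stores a string with the
    # narrowest element width (1, 2, or 4 bytes) that fits its widest codepoint.
    for s in ["spam", "spam\xe9", "spam\u0394", "spam\U0001F40D"]:
        print(f"widest U+{max(map(ord, s)):06X}  sizeof={sys.getsizeof(s)}  {s!r}")

    # The Bloom-filter-style trick, roughly: fold the "characters of interest"
    # into a 64-bit mask. A miss means a character is definitely not in the
    # set; a hit means "maybe", and only then do you pay for a real lookup.
    def make_mask(chars):
        mask = 0
        for ch in chars:
            mask |= 1 << (ord(ch) & 63)
        return mask

    whitespace_mask = make_mask(" \t\r\n")
    print(bool(whitespace_mask & (1 << (ord("\n") & 63))))  # True  (maybe in the set)
    print(bool(whitespace_mask & (1 << (ord("x") & 63))))   # False (definitely not)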