The WHOIS protocol (originally RFC 812, now RFC 3912) does not specify the format of the response. The protocol itself is incredibly simple: you connect to port 43 over TCP and send a query terminated by a CRLF, and the server sends back a plain-text, human-readable response and closes the connection.
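To show how little there is to the protocol, here is a minimal sketch of a client in Python. The function names (`build_query`, `whois_query`) and the default server `whois.iana.org` are my own choices for illustration; which server you actually need depends on the TLD, and some servers expect extra query syntax that this ignores.

```python
import socket


def build_query(name: str) -> bytes:
    """Encode a query as the protocol expects: the text followed by CRLF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois_query(name: str, server: str = "whois.iana.org",
                port: int = 43, timeout: float = 10.0) -> str:
    """Send one query and return the raw text response.

    The default server is an assumption for illustration only.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_query(name))
        chunks = []
        # The server signals the end of the response by closing the
        # connection, so read until recv() returns an empty bytes object.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # No encoding is guaranteed, so decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_query("example.com")` gives you back one big string, and that is where the easy part ends.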
Now, many registrars use a format with lines that look like "Name: Value", but not all do. Those that do frequently include other text as well, such as terms of service for their WHOIS service or usage instructions. Some allow multi-line indented values, or whatnot. Some seem to use "%" to mark informational lines, as if it were a comment character, while others don't bother to use anything.
So, parsing whois is a bit more like scraping the web than like parsing a real file format. It can break whenever a server changes the layout of its output.
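For the common "Name: Value" case, a best-effort parser might look like the sketch below. Everything in it is a heuristic I made up for illustration: the name `parse_whois`, treating "%" and "#" as comment prefixes, treating a more deeply indented line as a continuation of the previous value, and silently dropping lines without a colon. None of that is guaranteed by any spec, which is the whole problem.

```python
def parse_whois(text: str, comment_prefixes=("%", "#")) -> dict:
    """Best-effort extraction of "Name: Value" lines from a WHOIS response.

    Returns a dict mapping lowercased field names to lists of values,
    because fields such as name servers are often repeated.
    """
    fields = {}
    last_key = None   # most recent field, for continuation lines
    key_indent = 0    # indentation of the line that defined last_key
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped:
            last_key = None  # a blank line ends any multi-line value
            continue
        if stripped.startswith(comment_prefixes):
            continue  # informational line, by assumption
        indent = len(line) - len(line.lstrip())
        if last_key is not None and indent > key_indent:
            # Indented deeper than its field line: treat as a continuation.
            fields[last_key][-1] = (fields[last_key][-1] + " " + stripped).strip()
            continue
        key, sep, value = stripped.partition(":")
        if sep and key.strip():
            last_key = key.strip().lower()
            key_indent = indent
            fields.setdefault(last_key, []).append(value.strip())
        else:
            last_key = None  # free text such as terms of service; skipped
    return fields


sample = """% This is an informational line
Domain Name: example.com
Registrant: Example Org
    123 Example Street
Name Server: ns1.example.com
Name Server: ns2.example.com

Terms of use apply.
"""
result = parse_whois(sample)
```

This copes with the sample above, but a notice line that happens to contain a colon will be picked up as a field, and a server that uses a different layout will defeat it entirely.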
I understand, and while I do have experience in text processing I would not want to take on a project like this. The number of people responding about this raises the question, though: why is this being reinvented so much? My impression is that everybody writes their own parser, which smells like NIH syndrome.