Every time I've had to deal with Unicode and internationalization, it's been a problem.
For example, a few years ago I grabbed a source tarball from somewhere, I forget what or where. It had the author's name in a comment, which included an O with dots over it. That was the only non-ASCII character in the source code. No matter what I did, both Eclipse and command-line javac refused to compile the source.
Finally I wrote a script to delete his name from every source file manually. It compiled flawlessly.
Then there's the time I found some text files with two characters of binary junk at the beginning, followed by completely normal text. Again, I forget what I was doing, but some program was refusing to process them correctly. It was something internationalization-related called the BOM. Eventually I ended up writing a script to walk a directory and remove the first two bytes of every file. (This can probably be done with dd and xargs on UNIX, but I was using Windows at the time, which means that something like this will require spending an hour or so in your favorite programming language.)
These experiences lead me to believe that, for bootstrapped USA startups at least, you shouldn't worry about a market outside the English-speaking world.
If you need to worry about junk like accented characters or moon runes (Chinese/Japanese/Korean characters), it means you're big enough to afford to hire someone specifically to address the problem.
I assume this is a not very subtle troll? Java source is Unicode? (The offhand reference to dd and xargs is a bit too much.)
How do you define "English-speaking world", btw? Those too ignorant to have heard of non-ASCII characters (i.e. excluding Canada, as anyone doing business there should at least have heard of French)?
Anyway, for anyone actually burnt by something similar on a GNU system try looking up recode(1).
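For anyone without recode at hand, the same kind of conversion can be sketched in Python. This is only an illustration under assumptions: the thread never says which encoding the offending file used, so Latin-1 is a guess, and `reencode` and `reencode_file` are made-up helper names.

```python
from pathlib import Path


def reencode(data: bytes, source: str = "latin-1", target: str = "utf-8") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them."""
    return data.decode(source).encode(target)


def reencode_file(path: str, source: str = "latin-1", target: str = "utf-8") -> None:
    """Rewrite one file in place using the target encoding."""
    file_path = Path(path)
    file_path.write_bytes(reencode(file_path.read_bytes(), source, target))


if __name__ == "__main__":
    # An O with dots (U+00D6) is one byte in Latin-1 and two bytes in UTF-8.
    print(reencode(b"\xd6"))
```

If the guess about the source encoding is wrong, the output will be garbage, so it is worth keeping a copy of the original files.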
And personally I think excluding all internationalisation because it's harder is a terrible attitude to have. Particularly these days, when there are online tutorials for pretty much any job imaginable (not to mention the number of helpful experts willing to give up their time for free on various forums and communities).
> which means that something like this will require spending an hour or so in your favorite programming language
Ok, this is where I stop worrying about how quickly I write code. I've done this (removing a BOM) quite a few times and it took just a few minutes in Python (under Windows). Heck, this could be a two-liner I think :)
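Roughly what that looks like, as a sketch rather than the exact script anyone in this thread used. The names `strip_bom` and `strip_bom_in_tree` and the `*.txt` pattern are assumptions for illustration. It only removes the marker bytes (UTF-8 or UTF-16) and does not convert the rest of the file, so a UTF-16 file stays UTF-16.

```python
import codecs
from pathlib import Path

# Byte order marks handled here: UTF-8 (3 bytes) and UTF-16 LE/BE (2 bytes each).
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading BOM, or unchanged if there is none."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


def strip_bom_in_tree(root: str, pattern: str = "*.txt") -> int:
    """Walk root recursively, strip BOMs in place, return the number of files changed."""
    changed = 0
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        original = path.read_bytes()
        cleaned = strip_bom(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Calling `strip_bom_in_tree(".")` on a scratch copy of the directory first is a sensible precaution, since the files are rewritten in place.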
I, for one, applaud this attitude. It gives programmers and companies that know what they're doing a leg up over people who couldn't even bother to figure out UTF-8. Natural segmentation of a target market is a good thing.