Can someone tell me why 'Cat with a wry smile' is in Unicode? Presumably at some point someone thought that it would be useful to somebody else, hence its inclusion. It would be very interesting to hear the back-story behind such seemingly useless glyphs.
These are Emoji: a set of smileys/icons originally used by Japanese carriers. Apple included them in the iPhone and, in order to standardise them, (successfully) requested that they be added to Unicode.
Damn that looks good, it worked for me for ② and u with umlauts.
Get the Japanese support in there and it will be amazing. What about using MS Mincho or MS Gothic for that? (It is free as in beer, but is the licensing off?)
My simple, clean ampersand (&) became a paperclip (0x1f4ce), a "fried shrimp" (0x1f364), several species of geometric triangle, and dozens of other silly, silly glyphs.
All of which can be used to bypass filters and generally cause browser-crashing havoc. For example, this address looks like Google, but it really links to Hacker News.
I've never been able to work out if it is an invisible ⨉ character, or a character meant for use when things are invisible. Either way works for Muse I suppose.
It's for mathematics. Some kind of math markup may want to distinguish x times y written xy from a two-letter variable called xy. So the product is x<invisible times>y but is rendered xy.
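To make that concrete, here is a small Python sketch (the variable names are only for illustration). It shows that the character is U+2062 INVISIBLE TIMES, a format character with no visible glyph, and that a string containing it is a different string from plain "xy" even though both render the same.

```python
import unicodedata

INVISIBLE_TIMES = "\u2062"  # U+2062

product = "x" + INVISIBLE_TIMES + "y"  # x times y, normally rendered as "xy"
variable = "xy"                        # a two-letter variable called xy

print(unicodedata.name(INVISIBLE_TIMES))      # official character name
print(unicodedata.category(INVISIBLE_TIMES))  # general category (a format character)
print(len(product), len(variable))            # the product has one extra code point
print(product == variable)                    # they look alike but compare unequal
```

That extra code point is what lets a math renderer or screen reader tell the two apart.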
(!) -- 8th suggestion, after three apparently identical "upside-down 'i'" (including an "upside-down capital 'I' with a dot underneath")
(@) -- 1st suggestion
(#) -- not suggested. Top suggestion is "capital 'H' with stroke"
($) -- not suggested, although the 14th is an indistinguishable glyph "Canadian syllabics carrier sh" 0x165a, a phonetic symbol used to write a Canadian aboriginal language.
(%) -- 1st suggestion
(^) -- 10th suggestion, to be fair this one is impossible
(&) -- not suggested (fried shrimp lol)
(*) -- 3rd suggestion
(?) -- 1st suggestion
(∫) -- 1st suggestion
(∂) -- 4th suggestion (1st one is same thing in boldface)
----------
This suggests a really easy way to greatly improve the results: weight them by a prior probability (i.e., frequency of occurrence in a letter count). The OCR itself seems pretty good. Common math symbols are more likely than silly shrimps. Glyphs from common languages are more probable than esoteric ones, and real languages more than constructed ones. Plain glyphs are more common than variants (bold/italic) -- and these should be grouped together anyway. fried shrimp horseshoe.
There is a prior probability in the results - but it's hard to balance it so that Latin characters don't dominate the results if you search for something else. That is why I opted for the way it is now - this is a tool to find Unicode characters, and you already have Latin characters on your keyboard. It's not really meant for OCR.
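For anyone curious what that trade-off looks like, here is a minimal Python sketch. It is not Shapecatcher's actual scoring: the function rerank, the prior_weight exponent, and all the numbers are made up for illustration. The idea is to multiply each shape-match score by the character's prior raised to a tunable power, so 0 ignores the prior and 1 applies it in full.

```python
def rerank(shape_scores, priors, prior_weight=0.5, default_prior=1e-6):
    """Order candidates by shape_score * prior ** prior_weight (hypothetical scheme)."""
    combined = {
        ch: score * priors.get(ch, default_prior) ** prior_weight
        for ch, score in shape_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


SHRIMP = "\U0001F364"  # fried shrimp

# Invented numbers: the drawing was meant to be an ampersand.
shape_scores = {SHRIMP: 0.60, "&": 0.55, "8": 0.40}
priors = {SHRIMP: 1e-6, "&": 1e-3, "8": 1e-2}

for weight in (0.0, 0.1, 1.0):
    print(weight, rerank(shape_scores, priors, prior_weight=weight))
```

With these numbers, a weight of 0.0 puts the shrimp first, 0.1 puts the ampersand first, and 1.0 lets the common digit 8 take over. That last case is the "common characters dominate" problem described above.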
Because Emoji originated in Japanese messaging, a number of them relate to Japanese culture: foodstuff (not just fried shrimp but rice balls, dango, oden, fugu), cultural practices (kadomatsu, hinamatsuri, koinobori, fūrin wind chimes) and other such things which may be present in other cultures but usually not as prominently (e.g. Unicode Love Hotel)
Very nice and amazingly accurate. But am I using your website the right way? When I look through the characters in the search results, if I see one that isn't rendering on my computer and just has a block instead of the symbol, I've been clicking on "bad" for "rate this suggestion", thinking that it tallies up the total good/bad for a character to mean "how likely people are to have this character installed and working on their computers".
However, I now have a feeling that's not what that feature is for.
My guess was that the Good/Bad rating is to help with some sort of Machine Learning going on in the background. E.g. the way I draw my Ampersand might be slightly different than yours, so if either or both of us see the result we were hoping for (&), that should get our Good rating. If it returns an (8), it may or may not deserve a Bad. If something way off appears as a top result (^), that would be pretty Bad.
Ooh my word. I drew a Skrillex and it recognized it perfectly; some Chuck Tays because he's a chill bro, school because his music appeals to people in school (12-24), and the Bengali vowel sound because he plays that in his "drops".
It's all in the eye of the beholder: it looks like a cactus to me! In seriousness, if you do see something else, you're obviously old enough to appreciate the joke :)
So far I haven't tried Shapecatcher a lot but I think that http://detexify.kirelabs.org/classify.html works much better. Detexify is of course only for LaTeX symbols and doesn't do Unicode.
To be useful Shapecatcher needs to become better at recognition.
"If you can't find Chinese, Japanese or Korean glyphs, it is because I have yet to find a good free CJK font to use."
Are there not some CJK (or otherwise) fonts from, for example, Linux distributions that could have been used?
Or perhaps the emphasis could be on clarifying what is meant by "good" that deserves excluding such a large and useful character space for this type of application?
Sounds cool but sadly hasn't worked for the letters I often need but don't have easy access to (Czech characters like: ř, ď and š). Perhaps because the element that makes them distinct from the Latin (the "háček") is so tiny.
Edit: I just drew the ř larger and it recognised it correctly. Cool :)
Trying to draw U+1F4A9 (Pile of Poo). After several attempts, no luck.
I have learnt that Unicode contains even more weirdness than I thought before though, including 'Alchemical symbol for borax-3' (U+1f744), and 'doughnut' (U+1f369).
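If you want to check what a given code point is officially called, Python's standard unicodedata module can look it up. A small sketch follows; the helper describe is just an illustrative name, and which names are known depends on the Unicode version bundled with your Python build.

```python
import unicodedata


def describe(code_point):
    """Return 'U+XXXX  NAME' for a code point, with a fallback for unknown names."""
    name = unicodedata.name(chr(code_point), "<no name in this Unicode database>")
    return f"U+{code_point:04X}  {name}"


# Code points mentioned in this thread.
for cp in (0x1F4CE, 0x1F364, 0x1F369, 0x1F4A9, 0x1F744):
    print(describe(cp))
```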
Idea: instead of matching the shape of what the user has drawn raster-wise, let the user draw an svg-like path, and try to identify the letter by the trace.
Agreed that a pen tool or some type of editor would be kind of nice, but for what he's going for (proving out an idea), this is still pretty fun. I know a little of the science behind it, but it'd be great to read through some well-commented source code. He did link in the thesis on this, however: http://shapecatcher.com/B_Milde%20-%20On%20The%20Security%20...
Does anyone know if the mirror image of this character: “ (U+201C) exists? I'm looking for a character that is the mirror image of the left double quotation mark, where the base is on the bottom and the character tapers from bottom right to top left. I don't know if any languages use that character.
1) it does count edge pixels - recognition is based on the shape of what you draw, so for the algorithm, a black picture is like a white one (but I admire your endurance to paint it all black)
2) server is under heavy load atm, it might drop some requests - just retry
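To illustrate point 1, here is a toy Python sketch of why an all-black canvas looks the same as an all-white one to an edge-based method. This is only an assumption about the general idea, not the site's actual algorithm, and count_edge_pixels is a made-up helper that counts pixels differing from their right or lower neighbour.

```python
def count_edge_pixels(image):
    """Count pixels whose right or lower neighbour has a different value."""
    rows, cols = len(image), len(image[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            right_differs = c + 1 < cols and image[r][c] != image[r][c + 1]
            below_differs = r + 1 < rows and image[r][c] != image[r + 1][c]
            if right_differs or below_differs:
                edges += 1  # each pixel is counted at most once
    return edges


all_white = [[0] * 8 for _ in range(8)]
all_black = [[1] * 8 for _ in range(8)]
# A single vertical stroke in column 3 of an otherwise white canvas.
stroke = [[1 if c == 3 else 0 for c in range(8)] for _ in range(8)]

print(count_edge_pixels(all_white), count_edge_pixels(all_black), count_edge_pixels(stroke))
```

Both uniform canvases contain no edges at all, while even a single stroke produces some, so only the outline of what you draw carries information.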
Really cool stuff. Luckily, I draw my lowercase 'a's in double-story form, but if you attempt to draw an a (or an accented a) with just a circle and a line, it's recognized as an 'o'.
Really cool. The drawing tool would be even better if it drew as soon as you clicked (i.e., would draw a single dot if you click and don't drag the mouse).
It didn't recognize the capital letter A, only variations on the letter A and other unknown characters. I'm sure that the letter A is a Unicode character!
Doesn't seem to include the drawing stroke order or anything, just the ultimate image. I guess if you know the strokes, you know the letter already, hm.
Is stroke order available for Unicode codepoints? I know that for many languages it isn't important (for example, people are free to write 't' from top to bottom or from bottom to top), while in others, like Japanese, stroke order is very important.
However, how would stroke order work for a Unicode codepoint? If it exists, there has to be a lot of info in addition to the codepoint of, say, a kanji.
But that aside, this looks like a neat idea. Not something I have any immediate use for myself, but could certainly be useful in some situations.
[1] http://www.fileformat.info/info/unicode/char/2603/index.htm