Shapecatcher: Draw the Unicode character you want

Hupo · on Feb 1, 2013

Couldn't get it to find the unicode snowman[1]: http://i.imgur.com/gaIY9Gd.png (my drawing skills are awesome, no?)

But that aside, this looks like a neat idea. Not something I have any immediate use for myself, but could certainly be useful in some situations.

[1] http://www.fileformat.info/info/unicode/char/2603/index.htm

fredley · on Feb 1, 2013

Can someone tell me why 'Cat with a wry smile' is in Unicode? Presumably at some point someone thought that it would be useful to somebody else, hence its inclusion. It would be very interesting to hear the back-story behind such seemingly useless glyphs.

molf · on Feb 1, 2013

These are Emoji: it's a set of smileys/icons originally used by Japanese carriers. Apple included them in their iPhone, and in order to standardise them they (successfully) requested they were added to Unicode.

lloeki · on Feb 1, 2013

The history of Emoji before they got standardized is interesting. BTW Google was a requestor too.

wmil · on Feb 1, 2013

I can't get it to recognize 'Pile of Poo' (U+1F4A9)

Major bug.

dschep · on Feb 1, 2013

worked for me: http://i.imgur.com/4EJhUpK.png

cdcarter · on Feb 2, 2013

And indeed, a MiniDisc is a pile of poo (and Sony end of lifed them today!)

matthuggins · on Feb 1, 2013

TIL there's a pile of poo character.

lucb1e · on Feb 1, 2013

If a pc can't find it, perhaps drawing things is going to be the new captcha!

flux_w42 · on Feb 1, 2013

Hmm, I got the 0x26c4: Snowman without snow: ⛄

quirm · on Feb 1, 2013

You have to draw the snowflakes

rryan · on Feb 1, 2013

Along these same lines, for LaTeX fans there is http://detexify.kirelabs.org/

seanp2k2 · on Feb 1, 2013

For the curious, this explains a lot of the science in a hands-on, approachable way: http://stackoverflow.com/questions/10168686/algorithm-improv...

EDIT: As this is a deep topic, there are also books if that's more your style: http://www.amazon.com/dp/0123725380/?tag=stackoverfl08-20

or maybe you like Wikipedia (Ol' Trusty): http://en.wikipedia.org/wiki/Feature_detection_(computer_vis...

quirm · on Feb 2, 2013

and for the very curious, my bachelor thesis is on shapecatcher, http://shapecatcher.com/B_Milde%20-%20On%20The%20Security%20...

There is a whole chapter on shape contexts in it, which I use with shapecatcher, too.
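For readers who have not met shape contexts before, the general idea can be sketched as follows. This is a minimal illustration of the published technique (a log-polar histogram of where the other contour points lie relative to one reference point), not Shapecatcher's actual code. The function name `shape_context`, the bin counts, and the radial range are all assumptions chosen for the example.

```python
import math

def shape_context(points, index, n_radial=5, n_angular=12):
    """Log-polar histogram of the other points as seen from points[index].

    Hypothetical helper for illustration; bin counts are arbitrary choices.
    """
    px, py = points[index]
    others = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in others]
    hist = [[0] * n_angular for _ in range(n_radial)]
    mean_dist = sum(dists) / len(dists) if dists else 0.0
    if mean_dist == 0:
        return hist  # no other points, or all of them coincide with the reference
    # Radial bins are spaced logarithmically between 1/8 and 2 mean distances.
    log_lo, log_hi = math.log(0.125), math.log(2.0)
    for (dx, dy), d in zip(others, dists):
        if d == 0:
            continue  # a duplicate of the reference point carries no direction
        r = d / mean_dist  # dividing by the mean distance gives scale invariance
        frac = (math.log(r) - log_lo) / (log_hi - log_lo)
        r_bin = min(max(int(frac * n_radial), 0), n_radial - 1)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(int(angle / (2 * math.pi) * n_angular), n_angular - 1)
        hist[r_bin][a_bin] += 1
    return hist

# Five sample points; the histogram for point 0 counts the other four.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
hist = shape_context(points, 0)
print(sum(map(sum, hist)))
```

Two drawings can then be compared by matching points whose histograms are similar, which is why the overall shape matters more than the exact pixels.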

rurounijones · on Feb 1, 2013

Damn that looks good, it worked for me for ② and u with umlauts.

Get the Japanese support in there and it will be amazing. What about using MS Mincho or MS Gothic for that? (It is free as in beer, but is the licensing off?)

andydrizen · on Feb 1, 2013

If you like this, see DeTeXify http://detexify.kirelabs.org/classify.html

uvdiv · on Feb 1, 2013

My simple, clean ampersand (&) became a paperclip (0x1f4ce), a "fried shrimp" (0x1f364), several species of geometric triangle, and dozens of other silly, silly glyphs.

http://i.imgur.com/NLkl75J.png

the unicode block containing "fried shrimp" 0x1f364 -- why does this exist???

http://www.unicode.org/charts/PDF/U1F300.pdf

Latest Abstruse Goose comic sums up my emotional response:

http://abstrusegoose.com/496

nwh · on Feb 1, 2013

Who doesn't love 🍤?

There's a lot of other strange Unicode too. There are things like '⁢' (U+2062 INVISIBLE TIMES), ⓞⓓⓓ ©ⓗⓐⓡⓐ©ⓣⓔⓡ ⓢⓔⓣⓢ and sɹǝʇɔɐɹɐɥɔ uʍop ǝpısdn.

All of which can be used to bypass filters and generally cause browser-crashing havoc. For example, this address looks like Google, but it really links to hacker news.

‮http://news.ycombinator.com/?/moc.elgoog//:ptth

uvdiv · on Feb 1, 2013

Who doesn't love 🍤?

I just discovered the OS X terminal renders the fried shrimp glyph in color!

edit: and it's now my $PS1.

speeder · on Feb 1, 2013

Yes, I tested pasting "🍤" into my OS X terminal, and was surprised by a color rendering of a fried shrimp.

(yes, you can copy the character between "" and paste in your OSX terminal)

decad · on Feb 1, 2013

Shame this doesn't work in zsh.

uvdiv · on Feb 1, 2013

Why wouldn't it work on zsh? The font face is a property of the terminal emulator (e.g. xterm, Terminal.app), not the shell.

http://i.imgur.com/RKTOBKe.png

edit: It appears (?) stock linux fonts don't include emoji.

wahnfrieden · on Feb 1, 2013

echo 🍤

Raphael · on Feb 1, 2013

fried!

lloeki · on Feb 1, 2013

> "INVISIBLE TIMES"

Sounds like a Muse song title

nwh · on Feb 1, 2013

I've never been able to work out if it is an invisible ⨉ character, or a character meant for use when things are invisible. Either way works for Muse I suppose.

taejo · on Feb 1, 2013

It's for mathematics. Some kind of math markup may want to distinguish x times y written xy from a two-letter variable called xy. So the product is x<invisible times>y but is rendered xy.
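A quick way to see the distinction described above is to build both strings and compare them. This is only a small sketch using Python's standard `unicodedata` module; the variable names are made up for the example.

```python
import unicodedata

INVISIBLE_TIMES = "\u2062"

product = "x" + INVISIBLE_TIMES + "y"   # x times y, rendered as "xy"
variable = "xy"                         # a two-letter identifier

print(unicodedata.name(INVISIBLE_TIMES))      # INVISIBLE TIMES
print(unicodedata.category(INVISIBLE_TIMES))  # Cf (a format character)
print(len(product), len(variable))            # 3 2
print(product == variable)                    # False
```

The two strings look the same on screen but differ in length and content, which is what lets math markup tell a product apart from a variable name.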

Groxx · on Feb 1, 2013

I love that link example xD I never thought of doing that!
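The link example works because the text starts with U+202E (RIGHT-TO-LEFT OVERRIDE), which makes everything after it render in reverse. Below is a rough sketch of how one might detect such characters and approximate what a reader sees. The helper names are hypothetical, and `rough_display_order` is a deliberately crude stand-in for the real Unicode bidirectional algorithm, which has many more rules.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

# Explicit bidi embedding, override, and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def contains_bidi_control(text):
    """True if the text contains any explicit bidi control character."""
    return any(ch in BIDI_CONTROLS for ch in text)

def rough_display_order(text):
    """Crude approximation: a leading RLO shows the rest of the text reversed."""
    if text.startswith(RLO):
        return text[1:][::-1]
    return text

link = RLO + "http://news.ycombinator.com/?/moc.elgoog//:ptth"
print(contains_bidi_control(link))  # True
print(rough_display_order(link))    # http://google.com/?/moc.rotanibmocy.swen//:ptth
```

A filter that strips or rejects the characters in `BIDI_CONTROLS` before displaying user-supplied links would be one way to defuse the trick.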

uvdiv · on Feb 1, 2013

Experiments with some common glyphs.

(!) -- 8th suggestion, after three apparently identical "upside-down 'i'" (including an "upside-down capital 'I' with a dot underneath")

(@) -- 1st suggestion

(#) -- not suggested. Top suggestion is "capital 'H' with stroke"

($) -- not suggested, although the 14th is an indistinguishable glyph "Canadian syllabics carrier sh" 0x165a, a phonetic symbol for representing a Canadian aboriginal language.

(%) -- 1st suggestion

(^) -- 10th suggestion, to be fair this one is impossible

(&) -- not suggested (fried shrimp lol)

(*) -- 3rd suggestion

(?) -- 1st suggestion

(∫) -- 1st suggestion

(∂) -- 4th suggestion (1st one is same thing in boldface)

----------

This suggests a really easy way to greatly improve the results: weight them by a prior probability (i.e., frequency of occurrence in a letter count). The OCR itself seems pretty good. Common math symbols are more likely than silly shrimps. Glyphs from common languages are more probable than esoteric ones, and real languages more than constructed ones. Plain glyphs are more common than variants (bold/italic) -- and these should be grouped together anyway. fried shrimp horseshoe.
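The weighting idea can be sketched in a few lines: multiply each candidate's shape score by a prior and re-sort. This is an illustration of the suggestion only, not how the site scores results. The function name `rerank`, the default prior, and every number below are made up for the example.

```python
def rerank(candidates, priors, default_prior=1e-6):
    """Sort (char, shape_score) pairs by shape_score * prior, best first.

    Characters missing from `priors` get `default_prior` (an assumed value).
    """
    weighted = [(ch, score * priors.get(ch, default_prior))
                for ch, score in candidates]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)

# Invented shape scores: the shrimp narrowly beats the ampersand on shape alone.
candidates = [("🍤", 0.40), ("&", 0.35), ("📎", 0.30)]
# Invented priors: the ampersand is far more common in real text.
priors = {"&": 0.02, "🍤": 0.0001, "📎": 0.0001}

for ch, score in rerank(candidates, priors):
    print(ch, score)
# The ampersand now comes first, followed by the shrimp and the paperclip.
```

The hard part is choosing the priors: if they track everyday text frequency too closely, keyboard characters will crowd out the rare glyphs people are actually searching for.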

macleodan · on Feb 1, 2013

Why would you use this tool to look up common characters found on your keyboard? Maybe a better idea is to show uncommon characters first.

quirm · on Feb 1, 2013

Exactly, this is why priors for latin alpha characters are not set that high - you have them on your keyboard already.

quirm · on Feb 1, 2013

There is a prior probability in the results - but it's hard to balance it so that latin characters don't dominate the results if you search something else. That is why I opted for the way it is now - this is a tool to find unicode characters, you have latin characters on your keyboard. It's not really meant for OCR

pseut · on Feb 1, 2013

You could do prior probability conditional on not being available on a US keyboard, or at least make that an option.

edit: just read your comment elsewhere on the page indicating that you do something like what I described.

peapicker · on Feb 1, 2013

$#& - all were my first suggestion... perhaps I draw them better than you do? :)

FreeFull · on Feb 1, 2013

& only seems to work if drawn with the top loop broken.

JensenDied · on Feb 2, 2013

I was only trying to see the fried shrimp http://imgur.com/Ev3gGwd

Raphael · on Feb 1, 2013

dollar sign works if you draw the vertical line all the way through.

masklinn · on Feb 1, 2013

> the unicode block containing "fried shrimp" 0x1f364 -- why does this exist???

A softbank-based Emoji set was imported (in an extended form) into Unicode 6.0: http://en.wikipedia.org/wiki/Emoji#Emoji_in_the_Unicode_stan...

Because Emoji originated in Japanese messaging, a number of them relate to Japanese culture: foodstuff (not just fried shrimp but rice balls, dango, oden, fugu), cultural practices (kadomatsu, hinamatsuri, koinobori, Fūrin wind chimes) and other such things which may be present in other cultures but usually not as prominently (e.g. Unicode Love Hotel)

dexen · on Feb 1, 2013

http://kyon.pl/img/18939,unicode,lol,inside_joke,.html

ChrisNorstrom · on Feb 1, 2013

Very nice and amazingly accurate. But am I using your website the right way? When I look through the characters in the search results and see one that isn't rendering on my computer and just has a block instead of the symbol, I've been clicking on "bad" for "rate this suggestion", thinking that it tallies up the total good/bad for a character to mean "how likely people are to have this character installed and working on their computers".

However, I now have a feeling that's not what that feature is for.

kayge · on Feb 2, 2013

My guess was that the Good/Bad rating is to help with some sort of Machine Learning going on in the background. E.g. the way I draw my Ampersand might be slightly different than yours, so if either or both of us see the result we were hoping for (&), that should get our Good rating. If it returns an (8), it may or may not deserve a Bad. If something way off appears as a top result (^), that would be pretty Bad.

_hgt1 · on Feb 1, 2013

It recognized my drawing of a cactus! http://i.imgur.com/pDgIOPk.png

seanp2k2 · on Feb 1, 2013

Ooh my word. I drew a Skrillex and it recognized it perfectly; some Chuck Tays because he's a chill bro, school because his music appeals to people in school (12-24), and the Bengali vowel sound because he plays that in his "drops".

EDIT: forgot to link screenshot: http://imgur.com/LyaXy4h

seanp2k2 · on Feb 1, 2013

This is seriously the funniest thing I've seen on HN in weeks. Made me actually laugh.

MojoJolo · on Feb 1, 2013

Damn. I think it's not intended as a cactus. Hahaha.

_hgt1 · on Feb 1, 2013

Of course it's a cactus! What else would it be?

WA · on Feb 1, 2013

I wonder how many people try to paint cacti as a quick test for tools like that ;)

laumars · on Feb 1, 2013

Guilty as charged.

darrenkopp · on Feb 1, 2013

i must be pretty terrible at drawing the snowman, because I couldn't get it to match that.

_hgt1 · on Feb 1, 2013

My comment, with >36 votes, was flagged to oblivion. What a bunch of puritanical, cactus-haters you are, Hacker News.

hwang89 · on Feb 1, 2013

prickly.

dpham · on Feb 1, 2013

Pile of poo http://shapecatcher.com/unicode/info/128169

jQueryIsAwesome · on Feb 1, 2013

It's not every day that the top comment on reddit.com/r/funny is also the top comment on Hacker News.

speeder · on Feb 1, 2013

I know this is supposed to be funny, but I had to flag it.

I mean, I did find it funny, but I did not enjoy seeing that at work (especially because I work with kids' stuff)

Osmium · on Feb 1, 2013

It's all in the eye of the beholder: it looks like a cactus to me! In seriousness, if you do see something else, you're obviously old enough to appreciate the joke :)

_hgt1 · on Feb 1, 2013

Are cacti considered dangerous for children in your country?

jessaustin · on Feb 1, 2013

Cacti are dangerous to children and adults wherever they appear. Those thorns are sharp! :)

ozh · on Feb 1, 2013

with kids or with kids stuff? Protip: don't click on imgur links.

anonymfus · on Feb 1, 2013

An unknown link can redirect to imgur, so a more useful hint would be to block imgur links.

_hgt1 · on Feb 1, 2013

Internet ahead! Danger!

seanp2k2 · on Feb 1, 2013

Found a "secret" http://shapecatcher.com/engine/

jaymzcampbell · on Feb 1, 2013

Reminds me of the excellent Detexify, which serves a similar purpose but for LaTeX symbols: http://detexify.kirelabs.org/classify.html

symmetricsaurus · on Feb 1, 2013

Tried to draw an alpha but it didn't get it.

So far I haven't tried Shapecatcher a lot but I think that http://detexify.kirelabs.org/classify.html works much better. Detexify is of course only for LaTeX symbols and doesn't do unicode.

To be useful Shapecatcher needs to become better at recognition.

cygwin98 · on Feb 1, 2013

Nice job done. Here are what I got: ᗧ ᗣ ᗤ ᗢ

could be awesome for a character-based PacMan impl.

drucken · on Feb 2, 2013

"If you can't find Chinese, Japanese or Korean glyphs, it is because I have yet to find a good free CJK font to use."

Are there not some CJK (or otherwise) fonts from, for example, Linux distributions that could have been used?

Or perhaps the emphasis could be on clarifying what is meant by "good" that deserves excluding such a large and useful character space for this type of application?

smcl · on Feb 1, 2013

Sounds cool but sadly hasn't worked for the letters I often need but haven't easy access to (Czech characters like: ř, ď and š). Perhaps because the element that makes them distinct from the Latin (the "háček") is so tiny.

Edit, I just drew the ř larger and it recognised it correctly. Cool :)

k_bx · on Feb 1, 2013

(surprisingly no one said this) Too bad it doesn't generate links to drawing results.

jessaustin · on Feb 1, 2013

Hosting uploaded images exposes sites to a great deal of annoyance.

quirm · on Feb 2, 2013

I think there is a way to not host images - I could just record strokes and play that back with js.
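One way the stroke-recording idea could work is to pack the strokes into a short URL-safe token, so that a shareable link carries the drawing itself and nothing has to be hosted. The sketch below is an assumption about how that might look, not anything the site does; the function names and the JSON-plus-zlib format are invented for the example, and the playback side would live in browser JavaScript.

```python
import base64
import json
import zlib

def encode_strokes(strokes):
    """Pack a list of strokes (each a list of [x, y] points) into a URL-safe token."""
    raw = json.dumps(strokes, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")

def decode_strokes(token):
    """Inverse of encode_strokes."""
    raw = zlib.decompress(base64.urlsafe_b64decode(token.encode("ascii")))
    return json.loads(raw)

# Two strokes forming a rough "+" sign.
strokes = [[[10, 50], [90, 50]], [[50, 10], [50, 90]]]
token = encode_strokes(strokes)
print(token)
print(decode_strokes(token) == strokes)  # True
```

The token could be appended to a URL fragment, and the page would decode it and redraw the strokes in order.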

quirm · on Feb 1, 2013

I was working on this last weekend - sorry I didn't know someone would post this again to ycombinator ;)

fredley · on Feb 1, 2013

Trying to draw U+1F4A9 (Pile of Poo). After several attempts, no luck.

I have learnt that Unicode contains even more weirdness than I thought before though, including 'Alchemical symbol for borax-3' (U+1f744), and 'doughnut' (U+1f369).

zenon · on Feb 1, 2013

I tried to draw the elusive snowman: http://i.imgur.com/dr5VKTh.png

Clearly it's not impressed by my drawing skills.

dschep · on Feb 1, 2013

http://i.imgur.com/4EJhUpK.png worked on second attempt (added the 'stink lines')

speeder · on Feb 1, 2013

Nice idea, too bad that I tried to draw several variations of PI, and it showed me several interesting characters, but never a PI.

Seriously, it even showed some very PI-like things, but not PI itself. This is a downer.

the_gipsy · on Feb 1, 2013

Idea: instead of matching the shape of what the user has drawn raster-wise, let the user draw an svg-like path, and try to identify the letter by the trace.

seanp2k2 · on Feb 1, 2013

Agreed that a pen tool or some type of editor would be kind of nice, but for what he's going for (proving out an idea), this is still pretty fun. I know a little of the science behind it, but it'd be great to read through some well-commented source code. He did link in the thesis on this, however: http://shapecatcher.com/B_Milde%20-%20On%20The%20Security%20...

oftenwrong · on Feb 1, 2013

http://kanji.sljfaq.org/draw-canvas.html

a_p · on Feb 1, 2013

Does anyone know if the mirror image of this character: “ (U+201C) exists? I'm looking for a character that is the mirror image of the left double quotation mark, where the base is on the bottom and the character tapers from bottom right to top left. I don't know if any languages use that character.

FreeFull · on Feb 1, 2013

‟ (U+201F) possibly?
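Questions like this can also be approached by searching the official character names. The sketch below uses Python's standard `unicodedata` module; `find_by_name` is a hypothetical helper, and the results depend on the Unicode version bundled with the Python build.

```python
import sys
import unicodedata

def find_by_name(*words):
    """Return (code point label, character, name) for names containing all words."""
    wanted = [w.upper() for w in words]
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if name and all(w in name for w in wanted):
            hits.append((f"U+{cp:04X}", chr(cp), name))
    return hits

print(unicodedata.name("\u201c"))  # LEFT DOUBLE QUOTATION MARK
print(unicodedata.name("\u201f"))  # DOUBLE HIGH-REVERSED-9 QUOTATION MARK

for label, ch, name in find_by_name("reversed", "quotation"):
    print(label, ch, name)
```

The scan walks every code point, so it takes a moment, but it turns up the reversed quotation marks without any drawing.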

anonymfus · on Feb 1, 2013

Interesting thing, it recognized white queen correctly with diamonds on crown, but as black queen without diamonds and failed with only one diamond:

http://i45.tinypic.com/3535j4x.png — screenshot of variant with three diamonds.

quasque · on Feb 1, 2013

Pretty neat, but I'm not sure what else it wants me to draw here http://i.imgur.com/rfU10rj.png

Also it seems to fail on badly-drawn birds http://i.imgur.com/IZgrRkq.png

quirm · on Feb 1, 2013

1) it does count edge pixels - recognition is based on the shape of what you draw, so for the algorithm, a black picture is like a white one (but I admire your endurance to paint it all black) 2) server is under heavy load atm, it might drop some requests - just retry
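The point about a black picture behaving like a white one can be illustrated with a tiny edge extractor. This is a sketch under stated assumptions, not the site's code: the canvas is a grid of 0/1 values, an edge pixel is an ink pixel with at least one non-ink 4-neighbour, and neighbours outside the canvas are ignored. The name `edge_pixels` is made up.

```python
def edge_pixels(image):
    """Return the set of (row, col) ink pixels that touch a non-ink pixel.

    `image` is a list of equal-length rows of 0 (blank) and 1 (ink).
    Neighbours outside the canvas are ignored, so the canvas border
    alone never creates an edge.
    """
    height, width = len(image), len(image[0])
    edges = set()
    for r in range(height):
        for c in range(width):
            if not image[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < height and 0 <= nc < width and not image[nr][nc]:
                    edges.add((r, c))
                    break
    return edges

all_black = [[1] * 4 for _ in range(4)]
all_white = [[0] * 4 for _ in range(4)]
print(len(edge_pixels(all_black)), len(edge_pixels(all_white)))  # 0 0

# A filled 3x3 square on a 5x5 canvas: only its outline counts.
square = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
          for r in range(5)]
print(len(edge_pixels(square)))  # 8
```

Under these assumptions a fully filled canvas yields no edges at all, so a shape-based matcher has nothing to work with.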

RyanMcGreal · on Feb 1, 2013

Sadly it didn't recognize my clumsy attempt to draw the Look of Disapproval: http://i.imgur.com/0lvPFaJ.png

On the upside, I learned that there is a Panda Face unicode character.

niyazpk · on Feb 1, 2013

The look of disapproval is not a single character; it consists of three characters: [1] _ [1]

[1] http://shapecatcher.com/unicode/info/3232
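The info URLs posted in this thread appear to end in the decimal code point, so the characters can be rebuilt from those numbers. A small sketch using Python's standard library follows; the helper name `describe` is made up.

```python
import unicodedata

def describe(decimal_code_point):
    """Return (character, U+XXXX label, official name) for a decimal code point."""
    ch = chr(decimal_code_point)
    return ch, f"U+{decimal_code_point:04X}", unicodedata.name(ch)

eye, label, name = describe(3232)
print(label, name)      # U+0CA0 KANNADA LETTER TTHA
print(eye + "_" + eye)  # ಠ_ಠ

print(describe(128169)[1:])  # ('U+1F4A9', 'PILE OF POO')
```

So the look of disapproval is two copies of a Kannada letter around an underscore, which is why no single-character search finds it.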

rmrfrmrf · on Feb 1, 2013

Really cool stuff. Luckily, I draw my lowercase 'a's in double-story form, but if you attempt to draw an a (or an accented a) with just a circle and a line, it's recognized as an 'o'.

eoJ · on Feb 1, 2013

Nice. It's not very good at finding faces though. The right face is there, but way down.

http://i.imgur.com/PD1DkV6.png

zandomatter · on Feb 1, 2013

Really cool. The drawing tool would be even better if it drew as soon as you clicked (i.e., would draw a single dot if you click and don't drag the mouse).

lucb1e · on Feb 1, 2013

It didn't recognize the capital letter A, only variations on the letter A and other unknown characters. I'm sure that the letter A is a Unicode character!

Blara · on Feb 1, 2013

Worked for me, it's called: Latin capital letter a: A (0x41)

JoeAltmaier · on Feb 1, 2013

Doesn't seem to include the drawing stroke order or anything, just the ultimate image. I guess if you know the strokes, you know the letter already, hm.

martinced · on Feb 1, 2013

Is stroke order available for Unicode codepoints? I know that for many languages it's not important (for example people are free to write 't' from top-to-bottom but from bottom-to-top too) while in others stroke order is very important, like in Japanese.

However how would stroke order work for a Unicode codepoint? If it exists, there has to be a lot of info in addition to the codepoint of, say, a kanji.

TazeTSchnitzel · on Feb 1, 2013

It claims Chinese/Japanese/Korean is unsupported, but works fine for me:

http://i.imgur.com/AQBybDT.png

bryze · on Feb 1, 2013

Is that the only character you tried? Try writing "論" and see what happens..

TazeTSchnitzel · on Feb 1, 2013

I tried 四 and it didn't work. I guess Hiragana and Katakana work, but not Hanzi/Kanji/Hanja.

quirm · on Feb 2, 2013

Right. Hiragana and Katakana are trivial (just a couple of characters), but no support for Kanji currently

tempestn · on Feb 9, 2013

Awesome idea, but all it ever gives me is a loading bar. shapecatcher.com/engine/recognize eventually returns "504 Gateway Time-out"

mikle · on Feb 1, 2013

It did pretty well with the first five Hebrew letters. Pretty well as in all but one were found in the top 10 of results.

kaolinite · on Feb 1, 2013

Worked really well for me, found a smiley face and a few accented characters. Awesome idea and implementation.

SeanDav · on Feb 1, 2013

Couldn't get it to recognize my drawing of a castle/rook in chess, but it did find my bishop. Great site.

thewarrior · on Feb 1, 2013

They need to prioritise the glyphs based on frequency of occurrence to prevent ridiculous matches.

neumann_alfred · on Feb 1, 2013

You can rate the recognition results ;)

cynwoody · on Feb 1, 2013

It got the ⌘-sign on the first try (Place of interest sign: 0x2318, aka the Command Key).

nefasti · on Feb 1, 2013

I clicked the link reading "Draw the Unicorn character you want" =(

sparist · on Feb 1, 2013

Worked fantastically on my iPad. Excellent idea and execution.

miga · on Feb 1, 2013

Well, it didn't recognize pi.

JoeAltmaier · on Feb 1, 2013

Amazing. Try a happy face!

Snoptic · on Feb 1, 2013

2 previous discussions.

http://www.hnsearch.com/search#request/submissions&q=Sha...

Maybe the author can write a post detector next.