Hacker News
Shapecatcher: Draw the Unicode character you want (shapecatcher.com)
271 points by barredo on Feb 1, 2013 | 107 comments



Couldn't get it to find the unicode snowman[1]: http://i.imgur.com/gaIY9Gd.png (my drawing skills are awesome, no?)

But that aside, this looks like a neat idea. Not something I have any immediate use for myself, but could certainly be useful in some situations.

[1] http://www.fileformat.info/info/unicode/char/2603/index.htm


Can someone tell me why 'Cat with a wry smile' is in Unicode? Presumably at some point someone thought that it would be useful to somebody else, hence its inclusion. It would be very interesting to hear the back-story behind such seemingly useless glyphs.


These are Emoji: it's a set of smileys/icons originally used by Japanese carriers. Apple included them in their iPhone, and in order to standardise them they (successfully) requested they were added to Unicode.


The history of Emoji before they got standardized is interesting. BTW Google was a requestor too.


I can't get it to recognize 'Pile of Poo' (U+1F4A9)

Major bug.



And indeed, a MiniDisc is a pile of poo (and Sony end of lifed them today!)


TIL there's a pile of poo character.


If a pc can't find it, perhaps drawing things is going to be the new captcha!


Hmm, I got the 0x26c4: Snowman without snow: ⛄


You have to draw the snowflakes


Along these same lines, for LaTeX fans there is http://detexify.kirelabs.org/


For the curious, this explains a lot of the science in a hands-on, approachable way: http://stackoverflow.com/questions/10168686/algorithm-improv...

EDIT: As this is a deep topic, there are also books if that's more your style: http://www.amazon.com/dp/0123725380/?tag=stackoverfl08-20

or maybe you like Wikipedia (Ol' Trusty): http://en.wikipedia.org/wiki/Feature_detection_(computer_vis...


and for the very curious, my bachelor thesis is on shapecatcher, http://shapecatcher.com/B_Milde%20-%20On%20The%20Security%20...

There is a whole chapter on shape contexts in it, which I use with shapecatcher, too.


Damn that looks good, it worked for me for ② and u with umlauts.

Get the Japanese support in there and it will be amazing. What about using MS Mincho or MS Gothic for that? (It is free as in beer, but is the licensing off?)


If you like this, see DeTeXify http://detexify.kirelabs.org/classify.html


My simple, clean ampersand (&) became a paperclip (0x1f4ce), a "fried shrimp" (0x1f364), several species of geometric triangle, and dozens of other silly, silly glyphs.

http://i.imgur.com/NLkl75J.png

the unicode block containing "fried shrimp" 0x1f364 -- why does this exist???

http://www.unicode.org/charts/PDF/U1F300.pdf

Latest Abstruse Goose comic sums up my emotional response:

http://abstrusegoose.com/496


Who doesn't love 🍤?

There's a lot of other strange Unicode too. There are things like '⁢' (U+2062 INVISIBLE TIMES), ⓞⓓⓓ ©ⓗⓐⓡⓐ©ⓣⓔⓡ ⓢⓔⓣⓢ and sɹǝʇɔɐɹɐɥɔ uʍop ǝpısdn.

All of which can be used to bypass filters and generally cause browser-crashing havoc. For example, this address looks like Google, but it really links to hacker news.

http://news.ycombinator.com/?/moc.elgoog//:ptth
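
A minimal sketch of how a filter might flag this kind of link, assuming the trick relies on a right-to-left override (U+202E) placed in front of the text so that it displays reversed. The helper name `find_bidi_controls` is hypothetical, and this is not how any particular browser or site handles it.

```python
import unicodedata

# Bidirectional classes of the explicit direction controls:
# embeddings, overrides, isolates and their terminators.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def find_bidi_controls(text):
    """Return (index, character name) for each explicit bidi control in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]

url = "http://news.ycombinator.com/?/moc.elgoog//:ptth"
spoofed = "\u202e" + url  # assumption: RIGHT-TO-LEFT OVERRIDE prepended

print(find_bidi_controls(url))      # the plain URL contains no controls
print(find_bidi_controls(spoofed))  # the override is flagged at index 0
print(url[::-1])                    # roughly what a reader would see rendered
```

Rejecting or escaping any text for which the list is non-empty is one simple policy; legitimate right-to-left text rarely needs explicit overrides inside a URL.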


> Who doesn't love 🍤?

I just discovered the OS X terminal renders the fried shrimp glyph in color!

edit: and it's now my $PS1.


Yes, I tested pasting "🍤" into my OS X terminal, and was surprised by a color rendering of a fried shrimp.

(yes, you can copy the character between "" and paste it into your OS X terminal)


Shame this doesn't work in zsh.


Why wouldn't it work on zsh? The font face is a property of the terminal emulator (e.g. xterm, Terminal.app), not the shell.

http://i.imgur.com/RKTOBKe.png

edit: It appears (?) stock Linux fonts don't include emoji.


echo 🍤


fried!


> "INVISIBLE TIMES"

Sounds like a Muse song title


I've never been able to work out if it is an invisible ⨉ character, or a character meant for use when things are invisible. Either way works for Muse I suppose.


It's for mathematics. Some kind of math markup may want to distinguish x times y written xy from a two-letter variable called xy. So the product is x<invisible times>y but is rendered xy.
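
A small sketch of that distinction, using only Python's standard `unicodedata` module. The variable names are illustrative; the point is that the two strings render alike but are different sequences of code points.

```python
import unicodedata

INVISIBLE_TIMES = "\u2062"

product = "x" + INVISIBLE_TIMES + "y"  # x times y
variable = "xy"                        # a two-letter identifier

print(unicodedata.name(INVISIBLE_TIMES))      # INVISIBLE TIMES
print(len(product), len(variable))            # 3 2
print(product == variable)                    # False
print(unicodedata.category(INVISIBLE_TIMES))  # Cf, a format character
```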


I love that link example xD I never thought of doing that!


Experiments with some common glyphs.

(!) -- 8th suggestion, after three apparently identical "upside-down 'i'" (including an "upside-down capital 'I' with a dot underneath")

(@) -- 1st suggestion

(#) -- not suggested. Top suggestion is "capital 'H' with stroke"

($) -- not suggested, although the 14th is an indistinguishable glyph "Canadian syllabics carrier sh" 0x165a, a phonetic symbol for representing a Canadian aboriginal language.

(%) -- 1st suggestion

(^) -- 10th suggestion, to be fair this one is impossible

(&) -- not suggested (fried shrimp lol)

(*) -- 3rd suggestion

(?) -- 1st suggestion

(∫) -- 1st suggestion

(∂) -- 4th suggestion (1st one is same thing in boldface)

----------

This suggests a really easy way to greatly improve the results: weight them by a prior probability (i.e., frequency of occurrence in a letter count). The OCR itself seems pretty good. Common math symbols are more likely than silly shrimps. Glyphs from common languages are more probable than esoteric ones, and real languages more than constructed ones. Plain glyphs are more common than variants (bold/italic) -- and these should be grouped together anyway. fried shrimp horseshoe.
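
A rough sketch of what weighting by a prior could look like, assuming the recognizer returns a similarity score in (0, 1] per candidate. The function `rerank`, the scores and the frequencies below are all made up for illustration; this is not Shapecatcher's actual scoring.

```python
import math

def rerank(candidates, priors, weight=1.0, default_prior=1e-6):
    """Sort candidates by log(similarity) + weight * log(prior), best first."""
    def score(item):
        char, similarity = item
        prior = priors.get(char, default_prior)
        return math.log(similarity) + weight * math.log(prior)
    return sorted(candidates, key=score, reverse=True)

# Hypothetical shape-similarity scores for one drawing.
candidates = [("\U0001F364", 0.80), ("&", 0.75), ("\u0238", 0.70)]
# Hypothetical relative frequencies of each character.
priors = {"&": 1e-3, "\U0001F364": 1e-7}

print([ch for ch, _ in rerank(candidates, priors)])              # ampersand first
print([ch for ch, _ in rerank(candidates, priors, weight=0.0)])  # shape score only
```

The `weight` parameter is the balancing knob: at 0 the prior is ignored, and larger values let common characters crowd out rare ones.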


Why would you use this tool to look up common characters found on your keyboard? Maybe a better idea is to show uncommon characters first.


Exactly, this is why priors for latin alpha characters are not set that high - you have them on your keyboard already.


There is a prior probability in the results, but it's hard to balance it so that latin characters don't dominate the results if you search for something else. That is why I opted for the way it is now: this is a tool to find unicode characters, and you already have latin characters on your keyboard. It's not really meant for OCR.


You could do prior probability conditional on not being available on a US keyboard, or at least make that an option.

edit: just read your comment elsewhere on the page indicating that you do something like what I described.


$#& - all were my first suggestion... perhaps I draw them better than you do? :)


& only seems to work if drawn with the top loop broken.


I was only trying to see the fried shrimp http://imgur.com/Ev3gGwd


dollar sign works if you draw the vertical line all the way through.


> the unicode block containing "fried shrimp" 0x1f364 -- why does this exist???

A softbank-based Emoji set was imported (in an extended form) into Unicode 6.0: http://en.wikipedia.org/wiki/Emoji#Emoji_in_the_Unicode_stan...

Because Emoji originated in Japanese messaging, a number of them relate to Japanese culture: foodstuff (not just fried shrimp but rice balls, dango, oden, fugu), cultural practices (kadomatsu, hinamatsuri, koinobori, Fūrin wind chimes) and other such things which may be present in other cultures but usually not as prominently (e.g. Unicode Love Hotel)



Very nice and amazingly accurate. But am I using your website the right way? When I look through the characters in the search results and see one that isn't rendering on my computer (just a block instead of the symbol), I've been clicking "bad" for "rate this suggestion", thinking that it tallies up the total good/bad for a character to mean "how likely people are to have this character installed and working on their computers".

However, I now have a feeling that's not what that feature is for.


My guess was that the Good/Bad rating is to help with some sort of Machine Learning going on in the background. E.g. the way I draw my Ampersand might be slightly different than yours, so if either or both of us see the result we were hoping for (&), that should get our Good rating. If it returns an (8), it may or may not deserve a Bad. If something way off appears as a top result (^), that would be pretty Bad.


It recognized my drawing of a cactus! http://i.imgur.com/pDgIOPk.png


Ooh my word. I drew a Skrillex and it recognized it perfectly; some Chuck Tays because he's a chill bro, school because his music appeals to people in school (12-24), and the Bengali vowel sound because he plays that in his "drops".

EDIT: forgot to link screenshot: http://imgur.com/LyaXy4h


This is seriously the funniest thing I've seen on HN in weeks. Made me actually laugh.


Damn. I think it's not intended as a cactus. Hahaha.


Of course it's a cactus! What else would it be?


I wonder how many people try to paint cacti as a quick test for tools like that ;)


Guilty as charged.


I must be pretty terrible at drawing the snowman, because I couldn't get it to match that.


My comment, with >36 votes, was flagged to oblivion. What a bunch of puritanical, cactus-haters you are, Hacker News.


prickly.



It's not every day that the top comment on reddit.com/r/funny is also the top comment on Hacker News.


I know this is supposed to be funny, but I had to flag it.

I mean, I did find it funny, but I did not enjoy seeing that at work (especially because I work with kids' stuff)


It's all in the eye of the beholder: it looks like a cactus to me! In seriousness, if you do see something else, you're obviously old enough to appreciate the joke :)


Are cacti considered dangerous for children in your country?


Cacti are dangerous to children and adults wherever they appear. Those thorns are sharp! :)


with kids or with kids stuff? Protip: don't click on imgur links.


An unknown link can redirect to imgur, so a more useful hint would be to block imgur links.


Internet ahead! Danger!



Reminds me of the excellent Detexify, which serves a similar purpose but for LaTeX symbols: http://detexify.kirelabs.org/classify.html


Tried to draw an alpha but it didn't get it.

So far I haven't tried Shapecatcher a lot but I think that http://detexify.kirelabs.org/classify.html works much better. Detexify is of course only for LaTeX symbols and doesn't do unicode.

To be useful Shapecatcher needs to become better at recognition.


Nice job done. Here are what I got: ᗧ ᗣ ᗤ ᗢ

could be awesome for a character-based PacMan impl.


"If you can't find Chinese, Japanese or Korean glyphs, it is because I have yet to find a good free CJK font to use."

Are there not some CJK (or otherwise) fonts from, for example, Linux distributions that could have been used?

Or perhaps the emphasis could be on clarifying what is meant by "good" that deserves excluding such a large and useful character space for this type of application?


Sounds cool but sadly hasn't worked for the letters I often need but don't have easy access to (Czech characters like: ř, ď and š). Perhaps because the element that makes them distinct from the latin (the "háček") is so tiny.

Edit, I just drew the ř larger and it recognised it correctly. Cool :)


(surprisingly no one said this) Too bad it doesn't generate links to drawing results.


Hosting uploaded images exposes sites to a great deal of annoyance.


I think there is a way to not host images - I could just record strokes and play that back with js.
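
One way this could be sketched, assuming a drawing is stored as a list of strokes, each a list of [x, y] points. The function names and the token format are hypothetical, and the sketch is in Python rather than the JavaScript that playback in the browser would need; the idea is only that the strokes travel inside the link, so no image has to be hosted.

```python
import base64
import json
import zlib

def encode_strokes(strokes):
    """Pack strokes (lists of [x, y] points) into a compact token for a link."""
    raw = json.dumps(strokes, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")

def decode_strokes(token):
    """Inverse of encode_strokes: recover the original list of strokes."""
    raw = zlib.decompress(base64.urlsafe_b64decode(token.encode("ascii")))
    return json.loads(raw.decode("utf-8"))

# Two strokes: a "V" shape and a horizontal bar.
strokes = [[[10, 10], [50, 90], [90, 10]], [[30, 50], [70, 50]]]
token = encode_strokes(strokes)

print(token)
print(decode_strokes(token) == strokes)  # True
```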


I was working on this last weekend - sorry, I didn't know someone would post this again to ycombinator ;)


Trying to draw U+1F4A9 (Pile of Poo). After several attempts, no luck.

I have learnt that Unicode contains even more weirdness than I thought before though, including 'Alchemical symbol for borax-3' (U+1f744), and 'doughnut' (U+1f369).
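
For anyone who wants to check names like these without drawing, a short sketch using Python's standard `unicodedata` module (the exact set of names available depends on the Unicode version bundled with the interpreter):

```python
import unicodedata

# Code points mentioned in this thread.
for code_point in (0x1F4A9, 0x1F744, 0x1F369, 0x2603, 0x26C4):
    print(f"U+{code_point:04X} {unicodedata.name(chr(code_point))}")

# Lookup also works in the other direction, from name to character.
shrimp = unicodedata.lookup("FRIED SHRIMP")
print(f"U+{ord(shrimp):04X}")
```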


I tried to draw the elusive snowman: http://i.imgur.com/dr5VKTh.png

Clearly it's not impressed by my drawing skills.


http://i.imgur.com/4EJhUpK.png worked on second attempt (added the 'stink lines')


Nice idea, too bad that I tried to draw several variations of PI, and it showed me several interesting characters, but never a PI.

Seriously, it even showed some very PI-like things, but not PI itself. This is a downer.


Idea: instead of matching the shape of what the user has drawn raster-wise, let the user draw an svg-like path, and try to identify the letter by the trace.


Agreed that a pen tool or some type of editor would be kind of nice, but for what he's going for (proving out an idea), this is still pretty fun. I know a little of the science behind it, but it'd be great to read through some well-commented source code. He did link in the thesis on this, however: http://shapecatcher.com/B_Milde%20-%20On%20The%20Security%20...


Related:

A version of this for kanji that is very accurate

http://kanji.sljfaq.org/draw-canvas.html


Does anyone know if the mirror image of this character: “ (U+201C) exists? I'm looking for a character that is the mirror image of the left double quotation mark, where the base is on the bottom and the character tapers from bottom right to top left. I don't know if any languages use that character.


‟ (U+201F) possibly?


Interesting thing: it recognized the white queen correctly when drawn with diamonds on the crown, matched the black queen when drawn without diamonds, and failed when drawn with only one diamond:

http://i45.tinypic.com/3535j4x.png — screenshot of variant with three diamonds.


Pretty neat, but I'm not sure what else it wants me to draw here http://i.imgur.com/rfU10rj.png

Also it seems to fail on badly-drawn birds http://i.imgur.com/IZgrRkq.png


1) it does count edge pixels - recognition is based on the shape of what you draw, so for the algorithm, a black picture is like a white one (but I admire your endurance to paint it all black) 2) server is under heavy load atm, it might drop some requests - just retry
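
A toy illustration of why a fully painted canvas carries no more shape information than an empty one, assuming "edge pixel" means an inked cell next to a blank cell. The function `count_edge_pixels` is a made-up stand-in, not the recognizer Shapecatcher actually uses.

```python
def count_edge_pixels(grid):
    """Count inked cells (1) that touch a blank cell (0) in a 4-neighbourhood.

    Cells outside the grid are ignored, so the canvas border is not an edge.
    """
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(
                0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0
                for nr, nc in neighbours
            ):
                edges += 1
    return edges

blank = [[0] * 5 for _ in range(5)]
solid = [[1] * 5 for _ in range(5)]
stroke = [[1 if c == 2 else 0 for c in range(5)] for _ in range(5)]

print(count_edge_pixels(blank), count_edge_pixels(solid), count_edge_pixels(stroke))  # 0 0 5
```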


Sadly it didn't recognize my clumsy attempt to draw the Look of Disapproval: http://i.imgur.com/0lvPFaJ.png

On the upside, I learned that there is a Panda Face unicode character.


The look of disapproval is not a single character; it consists of three characters: [1] _ [1]

[1] http://shapecatcher.com/unicode/info/3232
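
A quick sketch of putting the three characters together, assuming the 3232 in the link above is the code point written in decimal:

```python
import unicodedata

eye = chr(3232)          # assumption: the number in the URL is a decimal code point
face = eye + "_" + eye   # eye, underscore, eye

print(f"U+{ord(eye):04X}", unicodedata.name(eye))
print(face, len(face))
```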


Really cool stuff. Luckily, I draw my lowercase 'a's in double-story form, but if you attempt to draw an a (or an accented a) with just a circle and a line, it's recognized as an 'o'.


Nice. It's not very good at finding faces though. The right face is there, but way down.

http://i.imgur.com/PD1DkV6.png


Really cool. The drawing tool would be even better if it drew as soon as you clicked (i.e., would draw a single dot if you click and don't drag the mouse).


It didn't recognize the capital letter A, only variations on the letter A and other unknown characters. I'm sure that the letter A is a Unicode character!


Worked for me, it's called: Latin capital letter a: A (0x41)


Doesn't seem to include the drawing stroke order or anything, just the ultimate image. I guess if you know the strokes, you know the letter already, hm.


Is stroke order available for Unicode codepoints? I know that for many languages it's not important (for example, people are free to write 't' from top to bottom or from bottom to top), while in others, like Japanese, stroke order is very important.

However how would stroke order work for a Unicode codepoint? If it exists, there has to be a lot of info in addition to the codepoint of, say, a kanji.


It claims Chinese/Japanese/Korean is unsupported, but works fine for me:

http://i.imgur.com/AQBybDT.png


Is that the only character you tried? Try writing "論" and see what happens..


I tried 四 and it didn't work. I guess Hiragana and Katakana work, but not Hanzi/Kanji/Hanja.


Right. Hiragana and Katakana are trivial (just a couple of characters), but there is no support for Kanji currently.


Awesome idea, but all it ever gives me is a loading bar. shapecatcher.com/engine/recognize eventually returns "504 Gateway Time-out"


It did pretty well with the first five Hebrew letters. Pretty well as in all but one were found in the top 10 results.


Worked really well for me, found a smiley face and a few accented characters. Awesome idea and implementation.


Couldn't get it to recognize my drawing of a castle/rook in chess, but it did find my bishop. Great site.


They need to prioritise the glyphs based on frequency of occurrence to prevent ridiculous matches.


You can rate the recognition results ;)


It got the ⌘-sign on the first try (Place of interest sign: 0x2318, aka the Command Key).


I clicked the link reading "Draw the Unicorn character you want" =(


Worked fantastically on my iPad. Excellent idea and execution.


Well, it didn't recognize pi.


Amazing. Try a happy face!


2 previous discussions.

http://www.hnsearch.com/search#request/submissions&q=Sha...

Maybe the author can write a post detector next.



