Right this very moment (well, a few moments ago when I wasn't procrastinating on HN) I was in the midst of extracting data from a client's old website in preparation of creating a new website.
A lot of that data is contained within images.
From a few preliminary tests, I'm hugely impressed. This seems on-par with any other OCR software I've used, and the fact that it happens in realtime in the browser is amazing.
I tried it on a piece of content I'd just had to type out, that was originally in an image. Typing out the content took about 10 minutes. Copying and pasting with Naptha, and then making some minor edits/corrections, did the same thing in about 2 minutes.
There's actually been a bit of research on the error rates you need to beat for OCR to be cost-effective vs. having people re-type. I don't have the references handy, but I believe it's generally cost-effective to OCR with error rates up to nearly 2%, and most current "consumer grade" OCR is well below 1% error rates for scans that aren't of atrociously poor quality.
My MSc thesis was on reducing OCR error rates through various forms of pre-processing, and while I managed to achieve some reduction, one of the things I found was that, given how low error rates generally are to begin with, you have a very tiny budget of extra processing time before further error reduction just isn't worth it. If a human needs to check the document for errors anyway, a "quick and dirty" scan+OCR is often far better than spending the time to get "as good as possible" results. Spending even a few extra seconds per page to place the page perfectly in a scanner, or waiting a few extra seconds for more complicated processing, can be a net loss.
It's a perfect example of "worse is better": OCR, at least for typed text, is good enough today that the best available solutions aren't really worthwhile to spend resources on (for users) unless/until they give results so perfect it doesn't need to be checked by a person afterwards.
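A back-of-the-envelope sketch of that trade-off (every number below is illustrative, not taken from the research mentioned above):

```python
def ocr_is_worth_it(words, error_rate, typing_wpm=60,
                    reading_wpm=250, seconds_per_fix=10):
    """Compare re-typing time against proofreading plus per-error fix time.
    All parameter defaults are made-up illustrative values."""
    typing_minutes = words / typing_wpm
    correcting_minutes = (words / reading_wpm
                          + words * error_rate * seconds_per_fix / 60)
    return correcting_minutes < typing_minutes

print(ocr_is_worth_it(1000, 0.01))  # low error rate: correcting OCR wins
print(ocr_is_worth_it(1000, 0.10))  # high error rate: re-typing wins
```

With these made-up rates the break-even lands around 7-8% errors; a stricter threshold like the 2% figure presumably reflects the added expectation that the corrected output match a good typist's quality.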
A friend suggested to me that to get good OCR results, you should run the document through the scanner/OCR twice and then diff the results. Usually one or the other will get it right, and if you run the two results through a difference editor like 'meld', it's quick to fix.
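A minimal sketch of that diff step, with Python's difflib standing in for meld and two made-up OCR passes:

```python
import difflib

def diff_ocr_runs(run1, run2):
    """Report word-level disagreements between two OCR passes."""
    w1, w2 = run1.split(), run2.split()
    sm = difflib.SequenceMatcher(None, w1, w2)
    return [
        (" ".join(w1[i1:i2]), " ".join(w2[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"  # keep only the spans where the passes disagree
    ]

# Two hypothetical passes over the same line of a page
pass1 = "The quick brown fox jumps over the lazy dog"
pass2 = "The quick brovvn fox jumps over the 1azy dog"
for a, b in diff_ocr_runs(pass1, pass2):
    print(f"pass 1 read {a!r}, pass 2 read {b!r}")
```

Only the disagreeing spans need a human look; everywhere both passes agree, the text is (statistically) very likely correct.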
That may work in some cases, especially with horrible OCR engines and low-quality scanners, but frankly, when I did my research into this, the results varied extremely little from run to run, and you could usually easily identify specific artefacts in the source that tripped the engine up (rather than problems with the quality of the scan): e.g. letters that were damaged or had run together, creases in the paper, etc.
With really low-res scanners I can imagine it could make a big difference.
Back in the late '90s I worked for a company that did a lot of OCRing, and they ran the same image through multiple engines and then manually corrected the results. I think they had 3 engines, all from different companies, which processed all images and put the results into a custom format. Human beings were then employed to manually merge and correct the final text. It worked fairly well, especially considering the hardware/software available at the time.
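A toy sketch of that multi-engine merge idea, assuming (unrealistically) that the engine outputs already align character-for-character; a real pipeline would align them first, e.g. via edit distance:

```python
from collections import Counter

def majority_vote(outputs):
    """Merge OCR outputs by per-character majority vote. Assumes the
    outputs are already aligned character-for-character."""
    assert len({len(o) for o in outputs}) == 1, "outputs must be aligned"
    return "".join(Counter(chars).most_common(1)[0][0]
                   for chars in zip(*outputs))

# Hypothetical results from three different engines on the same line
engines = ["He11o world", "Hello world", "Hello wor1d"]
print(majority_vote(engines))  # → "Hello world"
```

Each engine makes different mistakes, so as long as any two agree at a given position, the vote recovers the right character; the humans then only need to arbitrate genuine three-way disagreements.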
The biggest problem was stuffing too many files into an NTFS directory. Apparently, NTFS didn't like tens of thousands of files in one directory. :)
To a certain extent, of course. The 2% was based on the assumption that if you are benchmarking against re-typing, you expect the same kind of quality you'd get from a good typist re-typing the documents.
From my own experiments, I tend to find that you can read through and correct errors only marginally faster than you can type, because you either follow along with the cursor or need to be able to position the cursor very quickly when you find an error; as the error rate increases, positioning the cursor at each error quickly becomes too slow.
Dropping accuracy in your effort to correct the text doesn't really seem to speed things up much. You might speed it up if you're willing to assume that anything that passes the spellchecker is OK (but it won't be, especially as modern OCR engines often rely on data about letter sequences, or dictionaries, when they're uncertain about characters).
If you're ok with lower accuracy, e.g. for search, and the alternative is not processing the document at all, then it'd be drastically different.
Holy crap, antimatter15 does so many cool things. I keep finding things that are really cool, then scroll down to find they are all written by him. First Shinytouch, then Protobowl years later, and now this. And he's only a year older than me (19), so it isn't that he's had more time. Check out his GitHub profile for more of his projects: http://github.com/antimatter15
Yeah, I just haven't gotten around to packaging the whole thing as a Firefox add-on. It's actually technically possible to run the whole thing on a normal unprivileged webpage (in fact, that's my development environment).
Reminds me of Powersnap on the Amiga. Many applications did their own text rendering without supporting cut and paste, so this guy called Nico Francois had the bright idea of letting you select a region of a window and matching the standard fonts against the window's bitmap.
Of course then it was "easy": almost all the text would have been rendered with one of a tiny number of fonts available on the system, with little to no distortion.
Powersnap was amazing. I seem to recall it was usually able to figure out which font each program was using and only had to search for letters in that specific font, falling back to a bigger search if that failed. I might be misremembering, but regardless, it was essentially as fast as any copy-paste today, in an environment where many programs weren't even written to support it.
Even though it solved a problem we don't usually have today (this story notwithstanding), it was still one of the most amazingly useful programs ever.
You're probably right; the manual says it did. It'd be able to get the last-used font from the RastPort structure used to draw to the window [1].
If the window was rendered with multiple fonts, that wouldn't be reliable, but I guess it'd likely be "good enough" to avoid a wider search most of the time.
@antimatter15, I have a project that does client-side image analysis and decomposes document structures. It looks like your OCR code would be a great replacement for the server-side Tesseract OCR I currently use :)
Here's what the project does now with JS + web workers:
Processing time is < 1500ms in Chrome and < 2000ms in FF.
The code is open source, though using it isn't yet polished. I'm working slowly on a blog post series detailing how to use the lib(s). https://github.com/leeoniya/pXY.js
Doesn't work great. Went to reddit's advice animal page to try it out and it doesn't seem to work with livememe (I think they have an invisible layer over their images to try and block hot linking).
Bottom: TN[ FACTTNATl'M MAWING TNISM[M[ g
INST[AD of DRIVING D[TERMIN[D TN#rWASA ll[
Maybe it needs to be a certain font for better results. Still pretty cool. Hopefully all the kinks get worked out. I would definitely find this useful.
EDIT: need to make sure the language is set to "internet meme" and it works much better.
By default it uses Ocrad.js, a pure JavaScript OCR engine (ported via Emscripten; see http://antimatter15.github.io/ocrad.js/demo.html). But if you right-click on the selection and change the language to "Internet Meme", it should transcribe it correctly (note that this sends the selection off to a server for remote processing; it's not the default due to privacy and scalability considerations at the moment).
Every time I click "Allow" on "Access data on all sites" for an extension I creep closer to my security-hole paranoia threshold. If it were all in JS, who cares? But this sends AJAX to remote servers, of course.
Checking the "Disable Lookup" item on the settings menu prevents it from making AJAX calls to any server; all processing is done locally. Of course there's a resulting drop in speed and OCR accuracy. The lookup requests are all HTTPS, are never logged, and contain no user-identifying information.
That is the wording that Google Chrome chose for "allow this extension to access the DOM on any page". It sounds bad but these are the permissions an extension needs to be able to access images and text on any page.
2) Erase Text option menu location
Using version 0.7.2, the "Erase Text" option is displayed under the "Translate" section (certainly not where I would ever intentionally look for it).
3) Select Text -> Right-click changes selection
After selecting my text, when I right-click, the selected text often (almost always) changes. For example, with the kitten text, I selected both paragraphs, but when I right-clicked to go to Translate -> Erase, the first paragraph ceased to be highlighted. After erasing the second paragraph I tried in vain to select and erase the first paragraph, but every time I'd right-click, only a single word of the selected paragraph would still be highlighted. I eventually tried erasing text while only one word was highlighted, and the entire first paragraph was erased.
4) I really appreciate the Security & Privacy section of the project page.
5) I would love to see a Firefox version of Project Naptha!
Looking through the code, you'll see he cites everything down to the blog posts he used. As he mentioned, it's based on the already-published Ocrad.js too.
This is simply incredible. I'm just blown away by it.
I wonder if you could get better performance when running locally by sending the result through a spellchecker and doing some Bayesian magic on the word choice...
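A toy sketch of that idea: pick the most frequent dictionary word close to the OCR output. The vocabulary and frequencies below are made up, and difflib's similarity ratio stands in for a real OCR error model:

```python
import difflib

# Made-up word frequencies standing in for a language model
VOCAB = {"the": 1000, "over": 200, "quick": 50, "brown": 40, "fox": 30}

def correct_word(word):
    """Return the most frequent vocabulary word that closely matches the
    OCR output; keep the original word if nothing comes close."""
    candidates = difflib.get_close_matches(word.lower(), VOCAB, n=3, cutoff=0.6)
    return max(candidates, key=VOCAB.get) if candidates else word

print(correct_word("qulck"))  # "i" misread as "l", corrected to "quick"
print(correct_word("zzz"))    # no close match, left alone
```

A proper Bayesian version would weight each candidate by P(candidate) times P(observed | candidate) under a model of common OCR confusions (l/1, O/0, rn/m), rather than a generic similarity score.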
One of the rules in the heuristic for which images to ignore is that an image needs to cover more than 19,000 square pixels, and that first image was a bit under that.
Very slick!
Does it automatically start OCRing every image, or does it wait for a user to try to select the image text?
Asking because I'm concerned about this decreasing performance.
It waits until you start selecting the image text, but the text detection starts when your cursor moves toward an image. It uses WebWorkers extensively, so on a multicore system, the performance shouldn't be hit. I haven't noticed an effect on battery life, but that's not out of the question.
Wow. Just wow. How did I live my life before this?
Once again, a simple implementation by somebody who grabs components that have been around for ages and mashes them up in a way that makes people wonder why it wasn't invented before.
I've got this installed and it'll probably never leave my Chrome profiles. Keep up the awesome work!
I remember your 2nd-place win at HackMIT; congrats again. It was THE most useful hack by far, and I'm glad you've made it a public product now, and free. Wow, it seems like you've beaten all those years of industrial OCR products, and by far. This is simply amazing; keep up the great work!!!
Randall Munroe's handwriting is a bit difficult to OCR because a lot of the letters are smushed together closely enough that it's not possible to unambiguously segment the text into distinct letters (which is a necessary first step in any OCR engine that I'm aware of). Maybe Google's (or Vicarious's) magical convolutional neural net that can solve CAPTCHAs would fare better.
> it's not possible to unambiguously segment the text into distinct letters (which is a necessary first step in any OCR engine that I'm aware of)
This made me realize I've never seen such a thing as OWR, i.e. software that would first try to recognize whole words, then go down to the character level if no satisfying match is found.
> it's not possible to unambiguously segment the text into distinct letters (which is a necessary first step in any OCR engine that I'm aware of)
In my experience, the ability to handle overlapping letters (which is very common on type-written text and professionally typeset material) is one of the key things that separate the relatively lightweight OCRs (like Ocrad and GOCR) from the big complicated ones (Tesseract, Cuneiform, Abbyy etc). Whitespace character segmentation cannot be taken for granted if you want to do any useful OCR of "historical" material.
This is amazing, and it has truly revolutionary implications for learners of scripts like Chinese, which are completely indecipherable to learners when embedded in images. I was really happy to see that this extension supports both simplified and traditional Chinese. I tried it out, and while it shows promise there, it definitely still needs a lot of work.
Cool, I implemented the stroke width transform for text detection about a year ago. Nice to see someone else implementing it, but I'm pretty sure convolutional neural nets do a better job at text localization.
1. The implementation of the Stroke Width Transform is not super good. So far, http://libccv.org/ has the best implementation of SWT, but then again, you can make neither head nor tail of that implementation.
2. There are just too many false text regions, and the text detection accuracy is nowhere near what you could call good. A mixed use of multiple OCR engines might give better results.
All that said, you can't take away the cleverness of the application of detecting text. Mind == blown, in that area.
I actually modeled my implementation after libccv's. Part of what libccv seems to do is run the transform multiple times at different scales, which isn't very computationally feasible for a pure JavaScript implementation. My implementation has a second-stage color filter which refines the SWT (this is something of a tradeoff that improves accuracy for machine-generated text and reduces accuracy for natural scenes, and I'm under the impression that the corpus used by SWT focuses on the latter).
Ocrad is being used as the default because it runs locally and it's small enough that it's easy to ship with. The remote OCR engine uses Tesseract which gets much closer to acceptable in a lot of circumstances.
But there is a lot of work that can be done to improve it. I have a friend who constantly nags me for not having a solid test corpus to run regression analysis/parameter tuning/science on. Certainly it lacks the rigor of an academic and scientific endeavor, but I've always imagined this as a sort of advanced proof of concept. I think the application of transparent and automatic computer vision deserves to be part of the interaction paradigm for the next generation of operating systems and browsers.
This looks very cool and could come in quite handy.
In case anyone from the project is monitoring: text selection did seem to work fine for me in Firefox (ESR 24.3) despite the "Not Supported" text being displayed.
The extension is awesome, and while the code is messy, it has enough little jokes to keep you amused. For those looking to access the backend OCR service: it seems to be down right now, but will hopefully come back up soon.
Here were the API references I could find for the remote OCR:
Apparently the author was one of the winners of HackMIT 2013 according to some of the comments. Couple of fun things in there if you decide to poke around in the code. Jump into naptha-wick.js for the remote logic.
It's been six months since I started this project. Just under two years after I first came up with the idea.

It's weird to think of time as something that happens, to think of code as something that evolves. And it may be obvious to recognize that code is not organic, that it changes only in discrete steps as dictated by some intelligence's urging, but coupled with a faulty and mortal memory, its gradual slopes are indistinguishable from autonomy.

Hopefully, this project is going to launch soon. It looks like there's actually a chance that this will be able to happen.

The proximity of its launch has kind of been my own little perpetual delusion. During the hackathon, I announced that it would be released in two weeks' time.

When winter break rolled by, I had determined to finish and release before the end of the year 2013. This deadline rolled further away, to the end of January term, IAP as it is known. But like all the artificial dates set earlier, it too folded against the tides of procrastination.

I'll spare you February and March, but they too simply happened with a modicum of dread. This brings us to the present day, which hopefully will have the good luck to be spared from the fate of its predecessors. After all, it is the gaseous vaporware that burns.
Yeah, I made the mistake of setting the App Engine budget to $1.00. Turns out that's probably not enough for a sustained run as HN's #2.
Yeah, the code is super messy, but I'd prefer if you didn't play around too much with the remote OCR service, specifically, the translation parts because Google Translate is pretty expensive per-use.
You have no donate link... if you're going to be on big sites like HN, you might as well have a donation link so that you can hopefully break even on App Engine.
Very impressive work. I'm not surprised to find antimatter15 behind it.
The website was not very clear about whether the work is done client-side or not (it mentions server calls). It turns out that server calls can be disabled, and the extension works quite well without them. I would disable this option by default and offer opt-in; it is better for privacy, I think.
I have a big problem with various people sending me screenshots with stack dumps in them. This is perfect for extracting them into ticket bodies, and it does it perfectly (I've just done 20 with it and manually checked them!)
This is the sort of stuff that really improves people's lives by making all data equal.
Please help: it looks brilliant, but only the test page works for me. I can't get any other pages to work. Text simply isn't selectable; the cursor remains a pointer, not an 'I' :(
I'm using the latest version of Chrome on a modern Mac and have Naptha properly installed and Chrome has been relaunched.
Awesome. I was actually at HackMIT. It's great to see you continuing to work on this. As a matter of fact, I told my friends, who were working on a similar idea for their senior project, about your project last fall. I emailed you for the Microsoft reference papers :) Not sure if I should copy and paste that.
This is really neat. I was playing with it on pictures of street signs and buildings, and realized that if I select some text and then hit ctrl+a, it tries to select everything it thinks is text. Then I used right click > Translate > Reprint to see what it thought each thing was.
I had high hopes for this, as I sometimes need to manually transcribe serial numbers from customers' screenshots.
However, it seems to confuse letter O and number 0.
Since serial numbers are not English words, I'm not sure how you would solve this unless you had a lookup table of commonly used web fonts.
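If the serial numbers follow a known format, one alternative to a font lookup is constraint-based disambiguation: map confusable characters to whatever character class each position expects. The LLLL-DDDD pattern and the confusion table below are hypothetical:

```python
# Common OCR letter/digit confusions (a hypothetical, partial table)
CONFUSION_TO_DIGIT = {"O": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
CONFUSION_TO_ALPHA = {v: k for k, v in CONFUSION_TO_DIGIT.items() if k.isupper()}

def fix_serial(text, pattern):
    """pattern uses 'L' for letter positions and 'D' for digit positions;
    any other pattern character (e.g. '-') is passed through unchanged."""
    out = []
    for ch, kind in zip(text, pattern):
        if kind == "D" and not ch.isdigit():
            ch = CONFUSION_TO_DIGIT.get(ch, ch)  # force digit positions to digits
        elif kind == "L" and ch.isdigit():
            ch = CONFUSION_TO_ALPHA.get(ch, ch)  # force letter positions to letters
        out.append(ch)
    return "".join(out)

print(fix_serial("AB0X-1O23", "LLLL-DDDD"))  # → "ABOX-1023"
```

This only works when the format is known up front, but for fixed-layout serial numbers that is often the case.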
Seemed like an interesting project; I clicked the link, scanned the page, and it seems to be an empty, pointless web page trying to explain, over pages' worth of scrolling, that it lets you deal with text trapped inside images, which I already knew when I clicked the link.
Going back to the page after closing it once, I noticed, written in smaller characters, that this somewhat pointless page is for an extension that is exclusively limited to the worst privacy offender among web browsers, one that I would not touch with a stick. Google Chrome is the new Internet Explorer to me, as its main use is to download Firefox.
In conclusion, this looked promising, but a confusing web page and browser lock-in render it useless and show that it is far from doing what it claims.
"... on every image you see while browsing the web" should be "... on every image you see while browsing the web in Google Chrome".
No GitHub and no open license tell me that, as a Linux user of Opera, I'm pretty much assured I will never see a version of this extension.
This is extremely powerful for the end user. I've been doing a bit of OCR work using some pre-processing methods combined with Tesseract and OpenCV. I am curious to know how you are doing this on the fly and also as a chrome extension. Is the processing done in JS?
The biggest thing I'd like to see is enabling in-page (control/command-f) search. In my quick scan through the page it looks like it doesn't do that… is that right? Are there plans to add invisible text to the DOM that control-f can find?
One problem with that is that it processes images lazily. It continually extrapolates cursor movements ~1 second into the future and processes the relevant parts of relevant images. But it should be possible that, after an image is processed (or even eagerly, by looking up previously recognized regions from the cached OCR server), the page could be made Ctrl+F-able.
I like the way this extension removes text in the image, but I would much rather have a video delogo filter that does not suck. It would be very useful for removing hard subtitles, station logos, screener warnings, etc.
In any case, pretty cool project, I'm a bit amazed how far we've come since I've last played with OCRs (and defeated one bad CAPTCHA implementation, still in use at pastebin.com it seems).
Cool idea, definitely worth exploring the possibilities. A quick run showed me that it often interprets "i" as "l" whenever the gap between the line and the dot is not apparent.
Now that is pretty damn cool. It will help at work when marketing people don't copy-paste an email/article and instead just put up a screenshot of it, and you want to quote something from that picture...
It basically runs SWT on the image and creates a 3D Lab histogram of the colors the SWT marked as text. Then it does a morphological dilation of 10 pixels and subtracts the original mask to get the colors of the pixels that represent the background.
Then it just binarizes the image by whether the internal histogram is larger than the corresponding value of the color on the external histogram.
It's a strategy that works quite well on machine-printed text, but probably less effective than existing strategies when it comes to scans or photographs.
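A much-simplified sketch of that two-histogram binarization, using grayscale intensities instead of a 3D Lab histogram and a crude square dilation; everything here is a stand-in for the real implementation:

```python
import numpy as np

def dilate(mask, r):
    """Crude binary dilation by a (2r+1)x(2r+1) square structuring element."""
    out = mask.copy()
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out

def binarize(gray, text_mask, r=10):
    """Mark a pixel as text when its intensity is more probable under the
    foreground (SWT-marked) histogram than under the histogram of the
    background ring (dilated mask minus the original mask)."""
    ring = dilate(text_mask, r) & ~text_mask
    fg = np.bincount(gray[text_mask], minlength=256).astype(float)
    bg = np.bincount(gray[ring], minlength=256).astype(float)
    fg /= max(fg.sum(), 1.0)  # normalize counts to probabilities
    bg /= max(bg.sum(), 1.0)
    return fg[gray] > bg[gray]

# Toy image: a dark 4x12 "stroke" on a light background
gray = np.full((20, 20), 240, dtype=np.uint8)
gray[8:12, 4:16] = 20
mask = gray < 128  # pretend this mask came from the SWT
print(binarize(gray, mask).sum())  # → 48 pixels classified as text
```

The real thing compares full 3D color histograms in Lab space, which is what lets it separate, say, red text from a blue gradient even when both have similar lightness.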
Curious about this too. Also, what's the stack providing Tesseract-as-a-service? According to my cursory search, Google App Engine won't run Tesseract, as it's a native library, not an API. I'd like to try this on non-Latin/CJK hardcoded subtitles, but Ocrad does Latin only.
I wrote a little C program that uses TessBaseAPI to extract letter locations, which gets triggered with ImageMagick's convert by a NodeJS script. The App Engine frontend acts as a caching reverse proxy.
I have wanted an extension to do this for so long. I even started coding my own at one stage but hit various issues. Thank you so much for creating this.
I remember seeing that from the project list and really wishing I could download it right away.
Just another example that the "ideas are worthless!" saying is bullshit. This was a great idea; anyone implementing it decently first would find success with it.
Fixing that! I also have to write the entire second half of the chronology section, but at least it looks less like I pulled a "Monty Python animator".
Almost garbage? This is the OCR result for the 2nd paragraph. Almost perfect, although the last word in each line gets joined to the first one in the next line:
"The fundamental problem of communication is that of reproducing atone point either exactly or approximately a message selected at anotherpoint. Frequently the messages have meamlng; that is they refer to or arecorrelated according to some system with certain physical or conceptualentities. These semantic aspects of communication are irrelevant to theengineering problem. The significant aspect is that the actual message isone selected from a set of possible messages. The system must be designedto operate for each possible selection, not just the one which will actuallybe chosen since this is unknown at the time of design."
I tried it with both Ocrad and Tesseract modes, and indeed, the Ocrad mode produces garbage; the Tesseract mode produces a really good result but takes longer (mainly the time it takes to upload the entire thing and get the result back).
That seems to make sense to me, at least. Use Ocrad mode by default; if it doesn't perform well, switch to Tesseract and you'll hopefully get a better result.
Cool idea; a bit buggy yet. When I try to save images I get the custom extension right-click bar instead of the normal Chrome bar to save the image, but I guess it's still under development.
It's spelled Naphtha (http://en.wikipedia.org/wiki/Naphtha). And for the HN hordes - read the bottom of the linked project page, it is supposed to be a reference to Naphtha.
Curious, what makes you find those to be better than Chrome? I recently switched to FF for a variety of random work reasons, and found it so much worse than chrome (basic UI, dev tools, speed) that I switched back asap. Maybe I'm missing something awesome about them.
I use Safari, and I find it to be better than Chrome because it's easier to sync with my iPhone and iPad, and with iCloud keychain even my passwords are synced.