Project Naptha: a browser extension that enables text selection on any image (projectnaptha.com)
1055 points by antimatter15 on April 22, 2014 | 134 comments



Wow - this is amazing.

Right this very moment (well, a few moments ago, when I wasn't procrastinating on HN) I was in the midst of extracting data from a client's old website in preparation for creating a new one.

A lot of that data is contained within images.

From a few preliminary tests, I'm hugely impressed. This seems on par with any other OCR software I've used, and the fact that it happens in real time in the browser is amazing.

I tried it on a piece of content, originally in an image, that I'd just had to type out. Typing out the content took about 10 minutes. Copying and pasting with Naptha, and then making some minor edits/corrections, did the same thing in about 2 minutes.


There's actually been a bit of research on the error rates you need to beat for OCR to be cost-effective vs. having people re-type. I don't have the references handy, but I believe it's generally cost-effective to OCR with error rates up to nearly 2%, and most current "consumer grade" OCR is well below 1% error rates for scans that aren't atrociously poor quality.

My MSc thesis was on reducing OCR error rates through various forms of pre-processing, and while I managed to get some reduction in error rates, one of the things I found was that, given how low the error rates generally were to begin with, you have a very tiny budget of extra processing time before further error reduction just isn't worth it. If a human needs to check the document for errors anyway, a "quick and dirty" scan+OCR is often far better than spending the time to get "as good as possible" results. Spending even a few extra seconds per page to place the page perfectly in the scanner, or waiting a few extra seconds for more complicated processing, can be a net loss.

It's a perfect example of "worse is better": OCR, at least for typed text, is good enough today that the best available solutions aren't really worth spending resources on (for users) unless/until they give results so perfect they don't need to be checked by a person afterwards.


A friend suggested that to get good OCR results, you should run the page through the scanner/OCR twice, then diff the results. Usually one or the other will get it right, and if you run the two outputs through a difference editor like 'meld', it's quick to fix.
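
If you want to script that comparison step, here's a rough sketch in plain JS (ocrText1/ocrText2 are hypothetical strings holding the two runs) that flags only the words where the passes disagree:

  // LCS-based word diff: collect words that appear in only one of the two runs
  function lcsDiff(a, b) {
    var m = a.length, n = b.length, dp = [], i, j;
    for (i = 0; i <= m; i++) dp.push(new Array(n + 1).fill(0));
    for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++)
        dp[i][j] = a[i - 1] === b[j - 1] ? dp[i - 1][j - 1] + 1
                                         : Math.max(dp[i - 1][j], dp[i][j - 1]);
    var conflicts = [];
    i = m; j = n;
    while (i > 0 && j > 0) {
      if (a[i - 1] === b[j - 1]) { i--; j--; }  // both runs agree here
      else if (dp[i - 1][j] >= dp[i][j - 1]) conflicts.push({ run: 1, word: a[--i] });
      else conflicts.push({ run: 2, word: b[--j] });
    }
    while (i > 0) conflicts.push({ run: 1, word: a[--i] });
    while (j > 0) conflicts.push({ run: 2, word: b[--j] });
    return conflicts.reverse();
  }

  var suspects = lcsDiff(ocrText1.split(/\s+/), ocrText2.split(/\s+/));

Everything it returns is a word one run saw and the other didn't, which is exactly what you'd be reviewing in meld anyway.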


That may work in some cases, especially with horrible OCR engines and low-quality scanners, but frankly, when I did my research into this, the results varied extremely little from run to run, and you could usually easily identify specific artefacts in the source that tripped the engine up (rather than problems with the quality of the scan), e.g. letters that were damaged or had run together, creases in the paper, etc.

With really low-res scanners I can imagine it could make a big difference.


Back in the late '90s I worked for a company that did a lot of OCRing, and they ran the same image through multiple engines and then manually corrected the results. I think they had 3 engines, all from different companies, which processed all images and put the results into a custom format. Human beings were then employed to manually merge and correct the final text. It worked fairly well, especially considering the hardware/software available at the time.

The biggest problem was stuffing too many files into an NTFS directory. Apparently, NTFS didn't like tens of thousands of files in one directory. :)


What about running it through two+ different OCR engines?


If this is done all in software (i.e., it isn't analyzing a slightly different image), why wouldn't the OCR just do this itself?


Somebody's got to decide which way to go with the diffs


Majority vote out of an odd number of runs?
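
Something like this, assuming the runs have already been tokenized and aligned word-for-word (the alignment being the hard part in practice):

  // Pick, at each position, the word most of the (odd number of) runs agree on
  function majorityVote(runs) {
    var len = Math.min.apply(null, runs.map(function (r) { return r.length; }));
    var out = [];
    for (var i = 0; i < len; i++) {
      var counts = {};
      runs.forEach(function (r) { counts[r[i]] = (counts[r[i]] || 0) + 1; });
      out.push(Object.keys(counts).sort(function (a, b) {
        return counts[b] - counts[a];
      })[0]);
    }
    return out.join(' ');
  }

  // majorityVote([run1.split(' '), run2.split(' '), run3.split(' ')])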


"There's actually been a bit of research on the error rates you need to beat for OCR to be cost-effective vs. having people re-type."

Doesn't that depend entirely on what you're using the text for and how accurate it needs to be?


To a certain extent, of course. The 2% was based on the assumption that if you are benchmarking against re-typing, you expect the same kind of quality you'd get from having a good typist re-type the documents.

From my own experiments, I tend to find that you can read through and correct errors only marginally faster than you can type, because you either follow along with the cursor or need to be able to position the cursor very quickly when you find an error. As the error rate increases, jumping the cursor to each error quickly gets too slow.

Dropping accuracy in your effort to correct the text doesn't really seem to speed things up much. You can likely speed it up if you're willing to assume that anything that passes the spellchecker is OK (but it won't be, especially as modern OCRs often rely on data about sequences of letters, or dictionaries, when they're uncertain about characters).

If you're ok with lower accuracy, e.g. for search, and the alternative is not processing the document at all, then it'd be drastically different.


Time is not as relevant as energy when we're talking about people whose jobs involve a huge amount of strain.


Holy crap, antimatter15 does so many cool things. I keep finding things that are really cool and then scroll down to find they are all written by him. First Shinytouch, then Protobowl years later and now this. And he's only a year older than me (19) so it isn't that he's had more time. Check out his Github profile for more of his projects: http://github.com/antimatter15


I was amazed as well; he's the same age as me. Now I feel challenged to execute even more of my ideas. Well done, sir!


> Unfortunately, your browser is not yet supported, currently only Google Chrome is supported.

FF 28 seems to be working fine with the "Weenie Hut Jr." version...is it just the add-on that isn't supported?

awesome tech, btw


Yeah, I just haven't gotten around to packaging the whole thing as a Firefox add-on. It's actually technically possible to run the whole thing on a normal unprivileged webpage (in fact, that's my development environment).


If all you need to do is inject JavaScript into a webpage, you should be able to make a Firefox add-on to do that in easily under an hour -- check out the SDK getting started guide at https://developer.mozilla.org/en-US/Add-ons/SDK/Tutorials/Ge...
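
A minimal sketch of that with the SDK's page-mod module (the content script filename is hypothetical):

  // lib/main.js of an Add-on SDK extension: inject a script into every page
  var pageMod = require("sdk/page-mod");
  var self = require("sdk/self");

  pageMod.PageMod({
    include: "*",                                  // match all pages
    contentScriptFile: self.data.url("naptha.js")  // file in the add-on's data/ dir
  });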


In that case, why not make it available as a JavaScript bookmarklet?

http://en.wikipedia.org/wiki/Bookmarklet
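
E.g. something like this saved as a bookmark (the hosting URL is hypothetical):

  javascript:(function(){var s=document.createElement('script');s.src='https://example.com/naptha.js';document.body.appendChild(s);})();

One caveat: pages with a strict Content-Security-Policy will block the injected script, which an extension avoids.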


Please give us a Firefox version! I'm begging you!


Seconded. This is one of the most incredibly useful projects I've seen in a long time.


Reminds me of Powersnap on the Amiga. Many applications did their own text rendering without supporting cut and paste, and so this guy called Nico Francois had the bright idea of letting you select a region of a window and matching the standard fonts against the window's bitmap.

Of course then it was "easy": almost all the text would have been rendered with one of a tiny number of fonts available on the system, with little to no distortion.


Powersnap was amazing. I seem to recall it was usually able to figure out what font each program was using and only had to search for letters from that specific font, only falling back to a bigger search if that failed. I might be misremembering, but regardless, it was essentially as fast as any copy-paste today, in an environment where many programs weren't even written to support it.

Even though it solved a problem we don't usually have today (this story notwithstanding), it was still one of the most amazingly useful programs ever.


You're probably right - the manual says it did. It'd be able to get the last used font from the RastPort structure used to draw to the window [1].

If the window was rendered with multiple fonts that wouldn't be reliable, but I guess it'd likely be "good enough" to avoid a wider search most of the time.

[1] Here's the RastPort struct from AROS (open source re-implementation of AmigaOS): http://repo.or.cz/w/AROS.git/blob/HEAD:/compiler/include/gra...


This is great news for those who have to live with disabilities.

Maybe soon I won't feel guilty for leaving my alt attributes empty.


@antimatter15, I have a project that does client-side image analysis and decomposes document structures. It looks like your OCR code would be a great replacement for the server-side Tesseract OCR I currently use :)

Here's what the project does now with JS + web workers:

http://i.imgur.com/QvXSkY2.png

Processing time is < 1500ms in Chrome and < 2000ms in FF.

The code is open source, though using it isn't yet polished. I'm working slowly on a blog post series detailing how to use the lib(s): https://github.com/leeoniya/pXY.js

A walkthrough of the base lib is here: http://o-0.me/pXY/


The OCR code is an Emscripten port of the GPL-licensed Ocrad program. I published it on GitHub a few months ago: http://antimatter15.github.io/ocrad.js/demo.html

But in my experience, the recognition quality isn't good enough to replace Tesseract if you have that capability.


It would be very useful to maybe just use part of the code (the part that detects where there is text, rather than what the text is).


Doesn't work great. Went to reddit's advice animals page to try it out, and it doesn't seem to work with livememe (I think they have an invisible layer over their images to try to block hotlinking).

Here is a copy/paste example from imgur:

http://i.imgur.com/sKQXx8v.jpg

Top: vou SAID w[ W[R[ |[AVINĞ`ON TIM[TOAV

Bottom: TN[ FACTTNATl'M MAWING TNISM[M[ g INST[AD of DRIVING D[TERMIN[D TN#rWASA ll[

Maybe it needs to be a certain font for better results. Still pretty cool. Hopefully all the kinks get worked out. I would definitely find this useful.

EDIT: need to make sure the language is set to "internet meme" and it works much better.


By default it uses Ocrad.js, a pure JavaScript OCR engine (ported via Emscripten; see http://antimatter15.github.io/ocrad.js/demo.html). But if you right click on the selection and change the language to "Internet Meme", it should transcribe it correctly (note that this sends the selection off to a server for remote processing; it's not the default, for privacy and scalability considerations at the moment).
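
Calling Ocrad.js directly looks roughly like this (the OCRAD() global accepts a canvas, 2D context, or ImageData and synchronously returns the recognized string):

  var img = document.querySelector('img');
  var canvas = document.createElement('canvas');
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  canvas.getContext('2d').drawImage(img, 0, 0);
  var text = OCRAD(canvas); // synchronous, which is why Naptha runs it in Web Workers
  console.log(text);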


Ah, much better!

Top: YOU SAID WE WERE LEAVING'ON TIME:TODAY

Bottom: THE FACT THAT I'M MAKING THIS MEME INSTEAD OF DRIVING DETERMINED THAT WAS A LIE

Next time I'll RTFM.


Mine automatically selects the appropriate language on meme images, including your link. Updated already?


I tried the same sample image. The top came out much better than the bottom:

YOU SAID WE WERE LEAVING'ON TIME:TODAY

TN[ FACTTNAT |'M MAWING TNIS M[M[ INST[AD of DRIVING D[TERMIN[D TN#rWASA ll[

I imagine that the thick outline of the font makes it hard to detect the edge of the letters, especially since it obscures the true "background".

e: using the Internet Meme language worked much better!

YOU SAID WE WERE LEAVING'ON TIME:TODAY

(:/J

THE FACT THAT I'M MAKING THIS MEME INSTEAD OF DRIVING DETERMINED THAT WAS A LIE


Every time I click "Allow" on "Access data on all sites" for an extension, I creep closer to my security-hole paranoia threshold. If it were all in JS, who cares? But this sends AJAX to remote servers, of course.

Am I alone?


Checking the "Disable Lookup" item on the settings menu prevents it making ajax calls to any server and does all processing locally. Of course there's a resulting drop in speed and OCR accuracy. The lookup requests are all HTTPS, are never logged, and contain no user identifying information.


That is the wording that Google Chrome chose for "allow this extension to access the DOM on any page". It sounds bad, but these are the permissions an extension needs to be able to access images and text on any page.


Yeah, or any password.


No. I only watched the demo, because of that.


1) Very, very flippin' cool!

2) Erase Text menu location: Using version 0.7.2, the "Erase Text" option is displayed under the "Translate" section (certainly not where I would ever intentionally look for it).

3) Select text -> right-click changes selection: After selecting text, when I right-click, the selection often (almost always) changes. For example, with the kitten text, I selected both paragraphs, but when I right-clicked to go to Translate -> Erase, the first paragraph ceased to be highlighted. After erasing the second paragraph I tried in vain to select and erase the first paragraph, but every time I'd right-click, only a single word of the selected paragraph would still be highlighted. I eventually tried erasing text while only one word was highlighted, and the entire first paragraph was erased.

4) I really appreciate the Security & Privacy section of the project page.

5) I would love to see a Firefox version of Project Naptha!


I wonder how deeply this project is in violation of the GPLv3.

For starters, it's based on GNU Ocrad [1] but fails to state a license or to publish any source code.

[1]: https://www.gnu.org/software/ocrad/



Looking through the code, you'll see he cites everything, down to the blog posts he used. As he mentioned, it's based on the already-published Ocrad.js too.


This is simply incredible. I'm just blown away by it.

I wonder if you could get better performance when running locally by sending the result through a spellchecker and doing some Bayesian magic on the word choice...
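
A Norvig-style spelling corrector would be one way to sketch that; here FREQ is an assumed {word: corpusCount} table built from some big body of text:

  // Generate all words within one edit (deletion, insertion, substitution)
  function edits1(word) {
    var letters = 'abcdefghijklmnopqrstuvwxyz', out = [], i, j;
    for (i = 0; i <= word.length; i++) {
      var head = word.slice(0, i), tail = word.slice(i);
      if (tail) out.push(head + tail.slice(1));                // deletion
      for (j = 0; j < letters.length; j++) {
        out.push(head + letters[j] + tail);                    // insertion
        if (tail) out.push(head + letters[j] + tail.slice(1)); // substitution
      }
    }
    return out;
  }

  // Keep known words; otherwise pick the most frequent candidate one edit away
  function correct(word) {
    if (FREQ[word]) return word;
    var best = word, bestCount = 0;
    edits1(word).forEach(function (cand) {
      if ((FREQ[cand] || 0) > bestCount) { best = cand; bestCount = FREQ[cand]; }
    });
    return best;
  }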


Couldn't get it to work on: http://graphics8.nytimes.com/adx/images/ADS/37/09/ad.370964/...

Also for: http://www.wsoddata.com/clients/8bec9b10/ads/300x250_static/... It can't get the top-right text correctly

Awesome tech though


One of the rules in the heuristic for which images to ignore is that an image needs to have over 19,000 square pixels, and that first image was a bit under that.


Very slick! Does it automatically start OCRing every image, or does it wait for a user to try to select the image text? Asking because I'm concerned about this decreasing performance.


It waits until you start selecting the image text, but the text detection starts when your cursor moves toward an image. It uses Web Workers extensively, so on a multicore system performance shouldn't take a hit. I haven't noticed an effect on battery life, but that's not out of the question.
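
The hand-off is roughly this (the worker filename is hypothetical):

  // Main thread: ship the pixels off so detection never blocks the page
  var worker = new Worker('swt-worker.js');
  var ctx = canvas.getContext('2d');
  worker.postMessage(ctx.getImageData(0, 0, canvas.width, canvas.height));
  worker.onmessage = function (e) {
    // e.data: detected text regions, ready to be turned into selection overlays
  };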


This is amazing! Is there a planned open-source license or commercialization of this?


Wow. Just wow. How did I live my life before this?

Once again, such a simple implementation by somebody who grabs some components that have been around for ages and mashes them up in a way that makes people question why it wasn't invented before.

I've got this installed and it'll probably never leave my Chrome profiles. Keep up the awesome work!


I have a feeling that if you just make the OCR better, a lot of users are going to use this for entering CAPTCHAs...


Doesn't seem to work on reCAPTCHA images at all.


Like I said, needs better OCR.


I remember your 2nd-place win at HackMIT; congrats again. It was THE most useful hack by far, and I'm glad you've made it a public product now, and free. Wow, it seems like you've beaten all those years of industrial OCR products... and by far. This is simply amazing, keep up the great work!!!


Certainly a cool idea, but it didn't work well on an XKCD comic:

The bottom line at http://www.xkcd.com/ is recognized as: "T1EN°5'lI'ONAl.1?E£ONNH\56PNCE(YHCEPlP6fiN(N)SURLH’PR3AO-i‘lDlsIr'£7E‘5IJ%z"


Randall Munroe's handwriting is a bit difficult to OCR because a lot of the letters are smushed together closely enough that it's not possible to unambiguously segment the text into distinct letters (a necessary first step in any OCR engine that I'm aware of). Maybe Google's (or Vicarious's) magical convolutional neural net that can solve CAPTCHAs would fare better.


> it's not possible to unambiguously segment the text into distinct letters (which is a necessary first step in any OCR engine that I'm aware of)

This made me realize I've never seen such a thing as OWR, i.e. software that would first try to recognize whole words, then go down to the character level if no satisfying match is found.

Found out this exists already: https://en.wikipedia.org/wiki/Intelligent_word_recognition


> it's not possible to unambiguously segment the text into distinct letters (which is a necessary first step in any OCR engine that I'm aware of)

In my experience, the ability to handle overlapping letters (which are very common in typewritten text and professionally typeset material) is one of the key things that separates the relatively lightweight OCRs (like Ocrad and GOCR) from the big complicated ones (Tesseract, Cuneiform, Abbyy, etc). Whitespace character segmentation cannot be taken for granted if you want to do any useful OCR of "historical" material.


This is amazing, and it has truly revolutionary implications for learners of scripts like Chinese, which are still truly indecipherable to learners when embedded in images. I was really happy to see that this extension supports both simplified and traditional Chinese. I tried it out, and while it shows promise there, it definitely still needs a lot of work.

I posted a review on my blog here: http://www.sinosplice.com/life/archives/2014/04/24/can-proje...

OP, I'd be happy to work with you on improving the recognition of Chinese text. Just get in touch with me through my blog (linked to above).


Cool, I implemented the stroke width transform for text detection about a year ago. Nice to see someone else implementing it, but I'm pretty sure convolutional neural nets do a better job at text localization.


This isn't particularly awesome, because

1. The implementation of the Stroke Width Transform is not super good. So far, http://libccv.org/ has the best implementation of SWT. But then again, you can make neither head nor tail of that implementation.

2. There are just too many false text regions, and the text detection accuracy is nowhere near what you could call good. A mixed use of multiple OCR engines might give better results.

All that said, you can't take away the cleverness of the application of detecting text. Mind == blown, in that area.


I actually modeled my implementation after libccv's. Part of what libccv seems to do is run SWT multiple times at different scales, which isn't computationally feasible for a pure JavaScript implementation. My implementation has a second-stage color filter which refines the SWT (this is something of a tradeoff that improves accuracy for machine-generated text and reduces accuracy for natural scenes; I'm under the impression that the corpus used by SWT focuses on the latter).

Ocrad is being used as the default because it runs locally and it's small enough that it's easy to ship with. The remote OCR engine uses Tesseract, which gets much closer to acceptable in a lot of circumstances.

But there is a lot of work which can be done to improve it. I have a friend who constantly nags me for not having a solid test corpus to run regression analysis/parameter tuning/science. Certainly it lacks the rigor of an academic and scientific endeavor, but I've always imagined this as a sort of advanced proof of concept. I think the application of transparent and automatic computer vision deserves to be part of the interaction paradigm for the next generation of operating systems and browsers.


This looks very cool and could come in quite handy.

In case anyone from the project is monitoring - text selection did seem to work fine for me in Firefox (ESR 24.3) despite the "Not Supported" text being displayed.


I think the developer just meant that he hasn't made a FF add-on yet; the code works great for me in FF as well.


The extension is awesome, and while the code is messy, it has enough little jokes to keep you amused. For those looking to access the backend OCR service: it seems to be down right now, but will hopefully come back up soon.

Here were the API references I could find for the remote OCR:

- GET https://sky-lighter.appspot.com/api/read/<chunk.key>

- GET https://sky-lighter.appspot.com/api/lookup?url=<image.src>

- POST https://sky-lighter.appspot.com/api/translate
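
A hedged example of calling the lookup endpoint (imageSrc is a placeholder, and the response format is undocumented, so the JSON assumption is a guess):

  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'https://sky-lighter.appspot.com/api/lookup?url=' +
                  encodeURIComponent(imageSrc));
  xhr.onload = function () {
    console.log(JSON.parse(xhr.responseText)); // presumably cached OCR regions
  };
  xhr.send();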

Apparently the author was one of the winners of HackMIT 2013, according to some of the comments. There are a couple of fun things in there if you decide to poke around in the code. Jump into naptha-wick.js for the remote logic.

Note from the Dev (http://challengepost.com/users/antimatter15, http://antimatter15.com/wp/, https://twitter.com/antimatter15):

/* It's April 16, 2014.

It's been six months since I started this project.

Just under two years after I first came up with the idea.

It's weird to think of time as something that happens, to think of code as something that evolves. And it may be obvious to recognize that code is not organic, that it changes only in discrete steps as dictated by some intelligence's urging, but coupled with a faulty and mortal memory, its gradual slopes are indistinguishable from autonomy.

Hopefully, this project is going to launch soon. It looks like there's actually a chance that this will be able to happen.

The proximity of its launch has kind of been my own little perpetual delusion. During the hackathon, I announced that it would be released in two weeks time.

When winter break rolled by, I had determined to finish and release before the end of the year 2013.

This deadline rolled further away, to the end of January term, IAP as it is known. But like all the artificial dates set earlier, it too folded against the tides of procrastination.

I'll spare you February and March, but they too simply happened with a modicum of dread. This brings us to the present day, which hopefully will have the good luck to be spared from the fate of its predecessors.

After all, it is the gaseous vaporware that burns.

*/


Yeah, I made the mistake of setting the App Engine budget to $1.00. Turns out that's probably not enough for a sustained run as HN's #2.

Yeah, the code is super messy, but I'd prefer if you didn't play around too much with the remote OCR service, specifically the translation parts, because Google Translate is pretty expensive per use.


You have no donate link... if you're gonna be on big sites like HN, you might as well have a donation link so that hopefully you break even on App Engine.


Very impressive work. I'm not surprised to find antimatter15 behind it.

The website was not very clear about whether the work is done client-side (it mentions server calls). It turns out that server calls can be disabled, and the extension works quite well without them. I would disable this option by default and offer it as opt-in; it is better for privacy, I think.


This is great. I'd love to see this extended for natural images with whatever algorithm Google uses for OCR in streetview - http://googleonlinesecurity.blogspot.co.uk/2014/04/street-vi...


This is EXACTLY what I need at the moment.

I have a big problem with various people sending me screenshots with stack dumps in them. This is perfect for extracting them into ticket bodies, and it does it perfectly (I've just done 20 with it and manually checked them!)

This is the sort of stuff that really improves people's lives by making all data equal.


Please help; it looks brilliant. However, only the test page works for me; I can't get any other pages to work. Text simply isn't selectable - the cursor remains a pointer, not an 'I' :(

I'm using the latest version of Chrome on a modern Mac and have Naptha properly installed and Chrome has been relaunched.

Any hints would be appreciated.


Awesome. I was actually at HackMIT. It's great to see you continue working on this. As a matter of fact, I told my friends, who were working on a similar idea for their senior project, your project's name last fall. I emailed you for the Microsoft reference papers :) Not sure if I should copy and paste that.

Anyway, good luck!


This is really neat. I was playing with it on pictures of street signs and buildings and realized that if I select some text and then do Ctrl+A, it tries to select everything it thinks is text... Then I used right click > Translate > Reprint to see what it thought each thing was.

Here is the picture: http://thesuperslice.com/wp-content/uploads/2012/04/downtown...

And the text outcome - I found it most interesting what symbols it thought it recognized:

lam

on-0'0

s.

Ic 0on

§-i-

I-*-

-unm

-$3.»;

o

G %T1

00-O

. o C-‘7' H ' .-.”-." «'~3;

.35

$16 O-O

‘D Q-=¢1

‘-M

km“

‘MIMI

DOW:

TLDR

D001”

'."'IIu

ff"

)0‘

\\

,¢-.5 ,:~L.

r/J


I tried it on the handwritten all-caps text on this page: http://xkcd.com/1271/

It (sort of) worked:

"I AB5ENTH|NDEDLY5ELECT RANDU1 Bl.OO<5 OFTEXTHSI READ, PND FEEL SLRONSCDUSLY SATISFIED LHEN THE HIGHUGHTED AREA |"PKE5 H 5Yl’R1ETRICHL 5|-PPE"


I had high hopes for this, as I sometimes need to manually transcribe serial numbers from customers' screenshots.

However, it seems to confuse the letter O and the number 0. Since serial numbers are not English words, I'm not sure how you would solve this unless you had a lookup for commonly used web fonts.


Seemed like an interesting project. I clicked the link, scanned the page, and it seems to be an empty, pointless web page trying to explain, over pages' worth of scrolling, that it lets you deal with text trapped inside images, which I already knew when I clicked the link.

Going back to the page after closing it once, I noticed, written in smaller characters, that this somewhat pointless page is for an extension that is useless to me, as it is exclusively limited to the worst privacy offender among web browsers, one I would not touch with a stick. Google Chrome is the new Internet Explorer to me, as its main use is to download Firefox.

In conclusion, this looked promising, but a confusing web page and browser lock-in render it useless and show that it is far from doing what it claims. "... on every image you see while browsing the web" should be "... on every image you see while browsing the web in Google Chrome".

No GitHub and no open license tells me that, as a Linux user of Opera, I'm pretty much assured I will never see a version of this extension.


Not sure why my comment is downvoted; this is worthwhile (potential) user feedback/criticism.

The webpage is not to the point, and the design has some room for improvement. See points 1 and 2 of http://www.webpagesthatsuck.com/biggest-mistakes-in-web-desi...


This is extremely powerful for the end user. I've been doing a bit of OCR work using some pre-processing methods combined with Tesseract and OpenCV. I am curious to know how you are doing this on the fly, and as a Chrome extension no less. Is the processing done in JS?


The biggest thing I'd like to see is enabling in-page (Ctrl/Cmd+F) search. In my quick scan through the page it looks like it doesn't do that… is that right? Are there plans to add invisible text to the DOM that Ctrl+F can find?


One problem with that is that it processes images lazily. It continually extrapolates cursor movements ~1 second into the future and processes the relevant parts of relevant images. But it should be possible that after an image is processed (or even eagerly, by looking up previously recognized regions from the cached OCR server), the page could be made Ctrl+F-able.
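
Roughly, each recognized region could then be shadowed by an invisible but selectable layer (region here is a hypothetical result object carrying box coordinates and text):

  function makeFindable(img, region) {
    var layer = document.createElement('div');
    layer.textContent = region.text;
    layer.style.position = 'absolute';
    // simplified positioning; a real version would use getBoundingClientRect()
    layer.style.left = (img.offsetLeft + region.x) + 'px';
    layer.style.top = (img.offsetTop + region.y) + 'px';
    layer.style.width = region.width + 'px';
    layer.style.height = region.height + 'px';
    layer.style.overflow = 'hidden';
    layer.style.color = 'transparent'; // invisible, but Ctrl+F still matches it
    document.body.appendChild(layer);
  }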


Or add a "search inside the image" option to the Ctrl+F popup. That way you save memory if you're not using it.


I like the way this extension removes text in the image, but I would much rather have a video delogo filter that does not suck. It would be very useful for removing hard subtitles, station logos, screener warnings, etc.


The Mentalist reference, anyone?

In any case, pretty cool project. I'm a bit amazed how far we've come since I last played with OCRs (and defeated one bad CAPTCHA implementation, still in use at pastebin.com it seems).


Cool idea, definitely worth exploring the possibilities. A quick run showed me that it often interprets "i" as "l" whenever the gap between the line and the dot is not apparent.


Now that is pretty damn cool. It will help at work when marketing people don't copy/paste an email/article and just send a screenshot of it, and you want to quote something from that picture...


Saw you demo this at the hackathon session at CPW! Really, really cool.


@antimatter15 any recommendations for optimizing Tesseract?


It basically runs SWT on the image and creates a 3D Lab histogram of the colors the SWT marked as text. Then it does a morphological dilation of 10 pixels and subtracts the original mask to get the colors of the pixels that represent the background.

Then it just binarizes the image by checking whether the internal histogram value is larger than the corresponding value of the color in the external histogram.

It's a strategy that works quite well on machine-printed text, but it's probably less effective than existing strategies when it comes to scans or photographs.
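
In sketch form, with quantized RGB buckets standing in for the 3D Lab histogram, and the SWT mask plus its dilated ring assumed as precomputed per-pixel inputs:

  // textMask: pixels the SWT marked as text; bgMask: the dilated ring around them
  function binarize(imageData, textMask, bgMask) {
    var d = imageData.data;
    var bucket = function (i) {        // 4 bits per channel -> 4096 buckets
      return ((d[i] >> 4) << 8) | ((d[i + 1] >> 4) << 4) | (d[i + 2] >> 4);
    };
    var inside = new Float64Array(4096), outside = new Float64Array(4096);
    for (var i = 0; i < d.length; i += 4) {
      if (textMask[i / 4]) inside[bucket(i)]++;
      else if (bgMask[i / 4]) outside[bucket(i)]++;
    }
    var out = new Uint8Array(d.length / 4);
    for (var j = 0; j < d.length; j += 4) {
      // a pixel is "text" if its color is likelier inside strokes than outside
      out[j / 4] = inside[bucket(j)] > outside[bucket(j)] ? 1 : 0;
    }
    return out;
  }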


Curious about this too. Also, what's the stack providing Tesseract-as-a-service? According to my cursory search, Google App Engine won't run Tesseract, as it's a native library, not an API. I'd like to try this on non-Latin/CJK hardcoded subtitles, but Ocrad does Latin only.


I wrote a little C program that uses TessBaseAPI to extract letter locations, which gets triggered along with ImageMagick's convert by a NodeJS script. The App Engine frontend acts as a caching reverse proxy.
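
In sketch form (the wrapper binary's name is hypothetical):

  var execFile = require('child_process').execFile;

  // Preprocess with ImageMagick, then hand off to the native Tesseract wrapper
  function ocr(input, done) {
    var prepped = '/tmp/prepped.png';
    execFile('convert', [input, '-colorspace', 'Gray', prepped], function (err) {
      if (err) return done(err);
      execFile('./tess-letters', [prepped], function (err, stdout) {
        done(err, stdout); // letter locations + text, in the wrapper's format
      });
    });
  }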


Thanks. I would also appreciate recommendations for better/faster Tesseract results, particularly with scanned document PDFs.


I have wanted an extension to do this for so long. I even started coding my own at one stage but hit various issues. Thank you so much for creating this.


You guys should consider making an API for this. It would be awesome to have an API that takes images via URL and outputs the text of said image.


What project won first place at HackMIT 2013?



So this isn't from the same team?

I remember seeing that from the project list and really wishing I could download it right away.

Just another example that the "ideas are worthless!" saying is bullshit. This was a great idea; the first person to implement it decently would find success with it.


> In a sense that’s kind of like what a human can do: we can recognize that a sign,

Oh god... how does it finish? I need closure!

(PS: this is awesome)


Fixing that! I also have to write the entire second half of the chronology section, but at least it looks less like I pulled a "Monty Python animator".


This is pretty cool, but not perfect. Upon copying and pasting the captured text, several of the words and letters are wrong.


Everyone deserves to have this extension. It would be even better as a browser's default feature. ;P


I would like it if this would work on ANY text in webpages.

Too many webpages make it too hard to select even actual plain text.


Completely agree with the "where has this been all my life" sentiments. This is awesome, thank you.


Just tested with a random scanned page (http://www.hpl.hp.com/research/info_theory/ShannonWeb/fullsi...) and the result is almost garbage. It seems as bad as most OCR software I have encountered. This was to be expected, as it is based on Ocrad.


Almost garbage? This is the OCR result for the 2nd paragraph. Almost perfect, although the last word of each line gets joined to the first one on the next line:

"The fundamental problem of communication is that of reproducing atone point either exactly or approximately a message selected at anotherpoint. Frequently the messages have meamlng; that is they refer to or arecorrelated according to some system with certain physical or conceptualentities. These semantic aspects of communication are irrelevant to theengineering problem. The significant aspect is that the actual message isone selected from a set of possible messages. The system must be designedto operate for each possible selection, not just the one which will actuallybe chosen since this is unknown at the time of design."


I tried it with both Ocrad and Tesseract modes, and indeed, the Ocrad mode produces garbage; the Tesseract mode produces a really good result but takes longer doing it (mainly the time it takes to upload the entire thing and get the result back).

That seems to make sense to me, at least. Use Ocrad mode by default; if it doesn't perform well, switch to Tesseract and you'll hopefully get a better result.


When I did the test, it was garbage. Since your answer, I have repeated my test with results similar to yours.


Thanks! I wanted to try it (not sure it would fare better than the usual OCR) but was denied, as I'm not a Google product.


Very nifty. Although it would have been even more awesome if it worked with Google Books.


Cool idea, though a bit buggy yet: when I try to actually save images I get the custom extension right-click bar instead of the normal Chrome bar to save the image. But I guess it's still under development.


Great idea; this should make a couple million for the creators.


This seems like a great addition to my side project (amazd.com)


In Firefox 31.0a1 I can copy the text only with Ctrl+C.


This is awesome and very useful! BTW, did he do it?


Awesome. How does this affect page performance?


Worth it for the dozen-click easter egg.


Very nice. Will give it a try.


Way cool! I am impressed.


Magic.

Indistinguishable from magic.


Wow, a step ahead! Amazing extension...


Bonus points for scanning QR codes.


Very cool stuff, but need to satisfy my OCD:

It's spelled Naphtha (http://en.wikipedia.org/wiki/Naphtha). And for the HN hordes: read the bottom of the linked project page; it is supposed to be a reference to naphtha.

:)


Isn't that like telling Google that it's actually spelled Googol?


Badass. Now support Good Browsers™ like Safari and Firefox.


Curious: what makes you find those better than Chrome? I recently switched to FF for a variety of random work reasons and found it so much worse than Chrome (basic UI, dev tools, speed) that I switched back ASAP. Maybe I'm missing something awesome about them.


I use Safari, and I find it to be better than Chrome because it's easier to sync with my iPhone and iPad, and with iCloud keychain even my passwords are synced.


Interesting, thanks.


From the phrasing, I'm guessing this is more of a statement of values than a technical assessment.


This, my friends, is called "innovation".


Now the NSA will be reading the contents of your animated GIFs.


Do you really think the NSA hasn't had access to OCR technology until now?


It was a hilarious joke.


Apparently not hilarious.


Tough crowd.


In the immortal words of reddit: Woosh!

Sarcasm is hard to read on the internet. I'm usually pretty good at it, but this one flew right past me.


The NSA reads it before it's animated. What's new?


I can imagine quite a few blind people are creaming their jeans about now.



