
Like other commenters point out, automatic OCR on Apple platforms is a godsend, and it's such a great use of our modern AI capabilities that it should be a standard feature in every document viewer on every platform.

Another thing I wish was more common is metadata in screenshots, especially on phones. Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded (eg instagram.com/p/ABCD1234/). If I take a screenshot in the browser, include the URL that's being viewed (+ path to the DOM element in the viewport). If I take a screenshot in a maps app, include the bounding coordinates. If I take a screenshot in a PDF viewer, include a SHA1 hash of the document being viewed + offset in the document so that if I send the screenshot to someone else with the same document, it can seamlessly link to it. Etc etc.
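As a rough sketch of the mechanics, PNG already supports arbitrary text chunks, so a screenshot tool could stamp provenance in today with something like this (Pillow in Python; the key names here are made up, not any standard):

    from PIL import Image
    from PIL.PngImagePlugin import PngInfo

    # Hypothetical keys -- nothing standard, just illustrating the idea.
    meta = PngInfo()
    meta.add_text("Source-URL", "https://instagram.com/p/ABCD1234/")
    meta.add_text("Source-App", "Instagram")

    shot = Image.open("screenshot.png")
    shot.save("screenshot_tagged.png", pnginfo=meta)

    # Anyone receiving the file can read the chunks back:
    print(Image.open("screenshot_tagged.png").text)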

There are probably privacy concerns to solve here, but no idea is new in computer science and I'm pretty sure some grad student somewhere has already explored the topic in depth (it just never made it to mainstream computing platforms).

It feels like screenshots have become the de facto common denominator in our mobile computing era, since platforms have abstracted files away from us. Lots of people who have only ever used phones as their main computing devices are confused when it comes to files, but everyone seems to understand screenshots.

Also, necessary shout out to Screenshot Conf! https://screenshot.arquipelago.org





OCR is a godsend, 100% agree. Not a fan of the metadata idea personally: 'screenshotting' is done by the operating system, and letting apps know that they were 'in' the screenshot and attach some metadata of their choosing (like your examples of GPS coordinates for a maps app, or the URL for a browser) sounds like a privacy nightmare, and like something that would make a very reliable core feature much harder to use.

There are companies like Evernote/Zight/CloudApp that at one point tried some things like this, but they never really caught on - I think because it's pretty easy to add annotations or a note of your own - and screenshots not "trying to do everything" is part of what makes them useful & ubiquitous.


Apps (Snapchat most notably comes to mind) have been doing exactly that kind of screenshot detection, though. Theoretically they could then [offer to] edit the photo immediately afterwards to add context, since they had access to the photo roll or files: https://android.stackexchange.com/a/119767

> 'screenshotting' is done by the operating system, and letting apps know that they were 'in' the screenshot and attach some metadata of their choosing sounds like a privacy nightmare

The apps don't have to know a screenshot was taken for this feature to exist; they could write into a passive "in case a screenshot is taken, use this as metadata" field that the OS reads when the user takes a screenshot.


I agree

Deep linking allows apps to know about/intercept known URLs and do "things". I don't know if the screenshot mechanism would involve this.

I do know that some things cannot be screenshotted. On Macs this is any HDCP-protected content on the screen (it shows up as a blank rectangle). On Android I believe some apps can't be captured in a screenshot (they can set FLAG_SECURE on their windows). Don't know about iOS.


OP here. You raised a point that I should have mentioned in the article: screenshots of web pages that don't include the URL. I'm perfectly fine with screenshots of browser windows, since the context is almost always relevant. The system I work on right now puts a lot of useful context into the URL, but it's almost never included in the initial screenshot, so I have to ask for that. Of course, I generally ask for it as text so that I don't have to try to type the whole thing without making a mistake.

I was content to write the original off as "to each his own", but this one I feel you on.

Maybe the problem is sharing without caring and/or without being aware.

Case in point: folks capture large blocks of text, as you mentioned, and paste them into Slack, which converts certain characters unless they're in a code block. This can be much worse than sharing a screenshot.

Please know the best way to share what you are sharing when you share. I've had to come to expect that this request will not be honored.

I also might be guilty of not honoring sharing with caring myself. For example, I didn't read this entire thread before posting; others may have made this exact point already.


> It feels like screenshots have become the de facto common denominator in our mobile computing era,

Google/Apple have taken notice. Both have recently redone their full-screen post-screenshot UI to include AI insights / automatic product searches / direct chat with Gemini/LLM / etc.

It's true everyone uses screenshots to save things they are interested in or want to look up / search more of / save for some reason, and this UI is the perfect place to insert themselves.


> Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded

bloody hell of all privacy concerns


Why? Either it's public content, and it can be traced back manually anyways (screenshots from social media posts typically include the username), or it's private content and knowing the URL slug doesn't change anything (the fact that you're sharing a screenshot of private content is the privacy breach, not the fact that some UUID is embedded).

Fun side-fact: The original MacPaint, while in development, had an "OCR" copy feature, albeit a much simpler one of course.

It didn't make it into the release version out of fear that people would use MacPaint as a word processor.


Why spend electricity and time reading the text in a screenshot, and then more time making sure there are no mistakes, when the sender could have just copied the original text?

> metadata in screenshots

Interesting idea, but I think this understates how often screenshots are "slightly adversarial". I'm taking a screenshot because the app or webpage has deliberately made it hard to select text for some reason. Or the UI is just annoying about selection (e.g. trying to select the text from a link anchor without being considered as having clicked on it, which is fiddly on Android).

Then there's the question of fully adversarial screenshots. I can definitely see why people want "I want to send this to someone and discourage them from seamlessly resharing it", but at the same time: it's my screen. Not generally a problem on desktops unless you're dealing with video content.


Honestly, why are you developing software if you are "confused when it comes to files"?

Your OCR isn't going to help you with the parts that got clipped off outside the screenshot.


OCR is not AI

    AI is whatever hasn't been done yet.
        — Larry Tesler, 1970
Source: https://en.wikipedia.org/wiki/AI_effect

Yes, but they're quite good at it. Reliable OCR is font-dependent, whereas I think a lot of models just kind of figure it out regardless.

One reason I don't quite trust AI for OCR is that it will, on occasion, hallucinate the output.

All OCR is untrustworthy. But sometimes, OCR is useful. (And I've heard it said that all LLM output is a hallucination; the good outputs are just hallucinations that fit.)

A few months ago a warehouse manager sent us a list of serial numbers and the model numbers of some gear they were using -- with both fields being alphanumeric.

This list was hand-written on notebook paper, in pencil. It was photographed with a digital camera under bad lighting, and that photograph was then emailed.

The writing was barely legible. It was hard to parse. It was awful. It made my boss's brain hurt trying to work with it, and then he gave it to me and it made my brain hurt too.

If I had to read this person's writing every day I would have gotten used to it eventually, but in all likelihood I'll never read something this person has written ever again. I didn't want to train myself for that and I didn't have enough of a sample set to train with, anyway.

And if it were part of a high-school assignment it would have been sent back with a note at the top that said "Unreadable -- try again."

But it wasn't a high school student, and I wasn't their teacher. They were a paying customer and this list was worth real money to us.

I shoved it into ChatGPT and it produced output that was neatly formatted into a table just as I specified with my minimal instruction ("Read this. Make a table.").

The quality was sufficient to allow us to fairly quickly compare the original scribbles to the OCR output, make some manual corrections that we humans knew how to do (like "6" was sometimes transposed with "G"), and get a result that worked for what we needed to accomplish without additional pain.

0/10. I'm glad it worked and I hope I never have to do that again, but will repeat if I must.
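(For anyone who'd rather script the same workflow instead of using the ChatGPT UI, it's roughly the following with the openai Python SDK and a vision-capable model; the file name and prompt are just placeholders.)

    import base64
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # "handwritten_list.jpg" stands in for the photo of the pencil-on-notebook list.
    with open("handwritten_list.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Read this. Make a table of serial number and model number."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)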


There was a good talk some years ago at one of the CCC events where some guy found out that scanners sometimes change numbers on forms (the Xerox JBIG2 compression bug, where pattern-matching compression silently swapped digits).


But AI can OCR

They do so by running the image through an OCR tool call.

They can, sure... that's really just LLMs though.

ML models to recognize handwriting existed long before LLMs could call tools, though.

Identifying digits is like the "Hello World!" of ML

https://www.youtube.com/watch?v=aircAruvnKk
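For reference, the non-LLM version of that hello world is a few lines with scikit-learn's bundled 8x8 digits dataset:

    # Classic "hello world": classify scikit-learn's bundled 8x8 digit images.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # usually around 0.96 accuracy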


An OCR tool is ML. "AI" is generally used to mean LLMs. You're repeating what I already wrote.

No, they don't; they natively "see" images.

That's a thing I always marvel about: how versatile LLMs are, doing so many things well that were out of reach just a few years ago.

Especially when you consider how expensive "good" OCR software is

On Apple platforms it definitely is AI. Apple Intelligence!

AI says that OCR is AI.

God of the gaps


