
Nice. Here's a similar personal story with a PSA that sometimes blurring is NOT sufficient.

A friend of mine posted on Instagram a picture of a U.S. visa (or something similar; it was probably five years ago) to announce her trip to the U.S., and she took care to blur out sensitive information such as her passport number. But a Gaussian blur is easy to reverse and I successfully unblurred it and told her my discovery. I didn't use any specialized software; it was just Mathematica with its built-in ImageDeconvolve function with guessed parameters for the Gaussian kernel.
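For anyone curious why this works, here is a minimal sketch of the idea, not the Mathematica code from the story. It assumes a 1-D signal, a circular Gaussian blur with a known sigma, and no noise or 8-bit rounding; the function names (`blur`, `deblur`) and the small regularization constant `eps` are made up for illustration. Real photos are quantized and noisy, so recovery is only approximate in practice.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (fine for tiny inputs)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def gaussian_kernel(n, sigma):
    """Circular Gaussian kernel centered on index 0, normalized to sum to 1."""
    raw = [math.exp(-(min(i, n - i) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    total = sum(raw)
    return [v / total for v in raw]


def blur(signal, sigma):
    """Blur by multiplying spectra (circular convolution)."""
    kernel_spectrum = dft(gaussian_kernel(len(signal), sigma))
    spectrum = dft(signal)
    product = [a * b for a, b in zip(spectrum, kernel_spectrum)]
    return [v.real for v in dft(product, inverse=True)]


def deblur(blurred, sigma, eps=1e-9):
    """Regularized inverse filter: divide the spectrum by the kernel's."""
    kernel_spectrum = dft(gaussian_kernel(len(blurred), sigma))
    spectrum = dft(blurred)
    restored = [y * h.conjugate() / (abs(h) ** 2 + eps)
                for y, h in zip(spectrum, kernel_spectrum)]
    return [v.real for v in dft(restored, inverse=True)]


# A made-up 1-D "scanline": 1.0 is ink, 0.0 is paper.
original = [0.0] * 32
for i in (5, 6, 7, 12, 18, 19, 25):
    original[i] = 1.0

blurred = blur(original, sigma=1.0)
recovered = deblur(blurred, sigma=1.0)
max_error = max(abs(a - b) for a, b in zip(original, recovered))
```

In this idealized setting `recovered` matches `original` to within a tiny error. With a guessed sigma the result degrades gradually rather than failing outright, which is presumably why guessing parameters was enough.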

I personally recommend blacking out (add a black rectangle) instead of blurring, and if it is a PDF, convert to an image afterwards because too many PDF editors use non-destructive operations to add a new object instead of changing what's underneath.



Your advice is good, and I agree that you didn't use specialized software to reverse the blur, but this

> I didn't use any specialized software; it was just Mathematica with its built-in ImageDeconvolve function with guessed parameters for the Gaussian kernel.

is one of the most HN comments I've come across recently :)


Reminds me of the Simpsons 3D episode. Professor Frink's

>"Well, it should be obvious to even the most dimwitted individual, who holds an advanced degree in hyperbolic topology..."


Professor Frink, Professor Frink. He'll make you laugh, he'll make you think. He likes to run and then the thing with the.. person...


That monkey is going to pay...


Such an underrated character. Thank god for Futurama.


"Gleevin gliven"


That reminds me of this legendary comment: https://news.ycombinator.com/item?id=9224


Ha, I knew what that comment was before I clicked. (“Is it that rsync/ftp comment? Yup.”) ((EDIT: but it was curlftpfs, not rsync))


You'd love this follow-up Drew Houston and BrandonM thread shortly after Dropbox's IPO: https://news.ycombinator.com/item?id=16660140


Thanks for that. I hadn't seen it when it was new.

Now show me the thread where Steve Jobs gave a shoutout to CmdrTaco :)

(https://slashdot.org/story/01/10/23/1816257)


The HN equivalent of "I put on my robe and wizard hat".


god, YES! i needed this reference in my life today :)


I never realized how low that comment's ID was until now. We've all said a lot since then :)


Me too. The canonical HN comment, forever.


> is one of the most HN comments I've come across recently :)

That gave me a laugh. I don't have any experience with Mathematica, but every time I see it mentioned (usually on HN) I'm amazed at the sheer breadth of what the system is capable of. The number of use cases and possibilities blows my mind.


The top solution on this Code Golf question is possibly the most comical example of Mathematica's scope that I've ever seen: https://codegolf.stackexchange.com/questions/71631/upgoat-or...


That answer is absurd, yet it's awe-inspiring what Mathematica can do.


Yeah, TIL Mathematica knows what a goat is, and can recognize one on sight.


That statement really intrigued me. Since I like goats, I had to know how to do this.

Use ImageInstanceQ[image, object], where image is the image and object is "caprine animal". [0] [1]

[0] https://reference.wolfram.com/language/ref/ImageInstanceQ.ht...

[1] https://codegolf.stackexchange.com/questions/71631/upgoat-or...


That's a very unique nerd-snipe you just experienced. https://xkcd.com/356/


TIL Mathematica is the GOAT.


The other answers are also very clever and interesting. There are quite a few ways to determine whether the goat is up or down, and some are very simple.


The one that used reverse image search on Bing is so deliciously relatable.

On the one hand, it's perfectly built to spec and satisfied all requirements given by the customer.

On the other hand, you know it's incredibly fragile, and that the customer actually wants something different.


Whatever knocks this exchange off the top spot will be really special: https://news.ycombinator.com/item?id=35079


If it is in the installable version now, it will be in Wolfram Alpha in 5 years if you can guess the right command, and in 10 years Wolfram Alpha will just automatically select the blurred part and make a fake unblurred version of the JPEG.


Yet another example of someone mistaking the quality of a single person for the quality of a platform


> I personally recommend blacking out (add a black rectangle) instead of blurring, and if it is a PDF, convert to an image afterwards because too many PDF editors use non-destructive operations to add a new object instead of changing what's underneath.

We had a similar issue in Australia as well.

Politicians' phone bills are published on the government website in summary form.

Someone in 2017 decided to blank out their phone numbers by changing the phone number text colour to white (same as background).

End result - hundreds of politicians and former prime ministers had their phone numbers leaked.

https://www.abc.net.au/news/2017-03-20/phone-numbers-of-fede...


I used to work in IT for a state-based police force in Australia. Traffic reports can be requested by those involved in traffic accidents, and they include the parties to the accident and their details.

People used to be able to get the personal information of police officers if they were involved, intentionally or not, in a traffic accident with a police car. They would request the traffic accident report, and that included the personal information (including home address) of the police officers in the car. I was in QA and I tested the change when it was fixed. It now includes the address of Police HQ when a police officer is involved in a traffic incident.


Yup. I wrote a blog post about this a long time ago in 2007, and it was republished in Gizmodo in 2014: https://gizmodo.com/why-you-should-never-use-pixelation-to-h...

You can dictionary attack pixelated photos.

With Gaussian kernels, besides deconvolution you can sometimes also dictionary attack them if you have the original font and if the kernel is properly normalized (as in most Gaussian blurs).

Although I haven't tried, I think there may even be neural network based techniques that can perform even more effectively than a dictionary attack.

Separately, if the image editing tools added sufficient random noise to their mosaic filters they might be able to thwart most of these attacks, or at least make them significantly harder.
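A toy sketch of the dictionary attack on a mosaic, assuming the attacker knows the font, the character set, and the block size. The 3x5 digit font, the function names, and the 4-digit secret are all invented for illustration; this is not the code from the blog post.

```python
from itertools import product

# Hypothetical 3x5 bitmap font for digits (1 = ink, 0 = paper).
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}


def render(text):
    """Render digits side by side, one blank column after each glyph."""
    rows = []
    for y in range(5):
        row = []
        for ch in text:
            row.extend(int(bit) for bit in FONT[ch][y])
            row.append(0)
        rows.append(row)
    return rows


def pixelate(image, block):
    """Mosaic filter: replace each block x block tile with its average."""
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height, block):
        out_row = []
        for left in range(0, width, block):
            cells = [image[y][x]
                     for y in range(top, min(top + block, height))
                     for x in range(left, min(left + block, width))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out


def dictionary_attack(target, length, block):
    """Pixelate every candidate string and keep those matching the target."""
    matches = []
    for digits in product("0123456789", repeat=length):
        candidate = "".join(digits)
        if pixelate(render(candidate), block) == target:
            matches.append(candidate)
    return matches


secret = "4071"
mosaic = pixelate(render(secret), block=3)
matches = dictionary_attack(mosaic, length=len(secret), block=3)
```

`matches` holds every candidate whose mosaic is identical to the target. How many survive depends on the font and block size, but the secret is always among them, and adding random noise to the mosaic would force the attacker into fuzzy matching instead of exact comparison.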


Interesting, thank you for the link. I had a hunch this should be possible but I wasn't aware that it was already proven. I used a similar trick on image recognition: turn images into a single 32 bit word by heavy pixelation and then look up a matching description. It's interesting how often that will work once you feed it with enough data. After all, that gives you 4 billion inputs mapped onto 4 billion descriptions, and plenty of those will contain the Eiffel tower with various cloudy backgrounds apparently recognized perfectly.

It's a total cheat but it is funny how close that can get you to something that might be actually useful.


I wonder if you could use adaptive optimal kernels (AOK) [0]? I had used this for work on multiphase flow recognition from electrical capacitance tomography (ECT), as a proxy for void fraction. We wanted to tinker with time-frequency representations.

[0]: https://pdfs.semanticscholar.org/20c2/b82eef0809df80a402f125...


> electrical capacitance tomography

Mind blown. Wow, that is very impressive.


Yes, that is cool. I had just come back from an internship in Wireline at Schlumberger where I was exposed to tools like one that did nuclear magnetic resonance, NMR, thousands of metres below. Pretty sweet tech. Transitioned to ECT for that project, then ECG for anomaly detection on anonymized hospital patient data. I never will underestimate the effect hair and sweat have on data. That was a cool year with lessons that served well later.


I once had to provide my employer copies of court documents proving something or other in order to qualify for the benefits plan I was attempting to enroll in. The part of the document that contained the info they required also contained other information I did not want them to have, and I was more than irked at having to do this in the first place. I used Photoshop to draw a 99% black box as the redaction, but then, using a 100% black font color, typed in a nasty little message. Nobody was ever going to see it, but it was enough just knowing that if they did, it would be a shock. I qualified for the package.


> and if it is a PDF, convert to an image afterwards because too many PDF editors use non-destructive operations to add a new object instead of changing what's underneath.

You'd be surprised at how many times this happens on Government documents with redaction.

:S


That's why some departments even have policies now of printing and re-scanning redacted documents. It is dumb, yet pretty hard to get wrong.

Both MS Word and PDF have leaked redacted/removed information in the past. Given the severity of some of these leaks, wasting paper is a minimal cost.


If it is hard to get wrong, is it still dumb? Being able to verify with your own eyes that the redacted parts are indeed redacted is a pretty strong benefit to that process. You'll need to train staff to properly black out stuff (no idea what they do, heavy cardboard cut-outs or cutting out the censored content and using a black background for the scan?), but once that process is in place, it works.

With software you either need vetted and approved, very expensive software, or you have to accept a much higher error rate, because the operator cannot verify the results of the process with certainty.


Incidentally, you just wrote a pretty good argument for (political) voting on paper instead of via machines.


Absolutely. A system you can see and understand garners a lot more trust than a black-box (even if the box runs vetted and open software).


I think the correct solution is a machine that prints out both a human- and machine-readable representation of the vote. The voter can confirm that the human-readable representation is correct, and you can randomly hand-count a few boxes of ballots to check that the hand-count matches the machine-count.

An election doesn't need to be tamper-proof; we just need to be able to detect tampering well enough to make tampering a loser's game.


You could do such a hybrid system, but honestly purely paper based systems seem to work well enough in practice. Eg Germany uses paper and human counting, and the results are usually available fairly quickly.

The problem with randomly hand-counting a few boxes of ballots is that you then need to convince people that the random selection was uniform and fair and actually random.

There are methods to do that, but they are so complicated and full of cryptographic finesse that they ain't simpler than vetting an electronic voting system in the first place.

Having said that: human counting isn't foolproof and is still open to abuse and tampering.

It's mainly that any village idiot can in theory audit the human-run system, and that it would take a conspiracy with lots of people to engage in widespread tampering.

The more people involved, the harder it is to prevent leaks.


It's not just tampering one needs to worry about with elections. There's also secrecy (to prevent voter coercion).


Right, otherwise the problem would be trivial. If it wasn't clear, the plan was the printed ballot would anonymously go in a box to be machine counted.


Someone could stuff the box with extra ballots?


Yup, but they can do so with old-fashioned paper ballots too. Any security measures for paper ballots will also work with my idea, and the machine could also do fancier things like printing out a timestamp and a signature of the timestamp. I really want things to be simple though: if the system of voting is too complex, then it will be distrusted, and distrust in the voting system is toxic to democracy.

What they can't trivially do with any system including paper ballots is remove ballots, compared to digital voting machines where you can add e.g. -100 votes to candidate A and 100 votes to candidate B, thus ensuring that the total-votes field is correct while advantaging candidate B -- this was actually demonstrated by a security researcher on a Diebold touch-screen machine.
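The arithmetic behind that attack is simple enough to sketch. The tallies, candidate names, and function names below are hypothetical; this is not the code from the Diebold demonstration, only an illustration of why a total-votes check misses a paired shift while a hand count of the paper ballots catches it.

```python
def shift_votes(tally, from_candidate, to_candidate, amount):
    """Return a tampered copy: subtract from one candidate, add to another."""
    tampered = dict(tally)
    tampered[from_candidate] -= amount
    tampered[to_candidate] += amount
    return tampered


def hand_count(paper_ballots, candidates):
    """Count the paper ballots one by one."""
    counts = {name: 0 for name in candidates}
    for choice in paper_ballots:
        counts[choice] = counts.get(choice, 0) + 1
    return counts


# Made-up election: 600 paper ballots for A, 400 for B.
paper_ballots = ["A"] * 600 + ["B"] * 400
honest = hand_count(paper_ballots, ["A", "B"])
tampered = shift_votes(honest, "A", "B", 100)

# The total-votes field still agrees with the number of ballots cast...
totals_match = sum(tampered.values()) == len(paper_ballots)
# ...but a hand recount of the paper disagrees with the machine tally.
recount_matches = hand_count(paper_ballots, ["A", "B"]) == tampered
```

Here `totals_match` is true and `recount_matches` is false, which is the whole argument for keeping a paper record that can be counted independently.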


FOIA reports usually have a small textbox over the redacted information with a reference to the reason for redaction, likely made in Adobe PDF. Then the docs are either printed and scanned or just converted to an image only PDF.


Then they use the big multifunction networked printer’s built in scanner, which saves a copy to the “little” hard drive they all tend to have in them now, and forget to ensure these things get wiped/destroyed... years later they sell the printer once the lease ends and the surprise inside is months to years of raw scanned documents the new owner gets access to with very little effort.


Why don't they convert the PDF to an image and convert it back? This approach seems to be a lot more efficient, and less prone to other types of human error (e.g. a missing page). Is there still an attack vector?


It's a bit like point and speak checklists on aircraft - it takes a certain amount of energy to do so you can't skip it without doing it deliberately


The Japanese train system utilizes similar concepts IIRC. When I first read about this I was astonished at how effective it was [0] (up to 85% error reductions)!

[0] https://www.atlasobscura.com/articles/pointing-and-calling-j...

[1] https://news.ycombinator.com/item?id=18952193


Toronto and New York City use a similar point-only system on their subway systems. Without the white gloves though.

https://www.theglobeandmail.com/canada/toronto/article-autom...


If you do that, look at the document, hit CTRL+Z, then look at the document again, it will likely look identical, thanks to the fact that rendering a PDF to a JPEG with 70-90% quality... at ~600DPI... then scaling it back out to a 75-150DPI screen... is going to look visually lossless.

So, not only do you have the energy-investment thing noted in the/a sibling comment, you have the issue that there's no giant "THIS IS AN IMAGE" or "THIS HAS TEXT IN IT" that you can just Look At and know that yeah the document is okay. There's no lowest-common-denominator provability thing. You have to hyperspecifically know what to look for (render to image) then know how to verify whether it's an image or not.

And... how do you verify if it's an image? I don't have any PDF authoring/editing software on this machine, so the only thing I can think of is checking the Undo menu for "convert to image" or similar.


There will be no CTRL + Z, as it can only be used to save to a new document (just like scanning).

Under the hood, you create a new document, rasterize the original document page by page as JPEGs, and insert the JPEGs into the new document.

You can even create a fake "printer" that outputs a PDF with rasterized images as pages, so you don't have to teach the office clerks to do anything extra.

To me, it seems to be indistinguishable from printing and scanning.

PS: It's pretty easy to verify programmatically if the page contains nothing but an image, especially if you also wrote the software that rasterizes it in the first place.


> It's pretty easy to verify programmatically if the page contains nothing but an image, especially if you also wrote the software that rasterizes it in the first place.

It's pretty easy for a computer to verify any of this; the point is making it idiot-proof. You don't have to be much of an idiot, if you process hundreds of documents a year where there's no way to visually verify the difference between a badly redacted document and a well redacted document, to screw up once. Especially when the difference between them is that you remembered to push the "redact correctly" button, and if you forgot that, remembered to push the "verify if it is redacted correctly programmatically" button before hitting send.

What you do is create a ritual where you have to walk across the room and use a physical machine. You'll remember doing that. And if you don't, since the output will look a bit crap, you can confirm it trivially.

Creating a process that has to be done perfectly every time or it fails catastrophically, and has few indications of failure during the process, is worse than having no process at all.


It is probably still easier to screw up on a computer than by looking at physical documents to verify them and then scanning them.


Even when the black box is done right, sometimes there are quasi side-channel leaks from its size. The name under a box, for instance, may be discoverable if only a few names are possible and the box is small, meaning it covers the shortest name.
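A minimal sketch of that size leak, assuming a monospaced font with a known character width; the witness names, widths, and the `names_matching_box` helper are invented for illustration. With proportional fonts the same idea applies, but you would need real glyph metrics.

```python
def names_matching_box(names, box_width, char_width, tolerance=1.0):
    """Keep the names whose rendered width is within tolerance of the box."""
    return [name for name in names
            if abs(len(name) * char_width - box_width) <= tolerance]


# Hypothetical list of people who could be under the redaction box.
witnesses = ["Al", "Christina", "Bartholomew"]

# A 14-unit-wide box with 7-unit-wide characters can only hide two letters.
survivors = names_matching_box(witnesses, box_width=14.0, char_width=7.0)
```

`survivors` ends up containing only "Al", so the redaction hides nothing. Padding every box to a fixed width would close this particular leak.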


A friend of mine once had to review some (Swedish) court document with redacted witness names. It was a Word document with history intact. Just undoing a few steps was all it took.


One of my lecturers did that back at university - they generated an Excel spreadsheet containing everyone's marks, then for each student, deleted all but that student and saved as a different file.

Document history was turned on and anyone who hit ctrl+z got the full class marks.

(The same lecturer initially failed me because they forgot to add my final exam score to my assignments score, and then took four months to fix it. They weren't very competent.)


My all-time favourite recommendation is "print, cut out the sensitive parts with an exacto knife, rescan".

Firstly because it's a nice mix of analog and digital, and secondly because it's short enough to fit in a tweet - yet extremely secure.


"Information to be withheld should be black highlighted using a tool such as the word highlighter tool like this ⬛⬛⬛⬛⬛ and then printed off. This print out should then be scanned in and saved as a PDF."

Ministry of Defence redaction policy, https://assets.publishing.service.gov.uk/government/uploads/...


...shred cut out parts, burn remains, mix with water, encase in cement, explode, divide rubble into four parts, disperse one part each in Lake Superior, Pacific Ocean, Atlantic Ocean, and the Great Salt Lake; assume an alias, move to Alaska...


This is how military redactions have been done forever. If a soldier writes home to his family and includes classified details (“I watched the sun rise over Mt Vesuvius yesterday but today we are moving west”) the censors just cut out the text with a knife.


Wouldn’t that mean they were marching into the gulf of Naples?


Obviously they wouldn't want the enemy to know their troops are amphibious.


> I personally recommend blacking out (add a black rectangle) instead of blurring

I've seen people use image editors on mobile and they'll "scribble" out sensitive information, but one of the problems is that if you pick the wrong pen it'll blend your strokes so it's not 100% opacity (but on a casual glance it's close enough). You can zoom in and change the contrast of a photo that has been redacted this way and recover information.
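A rough sketch of why a not-quite-opaque pen fails, assuming a single row of 8-bit grayscale pixels and a pen that alpha-blends black at 90% opacity. The function names and pixel values are made up, and real editors may blend differently.

```python
def scribble(pixels, opacity):
    """Alpha-blend a black pen stroke over every pixel in the row."""
    return [round(p * (1 - opacity)) for p in pixels]


def stretch_contrast(pixels):
    """Rescale the darkest value to 0 and the brightest to 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)  # truly flat: nothing left to recover
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


# Made-up scanline: 255 is white paper, 0 is black text.
original = [255, 255, 0, 0, 255, 0, 255, 255]

covered = scribble(original, opacity=0.9)   # looks almost black to the eye
recovered = stretch_contrast(covered)       # the text pattern comes back

fully_covered = scribble(original, opacity=1.0)
nothing_left = stretch_contrast(fully_covered)
```

At 90% opacity the row is nearly black but still carries the pattern, and stretching the contrast restores it. At 100% opacity every pixel is 0 and there is nothing to stretch.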


It's unfortunate because that's the "thicker" brush so people tend to choose it first…


A pedophile ringleader was once caught by reversing a graphical swirl he used to try to hide his face in a picture.


Yes. Wikipedia has an article about him here: https://en.wikipedia.org/wiki/Christopher_Paul_Neil


Nuts. He sexually abused multiple children and distributed pictures of this, but spent less than five years in jail and is now out.


> I personally recommend blacking out (add a black rectangle) instead of blurring

Real-life document workflows can be really tricky. What if one is required to print or photocopy the obscured document? Devastating for the printer's toner or cartridge lifetime... In some cases an opaque grayish rectangle does the job.


White (with a black border) is fine too. Black is popular, but the goal is to make it an image with no residual data.


> Devastating for printer's toner or cartridge lifetime

Which could result in thousands of dollars of loss over decades. Is that really a significant concern? Charge the client for it.


I generally edit the sensitive part out and match it to the background of the document; it looks much cleaner IMHO.

However, I agree that it requires some quick hand in image manipulation software.


I found many years ago that my pay statements suffered from the last item you mentioned. My personal info had a black box over things like the SSN...but if I just moved the window around the black box followed slower than the document so everything was visible. ADP never acknowledged the problem when I brought it to their attention, but they did eventually fix it.


Sure. I would go a step further - just don’t post any photos of these sorts of documents ever. The risk and reward ratio is too skewed.


That is my argument against using any social media in a nutshell - the risk and reward ratio is too skewed.


Did the blog author actually un-blur the booking reference though? He states he tried to un-blur the barcode, was unsuccessful and then realized the booking reference was right there in the picture. Nothing about un-blurring it.


The original image was not blurred, he simply read off the plaintext booking reference. (After first trying and failing to scan the also unblurred bar code.)


>a Gaussian blur is easy to reverse

That's the most surprising thing I've read today. I assumed it was destructive.


It's lossy, but not destructive, and a 'sharpen' operation is technically the same as blur but in reverse. So you won't end up pixel-perfect after doing an 'unblur' but you will be able to make out more than you could before.


If you know anything about the probability distribution of likely inputs, it's even easier to reverse with minimal loss.

Eg knowing that the input was black text on white background or a natural image (instead of eg white noise) helps a lot.


Also, having multiple pixelated/blurry images helps: you can reconstruct the original more easily, e.g. if different newspapers print pixelated pictures of the "suspect" you can reconstruct it pretty accurately.

Machine learning can also do a surprisingly good job of it, especially if you know what the target is (e.g. a face) https://www.vox.com/future-perfect/2019/9/4/20848008/ai-mach...

Sample code: https://gist.github.com/JonathanFly/80b669a72bf624d17b56a1cf...


> Machine learning can also do a surprisingly good job of it, especially if you know what the target is (e.g. a face)

Yes. Though that's just a corollary of doing better when you know something about the probability distribution of inputs.

(But a very useful and practical corollary. My formulation didn't give any hint how you might make use of that knowledge of the distribution.)


The thing to remember here is that the only way to hide (real world) data in an image is to reduce the amount of data in the picture. A blur or swirl leaves most if not all of the data in the picture (although distorted). Any filter that removes data (such as pixelating or blacking out / whiting out) can be used to safely hide this data. Just remember to also strip out any unwanted metadata (Exif data) and do not use layers but a 'flattened' version of the picture.


Pixelation is also attackable. Generate input (e.g. GAN) and apply pixelation until it converges. Probably won't be super accurate but enough to probably ID someone.

Black/delete (and flatten/rebroadcast) is the only way.


I'd worry about hallucinations when applying a GAN to a pixellated image. You'll get out a face, but who's to say that it's the correct face? Lots of people look similar.


"I personally recommend blacking out (add a black rectangle) instead of blurring, and if it is a PDF, convert to an image afterwards because too many PDF editors use non-destructive operations to add a new object instead of changing what's underneath."

I have this at work, with engineering drawings. With mobile equipment we're often not dealing with engineering companies per se, and they won't or don't know how to get us CAD models of their equipment. And we often don't have the equipment on hand at the time we need to make drawings.

But if you have a PDF with vector drawings, often a manual, and one or two good dimensions you can make a reasonably accurate model. AutoCAD even makes this easy with the PDFIMPORT function.

More often than I would expect, there's a whole other drawing view either covered by a white box or off-page. Once it looked like it had been drawn over with a white paintbrush tool, and of course the path of that tool was also visible.


Why not use a randomized blur so people who like to do such things can waste time trying to figure it out when it's actually nothing but random numbers and has none of the original info?


Sometimes a black bar or even cropping isn't sufficient. You still have to trust the editing software.

There was a scandal around 2003 when a TV host took a topless photo, cropped it and shared the cropped photo online. Unfortunately, the software (Photoshop—I think CS3) she used to crop the photo stored the original photo as metadata if you didn't change the original filename. The original (uncropped) photo could be seen in the "Open File" preview dialog when opening the cropped version.


Blacking out is the correct thing to do.

Don't cut it out so that it becomes transparent, since this may still preserve the color components of the RGBA pixels, even if they are invisible and blended with a black background.
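A small sketch of that failure mode, assuming an eraser that only zeroes the alpha channel. Whether a particular editor behaves this way is an assumption; the function names and the pixel value are invented for illustration.

```python
def erase_to_transparent(pixel):
    """Hypothetical eraser: zero the alpha channel, leave RGB untouched."""
    r, g, b, _ = pixel
    return (r, g, b, 0)


def composite_over_black(pixel):
    """What a viewer shows when the pixel is blended onto a black background."""
    r, g, b, a = pixel
    return tuple(round(c * a / 255) for c in (r, g, b))


def strip_alpha(pixel):
    """What an attacker sees after simply discarding the alpha channel."""
    return pixel[:3]


def black_out(pixel):
    """Overwrite the color itself, which is the safe option."""
    return (0, 0, 0, 255)


secret_pixel = (200, 30, 30, 255)          # made-up opaque reddish pixel

erased = erase_to_transparent(secret_pixel)
shown = composite_over_black(erased)       # appears black on screen
leaked = strip_alpha(erased)               # original color is still there
safe = strip_alpha(black_out(secret_pixel))
```

`shown` is pure black, yet `leaked` is the original color, while `safe` really is black. That is the difference between hiding pixels and overwriting them.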


If using for example Word you can conveniently just change the background text color to black. /s



