Plans? I assumed lots of people were already working on this. There's already a lot of training data out there, and I suspect most users can be identified by use of a handful of uncommon trigrams and sentence stats. I know you can recognize things I've written at work because they have real em dashes—people rarely type with those.
NSA already does this. "Stylometry" I believe is the term. Perhaps they used heuristics and algorithms so far? I would have thought NLP was good enough for this years ago.
Back in 2015 you might have some linguists work with data scientists to do feature engineering and use those as inputs to an LR model. I suppose you can let a deep learning model feature engineer for you, but either way, you'll get to some of the same heuristics you're thinking of.
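To make the 2015-style approach concrete, here's a hedged sketch of what such hand-engineered features might look like (the specific feature choices are mine, purely illustrative); in a real pipeline these would become the inputs to the LR model:

```python
import re

def style_features(text):
    """Toy hand-engineered stylometric features of the kind a 2015-era
    pipeline might feed into a logistic-regression model."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Average word length: crude proxy for "ten-dollar word" usage.
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        # Words per sentence: long subordinate-clause-heavy writers stand out.
        "avg_sent_len": len(words) / max(len(sentences), 1),
        # Real em dashes per word, as mentioned in the parent comment.
        "em_dash_rate": text.count("—") / max(len(words), 1),
        # Vocabulary diversity.
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }
```

A deep model would learn features like these implicitly, but the explicit version shows how few signals you need before authors start separating.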
Someone unintentionally did something similar with HN comments to find users who sound most similar to you/each other, and people were finding their throwaway and alt accounts.
Interesting. There was commentary about finding anonymous/throwaway accounts, but on mine, I did not find my anonymous account (although I use that very rarely). The accounts that turned up seemed to be all real, and in a couple cases I could guess what might have made the checker match us (e.g., mentions of MFAs or Apple //e or similar politics), but not all. I didn’t notice any linguistic similarities.
I'm supportive of helping our convicts reintegrate into our society as healthy contributors and seeing they receive the support they need to find their niche in our diverse economy and social landscape.
I doubt the TrueCrypt developers decided that it was "Not Secure As" Microsoft's BitLocker after considering the paper that was released the following year.[0]
RNG shuffling of style, library usage, paradigms, and naming conventions: camelCase vs. SCREAMING_CASE based on a coin flip, etc. Leverage lint/static-analysis tools to enforce it prior to submission.
Lots of fun ways to stick it to those bulk-surveillance data lakes that may be the targets of such training sets.
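A minimal sketch of the coin-flip naming shuffle (regex-based and purely illustrative; a real tool would rewrite the AST rather than raw text, and the lint step would enforce whichever convention the flip chose):

```python
import random
import re

def shuffle_naming(code, seed=None):
    """On a coin flip, rewrite snake_case identifiers to camelCase,
    as one crude example of per-submission style randomization."""
    rng = random.Random(seed)
    to_camel = rng.random() < 0.5  # the coin flip

    if not to_camel:
        return code  # tails: leave the original convention alone

    def camel(match):
        head, *rest = match.group(0).split("_")
        return head + "".join(part.title() for part in rest)

    # Naive pattern for snake_case identifiers; ignores strings/comments.
    return re.sub(r"\b[a-z]+(?:_[a-z]+)+\b", camel, code)
```

Run once per submission with a fresh seed and the same author emits two different-looking styles over time.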
Running all writing through AI to re-write everything in a different style while conveying the same information might work. Create a way to apply it to all online writing and you have something like a writing VPN.
Exactly. You barely need "AI" for this. Changing style enough to put some sand in a sophisticated text identifier could be as simple as introducing some spelling and grammar variations. It could easily be commoditized (isn't Grammarly doing this already, minus the privacy part?). Of all the cat-and-mouse games LE and intelligence agencies are playing, for this one I'm betting on the mouse.
Just write all your 'sensitive' texts in leetspeak, use very few words, and very short words and it should be fine. It's like the typing equivalent of cutting letters out of magazines.
I proposed this as an automated process for our provinces social services/inquiries email. The idea being that the text would be harder to use as a method of discrimination.
"Ernest Hemingway: he has no courage, has never crawled out on a limb. He has never been known to use a word that might cause the reader to check with a dictionary to see if it is properly used."
Ernest Hemingway:
“Poor Faulkner. Does he really think big emotions come from big words? He thinks I don’t know the ten-dollar words. I know them all right. But there are older and simpler and better words, and those are the ones I use."
No. Look, I had some "fight" in these pages two days ago because somebody complained of my style.
It is the farthest from legitimate to suppose that one's intention is to impress - that is accusing people of childishness, and you need to have solid grounds to suppose that. There are higher chances that one wants to be _precise_.
That would be precision in the expression of some content, to have such expression match as perfectly as possible a defined structure of thoughts you elaborated. You may want to describe the facts precisely, and/or the mental processes that those facts provoke.
Also some image subtly (and not so subtly) emerged, days ago, of "language as a cage": but if you want to express exactly some "/that/", it is the language that will have to provide the means - as opposed to a horrifying idea of "expressing what language allows".
There are contexts in which it is of paramount importance that the reader understands the message (e.g. instructions) - pay attention towards using the most digestible expression then. But when you want your content to be expressed as the defined sculpture of structures of ideas, while many different expressions are possible they are pretty much "what they are, because they are what they should be".
Don't agree. Like the commenter upthread, I have for some years had the feeling that my language is bloated with subordinate clauses, and that I use too many ten-dollar words. It's easy to use flowery language to conceal from yourself the shallowness of your understanding. So I get out my Xacto knife, and cut out the crap. Sometimes.
More precisely: not just having people less intellectually exercised¹, but also and especially creating some sort of "demand" that "things have to be simple". Non-recognition of complexity is also one of the cracks allowing populism. Your duty is to "spend one further thought", while some instances seem to defend an idea of an "as-if-constitutional right to reduced consideration".
(¹ I wrote the other day, «I also believe that "Now drop and give me twenty" will return us fitter personnel than "Please, be fed from a straw"»)
To quote the late philosopher Dr. Rick Roderick, "Deeply rooted in our culture is anti-intellectualism; our fear of eggheads. The work of intellectuals has always been separated off from the work of ordinary people - you have to be freed from the constraints of manual labor. When I was a dishwasher, I didn't have time to do this. Any time I was involved in manual labor, I didn't really have the time to do this intellectual work."
Most people do not have time, nor desire to do additional intellectual work after slaving away doing whatever it is they do for income.
Your example doesn't match the principles you highlighted. Obviate->remove is replacing a perfect-fit with a generic alternative. "Obviate" was neither a word salad nor complex, it was straight up (marginally) better. This was a tradeoff to appeal to people whose vocabulary is narrow, not improve comprehension.
"Register": I immediately know what it means and it takes tenths of a second for me to make the decision to click (or not to).
"Sign up": despite having a CEFR C2 level of English and using it at work every day, I still need to think for a few seconds if this is a registration or a login (cf. "sign in").
For a native speaker I suppose the latter is (marginally?) easier, but for a non-native, it's much harder, not even close.
Your examples are very illustrative, but I would argue that both phrases suffer from being too generic. I wish that there wasn’t such a push for “one word” CTAs. Examples of alternatives I’d advocate for depending on context:
As a native speaker, it's very unpleasant to write like that. I will if necessary for effective communication, but it's like wearing a straitjacket. It can be awful to read, too. Rarer synonyms and the phrasal verbs (both commonly avoided in a simple style) are sometimes necessary to avoid excessive repetition.
It is one of the downsides of being a native speaker. I write pretty much like how I speak. It's hard to use it like a foreign language, but that's what technical international English feels like to me, in a way. In my experience, fluent second language learners master that style more easily than native speakers do.
> This was a tradeoff to appeal to people whose vocabulary is narrow, not improve comprehension.
"Comprehension" is not some audience-independent property of a text. Comprehension is what happens when a text is well-calibrated for an audience. If your audience is unlikely to know the word "obviate" (which seems true of many engineering settings, where the audience is international), you absolutely improve comprehension by replacing it with words the audience is more likely to know.
I agree with you that "obviate" sounds better, and in a setting like a blog post, monograph, or even an HN comment, that's probably what I'd use. But in e.g. a work email for colleagues in another country, you reduce the risk of miscommunication by following OP's suggestion.
There lies the spring coil. The less it is established what your audience is, the more you shift from "calibrated expression" to a form of expression which is as absolute and universal as possible. (See my previous post here, at https://news.ycombinator.com/item?id=33045225 .) If you have a message for an audience, you will try and condition its expression according to the audience; the more abstract the audience, the more unconditioned the expression will be, as if intended for an ideal audience.
Nitpick. Fair point in some ways, but it depends on your audience. If the latter consists primarily of folk with an extensive vocabulary that enables them to detect the nuance that might be intended by the use of a supposedly unusual word (like obviate), then they won't know what you mean by "word salad" unless you're using some word inappropriately. Seven letters versus twenty-three characters. No competition. If obviate is what you mean, then use it. For a general audience, however, you're absolutely right.
Occurs to me there's another point in your favor as expressed in Orwell's version of a well-known passage from the Old Testament 'Ecclesiastes' into modern English. He makes the point brilliantly. Aesthetics does come into it!
About 15 years ago, at JHU, I heard about an algorithm that detected a writer's gender with more than 90% of accuracy, and the NLP professor considered that problem solved.
I briefly considered making one. My idea was simple: build a Markov chain for each person plus one for the text with the unknown author, take a dot product over the intersection of all the chains, and pick the author with the best match. Never got around to it. Perhaps this weekend?
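A weekend-project-sized sketch of that idea (bigram counts standing in for the full Markov chains; names and corpus here are made up for illustration):

```python
from collections import Counter

def bigram_profile(text):
    """First-order Markov chain of a text, stored as bigram counts."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def similarity(p, q):
    """Dot product over the intersection of two chains."""
    return sum(p[key] * q[key] for key in p.keys() & q.keys())

def best_match(unknown_text, authors):
    """Pick the author (name -> known text) whose chain best matches
    the chain of the unknown text."""
    unknown = bigram_profile(unknown_text)
    return max(authors, key=lambda name: similarity(unknown, bigram_profile(authors[name])))
```

Real stylometry would normalize for text length and use character n-grams or function words, but even this toy version separates distinct vocabularies.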
How robust is it, though, if you scale to thousands or tens of thousands of people?
If it depends on trigrams and uncommon characters it can be countered by incredibly simple measures.
As someone who expects this to have happened years ago (maybe not under the moniker of AI, but who cares), I'm more shocked by the fact that they'd publicly announce this. The chilling effects of this will be all too real.
If this works it's pretty much the equivalent of a mandatory state ID on every online interaction.
If it doesn't work very well, then it's going to be that, plus the risk of randomly being flagged.
As a society I don't think we have anything to gain from it; it's certainly tech that's out of Pandora's box, but that doesn't mean we shouldn't use all our cultural/legal means to prohibit/control it.
Anyone in their basement could build this today, but it only becomes a problem if that person has control over police and intelligence forces. Which means the potential for good regulation is bigger than with other tech.
> The chilling effects of this will be all too real.
Maybe that's the point.
It could also be announced because they plan to use it publicly soon, to "prove" that some person they want to get for political reasons is the same as some evil terrorist/pedophile/serial killer.
This is 100% going to happen, so my guess is it can’t be authoritative. It’s likely to be used to whittle down a list so that humans can review the results.
Feels like the beginning of a hybrid Minority Report + Enemy of the State movie. I’d watch that. I don’t want to live in that world, though.
How many people write similarly to you? I'd imagine it is more than one. Someone even did a Show HN to find other accounts that write similarly to you, which was entertaining.
One relatively straightforward countermeasure would be to feed your text through multiple rounds of machine translation using DeepL, Google Translate, and other similar services, applying corrections when necessary. That should break any personally identifying features in your original text.
That of course has its own risks, if you assume full government access to providers and unlimited surveillance capabilities. If you are fully paranoid you could even run your text through a local instance of (less accurate) Apertium[0] for the first few rounds, or even for the whole process if you find the result is different enough from your original.
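The round-trip structure itself is trivial to sketch; the `translate` callable below is a stand-in, and wiring it to DeepL, Google Translate, or a local Apertium instance (as suggested above) is the part that depends on your threat model:

```python
def round_trip(text, translate, chain=("en", "it", "fr", "en")):
    """Pipe text through a chain of machine translations and back.

    `translate(text, src, dst)` is a placeholder for whatever backend you
    trust: a paid API for quality, or a local Apertium instance if you do
    not want the intermediate text leaving your machine.
    """
    for src, dst in zip(chain, chain[1:]):
        text = translate(text, src, dst)
    return text
```

Each hop in the chain erodes a little more of the original phrasing, which is exactly the point; you then proofread the final English for meaning.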
An offensive one would be to post fake offers to trade government/military intelligence on the dark web, written in the style of the politicians backing this measure, so that they are put on the feds' list and thoroughly investigated.
Seems like an impossible task.
The signal-to-noise ratio just isn't high enough to make that distinction. Not if the goal is to distinguish among thousands of people.
Among tens or hundreds, might work if they don't take any countermeasures.
Even if it doesn't exactly work, like with facial recognition fucking up for non-white skin colors, they will use it anyway and real people will be saddled with real adverse consequences.
I'm not smart enough to know if this is the same thing as what you're saying, but it just doesn't seem like there are millions of writing styles. There might just be too many literary doppelgangers.
The best protection against this type of de-anonymization is to take measures now, while you still have time, to prevent it. It is possible to change the style of one's writing by using a language model which alters the original text in order to create a new piece with a different style. For example, to translate your text into the grandiose and flowing diction of a bygone era, you might consider the project below.
Tools like this probably fool traditional stylometry, but what about de-anonymization tools that find similar ideas, not writing style? Perhaps most people have boring common ideas they got from others, but the sort of people the US Government is most interested in are likely quirkier than most.
Then they'd literally just go after people because they think they might harbor certain ideas?
"sir you are under arrest for maybe thinking about banning dairy production, which is at conflict with national security. We found an anonymous text online that has the same idea."
There is a case to be made, not just for natural language but code.
AFAIK there are quite a few examples from security labs where malware authors aren't necessarily identified, but are at least fingerprinted based on naming conventions, patterns they reuse across multiple projects, etc.
That sort of fingerprinting could expand to correlating someone's anonymous software projects to other examples of code elsewhere (ex: if they contribute to source available stuff).
re: the example project you mention specifically, it does feel like using tools like that almost as a linter for natural language would be a fingerprint in itself.
EDIT: As far as OPSEC goes, a fun tidbit. A friend of mine identified a PR I submitted anonymously to them, simply because of the style of PR comments I made.
I find the examples given in the README to be quite tame for Victorian English. Compare it with the ending lines of A Tale of Two Cities:
"It is a far, far better thing that I do, than I have ever done; it is a far, far better rest that I go to than I have ever known.",
or this from Pride and Prejudice:
"However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of some one or other of their daughters."
Kennyblanken opens up the fifteen inches of aluminum that is his aging Macintosh Book Pro, produced by a company laughably named after a piece of fruit and formerly headed by a brilliant but somewhat psychopathetic man, now long pushing up the daisies, and taps out the commands to login to his account on the antiquated and increasingly inaccurately named hackernews, bathing in the yellow-orange theme as he stretches across the pale green couch in his small apartment in the big city. Outside, a 2007 Neopolitan Flyer bus roars by, half full with tired commuters.
He expertly massages the worn keys to type out a lengthy and witty response about how he'd imitate the style of a famous novelist whose prose exists mostly to result in the sacrifice of as many innocent trees as possible - demonstrating how to stretch "I'd use Dan Brown's writing style" into nearly two full paragraphs of text.
To be a proper Dan Brown, you'd need to start with "Internet Commentator KennyBlanken" (or some other prefixed descriptor.)
cf [1]
"I think what enabled the first word to tip me off that I was about to spend a number of hours in the company of one of the worst prose stylists in the history of literature was this. Putting curriculum vitae details into complex modifiers on proper names or definite descriptions is what you do in journalistic stories about deaths; you just don't do it in describing an event in a narrative."
There's a tool called Anonymouth, written by folks at Drexel's PSAL lab, which is intended to help writers with anti-stylometry analysis.
If anyone is looking for an impactful project, the Tails OS maintainers and the academics who authored it seem receptive to bringing it onto that privacy-minded platform:
I suspect it’s how Fake Steve Jobs was unmasked 15 years ago:
The New York Times found Mr. Lyons by looking for writers who fit those two criteria, and then by comparing the writing of “Fake Steve” to a blog Mr. Lyons writes in his own name, called Floating Point
Further violations will lead to the balances of your close friends and family being adjusted by -20%, and the balances of acquaintances being adjusted by -5%. Help protect against the threat of misinformation and safeguard your Balance for up to 28 days by reporting anything that you think could lead to harm. Remember, We're All In This Together.
If I want to write anonymously, I cycle my text through Google Translate multiple times and keep all the grammatical errors intact. So, English > Italian, and then Italian > French, then back to English.
I also pass it into Hemingway[0] first to make my text lean and non-superfluous.
Am I the only one seeing the irony and contradiction here? Not the same people at all -- subgroups at best , but the Director of National Intelligence is part of the administration. Perhaps I am missing something -- feel free to comment -- I am curious what everyone thinks.
I built something like that more than a decade ago to identify alter-egos in an online game from in-game chat. It was reasonably successful and I thought about commercial applications for it, but ultimately decided most things that could be used for are creepy or evil.
I remember hearing DARPA was actively seeking research in the field around that time. In principle, I'm not absolutely against my software being part of the chain of events that leads to the decision to kill someone, but I don't trust the US government (or, realistically, anybody else) to independently verify an identification made by such a system.
I'd be surprised if the three letter agencies aren't using something at least as good as what I wrote by now.
I'd describe it as fairly simple. It was just a classifier where each account name was a category: there was no fancy NLP. It used a single feature type and an algorithm from a well-known family. I don't want to say what either was lest I further proliferate the technique.
I cross checked using statistically improbable words, which helped confirm or exclude weak matches.
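One common way to implement that cross-check (not necessarily what the commenter used, and the thresholds below are arbitrary toy values) is to flag words an account uses far more often than a background corpus:

```python
from collections import Counter

def improbable_words(author_text, background_text, min_count=2, ratio_threshold=3.0):
    """Crude 'statistically improbable words' check: words the author uses
    notably more often than the background corpus does."""
    a = Counter(author_text.lower().split())
    b = Counter(background_text.lower().split())
    a_total = sum(a.values()) or 1
    b_total = sum(b.values()) or 1

    flagged = {}
    for word, count in a.items():
        if count < min_count:
            continue  # ignore one-off words
        # Add-one smoothing so words unseen in the background don't divide by zero.
        ratio = (count / a_total) / ((b.get(word, 0) + 1) / b_total)
        if ratio > ratio_threshold:
            flagged[word] = ratio
    return flagged
```

Two accounts sharing the same handful of flagged words is weak evidence on its own, but as a confirmation step on top of a classifier it's cheap and interpretable.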
Content marketers in the digital marketing space commonly put blog posts through "spinners" that modify your text by replacing words/phrases with similar equivalents. This lets you take one article and turn it into 5-10+ "unique" ones, even though they still discuss the same things. It would be a shame if a service like this were marketed towards those interested in privacy; it would probably break this entire system...
I’ve found plenty of articles that seem to have been run through these spinners, but hand-made corrections are likely to be necessary (unless they can be automated, with ML for example), as you can almost always tell that something is odd based on context-lacking word choices.
And that's exactly what newer programs do; look at AppSumo and it's practically all of them. The older generation simply used a giant dictionary, then picked a random option from the list of acceptable choices.
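The old-generation dictionary approach is a few lines of code; a sketch (the synonym table here is a tiny stand-in for the "giant dictionary" such tools shipped with):

```python
import random

# Toy synonym table standing in for the giant dictionary of an
# old-generation article spinner.
SYNONYMS = {
    "big": ["large", "huge", "sizable"],
    "fast": ["quick", "rapid", "speedy"],
    "good": ["great", "solid", "decent"],
}

def spin(text, rng=random):
    """Replace each known word with a randomly chosen 'acceptable' synonym."""
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        out.append(rng.choice(options) if options else word)
    return " ".join(out)
```

The context-lacking word choices mentioned above fall straight out of this design: the dictionary has no idea whether "fast" means quick or abstaining-from-food.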
There exists such a practice, and it is known as academic publishing.
Every verb is cast in the passive voice, punctuation is added - wherever possible - to make sentences appear more complex than they need be, and of course there is an effervescent use of sesquipedalian terms where shorter synonyms would otherwise suffice.
Seems very doable given the state of Google Translate.
Trouble is, if you are a revolutionary leader of some kind you are probably going to be saying new things that no one else talks about - which renders both anonspeak and the AI detection kind of redundant.
I guess the application for this then is in the interim to stop people or online groups becoming revolutionary by tracking and deradicalising them with targeted manipulation.
I'm sure this has already existed for years. The AI technology needed to implement this has existed for years. You just need to do web crawling and keyword and link matching.
You don't even need a 'writing fingerprint', you just need to parse comments which reveal identifying information such as 'I participated in project x', 'I taught at university y', 'I invested in startup z'... then when you combine all the identifying information, you can narrow down the pool of possible matches to a single person.
You could probably do it with just basic text matching, no AI required.
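A toy version of that basic text matching (the patterns mirror the example phrases quoted above, and the two-word capture is a deliberate simplification; a real system would need far more patterns plus entity resolution across comments):

```python
import re

# Hypothetical patterns for self-identifying statements.
PATTERNS = [
    r"I (?:participated|worked) (?:in|at|on) (?P<x>\w+ \w+)",
    r"I taught at (?P<x>\w+ \w+)",
    r"I invested in (?P<x>\w+ \w+)",
]

def identifying_facts(comment):
    """Pull candidate identifying facts out of a comment with plain regex
    matching - no writing fingerprint, no ML."""
    facts = []
    for pattern in PATTERNS:
        for match in re.finditer(pattern, comment):
            facts.append(match.group("x"))
    return facts
```

Intersect the facts extracted from an account's whole comment history against public records and the candidate pool shrinks fast.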
Credit scores are not determined by the government. If a credit ratings agency took that step, it would harm them because Twitter posts are unlikely to represent a meaningful variable when predicting someone's creditworthiness.
Don't confuse that with "social credit" systems, whereby China prevents you from riding trains if you say something naughty.
that's why the US wants to ban TikTok asap: they don't want China to be able to do what they themselves have been doing for decades
> it would harm them because Twitter posts are unlikely to represent a meaningful variable when predicting someone's creditworthiness.
people already get fired and arrested for posting stuff on Twitter, in both the US and Europe, so no, it's not just a "twitter moderation" thing
Ask yourself why they are allowed to exist and still operate despite being unable to grow and losing money for years (talk about anti-competitive practices), unless in reality it's a government body in disguise.
When online comments get worked up about The Social Credit System, a key thing I believe they're trying to do is spread awareness of how disturbing it is that a government is even considering such a thing, which, as we understand it, is closely related to a core technology of an authoritarian dystopia.
While it's not implemented at scale, the unnerving fact is that govt policy makers did a careful enough take on a social credit system to decide that it was worthwhile investing (probably non-trivial amounts of) money and resources into exploring it, and did eventually reach a point where they were a handful of steps short of wide-scale implementation.
Ok. If we're talking speculative planning, the US has also considered bringing the privatized credit-score system under government control, the very thing OP is criticizing. I have no problem with criticizing authorities, but the "us vs. them, look how much worse it is in Red China, God bless our freedoms here" angle is tired bootlicking too.
thank you. The most specific citation I could find in your link was this, regarding the 80% rollout statistic:
>As of December 2020, more than 80 percent of all the provinces, autonomous regions, and municipal cities had issued or were preparing to issue local credit laws and regulations.
your citation still doesn't say much, just that most are at least still "preparing" or have rolled out a limited pilot, which is what I said originally
I'm old, so I've been planning for this for a while. I have a folder that contains all my personal data: photos and videos, my journal, all my saved social media posts, all my emails, and all my anonymous handles, leading to everything I've ever written online. My thinking is an AI could create a reasonable facsimile of myself that my descendants could have a conversation with. I think it'd be better than an autobiography, since Joyce Carol Oates convinced me by something she tweeted that no one reads autobiographies, not even close family, unless you are famous.
If I had a bot that replicated my great-great-ancestor, I'd probably get bored quickly and then try to prod it into revealing its deeply outdated and inappropriate social views.
Ok, so would the simplest way to counter this be an open-source AI that can normalize specific text/writing, forcing it to be so similar to all other text that it's less identifiable?
Such as: you write a comment or an essay, feed it in, and it just dumps all your styles, idioms, etc., and makes all your stuff sound bland and normal.
Seems like it would only work if you had a targeted population and a writing/style sample for a lot of people.
Sure, but your output will probably have the same signature as well, unless you keep updating the model each time. It's already hard to do the first thing; almost no one is going to do step two for a comment on the internet.
Wouldn't it be more accurate to say it links anonymous writings together? If you don't have any writings under your real name there's nothing it can do except indicate two pseudonyms belong to the same person.
If precision isn't super important you can already run your text through machine translation into another language and then back again. Spammy content sites already seem to do that to avoid copyright detection.
Presumably they can already unmask 99.99% of high-value persons of interest; this sounds like they want a tool to unmask all the lower-value anons for the journalists to pick off.
If it's an actual ML model then someone can leak it and develop countermeasures. If it's a contractor who claims to be AI but is actually some guy in a room in Maryland, that's not possible but also the results wouldn't be very reliable.
Of course, the article just says they're planning to do this. It doesn't say it's going to work, and our closest examples, forensic techniques like handwriting analysis, blood-spatter analysis, and polygraphs, generally don't actually work.
What's almost as bad as the increased powers of government surveillance is thinking about the people who will be jailed, waterboarded and/or shipped to Gitmo for horrific tortures because our benevolent overseers in government make a mistake and think they are someone else due to the flawed algorithm fingering the wrong person.
The thing is, AI is a good mask for a "backend process that you don't need to explain". Assuming that the US government already has private conversations from multiple content and messaging platforms, this AI will provide the perfect excuse for connecting a blog post with a given ID in a legal process.
My first cynical take was that this will be used for "hunch laundering." There could be no indication that user A is an alias for user B, other than someone's hunch, but getting a computer to say that they match might be good enough to get a warrant when someone's hunch wouldn't be. It would be similar to having drug sniffing dogs affirm their handlers' feelings.
Hang on, is this rooted in the same technique (obviously they didn't have "AI"/technological processes like this back then) the FBI used for Ted Kaczynski (in which they literally analyzed his text to figure out who he was)?
This already exists. There was actually someone who posted something on HN that would use your public profile to find your anonymous Reddit username and it found mine. That’s when I stopped anonymously using Reddit.
I always wondered about voice deanonymization as well. I feel like between accent, gap between words, tonal variation, and rhythm, there's enough there to fingerprint speech, even with voice changers.
This seems to be a particularly bad time to implement this dumb idea. If it does become successful and prevalent, who do they think the writers are going to be, apart from GPT-3 clones?
Creepy factor aside, a similar tool for attribution would be very useful for content creators (or copyright holders) currently worried about stable diffusion.
Nakamoto Satoshi has not been unmasked despite the very short list of people capable/interested in creating what he made.
Stylometric analysis did suggest a single person on that list. The easier thing for governments to do at the time would have been to just spin up a node in the first year and look at the IP addresses.
He had no desire to become known back then and likely never will. It's only more dangerous now compared to the threat before of being locked up like the LibertyCoin guy (who just got released a year ago).
NS is happy to stay in the shadows, nearly everyone respects that decision especially in a world of crypto scams and ponzis. Surprised they never linked the domain name purchase to him though.
In my case, and I suppose for most HN posters, there'd be little point, for on demand Y Combinator would be compelled to hand over email and IP addresses to the government, and I'd reckon in most instances that'd be a much easier and faster way of obtaining the relevant information.
In an era when privacy has become hugely diminished under the hands of both governments and corporate interests it raises the question of what rights to anonymity anyone has in either a public or private forum, and at present there's little if any consensus on this which ought to signal that any such project is premature.
Unlike yours truly—who usually speaks his mind irrespective of whether he's known to his audience or does so anonymously—many will not speak their minds out of fear of being ridiculed, or humiliated, or exposed, or out of the risk of offending—risking the breakup of a friendship, etc. Same goes for whistleblowers whose public utterances, if not done anonymously, usually costs them their jobs.
If people fear that their autonomy to act in an anonymous manner has been removed then they're unlikely to act at all, silence being the better part of discretion.
This would have huge negative repercussions for society, our institutions and our governance—after all, the secret ballot is one of the cornerstones of our democracies. If we're not careful AI could undermine the ballot by unmasking what users think or how they actually vote and it's not hard to see how this would lead to coercion thence totalitarian government.
That said, in this world of widespread almost instant communications, actors who intentionally act out of bad faith can do widespread damage, especially so when they do so anonymously. Knowing who they are would minimize the damage they are able to cause.
Similarly, in a distantly-related post on HN a few days ago I referred to the increasing loss of respect for our important institutions and for the way we're being governed and how I thought that faith could be restored. There, I suggested that as a part of that process we need to unmask the hidden processes of government and that this would also include the naming of those who originate policy, law, etc.:
"If we're to restore any faith in our governance then this protection [hiding originators of policy] must stop. Decisions made by government employees must be open to public scrutiny, similarly, the origins of government policy—laws, regulations etc.—must be traceable back to its source (those who initiated said policies).
Systems without accountability will always become corrupt."
Thus, there's a real dichotomy at work here. For some things anonymity is essential, at other times it's a curse. And from the many recent instances of where the gnomes within government haven't acted in our best interests then I'm damned sure that putting AI to work here won't bode well for us either.
I've little doubt that the technology will be abused, and by virtue of the fact it will automatically silence a large proportion of the population who need speak out and who should do so anonymously in the interests of all. Even if they aren't targeted directly just knowing that there are systems in place that have the potential to expose them would be sufficient to silence many—as AI analysis of their words could be used to determine their identity at any future time (living with ongoing stress from potential exposure of one's ID would likely be unbearable for some).
Given past history and current bad behavior of governments in these areas, I do not believe that it is possible to put such a system in place that would gain the full confidence of all players involved. It would have to have sufficient protections locked in place to provide full public accountability as well as having inbuilt mechanisms that would ensure the system could not be abused by governments. At present, such conditions cannot be realistically met—not by a long shot.
Before anyone or any entity could let AI loose on this project and simultaneously state with all honesty that sufficient protections were in place for it to proceed safely, many other prerequisite protections and 'safety measures' which currently do not exist would have to be incorporated (locked) into our governance. For instance, a whole raft of definitions and concomitant laws pertaining to privacy is needed, and that's just for starters.
No doubt this project will proceed without those prerequisite protections, ipso facto, it will also be abused.
PS: note my quoted point about government policy etc. being open to public scrutiny. Here such questions arise such as where did this idea originate, what are the names of its instigators and what are their motives for instigating this development—not to mention others such as what are their qualifications, experience, etc. (perhaps, given the enormous potential of this AI application to damage society, we may even need to pose questions concerning their political beliefs and allegiances).
It's no accident that this information is missing with this announcement.
This is why I write my shit under my own legal name, even going to notarize this account at some point. Yeah throwaway yeah. Anonymous speech. Oh yeah darknet, Tor, cryptography, like yes sometimes, but it's a game of cat and mouse, it's purely a question of cost.
Furthermore I consider games like poker or Magic the Gathering unplayable, that is the extent to which there is literally absolutely no privacy.