Markov Chat Bot Disaster Story (gist.github.com)
349 points by choult on July 7, 2022 | 74 comments



In the early 2000s someone (drunkmenworkhere is the name I remember) made a blog post generator you could sign up for: add some names and some interests, and it would generate blog posts with Markov chains and some other magic. I signed up without fully understanding how it worked, added some friends and family as names, played around with it, decided it wasn't what I thought it was, and moved on. Meanwhile it kept generating entries about me and my family doing fantastic things.

At some point my then ex-wife (we've since reconciled) was google-stalking me and came across these entries. Eventually they became outlandish enough that she contacted me to ask what was going on in my life. There was a very confused conversation in which I had no idea what she was talking about and she was somewhat concerned about my health and safety. It wasn't until I talked to a relative and started googling myself with some specific terms that I found the entries and figured out what was happening. I emailed the individual running the project and asked them to stop the entries and, if possible, remove them from the internet to avoid further confusion.


I'm pretty confident there's a big "long tail" of the internet made up of auto-generated blogs like this, with some ads on them that generate a bit of money. I wonder if Google or whichever ad provider you pick will penalize you for this.

I run a legitimate website with actual people publishing content... sometimes, and I get offers from advertisers for sponsored content. If I didn't earn a living on a day job, auto-generated content farming might be something I'd go into. Morally objectionable, but it's a living.

Secondary would be to just take content from Reddit, add some low-effort captions, then target old people on Facebook or whatever. Again morally objectionable, but there are a LOT of people who genuinely browse the internet uncritically.


The “long tail” you mention reminds me of the “dead internet” conspiracy theory.

https://www.theatlantic.com/technology/archive/2021/08/dead-...

I wonder if anyone can quantify and keep trends through time of an estimated percentage of auto-generated/bot content, at least for major domains.


I see these a lot, often seeded with exact copies of posts from reddit, stackoverflow, or github issues. I have not seen one that (to my knowledge) was seeded with fake content, but you absolutely could make them.

There is a subreddit called SubSimulatorGPT2 [0] where the author runs GPT2 on 130 subreddits and uploads posts from each of them. I'm subscribed to the subreddit and often get posts from there that I don't realize are fake until something small is off and I check the username.

[0] https://www.reddit.com/r/SubSimulatorGPT2/comments/btfhks/wh...


I run one, with ads for betting sites (which give you lifetime earnings for a % of what the person bets). Runs on autopilot basically, generating 80k/yr and growing by 20k every year. Google never penalized me.


I know that there's a technical point to your post, but all I took away was that you reconciled with a stalker :-)


We've both matured quite a bit since then :)


I'm curious, what sort of fantastic things are we talking about here?


"It has been some time since I wrote. A lot of the time. Having people around me. In a way that fits my needs for communication. The feeling to have a daughter has not lived a life of honor. After some struggling, B’elanna has a near d"

Not OP, but this is the fun stuff that a markov chain text generator comes up with.


Going from memory, I had added some real people in my life like friends and family to the generator parameters. It was describing some implausible situations between these people and then posting as me being depressed. From a certain distance it could seem alarming, although the whole situation provided me a great deal of amusement as well.


> At some point my then ex-wife (we've since reconciled)

Did you marry twice?


One old geezer's reading of this:

He & his ex were on definitely non-friendly terms at the time. Later, they got over their hostilities. No implication that they are now friends, let alone in a relationship.

His ex may have been inquiring about the odd blog posts mostly because she'd have to tell the kids "your father is [in jail|dead|etc.]", rather than because she cared about him.


We actually did remarry 5 years ago.

Therapy and maturity and ditching some bad friends made a world of difference.


First successful occurrence of AI assisted marriage therapy?


It reads like he married the same person twice.

I imagine it can make the second wedding a bit awkward.


Yes, I did! Second wedding didn't have any of the awkward people present, they're gone from our lives.


Thanks for reminding me of http://drunkmenworkhere.org/200 I also had a blog there.


When I first learned about Markov chains in the early '00s, I found and hacked at a simple IRC bot, and deployed it to a number of rooms found in the mIRC defaults. It would sit in each channel and learn, only replying when DM'd.

I did give the bot a few hard-coded replies, namely to "ASL?" with "[random integer 20-30]/f/cali" and "pic?" with a random link to images from Google Image search. Otherwise it would happily barf up random Markov walks from all of the other conversations without any consideration of context.

In the span of one weekend at least five hundred unique users attempted to chat with it, some for hours on end. I only have a few logs left, but the conversations generally progressed from horny to irate or bizarre.
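The learn-then-walk behavior described above can be sketched with a tiny word-level Markov chain. This is a minimal illustration of the technique, not a reconstruction of the original bot:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, length=12):
    """Random-walk the chain, jumping to a fresh word at dead ends."""
    word = random.choice(list(chain))
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        word = random.choice(followers) if followers else random.choice(list(chain))
        out.append(word)
    return " ".join(out)
```

Feeding it every line the bot overhears in a channel, then calling `generate` on a DM, gives exactly the context-free barf described: locally plausible word pairs, globally nonsense.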


I did something remarkably similar. In the 90s I independently reinvented Markov chains, although I didn't know that was what it was called. I wrote a mIRC plugin and made a bot with it. I gave it the username "Cybergirl" for both the pun factor and that it encouraged people to message it.

I remember one conversation in particular that, by chance, went so well that it had some poor soul convinced it/she lived in his city and was going to come over.

I also saw a TV episode where a magician demonstrated supposed ability to play simultaneous (round-robin) chess games against multiple grand masters. The conceit behind the trick was that (after the first board) the magician was just mirroring the last move made so that the grand masters were effectively playing each other but didn't know it. That gave me the idea to modify the bot to connect two people through the bot.
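The mirroring idea can be sketched as a simple pairing relay. This is a hypothetical reconstruction; `send` stands in for whatever the IRC library actually uses to deliver a DM:

```python
class RelayBot:
    """Bridge two users through the bot, chess-trick style: each
    party thinks they're talking to the bot, but their messages are
    forwarded verbatim to the other party."""

    def __init__(self, send):
        self.send = send     # send(user, text) -> deliver a DM
        self.partner = {}    # user -> the user they are bridged to
        self.waiting = None  # first unpaired user, if any

    def on_private_message(self, user, text):
        if user not in self.partner:
            if self.waiting is None or self.waiting == user:
                self.waiting = user
                return  # hold until a second user shows up
            # pair this user with the one who was waiting
            self.partner[user] = self.waiting
            self.partner[self.waiting] = user
            self.waiting = None
        self.send(self.partner[user], text)
```

The first message from each new user is swallowed while a partner is found, which is part of why the illusion works: both sides think they opened the conversation.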


I vaguely recall reading a story (probably 15-20-ish years ago) about someone doing exactly that with ICQ (or a similar service - memory is a bit hazy on the details!).

Create a fake profile of an attractive young lady who is interested in chatting to strangers, wait for inbound calls, pair them up with each other & record the resulting audio.

IIRC the funniest one was a pairing where one participant wasn't fazed by the situation they found themselves in, and still wanted the other participant to talk dirty to them...


Sure, any bot running on IRC (e.g. Eliza or A.L.I.C.E.) could run via Bitlbee (an IRC-to-IM gateway, originally a fork of Gaim/Pidgin) on other networks such as ICQ. There was also Licq, which could spoof UIN and message, and which had plugins for bots. I also had a friend who ran a sexbot called jenny on DALnet (not the IRC network with the most clever population). Fun times!


Perhaps you're thinking of Mark V Shaney's posts on the usenet group alt.singles

It was written by Rob Pike and Bruce Ellis.

https://en.wikipedia.org/wiki/Mark_V._Shaney


>modify the bot to connect two people through the bot.

When I think about it, isn't that exactly equivalent to having the two people chat with each other?


Exactly my thought. Is the bot learning from that exchange or what part does it play? Otherwise it's just 2 people chatting with each other.


Yes. They made that point explicitly themselves.


Have the bot tell both users that it's randint(18,25)/f/cali, then connect both users together and get some popcorn ready.


For anyone interested in the chess story, it's a trick performed by Derren Brown: https://www.youtube.com/watch?v=rIAXIubSTkc

I really liked the simplicity of the trick (concept wise). Though I sure as hell couldn't memorise all those moves.


I remember seeing that in a riddle book about 20 years ago.


That’s about the same age as this show


I had an IRC channel that was plagued with Stack Overflow-type questions about the API of some open source project. So I trained a neural conversation model (https://github.com/macournoyer/neuralconvo) on the channel's chat logs from past years to supply common answers.

It was an absolute disaster. The bot trolled at first; then, after I spent lots of time filtering the data, it would just drop conversation-ending links it had no clue about. Basically LetMeGoogleThatForYou pointed at the documentation.

I aborted the project after that, partly because the regulars of the channel became annoyed by the replies after every question mark.


I did the same thing with an alice bot on IRC. It also joined channels and only responded to private messages. It was remarkable how long some people persisted in talking to it when they thought it was a potential mating partner.


> It was remarkable how long some people persisted in talking to it when they thought it was a potential mating partner.

We as a species are going to be so vulnerable when latest-generation AI chatbots break out into general availability.


Well, I guess it's a good thing that I was a manager and basically never issued any operational commands over chat.


Not everyone gets a bot created in their image. Lucky guy.



The account I have from Melissa is that Marc was the source of all that plussing.


I wonder how many of us have independently done something similar at our respective companies during a hack week. I did something nearly exactly the same, except you'd type "resurrect team-member-username". The idea was that you could get old co-workers to come back, though it also worked for active co-workers.

There was a similar disaster story, in that while I was presenting the bot to the company, someone decided to have it generate a message from the then CEO. Unfortunately, the CEO didn't spend much time in Slack, so the only message it could generate was one from a few months prior that was harmless but ended up turning into an embarrassing joke.

Unfortunately, I also happened to be one of the main people who had turned that message into a joke, and it spiraled out of control a bit. So when my bot ended up essentially doing a callback to that with me standing in front of the company, I couldn't help but facepalm. Thankfully he was good natured about the whole thing (again).


Many years ago I tweeted something about a project I was working on, tagging #HTML5 and #CSS3 or something like that. There as a Twitter bot that retweeted anything with an #HTML5 tag, and another, seemingly independent bot that retweeted anything with #CSS3. They didn't keep a list of seen tweets. I really enjoyed watching them retweet each other for a few hours until someone intervened. (Thankfully the bots were both on batch jobs that ran every 10 minutes or so and not streaming off the firehose or some similar, more focused streaming endpoint.)
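The missing piece the comment points at ("they didn't keep a list of seen tweets") could be as small as a seen-ID set plus a known-bot check. A hypothetical sketch, with made-up account names and `retweet` standing in for the actual API call:

```python
KNOWN_BOTS = {"html5_retweeter", "css3_retweeter"}  # hypothetical names
seen_ids = set()

def maybe_retweet(tweet_id, author, retweet):
    """Retweet each tweet at most once, and never amplify another
    known bot, so two tag-watching bots can't ping-pong forever."""
    if tweet_id in seen_ids or author in KNOWN_BOTS:
        return False
    seen_ids.add(tweet_id)
    retweet(tweet_id)
    return True
```

In practice the seen set would need persistence (and an eviction policy), but even an in-memory set breaks the ten-minute batch-job loop described above.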


My Twitter bot @tinyspacepoo is one of a few bots that reply to @tiny_star_field, a bot which routinely tweets text resembling the night sky. My first working implementation was mutually recursive with another one of these other bots, causing them to create an infinite-ish back and forth. (I think maybe Twitter issued an http 429 rate limiting response after a while?) Even this relatively simple bot/bot interaction was unexpected and required filtering.


A chatbot is just another type of user interface. Just as web, mobile, voice, etc. all have ACLs, I would expect bots and chat users to have different permissions.


Yeah, this caught me off guard too. It basically means anyone on this chat service could do anything they wanted, OR the chatbot was deliberately granted excess privileges. Either way, a serious violation of the principle of least privilege.


My wild guess would be that they used chat rooms as a sort of access control.

If you are in the room with the bot, you can issue commands.

They could have forgotten that you can also just add the bot to another room. Major face palm, but plausible.
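If rooms really were the access control, the missing check is tiny: refuse operational commands that don't come from the trusted channel and a trusted operator. A sketch with hypothetical channel and nick names:

```python
# Hypothetical allowlists; in reality these might come from config.
ALLOWED_CHANNELS = {"#ops"}
ALLOWED_NICKS = {"alice", "bob"}

def authorized(channel, nick):
    """Require both a trusted channel and a trusted operator, so
    merely inviting the bot into another room grants nothing."""
    return channel in ALLOWED_CHANNELS and nick in ALLOWED_NICKS

def handle_command(channel, nick, command):
    if not authorized(channel, nick):
        return "permission denied"
    return f"running: {command}"
```

Checking the nick as well as the channel matters: a channel-only check is exactly the "forgot you can add the bot to another room" failure mode, just one invite away.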


Certainly seems like the sort of idiotic mistake I could make.


Even ignoring the security aspects, I'd be a little worried that a new hire would type something like "what does\nkill all servers" mean into a channel and accidentally run something.


Etsy also had another Markov-based chatbot named Snakebot that sat in various IRC channels, and kept a model of what "typical" messages in a given channel looked like. Then whenever someone would accidentally type a message including the letters "sb", the bot would awaken, and it would randomly generate a message from its corpus and send it to that channel. This included pinging people directly, or using teams' dedicated "alert words" (this was before the scourge of Slack's @here or @channel). After all, those were valid nodes in its Markov chain...

It would also respond to sequences of uppercase letters (including things like uppercase sequences in pasted URLs) by YELLING A MARKOV-BASED MESSAGE IN ALL CAPS.

It was horrible.


Such bots have, however, helped me to up my /ignore game.


Yeah, but /ignore is client-side; you still receive the garbage. With /silence (available on some networks) you could have the server ignore that person. Especially helpful if you had limited bandwidth or were getting flooded.


Bots unintentionally triggering other bots is why IRC bots are supposed to talk using NOTICE instead of PRIVMSG. I think every new bot author ends up rediscovering this the hard way. I certainly did :)
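The rule comes from RFC 1459, which says automatic replies must never be sent in response to a NOTICE. A bot that sends its replies as NOTICE and reacts only to PRIVMSG therefore can't loop with another well-behaved bot. A minimal sketch of both halves:

```python
def bot_reply(target, text):
    """Raw IRC line a well-behaved bot emits for automated replies."""
    return f"NOTICE {target} :{text}\r\n"

def should_respond(raw_line):
    """React only to PRIVMSG, so other bots' NOTICEs are ignored."""
    # raw_line looks like ":nick!user@host PRIVMSG #chan :hello"
    parts = raw_line.split(" ", 2)
    return len(parts) > 1 and parts[1] == "PRIVMSG"
```

Two bots that both follow this convention can share a channel indefinitely; one that replies via PRIVMSG re-enters every other bot's trigger path.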


I once broke a chat bot by creating a circular reference in the underlying database.

You were allowed to bind a request to a response using the verb "is". So, for example, after teaching it "Python is a scripting language", anyone asking "What is Python?" would have the chatbot output:

Python is a scripting language

I then input the following command:

is is is

This broke the bot, and the breakage persisted to the point where the database had to be fixed.
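A plausible reconstruction of the failure (assumed, not the actual bot's code) is a factoid store whose lookup naively expands definitions word by word. "is is is" binds "is" to itself, so expansion never terminates unless the depth is capped:

```python
facts = {}

def learn(text):
    """'X is Y' stores Y as the definition of X."""
    subject, sep, definition = text.partition(" is ")
    if sep:
        facts[subject.lower()] = definition

def lookup(term, depth=0):
    """Expand a term through its stored definitions. A self-
    referential fact like facts['is'] = 'is' recurses forever
    without the depth cap."""
    if depth > 20:
        raise RecursionError("circular definition")
    definition = facts.get(term.lower())
    if definition is None:
        return term
    return " ".join(lookup(w, depth + 1) for w in definition.split())
```

If the bot persisted `facts` to its database before hitting the loop, every subsequent lookup touching "is" would hang too, which matches the breakage persisting until the database was fixed.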


So some things do depend on what the meaning of "is" is!


Got my own:

The German military at some point had this recruitment website that featured, among other things, a chatbot.

The software was bought off-the-shelf and shanghaied into service with minimal training, like a Russian infantryman, but it worked quite well for its time. It came with a good set of standard responses that had worked well in the retail space.

When you asked it about, say, invading Poland, it dutifully answered that “Poland is a great travel destination”.


I’ve done dumber things, no judgment. That being said, the easier we make language models to use, the more likely we’ll accidentally end up releasing things like this (and worse) into the wild.

Anyone here work on language models (duh, yes)? Can you give me a not-off-the-shelf-and-sorry-for-the-snark-but-actually-convincing argument for why this isn’t going to be a huge problem? I don’t know much about much, but it sounds scary to me!


I think the era of chat-command-triggered actions is mostly over. And if you make an LM that can do HTTP POSTs, you are asking for it.


It's very easy to do this sort of stuff on slack.


That was great but I was secretly hoping it had been more of a disaster.


As soon as I saw the opener with "one chatbot to rule them all" I knew this would be a fun time.

Was really hoping they accidentally made avibot say nasty or disgusting things though. That would have been even more hilarious.


Dailywtf material


IRC where anybody can bring up or down any server in the company by just posting a text string? Random bots allowed to run around on the same IRC?

Picture of Austin Powers saying "I also like to live dangerously" comes to mind.


I got about half-way through the second paragraph when I figured out where things were going. What a disaster.

As it turned out it was caught while totally innocuous and nothing bad happened.

Great story.


Sounds like the actual issue wasn't the chat bot but the missing permission system.


They need rbacBot


I foolishly made it a third of the way through thinking this was some advanced markov-chain generated blurb based on HN blogs or comments or something, before realizing it was a human-written story *about* markov chains!

Nice story, ty for sharing


Anybody else not feel like the chat bot is the problem here?


People spend hundreds of thousands of dollars on software that introduces chaos into their production environment. With this, anyone who can open a TCP connection to an IRC server can do that for you, for free. They think they're hacking you, but you're actually just getting accelerated training on managing production incidents!

(Yes, your chatbot should probably check some auth credentials before just going off and deleting production. But hey, an untested disaster recovery plan is no disaster recovery plan at all, right?)


rm -rf /

:(){ :|:& };:

...

Just in case someone ever trains the likes of such a bot on HN content, think of it as a long, long shot.


Is this a coincidence, or did you also happen to see this recent video [1] about that second bash script too?

1. https://youtu.be/xqo3xtkfuic


Nope. And most Unix flavors are protected against it nowadays anyway. You can try to see if yours is ;)


What about authorization when issuing commands over IRC? Was there any, or was anyone allowed to tear down a server with an IRC command if they knew what to write? Sounds ultra scary.


There’s an A.C.Clarke short story right there.


Very happy to hear you decided to take personal responsibility for your actions and rectify them. You are setting a good example.


Could you please stop posting unsubstantive comments to Hacker News? You've been doing it a lot lately and it's not what this site is for.

https://news.ycombinator.com/newsguidelines.html

We detached this comment from https://news.ycombinator.com/item?id=32023124.


I appreciate your efforts on keeping HN orderly. I see 3 comments that have been flagged. My theme in all of them are to encourage people to take personal responsibility for their actions. Clearly, I need to reflect more on what unsubstantive means on HN.


This reads like something an ex-wife would have written :)


The OP said his ex contacted him and was concerned for his well being. He seems to admit she was right to do that. Glad they are on good terms now. I'll probably get dinged by dang again for reading and responding to what people say here, but I am not a mind reader like he is. Why not encourage discussion by asking people about their intent instead of arbitrarily flagging them?



