In the early 2000s someone (drunkmenworkhere is what I remember) made a blog post generator that you could sign up for, add some names and interests to, and it would generate blog posts with Markov chains and some other magic. I signed up, not fully understanding how it worked, added some friends and family as names, played around with it, realized it wasn't what I thought it was, and moved on. Meanwhile it kept generating entries about me and my family doing fantastic things.
At some point my then ex-wife (we've since reconciled) was google-stalking me and came across these entries. Eventually they became outlandish enough that she contacted me to ask what was going on in my life. There was a very confused conversation where I had no idea what she was talking about and she was somewhat concerned about my health and safety. It wasn't until I talked to a relative and started googling myself with some specific terms that I came across the entries and figured out what was happening. I emailed the individual running the project and asked him to stop the entries and, if possible, remove them from the internet to avoid further confusion.
I'm pretty confident there's a big "long tail" of the internet made up of auto-generated blogs like this, with some ads on them that generate a bit of money. I wonder if Google or whichever ad provider you pick will penalize you for this.
I run a legitimate website with actual people publishing content... sometimes, and I get offers from advertisers for sponsored content. If I didn't earn a living from a day job, auto-generated content farming might be something I'd go into. Morally objectionable, but it's a living.
A second option would be to just take content from Reddit, add some low-effort captions to it, then target old people on Facebook or whatever. Again morally objectionable, but there are a LOT of people who browse the internet completely uncritically.
I see these a lot, often seeded with exact copies of posts from reddit, stackoverflow, or github issues. I have not seen one that (to my knowledge) was seeded with fake content, but you absolutely could make them.
There is a subreddit called SubSimulatorGPT2 [0] where the author runs GPT2 on 130 subreddits and uploads posts from each of them. I'm subscribed to the subreddit and often get posts from there that I don't realize are fake until something small is off and I check the username.
I run one, with ads for betting sites (which give you lifetime earnings as a % of what the person bets). It runs on autopilot basically, generating $80k/yr and growing by $20k every year. Google never penalized me.
"It has been some time since I wrote. A lot of the time. Having people around me. In a way that fits my needs for communication. The feeling to have a daughter has not lived a life of honor. After some struggling, B’elanna has a near d"
Not OP, but this is the fun stuff that a markov chain text generator comes up with.
Going from memory, I had added some real people in my life like friends and family to the generator parameters. It was describing some implausible situations between these people and then posting as me being depressed. From a certain distance it could seem alarming, although the whole situation provided me a great deal of amusement as well.
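For anyone who hasn't played with one, the core of a word-level Markov chain text generator is tiny: record which words follow each short word sequence, then random-walk that table. A minimal sketch (the corpus and `order` parameter here are made up for illustration):

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=30):
    """Random-walk the chain from a random starting state."""
    state = random.choice(list(chain))
    out = list(state)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:
            break  # dead end: this state never continued in the corpus
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ate the fish on the mat"
chain = build_chain(corpus, order=1)
print(generate(chain, length=10))
```

With a low order and a small corpus you get exactly the dreamlike, locally-plausible-but-globally-incoherent output quoted above; a higher order just parrots the training text back.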
He and his ex were definitely on non-friendly terms at the time. Later, they got over their hostilities. There's no implication that they are now friends, let alone in a relationship.
His ex may have been inquiring about the odd blog posts mostly because she'd have to tell the kids "your father is [in jail|dead|etc.]", rather than because she cared about him.
When I first learned about Markov chains in the early '00s, I found and hacked at a simple IRC bot, and deployed it to a number of rooms found in the mIRC defaults. It would sit in each channel and learn, only replying when DM'd.
I did give the bot a few hard-coded replies, namely to "ASL?" with "[random integer 20-30]/f/cali" and "pic?" with a random link to images from Google Image search. Otherwise it would happily barf up random Markov walks from all of the other conversations without any consideration of context.
In the span of one weekend at least five hundred unique users attempted to chat with it, some for hours on end. I only have a few logs left, but the conversations generally progressed from horny to irate or bizarre.
I did something remarkably similar. In the 90s I independently reinvented Markov chains, although I didn't know that was what they were called. I wrote a mIRC plugin and made a bot with it. I gave it the username "Cybergirl", both for the pun factor and because it encouraged people to message it.
I remember one conversation in particular that, by chance, went so well that it had some poor soul convinced it/she lived in his city and was going to come over.
I also saw a TV episode where a magician demonstrated a supposed ability to play simultaneous (round-robin) chess games against multiple grandmasters. The conceit behind the trick was that (after the first board) the magician was just mirroring the last move made, so the grandmasters were effectively playing each other but didn't know it. That gave me the idea to modify the bot to connect two people through it.
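The relay itself is only a few lines once you have a DM hook. A hypothetical sketch (the `on_private_message`/`send` interface is made up; a real IRC library's callbacks would differ): hold the first unpaired user, pair them with the next stranger who messages in, then forward everything both ways.

```python
# Pair incoming strangers and relay each one's messages to the other,
# so they unknowingly chat with each other through the bot.
partners = {}   # nick -> nick of the person they're paired with
waiting = None  # one unpaired nick, if any

def on_private_message(nick, text, send):
    """`send(to, text)` stands in for whatever your IRC library uses to DM."""
    global waiting
    if nick in partners:
        send(partners[nick], text)      # already paired: relay to the partner
    elif waiting is None or waiting == nick:
        waiting = nick                  # first of a pair: hold until a second arrives
    else:
        partners[nick] = waiting        # pair them up, both directions
        partners[waiting] = nick
        send(waiting, text)             # forward the message that completed the pair
        waiting = None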
I vaguely recall reading a story (probably 15-20-ish years ago) about someone doing exactly that with ICQ (or a similar service - memory is a bit hazy on the details!).
Create a fake profile of an attractive young lady who is interested in chatting to strangers, wait for inbound calls, pair them up with each other & record the resulting audio.
IIRC the funniest one was a pairing where one participant wasn't fazed by the situation they found themselves in, and still wanted the other participant to talk dirty to them...
Sure, any bot running on IRC (e.g. Eliza or A.L.I.C.E.) could run via Bitlbee (IRC to IM gateway originally a fork of Gaim/Pidgin) on other networks such as ICQ. There was also e.g. Licq which could spoof UIN and message, as well as that it had plugins for bots. I also had a friend who ran a sexbot called jenny on DALnet (not the IRC network with the most clever population). Fun times!
I had an IRC channel that was plagued with Stack Overflow-style questions about the API of some open source project. So I trained a neural conversation model (https://github.com/macournoyer/neuralconvo) on the channel's chat logs from past years to supply common answers.
It was an absolute disaster. The bot trolled at first; then, after I spent lots of time filtering the data, it would just drop "conversation-ending links" with no clue what was behind them. Basically a LetMeGoogleThatForYou pointing into the documentation.
I aborted the project after that, also because the channel's regulars became annoyed by the reply it posted after every question mark.
I did the same thing with an alice bot on IRC. It also joined channels and only responded to private messages. It was remarkable how long some people persisted in talking to it when they thought it was a potential mating partner.
I wonder how many of us have independently done something similar at our respective companies during a hack week. I did something nearly exactly the same, except you'd type "resurrect team-member-username". The idea was that you could get old co-workers to come back, though it also worked for active co-workers.
There was a similar disaster story, in that while I was presenting the bot to the company, someone decided to have it generate a message from the then CEO. Unfortunately, the CEO didn't spend much time in Slack, so the only message it could generate was one from a few months prior that was harmless but ended up turning into an embarrassing joke.
Unfortunately, I also happened to be one of the main people who had turned that message into a joke, and it spiraled out of control a bit. So when my bot ended up essentially doing a callback to that with me standing in front of the company, I couldn't help but facepalm. Thankfully he was good natured about the whole thing (again).
Many years ago I tweeted something about a project I was working on, tagging #HTML5 and #CSS3 or something like that. There was a Twitter bot that retweeted anything with an #HTML5 tag, and another, seemingly independent bot that retweeted anything with #CSS3. Neither kept a list of seen tweets. I really enjoyed watching them retweet each other for a few hours until someone intervened. (Thankfully the bots were both on batch jobs that ran every 10 minutes or so and not streaming off the firehose or some similar, more focused streaming endpoint.)
My Twitter bot @tinyspacepoo is one of a few bots that reply to @tiny_star_field, a bot which routinely tweets text resembling the night sky. My first working implementation was mutually recursive with another one of these other bots, causing them to create an infinite-ish back and forth. (I think maybe Twitter issued an http 429 rate limiting response after a while?) Even this relatively simple bot/bot interaction was unexpected and required filtering.
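The filtering usually boils down to two guards: never act on your own account, and never handle the same tweet twice. A hypothetical sketch (these names are made up, not a real Twitter API):

```python
# Two classic guards against bot/bot retweet loops:
# (1) never echo your own account, (2) remember what you've handled.
BOT_HANDLE = "my_bot"
seen_ids = set()

def should_retweet(tweet_id, author, text, hashtag="#HTML5"):
    """Return True only for a new, on-topic tweet from someone else."""
    if author == BOT_HANDLE:
        return False                      # never reply to yourself
    if tweet_id in seen_ids:
        return False                      # deduplicate by tweet id
    if hashtag.lower() not in text.lower():
        return False                      # off-topic
    seen_ids.add(tweet_id)
    return True
```

Either guard alone breaks the simple two-bot loop described above; you need both once a third bot joins the party.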
A chatbot is just another type of user interface. Just like web, mobile, voice, etc. all have ACLs, I would expect bots and chat users to have different permissions.
Yeah, this caught me off guard too: it basically means anyone on this chat service could do anything they wanted, OR the chatbot was granted excess privileges when it was commissioned. Either way, a serious violation of the principle of least privilege.
Even ignoring the security aspects, I'd be a little worried a new hire would accidentally type something like "what does\nkill all servers" mean into a channel and then accidentally do stuff.
Etsy also had another Markov-based chatbot named Snakebot that sat in various IRC channels, and kept a model of what "typical" messages in a given channel looked like. Then whenever someone would accidentally type a message including the letters "sb", the bot would awaken, and it would randomly generate a message from its corpus and send it to that channel. This included pinging people directly, or using teams' dedicated "alert words" (this was before the scourge of Slack's @here or @channel). After all, those were valid nodes in its Markov chain...
It would also respond to sequences of uppercase letters (including things like uppercase sequences in pasted URLs) by YELLING A MARKOV-BASED MESSAGE IN ALL CAPS.
Yeah, but ignore is client-side; you still receive the garbage. With silence (available on some networks) you could make the server ignore that person. Especially helpful if you had limited bandwidth or were getting flooded.
Bots unintentionally triggering other bots is why IRC bots are supposed to talk using NOTICE instead of PRIVMSG. I think every new bot author ends up rediscovering this the hard way. I certainly did :)
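The convention comes from the IRC RFCs: automatic responses should be sent as NOTICE, and clients must never auto-reply to a NOTICE, which is exactly what breaks the bot-triggers-bot loop. A simplified sketch of a handler that follows it (the parsing here is stripped down; real IRC lines have more edge cases):

```python
# Reply to PRIVMSG with NOTICE; never auto-respond to NOTICE.
# This asymmetry is what stops two well-behaved bots from ping-ponging.
def handle_line(raw, send):
    """`send(line)` writes one raw IRC line back to the server."""
    prefix, _, rest = raw.partition(" ")
    command, _, params = rest.partition(" ")
    if command == "PRIVMSG":
        target, _, text = params.partition(" :")
        reply_to = prefix.lstrip(":").split("!")[0]  # sender's nick
        send(f"NOTICE {reply_to} :I heard: {text}")
    elif command == "NOTICE":
        pass  # RFC says: no automatic replies to NOTICE
```

Of course, the loop only stays broken if *both* bots follow the rule, which is why every new bot author rediscovers it the hard way.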
The German military at some point had this recruitment website that featured, among other things, a chatbot.
The software was bought off-the-shelf and shanghaied into service with minimal training, like a Russian infantryman, but it worked quite well for its time. It came with a good set of standard responses that had worked well in the retail space.
When you asked it about, say, invading Poland, it dutifully answered that “Poland is a great travel destination”.
I’ve done dumber things, so no judgment. That being said, the easier we make language models to use, the more likely we’ll accidentally end up releasing things like this (and worse) into the wild.
Anyone here work on language models (duh, yes)? Can you give me a not-off-the-shelf-and-sorry-for-the-snark-but-actually-convincing argument for why this isn’t going to be a huge problem? I don’t know much about much, but it sounds scary to me!
I foolishly made it a third of the way through thinking this was some advanced markov-chain generated blurb based on HN blogs or comments or something, before realizing it was a human-written story *about* markov chains!
People spend hundreds of thousands of dollars on software that introduces chaos into their production environment. With this, anyone who can open a TCP connection to an IRC server can do that for you, for free. They think they're hacking you, but you're actually just getting accelerated training on managing production incidents!
(Yes, your chatbot should probably check some auth credentials before just going off and deleting production. But hey, an untested disaster recovery plan is no disaster recovery plan at all, right?)
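The simplest version of that check is an allowlist gating destructive verbs on *who* typed the command, not just on what was typed. A minimal hypothetical sketch (the verb list and operator names are made up):

```python
# Gate destructive chatbot commands on the issuer's identity.
DESTRUCTIVE = {"kill", "delete", "restart"}
ALLOWED_OPERATORS = {"alice", "bob"}

def authorize(nick, command):
    """Return True if `nick` is allowed to run `command`."""
    parts = command.split()
    verb = parts[0].lower() if parts else ""
    # Harmless commands pass for everyone; destructive ones need the allowlist.
    return verb not in DESTRUCTIVE or nick in ALLOWED_OPERATORS
```

In a real deployment you'd key this on the chat service's authenticated identity (e.g. NickServ account or SSO user), since bare IRC nicks are trivially spoofable.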
What about authorization when issuing commands over IRC? Was there any, or was anyone allowed to tear down a server via an IRC command if they knew what to write? Sounds ultra scary.
I appreciate your efforts on keeping HN orderly. I see 3 comments that have been flagged. My theme in all of them is to encourage people to take personal responsibility for their actions. Clearly, I need to reflect more on what unsubstantive means on HN.
The OP said his ex contacted him and was concerned for his well being. He seems to admit she was right to do that. Glad they are on good terms now. I'll probably get dinged by dang again for reading and responding to what people say here, but I am not a mind reader like he is. Why not encourage discussion by asking people about their intent instead of arbitrarily flagging them?