I like this a lot. We need more hard records of personal correspondence. It would be cool to do this as a service.
Honestly when I read the title I thought it was going to be about using message history as a basis for generating a narrative account of the events using an LLM.
One of the great disadvantages of private emails (& texts) is the massive amount of correspondence that is lost to future historians. I have published books of letters by Feynman, Feyerabend, Einstein, etc. Now everything is email behind a password, which means we'll likely never have troves of personal letters with which to contextualize modern people who become historical figures in the future.
I love it as art, but its usefulness is questionable. If you want a hard copy, just copy it to a microSD card. Or three, if you are worried about losing it.
> I love it as art, but its usefulness is questionable. If you want a hard copy, just copy it to a microSD card. Or three, if you are worried about losing it.
Hard copy means paper. Also, microSD is terrible for long-term storage.
Flash memory relies on cells keeping charged, but the electrons can slowly leak and discharge the cells over time. It looks like the commonly claimed number is 10 years, but there's no clear answer. Hard drives also aren't great as a "set and forget" method. In either case you should refresh the data regularly (~yearly). Optical media is a great option for digital long-term storage, but paper is a very tried-and-true method, if stored in the right conditions.
Get a tape backup system and be happy. I rotate three tapes between offsite, my safe, and an active one in the tape drive. Don’t overthink “offsite”; your work office is good enough. Or your parents' house. Or your neighbor on the other side of the neighborhood.
Or backup once and throw the tape in the back of the closet. At least you can restore the magnetic backup. I would not be happy with the error rate of transcribing data from paper :-)
"Hard copy" means "a collection of paper sheets bound in some fashion" in the context of books. So they probably didn't mean it in a way where microSD is equivalent.
With the EU's DMA and the GDPR before it, some services have to offer an API that your hypothetical service could use to pull this data. However, iMessage was notably excluded from the DMA, and then there's encryption, which means you can't just pull data from e.g. WhatsApp.
Thank you for this! I was recently digging into the SQLite files with the idea of monitoring them for changes that indicate new messages and then extracting them. My initial prototype mostly worked, with a few hacks. Next time I look at that idea I’ll switch to your library. Any suggestions or tips for near-real-time access?
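On the near-real-time question, one simple approach (a minimal sketch, not the library's API) is to poll the database and remember the highest `ROWID` already seen, so each check only returns newer rows. It assumes a `message` table with an integer `ROWID` and a `text` column, roughly as in macOS's `chat.db`; check the real schema before relying on it, and note that reading `~/Library/Messages` may require granting your terminal Full Disk Access. `fetch_new_messages` and `poll` are hypothetical names.

```python
import sqlite3
import time
from pathlib import Path


def fetch_new_messages(conn, last_rowid):
    """Return (rows, new_last_rowid) for messages with ROWID > last_rowid.

    ASSUMPTION: a `message` table with an integer ROWID and a `text` column,
    roughly like macOS's chat.db; adjust the query to the real schema.
    """
    rows = conn.execute(
        "SELECT ROWID, text FROM message WHERE ROWID > ? ORDER BY ROWID",
        (last_rowid,),
    ).fetchall()
    # The high-water mark only moves forward, so each row is returned once.
    new_last = rows[-1][0] if rows else last_rowid
    return rows, new_last


def poll(db_path, interval_s=2.0, last_rowid=0):
    """Yield new (ROWID, text) rows indefinitely, checking every interval_s seconds."""
    # Read-only URI connection, so the watcher never writes to the live database.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        while True:
            rows, last_rowid = fetch_new_messages(conn, last_rowid)
            yield from rows
            time.sleep(interval_s)
    finally:
        conn.close()
```

Usage would be something like `for rowid, text in poll("~/Library/Messages/chat.db"): print(rowid, text)`. Polling by `ROWID` only catches newly inserted rows; edits, deletions, or reactions stored as updates to existing rows would need a different check.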
My aunt has done a wonderful job of preserving the letters and diary entries between my grandfather and grandmother during WWII. My immediate thought is that our children and grandchildren will not have the same joy!
Ha, I’m not sure it’s quite the same. If my understanding of most couples is correct, this would be more akin to preserving all of their sticky notes e.g. “Pick up milk on the way home”, “You’re getting the kids, right?”, “see you in 20”, etc.
Yet I suppose there’s a certain charm to that, so I hope I don’t sound like too much of a wet blanket.
Now to make this work with WhatsApp for the Brits... Got excited at the idea of a project and then realised I would have to learn Rust if I were to fork this, haha.
Anyway, this is definitely a cool idea. Reading my chat history with friends is actually very nostalgic.
WhatsApp on Android lets you export a chat with or without media (images, etc.), but it limits the number of messages: with media you get the last 10k messages, and without it you get the last 40k. Emojis are preserved, though.
The limits are supposedly due to email size limits but, as they also apply when exporting to non-email endpoints like Google Drive, I suspect they're more to do with preventing people from moving their chats to other services.
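For anyone who wants to turn such an export into something printable, here is a rough parser sketch. It assumes one common Android layout, `DD/MM/YYYY, HH:MM - Sender: text`; the actual format depends on the phone's locale and 12/24-hour setting, so the regex (`HEADER_RE`, a hypothetical name like everything else here) will likely need adjusting. Lines without a timestamp are treated as continuations of the previous message, and timestamped lines without a `Sender: ` part as system notices.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: one common Android export layout, "DD/MM/YYYY, HH:MM - rest".
# Real exports vary with the phone's locale and 12/24-hour clock setting.
HEADER_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - (.*)$")


@dataclass
class Message:
    date: str
    time: str
    sender: Optional[str]  # None for system notices such as "X added Y"
    text: str


def parse_export(lines):
    """Parse exported chat lines into Message objects."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER_RE.match(line)
        if match:
            date, time, rest = match.groups()
            sender, sep, text = rest.partition(": ")
            if sep:
                messages.append(Message(date, time, sender, text))
            else:
                messages.append(Message(date, time, None, rest))
        elif messages:
            # No timestamp: this line continues the previous multi-line message.
            messages[-1].text += "\n" + line
        # Anything before the first timestamped line is ignored.
    return messages
```

Reading the file as UTF-8 (e.g. `with open("chat.txt", encoding="utf-8") as f: messages = parse_export(f)`, with a hypothetical filename) keeps the emojis intact.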
Some months (years?) ago, WhatsApp started letting you set your own encryption password for the DB backup. I set one up and used https://github.com/ElDavoo/wa-crypt-tools to get at the decrypted SQLite database and run some analytics over my messages :)
I was going to say that, but then I remembered the many other apps that people in other countries use, and I didn't want to act as if I wasn't aware of those: WeChat, Line, KakaoTalk, and more. WhatsApp is not at all universal, even if it might be the most common in many European countries.
There is a Python library for WeChat, but I'm having problems using it. Also, getting the WeChat database and then decrypting it isn't easy. Because my phone is not rooted, I had to use an Android emulator on my PC, transfer all my chats over to it, and extract the DB and all media. After installing all the fickle dependencies of the brute-force decrypter, it took three days straight on my laptop from 2014 to decrypt, and now I can finally open it in an SQLite viewer. But that still leaves the major step of getting formatted messages out of it, like in the OP. The HTML conversion script I used produced half-decent results, but it hasn't been maintained for a while and chokes on certain messages, so the conversion of large chats invariably breaks down before it finishes. Anyway. Maybe it is time to learn Python...
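On the choking-converter problem: one way to keep a long conversion from dying partway is to escape every message body and catch per-message errors, so a single odd row becomes a placeholder instead of an abort. A minimal sketch, assuming you've already SELECTed `(sender, text)` pairs from the decrypted database; the WeChat schema isn't shown here, and `render_chat`/`render_message` are hypothetical names.

```python
import html


def render_message(sender, text):
    """Render one message as an HTML <p>, escaping anything the sender typed."""
    if text is None:
        body = "<em>[no text content]</em>"
    else:
        if isinstance(text, bytes):
            # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
            text = text.decode("utf-8", errors="replace")
        body = html.escape(text).replace("\n", "<br>")
    return f"<p><b>{html.escape(sender or 'unknown')}</b>: {body}</p>"


def render_chat(rows):
    """Render (sender, text) rows; a bad row becomes a placeholder, not a crash."""
    parts = []
    for i, (sender, text) in enumerate(rows):
        try:
            parts.append(render_message(sender, text))
        except Exception as exc:  # keep going so one odd message can't stop a huge chat
            name = html.escape(type(exc).__name__)
            parts.append(f"<p><em>[message {i} could not be rendered: {name}]</em></p>")
    return "\n".join(parts)
```

The placeholder records the row index, so the failed messages can be found and fixed by hand afterwards instead of silently disappearing.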
This desperately needs to happen — in a way that all messages and media can be exported in a sensibly reviewable format. Heck, I'd just like to be able to archive a backup that I know can be restored in the future on another device.
The publisher is in the business of spitting ink on paper. You should be more unsettled by being MITM'd by data-mining companies whose job it is to change behavior via ads or other consensus-building tools.
I might not do this if I were a high public official or celebrity on the off-chance that someone in the printing and packaging chain might happen to notice. But, for an average person, it seems pretty harmless. (Personally, the last thing I need is more paper but I get the attraction.)
You can always print it out, then run to Kinko's and use their comb binder that they usually have out. Not as elegant as real binding, but enough to make it work on my shelves.
I would not be surprised if the files were not stored in a particularly secure manner. The publisher probably never anticipated PII being in the PDFs for print. They probably have an indefinite data retention period.
Perhaps an unpopular opinion, but this is slightly creepy.
I never understood why people care to keep their private conversation history in the first place. IMO private messages (as opposed to public posts, blogs, etc) are supposed to be temporary ("ephemeral") - one does not record every face-to-face conversation or phone call after all.
I think this is interesting, and not necessarily unpopular. It seems different people just think about this issue differently. I do everything I can to preserve every single chat history that I can. And I would like to have every face-to-face conversation and phone call recorded and easily accessible for that matter. I have a sense that I am the sum of my experiences and I don't want to forget those experiences - it feels like I am somehow less than myself if I don't remember them.
But I've seen that episode of Black Mirror, too. So I wrestle with the desire to perfectly remember everything that I've ever experienced vs the mental and emotional health benefits that clearly come from being able to forget things.
I read your reply a while ago, but still can't wrap my head around "it feels like I am somehow less than myself if I don't remember them." I've forgotten many things, and I'm quite OK with that, so I'm trying to understand your view.
Are you trying to remember everything all of the time? Including all of the new memories?
Suppose you are able to record calls and face-to-face interactions. Are you going to spend hours of your life re-watching or fast-forwarding through mundane everyday things?
I forget things all the time, but I don't like it. I suppose what I imagine I want is perfect recall of everything I've seen, heard, experienced. I get great joy from looking at old photos or videos, and I hate that there are millions of experiences over my lifetime that I can't recall and that there is no record of.
So, yes, I do want to remember everything all the time.
At the same time, I recognize that there are mental health benefits to forgetting things, so I will settle for being able to easily flip through all of my text messages with friends and loved ones.
I agree. But it's more to do with the part of me that cringes at messages I sent and exist forever. The person I was 10 years ago is so different. It feels so jarring reading old messages.
I can see both sides. I did actually correspond with people using written letters up until maybe 2004 or so, and in many of those cases, especially old girlfriends and letters from my little sisters when I first went to college, reading them years later was intensely nostalgic.
On the other hand, when I left the Army I moved into a much smaller place, put most of my stuff in storage, then three years later figured anything I hadn't touched or used for three years was something I didn't actually need, and let the facility have all of it. That seems to have included both all those old letters and all of my old photographs. I can't say I actually miss those things. People in here are saying they don't want to forget the past but the reality of forgetting is you don't know you forgot it so it has no perceivable effect once it happens.
To be honest, I'm nostalgic enough as is and don't think I need even more things to hold onto. I already don't watch new television or listen to new music. I'm mentally stuck in 1999 and not sure that's healthy.
Unless your text messages are deliberately composed, run to multiple paragraphs, and take days to send and receive, the comparison is badly flawed. Longer-form digital communication, like email, is a bit closer to letters, though.
Which brings me back to my point that text chats are equivalent to a spoken conversation and should be treated as such, and not be kept forever. Especially not printed out as a gift. You wouldn't give someone close to you a video of them sleeping or leaving for work over the last 3 years (even if you saw them do it), nor a map of their movements from a GPS tracker (even if they told you where they were going), because that does not respect their boundaries or privacy.
Although after reading some responses, no doubt some people will think of those as "cherished memories".
> I never understood why people care to keep their private conversation history in the first place.
One reason that's understandable without relying on sentimentality is that they're a record of what you were doing or thinking at a particular time, much like a private diary.
There have been a few times when I've gone back through stuff like chat history to better understand something whose significance I didn't realize at the time.
I was never disciplined enough to keep a diary when I was younger, but I started using messenger when I was about 14. It’s pretty amazing to be able to go back and see my interactions, the way I communicated, the way I saw the world (I'm now 30). I feel lucky to have that window into my past life.
Really depends on the mindset when creating the message. If I message on a platform that keeps history, then I write with that expectation, or at least possibility, in mind. Now, this raises the question: is this modification of behavior problematic? Does it perhaps detract from the meaning of the communication itself? Maybe.
The barrier, to me, is broken the moment we use technology. I work with this shit; I know how the sausage is made. I know that phone calls are as encrypted as HTTP, that anyone can always keep a record without you knowing, and that even if they promise something will go away, it may not, especially since that promise alone makes it a juicier target. As soon as something is electronic, it's a record.
If the message was something I found interesting, important, or funny, I would usually copy or screenshot it. Or remember it. Although I don't proactively delete old messages, I never intentionally backed up or transferred message history between devices either.
As for privacy and permanency - if data stops existing, it is definitely private now :)
This is really cool, and also seems like it could be a great gift to a loved one.
I was playing around with Nomic Atlas (https://docs.nomic.ai/) recently and dumped a bunch of my chat history in there, and it was pretty interesting to visualize and browse my messages as clusters around topics.
Which leads me to think that you could bring the searchability of digital to the physical format by generating embeddings for the messages and running topic modeling on them; then, you could create an index of topics at the end of the physical book with page number references to messages about that topic.
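The index step at the end could be sketched like this, assuming the topic modeling has already assigned each message a topic label (for example by clustering its embedding) and the typesetting step has recorded which page each message landed on. Neither of those steps is shown; `build_topic_index` and `format_pages` are hypothetical helpers that only turn `(topic, page)` pairs into back-of-book entries.

```python
from collections import defaultdict


def build_topic_index(assignments):
    """assignments: iterable of (topic, page_number) pairs, one per message.

    Returns {topic: sorted list of unique pages}, with topics in alphabetical order.
    """
    pages_by_topic = defaultdict(set)
    for topic, page in assignments:
        pages_by_topic[topic].add(page)
    return {topic: sorted(pages) for topic, pages in sorted(pages_by_topic.items())}


def format_pages(pages):
    """Collapse a non-empty sorted page list into ranges: [3, 4, 5, 9] -> '3-5, 9'."""
    ranges = []
    start = prev = pages[0]
    for p in pages[1:]:
        if p == prev + 1:
            prev = p  # still inside a consecutive run
            continue
        ranges.append((start, prev))
        start = prev = p
    ranges.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
```

Printing the index is then a loop like `for topic, pages in build_topic_index(pairs).items(): print(f"{topic}: {format_pages(pages)}")`, where `pairs` stands for whatever the topic model and layout steps produce.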
In 2000 years, your books may be the only thing left to study how we lived in the 21st century, because all ephemeral information (tweets, chat, SMS, emails, digital photos on people's devices) may have vanished.
I wasn't familiar with BN Press for personal use. I've done some research into KDP and Lulu, but I decided ebooks would be my main focus after getting a Kindle and loving it. For a limited/test run, though, BN Press seems fantastic: $30 for 1300 pages is a great deal.
What I like least, though, is the table of contents; it's so dry, with just the months and years. Despite my skepticism about the latest uses and abuses of AI, generating a one-liner from the contents of each month seems like a fitting use for it.
I love this idea! I think it would be fun to try, except 1) I'm not sure how it would handle pictures, and 2) there are probably some texts that should not be published!
Also, Noto Emoji is great. It's also nice to use for 3D printing/laser cutting.
I often listen to Pocket TTS on the train, or when I can't access my device to skip or do much other than play/pause, and oh my god, this gets me every time, haha. I'm actually thinking of DIY'ing my own web-scraper thing to do a better job of it, because it's really rough when it gets to any LaTeX, especially in scientific articles. And then I'm sitting there listening to some very automated-sounding voice read off cryptic numbers and Greek letters and code and math notation like some kind of Soviet numbers station (which is kinda cool at first, but gets annoying, haha).
I want some kind of local document host that I can run a summarization or filtering script over to extract the portions that are legible to TTS, pipe them into something nice like ElevenLabs (if I were rich) or whatever, and then host an OGG for me to listen to on the go...
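For the filtering part, a crude first pass could swap LaTeX math for a short spoken placeholder before the text goes to the TTS engine. This is a regex sketch, assuming math is delimited by `$...$`, `$$...$$`, `\(...\)` or `\[...\]`; it won't catch environments like `\begin{align}` or code blocks, and `strip_math_for_tts` is a hypothetical name.

```python
import re

# Order matters: $$...$$ must be handled before $...$ so display math
# isn't mistaken for two inline snippets.
MATH_PATTERNS = [
    re.compile(r"\$\$.+?\$\$", re.DOTALL),      # $$ display $$
    re.compile(r"\\\[.+?\\\]", re.DOTALL),      # \[ display \]
    re.compile(r"\\\(.+?\\\)", re.DOTALL),      # \( inline \)
    re.compile(r"(?<!\\)\$[^$\n]+?(?<!\\)\$"),  # $ inline $, skipping escaped \$
]


def strip_math_for_tts(text, placeholder="(equation)"):
    """Replace delimited LaTeX math with a short placeholder the TTS can read."""
    for pattern in MATH_PATTERNS:
        text = pattern.sub(placeholder, text)
    # Collapse runs of spaces/tabs left behind by the substitutions.
    return re.sub(r"[ \t]+", " ", text).strip()
```

A placeholder keeps the sentence grammatical when read aloud; dropping the math entirely would leave phrases like "the energy is and" in the audio.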
I've been thinking about doing just this for a bit now. I plan on showing thumbnails plus a QR code for animations and videos; I have yet to figure out how to make the files accessible in a private and durable manner, though.
Filtering full-color images down to a halftone suitable for book publishing is a mature technology; setting up an ImageMagick pipeline to do so would not be among the hard parts of preparing a book like this. Picking the right still frame out of GIFs and videos is a bit trickier, but not by much.
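To illustrate what that filtering step does, here is dispersed-dot ordered dithering with a 4x4 Bayer matrix, one of the classic digital halftoning methods, in plain Python on a grid of grayscale values from 0 to 255. In practice you'd let ImageMagick (e.g. its `-ordered-dither` option) or an imaging library do this on real image files; `ordered_dither` is a hypothetical helper that only shows the thresholding idea.

```python
# 4x4 Bayer threshold matrix; entries 0..15 are spread so neighbouring
# pixels get very different thresholds, which produces an even dot pattern.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither(gray):
    """gray: list of rows of 0-255 ints. Returns rows of 0/1 (1 = leave paper white)."""
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, value in enumerate(row):
            # Map the matrix entry to a threshold centred in its 1/16 slice of 0-255.
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 255 / 16
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out
```

A flat mid-gray patch comes out as roughly half white and half ink pixels in an interleaved pattern, which is what lets a single ink color suggest shades of gray on the printed page.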
Very cool. A while ago I took a trip down memory lane with my partner to take a look at the first messages we sent each other, it was very neat and the memories definitely came back, even though it has been years since we met! A little bit like looking at a photograph and remembering the location and feeling in that moment.
I'm not sure if anyone knows, but I'd like to ask about Signal.
I have an Android Signal message backup from around 2020, and of course I have the decryption key. Since I can't restore it with the current version of Signal or with the 2020 version (from the GitHub releases), how can I decrypt it and extract all the messages? Thank you.
I remember looking into a similar problem and learned that on desktop it was just an encrypted SQLite DB. It was readable with the standard SQLite library.
Not sure of the situation for the mobile backups though!
My backup comes from an old version of the Android Signal app. I tried to restore it with the current version of the app, but it doesn't work, even with the old version.
I initially thought that the iMessages went through some LLM that retold them in a nice Brothers Grimm style. But from another perspective, it also makes sense to have the originals, although the author is perhaps much better than me at writing messages that may one day be worth reading…
This is fantastic. I've done some semi-similar things: I have a tool I wrote for generating nice documents from Facebook Messenger conversations, for archiving important personal conversations. But I didn't take it so far as to generate a book yet! What a great idea!
Would the next level be to use an LLM to take the essence of each set of exchanges and present it in a format of a play or movie script? Perhaps novelize it?
Ya bro, and then the next level would be to, like, feed that AI-generated play or script into an AI movie maker. It would totally be a game changer! I'm super stoked and pumped about it
A book full of years of texts would be an interesting artifact, but how often would you pick it up? How interesting could it really be? Would you even want anyone else to see it?
Now suppose each exchange comes with a haiku summary, a fresh high-level look at the conversation that condenses its vague essence into a little linguistic locket, portable and easily recalled. The interplay between the mundane raw material and the poetic take would render it much more interesting, and tend to reward repeated examination.
A human poet would undoubtedly do a better job at this than an LLM, if any human poet could be persuaded to start and complete such a project. But having an LLM do it would be a very low-cost, low-effort way to at least try for interesting results.