mieubrisse's comments

Just finished reading both, Paul, and I'm very interested to know how you implemented the filtering reader.

Ok, those articles are from a thing I made long ago that worked on titles; I did it with some scripts and Jupyter notebooks.

I think, though, you are talking about my YOShInOn RSS reader, which uses the same scikit-learn library, with the training and classification happening in a script that runs side-by-side with the web server with which I look at articles and make my judgements, also written in Python. It is research code, but it is also production in that I use it every day and I'm never afraid to demo it.

It uses the ArangoDB database, which has a terrible license, so I am making a library called "system of objects" that emulates some aspects of ArangoDB collections and documents over Postgres tables and columns. At some point I'll move it to Postgres, put down that beast, and feel free to either open-source or commercialize it.
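A document-over-relational shim like that can be quite small. This is a hypothetical sketch, not the actual "system of objects" library: the class and method names are my own, and it just maps an ArangoDB-style collection onto one Postgres table with a text key and a JSONB body, returning `(sql, params)` pairs you'd hand to a driver like psycopg.

```python
import json

class Collection:
    """Hypothetical sketch: one ArangoDB-style collection backed by
    one Postgres table (_key TEXT, doc JSONB). Methods build
    parameterized (sql, params) pairs; executing them is up to the caller."""

    def __init__(self, name):
        self.name = name

    def ddl(self):
        # one table per collection; the document body lives in JSONB
        return (f"CREATE TABLE IF NOT EXISTS {self.name} "
                "(_key TEXT PRIMARY KEY, doc JSONB NOT NULL)")

    def upsert(self, key, doc):
        # ArangoDB-style "insert or replace" as a Postgres upsert
        sql = (f"INSERT INTO {self.name} (_key, doc) VALUES (%s, %s) "
               "ON CONFLICT (_key) DO UPDATE SET doc = EXCLUDED.doc")
        return sql, (key, json.dumps(doc))

    def get(self, key):
        return f"SELECT doc FROM {self.name} WHERE _key = %s", (key,)
```

Keeping the documents in JSONB means the migration doesn't force a schema on every collection up front; individual fields can be promoted to real columns later if queries need them.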

At the core of it is a classification model that predicts the probability of "will I like this item?", plus a sampling step that picks N documents by taking the top N/k documents from each of k clusters. I also blend in 30% randomly sampled documents to keep the training data representative. The batch job looks up my judgements in the database, writes them into a numpy matrix, and uses scikit-learn to train a model; it then runs inference and puts its recommendations into the database, which I view through my web front end, built with Flask and HTMX.

I like this style for research/production code because you can build applications a "screen" at a time, where a "screen" is a few Python functions that answer a few URLs to make a web page work. The happy path of making judgements has to be very fast and easy, think TikTok or Tinder, because you will have to do it thousands of times to make good models.
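The sampling step described above (top N/k per cluster plus a random blend) fits in a few lines of numpy. This is a sketch under my own assumptions, not the actual YOShInOn code: `scores` would come from the scikit-learn classifier's predicted "will I like this?" probabilities and `clusters` from whatever clustering was run over the candidate documents.

```python
import numpy as np

def sample_batch(scores, clusters, n, k, random_frac=0.3, rng=None):
    """Pick a batch of items to judge: the top-scored items from each
    of the k clusters, plus a blend of randomly sampled items so the
    training data stays representative of the whole stream.
    (Hypothetical function; parameter names are assumptions.)"""
    rng = rng if rng is not None else np.random.default_rng(0)
    n_random = int(n * random_frac)
    per_cluster = max(1, (n - n_random) // k)
    chosen = []
    for c in range(k):
        members = np.where(clusters == c)[0]
        # highest predicted "will I like this?" probability first
        ranked = members[np.argsort(scores[members])[::-1]]
        chosen.extend(ranked[:per_cluster].tolist())
    # blend in random documents not already chosen
    rest = np.setdiff1d(np.arange(len(scores)), chosen)
    if n_random and rest.size:
        chosen.extend(rng.choice(rest, size=min(n_random, rest.size),
                                 replace=False).tolist())
    return chosen
```

The random blend is what keeps the classifier honest: if you only ever label documents the model already ranks highly, the training set drifts away from the real distribution of the feed.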

As a classification problem it's boring because it is fuzzy: I might like an article today and hate it tomorrow, so there is a ceiling on the accuracy.

So I am thinking about the centaur use case: there is a stream of documents that you classify together with the model, and the classification is something better defined, where a more complex model has the power to understand the document and determine something like "was the author angry?", "is this an account of a sports game?", "did the home team win?", etc.

That has me thinking about a general-purpose text classification kit, which would have a small number of models chosen with practicality in mind, plus some kind of benchmark against data sets from Kaggle.

I am not thinking seriously about better recommendations because the problem is so vast: it includes everything from "reject anything from YouTube out of hand" to a nuanced analysis of what exactly "quality" means, not least a real-time instead of batch system that will tell me about a sports game today as opposed to next week and also push it to the front of any outbound queues. Articles about carbon capture or video games or fast cars or the circular economy or rural sociology can wait.

I'm more interested now in applying filtering based on people's emotional characteristics to social media. Maybe microblogging is dead, but it is just so much more fun if you can avoid the bottom 5% of bad behavior.


Thank you so much for linking these! Exactly the sort of thing I'm looking for; still making my way through the first article but fingers crossed there's an easy-to-use implementation at the bottom.

Thank you for the pointers; the second article in particular looks exactly like what I'm trying to accomplish!

Thanks for checking it out; let me know how you like it!

And yep, Cmd-k looks to be "clear buffer". I decided to go with Cmd-k rather than Cmd-P because:

- Cmd-k seems to be the standard most of the world has settled on (Notion, Todoist, Slack, Spotify, etc.)

- I've never used the Cmd-k clear buffer binding personally (I just use `clear`)

- I'm not a big fan of VS Code using Cmd-p as "find file", since Cmd-p for me is always "Print"

That said, if you prefer Cmd-P you can use that instead! The fuzzy find functionality is in a Bash function called `cmdk`, and you can bind whatever hotkey you'd like so long as it sends the string `cmdk\n`.
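For anyone curious what a binding like that looks like, here's a hypothetical sketch, not the actual cmdk implementation: fzf picks a path, the function cd's into directories or opens files, and a Readline macro makes a hotkey send the string `cmdk\n` (terminal emulators can map Cmd-k to whatever sequence you like).

```shell
# Hypothetical sketch of this kind of setup (not the real cmdk):
cmdk() {
    local target
    # fuzzy-pick any file or directory under the current one
    target="$(find . -maxdepth 4 2>/dev/null | fzf)" || return
    if [ -d "$target" ]; then
        cd "$target" || return
    else
        "${EDITOR:-vi}" "$target"
    fi
}

# e.g. in ~/.bashrc: make a hotkey type "cmdk" and press Enter.
# Any key works, as long as it ends up submitting the string `cmdk`.
bind '"\C-k":"cmdk\n"'
```

The only contract is that the hotkey submits `cmdk` as a command, which is why the binding is swappable for Cmd-P or anything else.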


I've been frustrated with how slow terminal filesystem navigation feels in comparison with modern apps like Notion, Slack, Discord, etc.

I discovered the amazing https://github.com/junegunn/fzf, and realized I could build ⌘-k for the terminal.


I wonder if this is actually true, or if it's an instance of selection bias - the instances of things not working draw our attention more than the instances of things working. "This thing is working just fine" doesn't draw eyeballs.


I'm not sure if I'm convinced, but I find this perspective very interesting to think about - that TikTok might actually be about finding signal in the noise. My objection is around whether "getting the point" really means "getting to the most dopamine-producing thing" (which can still be bullshit).


I think the problem is signal-to-noise. For every thing that actually turns out to matter, there are hundreds of thousands of things that you're told are Important but turn out not to be. It's basically impossible to filter "Which remote events are actually important vs just ragebait?" until after the fact.


Exactly this. I suspect that "us vs them" is sweet poison: it feels good in the moment ("Yeah, stick it to The Man!") but it long-term keeps you trapped in a victim mindset.


I was looking for exactly this comment. Everybody's gloating, "Wow look how dumb AI is! Haha, schadenfreude!" but this seems like just a natural part of the evolution process to me.

It's going to look stupid... until the point it doesn't. And my money's on, "This will eventually be a solved problem."


The question though is what is the time horizon of “eventually”. Very different decisions should be made if it’s 1 year, 2 years, 4 years, 8 years etc. To me it seems as if everyone is making decisions which are only reasonable if the time horizon is 1 year. Maybe they are correct and we’re on the cusp. Maybe they aren’t.

Good decision making would weigh the odds of 1 vs 8 vs 16 years. This isn’t good decision making.


Or _never_, honestly. Sometimes things just don't work out. See various 3d optical memory techs, which were constantly about to take over the world but never _quite_ made it to being actually useful, say.


> This isn’t good decision making.

Why is doing a public test of an emerging technology not good decision making?

> Good decision making would weigh the odds of 1 vs 8 vs 16 years.

What makes you think this isn't being done?


> It's going to look stupid... until the point it doesn't. And my money's on, "This will eventually be a solved problem."

AI can remain stupid longer than you can remain solvent.


Haha, I like your take!

My variation was:

"Leadership can stay irrational longer than you can stay employed"


Sometimes the last 10% takes 90% of the time. It'll be interesting to see how this pans out, and whether it will eventually get to something that could be considered a solved problem.

I'm not so sure they'll get there. If the solved problem is defined as sub-standard but low-cost, then I wouldn't bet against that. A solution better than that, though, I don't think I'd put my money on.


You just inspired a thought:

What if the goalpost is shifted backwards, to the 90% mark (instead of demanding that AI get to 100%)?

* Big corps could redefine "good enough" as "what the SotA AI can do" and call it good.

* They could then lay off even more employees, since the AI would be, by definition, Good Enough.

(This isn't too far-fetched, IMO, seeing the calls for copyright violation to be classified as legal-when-we-do-it.)


People seem like they're gloating because the message received in this period of the hype cycle is that AI is as good as a junior dev, without caveats, and is in no way supposed to be stupid.


To some people, it will always look stupid.

I have met people who believe that automobile engineering peaked in the 1960s, and they will argue that until you are blue in the face.

