Ask HN: Is anyone doing anything cool with tiny language models?

kaspermarstal · 2025-01-22T07:48:47 1737532127

I built an Excel Add-In that allows my girlfriend to quickly filter 7000 paper titles and abstracts for a review paper that she is writing [1]. It uses Gemma 2 2b which is a wonderful little model that can run on her laptop CPU. It works surprisingly well for this kind of binary classification task.

The nice thing is that she can copy/paste the titles and abstracts into two columns and write e.g. "=PROMPT(A1:B1, "If the paper studies diabetic neuropathy and stroke, return 'Include', otherwise return 'Exclude'")" and then drag the formula down across 7000 rows to bulk process the data on her own because it's just Excel. There is a gif in the readme of the GitHub repo that shows it.

[1] https://github.com/getcellm/cellm

7734128 · 2025-01-22T11:37:21 1737545841

You could have called it CellMate

vdm · 2025-01-22T15:39:12 1737560352

https://x.com/Suhail/status/1882069209129340963

afro88 · 2025-01-22T08:38:24 1737535104

How accurate are the classifications?

kaspermarstal · 2025-01-22T09:02:33 1737536553

I don't know. This paper [1] reports accuracies in the 97-98% range on a similar task with more powerful models. With Gemma 2 2b the accuracy will certainly be lower.

[1] https://www.medrxiv.org/content/10.1101/2024.10.01.24314702v...

indolering · 2025-01-22T11:24:33 1737545073

Y'all definitely need to cross validate a small number of samples by hand. When I did this kind of research, I would hand validate to at least P < .01.

kaspermarstal · 2025-01-22T11:59:23 1737547163

She and one other researcher have manually classified all 7000 papers as per standard protocol. Perhaps for the next article they will measure how well this tool agrees with them and include it in the protocol if it is good enough.
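
A minimal sketch of how that agreement could be measured once the hand labels exist, assuming both label sets are plain lists of strings. The labels below are made up, and Cohen's kappa is just one common agreement statistic, not something the thread specifies.

```python
from collections import Counter

def agreement_stats(human, model):
    """Return (raw accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(human) != len(model) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(human)
    observed = sum(h == m for h, m in zip(human, model)) / n
    human_counts = Counter(human)
    model_counts = Counter(model)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(
        (human_counts[label] / n) * (model_counts[label] / n)
        for label in set(human) | set(model)
    )
    if expected == 1.0:
        kappa = 1.0  # both raters used one and the same label throughout
    else:
        kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical labels, purely for illustration.
human = ["Include", "Exclude", "Exclude", "Include", "Exclude", "Exclude"]
model = ["Include", "Exclude", "Include", "Include", "Exclude", "Exclude"]
accuracy, kappa = agreement_stats(human, model)
print(f"accuracy={accuracy:.2f} kappa={kappa:.2f}")
```

Kappa is worth reporting next to raw accuracy here because most papers in a screening task are excluded, so a model that always answers 'Exclude' can look accurate while agreeing no better than chance.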

beernet · 2025-01-22T12:26:05 1737548765

> I don't know.

HN in a nutshell: I've built some cool tech but have no idea if it is helpful or even counter productive...

corobo · 2025-01-22T14:16:26 1737555386

Real HN in a nutshell: People who don't build stuff telling people who do build stuff that the thing they built is useless :P

It's a hacker forum, let people hack!

If anything have a dig at OP for posting the thread too soon before the parent commenter has had the chance to gather any data, haha

Breza · 2025-01-24T18:15:37 1737742537

Great attitude! I recently built a tool for my wife that uses an LLM to automate a task. Is it production ready? Definitely not. But it saves her time even in its current state.

greenavocado · 2025-01-22T15:51:52 1737561112

Just because you can, doesn't mean you should

corobo · 2025-01-22T16:03:14 1737561794

If you're building a dinosaur sanctuary sure

stackghost · 2025-01-22T17:53:34 1737568414

Or an Internet surveillance-capitalism panopticon.

kaspermarstal · 2025-01-22T17:04:41 1737565481

I am not going to claim or report any kind of accuracy, especially with such a small model and such a specific, context dependent use case. It is the user’s responsibility to cross validate if it’s accurate enough for their use case and upgrade model or use another approach if not.

jbs789 · 2025-01-22T17:40:07 1737567607

A user buys a car because it gets them from point A to point B. I get what you’re saying though - we are earlier along the adoption curve for these models and more responsibility sits with the user. Over time the expectations will no doubt increase.

dzamo_norton · 2025-01-24T07:23:15 1737703395

Offer a 100% money back guarantee if the user finds that the software is not fit for purpose :)

rasmus1610 · 2025-01-22T13:26:02 1737552362

Sometimes people just like to build stuff for the sake of it.

jajko · 2025-01-22T13:53:30 1737554010

Almost like hackers, doing shit just for the heck of it because they can (mostly)

sidcool · 2025-01-22T16:48:04 1737564484

Sometimes it's the joy of creation. Utility and optimization come later. It's fun. Like a hobby.

ddddqqqq · 2025-01-24T14:05:28 1737727528

Seems very nice and useful.

I'd like to have something similar integrated with Zotero to get an easy interaction and get answers about papers I added as references.

TeamDman · 2025-01-23T16:56:03 1737651363

Tried it out, very cool! Fun to see it chugging on a bunch of rows. Had a weird issue where it would recompute values endlessly when I used it in a table, but I had another table it worked with so not sure what that was about

kaspermarstal · 2025-01-24T12:02:28 1737720148

Glad you tried it out! Excel triggers recalculation when a referenced cell updates, just like with any other formula. This is also why responses are not streamed, as every update would trigger recalculation. But if the async behavior of responses messes with the recalculation logic I am very interested in looking into it and you are most welcome to open an issue in the repo with steps to reproduce.

basmok · 2025-01-22T19:24:29 1737573869

Can someone hack this together as pure matrix multiplication?

Like either as table in the background or as regular script?

On most computers you can't compile or add add-ons without administrative rights and LLM Chat sites are blocked to prevent usage of company data.

It should run on native Excel or GSheets.

I mean, pure without compilation, just like they do the matrix calculations here straight in Excel without admin rights:

Lesson 1: Demystifying how LLMs work, from architecture to Excel

https://youtu.be/FyeN5tXMnJ8

As far as I know, in GSheets the scripts run on Google's servers and are not limited by local computing power, so larger models could be deployed there.

Can someone hack this into Excel/GSheets?

relistan · 2025-01-22T08:01:08 1737532868

Very cool idea. I’ve used gemma2 2b for a few small things. Very good model for being so small.

upcoming-sesame · 2025-01-24T11:28:40 1737718120

Could it be adapted for Google Sheets ?

kaspermarstal · 2025-01-24T11:56:31 1737719791

Yes, and it will be

donbreo · 2025-01-22T09:34:22 1737538462

Requirements: -Windows

Looks like I'm out... Would be great if there was a google apps script alternative. My company gave all devs linux systems and the business team operates on windows. So I always use browser based tech like Gapps script for complex sheet manipulation

jkman · 2025-01-22T17:05:56 1737565556

Well it's an excel add-in, how else would it work?

NotMichaelBay · 2025-01-23T12:45:05 1737636305

Excel add-ins can be written with the Office JS API so that they can run on web as well as desktop for Windows and Mac. But I don't think OP's add-in is possible with that API unless the local model can be run in JS.

antonok · 2025-01-21T23:57:18 1737503838

I've been using Llama models to identify cookie notices on websites, for the purpose of adding filter rules to block them in EasyList Cookie. Otherwise, this is normally done by, essentially, manual volunteer reporting.

Most cookie notices turn out to be pretty similar, HTML/CSS-wise, and then you can grab their `innerText` and filter out false positives with a small LLM. I've found the 3B models have decent performance on this task, given enough prompt engineering. They do fall apart slightly around edge cases like less common languages or combined cookie notice + age restriction banners. 7B has a negligible false-positive rate without much extra cost. Either way these things are really fast and it's amazing to see reports streaming in during a crawl with no human effort required.

Code is at https://github.com/brave/cookiemonster. You can see the prompt at https://github.com/brave/cookiemonster/blob/main/src/text-cl....

GardenLetter27 · 2025-01-22T14:21:05 1737555665

It's funny that this is even necessary though - that great EU innovation at work.

kalaksi · 2025-01-22T15:50:24 1737561024

Tracking, tracking cookies, banners etc. are a choice done by the website. There are browser addons for making it simpler, though.

The transparency requirements and consent for collecting all kinds of PII (this is the regulation) actually is a great innovation.

docmars · 2025-01-22T16:22:30 1737562950

I think I'd rather see cookie notices handled by a browser API with a common UI, where the default is always "No." Provide that common UI in a popover accessed in the address bar, or a side pane in the browser itself.

If a user logs in or does something requiring cookies that would otherwise prevent normal functionality, prompt them with a Permissions box if they haven't already accepted it in the usual (optional) UI.

kalaksi · 2025-01-22T16:35:03 1737563703

Cookies for normal functionality don't require consent anyway.

But yes, I think just about everybody would like the UX you described. But the entities that track you don't want to make it that easy. You probably know of the do-not-track header too.

YetAnotherNick · 2025-01-22T20:43:15 1737578595

There isn't any way the EU didn't know this was possible and is a better choice. There already was a DNT header that they could regulate. It also knew the harm to the ad industry.

Fraaaank · 2025-01-22T21:10:49 1737580249

There isn't any rule that requires websites to use a cookie banner. You're required to obtain explicit consent before reading/setting any cookies that aren't strictly necessary. The web came up with the cookie banner.

Google could've implemented a consent API in Chrome, but they didn't. Guess why.

vvillena · 2025-01-22T19:10:30 1737573030

Bear in mind, those arcane cookie forms are probably not compliant with EU laws. If there's not a "reject" button next to the "accept" button, the form is almost definitely not to spec.

pornel · 2025-01-22T17:20:24 1737566424

The legislation has been watered down by lobbying of the trillion-dollar tracking industry.

The industry knows ~nobody wants to be tracked, so they don't want tracking preferences to be easy to express. They want cookie notices to be annoying so that people associate privacy with bureaucratic nonsense and stop demanding it.

There was P3P spec in 2002: https://www.w3.org/TR/P3P/

It even got decent implementation in Internet Explorer, but Google has been deliberately sending a junk P3P header to bypass it.

It has been tried again with a very simple DNT spec. Support for it (that barely existed anyway) collapsed after Microsoft decided to make Do-Not-Track on by default in Edge.

bazmattaz · 2025-01-22T00:03:08 1737504188

This is so cool thanks for sharing. I can imagine it’s not technically possible (yet?) but it would be cool if this could simply be run as a browser extension rather than running a docker container

antonok · 2025-01-22T00:05:48 1737504348

I did actually make a rough proof-of-concept of this! One of my long-term visions is to have it running natively in-browser, and able to automatically fix site issues caused by adblocking whenever they happen.

The PoC is a bit outdated but it's here: https://github.com/brave/cookiemonster/tree/webext

MarioMan · 2025-01-22T07:00:59 1737529259

There are a couple of WebGPU LLM platforms available that form the building blocks to accomplish this right from the browser, especially since the models are so small.

https://github.com/mlc-ai/web-llm

https://huggingface.co/docs/transformers.js/en/index

You do have to worry about WebGPU compatibility in browsers though.

https://caniuse.com/webgpu

throwup238 · 2025-01-22T03:53:55 1737518035

It should be possible using native messaging [1] which can call out to an external binary. The 1password extensions use that to communicate with the password manager binary.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...

rpastuszak · 2025-01-22T14:24:15 1737555855

Tangentially related, I worked on something similar, using LLMs to find and skip sponsored content in YT videos:

https://butter.sonnet.io/

binarysneaker · 2025-01-22T00:03:00 1737504180

Maybe it could also send automated petitions to the EU to undo cookie consent legislation, and reverse some of the enshittification.

antonok · 2025-01-22T00:09:07 1737504547

Ha, I'm not sure the EU is prepared to handle the deluge of petitions that would ensue.

On a more serious note, this must be the first time we can quantitatively measure the impact of cookie consent legislation across the web, so maybe there's something to be explored there.

pk-protect-ai · 2025-01-22T09:09:39 1737536979

why don't you spam the companies who want your data instead? The sites can simply stop gathering your data, then they will not be required to ask for consent ...

frail_figure · 2025-01-22T09:58:58 1737539938

It’s the same comments on HN as always. They think EU setting up rules is somehow worse than companies breaking them. We see how the US is turning out without pesky EU restrictions :)

GardenLetter27 · 2025-01-22T14:22:32 1737555752

The US has 3x higher salaries, larger houses and a much higher quality of life?

I work as a senior engineer in Europe and make barely $4k net per month... and that's considered a "good" salary!

Lutger · 2025-01-22T15:09:58 1737558598

It has higher salaries for privileged people like senior engineers. Try making ends meet in a lower class job.

And you have (almost) free and universal healthcare in Europe, good food available everywhere, drinking water that doesn't poison you, walkable cities, good public transport, somewhat decent police and a functioning legal system. The list goes on. Does this not impact your quality of life? Do you not care about these things?

How can you have a higher quality of life as a society with higher murders, much lower life-expectancy, so many people in jail, in debt, etc.

macinjosh · 2025-01-22T15:33:08 1737559988

Touch grass. The US is a big place and is nothing like you seem to think it is.

Europe on the other hand can't even manage to defend itself and relies on the US for their sheer existence.

pona-a · 2025-01-22T17:05:56 1737565556

Can you enlighten me of a state where none of parent's points apply? I'd be glad to be educated.

whywhywhywhy · 2025-01-22T10:19:13 1737541153

Because they have no reason to care about what you think or feel or they wouldn't be doing it in the first place.

Cookie notices just gave them another weapon in the end.

K0balt · 2025-01-22T00:43:30 1737506610

I think there is real potential here, for smart browsing. Have the llm get the page, replace all the ads with kittens, find non-paywall versions if possible and needed, spoof fingerprint data, detect and highlight AI generated drivel, etc. The site would have no way of knowing that it wasn’t touching eyeballs. We might be able to rake back a bit of the web this way.

antonok · 2025-01-22T00:58:30 1737507510

You probably wouldn't want to run this in real-time on every site as it'll significantly increase the load on your browser, but as long as it's possible to generate adblock filter rules, the fixes can scale to a pretty large audience.

K0balt · 2025-01-22T02:30:26 1737513026

I was thinking running it in my home lab server as a proxy, but yeah, scaling it to the browser would require some pretty strong hardware. Still, maybe in a couple of years it could be mainstream.

Tepix · 2025-01-22T16:39:50 1737563990

Depends on your machine and on the LLM. Could be doable.

sebastiennight · 2025-01-22T04:08:17 1737518897

To me this take is like smokers complaining that the evil government is forcing the good tobacco companies to degrade the experience by adding pictures of cancer patients on cigarette packs.

kortilla · 2025-01-22T07:56:00 1737532560

Those don’t really work: https://jamanetwork.com/journals/jamanetworkopen/fullarticle...

shiftingleft · 2025-01-22T12:41:02 1737549662

Do they help deter people from becoming smokers in the first place?

kortilla · 2025-01-23T04:50:34 1737607834

Not sure if much serious research has been put into it. I would be suspicious of it deterring them because a lot of initial smoking happens in social situations where friends pass out individual cigarettes.

By the time someone buys their own pack they are probably hooked.

I suspect the obscene taxes blocking out young folks is one of the most effective strategies

Evidlo · 2025-01-22T00:00:46 1737504046

I have ollama responding to SMS spam texts. I told it to feign interest in whatever the spammer is selling/buying. Each number gets its own persona, like a millennial gymbro or 19th century British gentleman.

http://files.widloski.com/image10%20(1).png

http://files.widloski.com/image11.png

celestialcheese · 2025-01-22T00:31:20 1737505880

Given the source, I'm skeptical it's not just a troll, but found this explanation [0] plausible as to why those vague spam texts exist. If true, this trolling helps the spammers warm those phone numbers up.

0 - https://x.com/nikitabier/status/1867029883387580571

stogot · 2025-01-22T01:16:20 1737508580

Why does STOP work here?

inerte · 2025-01-22T01:22:32 1737508952

Carriers and SMS service providers (like Twilio) obey that, no matter what service is behind it.

There are stories of people replying STOP to spam, then never getting a legit SMS because the number was re-used by another service. That's because it's being blocked between the spammer and the phone.

yawgmoth · 2025-01-22T13:15:25 1737551725

STOP works thanks to the Telephone Consumer Protection Act (“TCPA”), which offers consumers spam protections and senders a framework on how to behave.

(Edit: It's relevant that STOP didn't come from the TCPA itself, but definitely has teeth due to it)

https://www.infobip.com/blog/a-guide-to-global-sms-complianc...

celestialcheese · 2025-01-22T01:21:39 1737508899

https://x.com/nikitabier/status/1867069169256308766

Again, no clue if this is true, but it seems plausible.

blackeyeblitzar · 2025-01-22T01:46:21 1737510381

You realize this is going to cause carriers to allow the number to send more spam, because it looks like engagement. The best thing to do is to report the offending message to 7726 (SPAM) so the carrier can take action. You can also file complaints at the FTC and FCC websites, but that takes a bit more effort.

thegabriele · 2025-01-22T08:25:13 1737534313

Yes, the very last thing to do is respond to spam (calls, email, text...) and signal that you are open to more solicitation.

merpkz · 2025-01-22T06:55:26 1737528926

Calling Jessica an old chap is quite a giveaway that it's a bot xD Nice idea indeed, but I have a feeling that it's just two LLMs now conversing with each other.

lacoolj · 2025-01-22T18:07:13 1737569233

Most spam are just verifying you exist as a person, then from there you become an actual "target" if you respond.

This feels like an in-between that both wastes their time and adds you to extra lists.

Send the results somewhere! Not sure if "law enforcement" is applicable (as in, would be able/willing to act on the info) but if so, that's a great use of this data :)

RVuRnvbM2e · 2025-01-22T00:04:18 1737504258

This is fantastic. How have you hooked up a mobile number to the llm?

Evidlo · 2025-01-22T00:15:11 1737504911

Android app that forwards to a Python service on remote workstation over MQTT. I can make a Show HN if people are interested.

SuperHeavy256 · 2025-01-22T14:56:48 1737557808

I am so SO interested, please make a Show HN

Evidlo · 2025-01-22T19:28:46 1737574126

https://news.ycombinator.com/item?id=42796496

gaudystead · 2025-01-22T19:41:39 1737574899

Sweeeeet, thank you! :)

deadbabe · 2025-01-22T00:31:34 1737505894

I’d love to see that. Could you simulate iMessage?

great_psy · 2025-01-22T01:33:56 1737509636

Yes it’s possible, but it’s not something you can easily scale.

I had a similar project a few years back that used OSX automations and Shortcuts and Python to send a message everyday to a friend. It required you to be signed in to iMessage on your MacBook.

That was a send operation; reading replies is not something I implemented, but I know there is a file somewhere that holds a history of your recent iMessages. So you would have to parse it on file update, and that should give you the read operation so you can have a conversation.

Very doable in a few hours unless something dramatic changed with how the Messages app works within the last few years.

dewey · 2025-01-22T11:05:21 1737543921

They are all in a SQLite db on your disk.
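
A rough sketch of the read side, assuming the database has a `message` table with `text`, `is_from_me` and `date` columns. Those names are an assumption here (the thread does not spell out the schema, and it may differ across macOS versions), so the demo runs against an in-memory stand-in rather than a real message store.

```python
import sqlite3

def recent_incoming(conn, limit=5):
    """Return the newest incoming message texts, newest first."""
    rows = conn.execute(
        "SELECT text FROM message "
        "WHERE is_from_me = 0 AND text IS NOT NULL "
        "ORDER BY date DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [text for (text,) in rows]

# In-memory stand-in using the assumed column names.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE message (text TEXT, is_from_me INTEGER, date INTEGER)")
demo.executemany(
    "INSERT INTO message VALUES (?, ?, ?)",
    [("hi there", 0, 1), ("my reply", 1, 2), ("are you free?", 0, 3)],
)
print(recent_incoming(demo))
```

Against a real file it would be safer to open the database read-only, for example with `sqlite3.connect("file:/path/to/db?mode=ro", uri=True)`, so the Messages app's data cannot be modified by accident.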

Evidlo · 2025-01-22T00:55:05 1737507305

If you mean hook this into iMessage, I don't know. I'm willing to bet it's way harder though because Apple

dambi0 · 2025-01-22T08:23:08 1737534188

If you are willing to use Apple Shortcuts on iOS it’s pretty easy to add something that will be triggered when a message is received and can call out to a service or even use SSH to do something with the contents, including replying

potamic · 2025-01-22T11:32:16 1737545536

Why MQTT over HTTP for a low volume, small scale integration?

c0wb0yc0d3r · 2025-01-22T14:00:44 1737554444

I’m not OP, but I would hazard a guess that those are the tools that OP has at hand.

0xedd · 2025-01-23T08:38:04 1737621484

Good, cheap design that takes care of dead letters vs implementing a failover endpoint that would require extra hardware.

MQTT is plug and play in Python. No more costly than a HTTP server.

dkga · 2025-01-22T00:59:54 1737507594

Yes, I'd be interested in that!

sainib · 2025-01-22T11:50:48 1737546648

Interested for sure.

spiritplumber · 2025-01-22T00:11:37 1737504697

For something similar with FB chat, I use Selenium and run it on the same box that the llm is running on. Using multiple personalities is really cool though. I should update mine likewise!

zx8080 · 2025-01-22T00:12:35 1737504755

Cool! Have you considered the risk of unintentionally (and, for a while, unknowingly) subscribing to some paid SMS service, and how do you mitigate it?

Evidlo · 2025-01-22T00:18:24 1737505104

I have to whitelist a conversation before the LLM can respond.
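
A minimal sketch of how the whitelist and per-number personas could fit together, assuming the model is driven by a plain text prompt. The function names, the hashing trick and the phone numbers are all hypothetical; this is not the actual service.

```python
import hashlib

PERSONAS = ["millennial gym bro", "19th century British gentleman"]

def persona_for(number):
    """Pick a persona deterministically so a sender always gets the same one."""
    digest = hashlib.sha256(number.encode("utf-8")).digest()
    return PERSONAS[digest[0] % len(PERSONAS)]

def build_reply_prompt(number, message, whitelist):
    """Return a prompt for approved numbers, or None to stay silent."""
    if number not in whitelist:
        return None
    return (
        f"You are a {persona_for(number)}. Feign interest in whatever the "
        f"sender is selling or buying.\nTheir message: {message}\nYour reply:"
    )

approved = {"+15550100"}  # hypothetical number
print(build_reply_prompt("+15550100", "Hi, is your house for sale?", approved))
print(build_reply_prompt("+15550199", "Hello?", approved))
```

Returning `None` for unapproved numbers keeps the default safe: nothing is ever sent to a conversation that has not been explicitly approved.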

potatoman22 · 2025-01-22T18:41:16 1737571276

You probably just get more spam texts since you're replying. Maybe that's a good thing tbh

hackergirl88 · 2025-01-22T16:42:19 1737564139

Where was this during the election

bripkens · 2025-01-22T18:14:51 1737569691

You should put all these interactions on the web. For education purposes ofc.

thecosmicfrog · 2025-01-22T00:41:44 1737506504

Please tell me you have a blog/archive of these somewhere. This was such a joy to read!

metadat · 2025-01-22T05:42:17 1737524537

I love this, more please!!!

behohippy · 2025-01-21T20:57:37 1737493057

I have a mini PC with an n100 CPU connected to a small 7" monitor sitting on my desk, under the regular PC. I have llama 3b (q4) generating endless stories in different genres and styles. It's fun to glance over at it and read whatever it's in the middle of making. I gave llama.cpp one CPU core and it generates slow enough to just read at a normal pace, and the CPU fans don't go nuts. Totally not productive or really useful but I like it.

ipython · 2025-01-21T22:46:32 1737499592

That's neat. I just tried something similar:

    FORTUNE=$(fortune) && echo $FORTUNE && echo "Convert the following output of the Unix `fortune` command into a small screenplay in the style of Shakespeare: \n\n $FORTUNE" | ollama run phi4

watermelon0 · 2025-01-22T08:01:26 1737532886

Doesn't `fortune` inside double quotes execute the command in bash? You should use single quotes instead of backticks.
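
To spell out the quoting issue: inside double quotes, bash still treats backticks as command substitution, so `fortune` runs a second time and its output gets spliced into the prompt. One way to sidestep shell quoting entirely is to build the prompt in Python and pass it on stdin. A sketch, assuming `fortune` and `ollama` are on the PATH and a `phi4` model is available:

```python
import subprocess

def build_prompt(fortune_text):
    """Build the prompt as a plain string; backticks are literal in Python."""
    return (
        "Convert the following output of the Unix `fortune` command into a "
        "small screenplay in the style of Shakespeare:\n\n" + fortune_text
    )

def run_once():
    """Run fortune, then pipe the prompt to `ollama run phi4` on stdin."""
    fortune_text = subprocess.run(
        ["fortune"], capture_output=True, text=True, check=True
    ).stdout
    print(fortune_text)
    result = subprocess.run(
        ["ollama", "run", "phi4"],
        input=build_prompt(fortune_text),
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)
```

Calling `run_once()` only makes sense on a machine with both tools installed; `build_prompt` can be checked on its own.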

Uehreka · 2025-01-21T21:20:48 1737494448

Do you find that it actually generates varied and diverse stories? Or does it just fall into the same 3 grooves?

Last week I tried to get an LLM (one of the recent Llama models running through Groq, it was 70B I believe) to produce randomly generated prompts in a variety of styles and it kept producing cyberpunk scifi stuff. When I told it to stop doing cyberpunk scifi stuff it went completely to wild west.

o11c · 2025-01-21T21:35:42 1737495342

You should not ever expect an LLM to actually do what you want without handholding, and randomness in particular is one of the places it fails badly. This is probably fundamental.

That said, this is also not helped by the fact that all of the default interfaces lack many essential features, so you have to build the interface yourself. Neither "clear the context on every attempt" nor "reuse the context repeatedly" will give good results, but having one context producing just one-line summaries, then fresh contexts expanding each one will do slightly less badly.

(If you actually want the LLM to do something useful, there are many more things that need to be added beyond this)
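
The two-stage idea above (one context that only lists one-line summaries, then a fresh context per summary) can be sketched like this. `generate` stands for any function that sends a single prompt to a model with a clean context; it and the prompt wording are assumptions, and the stand-in below only exists so the sketch runs without a backend.

```python
from typing import Callable, List

def two_stage_stories(generate: Callable[[str], str], count: int) -> List[str]:
    """One call lists premises; each premise is expanded in its own fresh call."""
    listing = generate(
        f"Write {count} one-line story premises, one per line, "
        "each in a different genre."
    )
    premises = [line.strip() for line in listing.splitlines() if line.strip()]
    return [
        generate(f"Write a short story based on this premise: {premise}")
        for premise in premises[:count]
    ]

# Stand-in for a real model call.
def fake_generate(prompt: str) -> str:
    if prompt.startswith("Write 2 one-line"):
        return "A lighthouse keeper finds a map.\nA robot learns to bake."
    return "STORY: " + prompt.rsplit(": ", 1)[-1]

stories = two_stage_stories(fake_generate, 2)
print(stories)
```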

dotancohen · 2025-01-21T22:50:29 1737499829

Sounds to me like you might want to reduce the Top P - that will prevent the really unlikely next tokens from ever being selected, while still providing nice randomness in the remaining next tokens so you continue to get diverse stories.
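
For reference, a small sketch of what Top P (nucleus) sampling does, written against a toy probability list rather than any particular inference library: keep the most likely tokens until their combined probability reaches `top_p`, then sample only among those.

```python
import random

def top_p_sample(probs, top_p, rng=random):
    """Sample an index from the smallest set of tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for index in ranked:
        nucleus.append(index)
        mass += probs[index]
        if mass >= top_p:
            break
    # Using the raw probabilities as weights renormalises within the nucleus.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

probs = [0.5, 0.3, 0.15, 0.04, 0.01]  # toy next-token distribution
samples = {top_p_sample(probs, 0.9) for _ in range(1000)}
print(samples)
```

With `top_p=0.9` the two least likely tokens in this toy distribution can never be chosen, while the remaining three are still sampled in proportion to their probabilities.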

coder543 · 2025-01-22T02:33:13 1737513193

Someone mentioned generating millions of (very short) stories with an LLM a few weeks ago: https://news.ycombinator.com/item?id=42577644

They linked to an interactive explorer that nicely shows the diversity of the dataset, and the HF repo links to the GitHub repo that has the code that generated the stories: https://github.com/lennart-finke/simple_stories_generate

So, it seems there are ways to get varied stories.

TMWNN · 2025-01-22T05:13:20 1737522800

> Do you find that it actually generates varied and diverse stories? Or does it just fall into the same 3 grooves?

> Last week I tried to get an LLM (one of the recent Llama models running through Groq, it was 70B I believe) to produce randomly generated prompts in a variety of styles and it kept producing cyberpunk scifi stuff.

100% relevant: "Someday" <https://en.wikipedia.org/wiki/Someday_(short_story)> by Isaac Asimov, 1956

janalsncm · 2025-01-21T22:56:07 1737500167

Generate a list of 5000 possible topics you’d like it to talk about. Randomly pick one and inject that into your prompt.

jaggs · 2025-01-22T19:32:57 1737574377

https://old.reddit.com/r/LocalLLaMA/comments/1i615u1/the_fir...

behohippy · 2025-01-22T12:40:00 1737549600

It's a 3b model so the creativity is pretty limited. What helped for me was prompting for specific stories in specific styles. I have a python script that randomizes the prompt and the writing style, including asking for specific author styles.
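
A minimal sketch of that kind of prompt randomizer, with made-up genre and style lists (the actual script is not shown in the thread). The point is that the variety comes from the random choice in code rather than from asking the model to be random.

```python
import random

GENRES = ["noir mystery", "space opera", "fairy tale", "western"]
STYLES = ["terse and minimalist", "ornate and Victorian", "dry and humorous"]

def random_story_prompt(rng):
    """Pick the genre and style in code, so variety does not depend on the model."""
    genre = rng.choice(GENRES)
    style = rng.choice(STYLES)
    return f"Write a short {genre} story. Writing style: {style}."

rng = random.Random(0)  # seeded only to make the demo repeatable
for _ in range(3):
    print(random_story_prompt(rng))
```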

greenavocado · 2025-01-22T15:55:24 1737561324

Set temperature to 1.0

keeganpoppen · 2025-01-21T22:44:19 1737499459

oh wow that is actually such a brilliant little use case-- really cuts to the core of the real "magic" of ai: that it can just keep running continuously. it never gets tired, and never gets tired of thinking.

Dansvidania · 2025-01-21T21:05:19 1737493519

this sounds pretty cool, do you have any video/media of it?

behohippy · 2025-01-22T12:40:41 1737549641

I don't have a video but here's a pic of the output: https://imgur.com/ip8GWIh

sky2224 · 2025-01-23T02:00:24 1737597624

The next step is to format it so it looks like an endless Star Wars intro.

bithavoc · 2025-01-21T21:11:56 1737493916

this is so cool, any chance you post a video?

behohippy · 2025-01-22T12:41:02 1737549662

Just this pic: https://imgur.com/ip8GWIh

droideqa · 2025-01-22T01:53:47 1737510827

That's awesome!

nozzlegear · 2025-01-21T23:30:18 1737502218

I have a small fish script I use to prompt a model to generate three commit messages based off of my current git diff. I'm still playing around with which model comes up with the best messages, but usually I only use it to give me some ideas when my brain isn't working. All the models accomplish that task pretty well.

Here's the script: https://github.com/nozzlegear/dotfiles/blob/master/fish-func...

And for this change [1] it generated these messages:

    1. `fix: change from printf to echo for handling git diff input`
    
    2. `refactor: update codeblock syntax in commit message generator`
    
    3. `style: improve readability by adjusting prompt formatting`

[1] https://github.com/nozzlegear/dotfiles/commit/0db65054524d0d...
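
For readers who don't use fish, the general approach described above (feed the current diff to a model and ask for three candidates) might look like this in Python. This is a sketch, not a port of the linked script: the prompt wording, the `max_chars` truncation and the sample diff are all assumptions.

```python
import subprocess

def build_commit_prompt(diff, count=3, max_chars=8000):
    """Ask for `count` candidate messages, truncating very large diffs."""
    return (
        f"Suggest {count} commit messages for the diff below, one per line, "
        "in Conventional Commits style (for example `fix: ...`).\n\n"
        + diff[:max_chars]
    )

def current_diff():
    """Return the diff of tracked changes against HEAD in the current repo."""
    return subprocess.run(
        ["git", "diff", "HEAD"], capture_output=True, text=True, check=True
    ).stdout

sample_diff = "-printf '%s' \"$diff\"\n+echo \"$diff\"\n"  # made-up diff
print(build_commit_prompt(sample_diff))
```

The string returned by `build_commit_prompt(current_diff())` can then be piped to whichever local model is being tried out.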

relistan · 2025-01-22T08:18:54 1737533934

Interesting idea. But those say what’s in the commit. The commit diff already tells you that. The best commit messages IMO tell you why you did it and what value was delivered. I think it’s gonna be hard for an LLM to do that since that context lives outside the code. But maybe it would, if you hook it to e.g. a ticketing system and include relevant tickets so it can grab context.

For instance, in your first example, why was that change needed? It was a fix, but for what issue?

In the second message: why was that a desirable change?

rane · 2025-01-22T13:52:58 1737553978

Most of the time you are not able to fit the "Why?" in the summary.

That's what the body of the commit message is for.

nozzlegear · 2025-01-22T15:36:45 1737560205

Typically I put the "why" of the commit in the body unless it's a super simple change, but that's a good point. Sometimes this function does generate a commit body to go with the summary, and sometimes it doesn't. It also has a habit of only looking at the first file in a diff and basing its messages off of that, instead of considering the whole patch.

I'll tweak the prompt when I have some time today and see if I can get some more consistency out of it.

lnenad · 2025-01-22T11:15:09 1737544509

I disagree. When you look at the git history in x months you're gonna have a hard time understanding what was done following your example.

Draiken · 2025-01-22T14:42:57 1737556977

I disagree. If you look back and all you see are commit messages summarizing the diff, you won't get any meaningful information.

Telling me `Changed timeout from 30s to 60s` means nothing, while `Increase timeout for slow <api name> requests` gives me an actual idea of why that was done.

Even better if you add meaningful messages to the commit body.

Take a look at commits from large repositories like the Linux kernel and you can see what good commit messages look like.

lnenad · 2025-01-23T09:00:01 1737622801

I mean you're not op but his comment was saying

> Interesting idea. But those say what’s in the commit. The commit diff already tells you that. The best commit messages IMO tell you why you did it and what value was delivered.

Which doesn't include what was done. Your example includes both which is fine. But not including what the commit does in the message is an antipattern imho. Everything else that is added is a bonus.

Draiken · 2025-01-23T10:48:32 1737629312

Many changes require multiple smaller changes, so this is not always possible.

For me the commit message should tell me the what/why and the diff is the how. It's great to understand if, for example, a change was intentional or a bug.

Many times when searching for the source of a bug I could not tell if the line changed was intentional or a mistake because the commit message was simply repeating what was on the diff. If you say your intention was to add something and the diff shows a subtraction, you can easily tell it was a mistake. Contrived example but I think it demonstrates my point.

This only really works if commits are meaningful though. Most people are careless and half their commits are 'fix this', 'fix again', 'wip', etc. At that point the only place that can contain useful information on the intentions are the pull requests/issues around it.

Take a single commit from the Linux kernel: https://github.com/torvalds/linux/commit/08bd5b7c9a2401faabd... It doesn't tell me "add function X, Y and boolean flag Z". It tells us what/why it was done, and the diff shows us how.

relistan · 2025-01-22T13:23:42 1737552222

By adding more context? I’m not sure who you’re replying to or what your objection is.

zanderwohl · 2025-01-22T16:18:07 1737562687

> The commit diff already tells you that.

When you squash a branch you'll have 200+ lines of new code on a new feature. The diff is not a quick way to get a summary of what's happening. You should put the "what" in your commit messages.

mystified5016 · 2025-01-22T18:07:23 1737569243

That's actually pretty useful. This could be a big help in getting back into the groove when you leave uncommitted changes over the weekend.

A summary of changes like this might be just enough to spark your memory on what you were actually doing with the changes. I'll have to give it a shot!

mentos · 2025-01-22T03:53:17 1737517997

Awesome need to make one for naming variables too haha

lionkor · 2025-01-22T08:51:51 1737535911

Those commit messages are pretty terrible, please try to come up with actual messages ;)

sidravi1 · 2025-01-22T03:04:47 1737515087

We fine-tuned a Gemma 2B to identify urgent messages sent by new and expecting mothers on a government-run maternal health helpline.

https://idinsight.github.io/tech-blog/blog/enhancing_materna...

proxygeek · 2025-01-22T04:19:11 1737519551

Such a fun thread, but these are the kinds of applications that perk up my attention!

Very cool!

Mumps · 2025-01-22T15:06:49 1737558409

lovely application!

Genuine question: why not use (Modern)BERT instead for classification? (Is the json-output explanation so critical?)

Mashimo · 2025-01-22T08:09:09 1737533349

Oh that is a nice writeup. We have something similar in mind at work. Will forward it.

Mukina · 2025-01-23T06:53:50 1737615230

Super cool. What a simple and powerful way to help mothers in need. Thanks for sharing.

flippyhead · 2025-01-21T22:12:50 1737497570

I have a tiny device that listens to conversations between two people or more and constantly tries to declare a "winner"

mkaic · 2025-01-22T00:27:08 1737505628

This reminds me of the antics of streamer DougDoug, who often uses LLM APIs to live-summarize, analyze, or interact with his (often multi-thousand-strong) Twitch chat. Most recently I saw him do a GeoGuessr stream where he had ChatGPT assume the role of a detective who must comb through the thousands of chat messages for clues about where the chat thinks the location is, then synthesize the clamor into a final guess. Aside from constantly being trolled by people spamming nothing but "Kyoto, Japan" in chat, it occasionally demonstrated a pretty effective incarnation of "the wisdom of the crowd" and was strikingly accurate at times.

eddd-ddde · 2025-01-21T23:41:02 1737502862

I love that there's not even a vague idea of the winner "metric" in your explanation. Like it's just, _the_ winner.

jjcm · 2025-01-21T22:37:56 1737499076

Are you raising a funding round? I'm bought in. This is hilarious.

oa335 · 2025-01-21T22:18:10 1737497890

This made me actually laugh out loud. Can you share more details on hardware and models used?

pseudosavant · 2025-01-21T22:17:26 1737497846

I'd love to hear more about the hardware behind this project. I've had concepts for tech requiring a mic on me at all times for various reasons. Always tricky to have enough power in a reasonable DIY form factor.

econ · 2025-01-21T22:34:50 1737498890

This is a product I want

hn8726 · 2025-01-21T23:17:31 1737501451

What approach/stack would you recommend for listening to an ongoing conversation, transcribing it and passing through llm? I had some use cases in mind but I'm not very familiar with AI frameworks and tools

econ · 2025-01-23T20:04:31 1737662671

Tell me it also does sports style commentary on the ongoing debate. My mental image requires it.

amelius · 2025-01-21T22:36:52 1737499012

You can use the model to generate winning speeches also.

prakashn27 · 2025-01-22T04:14:44 1737519284

wifey always wins. ;)

nejsjsjsbsb · 2025-01-22T02:28:43 1737512923

All computation on device?

deivid · 2025-01-22T06:51:23 1737528683

what model do you use for speech to text?

TechDebtDevin · 2025-01-22T11:47:13 1737546433

Your SO must really love that lmao

simonjgreen · 2025-01-21T22:33:53 1737498833

Micro Wake Word is a library and set of on device models for ESPs to wake on a spoken wake word. https://github.com/kahrendt/microWakeWord

Recently deployed in Home Assistant's fully local-capable Alexa replacement. https://www.home-assistant.io/voice_control/about_wake_word/

yzydserd · 2025-01-22T06:34:10 1737527650

Nice idea.

kortilla · 2025-01-22T07:58:37 1737532717

Make sure your meeting participants know you’re transcribing them. Transcription has notification requirements similar to recording, and they vary from state to state.

RhysU · 2025-01-21T20:37:07 1737491827

"Comedy Writing With Small Generative Models" by Jamie Brew (Strange Loop 2023)

https://m.youtube.com/watch?v=M2o4f_2L0No

Spend the 45 minutes watching this talk. It is a delight. If you are unsure, wait until the speaker picks up the guitar.

100k · 2025-01-21T20:40:26 1737492026

Seconded! This was my favorite talk at Strange Loop (including my own).

prettyblocks · 2025-01-22T23:05:53 1737587153

Excellent share - nice to see people doing cool things with the tech while not taking themselves too seriously.

azhenley · 2025-01-21T20:50:42 1737492642

Microsoft published a paper on their FLAME model (60M parameters) for Excel formula repair/completion which outperformed much larger models (>100B parameters).

https://arxiv.org/abs/2301.13779

andai · 2025-01-21T21:30:28 1737495028

This is wild. They claim it was trained exclusively on Excel formulas, but then they mention retrieval? Is it understanding the connection between English and formulas? Or am I misunderstanding retrieval in this context?

Edit: No, the retrieval is Formula-Formula; the model (and, I believe, the tokenizer) does not handle English.

coder543 · 2025-01-22T03:31:10 1737516670

That paper is from over a year ago, and it compared against codex-davinci... which was basically GPT-3, from what I understand. Saying >100B makes it sound a lot more impressive than it is in today's context... 100B models today are a lot more capable. The researchers also compared against a couple of other ancient(/irrelevant today), small models that don't give me much insight.

FLAME seems like a fun little model, and 60M is truly tiny compared to other LLMs, but I have no idea how good it is in today's context, and it doesn't seem like they ever released it.

aDyslecticCrow · 2025-01-22T18:31:37 1737570697

I would like to disagree with its being irrelevant. If anything, the 100B models are irrelevant in the context and should be seen as a "fun inclusion" rather than a serious addition worth comparing against. It out-performing a 100B model at the time becomes a fun bragging point, but it's not the core value of the method or paper.

Running a prompt against every single cell of a 10k row document was never gonna happen with a large model. Even using a transformer model architecture in the first place can be seen as ludicrous overkill but feasible on modern machines.

So I'd say the paper is very relevant, and the top commenter in this very thread demonstrated their own homegrown version with a very nice use-case (paper abstract and title sorting for making a summary paper)

coder543 · 2025-01-22T18:36:52 1737571012

> Running a prompt against every single cell of a 10k row document was never gonna happen with a large model

That isn’t the main point of FLAME, as I understood it. The main point was to help you when you’re editing a particular cell. codex-davinci was used for real time Copilot tab completions for a long time, I believe, and editing within a single formula in a spreadsheet is far less demanding than editing code in a large document.

After I posted my original comment, I realized I should have pointed out that I’m fairly sure we have 8B models that handily outperform codex-davinci these days… further driving home how irrelevant the claim of “>100B” was here (not talking about the paper). Plus, an off the shelf model like Qwen2.5-0.5B (a 494M model) could probably be fine tuned to compete with (or dominate) FLAME if you had access to the FLAME training data — there is probably no need to train a model from scratch, and a 0.5B model can easily run on any computer that can run the current version of Excel.

You may disagree, but my point was that claiming a 60M model outperforms a 100B model just means something entirely different today. Putting that in the original comment higher in the thread creates confusion, not clarity, since the models in question are very bad compared to what exists now. No one had clarified that the paper was over a year old until I commented… and FLAME was being tested against models that seemed to be over a year old even when the paper was published. I don’t understand why the researchers were testing against such old models even back then.

3abiton · 2025-01-21T22:05:43 1737497143

But I feel we're going back full circle. These small models are not generalists, thus not really LLMs, at least in terms of objective. Recently there has been a rise of "specialized" models that provide lots of value, but that's not why we were sold on LLMs.

colechristensen · 2025-01-21T22:16:06 1737497766

But that's the thing, I don't need my ML model to be able to write me a sonnet about the history of beets, especially if I want to run it at home for specific tasks like as a programming assistant.

I'm fine with and prefer specialist models in most cases.

zeroCalories · 2025-01-21T23:44:10 1737503050

I would love a model that knows SQL really well so I don't need to remember all the small details of the language. Beyond that, I don't see why the transformer architecture can't be applied to any problem that needs to predict sequences.

dr_kiszonka · 2025-01-22T01:02:48 1737507768

The trick is to find such problems with enough training data and some market potential. I am terrible at it.

Suppafly · 2025-01-21T23:01:34 1737500494

Specialized models work much better still for most stuff. Really we need an LLM to understand the input and then hand it off to a specialized model that actually provides good results.

janalsncm · 2025-01-21T23:12:03 1737501123

I think playing word games about what really counts as an LLM is a losing battle. It has become a marketing term, mostly. It’s better to have a functionalist point of view of “what can this thing do”.

barrenko · 2025-01-21T21:14:18 1737494058

This is really cool. Is this already in Excel?

computers3333 · 2025-01-22T08:24:16 1737534256

https://gophersignal.com – I built GopherSignal!

It's a lightweight tool that summarizes Hacker News articles. For example, here’s what it outputs for this very post, "Ask HN: Is anyone doing anything cool with tiny language models?":

"A user inquires about the use of tiny language models for interesting applications, such as spam filtering and cookie notice detection. A developer shares their experience with using Ollama to respond to SMS spam with unique personas, like a millennial gymbro or a 19th-century British gentleman. Another user highlights the effectiveness of 3B and 7B language models for cookie notice detection, with decent performance achieved through prompt engineering."

I originally used LLaMA 3:Instruct for the backend, which performs much better, but recently started experimenting with the smaller LLaMA 3.2:1B model.
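For anyone wanting to try a small model for summaries, here is a minimal sketch of what a call could look like. It assumes a local Ollama server on its default port and the `llama3.2:1b` model tag; the prompt wording, the `max_chars` truncation, and the function names are illustrative assumptions, not GopherSignal's actual backend.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_request(article_text, model="llama3.2:1b", max_chars=4000):
    """Build the JSON body for a non-streaming generate call."""
    # Truncate the article so a small model's context window is not overrun.
    prompt = (
        "Summarize the following article in three sentences.\n\n"
        + article_text[:max_chars]
    )
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(article_text, **kwargs):
    """Send the request to the local server and return the generated text."""
    body = json.dumps(build_summary_request(article_text, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With "stream": False the server returns a single JSON object.
        return json.loads(response.read())["response"].strip()
```

Swapping between a larger and a smaller model is then just a matter of changing the `model` argument, which makes side-by-side quality comparisons cheap.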

It’s been cool seeing other people’s ideas too. Curious—does anyone have suggestions for small models that are good for summaries?

Feel free to check it out or make changes: https://github.com/k-zehnder/gophersignal

tinco · 2025-01-22T11:05:48 1737543948

That's cool, I really like it. One piece of feedback: I am usually more interested in the HN comments than in the original article. If you'd include a link to the comments then I might switch to GopherSignal as a replacement for the HN frontpage.

My flow is generally: Look at the title and the amount of upvotes to decide if I'm interested in the article. Then view the comments to see if there's interesting discussion going on or if there's already someone adding essential context. Only then I'll decide if I want to read the article or not.

Of course no big deal if you're not interested in my patronage, just wanted to let you know your page already looks good enough for me to consider switching my most visited page to it if it weren't for this small detail. And maybe the upvote count.

computers3333 · 2025-01-22T11:38:50 1737545930

Hey, thanks a ton for the feedback! That was super helpful to hear about your flow—makes a lot of sense and it's pretty similar to how I browse HN too. I usually only dive into the article after checking out the upvotes and seeing what context the comments add.

I'll definitely add a link to the comments and the upvote count—gotta keep my tiny but mighty userbase (my mom, me, and hopefully you soon) happy, right? lol

And if there's even a chance you'd use GopherSignal as your daily driver, that's a no-brainer for me. Really appreciate you taking the time to share your ideas and help me improve.

computers3333 · 2025-01-23T07:56:44 1737619004

EDIT: Apologies for breaking things earlier while trying to fix it! I’ve been working on updating it and got the upvote count and comment link in there. Wondering what you think about these updates—appreciate any feedback! Thanks again for helping me improve it!

https://gophersignal.com

goodklopp · 2025-01-22T17:51:58 1737568318

I would love this feature. Regardless, what you have built is really cool

computers3333 · 2025-01-22T18:02:23 1737568943

Hey thanks a ton for checking out GopherSignal! From the feedback I’m getting, it seems like comments and upvotes are the secret sauce I’ve been missing—appreciate you helping me get that through my thick skull lol. The pressure’s on now—I’ll do my best to deliver.

sainib · 2025-01-22T12:11:17 1737547877

Maybe even rate each post on its comment activity level.

computers3333 · 2025-01-22T17:56:33 1737568593

Great call! That’s a really solid idea—using the LLMs to rate posts based on comment activity could totally work and would be fun.

Were you thinking something like a “DramaLlama,” deciding if it’s a slow day or a meltdown-worthy soap opera in the comments? Or maybe something more valuable, like an “Insight Index” that uses the LLM to analyze comments for links, explanations, or phrases that add context or insight—basically gauging how constructive or meaningful the discussion is?

I also saw an idea in another post on this thread about an LLM that constantly listens to conversations and declares a winner. That could be fun to adapt for spicier posts—like the LLM picking a “winner” in the comments. Make the argument GopherSignal official lol. If it helps bring in another user, I’m all in!

Appreciate the feedback.

sainib · 2025-01-22T12:08:35 1737547715

Agreed, great suggestions. I'd consider switching as well.

tonymet · 2025-01-24T02:00:00 1737684000

can you install this into a discord? i volunteer to help. I've been wanting a text-based hackernews chat with alternative moderation.

computers3333 · 2025-01-25T21:31:30 1737840690

Hey, thanks for reaching out! The idea of integrating GopherSignal with Discord as a bot or feature is super cool, and I’d love to make that happen. I haven’t worked with Discord bots or automation before, so I’d definitely take you up on your offer to help out with that. If you want to connect, my email is kjzehnder3 [at] gmail [dot] com. Thank u!

jkmcf · 2025-01-25T02:57:17 1737773837

RSS plz?

computers3333 · 2025-01-25T21:09:23 1737839363

Hey, thanks for checking out GopherSignal! RSS is a great idea—I’ll be starting on it this weekend. Appreciate the suggestion!

deet · 2025-01-21T21:28:44 1737494924

We (avy.ai) are using models in that range to analyze computer activity on-device, in a privacy sensitive way, to help knowledge workers as they go about their day.

The local models do things ranging from cleaning up OCR, to summarizing meetings, to estimating the user's current goals and activity, to predicting search terms, to predicting queries and actions that, if run, would help the user accomplish their current task.

The capabilities of these tiny models have really surged recently. Even small vision models are becoming useful, especially if fine tuned.

bendews · 2025-01-22T10:55:50 1737543350

Is this along the lines of rewind.ai, MSCopilot, screenpipe, or something else entirely?

mettamage · 2025-01-21T20:17:43 1737490663

I simply use it to de-anonymize code that I typed in via Claude

Maybe should write a plugin for it (open source):

1. Put in all your work related questions in the plugin, an LLM will make it as an abstract question for you to preview and send it

2. And then get the answer with all the data back

E.g. df["cookie_company_name"] becomes df["a"] and back

sitkack · 2025-01-21T22:09:06 1737497346

So you are using a local small model to remove identifying information and make the question generic, which is then sent to a larger model? Is that understanding correct?

I think this would have some additional benefits of not confusing the larger model with facts it doesn't need to know about. By erasing information, you can allow its attention heads to focus on the pieces that matter.

Requires further study.

mettamage · 2025-01-22T07:50:13 1737532213

> So you are using a local small model to remove identifying information and make the question generic, which is then sent to a larger model? Is that understanding correct?

Yep that's it

sundarurfriend · 2025-01-22T03:51:12 1737517872

You're using it to anonymize your code, not de-anonymize someone's code. I was confused by your comment until I read the replies and realized that's what you meant to say.

kreyenborgi · 2025-01-22T07:51:01 1737532261

I read it the other way, their code contains eg fetch(url, pw:hunter123), and they're asking Claude anonymized questions like "implement handler for fetch(url, {pw:mycleartrxtpw})"

And then claude replies

fetch(url, {pw:mycleartrxtpw}).then(writething)

And then the local llm converts the placeholder mycleartrxtpw into hunter123 using its access to the real code

mettamage · 2025-01-22T14:55:13 1737557713

It's that yea

Flow would be:

1. Llama prompt: write a console log statement with my username and password: mettamage, superdupersecret

2. Claude prompt (edited by Llama): write a console log statement with my username and password: asdfhjk, sdjkfa

3. Claude replies: console.log('asdfhjk', 'sdjkfa')

4. Llama gets that input and replies to me: console.log('mettamage', 'superdupersecret')
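The round trip in the four steps above can be sketched roughly as follows. Note the swap: this uses plain string substitution instead of a local Llama model, and it assumes the sensitive values are listed by hand. The `__ANON_n__` placeholder format and both function names are made up for illustration.

```python
import re


def anonymize(text, secrets):
    """Replace each secret with a placeholder; return the new text and a mapping."""
    # Deduplicate while keeping order, then match longer secrets first so a
    # short secret never clobbers part of a longer one.
    ordered = sorted(dict.fromkeys(secrets), key=len, reverse=True)
    if not ordered:
        return text, {}
    to_placeholder = {s: f"__ANON_{i}__" for i, s in enumerate(ordered)}
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    safe_text = pattern.sub(lambda m: to_placeholder[m.group(0)], text)
    # Invert the table so the reply can be restored later.
    mapping = {placeholder: secret for secret, placeholder in to_placeholder.items()}
    return safe_text, mapping


def deanonymize(text, mapping):
    """Put the original values back into the remote model's reply."""
    if not mapping:
        return text
    pattern = re.compile("|".join(re.escape(p) for p in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


prompt = ("write a console log statement with my username and password: "
          "mettamage, superdupersecret")
safe_prompt, mapping = anonymize(prompt, ["mettamage", "superdupersecret"])
# safe_prompt is what would be sent to the remote model; the string below
# stands in for its reply.
reply = "console.log('__ANON_1__', '__ANON_0__')"
restored = deanonymize(reply, mapping)
```

A deterministic mapping like this is fast and reversible, but it only catches values you already know are sensitive. Spotting identifying details you did not list is the part where a local model would earn its keep.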

sundarurfriend · 2025-01-22T13:04:58 1737551098

> Put in all your work related questions in the plugin, an LLM will make it as an abstract question for you to preview and send it

So the LLM does both the anonymization into placeholders and then later the replacing of the placeholders too. Calling the latter step de-anonymization is confusing though, it's "de-anonymizing" yourself to yourself. And the overall purpose of the plugin is to anonymize OP to Claude, so to me at least that makes the whole thing clearer.

mettamage · 2025-01-22T14:58:28 1737557908

I could've been a bit more clear, sorry about that.

politelemon · 2025-01-21T20:24:08 1737491048

Could you recommend a tiny language model I could try out locally?

mettamage · 2025-01-21T20:40:29 1737492029

Llama 3.2 has about 3.2b parameters. I have to admit, I use bigger ones like phi-4 (14.7b) and Llama 3.3 (70.6b) but I think Llama 3.2 could do de-anonymization and anonymization of code

RicoElectrico · 2025-01-21T21:21:27 1737494487

Llama 3.2 punches way above its weight. For general "language manipulation" tasks it's good enough - and it can be used on a CPU with acceptable speed.

seunosewa · 2025-01-21T22:17:05 1737497825

How many tokens/s?

iamnotagenius · 2025-01-22T12:58:28 1737550708

10-15t/s on 12400 with ddr5

OxfordOutlander · 2025-01-21T20:52:48 1737492768

+1 this idea. I do the same. Just do it locally using ollama, also using 3.2 3b

sauwan · 2025-01-21T22:19:29 1737497969

Are you using the model to create a key-value pair to find/replace and then reverse to reanonymize, or are you using its outputs directly? If the latter, is it fast enough and reliable enough?

jwitthuhn · 2025-01-22T01:45:27 1737510327

I've made a tiny ~1m parameter model that can generate random Magic the Gathering cards that is largely based on Karpathy's nanogpt with a few more features added on top.

I don't have a pre-trained model to share but you can make one yourself from the git repo, assuming you have an apple silicon mac.

https://github.com/jlwitthuhn/TCGGPT

ata_aman · 2025-01-21T23:08:27 1737500907

I have it running on a Raspberry Pi 5 for offline chat and RAG. I wrote this open-source code for it: https://github.com/persys-ai/persys

It also does RAG on apps there, like the music player, contacts app and to-do app. I can ask it to recommend similar artists to listen to based on my music library for example or ask it to quiz me on my PDF papers.

nejsjsjsbsb · 2025-01-22T02:34:41 1737513281

Does https://github.com/persys-ai/persys-server run on the rpi?

Is that design 3d printable? Or is that for paid users only.

ata_aman · 2025-01-22T03:41:58 1737517318

I can publish it no problem. I’ll create a new repo with instructions for the hardware with CAD files.

Designing a new one for the NVIDIA Orin Nano Super so it might take a few days.

nejsjsjsbsb · 2025-01-22T10:38:50 1737542330

Up to you! Totally understand if you want to hold something back for a paid option!

deivid · 2025-01-21T23:14:11 1737501251

Not sure it qualifies, but I've started building an Android app that wraps bergamot[0] (the firefox translation models) to have on-device translation without reliance on google.

Bergamot is already used inside firefox, but I wanted translation also outside the browser.

[0]: bergamot https://github.com/browsermt/bergamot-translator

deivid · 2025-01-22T05:10:17 1737522617

I would be very interested if someone is aware of any small/tiny models to perform OCR, so the app can translate pictures as well