
Yeah, but it’s ‘quantization aware’ during training too, which presumably is what allows the quantization at inference to work.
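
(Rough idea as I understand it, not any specific model’s recipe: the forward pass simulates the low-bit quantization, while gradients flow through to the full-precision weights via a straight-through estimator. A toy sketch:)

    import torch

    def fake_quantize(w, n_bits=4):
        # Forward pass sees low-bit weights...
        levels = 2 ** (n_bits - 1) - 1
        scale = w.abs().max() / levels
        q = torch.round(w / scale).clamp(-levels - 1, levels) * scale
        # ...but gradients pass straight through to the full-precision w
        return w + (q - w).detach()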

Recently someone shared a method for encrypting the timestamp portion as well:

https://news.ycombinator.com/item?id=45275973



How To Make Everything on YouTube is along those lines

https://youtube.com/@htme


That’s Leland McInnes - author of UMAP, the widely-used dimension reduction tool


I know - I mentioned his name in a post last week, and figured doing so again might seem a bit fanboy-ish. I am kind of a fan, but mostly a fan of good explanations. He’s just self-selecting for the group.


Another fun one :)

    import antigravity  # opens the xkcd "Python" comic (xkcd.com/353) in a browser


I think a lot of it is that things have shifted away from the raw language. Less and less you’re dealing with Python itself, and more with an assortment of libraries and frameworks: Pandas, numpy, torch, fastapi, …, and a dozen others.

Packaging has been a nightmare. PyPI has had its challenges. Dependency management is vastly improved thanks to uv - but only recently, and with a graveyard of tools in its wake.

The modern Python script feels more like loosely combining a temperamental set of today’s latest library APIs and then dealing with the fallout. It sometimes parallels the Node experience.

I think an actual Python project - using only the standard library from anything remotely modern (3.2+, say) and maybe requests - is probably just as clean, resilient, and reliable as it ever was.

A lot of these things have been improving tremendously. But I think, to your point, the language (or really the ecosystem) is scaling and evolving a ton, and there are growing pains.


I can see that. A little while ago I was working at a startup, and we had a node.js thing, really crucial to the business, that did some gray-hat browser automation to scrape TikTok (the users opted into it, but TikTok itself was less permissive). For some reason a person wanted to move part of it to Python to orthogonally solve some other actual problem. They passed me the code, and Pandas was there to effectively do an HTTP request and parse JSON. I thought to myself, "woah, I’m not in Kansas anymore." I ended up not having to worry about it, because I ported the idea back to the more mature node system, and that turned out to be viable over time.

Libraries can overtake aspects of a language, for better and worse. Ruby, for example, seemed really tied to Rails, and that was great for it.


Ha, Pandas just to parse a website is a bit extra, I’d say. But yeah, it’s weird that you need libraries and API endpoints to do basic tasks these days.

It feels like something broke around 2015-ish. Going back, you could make a whole app and GUI with BASIC. You could make whole websites simply with HTML+PHP, sometimes using nothing but Notepad. You could make portable apps in Java with no external libraries - even Swing (or whatever) was built in.

Now…? Electron, a few languages, a few frameworks, and a few dozen libraries. Just to start.

Bizarre.


Everyone wants to be a programmer but nobody wants to write any damn code

— Ronnie Coleman


Essentially, when you buy in bulk you trade upfront commitment for a discounted price. That isn’t a good deal unless you’re confident you’re going to use all the units/seats you bought.

This is the same logic as overbuying at the grocery store. The unit cost of bulk items may be lower, but if the surplus is just gonna spoil, you’ve wasted the difference.

    1.0 * N * discount_rate * price <= certainty * N * 1.0 * price
    => discount_rate / certainty <= 1.0
    => discount_rate <= certainty

In the event your confidence/usage is lower than the discount rate - say the bulk price is 80% of sticker but you expect only 60% utilization - this might suggest buying 60% of your capacity at the bulk rate and filling any further demand with the on-demand, full-price option.
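
A tiny sketch of that break-even check, with made-up numbers (illustrative, not real prices):

    # Hypothetical numbers, just to illustrate the inequality above
    N = 100              # units/seats in the bulk deal
    price = 10.0         # full on-demand unit price
    discount_rate = 0.8  # bulk price as a fraction of sticker
    certainty = 0.6      # expected utilization

    bulk_cost = 1.0 * N * discount_rate * price   # pay for all N up front
    on_demand_cost = certainty * N * 1.0 * price  # full price, only for what you use

    # bulk only wins when discount_rate <= certainty
    print(bulk_cost, on_demand_cost, discount_rate <= certainty)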


I saw Unsure Calculator on HN some time ago. Seems like the perfect use case. https://filiph.github.io/unsure/
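
(If I remember right it just runs Monte Carlo over ranges written like 100~200. A toy version, treating a~b as a uniform range - I believe the real tool uses something closer to a normal distribution:)

    import random

    def unsure(lo, hi, n=100_000):
        # Toy take on the calculator's a~b notation: sample the range
        return [random.uniform(lo, hi) for _ in range(n)]

    # e.g. utilization somewhere in 0.4~0.8: how often does an 80% discount win?
    samples = unsure(0.4, 0.8)
    p = sum(0.8 <= s for s in samples) / len(samples)
    print(f"bulk deal wins in ~{p:.0%} of scenarios")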


You can even still mail a brick! As long as it’s addressed. Apparently other things too, including flip-flops, inflated beach balls, and potatoes.

They do call out that you can no longer mail enough bricks to build a building, haha.

https://facts.usps.com/sending-bricks-in-the-mail/#:~:text=I...


Seems to be a trend away from mean-pooling into a single embedding. But instead of dealing with an embedding per token (lots of them), you still want to reduce it some. This method seems to cluster token embeddings by random partitioning, mean-pool each partition, and concatenate the results into a fixed-length final embedding.

Essentially, full multi-vector comparison is challenging performance-wise, while tools and performance for single vectors are much better. As a compromise, cluster into k chunks and concatenate; then you can do a k-vector comparison at once with single-vector tooling and performance.

Ultimately the fixed length vector comes from having a fixed number of partitions, so this is kind of just k-means style clustering of the token level embeddings.

Presumably a dynamic clustering of the tokens could be even better, though that would leave you with a variable number of embeddings per document.
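
Toy numpy sketch of the fixed-partition pooling as I’m reading it (my own approximation; the method’s actual partitioning scheme may differ):

    import numpy as np

    def partition_pool(token_embs, k=8, seed=0):
        # token_embs: (n_tokens, d) -> fixed-length (k * d,) vector
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, k, size=len(token_embs))  # random partition
        pooled = np.zeros((k, token_embs.shape[1]))
        for i in range(k):
            members = token_embs[labels == i]
            if len(members):
                pooled[i] = members.mean(axis=0)  # mean pool within partition
        return pooled.ravel()  # concatenate into one fixed-length vector

    docs = [np.random.randn(n, 128) for n in (50, 200)]  # variable-length docs
    a, b = (partition_pool(x) for x in docs)
    print(a.shape, b.shape)  # same fixed shape regardless of token count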

