fhdsgbbcaA's comments

I’m sure there’s absolutely zero chance that Sam Altman would lie about that, especially now that he’s gutted all oversight and senior-level opposition.

I have to say, showing you content from blocked channels is the most user-hostile thing I encounter on a daily basis.

The contempt for one’s users is such a defining feature of this era of late-stage tech.


DoubleClick slowly killed Google search because the best way to make money in display ads is to run clickbait.

On the one hand, Google paid good-quality websites more for trash content and engagement bait than for quality content. So they adapted to that new market reality.

Meanwhile, the real money maker - Search - gradually got filled up with lower quality content and now it’s imploding.

Google buying DoubleClick has a lot of parallels to what happened with Boeing.


Freelancers have existed since the dawn of journalism.

Even prestige publications like The New Yorker use freelancers. This is the same thing; it’s just lower-brow content.

That’s not a fair comparison; The New Yorker has always had a different relationship with its writers. A freelancer who writes for The New Yorker is likely a highly respected journalist/author/other luminary. Their staff writers are, I believe, technically contractors, as they’re not W-2 employees.

Contractor-written slop at these content farms, as described by TFA, has nothing in common with how content works at The New Yorker.


The New Yorker gets high tier freelancers, other outlets get dogshit freelancers. It’s the same underlying model.

This is not at all the same thing. The New Yorker pays its freelancers. In the example in the article, the money is flowing from the content producer to the publisher, meaning it's an ad.

They have literally run “native ads” for a decade, which are ads specifically designed to appear to be content from New Yorker writers.

https://www.marketingdive.com/news/the-new-yorker-jumps-into...


Also not good, but also not at all like freelancing. Freelancers are paid. Advertisers pay for placement.

You mentioned what is or isn’t an ad; my point is that the distinction is a lot less clear than you think, and it always has been.

While it’s good more people understand the business of news, this is all out in the open and has been for years.


I’m super confused as to why this is worth a blog post, let alone the conspiratorial tone.

This seems to be a case of knowledge without context being a dangerous thing in the wrong hands.


Looks like LLM inference will follow the same path as Bitcoin: CPU -> GPU -> FPGA -> ASIC.

I really doubt it. Bitcoin mining is quite fixed, just massive amounts of SHA256. On the other hand, ASICs for accelerating matrix/tensor math are already around. LLM architecture is far from fixed and currently being figured out. I don't see an ASIC any time soon unless someone REALLY wants to put a specific model on a phone or something.

Google's TPU is an ASIC and performs competitively. Also, Tesla and Meta are building something, AFAIK.

Although I doubt you could get a lot better, as GPUs already have half the die area reserved for matrix multiplication.


It depends on your precise definition of ASIC. The FPGA thing here would be analogous to an MSIC, where M = model.

It's clearly different to build a chip for a specific model than what a TPU is.

Maybe we'll start seeing MSICs soon.


LLMs and many other models spend 99% of their FLOPs in matrix multiplication. And the TPU initially had just a single operation, i.e. matrix multiply. Even if the MSIC is 100x better than a GPU at every other operation, it would only be about 1% faster overall.
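Back-of-envelope version of that argument (a toy Amdahl's-law calculation, with the 99% matmul share assumed from above):

  # Toy Amdahl's-law arithmetic: a model-specific chip that only speeds up
  # the non-matmul 1% barely moves the overall number.
  matmul_share = 0.99     # assumed fraction of time spent in matrix multiplies
  other_speedup = 100.0   # hypothetical 100x speedup on everything else

  overall = 1.0 / (matmul_share + (1.0 - matmul_share) / other_speedup)
  print(f"overall speedup: {overall:.3f}x")  # ~1.010x, i.e. about 1% faster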

You can still optimize various layers of memory for a specific model, make it all 8 bit or 4 bit or whatever you want, maybe burn in a specific activation function, all kinds of stuff.

No chance you'd only get 1% speedup on a chip designed for a specific model.


Apple has the Neural Engine, and it really speeds up many CoreML models: if most operators are implemented on the NPU, inference will be significantly faster than on the GPU on my MacBook M2 Max (and it has a similar NPU to those in e.g. the iPhone 13). Those NPU ASICs just implement many typical low-level operators used in most ML models.
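Something like this is all it takes to ask for NPU execution (a rough coremltools sketch; the toy model, shapes, and file name are just placeholders):

  # Hedged sketch: convert a tiny PyTorch model and request Neural Engine
  # execution via coremltools. The model and shapes are made-up placeholders.
  import torch
  import coremltools as ct

  net = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
  traced = torch.jit.trace(net.eval(), torch.randn(1, 256))

  mlmodel = ct.convert(
      traced,
      inputs=[ct.TensorType(shape=(1, 256))],
      convert_to="mlprogram",
      compute_units=ct.ComputeUnit.CPU_AND_NE,  # prefer the ANE over the GPU
  )
  mlmodel.save("tiny.mlpackage")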

99% of the time is spent on matrix-matrix or matrix-vector multiplication. Activation functions, softmax, RoPE, etc. basically cost nothing in comparison.

Most NPUs are programmable, because the bottleneck is data SRAM and memory bandwidth instead of instruction SRAM.

For classic matrix-matrix multiplication, the SRAM bottleneck is the number of matrix outputs you can store in SRAM. N rows and M columns get you N × M accumulator outputs. The calculation of the dot products can be split into separate steps without losing the N × M scaling, so the SRAM consumed by the row and column vectors is insignificant in the limit.
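A toy version of that tiling argument (arbitrary sizes, just to show the N × M accumulator tile staying resident while the K dimension is streamed in chunks):

  # Toy tiled matmul: the N x M accumulator tile is what must stay resident;
  # the K dimension is streamed in chunks, so row/column slices stay small.
  import numpy as np

  N, K, M, K_CHUNK = 64, 4096, 64, 256
  A = np.random.randn(N, K).astype(np.float32)
  B = np.random.randn(K, M).astype(np.float32)

  acc = np.zeros((N, M), dtype=np.float32)   # the resident N x M outputs
  for k0 in range(0, K, K_CHUNK):
      a_tile = A[:, k0:k0 + K_CHUNK]         # streamed slice of rows
      b_tile = B[k0:k0 + K_CHUNK, :]         # streamed slice of columns
      acc += a_tile @ b_tile                 # partial dot products

  assert np.allclose(acc, A @ B, rtol=1e-3, atol=1e-3)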

For the MLP layers in the unbatched case, the bottleneck lies in the memory bandwidth needed to load the model parameters. The problem is therefore how fast your DDR, GDDR, or HBM memory and your NoC/system bus let you transfer data to the NPU.
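To put rough numbers on that bound (illustrative figures only: a hypothetical 7B-parameter model at 8 bits per weight on a 100 GB/s memory system):

  # Unbatched decode is roughly bound by streaming the weights once per token.
  # All numbers below are illustrative assumptions, not measurements.
  params = 7e9              # hypothetical 7B-parameter model
  bytes_per_param = 1       # 8-bit weights
  mem_bandwidth = 100e9     # 100 GB/s, e.g. a modest LPDDR system

  bytes_per_token = params * bytes_per_param
  print(f"upper bound: ~{mem_bandwidth / bytes_per_token:.1f} tokens/s")  # ~14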

Having a programmable processor that controls the matrix multiplication function unit costs you silicon area for the instruction SRAM. For matrix-vector multiplication, the memory bottleneck is so big that it doesn't matter what architecture you are using; even CPUs are fast enough. There is no demand for getting rid of the not-very-costly instruction SRAM.

"but what about the area taken up by the processor itself?"

Hahahaha. Nice joke.

Wait... you were serious? The area taken up by an in-order VLIW/TTA processor is so insignificant I jammed it in between the routing gap of two SRAM blocks. Sure, the matrix multiplication unit might take up some space, but decoding instructions is such an insignificant cost that anyone opposing programmability must have completely different goals and priorities than LLMs or machine learning.


As far as I understand, the main issue for LLM inference is memory bandwidth and capacity. Tensor cores are already an ASIC for matmul, and they idle half the time waiting on memory.

You forgot to place "vertically-integrated unobtanium" after ASIC.

Soooo.... TPUv4?

Yes, but the kinds that aren't on the market.

LLM inference is a small task built into some other program you are running, right? Like an office suite with some sentence suggestion feature, probably a good use for an LLM, would be… mostly office suite, with a little LLM inference sprinkled in.

So, the “ASIC” here is probably the CPU with, like, slightly better vector extensions. AVX1024-FP16 or something, haha.


"would be… mostly office suite, with a little LLM inference sprinkled in."

No, it would be LLM inference with a little bit of an office suite sprinkled in.


Yeah, the most confusing thing to me is how the hell code that bad ever got so big.

It’s literally my go-to example of why the MVC design pattern is so good for web development. There is basically no view/controller separation - you can change core behavior of the backend in a template.
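A minimal sketch of the separation I mean (framework-free Python, not actual WordPress code): the controller owns behavior, the view only formats.

  # Minimal MVC-ish sketch (illustrative, not WordPress code): the controller
  # decides *what* happens, the view only decides *how it looks*.

  def get_published_posts(db):      # controller/model side: behavior lives here
      return [p for p in db if p["status"] == "published"]

  def render_post_list(posts):      # view side: no queries, no state changes
      return "\n".join(f"<li>{p['title']}</li>" for p in posts)

  db = [{"title": "Hello", "status": "published"},
        {"title": "Draft", "status": "draft"}]
  print(render_post_list(get_published_posts(db)))

  # The WordPress anti-pattern is a theme template calling query/update
  # functions directly, so "just the view" can change core backend behavior.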


Simple: Programmers think code quality matters.

In the real world, nobody gives a darn about your code quality. Zero interest. It just needs to work, keep working, and that’s it.

I assure you probably >90% of code in deployment right now doesn’t even have CI/CD.


WordPress has always had a phenomenal admin editing experience relative to what else existed. That, on top of a dogmatic adherence to backwards compatibility, made the free (and paid) theme/plugin ecosystem thrive.

Hey now, he spent at least 15 hours a week telling various groups of people at OpenAI petty lies in between dinners with Saudi princes and sundry well-heeled, gullible low-lifes.

That time is worth money!


He does strike me as the kind of guy who would eat Fido in one gulp if he saw any monetary advantage in it.
