
> A 4090 amortized over 4 years, working days & hours, is 20 cents per working hour;

But that's not how it works: you need enough GPUs to accommodate peek usage, but a good fraction of that capacity isn't going to be running most of the time. You'd end up with a cost that's not too far from what cloud providers charge, which is roughly 3 times that price. And you need to pay for the whole server hosting these GPUs (this is less of a factor when you're using big GPUs like H100s, but if you want to stick with consumer-grade GPUs, the host is still a non-trivial fraction of the cost, and you're supporting a server for only a small group of concurrent users, which means your infra team ends up managing a massive pool of servers very quickly, with all the associated costs).
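For the sake of the arithmetic, here's a minimal sketch of both sides of that claim; the ~$1,600 purchase price and the 50% utilization are assumptions for illustration, not numbers from the thread:

    # Sanity check of the "20 cents per working hour" figure, plus the
    # effect of provisioning for peak. The $1,600 price and the 50%
    # utilization are assumptions, not numbers from the thread.
    gpu_price = 1600                  # USD, assumed RTX 4090 street price
    working_hours = 4 * 52 * 5 * 8    # 4 years of 8-hour working days
    cost_per_hour = gpu_price / working_hours
    print(f"amortized: ${cost_per_hour:.2f} per working hour")   # ~$0.19

    # If you provision for peak, part of the fleet sits idle and the
    # effective cost per *used* hour scales with 1 / utilization.
    utilization = 0.5                 # assumed fraction of hours actually busy
    print(f"effective: ${cost_per_hour / utilization:.2f} per used hour")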

> It's less than a penny per hour per concurrent on a task like this.

It's still two orders of magnitude more expensive than any other SaaS business.
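(For scale, the parent's penny figure only works out if one card serves a few dozen requests in parallel through batching; the batch size below is an assumption for illustration, not a measured number.)

    # How "less than a penny per hour per concurrent" can fall out of
    # ~$0.20/hour for the card. The 30 concurrent requests served by
    # batching is an assumed number, purely for illustration.
    gpu_cost_per_hour = 0.20
    concurrent_users = 30
    print(f"${gpu_cost_per_hour / concurrent_users:.4f} per concurrent user-hour")
    # ~$0.0067 -- under a penny, but still far more per user-hour than a
    # typical SaaS request costs in infrastructure.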

> Hopefully you're using LLM to deliver value that's worth more than a penny per hour of the people using it.

Maybe, but then again you're trying to build a service that has to add much more value than what the typical SaaS start-up provides.

Also regarding this:

> - Most apps are not non-stop token generation for concurrent users-- ChatGPT's duty cycle at this is very low.

ChatGPT is mostly being used by people who use it a few minutes per day, which is a nice place to be, but:

- this market is already taken by them, so your startup isn't gonna do the same.

- when you start integrating LLMs into tools you use routinely (an IDE being the typical example), the token generation amount skyrockets.




> It's still two orders of magnitude more expensive than any other SaaS business.

Really? Some SaaS businesses have users doing things that generate tens of thousands of I/Os per user request across spinning storage, or even far more.

> ChatGPT is mostly being used by people who use it a few minutes per day, which is a nice place to be, but:

I think you basically completely misunderstood everything I said. Here, the point was that someone using it is generating tokens a very large proportion of the time they're sitting in front of the service compared to most use cases-- but it's still only like 20% of the time.

We all have a pretty good understanding of the tradeoffs between owning hardware vs. elastic usage of a utility. We know that "peek usage" [sic] is higher than average (which is why there's a duty cycle correction in the calculation in the first place).
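A minimal sketch of that duty-cycle correction; the 20% figure is from the paragraph above, while the 2x peak-to-average ratio is an assumption for illustration:

    # Duty-cycle correction: a user sitting in front of the service only
    # generates tokens part of the time, so one concurrent GPU slot can
    # cover several signed-in users. The 20% duty cycle comes from the
    # comment above; the 2x peak-to-average ratio is an assumption.
    duty_cycle = 0.20        # fraction of a session spent generating tokens
    peak_to_average = 2.0    # assumed ratio you must provision for
    users_per_slot = 1 / (duty_cycle * peak_to_average)
    print(f"{users_per_slot:.1f} active users per concurrent GPU slot")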

> - when you start integrating LLMs into tools you use routinely (an IDE being the typical example), the token generation amount skyrockets.

It all depends. The system I just built and deployed does not need to be immediately responsive to end users (they can tolerate a delay of a couple of minutes), handles a few thousand tokens per user per week, and has usage smeared pretty evenly over a window of several hours per day. There are a lot of reasons (beyond economics) why moving it to a consumer GPU is attractive, but it won't be happy with a 1B-parameter model.
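For what it's worth, a back-of-the-envelope sizing for that kind of delay-tolerant workload; every concrete number below is an assumption for illustration, since the post only says "a few thousand tokens per user per week" spread over a several-hour daily window:

    # Back-of-the-envelope sizing for a delay-tolerant workload. All the
    # numbers here (user count, tokens/user, throughput, window) are
    # assumptions for illustration, not figures from the post.
    users = 1000
    tokens_per_user_per_week = 3000
    tokens_per_sec = 50               # assumed throughput, mid-size model on one consumer GPU
    window_sec = 6 * 3600 * 5         # assumed 6-hour daily window, 5 days a week

    weekly_tokens = users * tokens_per_user_per_week
    busy_fraction = weekly_tokens / (tokens_per_sec * window_sec)
    print(f"{busy_fraction:.0%} of one GPU's capacity used in the window")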


> "peek usage" [sic]

You are very smart indeed…


This whole subthread is based on you misreading the original assertion (from someone else) and being off by a couple of orders of magnitude-- then pretty badly misreading me.

There are plenty of reasons why firms will want to run this stuff on-prem, both for their own usage and as a service. It probably won't be the majority of usage, nor zero, but a small yet noticeable chunk.

Yes, it's more expensive than many things, but not anywhere close to the most expensive service that people choose to run on-prem. And you can still support a decent userbase from a few computers, depending upon what you're doing.



