I expect people to come to their senses when LLM companies stop subsidizing costs and start charging customers what it actually costs to train and run these models.
People don't want to guess which sized model is right for a task, and current systems are neither good nor efficient at estimating that automatically. I expect only power users to keep tweaking as performance plateaus; the average user will only change when it happens automatically.
> 7,32b models work perfectly fine for a lot of things
Like what? People always talk about how amazing it is that they can run models on their own devices, but rarely mention what they actually use them for. For most use cases, small local models will always perform significantly worse than even the most inexpensive cloud models like Gemini Flash.
Gemma 3n E4B has been crazy good for me - a fine-tune running on Google Cloud Run via Ollama, completely avoiding token-based pricing at the cost of throughput limits.
Take a computer screen with a full wash of R, G, or B. Sync the RGB display with your 2FA token, but run it at 15FPS instead of one code per minute.
Point the monitor at the wall, or desk, or whatever. Notice the radiosity and diffuse light scattering on the wall (and on the desk, and on the reflection on the pen cap, and on their pupils).
Now you can take a video that was purported to be taken at 1:23pm at $LOCATION and validate/reconstruct the expected "excess" RGB data and then compare to the observed excess RGB data.
What they say they've done as well is embed not just a "trace" of expected RGB values over time but also a data stream (eg: a 1FPS PNG) which kind of self-authenticates the previous second of video.
Obviously it's not RGB, but "noise" in the white channels, and not a PNG, but whatever other image compression they've figured works well for the purpose.
In the R, G, B case you can imagine that it's resistant to (or durable through) most edits (eg: cuts, reordering), and it's interesting that they're talking about detecting if someone has photoshopped a vase of flowers into the video (because they're also encoding a reference video/image in the "noise stream").
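The core trick above can be sketched in a few lines. This is my own hypothetical construction (an HMAC keyed on a 1/15-second frame index, TOTP-style), not whatever the actual product does; the secret and the byte-to-color mapping are made up for illustration:

```python
import hashlib
import hmac
import struct

FPS = 15  # codes per second instead of one code per minute

def frame_color(secret: bytes, t: float) -> tuple[int, int, int]:
    """Derive a per-frame RGB wash from a shared secret.

    Same idea as TOTP, but the counter is the 1/15-second frame
    index rather than a 30-second time window.
    """
    frame_index = int(t * FPS)
    digest = hmac.new(secret, struct.pack(">Q", frame_index),
                      hashlib.sha256).digest()
    # Use the first three digest bytes as a (dim) full-screen tint.
    return digest[0], digest[1], digest[2]

# A verifier holding the same secret can reconstruct the expected
# color sequence for any claimed timestamp and compare it against
# the diffuse light actually observed in the footage.
secret = b"shared-2fa-seed"  # hypothetical shared secret
print(frame_color(secret, 1_700_000_000.0))
```

The verifier never needs the camera: it only needs the secret and the claimed wall-clock time to regenerate the expected "excess" RGB trace.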
I'm not part of Oxide, but I think you're assuming everyone is okay with not controlling the hardware they run on.
There is plenty of addressable market where a public cloud or even a colocation of hardware doesn't make sense for whatever bespoke reason.
I don't think their target customer is startups, and that's okay. You likely aren't their target customer. But they've identified a customer profile, and want to provide a hardware + software experience that hasn't been available to that customer before.
It's in the ballpark if you include all energy sources for the family.
100 rooms times, say, 50W each (5kW) is about 43,800kWh per year. That's over 10 UK families of 4-5 (4,100kWh/yr) for electricity, or roughly 2 if you include gas usage. So for Americans, it's probably much closer to parity.
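Spelling out the back-of-envelope arithmetic (using the 4,100 kWh/yr electricity figure quoted above):

```python
HOURS_PER_YEAR = 24 * 365  # 8760

hotel_kw = 100 * 50 / 1000              # 100 rooms at ~50 W each = 5 kW
hotel_kwh = hotel_kw * HOURS_PER_YEAR   # 5 kW running year-round

uk_family_kwh = 4100  # electricity only, family of 4-5

print(hotel_kwh)                   # annual hotel load in kWh
print(hotel_kwh / uk_family_kwh)   # equivalent number of UK families
```

The ratio comes out a bit over 10 families on electricity alone, matching the claim above.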
The fridge does dump heat into the room, so it carries a small extra penalty for the aircon in hot countries; in cold countries it offsets heating a little, though inefficiently compared to a heat pump.
As an American, I'd rather a single well-established player get a large contract and actually deliver than 20 disjointed companies each get 1/20th of the problem, have to work in concert, and possibly not deliver at all.
There's margin (on BOM cost), and then there's profit margin (above the design cost).
Design costs are probably an order of magnitude higher in the USA than in China, and can't be spread over hundreds of thousands of units. I'd bet the profit isn't that great.
How else would an LLM distinguish what is widely known, given there are no statistics collected on the general population's awareness of any given celebrity's vices? Robo-apologetics in full force here.
But we're still in the hype phase; people will come to their senses once large-model performance starts to plateau.