float8 got a mention! 2x the FLOPs! Also xformers has 2:4 sparsity support now, so another 2x? Is Llama 3 gonna use float8 + 2:4 sparsity for the MLP, so 4x H100 float16 FLOPs? PyTorch has experimental fp8 support, whilst attention is still complex to do in float8 due to precision issues, so maybe attention is in float16, and RoPE / layernorms in float16 / float32, whilst everything else is float8?
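Roughly what I mean at the tensor level - a toy sketch of the 2:4 pattern (keep the two largest-magnitude values in every group of four) plus PyTorch's experimental float8 dtype, definitely not whatever Llama 3 actually does:

```python
import torch

def prune_2_to_4(w: torch.Tensor) -> torch.Tensor:
    """Zero the 2 smallest-magnitude values in every group of 4 along the last dim (2:4 sparsity)."""
    groups = w.reshape(-1, 4)
    idx = groups.abs().argsort(dim=1)[:, :2]   # indices of the two smallest |values| per group
    pruned = groups.clone()
    pruned.scatter_(1, idx, 0.0)
    return pruned.reshape(w.shape)

w = torch.randn(8, 8)
w_sparse = prune_2_to_4(w)
# experimental float8 storage (E4M3); real speedups need the fused sparse/fp8 GPU kernels
w_fp8 = w_sparse.to(torch.float8_e4m3fn)
```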
I was thinking why is this one guy on HN so deeply interested and discussing technical details from a minor remark. Then I clocked the name. Great work on Gemma bugs
A LUT could be a significant performance penalty, would it not? Instead of a float8 (potentially multiple, in the SIMD case) in a register, you're now having to head out to at least L1 cache to dereference the value in the LUT.
Plain uint8 wouldn't allow for the same dynamic range as float8, and it's that range, not the precision (which uint8 would win at the largest values it can represent), that counts most.
Oh, was just gonna comment as well, but saw this! I think x86 has pshufb for LUTs (used them ages ago, but have forgotten the details :() I think some game (was it Spider-Man?) also used loads of lookup tables.
The issue with LUTs is, don't you have to update the LUT itself? You can select which memory address to load up, but the LUT itself has to be differentiable, maybe? TBH I'm not an expert on LUTs.
On fixed point - similarly ye you have to fix the precision ranges as well, so again I'm unsure on how one changes the fixed point numbers over time. I'll have to read more on fixed point.
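FWIW, in the weight-quantization schemes I've seen, the LUT is just a small fixed codebook and the stored low-bit codes index into it; nothing about the table itself needs to be differentiable for inference. A rough sketch with a made-up 16-entry codebook (the names here are illustrative, not any particular library's API):

```python
import torch

codebook = torch.linspace(-1.0, 1.0, 16)   # hypothetical fixed LUT of 16 representable values

def dequantize(codes: torch.Tensor, scale: float) -> torch.Tensor:
    """codes are uint8 indices into the codebook; scale is a per-tensor (or per-block) scale."""
    return codebook[codes.long()] * scale

codes = torch.randint(0, 16, (4, 4), dtype=torch.uint8)
w = dequantize(codes, scale=0.05)
```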
Maybe 1.58-bit using (-1, 0, 1), which gets rid of multiplications and leaves just additions, might be more useful, although you'll only get a 2x FLOP boost since you still need fp8 or fp16 addition.
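A toy illustration of why ternary (-1, 0, 1) weights turn the matmul into additions and subtractions - the accumulation still happens in a higher-precision dtype, which is where the fp8/fp16 addition cost comes from. This is just the arithmetic idea, not how BitNet-style kernels are actually written:

```python
import torch

def ternary_matmul(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """x @ w with w in {-1, 0, +1}: add the inputs where w=+1, subtract where w=-1."""
    out = torch.zeros(x.shape[0], w.shape[1])
    for j in range(w.shape[1]):
        col = w[:, j]
        out[:, j] = x[:, col == 1].sum(dim=1) - x[:, col == -1].sum(dim=1)
    return out

x = torch.randn(2, 8)
w = torch.randint(-1, 2, (8, 4)).float()
assert torch.allclose(ternary_matmul(x, w), x @ w, atol=1e-5)
```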
Nope. Moreover, simulating it even with AVX-512 is quite an experience. Been postponing it for 2 years now... But first of all, you need to choose the version of float8 you want to implement, as the standards differ between GPU vendors.
Well, those smaller floats require less BW to transfer back and forth as well. Perhaps not a reduction linear in the size of the float, as maybe smaller floats require more iterations and/or more nodes in the model graph to get an equivalent result.
But rest assured there's an improvement, it's not like people would be doing it if there wasn't any benefit!
The impact on bandwidth is the main reason smaller is better, I believe, certainly when it's the bottleneck. I'm only really familiar with CPU, but with say FP16 you might convert back to FP32 when you're doing the actual multiplication (so conversion plus multiplication is actually slower), but because you're moving half the data on and off you still get a huge speedup.
I can't remember which research paper it was, but even if you do float32 multiplications, keeping the data in bfloat16 by simply truncating the lower mantissa bits and packing it still gets you speedups, since matrix multiplication is bound by both compute and cache access. If you can optimize the cache side of things, the speedups are definitely there.
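Something like this, roughly (a CPU-side sketch of the truncation trick; the actual packed-GEMM kernels are obviously more involved):

```python
import torch

def truncate_to_bf16_bits(x: torch.Tensor) -> torch.Tensor:
    """Zero the low 16 bits of each float32, leaving bfloat16-precision values in float32 storage."""
    bits = x.view(torch.int32)          # reinterpret the bytes, no copy
    return (bits & ~0xFFFF).view(torch.float32)

x = torch.randn(4, 4)
x_trunc = truncate_to_bf16_bits(x)
# differs from a bfloat16 round-trip only by rounding mode (truncation vs round-to-nearest)
print((x_trunc - x.to(torch.bfloat16).float()).abs().max())
```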
I'm not sure exactly how NVIDIA calculates FLOPs, but I do know Intel's are calculated from how many FMA units there are, how many loads can be done in tandem, and what the throughput is. And ye, fp8 requires 2x less space. The 2:4 sparsity gain might be less pronounced, since the dense matrix first needs to be reconstructed on the fly, and there's a small matrix of indicator values.
Oh, so float8's L2 norm error from float32 is around 1e-4 I think, whilst float16's is 1e-6. Sadly attention is quite sensitive. There are some hybrid methods which, just before the attention kernel (which is done in fp8), upcast the Q and K from the RoPE kernel to float16, whilst leaving V in float8. Everything is done in fp8 on the fly, and the output is fp8. This brings the error down to 1e-6.
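As a rough eager-mode sketch of that hybrid idea (float32 compute here just to keep it CPU-friendly; the real fused kernels do the matmuls in fp16/fp8 on the device):

```python
import math
import torch

def hybrid_fp8_attention(q8, k8, v8, compute_dtype=torch.float32):
    """Q/K/V arrive in float8 storage; Q and K are upcast for the score matmul,
    V is upcast only at the point of use, and the output is written back as fp8."""
    q, k = q8.to(compute_dtype), k8.to(compute_dtype)
    scores = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(q.shape[-1]), dim=-1)
    out = scores @ v8.to(compute_dtype)
    return out.to(torch.float8_e4m3fn)

x = torch.randn(1, 4, 64)
to8 = lambda t: t.to(torch.float8_e4m3fn)
out = hybrid_fp8_attention(to8(x), to8(x), to8(x))
```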
Yes, but it's a bit more complicated. There are 2 FP8 formats: E5M2 and E4M3.
E5M2 is basically an IEEE 754 format. To compensate for the smaller exponent, "E4M3's dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs".
Some people reported E4M3 is better for the forward pass (small range, more precision) and E5M2 is better for the backward pass (bigger range, less precision). And most implementations have some sort of scaling or other math tricks to shrink the error.
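You can see the trade-off directly from the dtype metadata in PyTorch (assuming a recent build with the experimental float8 dtypes):

```python
import torch

for dt in (torch.float8_e4m3fn, torch.float8_e5m2, torch.float16):
    fi = torch.finfo(dt)
    print(f"{str(dt):22s} max={fi.max:>9.1f}  smallest normal={fi.tiny:.1e}  eps={fi.eps:.1e}")
# E4M3: smaller range (max 448) but finer steps; E5M2: fp16-like range (max 57344) but coarser.
```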
Fair points! Ye, PyTorch's fp8 experimental support does scaling of the gradients. Interesting point on more precision for the forward pass and a larger range for the gradients! I did not know that - so learnt something today!! Thanks! I'll definitely read that paper!
Having lived through the dot-com era, I find the AI era slightly dispiriting because of the sheer capital cost of training models. At the start of the dot-com era, anyone could spin up an e-commerce site with relatively little infrastructure cost. Now, it seems, only the hyper-scale companies can build these AI models. Meta, Google, Microsoft, OpenAI, etc.
I’m not sure we went through the same dot-com era, but in my experience, it was extremely expensive to spin up anything. You’d have to run your own servers, buy your own T1 lines, develop with rudimentary cgi… it was a very expensive mess - just like AI today
Which gives me hope that - like the web - hardware will catch up and stuff will become more and more accessible with time
> I’m not sure we went through the same dot-com era, but in my experience, it was extremely expensive to spin up anything. You’d have to run your own servers, buy your own T1 lines, develop with rudimentary cgi… it was a very expensive mess - just like AI today
To make your own competing LLM today you need hundreds of millions of dollars; that kind of "very expensive" is on a whole different level. You could afford the things you talked about on a software engineering salary. It would be a lot of money for that engineer, but at least he could do it; there's no way anyone but a billionaire could fund a new competing LLM today.
I think the foundation models are a commodity, anyway. The bulk of the economic value, as usual, will be realized at the application layer. Building apps that use LLMs, including fine-tuning them for particular purposes, is well within reach even of indie/solo devs.
That’s why Sam Altman makes so much noise about “safety” - OpenAI would really like a government-backed monopoly position so they can charge higher rents and capture more of that value for themselves. Fortunately, I think that llama has already left the barn.
I think openai/anthropic/etc are banking on foundation models being the equivalent of the "datacenters" or AWS-equivalents of AI - there'll be PaaSes (eg replicate), and most businesses will just pay the "rent"
Only if you're creating a foundation model. The equivalent would be competing with a well-funded Amazon, back in 1999. You can compete in building LLM-powered products with much, much less money - less than a regular web app in 99
If the government can stay back far enough that more than one AI company can train their models, it will end up working like steel mills - barely enough profit to pay the massive cost of capital due to competition. If the government regulates the industry into a monopoly, all bets are off. Their investors are going to push hard for shutting the door behind them so watch out.
The only question is - what tactic? I don't really know, but one trick I am aware of is "specifying to the vendor." In other words, the introduction of regulatory requirements that are at every step in the process a description of the most favored vendor's product. As the favored players add more features, potentially safety features, those features are required in new regulations, using very specific descriptions that more or less mandate that you reproduce the existing technology, to use a software engineer's term, bug-for-bug. If your product is better in some ways but worse in others, you might have a chance in the market - but to no avail, if the regulations demand exactly the advantages of the established suppliers.
They were on top for a while, but later fell behind because they didn't invest. There were heavy tariffs in place to "protect" the monopoly from foreign competition.
If privately arising monopolies could only be kept from buying out their regulators, they'd privately break down before they became too odious... for example Google, which for years was the only remotely good search engine, is now merely one of the better search engines. If there had been a "department of consumer information safety," staffed by the best industry professionals status can buy, that might have not happened.
This is typically called a high fixed cost business, like airlines, hotels/apartments, SpaceX, etc.
The dream may be barriers to entry that allow high margins (“rents” if you prefer the prejudicial), but all too often these huge capital costs bankrupt the company and lose money for investors (see: WeWork, Magic Leap). It is high risk, high return. Which seems fair.
Nothing in the WeWork business model is inherently capital intensive. Fundamentally they just take out long-term office leases at low rates, then sublease the space to short-term tenants at higher rates. They don't really own major assets and have no significant IP.
I understand the economics concept. I'm not sure WeWork was a great example; it had significant other challenges, such as a self-dealing founder and, frankly, a poor long-term model.
I would wager that the concept needs a bit of a refresh. Historically it has referred to high capital costs for the production of a hard good, but in this case there is more than just a good being produced: there's a fair bit of influence and power associated with the good, and a ton of downstream businesses that will be reliant upon it if it goes according to plan.
Agreed, and Magic Leap had its own problems. My point was just that “invest huge amounts of capital to create a moat and then monetize in the long run”
is an inherently risky strategy. Business would not work if society insisted that large, high-risk investments could not produce higher long-term margins than less risky investments.
It's more like "disrupting the market". The problem is that it's a whole market.
Uber just now turned its first profit since 2009, and I would wager that if not for the newly found appreciation of efficiency and austerity, it would still be burning through money like a drunken socialist sailor.
Classic approach required basic math. "Here is my investment, here is what I am going to charge for rent". You actually can figure out when your investment starts paying off.
This new "model" requires tall, loud, truth-massaging founders to "charm" VCs into giving away billions, with the promise of trillions, I guess. The founders do talk about conquering the world, like, a lot.
I do not know what the WeWork investors were thinking when they expected standard real estate to "10x" their money while the tenants were drinking free beer on tap. The whole thing screamed "scam" even to a lay-person.
So far it's been pretty "democratic" - I feel in no way disadvantaged because I can't train a foundation model myself. Actually the ecosystem is a lot better than 25 years ago - there are open source (or source available) versions of basically everything you'd want to participate in modern AI/ML.
I too went through the dot com era: as in when Sun Microsystems had the tag line "we are the dot in dot com".
I assure you that before Apache and Linux took over that "dot" in the .com was not cheap!
Fortunately it only really lasted maybe 1993-1997 (I think Oracle announced Linux support in 1997, and that allowed a bunch of companies to start moving off Solaris).
But it wasn't until after the 2001 crash that people started doing sharded MySQL and then NoSQL to scale databases (when you needed it back then!).
It's early. You can do LoRA training now on home systems, and for $500 you can rent enough compute to do even more meaningful fine-tuning. Let's see where we are in 5 and 10 years' time.
(Provided the doomers don't get LLMs banned of course!)
Another way to compete with the big tech incumbents is instead of hardware, try maths and software hacks to level the playing field! Training models is still black magic, so making it faster on the software side can solve the capital cost issue somewhat!
That's labour and human capital intensive, not capital intensive. And I don't mean this as a technically correct nitpick: in terms of economics it's more accurate to call it the exact opposite of capital intensive.
That’s a good point, I wanted to make the point that doing the research is also incredibly expensive because it requires some of the smartest people around, and the right background (and what even is that background?)
Ye not a bad point - also agree with djhn on stuff.
It's true it'll still be relatively expensive - but I would propose it's relatively inexpensive if people want to make it faster and have the drive to do it :) On the other hand, capital expenditure requires large amounts of money, which also works.
I guess some general CUDA, some maths, knowing how to code transformers from scratch, some operating systems and hardware knowledge, and the constant drive to read new research papers + wanting to make things better.
I just think that as humans, if we have drive, we can do it no matter the constraints!
Yes, I agree with the general idea that it's not easy. Yet at least to some extent it might allow people and/or nations with (some degree of, relative) lack of capital but high levels of education and innovation to benefit and catch up.
I find the market way more open and competitive than dot-com. Everyone is throwing up a chatbot or RAG solution. There are tradesmen and secretaries and infinite 19-year-olds who are now able to wire together a no-code app or low-code bot and add value to real businesses. The hyperscalers are making some money but absolutely don't have this locked up. Any Groq or Mistral could wander in and eat their lunch, and we haven't really started the race yet. The next decade will be ridiculous.
Could not have said it better. Nobody has won the race yet and things are getting better. Building a foundation model is not cheap but not out of reach still for a startup.
We will probably get there, it's just going to take time for hardware supply chains to catch up. I feel it's more comparable to mainframe eras - it took time for general purpose computing to become commoditised.
I know we won't get this from FB, but I'd be really interested to see how the relationship of compute power to engineering hours scales.
They mention custom building as much as they can. If FB magically has the option to 10x the compute power, would they need to re-engineer the whole stack? What about 100x? Is each of these re-writes just a re-write, or is it a whole order of magnitude more complex?
My technical understanding of what's under the hood of these clusters is pretty surface level- super curious if anyone with relevant experience has thoughts?
I'm not 100% sure, but I would make an educated guess that the cluster in the first image, for example, is one instance of a scalable design, so throwing more hardware at it could bring improvements, but sooner or later the cost-to-improvement ratio will call for an optimization or rewrite as you call it - so a bit of both, usually. It seems a bit of a balancing act really!
The cost of training quickly outpaces the cost of development as context length increases. So hardware is cheap until it isn't anymore, by orders of magnitude.
But there is still significant cost in the physical build-out of new pods/DCs, and in the human engineering hours to physically build them, even though it's a mix of resources across the vendors and FB. It would still be interesting to know the man-hours that went into the physical build of the HW.
So, I'd love to work on optimizing pipelines like this. How does one "get into" it? It seems a ML scientist with some C/C++ and infra knowledge just dips down into the system when required? Or is it CUDA/SIMD experts who move "up" into ML?
I know someone who works on this at Meta. His resume is computer science heavy, with a masters in Machine Learning. On the previous-experience side, before getting into Meta he had about a decade working as a Software Engineer on Machine Learning systems in multiple languages, such as Go, C++ and Python.
To get the job he applied for a Software Engineer (applied Machine Learning) spot and went through the multiple-step interview process; when he got the job he did a few weeks of training and of interviewing with teams. One of the teams in charge of optimizing ML code at Meta picked him up and now he works there.
Because of Meta's scale, optimizing code to save a few ms or watts has a huge impact on the bottom line.
In sum:
- Get a formal education in the area
- Get work experience somewhere
- Apply for a Software Engineer (applied ML) role at a big tech company
- Hope they hire you and have a spot in one of the teams in charge of optimizing stuff
This is helpful thank you. There's always some luck.
I have a PhD in CS, and lots of experience in optimization and some in throughput/speedups (in an amdahl sense) for planning problems. My biggest challenge is really getting something meaty with high constraints or large compute requirements. By the time I get a pipeline set up it's good enough and we move on. So it's tough to build up that skillset to get in the door where the big problems are.
A lot of the optimisation at this level is getting data into the right place at the right time, without killing the network.
It's also a group effort to provide simple-to-use primitives that "normal" ML people can use, even if they've never touched a hyperscale cluster before.
So you need a good scheduler that understands dependencies (no, the k8s scheduler(s) are shit for this, plus they won't scale past 1k nodes without eating all of your network bandwidth), then you need a dataloader that can provide the dataset access, then you need the IPC that allows sharing/joining of GPUs together.
All of that needs to be wrapped up in a Python interface that's fairly simple to use.
Oh, and it needs to be secure, pass an FTC audit (i.e. you need to prove that no user data is being used), and have high utilisation efficiency and uptime.
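For a sense of what that "simple Python interface" ends up looking like to the ML user, here's the generic open-source shape of it (torchrun + DDP + a distributed sampler); Meta's internal scheduler and dataloader stack is its own thing, so treat this purely as an illustration of the kind of primitives involved:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# launched with one process per GPU, e.g. `torchrun --nproc_per_node=8 train.py`;
# the NCCL collectives ride on whatever fabric is underneath (RoCE or InfiniBand)
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

dataset = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 1))
sampler = DistributedSampler(dataset)                     # each rank reads only its own shard
loader = DataLoader(dataset, sampler=sampler, batch_size=32)

model = DDP(torch.nn.Linear(128, 1).cuda(), device_ids=[local_rank])  # all-reduce hidden in the wrapper
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for x, y in loader:
    loss = torch.nn.functional.mse_loss(model(x.cuda()), y.cuda())
    opt.zero_grad(); loss.backward(); opt.step()
```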
can you say more about the network issues with thousands of k8s nodes? I'm regularly running 2-3000 nodes in a GKE cluster, majority have GPUs, is this something I need to be worrying about?
Only if you are paying for the network bandwidth. For example, if there are nodes spanning more than one zone and you pay for that traffic, you might want to think about moving stuff to a single zone.
For other settings, moving to something like opencue might be better (caveats apply)
At the end of the day you are still moving, storing and manipulating 1's and 0's, whether you are a front-end engineer or a back-end engineer or a systems engineer or an ML engineer or an infra engineer.
Well, at least I fit my resume to match the 'job description', because at the end of the day it's all hallucinations, and 'real' software engineers with core computer science skills can literally do anything.
Our group works on some of this stuff at Meta, and we have a pretty good diversity of backgrounds - high performance computing (the bulk), computer systems, compilers, ML engineers, etc. We are hiring.
Significantly more than that; MFN pricing for NVIDIA DGX H100 (which has been getting priority supply allocation, so many have been suckered into buying them in order to get fast delivery) is ~$309k, while a basically equivalent HGX H100 system is ~$250k, which works out to ~$31.5k per GPU at the full-server level. With Meta's custom OCP systems integrating the SXM baseboards from NVIDIA, my guess is that their cost per GPU would be in the ~$23-$25k range.
It is a real loophole in the economy. If you're a trillion dollar company the market will insist you set such sums on fire just to be in the race for $current-hype. If they do it drives their market cap higher still and if they don't they risk being considered un-innovative and therefore doomed to irrelevancy and the market cap will spiral downwards.
The thing is, this could be considered basic research, right? Basic research IS setting money on fire until (and if) that basic research turns into TCP/IP, Ethernet and the Internet.
Funnily enough Arpanet and all that Xerox stuff were like <$50 million (inflation adjusted!) total. Some real forward thinkers were able to work the system by breaking off a tiny pittance of a much larger budget.
Whereas I think this can more appropriately be considered the Meta PR budget. They simply can't not spend it; it would look bad to Wall Street. Have to keep up with the herd.
Funny you pick a company that has very little to answer to the markets; out of all the large tech companies, Meta is the rare one that does not need to answer, because Zuckerberg controls the company.
> Funnily enough Arpanet and all that Xerox stuff were like <$50 million (inflation adjusted!) total.
That doesn't say much. The industry was in utter infancy. How much do you think it cost to move Ethernet from 100 Mbit/s to 1 Gbit/s, then to 10, 100, 400, and 800 Gbit/s? At least one or two orders of magnitude more.
How about the cost to build a fab for the Intel 8088 versus a fab that produces 5nm chips running @ 5GHz. Again, at least one or two orders of magnitude.
This suffers from hindsight bias; at the time it was impossible to know whether Arpanet or flying cars was the path forward. A better comparison would be the total investment-to-payoff ratio, which is not something we can see from where we are now. Only in the future does it make sense to evaluate the success of something. Unfortunately, comparison between eras is difficult to do fairly because conditions are so different between now and Xerox's time.
> If you're a trillion dollar company the market will insist you set such sums on fire just to be in the race for $current-hype. If they do it drives their market cap higher still and if they don't they risk being considered un-innovative and therefore doomed to irrelevancy and the market cap will spiral downwards.
You don’t think earning increasing amounts of tens of billions of dollars in net income per year at some of the highest profit margins in the world at that size for 10+ years has anything to do with market cap?
$1T Market Cap lets it be known it will invest $10B a year into $current-hype that will change everything. P/E loosens speculatively on sudden new unbounded potential, Market Cap $1.1T. Hype funded. PR as innovator cemented.
Which is a fourth of what they spent on VR/AR in a year. And Gen AI is something they could easily get more revenue from, as it has now become proven technology, and Meta could possibly leapfrog others because of their data moat.
Proven technology, maybe, but proven product-market fit for the kinds of things Facebook is using it for? Their linked blog about AI features gives examples "AI stickers" and image editing... cool, but are these potential multi-billion dollar lifts to their existing business? I guess I'm skeptical it's worthwhile unless they're able to unseat ChatGPT with a market-leading general purpose assistant.
I have a few group chats that just devolve into hours of sending stickers or image generations back and forth; lately we've been "writing a book together" with @Meta AI as the ghost writer, and while it utterly sucks, it's been a hilarious shared experience.
I don't think anyone else has gotten that group chat with AI thing so nailed.
On the podcast TrashFuture, November Kelly recently described AI systems as “garbage dispensers” which is both a funny image (why would anyone make a garbage dispenser??) and an apt description. Certainly these tools have some utility, but there are a load of startups claiming to “democratize creativity” by allowing anyone to publish AI generated slop to major platforms. On the podcast this phrase was used during discussion of a website which lets you create AI generated music and push it to Spotify, a move which Spotify originally pushed back on but has now embraced. Garbage dispenser indeed.
> unseat ChatGPT with a market-leading general purpose assistant.
It's not impossible. The prediction from many (not that I believe it) is that over the long run modelling tricks will become common knowledge and the only things that matter are compute and data, both of which Meta has.
Also, there could be a trend of LLMs for ads or feed recommendation in the future, since they have a large, completely unstructured dataset per user across multiple sites.
Compute, data, and most importantly distribution/users.
IMO standalone AI companies like OpenAI might be successful by providing infrastructure to other companies, but I can’t imagine ChatGPT remaining #1 many years from now.
The web is still trending towards being a walled garden. Maybe not right now, but long term I think people will use whatever AI is most convenient which probably will be AI built into a giant company with established user base (FB, GOOG, MSFT, and Apple if they ever get around to launching - would love Siri 2.0 if it meant not needing to open the ChatGPT iOS app)
What moat exactly? Much of the user data they have access to is drying up due to new regulations, some of which prohibit IIRC direct use on models as well. I'm not even sure they can use historical data.
Meta certainly has an edge in engineer count, undoubtedly. But I'd say they really, really want the metaverse to succeed more in order to have their own walled garden (i.e. power equivalent to the Apple and Google stores, etc.). There's a reason they gave a hard pass to a Google partnership.
> There's a reason they gave a hard pass to a Google partnership.
AIUI, Google required Meta to basically cede control of a partnered OS to them:
"After years of not focusing on VR or doing anything to support our work in the space, Google has been pitching AndroidXR to partners and suggesting, incredibly, that WE are the ones threatening to fragment the ecosystem when they are the ones who plan to do exactly that.
"We would love to partner with them. They could bring their apps to Quest today! They could bring the Play store (with its current economics for 2d apps) and add value to all their developers immediately, which is exactly the kind of open app ecosystem we want to see. We would be thrilled to have them. It would be a win for their developers and all consumers and we’ll keep pushing for it.
"Instead, they want us to agree to restrictive terms that require us to give up our freedom to innovate and build better experiences for people and developers—we’ve seen this play out before and we think we can do better this time around."
I think the raw text inside Facebook groups is at least as valuable as Reddit data. Even if demographics data is restricted under European law, the raw text of people interacting is quite valuable.
That ignores all the user groups that are on Facebook. From apartment communities (think Nextdoor) to grief support counseling to mindfulness therapy groups, there's a wealth of user comments of a tad higher quality than Uncle John's racist rants.
Facebook's downfall will be their lock-in. Every other social media platform lets you view a public profile, discussion groups, etc. It's all locked inside Facebook.
> Much of the user data they have access to is drying up due to new regulations, some of which prohibit IIRC direct use on models as well.
Source would be appreciated, because this is the opposite of obvious. Regulations against using public first-party data would be big news, and I haven't heard of anything like that. They use my data for recommending my feed, so why not for answering my question?
First party data alone can't tell you whether an ad resulted in a sale, unless you own the entire process on your platform. Contrast this with what Apple has via its app store; the fees do more than generate money.
It's often forgotten now, but just a few years ago NVIDIA was cancelling production batches and writing down inventory when the GPU shortage cleared. No one needed more GPUs. That also happens to be when Meta first announced they were going to increase CapEx spending on compute.
I’m guessing that Meta got a sweetheart deal to help take a lot of inventory for NVidia and make commitments for future purchases.
I don’t think it was that nobody needed GPUs. It was that nvidia thought they could get scalper margins by restricting supply after the shortage showed people were willing to pay scalper prices.
SemiAnalysis posted recently noting that Meta locked in these purchases a while ago, something like a year or more, so they probably didn't pay today's spot rate.
I think it’s always useful to pay attention to the history on stuff like this and it’s a rare pleasure to be able to give some pointers in the literature along with some color to those interested from first-hand experience.
I’d point the interested at the DLRM paper [1]: that was just after I left and I’m sad I missed it. FB got into disagg racks and SDN and stuff fairly early, and we already had half-U dual-socket SKUs with the SSD and (increasingly) even DRAM elsewhere in the rack in 2018, but we were doing huge NNs for recommenders and rankers even for then. I don’t know if this is considered proprietary so I’ll play it safe and just say that a click-prediction model on IG Stories in 2018 was on the order of a modest but real LLM today (at FP32!).
The crazy part is they were HOGWILD trained on Intel AVX-2, which is just wild to think about. When I was screwing around with CUDA kernels we were time sharing NVIDIA dev boxes, typically 2-4 people doing CUDA were splitting up a single card as late as maybe 2016. I was managing what was called “IGML Infra” when I left and was on a first-name basis with the next-gen hardware people and any NVIDIA deal was still so closely guarded I didn’t hear more than rumors about GPUs for training let alone inference.
350k Hopper this year, Jesus. Say what you want about Meta but don’t say they can’t pour concrete and design SKUs on a dime: best damned infrastructure folks in the game pound-for-pound to this day.
The talk by Thomas “tnb” Bredillet in particular I’d recommend: one of the finest hackers, mathematicians, and humans I’ve ever had the pleasure to know.
FB does not have the flywheel of running data centres for others - all three of those mentioned run hyperscale datacentres that they can then juice by "investing" billions in AI companies, who then turn around and hand those billions straight back to the investors as revenue.
OpenAI takes money from MSFT and buys Azure services
Anthropic takes Amazon money and buys AWS services (as do many robotics companies, etc.)
I am fairly sure it’s not illegal but it’s definitely low quality revenue
How is it free equity? Spending money to invest it somewhere involves risks. You might recover some of it if the investment is valued by others, but there is no guarantee.
You do not need cash in hand to invest. Instead, you print your own money (AWS credits) and use that to drive up the valuation, because this money costs you nothing today.
It might cost tomorrow, though, when the company starts to use your services. However, depending on the deal structure, they might not use all the credit, might go belly-up before the credit is used, or might get bought up by someone with real cash.
Neither did AWS when they started. They were just building out data centers to run their little book website and decided to start selling the excess capacity. Meta could absolutely do the same, but in the short term, I think they find using that capacity more valuable than selling it.
> Neither did AWS when they started. They were just building out data centers to run their little book website and decided to start selling the excess capacity.
This is a myth. It simply isn't true. AWS was conceived as a greenfield business by its first CEO. Besides, S3 and SQS were the first AWS services; EC2 didn't appear till later. And it wasn't built from excess Amazon server capacity; it was totally separate.
Unless you've worked at Amazon, Microsoft, Google, and Facebook, or a whole bunch of datacenter providers, I'm not sure how you could make that claim. They don't really share that information freely, even in their stock reports.
Heck I worked at Amazon and even then I couldn't tell you the total datacenter space, they don't even share it internally.
This would be an interesting dataset to use for trading decisions (or sell to hedge funds).
But I wonder how much of their infrastructure is publicly mappable, compared to just the part of it that's exposed to the edge. (Can you map some internal instances in a VPC?)
That said, I'm sure there are a lot of side channels in the provisioning APIs, certificate logs, and other metadata that could paint a decently accurate picture of cloud sizes. It might not cover everything but it'd be good enough to track and measure a gradual expansion of capacity.
Then you should be aware that, for the longest time, Google was against multiple floors, until they suddenly switched to four floors in many locations:
A decade ago, there was a burst in construction and in some places the bottleneck was not getting the machines or electricity, but how fast they could deliver and pour cement, even working overnight.
To date, Facebook has built, or is building, 47,100,000 sq ft of space, totaling nearly $24bn in investment. Based on available/disclosed power numbers and extrapolating per sq ft, I get something like 4,770 MW.
Last I updated my spreadsheet in 2019, Google had $17bn in investments across their datacenters, totaling 13,260,000 sq ft of datacenter space. Additional buildings have been built since then, but not to the scale of an additional 30mil sq ft.
Amazon operates ~80 datacenter buildings in Northern Virginia, each ~200,000 sq ft -- about 16,000,000sq ft total in that region, the other regions are much much smaller, perhaps another 4 mil sq ft. When I'm bored I'll go update all my maps and spreadsheets.
Does the square footage take into account multiple floors? What's the source? It can be misleading, because you don't know the compute density of what's inside. Using just public data, power is a more accurate proxy. Until at least 5-6 years ago, Google was procuring more electricity than Amazon. Before that, it had a further advantage from lower PUE, but I bet the big names are all comparable on that front by now. Anyone that has worked at several of them can infer that FB is not the largest (but it's still huge).
As for the dollars, were they just in 2019 or cumulative? The Google ones seem low compared to numbers from earnings.
Google certainly has more compute density than Amazon, the numbers I was able to find from the local power company was 250MW at Council Bluffs back in 2015 or so.
Amazon builds out 32MW shells, and the most utilized as of 5 or 6 years ago was 24MW or so, with most being much less than that.
At this point power companies (a la PG&E, etc.) should be investing in AI companies in a big way. Then they make money off the AI companies to build out power infra - and vice versa.
I am surprised we haven't heard about private electrical grids built out by such companies.
Surely they all have some owned power generation, but in the local areas where they DO build out power plants, they should have to build capacity for the local area too, mayhaps in exchange for the normal tax subsidies they seek for all these large capital projects.
Can't wait until we have pods/clusters in orbit, with radioisotope batteries to power them along with the panels. (I wonder how close to a node an RI battery can be? Can each node have its own RI?) (Supposedly they can produce up to "several kW" -- but I can't find a reliable source for the max wattage of an RI...)
SpaceX should build an ISS module that's an AI DC cluster.
And have all the ISS technologies build their LLM there, based on all the data they create?
I updated my map for AWS in Northern Virginia -- came up with 74 buildings (another source says 76, so I'll call it directionally correct). If I scale my sq ft up by ~5% to account for missing buildings, we get 11,500,000 sq ft in the Northern Virginia area for AWS.
Yeah, Google buys servers in public datacenters like those from Equinix. One "region" needn't be one datacenter, and sometimes AWS and GCP will even have computers in the same facility. It's actually quite annoying that "region" is such an opaque construct and they don't have any clear way to identify what physical building is hosting the hardware you rent from them.
Those are almost lost in the noise, compared to the big datacenters. (I've been inside two Atlanta facilities, one leased and one built from scratch, and the old Savvis one in Sunnyvale).
Meta could build their own cloud offering. But it would take years to match the current existing offerings of AWS, Azure and GCP in terms of scale and wide range of cloud solutions.
And then there's sales. All of those three - and more you haven't considered, like the Chinese mega-IT companies - spend huge amounts on training, partnerships, consultancy, etc to get companies to use their services instead of their competitors. My current employer seems all-in on Azure, previous one was AWS.
There was one manager who worked at two large Dutch companies and sold AWS to them, as in, moving their entire IT, workloads and servers over to AWS. I wouldn't be surprised if there was a deal made there somewhere.
The real question is: why aren't they? They had the infrastructure needed to seed a cloud offering 10 years ago. Heck, if Oracle managed to be in 5th (6th? 7th?) place, Facebook for sure could have been a top 5 contender, at least.
Because they make more money using their servers for their own products than they would renting them to other people. Meta has an operating margin of 41% AFTER they burn a ton on Reality Labs, while AWS has a 21% margin with more disciplined spending. Social media is a more profitable business than infrastructure.
> Advertising (over 97.8% of revenues): the company generated over $131 billion in advertising, primarily consisting of displaying ad products on Facebook, Instagram, Messenger, and third-party.
TensorFlow and Keras have gotten better, but PyTorch historically had better flexibility than Keras and was much easier to debug/develop in than TensorFlow.
Aww, those existing offerings are overcomplicated as hell; a fresh look could yield a substantially simpler cloud developer experience, and that would compete well against those other cloud offerings on simplicity alone.
A company moving away from Nvidia/CUDA while the field is developing so rapidly would result in that company falling behind. When (if) the rate of progress in the AI space slows, then perhaps the big players will have the breathing room to consider rethinking foundational components of their infrastructure. But even at that point, their massive investment in Nvidia will likely render this impractical. Nvidia decisively won the AI hardware lottery, and that's why it's worth trillions.
I don't think they could; nvidia has tons of talent, Meta would have to steal that. Meta doesn't do anything in either consumer or datacenter hardware that isn't for themselves either.
Meta is a services company, their hardware is secondary and for their own usage.
I'm more concerned with avoiding Nvidia (et al.) market domination than with chasing the top edge of the genAI benefits sigmoid. That domination would prevent much broad-based innovation.
This space is so competitive that even if Nvidia is asleep at the wheel, a competitor will come and push them before too long. AMD has a history of noticing when their competitors are going soft and rapidly becoming competitive.
It's not the hardware keeping NVidia ahead, it's the software. Hardware-wise AMD is competitive with NVidia, but their lack of a competitive CUDA alternative is hurting adoption.
I still, for the life of me, can't understand why Google doesn't just start selling their TPUs to everyone. Nvidia wouldn't be anywhere near their size if they only made H100s available through their DGX cloud, which is what Google is doing only making TPUs available through Google Cloud.
Good hardware, good software support, and market is starving for performant competitors to the H100s (and soon B100s). Would sell like hotcakes.
It is an absolutely massive amount of work to turn something designed for your custom software stack and data centers (custom rack designs, water cooling, etc) into a COTS product that is plug-and-play; not just technically but also things like sales, support, etc. You are introducing a massive amount of new problems to solve and pay for. And the in-house designs like TPUs (or Meta's accelerators) are cost effective in part because they don't do that stuff at all. They would not be as cheap per unit of work if they had to also pay off all that other stuff. They also have had a very strong demand for TPUs internally which takes priority over GCP.
Do you mean, sell TPU hardware to other companies that would run it in their data centers? I can't imagine that would ever really work. The only reason TPUs work at Google is because they have huge teams across many different areas to keep them running (SRE, hardware repair, SWE, hardware infra) and it's coupled to the design of the data centers. To vend and externalize the software would require google to setup similar teams for external customers (well beyond what Google Cloud provides for TPUs today) just to eke out some margin of profit. Plus, there is a whole proprietary stack running under the hood that google wouldn't want to share with potential competitors.
Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.
> Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.
We had a GSA for intranet search and other than the paint this was a standard Dell server. I remember not being impressed by what the GSA could do.
We also had Google Urchin for web analytics, it wasn't a hardware appliance but the product wasn't very impressive either. They then killed that and tried to get you onto Google Analytics.
They just didn't commit to these on premise enterprise products.
And undercut what they'd like to use as a huge motivator in people moving to GCP? Not likely. Even if they wanted to they can't keep up with their own internal demand.
Beyond that they might not be as stable or resilient outside of the closely curated confines of their own data-centers. In that case selling them would be more of an embarrassment.
>Beyond that they might not be as stable or resilient outside of the closely curated confines of their own data-centers. In that case selling them would be more of an embarrassment.
Once you go out of your heavily curated hardware stack, the headaches multiply exponentially.
The impression I got from this thread yesterday is that Google's having difficulty keeping up with the heavy internal demand for TPUs: https://news.ycombinator.com/item?id=39670121
Well, it was, wasn't it? There was a massive boom where loads of companies over-promised what they would achieve, followed by a crash when everyone realised lots of them couldn't, followed by stability for the smaller number that could.
It was the very definition of a hype cycle as far as I can see. Hype cycle doesn’t mean “useless and will go away”, you have the second upward curve and then productivity.
I don’t disagree, but a lot of “analysis” was not that nuanced. At one time I worked for a company where 90% of revenue was from printed periodicals. Smart, capable executives assured the whole company that the internet was not a threat, just something college kids used for fun.
Colloquial, dismissive use of “hype cycle” does not usually mean “this will change the world but foolish things, soon forgotten, will also be done in the short term”. Though I agree a deeper understanding of the term can suggest that.
It will be ironic if Meta sinks all this money into the new trend and finds out later that it has been a huge boondoggle, just as publishers followed Facebook's "guidance" on video being the future, subsequently gutting the talent pool and investing into video production and staff - only to find out it was all a total waste.
It already paid off, when the world moved from deterministic to probabilistic ad modeling. That's why their numbers are so good right now compared to every other ad platform.
Apple turned off a lot of "signals" used by advertisers for precisely targeted ads via persistent user-beacons. Facebook ad placement quality (and revenue) cratered in the immediate aftermath.
Meta has since gotten better at it - likely with lots of AI assistance - and their revenue numbers reflect this. The targeting is now likely probabilistic, in that the advertiser now makes educated guesses on the best ads to serve based on limited or non-existent identity information.
So the AI efforts would have paid back by way of higher revenues.
There are reports [1] that a bunch of companies like "College Humor" were convinced to switch to producing native video for facebook (instead of directing users to their own sites) on the basis of bullshit metrics from facebook, and had an extremely bad time as a result, with some companies going bankrupt.
Something like counting an autoplaying video that ran for 3 seconds as a 'view' IIRC
Thankfully, Dropout (a spin-off of College Humor) is alive and well, and producing some of the best D&D Actual Play series as well as other non-D&D comedy shows. One of the entertainment services that I happily pay for because I want to support what they're doing.
Said the butter churner, cotton ginner, and petrol pumper.
I work in film. I've shot dozens of them the old fashioned way. I've always hated how labor, time, and cost intensive they are to make.
Despite instructions from the luminaries to "just pick up a camera", the entire process is stone age. The field is extremely inequitable, full of nepotism and "who you know". Almost every starry-eyed film student winds up doing drudge work for the rest of their lives. Most will never make a feature to match their ambition.
If the whole task was to simply convey my thoughts and dreams to others, why am I scrambling around to sign location rights, capture photons on expensive glass, and then smear and splice things together for months on end? This is ceremonial and soon to be anachronistic. I'm glad that whole mess is going to be replaced. It's a farce.
To phrase it another way - would you like to be hand-writing assembly on punch cards? To only gain entrance into the field with your mathematics PhD?
To speak of the liberty and the economics, why should I have to sell the rights to my idea to a studio so I can get it off the ground? Why should I have to obey the studio's rules and mind their interference?
This whole Gen AI thing is going to be the biggest liberating moment for filmmaking creatives. I know, because I am one.
And if you think any Jack or Jill can just come in and text prompt a whole movie, you're crazy. It's still hard work and a metric ton of good taste.
Art will never die. It's the human soul. It'll take more than some tech bros with GPUs to kill it.
AI is just another tool for the artist. A "bicycle for the mind" to quote Jobs, and a rocket ship for the imagination to convey my own direct experience.
> And if you think any Jack or Jill can just come in and text prompt a whole movie, you're crazy. It's still hard work and a metric ton of good taste.
If you want anything good, yes. If you just want something… I reckon it'd take a week to assemble an incomprehensible-nonsense-film pipeline, after which it's just a matter of feeding the computer electricity.
Short-term, this is going to funnel resources away from the people with good taste. Long-term, it might help collapse the entire "creative industry", after which we might get some of that artist liberation stuff you're talking about – but we might just end up with new gatekeeping strategies from the wealthy and connected, and business as usual.
The idea that AI isn't going to be used as a creative tool too and that it won't lead to more and better art is a defeatist, Luddite attitude.
Similarly shaped people thought that digital cameras would ruin cinema and photography.
> Short-term, this is going to funnel resources away from the people with good taste.
On the contrary - every budding film student will soon [1] be able to execute on their entire visions straight out of the gates. No decades of clawing their way to a very limited, almost impossible to reach peak.
> it might help collapse the entire "creative industry"
The studio system. Not the industry.
> new gatekeeping strategies from the wealthy and connected, and business as usual.
Creatives have more ways of building brands and followings for themselves than ever before. It's one of the largest growing sectors of the economy, and lots of people are earning livings off of it.
You'll be able to follow that steampunk vampire creator that's been missing from the world until now. Every long tail interest will be catered to. Even the most obscure and wild tastes, ideas, and designs. Stuff that would never get studio funding.
As a creative, I'm overjoyed by this. My friends and I are getting to create things we never could make before [2].
>You'll be able to follow that steampunk vampire creator that's been missing from the world until now. Every long tail interest will be catered to. Even the most obscure and wild tastes, ideas, and designs. Stuff that would never get studio funding.
Your optimism reminds me of the optimism I had around the early internet. Power to the people, long tail, rise of the creative class, the fall of gatekeeping corporations, etc.
It was like that for a couple of years in the late 90s before power and control got vastly more centralized than before. Maybe this time it’ll be different.
The big difference is that back then, anyone with a consumer-level computer in their bedroom could turn it into a server and be a first-class citizen on the Internet.
With generative AI, models will be controlled by a handful of giant corporations who have the enormous corpuses (of dubious provenance) and compute ability to train them.
You can run ComfyUI and AnimateDiff on your PC. If you haven't checked them out, please do.
And there are other angles to consider. Apple, for one, is expressly interested in not becoming a thin client to cloud AI. They're baking a lot of inference power into their chips. If the creative class don't need their devices, that doesn't bode well for them...
FakeYou, CivitAi, WeightsGg, Comflowy, ... -- there are tons of vibrant communities to teach you everything you need to know. The tools are open source, free to use, and accessible.
Many YouTube Poops are artistic expression (e.g. https://redirect.invidious.io/watch?v=dO4eIEvHjSw). Skibidi Toilet is definitely artistic expression: it's a full-on epic. (Reactions from one ≈50-year-old: “baffling” “how did they do that?” “why would anyone make this?”)
If you think the Luddites were defeatist, you don't know much about the Luddites.
> On the contrary - every budding film student will soon [1] be able to execute on their entire visions straight out of the gates. […] Creatives have more ways of building brands and followings for themselves than ever before.
Yet, we have no shortage of starving artists. Will AI provide them food and shelter?
This is unequivocally a win for creative expression for hobbyists, but it stands to harm professionals – at least in the short term, perhaps longer-term. It's not happening in a vacuum: the greedy are revoking livelihoods because they think AI can do it faster and cheaper (laundering appropriated hobbyist and increasingly-cheap professional labour).
> The studio system. Not the industry.
Huh, the word 'industry' has a specialised meaning in economics. Didn't know that.
> Similarly shaped people thought that digital cameras would ruin cinema and photography.
Obviously, but you seem to be arguing that AI is just another evolution of productivity tools. You still need to have a photographer's eye while using this technology.
If you couldn't make a good composition on film, a digicam will not save you, and it definitely did not replace photographers. Perhaps lowered the barrier of entry for prosumers.
Are you talking about some as yet unseen research/technology? The aesthetic sample looks like something we could have seen on the SD subreddit for the last year.
> Said the butter churner, cotton ginner, and petrol pumper.
Said the bank teller, record producer, etc.. Plenty of cases where we've been told technology and automation would democratise the field and remove the middleman, and actually it's the opposite.
Yes, it would be nice if AI made it easy for anyone who wanted to make a great movie. That doesn't mean it's going to happen.
> The field is extremely inequitable, full of nepotism and "who you know"
Maybe, but it's never been cheaper to make a movie.
I know someone with no connections and (almost) no money who in 4 years made multiple no. 1 box-office films (obviously not in the US, in a smaller country) and then got picked up by Netflix.
Yes of course - it depends on what lens though. If you mean "I'm learning to build better from this" then no, but it's very informative on Meta's own goals and mindset, as well as real numbers that allow comparison to investment in other areas, etc. Also the point was mostly that Meta does publish a lot in the open - including actual open source tech stacks etc. They're reasonably good actors in this specific domain.
Meta is still playing catch-up. Might be hard to believe but according to Reuters they've been trying to run AI workloads mostly on CPUs until 2022 and they had to pull the plug on the first iteration of their AI chip.
> we have successfully used both RoCE and InfiniBand clusters for large, GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.
Interesting dig on IB. RoCE is the right solution since it is open standards and more importantly, available without a 52+ week lead time.
I don't know. I haven't actually worked with IB in this specific space (or since before Nvidia acquired MLNX). My experience with RoCE/IB was for storage cluster backend in the late 2010s.
This is great news for Nvidia and their stock, but are they sure the LLMs and image models will scale indefinitely? Nature and biology have a preference for sigmoids. What if we find out that AGI requires different kinds of compute capabilities?
If anything, NVIDIA H100 GPUs are too general purpose! The optimal compute for AI training would be more specialised, but then would be efficient at only one NN architecture. Until we know what the best architecture is, the general purpose clusters remain a good strategy.
Not really. Ranking and recommendation models require different infrastructure than LLMs. The models are generally smaller and require more data processing before training.
I also use uBlock, but my filters are the default ones and I saw it without any problem. TBH, this is the first time I've seen a post on the web have HN as a share option, or at least the first time I was surprised to see it. Maybe it has something to do with Google ranking "trusted human information and knowledge" higher than "non-human" information and knowledge [0], or maybe some Meta software engineer simply loves and uses HN and decided to include it, idk.
The link mentions "our internal job scheduler" and how they had to optimize it for this work -- does anyone know what this job scheduler is called, or how it works?
Sure. 100T/day * 1day/86400sec ~= 1B/sec. They're probably considering at least a few hundred candidates per impression, and every impression is going to go through _at least_ two models (relevance and pCTR/revenue), so you could get there just with online serving at 5Mqps, which is plausible. But they're also going to be doing a lot of stuff in batch - spam predictions, ad budget forecasts, etc - so that every candidate actually runs through four or five different models, and every actual impression could do more than that.
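Spelling out that back-of-envelope math:

```python
per_day = 100e12                       # "hundreds of trillions" -> call it 100T executions/day
per_second = per_day / 86_400          # ~1.16e9 model executions per second
online_only = 5e6 * 200 * 2            # 5M impressions/s * ~200 candidates * 2 models = 2e9/s
print(f"needed: {per_second:.2e}/s, online serving alone: {online_only:.2e}/s")
```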
How many ads does Meta serve a day, and how many AI model executions are done for each one? Repeat the same for stories, post and comment recommendations on Facebook and Instagram, and you have very big numbers. To that, add VR, internal modeling and other backoffice/offline analyses over billions of users, and you'll easily get into the trillions.
Perhaps there's some combinatorics where every time an ad or post is displayed to the user, it runs through some hundreds/thousands of candidates and computes their relevance.
it's really interesting just how similar these systems are to the designs adopted for HPC over the past few decades. I'm salty because it took a while for the ML community to converge on this (20+K GPUs connected by a real fabric with low latency and high bandwidth).
Meta's backing itself into a corner with its admirable commitment to open source. Unfortunately, at some point when they decide to monetize their billions spent and try to release a closed source model, the level of vitriol they will deal with will be an order of magnitude above what even OpenAI is experiencing. I don't think they realize that!
Meta's commitment to Open Source is very much calculated.
OCP is a way to rally lower-tier vendors to form a semi-alliance to keep up with super-gorillas like AWS & Google.
LLaMA has already gained much more than its cost (look at the stock price, the open source ecosystem built around LLaMA, and Google's open source Gemma models, which are proof of Meta's success).
IMHO, Meta's Open Source strategy is planned at least 5 years out. That's enough to finesse a 180-degree turnaround if necessary (i.e., from open source to closed source).
"By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s."
This AI game is getting into a GPU war. Heard that Meta is pushing a lot of CPU workloads to GPU to co-locate with model inference for infra simplicity.
Meta seems to actually be taking all the right steps in how they're contributing to open source AI research. Is this a "commoditize your complement" kind of situation?
I genuinely think one of the most plausible short-term dangers of AI is the creation of lifelike bots which will be absolutely indistinguishable from real humans in short-form online interaction.
Since people don't want to talk to algorithms, this would result in them shunning all social media, which is a huge danger to companies in the space.
In pretty much every interview, Yann has talked about how important that AI infrastructure is open and distributed for the good of humanity, and how he wouldn't work for a company that wasn't open. Since Mark doesn't have an AI product to cannibalize, it's in his interest to devalue the AI products of others ("salting the earth").
If they make AI models free to use, it makes OpenAI nearly valueless, which means OpenAI can't survive and then sell Meta's competitors a better GenAI product than Meta can make themselves.
So basically since they don't make money directly on GenAI, it makes sense for them to release it for free so no one else can have something better, so they don't have to compete on GenAI abilities with their competitors.
The angle is that by releasing cutting edge AI research to the public openly, the relative difference between open source models/tech and closed source tech shrinks.
Whether or not you think the "value" of AI products is proportional to their performance gap vs the next closest thing or not is up to you. Very interesting PG essay I read recently talks about the opposite of this (Superlinear returns) where if you're half as good as the next competitor, you don't get half the customers, you get 0.
Windows always has to provide enough additional value to make up for what they're asking in money, compared to the free option. That is the point. If you had no other viable options, then they could do whatever they like. Now they have a baseline to compete with, and it is very hard to compete with free.
Mistral makes comparable models to Facebook. Mistral charges money, Facebook does not. This negatively affect’s Mistral’s pricing power because a customer can get 70% of the performance they need for 0% of the cost.
The “0% of the cost” part is unique to software businesses because you can copy software so cheaply
The Llama models have played a large part in fostering the development of the open source LLM ecosystem, and I expect Llama 3 to deliver performance above Mistral Medium and Anthropic's Haiku while being fully open and able to run on consumer hardware.
is "salting the earth", in the biblical sense of destroying your enemy and their land to the point where not even plants grow again, a SV term used for companies that promote open source?
It's a term used for making a certain type of business unviable. In this case, high quality open models will make closed source models less viable, since the closed source model providers won't be able to charge monopoly prices for their models, but will have to approach the price of cloud GPU time or lose customers to equally capable open models.
That's a weird comparison. The GPU is only a part of the capex: there's the rest of the servers and racks, the networking, as well as the buildings/cooling systems to support that.