allenrb's comments | Hacker News

Was summarizing this article for a group of friends who largely met during the Apple II days and wanted to repost a bit of that here:

The optimized code at the end takes 94 nanoseconds to sum an array of 1024 32-bit floating point numbers.

In 94 nanoseconds, our old friend the 1 MHz 6502 would be just starting to consider signaling the memory chips that maybe they ought to try digging up the first byte of the first instruction in the program.

Worth mentioning, that code is entirely dependent on running in cache. Otherwise even the mighty M1 Max in the post would still be stuck waiting on that first memory fetch. DRAM is slow. :-)
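
For the curious, something in the spirit of the article’s optimized loop (this is my own rough NEON sketch in C, not the article’s actual code): a few independent vector accumulators so the adds can overlap, then one horizontal reduction at the end.

    #include <arm_neon.h>

    /* Sum 1024 floats using four independent 4-wide accumulators so the
       vector adds can overlap instead of forming one long dependency chain. */
    float sum1024(const float *a) {
        float32x4_t s0 = vdupq_n_f32(0), s1 = vdupq_n_f32(0);
        float32x4_t s2 = vdupq_n_f32(0), s3 = vdupq_n_f32(0);
        for (int i = 0; i < 1024; i += 16) {          /* 16 floats per pass */
            s0 = vaddq_f32(s0, vld1q_f32(a + i));
            s1 = vaddq_f32(s1, vld1q_f32(a + i + 4));
            s2 = vaddq_f32(s2, vld1q_f32(a + i + 8));
            s3 = vaddq_f32(s3, vld1q_f32(a + i + 12));
        }
        /* Combine the partial sums, then reduce one vector to a scalar. */
        return vaddvq_f32(vaddq_f32(vaddq_f32(s0, s1), vaddq_f32(s2, s3)));
    }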


Luckily our total L1 cache sizes are about as big as the entire addressable memory of the 6502.

We truly live in amazing times.


Truly. And I'm also always amazed by how much slower (in terms of wall time) modern software is than it ought to be. CPUs sure don't feel three to four orders of magnitude faster than they were 50 years ago, because software has gotten four to five orders of magnitude more wasteful to compensate. Argh...


I recently moved a project from a Phenom II 945 to an i7-8700. The project took around 1:30 to compile on the Phenom and 15s on the i7. Others working on the project with even more modern systems are compiling in half that time.

The major advantage I had was running the same compilers, same interpreters and same tooling, just in better hardware.

On the other hand, I always felt KDE 2 and 3 were way more responsive on way less hardware than the modern versions or GNOME. Part of it is UI design - immediately giving feedback that an operation is going to take some time, instead of blocking until the operation is done before showing anything.


Plenty of that software is fairly well optimized, but most software is not that optimized. Microsoft Teams, Discord, Slack, basically anything else that also uses Electron... it's not UI design, it's legitimately wasted work, and tons of it.


So many programmers saw "Premature optimization is the root of all evil" and thought it meant "Caring about performance makes you a heretic".

You can't hotspot-optimize a Fiat Multipla into a Formula 1 car. When every piece of software you run creates a dozen factories to replace one for-loop, you get the modern desktop experience.


    > So many programmers saw "Premature optimization is the root of all evil" and thought it meant "Caring about performance makes you a heretic".
This is very well said. Also, with enough experience, many good programmers will "know" where the hotspots will be, while writing the code. So, they can be "sloppy" in areas where it doesn't matter, and "care more about performance" in areas where it will likely matter much more.


    > most software is not that optimized
I would say: For good reason. The value is too low.

    > basically anything else that also uses Electron... 
To me, the point of Electron is a lovely platform with very high developer productivity. This is partly the reason why Java/C# was able to displace so much C/C++ code inside big enterprise corps writing CRUD apps. Sure, Java/C# is a bit bloated/slow compared to C/C++, but the developer productivity is way higher.

    > it's not UI design, it's legitimately wasted work, and tons of it.
I don't understand this part. Can you explain more? Maybe an example would help.


> I would say: For good reason. The value is too low.

Oh my god, absolutely not. The value is extremely high to me, the consumer, even if the company doesn't care. Sure, they can technically still get their value proposition out even if the app is slow and bloated and sucky because every app these days is slow and bloated and sucky. And you can say they only care about money all you want, but that's not going to convince me that's a good reason for an app to perform terribly.

> To me, the point of Electron is a lovely platform with very high developer productivity. This is partly the reason why Java/C# was able to displace so much C/C++ code inside big enterprise corps writing CRUD apps. Sure, Java/C# is a bit bloated/slow compared to C/C++, but the developer productivity is way higher.

I use Tauri in place of Electron and it's a lot snappier than Electron is, and you can get much of the same benefits in terms of developer productivity, because it's also just a webview. The host side needs to be a Rust application, but I wouldn't say it's much more difficult to get started with than Electron. Obviously, what you put inside the webview still matters, but in general, I'd say Electron is just an inferior choice in most cases, even if you do legitimately want a webview.

> I don't understand this part. Can you explain more? Maybe an example would help.

Parent said the lack of progress bars in GNOME makes it feel slow. I argue that while progress bars could be a hack to make the user more open to waiting, the issue is that the user probably shouldn't need to wait at all. There are definitely cases where a progress bar is a good idea, but they shouldn't be needed in most cases.

For example, if I right-click a message in Discord, sometimes the context menu contains a loading spinner. I have to wait for the context menu to finish loading. Is a loading spinner the right solution here? Surely the user wouldn't want the menu to not open, or for it to simply be blank until it loads. I think none of those are quite right. The context menu shouldn't have to load at all; it should simply open.


> the context menu contains a loading spinner

How the fuck is that possible, permissible, tolerated?

Another example. I am looking for a different file browser because Dolphin waits while something happens when I select a bunch of files and right-click. I know that it is downloading something very slowly because I can see the network activity but I have no idea why.


> I am looking for a different file browser because Dolphin waits while something happens when I select a bunch of files and right-click.

File browsers being slow seems like a common issue. When I tell Windows 11 Explorer to sort by modified date, it has to slowly load the list and populate the files one by one. What the fuck it is doing I have absolutely no clue. All I know is that it's doing something it shouldn't, because the last modified date should be returned instantly by the directory listing, and no additional metadata should be required to perform the sort.
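
(For what it's worth, here's a quick Win32 sketch in C of that point: FindFirstFile/FindNextFile hand back the last-write time with every directory entry, so no per-file lookup should be needed just to sort by modified date. This is only my illustration of the API, not what Explorer actually does internally.)

    #include <windows.h>
    #include <stdio.h>

    int main(void) {
        WIN32_FIND_DATAW fd;
        HANDLE h = FindFirstFileW(L"*", &fd);    /* enumerate the current directory */
        if (h == INVALID_HANDLE_VALUE) return 1;
        do {
            /* ftLastWriteTime arrives with the entry itself; convert the
               FILETIME to a 64-bit value that can be compared and sorted. */
            ULARGE_INTEGER t;
            t.LowPart  = fd.ftLastWriteTime.dwLowDateTime;
            t.HighPart = fd.ftLastWriteTime.dwHighDateTime;
            wprintf(L"%20llu  %ls\n", t.QuadPart, fd.cFileName);
        } while (FindNextFileW(h, &fd));
        FindClose(h);
        return 0;
    }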

And back on macOS, around five years ago, whenever I opened a folder, Finder would wait for a few seconds and then play the uncollapse animation for every folder I have expanded, shifting the position of basically every item in the list (and also causing tons of lag).

I think the second one is a well-intentioned feature that just needs some work, but the first one is just garbage; that's not how it should work.


Well it depends on what software you're talking about. Browsers are way more capable than they were before. I'd be surprised if old computers would be able to play a 1080p video at 60fps even if somehow your network card had the bandwidth for it. And copying/pasting files is certainly way faster than it used to be. Compiling large applications is much faster now when it would be an overnight process decades ago.

Nothing is stopping you from using the old school software but the reality is that old software was missing a lot of the features that we rely on. That's not to say that current software isn't bloated but if you told me I could go back to using the old software I wouldn't (and I could).


Now you're making me wonder if there's a 6502 emulator making use of that fact.


Hah, I had the same thought. What kind of hacks can I do to convince the processor to keep as much of the emulator + opcodes in L1 as I can...
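
To put a rough number on it, the hot state of a 6502 emulator is tiny. A minimal sketch (plain C, with flag handling and all but a few opcodes omitted) is basically a 64 KB array plus a handful of registers on the data side, and a small dispatch loop on the instruction side.

    #include <stdint.h>

    typedef struct {
        uint8_t  a, x, y, sp, p;    /* 6502 registers and status flags */
        uint16_t pc;
        uint8_t  ram[65536];        /* the entire 6502 address space */
    } cpu6502;

    /* Execute one instruction. A real emulator also updates flags and
       handles ~150 more opcodes and their addressing modes. */
    static void step(cpu6502 *c) {
        uint8_t op = c->ram[c->pc++];
        switch (op) {
        case 0xEA: /* NOP      */                              break;
        case 0xA9: /* LDA #imm */ c->a = c->ram[c->pc++];      break;
        case 0xE8: /* INX      */ c->x++;                      break;
        /* ... */
        }
    }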


Back in the nineties, 3DFX had a synthetic rendering benchmark that relied on keeping the entire benchmark in L1, but the secret was taking over the entire machine so that no interrupt or other mechanism could pollute the cache.


A bit of ignorance on my part, but would the L1 be holding the data and the instructions? In which case we would be trying to fit our entire 6502 emulator in less than 64K of memory alongside the emulated RAM/ROM?


https://en.wikipedia.org/wiki/Apple_M1#CPU:

“The high-performance cores have an unusually large 192 KB of L1 instruction cache and 128 KB of L1 data cache and share a 12 MB L2 cache; the energy-efficient cores have a 128 KB L1 instruction cache, 64 KB L1 data cache, and a shared 4 MB L2 cache.”

⇒ chances are you don’t even have to try to fit an emulator of an 8-bit machine and its memory into the L1 cache.


I think you would very much have to try to fit a complete emulator of, say, the Game Boy into 128 + 64KB.

There's plenty of behaviour that is self-evident on real silicon but verbose and tricky to express in software.


Real question about L1 caches. For a long time, x86 (Intel & AMD) L1 caches have been pretty much pegged at 32KB. Do you know why they didn't make them larger? My guess: There is a trade-off between economics and performance.


There is a trade-off between cache size and latency.


Ok, so why do the new Mx chips from Apple have an L1 cache size greater than 32KB? Did they solve some long-standing design issue?


The CPU decides what goes in there and when. You can only pray and offer sacrifices and guess at when and how.


Depends on the precise architecture, but ARM (and other RISC designs) usually have separate data and instruction L1 caches. You may need to be aware of this if writing self-modifying code, because cache coherence may not be automatic.
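
A minimal sketch of what that looks like in practice (assuming Linux on AArch64 with GCC or Clang; macOS additionally requires MAP_JIT and pthread_jit_write_protect_np): write the instructions, then explicitly flush the range so the instruction cache sees them.

    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* AArch64 machine code for: mov w0, #42 ; ret */
        uint32_t code[] = { 0x52800540, 0xD65F03C0 };

        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        memcpy(buf, code, sizeof code);

        /* Without this, the split I-cache may still hold stale bytes. */
        __builtin___clear_cache((char *)buf, (char *)buf + sizeof code);

        int (*fn)(void) = (int (*)(void))buf;
        return fn();    /* exits with status 42 */
    }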


This appears to be a pressure-fed rather than a pumped engine, so it has limited real-world utility. Nonetheless, it’s incredibly impressive, especially given that it seems to have been successful on the first try.

I wonder how practical it might be to integrate turbomachinery into an automated design system like this?

Oh, and it really is beautiful with copper construction and that fascinating swirl.


> This appears to be a pressure-fed rather than pumped engine, so limited real-world utility

This is addressed in the article:

> This is a relatively compact engine, which would be suitable for a final kick stage of an orbital rocket.

It has lots of real-world applications, just not currently as part of a lift stage, since you're right that it's a pressure-fed one as opposed to a pumped engine.


All are pressure fed. A pump generates pressure. It's common to test engine components without pumps, using high-pressure vessels in their place. The E Complex at Stennis Space Center specializes in this approach.


“Pressure-fed” is a fixed term when applied to rocket engines and means “fed only by the pressure in the tank (which is most of the time generated by a pressurization system fed from a separate high-pressure helium tank) and not by a pump”.


If I'm not mistaken, the Falcon 1 used a pressure-fed upper stage engine.


Sure did! The Kestrel engine (had to double-check that memory) was a cute little thing, but it did the job.


The swirl isn't really an essential part of the rocket design, but rather ports for thermocouples (i.e., temperature sensors).


“The engine uses thin cooling channels that swirl around the chamber jacket, with variable cross sections as thin as 0.8mm. The kerosene is pressed through the channels to cool the engine and prevent it from melting.”


Seriously, my high school was mostly boring and pointless. I’m in awe of the opportunities some kids are able to find/create today.


Seriously. A good place in the United States will give you four weeks of PTO. Couple that with a decent holiday schedule and unlimited sick days (limiting sick days is a red flag, imho) and the only other thing you need is a big check.

Full disclosure: I’m 51 but never expected anything from my employers beyond a pleasant work environment, time off, medical insurance, and a significant regular contribution to my account.


We all understand that this cannot and will not continue, right? One of two things happens:

1. The AI boom goes bust. Nvidia sales and/or margins crater. The stock craters with it.

2. The AI boom is the real deal. Companies aren’t stupid and won’t keep paying Nvidia these prices forever. Pretty soon hardware and software architectures will be standardized enough that anyone who can get on board with TSMC, Samsung, or Intel can churn out hardware optimized for the right few functions and sell it for a fraction of the price. Nvidia can still be an innovator but they won’t be able to sell “bread and butter” products at these prices. Sales and/or margins crater, as does the stock.


NVIDIA got lucky that modern computer architecture happens to favor GPUs for AI workloads. They won't be able to milk it for long, but they for sure have the money to pivot. Question is whether they will put that money to good use or not.


You know, you can make money with predictions if you have the conviction.


I also think NVDA is overvalued, but timing is everything. They could stay overvalued for years before coming back down. "The market can remain irrational longer than you can remain solvent" and all that.


Not GP, but that's exactly what I'm doing. Sold NVIDIA recently at 869 USD (having bought it at 453 USD), and invested profits in Intel :) We'll see how that plays out in a year or two.


$869 sale means you sold between March 2024 and last week. Let’s game this out:

100 shares of NVDA at $453, pared down to just profits by the $869 sale, would mean holding about 48 shares and selling 52 shares. After today’s after-hours movement your invested profit would be worth about $48k, or about $6300 more than when you cashed out your initial investment ($41712).

You take that same $41712 and buy INTC between March and now. If it was bought in March, then you’re in for something like $43/share, or 970 shares. Intel has been sliding since April and is now at $31. Your profit from NVDA has shrunk by nearly 30%. In the worst case (you bought in March rather than April), you’ve given up nearly $19k in profit by following your convictions. Your gains have gone from 115% (had you held) to potentially as low as 70%. I guess it’s all house money anyways though, right?


You're talking about this with the benefit of hindsight. When I was selling at $869 I could not know how it would go. And, as a matter of fact, a week or two after I sold, NVDA dipped to $700, although temporarily. So, it's just a matter of personal risk tolerance, I guess.


I have a 6-figure profit in NVDA shares right now. Cost basis is something like $100. I have considered selling (or at what price I would sell). The problem I have is what I could possibly buy with that profit. Intel hasn't shown that they have a real plan. The move in META seems to have already happened. I don't trust that Alphabet has any idea what they're doing. I could buy more Microsoft or Amazon shares, I guess. I wouldn't throw money at these software stocks like Snowflake or the SNOWs of the world. AAPL might be a play if I have the conviction that Tim Cook has an AI plan in place.

In the end I'd probably just take the easy way out and go buy VTSAX and chill.


Your last paragraph: I would recommend this instead: VOO (Vanguard S&P 500 ETF).


Really? Intel, a sideways stock from the '70s, is the best long you found on the entire planet? Out of all the financial instruments and all the markets and all the categories, you actually believe that's the best pain-to-gain ratio play?


They have their own fabs, a new GPU line, and they're national security critical. They also make pretty good CPUs, even if they aren't the best value/$.


Intel literally can't fail because of national security. The government will continuously bail them out, which I think is a pretty big advantage, giving them time to figure something out. They're cheap as fuck too, so tons of upside.


Well, I'm betting on Gelsinger's plan to launch the Intel 18A process and regain the process lead - this should bring Intel's stock price from the current $31 to at least $70. Also, it's a hedge against China invading Taiwan and blowing up TSMC, in which case it would probably skyrocket to $100+. In the meantime Intel pays nice dividends.


MSFT went sideways for like a decade too. Who knows... Intel is doing something the others aren't by building its own fabs, which arguably carries a lot of risk but could pay off.


3. To justify the current valuation, revenues must continue to grow for the next 4-5 years before plateauing, while Nvidia's profit margin gradually declines to something normal, like 30%.

High growth is fragile. The value of Nvidia has dropped 50% multiple times in the past. A recession, temporary oversupply, anything can mess it up.


You mean the same companies who have been paying AWS, Azure, and GCP a lot of money for decades despite there being cheap alternatives? Or which customers are you talking about?

My company has used Azure for years. I don't expect them to change that anytime soon; I mean, we've been a SAP customer for 40+ years, and SAP is Germany's No. 1 price-gouging company. We've used MS products for decades as well, despite them getting more expensive every year, and I still do more or less the same in Excel today as I did years ago.

I think you have a misunderstanding of how B2B works, especially when we talk about enterprise-level SW solutions. No CEO in their right mind is switching business-operations SW just because a competitor is on sale lol.


This thought is strange. This very moment is the moment of success. Perhaps it can grow more, who knows, but a maximum declines on either side; that's literally what a maximum is.

Saying it won't last shouldn't be profound, because that fact should be self-evident. The only reason it becomes profound is that a lot of people are just that stupid.


What is happening is that every government on the planet is forced to procure local DC compute capacity to protect national sovereignty. The US ensured this with SWIFT deplatforming Russia.

And there is nobody else they can buy from on their timelines.


I noticed that your post failed to include important details such as when option 1 or 2 will happen or how option 2 will be implemented. I don’t disagree; I’m just pointing out that your post is worse than useless blathering: it’s useless noise.


A significant question is what "pretty soon" means. Nvidia also owns other parts of the stack (CUDA) that make them more difficult to replace.


The hard question is "when". I agree with you, yet I'd rather be long NVDA than short NVDA. (I am neither).


There is also a third option - which I'm not saying I think will happen, but if AGI were to happen then there is a possibility of just a few winners taking it all. Whoever gets there first would simply snowball away. It would be the endgame of capitalism.


lol, so short the stock...?


I’m with you. It’s hilarious to think back on, but I got a “no” from my present employer after the first technical interview because I didn’t remember the options to one particular tool.

Thankfully a somewhat distant inside connection came through and managed to suggest “maybe that wasn’t actually the correct decision,” and I got the chance to continue with the process. It’s all been good since then.

I sure do have an internal chuckle every time I have to use that particular tool… and still have to look up the options. :-)


I recently went to an interview where I was asked about some specifics for which the real-life answer is "I'd Google and have an answer in 30 seconds". I'm very curious as to whether this will be held against me...

But they also did give me what I thought was quite a good test, which was talking through some working code and suggesting how it could be improved.


Good news, you can have both!


Fun thought exercise, thanks! My question is, what advantage would a large exchange find in moving to cloud? They’ve already got the personnel capable of managing their environment. They’re not a rapidly-growing startup in need of flexibility. They’re large enough to get at least decent deals purchasing gear. “The cloud” will naturally expect to make a profit on the deal, which likely eats up (and then some) any savings which might otherwise be delivered.

I “get” cloud in a lot of circumstances but it doesn’t seem to make much sense here.


Without reductions in personnel, none.

That's essentially what you're buying from a cloud provider. Most of the time it's not so much renting the hardware as renting their labor for maintenance.

That is, assuming your hardware needs don't have a wide enough variance from time to time (scale up/scale down).


Unintegrated or, perhaps… Dis-integrated?


The correct term is discrete circuit.

We used to have circuits, and when they were integrated onto one piece of silicon, we called them "integrated circuits", while the old ones became "discrete circuits".


I think that’s not quite correct re: the Cray 1 having 64 copies of certain units. True, the vector registers did have 64 entries, but the vector functional units were pipelined. They could return a result each cycle (once past the initial latency) but did not return 64 results simultaneously.

