I think it's also interesting when taken in the context that a lot of the Nvidia-Intel feud stems from Nvidia's development of an x86 CPU that had to be shelved due to Intel patents. The chipset licensing disputes, etc., all seem to stem from Intel being upset about Nvidia attempting to encroach on their turf.
It makes me wonder if Nvidia was lobbying Microsoft to expand Windows support beyond x86 as a direct result of all this.
Either way, Nvidia's pretty clearly wanted to expand into the desktop CPU market for a while now, and with Intel blocking their entrance to the x86 market, it only makes sense they'd start examining other options. ARM seems like a pretty logical place for them to end up given the options.
> Nvidia's pretty clearly wanted to expand into the desktop CPU market for a while now
Meanwhile, just trying to hold position as a supplier of GPUs for x86 PCs would be very difficult for nVidia now that both of the big x86 CPU manufacturers are pushing their own GPU systems hard (and, increasingly, integrating them into the CPU). This is probably quite an up-or-out situation for them.
Or, what if they just need to expand to be able to afford continuing the development of cutting-edge graphics? The profit margins must be getting thinner.
Windows NT has been ported to PowerPC, DEC Alpha and Itanium in the past, so an ARM port shouldn't be especially hard. Longhorn was probably a much more ambitious change to WinNT than a CPU port. MS would probably want to introduce some kind of universal binary format as well, but that shouldn't be impossible either.
To ensure portability, NT was originally developed on non-x86 hardware and only later brought up on x86. (It was also originally created with a Pig-Latin UI and later localized to English.) Also, the Xbox 360 runs a stripped-down branch of NT on big-endian PowerPC. A native ARM port should be relatively easy.
That's not a universal binary. It's a bytecode package. It still needs to be run through an interpreter or a JIT compiler, just like a JAR file for Java.
A universal binary contains actual machine code for multiple architectures. Universal binary support requires deeper changes to the OS and can complicate testing, but it's basically a requirement for making cross-platform high performance code because a JIT compiler can't spend as much time optimizing code as an ahead-of-time compiler.
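To make that concrete, here's a rough C sketch of what a fat-binary container involves, loosely inspired by the idea behind Apple's Mach-O fat binaries -- the struct names, fields, and pick_slice helper are all illustrative, not any real format:

    #include <stdint.h>

    /* One slice of native machine code for a particular architecture. */
    struct fat_slice {
        uint32_t cpu_type;   /* e.g. an ID for x86, x86-64, or ARM */
        uint32_t offset;     /* file offset of this slice's native image */
        uint32_t size;       /* size of the native image in bytes */
    };

    /* The file starts with a header, followed by a table of slices. */
    struct fat_header {
        uint32_t magic;      /* marks the file as a fat binary */
        uint32_t nslices;    /* number of fat_slice entries that follow */
    };

    /* The loader picks the slice matching the machine it's running on and
     * maps only that slice; the other slices are simply ignored. */
    static int pick_slice(const struct fat_header *h,
                          const struct fat_slice *s, uint32_t this_cpu)
    {
        for (uint32_t i = 0; i < h->nslices; i++)
            if (s[i].cpu_type == this_cpu)
                return (int)i;
        return -1;   /* no native code for this architecture */
    }

The cost is that the toolchain has to compile and package every slice ahead of time, which is exactly where the AOT optimization advantage over bytecode comes from.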
Machine-independent bytecode paired with a high-quality VM allows you to ship a cross-platform executable, but it's not going to be enough when ARM PCs are facing an uphill battle to prove their performance is acceptable to a market that isn't particularly satisfied with Intel's Atom.
That depends on the market, I think. AFAIK most of the dissatisfaction with Atom is due to its power consumption rather than its performance, so an ARM with comparable computing power and lower power consumption would be quite satisfactory for most users when teamed up with a solid GPU.
It's a smart, easy optimization, so I'd be surprised if .NET wasn't using it, but ultimately it has the same effect as reducing the frequency of GC pauses. It doesn't lead to faster execution. It doesn't change the fact that the code didn't pass through a more thorough analyzer/optimizer. How good are JIT compilers at automatic vectorization, for example? Opportunities for automatic vectorization could be encoded into bytecode such that the SIMD capabilities of different architectures could be used, but I don't think .NET does that.
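For a concrete picture of what's at stake, this is the sort of loop an ahead-of-time compiler will typically auto-vectorize (a small C sketch; GCC at -O3, or -O2 -ftree-vectorize, will usually emit SSE on x86 or NEON on ARM given the right -march/-mfpu flags, while JITs have historically been far more conservative here):

    #include <stddef.h>

    /* Element-wise add of two int arrays.  The restrict qualifiers tell the
     * compiler the arrays don't overlap, which is what lets it safely turn
     * the loop into SIMD instructions. */
    void vadd(int *restrict dst, const int *restrict a,
              const int *restrict b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }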
I remember we had these in Acorn machines when I was at school in the 90s (in the U.K.). They were always very fast, I remember. They were replaced with PCs later on, but due to the budget the school was on, these ended up being much slower at the time; that was most likely due to Windows needing a lot more memory than RISC OS did.
I'm pretty sure it was more to do with RISC OS being lightweight memory-wise than anything else. I had some well-specced PCs at that time, but those relatively cheap Acorn machines were always faster at basic OS tasks.
Which way do you think the issue was settled? Last time I read much about the issue, there seemed to be decent arguments both ways: CISC allows denser packing of instructions, but all CPUs are RISC internally, but the CISC decoders are negligible overhead, but good compilers are easier to write for RISC, ...
How valid are the various arguments these days? The compiler complexity argument seems to work well against the Itanium, but I'm not sure that it makes much difference between ARM and x86 or x86_64. The instruction decoders on an x86 processor really do look minor on paper, but why hasn't Intel been able to produce a chip with competitive performance per watt, especially given their fab advantage? It seemed like they were really trying with the Atom, but it still hasn't gotten down to the power levels where ARM really shines, even though the high-end ARM cores are now out-of-order superscalar designs.
The consensus is that RISC vs. CISC doesn't matter because microarchitecture trumps ISA: the cost of instruction decoding is now small, I-caches negate any difference in code density, compilers can produce efficient code for any reasonable (i.e. non-Itanium) architecture, etc. I am honestly surprised to see Dally's complaints.
The instruction decoders on x86 processors consume a lot of silicon. x86's instruction retirement also imposes additional book-keeping overhead on the scheduler. Eliminating the x87 stack-based FPU has been a gradual process, involving an evolving SIMD instruction set that started as a hack.
In reality, resources trump ISA. x86 has the advantage of hand-optimized logic combined with the best high-performance VLSI manufacturing in the industry.
In spite of that, Itanium blows the doors off of every x86 processor built on the same fabrication technology -- notice that, so far, Itanium and its x86 contemporaries have been built on fab processes two generations apart, with the high-volume part getting the newest process.
+1 for "compilers can produce efficient code for any reasonable (i.e. non-Itanium) architecture"... laughed out loud, until I tried to explain why I was laughing to my wife. She thinks I'm a geek.
Do you have any insight into why, then, x86 chips consume so much more power than their ARM equivalents?
Btw, I tend to agree with you, and think Intel's engineers are just being lazy wrt power. I expect this Nvidia chip to get crushed technically when Intel's engineers gear up and really work on power (like they did when Transmeta's Crusoe came out).
There's probably quite a bit of truth to that, but in recent years Intel has had two dramatic under-performers (Atom can't come close to ARM and the on-chip graphics for their mid-range processors have yet to fill the gap left when Intel forced NVidia to stop making integrated graphics chipsets) and a complete failure (Larrabee) that seem like they shouldn't have happened given Intel's dominance.
Why is Intel having trouble broadening their horizons? Is it an institutional thing, that the teams not working on the main (and most profitable) CPU product lines just can't get the resources needed to catch up with the competition? Intel always has the latest and greatest fabs, but it seems like the designs they're producing for these new product lines are completely squandering that advantage and then some.
There really aren't equivalent x86 and ARM processors yet; IIRC even the slowest Atom is faster than the fastest Cortex A8 (I'm still waiting on A9 benchmarks).
I don't think Intel is being lazy, but Atom is certainly an immature design and the mainstream Intel cores target a much higher level of performance; brainiac cores are fundamentally inefficient because power efficiency decreases as performance increases (in other words, each marginal increase in performance costs more in power than the previous one).
I suspect that the inability to bring down power isn't just "laziness" on Intel's part. Decreasing power consumption is (more or less) equivalent to increasing performance per watt - an area where Nvidia and ARM designs may simply be better.
Many ARM chips support both a RISC ISA (the original ARM) and a more CISC-y ISA (Thumb-2). They consume less power and perform better with the latter -- so much so, in fact, that some chips don't even bother with the legacy RISC mode.
Thumb and Thumb-2 really aren't any more 'CISC'-y than ARM; about the most CISC-like aspect is that Thumb-2 supports two different instruction lengths (2 and 4 bytes), whereas ARM supports only 4-byte instructions.
That said, I've heard/read that Thumb-2 tends to be the optimal size/speed trade-off, but that's not because it's somehow more 'CISC'.
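If you want to see the size difference yourself, a simple experiment (assuming an arm-none-eabi-gcc cross toolchain; exact numbers vary by compiler version and by the code) is to build the same C function for both encodings and compare the object sizes:

    /* Build twice and compare .text sizes:
     *   arm-none-eabi-gcc -O2 -mcpu=cortex-a8 -marm   -c saxpy.c -o saxpy_arm.o
     *   arm-none-eabi-gcc -O2 -mcpu=cortex-a8 -mthumb -c saxpy.c -o saxpy_thumb.o
     *   arm-none-eabi-size saxpy_arm.o saxpy_thumb.o
     * With -marm every instruction is 4 bytes; -mthumb on a Cortex-A part
     * selects Thumb-2, which mixes 2- and 4-byte encodings, and the .text
     * section usually comes out noticeably smaller. */
    void saxpy(float a, const float *x, float *y, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }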
If this were another TransMeta, I'd agree, but the situation is quite different, because Intel is very early in their learning curve on graphics, and nVidia is quite far along in their learning curve on graphics and high-performance computing. They actually have a decent shot at competing with Intel in the low end of the desktop market by making up with the GPU what they end up lacking in the CPU.
A high-end ARM CPU would have lower single-threaded performance, but a higher number of cores, possibly achieving higher overall throughput. Coupled with an NVIDIA GPU on the same die, eliminating the PCI bottleneck, I think this could easily be more appealing than Sandy Bridge, for applications from netbooks to workstations to supercomputers.
The actual NVIDIA press release Engadget worked off of doesn't use the word "desktop" at all; it looks like that was Engadget's doing.
> NVIDIA announced today that it plans to build high-performance ARM® based CPU cores, designed to support future products ranging from personal computers and servers to workstations and supercomputers.
It's easy to believe "personal computers" could really mean laptops, considering basic industry trends. Workstations, not so much -- but if you currently manufacture the world's fastest GPGPU, perhaps workstations just look to you like a bunch of GPGPUs with a bit of standards-based CPU glue to run the OS.
Lower power per instruction is the number one advantage here. Lowering power consumption means not only lower power and cooling bills, it also means better scalability in the frequency domain, since the speed of today's x86/x64 CPUs is limited by the cooling system's ability to evacuate heat (evenly) from the chip. IOW, it means a temporary return of the MHz race, alongside the start of a number-of-cores race.
Well, as many have already said, the power consumption of ARM is interesting, BUT I beg to differ :)
I don't think that's why ARM was chosen. Nvidia has wanted to build a CPU for a long time, and it's a well-known fact that it has been barred from market entrance by Intel patents. ARM is beefing up its CPU line beyond the mobile sector, so it's a logical choice.
As for power, well, don't forget it'll be paired with a GPU, so the power advantage could easily be offset by the GPU's hunger. Time will tell.
On the other hand, tight integration and a great compiler could allow Nvidia to tap into the GPU's number-crunching power through CUDA.
Still, it's not easy; just look at how Cell hardly lived up to expectations. Interesting move from Nvidia. We'll have to wait and see the results.
Very hard to see an ARM advantage on mid/high-end desktops. Given where Intel is with Sandy Bridge, it will be a long time before ARM reaches that performance.
And as for low-cost, lower-performance desktops - why would anyone make one when laptops are more convenient?
I fail to see the point of ARM on "desktop" - Server yes (power consumption), Laptops yes (battery life, form factor) - but desktop?
But surely Intel's performance lead is basically a matter of process, not architecture. Presumably nVidia is going to strive very hard to narrow the process gap - and if it doesn't succeed at that, it's not as if an x86 arch would have saved it.
Nvidia may well narrow the performance gap between low-end x86 and high-end ARM, but at that point, for a desktop, they would offer the same or similar performance as x86 at maybe a lower cost, with the huge disadvantage of a lack of compatibility - for apps and peripherals alike.
EDIT: http://arstechnica.com/gadgets/news/2011/01/nvidias-project-... says this isn't about the desktop as much as it is about servers and workstations. Makes much more sense. John Stokes rightly points out - "this is a very tall order, and a lot of things could go wrong here. Right now, the GPU execution part is the only one where confidence is warranted based on a track record. With the system integration stuff and CPU part, NVIDIA is in uncharted territory. "
Sure. On the one hand, the mid-to-high end desktop isn't the be-all and end-all of high-end chipmaker revenues anymore. And on the other hand, a mid-to-high end Windows ARM desktop might be usable with some combination of native Windows and Office, an increasing supply of new ARM-native Windows binaries from ISVs (the Internet should help by making distribution much easier), Web apps running in ARM-native Web browsers, and butt-slow emulation for old but indispensable x86 binaries.

It seems to me that the biggest issue could turn out to be PC games. Presumably nVidia either has to get the big games studios to issue ARM ports of their next and recent games, or it has to go on being successful at selling standalone GPUs for gamers' x86 PCs, or it has to give up on its gaming constituency for a time at least. I presume that since most PC games are now written to port to PowerPC and/or Cell, doing an ARM port isn't the adventure it might once have been?
Many games already run on iOS and Android which are ARM based and so at least those games could be ported.
Why Nvidia might want to compete with the PC, portable gaming systems (PSP, iPod touch, "PSPhone"), the Xbox 360, and the PS3 without either a solid advantage or agreements with the big game studios is beyond me.
nVidia may not have much choice but to try, if it's being locked out of the x86 CPU market at the same time that the market for third-party discrete GPUs on x86 PCs is being squeezed hard. And supporting ARM Windows may not be that big a burden for PC game publishers if Windows' ARM support is first-class and producing an ARM build is largely just a recompile for the studios.
It's about the cost: 2GB RAM, 2-core 2GHz ARM CPU, GPU, 160GB hard disk, for just 100 US$. Add another 100 US$ for a case, keyboard, mouse and a cheap monitor... the whole desktop computer for just 200 US$ (!)
Obviously. But people need performance on the desktop. With cheap AMD/Intel quad-core CPUs available, do you think ARM will offer better performance at a significantly lower cost than AMD/Intel? I doubt it. And cost alone is not going to play well in the desktop market.
EDIT: At that configuration, without a monitor, and with desktop sales at an all-time low compared to laptops in the PC market - why wouldn't one just buy an AMD Fusion netbook?
Desktop performance, most of the time, isn't CPU-bound. HDD speed is the usual suspect.
These NVidia chips will most likely ship with custom circuitry to accelerate the few CPU-hungry operations performed on the desktop (video codecs and 3D graphics). Keep in mind that, at that point, at least 8 cores per chip will probably be the norm. Single-core performance will not be that important.
Edit: of course, add CUDA to the mix regarding computationally heavy tasks without dedicated acceleration.
This is a big win for nVidia on the supercomputer side of things. They will soon have to face integrated CPU-GPU solutions from Intel and AMD which greatly simplify the process of building and programming a supercomputer. They've just one-upped them both by creating a similar offering with better performance on the GPU side (where the FLOPS are) and better power efficiency on the CPU side. In the race to the exaflop, nVidia just changed the odds dramatically.
Keep your grades up. When I was in college, NVidia had, emblazoned in bold text across their intern ads, that a minimum of 80% was required for consideration.
Part of the reason I never bothered with them - IMHO a company who heavily bases their hiring choices on school grades is not someone I want to work for. It betrays a belief in bad/unreliable indicators/metrics.
Nah, it just means that they get a large number of applicants. If I had to WAG, it means they'd filter out about 50% of the good programmers and 90% of the bad ones.
I also believe that grades correspond better with ability in chip design than they do in programming. (I've actually done both. Chip design is fun, but I'm addicted to very short code-compile-test cycles.)
Also remember that most companies aren't willing to spend nearly as much recruiting effort per candidate for interns, and understandably so. I interned at NVIDIA a few summers back (although in software, not hardware), and the university recruiter said that they more or less don't consider applicants with less than a 3.5 GPA for internships. At my school, probably 80% of the people I knew and considered to be "good" programmers had above that, so I think it's a perfectly reasonable thing to do.
A friend of a friend was asked to implement sqrt in hardware during a phone interview with nVidia. If you're smart enough to do that and you can't maintain the equivalent of a 2.7 GPA then that betrays a poor work ethic.
I think their interviews might be a little uneven. I was asked how to swap the values of two registers in hardware, and I said something about muxes and lines. When the interviewer sounded skeptical, I started expounding on different types of latches and their properties. In retrospect I think they wanted the XOR trick, but that's really a firmware or software thing. Needless to say, I didn't get the real interview.
Or far more interesting questions abound than the ones in your "Communications 101" class. (Also, top 80%+ isn't the same thing as a 2.7 GPA - completely different systems.)
Not my experience in college at all. The bad hackers were the ones at the extreme ends of the scales - the borderline-failing ones were expectedly terrible, and the top-marks usually had some other major handicaps that made them remarkable people but terrible colleagues (extreme lack of team work ability, zero communication skills, tactlessness, gigantic egotism, etc etc).
The best hackers (rather, the ones you'd want to work with) tended to fall in the middle of the pack. Decent marks, not great. Didn't spend their days trying to ace the next exam and spent a lot of time hacking on cool projects instead.
My own experience in college is that marks in the middle of the pack made the best employees, and in fact nowadays an extremely high GPA is a yellow flag when I'm reviewing resumes (though obviously, not a disqualifier at all).
Keep in mind at this school 80%+ would put you in the top 10-15% of all marks. I'm not against filtering for low marks, but in this case NVidia set their sights on the top 10-15% of the student population and disqualified everyone else as a matter of course. IMHO a dumb move that unnecessarily turned away a lot of qualified people.
We're not really talking about hackers here though. To me a hacker is ripping through code, getting it done. But that works better in the software world, where it is relatively cheap to fix a mistake.
I don't work at Nvidia, but in the hardware field things go a bit slower. You need people who pay attention to detail and follow the process to a T to avoid costly mistakes and rework. That's why "people who could put up with the bullshit that it took to get into the top 15%" and "people who are intelligent enough to design chips at a high level and disciplined enough to follow the process to avoid blowing hundreds of thousands of dollars per mistake" have more crossover than "high GPA" and "good Ruby on Rails hacker" do. It's just a different mindset and a different set of economics when you're building slow-iterating hardware vs. quick-iterating software.
It's a good point - FWIW, the jobs I was looking at were strictly software positions, but it stands to reason that the corporate hiring culture would be based on the needs of hardware people.
Agreed. Hardware design is a much slower and more deliberate process. The EEs and CMEs I know who are the best at hardware really are completely different from the best software people I know. I don't think there's a great deal of overlap between the two groups.
I see hardware design as mostly the same as software. There really isn't much difference to it. You can have flashes of genius and figure out how to implement a solution in minutes, just like in software.
And just like for production software, you have to test your code to death to make sure there aren't any bugs. For instance, we usually booted Linux on our CPUs to make sure nothing would go wrong (and things would usually go wrong around cycle 1,000,000,000! Debug that).
I imagine that, at a high level, software that is well tested and goes through QA is a lot like hardware (or engineering in general). But my experience in software was at a startup where, if it compiled and built, we all shouted SHIP IT immediately (and were only half joking). We had great hackers doing tough things quickly, but there were definitely times we cut corners on testing, knowing we could just patch it later if need be.
So you don't have a problem correlating high marks with "lack of team work, communication skills, tactlessness, gigantic egotism, etc etc" yet you have a problem with NVidia correlating high marks with good devs?
The brightest kids I knew got 80% without trying. If you're bright enough to work at NVidia and can't get 80%, you're lazy. I'm sure they missed some talent but saved a ton of time not having to interview mediocre students.
Not everyone has the time to do well in college even if they have the ability. I worked 30-35 hours a week every week in college. I can guarantee that I would have gotten a 3.9 or 4.0 had I been able to not work, but I didn't have that luxury. Oh well, I got out with a 3.2 which seemed good to me.
Even with a 3.2, you probably beat most kids at an engineering school. I've heard of engineering schools where half the class has a 3.5, but at my school our top guy had a 3.8; nobody had a 3.9 or 4.0 no matter how hard they tried. I was in the top 20% with a 3.3.
I agree, but my point stands that most of the people getting bad grades probably deserve them despite the few who deserve better. Companies who want to make the best use of their limited hiring resources would be served by avoiding people with low GPAs.
If I were you, when applying for jobs, I would state you worked full time while in school. I'd definitely take a second look at a mid-GPA candidate who had an awesome work ethic like that.
Very good point. I was in much the same boat as the OP. There is a lot of respect out there for people who've worked their way through school.
Be careful not to emphasize it too much though, lest it seem like you're making excuses. Maybe the person considering you for a job did the same thing.
> Part of the reason I never bothered with them - IMHO a company who heavily bases their hiring choices on school grades is not someone I want to work for.
I worked for NVIDIA in a key group on key projects, and I never even finished my university degree (though my grades were very good). To make a general and hopefully obvious statement, if you're hiring someone fresh out of school with a bachelor's degree but no real-world accomplishments, you have to judge them by some objective measure, and grades are part of that. If you're dealing with someone with real-world accomplishments, it's a completely different matter.
When interviewing people during my time there, I personally never so much as glanced at their GPAs. But then again, our group generally only hired people who had (or would soon have) Ph.D. degrees, or experienced hackers whose past work spoke for itself. I imagine if you're hiring for entry-level positions, you need a very different approach just for the first round of culling, due to the sheer volume of applicants.
That was my MO during college, and it seems to have worked pretty well for me ;)
Some companies, though, will refuse to look at a resume, regardless of how richly experienced it may be, simply because you failed to hit an arbitrary bar during college. To me this is lunacy.
If you were putting yourself through college, paying as you go, then as long as you managed a low "B" I'll probably cut you a break. You'll get a fair shot.
If you decide to skip college and throw yourself into work, I'm not going to hold it against you. That being my background, I generally see it as a plus.
If you went to college on someone else's dime, your parents' or the bank's, and have such a weak work ethic that you couldn't be bothered to apply yourself, then odds are you aren't going to bring anything special to the table at the workplace.
I'm sure there are a lot of cases where it's not fair, but there's few things that annoy me more than an inflated sense of entitlement, and for good reason. In my experience it's a great indicator of a poor performing employee. Generally with drama, generally thinking a lot of themselves despite negligible contributions to the business.
There's a disconnect here. You're falsely associating "average marks" with "didn't apply yourself", which simply isn't true, particularly in our field (I'm assuming you're in software).
I wasn't blessed with an unusually powerful intellect. I knew guys in college who would just get the course material after a couple of hours. For the rest of us earning marks meant many dozens of hours pounding a book, and at that point you run across the very real limitation of having 24 hours in a day.
The vast majority of college students have to make a very real tradeoff - marks vs. practical experience. For me, I chose not to invest dozens more hours to eke out 10% more on the exam, and instead chose to put it into hacking on relevant projects that taught me new technologies.
My marks were certainly not poor, they were perhaps 1 std. deviation above average. Decent, not great. Could they have been higher? Sure, but I'm no ubermensch, it would've meant a large sacrifice in another area.
If that's your image of a student not applying himself, then guilty as charged. I for one don't regret it for one second - after all, I hacked out a lot of stuff in those four and a half years (paid my way through school by not having summers off and staggering internships between semesters instead), and had a number of offers before graduating thanks to it. I'd hate to imagine my position if I'd just stuck to acing the school work and never hacking anything on my own.
Again, this goes back to my original point: companies that weigh marks heavily in recruitment are making a lot of fundamentally unreliable assumptions and rejecting candidates (before interviews, even) on what is IMHO a weak correlation to on-the-job performance. With the exception of a small minority of extremely smart people, someone with top marks had to sacrifice something major to get there, and in my experience that "something" is often something that gravely impacts employability.
In this case, perhaps this issue should be raised internally at NVidia. I for one didn't even bother applying: having interned at companies that filtered strongly by grades, this was a major red flag for me.
Candidates with significant work experience - even ones still in college - are in high demand everywhere and have a lot of choices. Even a non-ninja-rockstar-guru like me had internship offers piling up outside the door. In this case it would benefit the company to not put things into job ads that don't actually matter.
It likely just means that they prefer to have many false negatives (rejecting good employees) and reduce the number of false positives (accepting poor employees). If the size of their applicant pool is high, that's a trade-off they can afford to make.
Incidentally, I interned at nv and my cumulative average at the time of applying was somewhere in the high 60s. Don't believe everything it says on the box ;)
(I was a software intern, but the job I had applied to asked for 80% as well.)
I used to work at NVIDIA as a hardware engineer and would go to nearby university career fairs to recruit new college grads. There's no absolute rule regarding GPA but generally 3.5 was the minimum cut off.
NVIDIA was also the only company I knew of that required candidates to take a written exam on the spot at the career fair. Scary stuff. The problem is there are too many applicants to look at so any kind of filter (administered fairly) is better than none at all.
I actually worked for ARM, but not on their main architecture. I worked for a satellite office acquired a few years earlier, which was spun out again last year. The project was a dynamically retargetable compiler for DSPs.
I doubt nVidia will be competing against x86 but rather x86-64, which has many of the "legacy" features you're no doubt thinking of (e.g. segment registers) removed or sequestered.
64-bit is a big deal on x86 because with it came extra general-purpose registers and a better architecture overall. On other modern architectures, 64 bits usually only means more address space, because they have plenty of registers already.
I remember back when 64-bit UNIX was introduced on, IIRC, SGI machines. Nobody made a big deal of it. SGIs were already very impressive, and nobody cared that much about the extra bits in the registers.
Back then, RAM was vastly more expensive than it is now, and processor speeds weren't great for processing gigabytes of data. Nowadays, the bigger address space is a big deal. Some of this is the OS (Windows is the biggest offender with its <2GB available user space addresses).
Even with Eclipse running and all of the memory allocated, less than 4 GB is currently being used for program workspace. The rest is being used by buffers. It's one of the cases where a PAE-like memory model would suffice.
I cannot remember the last time a single program wanted more than 4GB (I can, actually: it was Firefox, and I left it running with a page full of Flash thingies for the weekend - by Monday it was unresponsive and I had to xkill it). I can agree we need 64-bit addresses for servers (and we have been needing them for quite some time now), but not for desktops and certainly not for my own computers.
Commercial PC game developers constantly run into the ~1.7GB address space available to user apps. You can forget memory-mapping any asset files. Windows can be booted with support for 3GB/process (same as the default on most Linux distros) but that's useless for mass market stuff. Even just running a 64-bit OS gives 32-bit user processes 4GB address space to play with.
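A quick way to see the limit in practice is a throwaway probe like this (a rough C sketch: what it reports depends on the OS, boot switches like /3GB, and whether the binary is built large-address-aware; it deliberately leaks everything since it's only measuring address space):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const size_t chunk = 256u * 1024 * 1024;   /* 256 MB per allocation */
        size_t total_mb = 0;
        void *p;

        /* Stop after 64 chunks (16 GB) so a 64-bit build terminates too. */
        for (int i = 0; i < 64 && (p = malloc(chunk)) != NULL; i++) {
            *(volatile char *)p = 1;   /* touch it so the block really exists */
            total_mb += 256;
        }
        printf("Managed to reserve roughly %zu MB of address space\n", total_mb);
        return 0;
    }

On a default 32-bit Windows process this gives out well before 2048 MB; the same 32-bit binary marked large-address-aware on a 64-bit OS gets close to 4096 MB.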
ARM recommends using Thumb2 for non-legacy software. Thumb2 is denser than x86, so actually ARM is the one with the advantage here (unless you have an existing ARM codebase).
This makes sense in the era of web apps and Python and Java. If people were still reliant on programs explicitly written for x86, this would never go anywhere.