x86-64 Assembly Language Programming with Ubuntu (unlv.edu)
276 points by lainon on Aug 31, 2018 | 91 comments



I am currently taking this class, and am happily surprised this made it here.

The book needs more work, but I still believe it's a great resource. For example, on page 11 it says "Note that when the lower 32-bit eax portion of the 64-bit rax register is set, the upper 32-bits are unaffected." In reality, the high order bits are zeroed to avoid a data dependency. I'm going through the entire book hunting for typos :-)
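A quick sketch of the corrected behavior (NASM, Linux x86-64; my own example, not the book's):

  ; writing to eax zero-extends into rax: the upper 32 bits are cleared,
  ; not preserved
          global  _start
          section .text
  _start: mov     rax, -1         ; all 64 bits set
          mov     eax, 1          ; rax is now 1, not 0xffffffff00000001
          shr     rax, 32         ; keep only the old upper 32 bits
          mov     rdi, rax        ; exit status 0 shows they were zeroed
          mov     rax, 60         ; sys_exit
          syscall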

I also found some issues in its discussion of Unicode, but the class only requires the ASCII character set.


> Note that when the lower 32-bit eax portion of the 64-bit rax register is set, the upper 32-bits are unaffected.

That sounds like wishful thinking, and indeed would be expected by someone familiar with the 16-to-32 transition of the 386 (modifying AX doesn't change the upper 16 bits of EAX). Instead of getting even more 32-bit registers, or even 64-bit registers accessible in 32-bit mode, AMD64 gave us a weird not-quite-fully-64-bit extension.

I've heard the "partial register stall" excuse multiple times, ostensibly valid but only if you insist on thinking in "partial registers" instead of simply more 32-bit ones as input. For example, some variants of the divide instruction use EDX:EAX (or RDX:RAX) for its input.
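A short illustrative fragment (NASM syntax; my own example) of that implicit pairing:

  ; unsigned divide: the dividend is the 128-bit pair rdx:rax, so rdx must
  ; be set up (here zeroed) before the div
          xor     edx, edx        ; upper half of the dividend = 0
          mov     rax, 100        ; lower half of the dividend
          mov     rcx, 7
          div     rcx             ; quotient -> rax (14), remainder -> rdx (2)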


> I've heard the "partial register stall" excuse multiple times, ostensibly valid but only if you insist on thinking in "partial registers" instead of simply more 32-bit ones as input. For example, some variants of the divide instruction use EDX:EAX (or RDX:RAX) for its input.

That would mean you have to double the amount of state you track. The hardware cost of doing this is roughly the cost of doubling the number of 64-bit registers. The number of transistors used for storing register data is negligible compared to the cost of the "metadata" and handling around them. Why not just have more registers then?

"Just allow us to partially update the upper halves of the registers" is the sort of thing someone who understands software but not hardware would ask. It's 99% of the cost of just having twice the registers, but not nearly as useful, and it would introduce a lot of potential performance pitfalls. (Any instruction that might update only partially now has to wait for all the previous results on the register.)


> It's 99% of the cost of just having twice the registers, but not nearly as useful, and it would introduce a lot of potential performance pitfalls.

Of the times I've used Asm, there have been far more situations where an extra 32-bit register would be more useful than a 64-bit one, and having them combine automatically into high and low halves is more useful than you think. Tight loops with non-parallelisable bit/byte manipulations of this sort occur quite often in things like data compression and emulation.

> Any instruction that might update only partially now has to wait for all the previous results on the register

Does it? Once again, you seem to be thinking in "partial registers" rather than "just another one" --- and I argue that this conceptual difference is very important. E.g. you can work with both AL and AH independently, then use them together as AX --- at which point, yes, the processor will need to wait for the results from both, but then it can combine them implicitly without having to waste time and space decoding and executing the instructions to do it.
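A sketch of the kind of thing I mean (NASM syntax, arbitrary values):

  mov     al, 0x34        ; write the low byte
  mov     ah, 0x12        ; write the high byte independently
  ; reading ax now yields 0x1234 -- on current cores that read may cost a
  ; merge uop, but no explicit shift/or instructions have to be decoded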


>Of the times I've used Asm, there's been far more situations where an extra 32-bit register would be more useful than a 64-bit one

You need a 64-bit register any time you want to store a pointer, though, unless you want to use some kind of segmented memory model. I don't think anybody wants to go back to that, although I'm not one to criticize weird fetishes.

Clearly when you look at the fine details of AMD64 it looks like a weird Frankenstein monster of an instruction set. The REX prefix holds the MSBs of register numbers, since 32-bit opcodes only allowed three bits to encode 8 registers. The same prefix is used to set the width of memory target operands, except that some instructions default to 32 bits while others default to 64. R12 has weird encoding quirks because it matches RSP in the "low" register set, and that register has specific semantics when used as a base register...

I wonder how much die area is used on a modern CPU just to deal with all this cruft and translate it into a saner RISC instruction set internally.


Well the 16-bit and 8-bit registers work this way, and actual hardware shows this is far from free. We've had partial register stalls, merging uops, and other weird behavior (for example, ah today works very differently from al).

It's easy to say "think of them as separate registers" just as it's easy to say "think of them as partial registers" - but the ISA definition is such that they have to appear as partial registers in the scenarios where it would be visible.

So sure, you could make hardware that would rename and treat them as different registers (as has been done on some x86 versions), but then when you read a wider portion you need to combine them, which won't be free (yes, it happens "implicitly", but that doesn't somehow make it much easier for the hardware).

There are also plenty of cases where you want zero-extension for functional reasons, especially in compiled code where things like casts to a larger size become free. Cleverly using both halves of a register and using the implicit combination into a full 64-bit value is much rarer than just wanting to store a 32-bit value and sometimes wanting to use it as a zero-extended 64-bit value.
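For example (a sketch, assuming the 32-bit value arrives in edi per the usual SysV convention), the widening cast is just a plain 32-bit mov:

  mov     eax, edi        ; copy the 32-bit value...
  ; ...and rax already holds it zero-extended to 64 bits; no movzx or
  ; masking instruction is needed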


>"We've had partial register stalls, merging uops, and other weird behavior (for example, ah today works very differently from al)."

Can you elaborate on the last behavior, that "ah today works very differently from al"? How do they differ? I hadn't heard this before.


You can find all the gory details at:

https://stackoverflow.com/a/45660140/149138


I think ARM64 has instructions for inserting ranges of bits from one register into another.


Interesting. It seems then that we can understand the difference between the 16->32 design of IA-32 (1985) and the 32->64 design of x86-64 (2003) as being heavily influenced by the fact that out-of-order execution had become an important consideration in 2003?


Another thing that could be fixed: the table on page 9 says long is 32 bits (with a footnote saying that depends on the compiler; the value shown is for gcc and g++).

That's only true when compiling for x86 (32 bits). When targeting x86-64 (the subject of the book), long is 64 bits in gcc and g++.


Thanks, "fixing now!"


Will these be making their way back to the author so the text is updated? Or documented somewhere?

Cheers!


Yes, I plan to let him know soon!

EDIT: I found out he is providing extra credit for textbook errors.


Ask him to put the text onto GH/GL or something so that we can submit PRs with corrections. I haven't found any code errors yet, but quite a few proofreading things I could point out to him. Parallelism with some of the section headings was the first thing that jumped out at me, e.g.:

  1.3 Why Learn Assembly Language
  1.3.1 Gain a Better Understanding of Architecture Issues
  1.3.1 Understanding the Tool Chain
  1.3.1 Improves Understanding of Functions/Procedures
Should all be in the same tense.


> For example, on page 11 it says "Note that when the lower 32-bit eax portion of the 64-bit rax register is set, the upper 32-bits are unaffected." In reality, the high order bits are zeroed to avoid a data dependency.

Well, there is one case where the upper 32-bits are not zeroed. It turns out that xor eax, eax is assigned to opcode 0x90, which is better known to most people as NOP.

If you want real fun, read up on what happens with AVX registers. Whether the upper bits are left untouched or zeroed depends on whether you use the VEX encoding or not.


No, in section 3.4.1.1 General-Purpose Registers in 64-Bit Mode of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, it says, "32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register."

`xor eax, eax` actually generates 0x31 0xc0, and `xor rax, rax` generates 0x48 0x31 0xc0. 0x90 decodes to xchg eax, eax, which has no effect, in all modes except long mode. In long mode, the opcode 0x90 still has no effect, but it is no longer equivalent to xchg eax, eax.


Indeed xchg eax,eax is a nop idiom; there are many. Recent microarchitectures simply ignore nops without executing anything. From the Intel® 64 and IA-32 Architectures Optimization Reference Manual:

16.2.2.6 NOP Idioms

NOP instruction is often used for padding or alignment purposes. The Goldmont and later microarchitecture has hardware support for NOP handling by marking the NOP as completed without allocating it into the reservation station. This saves execution resources and bandwidth. Retirement resource is still needed for the eliminated NOP.


This nop idiom is very special however, since it isn't just about efficiency: if it wasn't a nop idiom, xchg eax, eax would not be a nop at all, it would clear the upper 32 bits, as xchg ebx, ebx does (or any other register other than eax).


> It turns out that xor eax, eax is assigned to opcode 0x90, which is better known to most people as NOP.

This can't be true, since xoring a register with itself zeroes that register, and zeroing a register can't possibly be a general NOP instruction.

XOR also sets flags, another thing NOPs can't do.


That's because it's actually xchg eax, eax not xor eax, eax.


xor eax,eax is a zero (or dependency breaking) idiom. Zero idioms are detected and removed by the renamer. So they have no execution latency.


I recently did a video series on x86 using nasm and GCC. It only covers 32-bit, but I think that's a better way to start since the conventions are simpler (especially when interfacing with C code).

https://www.youtube.com/playlist?list=PLmxT2pVYo5LB5EzTPZGfF...


> It should be noted that Unicode uses 2 bytes for each character.

But you're programming "with Ubuntu", not Windows. IMHO you could safely assume/recommend UTF-8.


Just a reminder BTW that since version 2.0 (1996), Unicode is not an encoding scheme but a character set (I avoid the confusing “charset” word on purpose). Therefore, Unicode does not use any number of bytes: it only assigns code points to characters.

Windows used to use the UCS-2 encoding scheme which indeed used 2 bytes for each character, but since Windows 2000, it uses UTF-16 instead, which like UTF-8 uses a variable number of bytes per character.
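To make that concrete, here's a small sketch (NASM data directives, little-endian byte order) of one BMP code point and one non-BMP code point in each encoding:

  ; U+20AC (EURO SIGN)
  euro_utf8:   db 0xE2, 0x82, 0xAC        ; 3 bytes in UTF-8
  euro_utf16:  dw 0x20AC                  ; 2 bytes (one code unit) in UTF-16LE
  ; U+1F600 (GRINNING FACE), above the BMP, needs 4 bytes either way
  grin_utf16:  dw 0xD83D, 0xDE00          ; surrogate pair in UTF-16LE
  grin_utf8:   db 0xF0, 0x9F, 0x98, 0x80  ; 4 bytes in UTF-8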


Indeed. "Unicode" is an abstract character set, it doesn't "use" any bytes. A specific encoding does.


Even with UTF-16 that quote is incorrect, due to surrogate pairs. It's only correct for UCS-2, and even then, only if you take 'characters' to mean 'codepoints', and take 'Unicode' to mean 'a specific Unicode encoding'.


If you have a particular need to learn x86 assembly this is great; however, I want to point out that if you just want to learn an assembly language for the sake of learning very low-level development and understanding how CPUs work at the lowest software level, I would not advise picking x86. It's a crufty, messy, overcomplicated ISA, plagued with decades of shifting paradigms in CPU design and still maintained for backward compatibility's sake.

If you value your time and sanity, consider learning something smaller and more reasonable such as AVR assembly (the kind of controller you find on Arduinos). It's a lot smaller and you don't even need an OS; you can truly do everything from scratch. If you want something a little more advanced, ARM is an obvious target: it's got all the features you'd expect from a modern CPU (SIMD, floating point, etc.) and it's not nearly as crazy as x86 assembly.


Learning doesn't happen in a vacuum though. People generally try to learn things that will be handy for them. AVR or ARM assembly are far less handy to know than x86, so telling people to ditch x86 and learn those instead kind of misses the point.


Sure, x86 is more ubiquitous (or is it really? plenty of ARM cores and AVR parts ship embedded), but the number of times I've needed asm on x86 is way less than on those embedded platforms where a byte is a byte ... (ditto for cycles)


> Sure x86 is more ubiquitous (or is it really? Many arm cores and embedded AVR are released embedded)

I can't make sense of this. Is your logic "there are more ARM CPUs than x86 CPUs => there are more programmers dealing with ARM assembly than with x86 assembly"?

> but the number of times I've needed asm in x86 is way less than I've had with those embedded platforms

Sure, this is your situation. But are you claiming your situation is typical? In your mind do the majority of programmers who deal with assembly deal with embedded platforms as much as you do?


You're asking leading questions but they don't really...seem to lead anywhere.

You suggested x86 assembly was more useful to learn than ARM or AVR ("AVR or ARM assembly are far less handy to know than x86"), but provided no justification for that claim - and yet seem to be extremely demanding on similar claims of others.

So what's your situation? Are you claiming that it's typical?

The logic of: "desktop CPUs are rarely coded in assembly, embedded CPUs are absolutely everywhere and often coded in assembly, the latter assembly languages are more useful to know" is extremely obvious and straightforward. I can't make sense of your opposition to it, especially since you've given absolutely no substance to back up your contrarian position.


These types of resources are awesome, though I've always wished that they at least briefly touched upon the x86 memory model/consistency. Understanding the concept of memory barriers should be considered fundamental.


Out of curiosity, what are the main reasons to need to actually write assembler in 2018? Compilers? Games? Genuinely curious


1. When you have no choice in languages: Homebrew development for, say, the Nintendo Game Boy happens by and large in assembly language.

2. Optimization: Some critical code sections or cryptographic operations can be massively sped up with hand-crafted assembly[1].

3. Obfuscation: Sometimes, you have to drop to assembly for obfuscation. This is especially the case if you need to/want to fool the Hex-Rays Decompiler. Applications include malware and digital rights management (some might argue the latter is just a sub-species of malware). This also includes writing your own custom assembly language for a custom virtual ISA for the purpose of code obfuscation.

4. Embedded platforms: Some very obscure microcontrollers may only have support for pure assembly, though at least C89 compilers being available seems to be the norm.

5. Education: It can be very enlightening to understand how things work below your "home" layer of abstraction. Some parts of C seem to be much easier to understand once you have a firm grasp on a common assembly language.

6. Compilers: Self-evident, you kind of have to understand the code you emit.

[1] See e.g. this benchmark https://monocypher.org/speed -- authenticated encryption on libsodium gets an improvement of over 300% from hand-crafted assembly over the portable C implementation


Another common use case is for writing shellcode and other exploit-related work.


Low-level system code (particularly when you're doing things like servicing interrupts) needs to be written in assembler. The very early boot phase is even more annoying because you're starting out in one of the 16-bit modes and working your way to changing to x86-64 and virtual memory.

Outside of system code, the main use for assembly is trying to maximize performance on hot loops. Your optimized matrix-multiply routine, or media decoding kernel, could well be written in assembly. I've seen a few cases where people do things such as manual calling conventions in assembly as well.


>you're starting out in one of the 16-bit modes and working your way to changing to x86-64 and virtual memory

"ontogeny recapitulates phylogeny"

https://en.wikipedia.org/wiki/Recapitulation_theory


The book reasons that learning assembly also teaches the fundamentals of computer architecture.

  1.3.1 Gain a Better Understanding of Architecture Issues
  1.3.1 Understanding the Tool Chain
  1.3.1 Improve Algorithm Development Skills
  1.3.1 Improves Understanding of Functions/Procedures
  1.3.1 Gain an Understanding of I/O Buffering
  1.3.1 Understand Compiler Scope
  1.3.1 Introduction to Multi-processing Concepts
  1.3.1 Introduction to Interrupt Processing Concepts

The book uses the 1.3.1 heading for each and I'm too lazy to change them.

Reasons to (maybe/arguably) write assembly:

1) You're bringing up a new board; your bootloader is partially written, but you need some customization for the real-time OS you're using. It can be advantageous to do this in assembly.

2) You're dealing with some particularly old hardware and (ab)using it for some commercial purpose.

Of course, I can imagine that for each you'll have someone obstinately state there's no need to use assembly because of some gcc feature. More than one way to get things done, and most use the tools they're comfortable with.


Two use cases I've had in recent years (technically not 2018, but well…):

- optimization of simple functions that are called a huge number of times (the only viable alternative to assembly being intrinsics, which come with their own portability constraints),

- just-in-time compilation.

Additionally, knowing assembly can help a lot when debugging weird crashes in code written in higher-level languages. gdb's asm layout is an awesome resource if you know how to use it, but someone who's never used assembly before will probably not even consider using it.


One practical "high level" example would be analyzing function/type specialization in Julia and understanding the resulting code that can be easily inspected.

See e.g. (from 2013, but at a glance it seems to outline the ideas well):

http://blog.leahhanson.us/post/julia/julia-introspects.html

For Julia it's fairly easy to work "high up" most of the time and drop down to inspect the code (and profile), and unlike many other languages, even high-perf libraries will often be Julia all the way down (unlike tcl/perl/ruby/python + C/Fortran etc).

https://docs.julialang.org/en/stable/manual/profile/

https://docs.julialang.org/en/stable/devdocs/reflection/

Similar for sbcl (common lisp) or various languages that target/use llvm. And obviously for looking at output of optimizing compilers for c, c++, rust, d and similar "medium to high" level languages (Pascal, Ada, crystal, etc).


Assembler is still popular for IBM mainframes. The current version has been around since '92 and is called High Level Assembler.

It's popular partially because people have codebases that they started writing in the 70's or 80's in assembler that they maintain to this day because it's cheaper than switching it all over to a new language. Pretty much the same reason that COBOL is still around.

z/OS (the OS that runs on IBM mainframes) also exposes a lot of its functionality through HLASM, so it's far more convenient to use than x86 assembly.

For whatever reason, C also never really caught on as ubiquitously as it did in the PC world. Probably because IBM themselves generally used their proprietary PL/S language instead back in the 70's and 80's.

https://en.wikipedia.org/wiki/IBM_High_Level_Assembler


UNIX was to C what the browser is to JavaScript.

Naturally it didn't catch on on mainframes, which were already using better system languages.

It is also barely used on Unisys ClearPath MCP.


This is fascinating, I didn't know there was such a thing as a high-level assembly language, but IBM High Level Assembler has IF/ELSE/ENDIF and several types of built-in loops. I wonder how similar it is to writing in C. One thing this page doesn't mention is structured data types; I suppose these would still have to be implicit like in other assembly languages.


I used to write assembly language programs back in the 70s while working on process control computers (Texas Instruments 990, TI 980, TI 960 etc). At one point I was using an assembler that supported complex macros (macros that could be expanded into other macro definitions and supported counters and so forth), so I developed a library of macros that supported nested if-then-else and loops. They made the code a bit easier to read, but it was probably not worth the trouble.

The problem with a high level assembly language is that it really isn't very high level; your program still rests right on the hardware for a reason, and usually that reason is a concern about using registers and instructions very carefully for performance or interacting with hardware at ring 0 level where you are managing the virtual memory page table or handling network device interrupts or system IPC and so forth.

In my experience (as an IBM AIX kernel architect, virtual memory architect, and distributed file system designer), sometimes one needs assembly language, but it was always a relief to get up to the level of C programming where the programming teams were much more productive. Much OS development has been done with C and it really was the best choice for most of the kernel work going on back then, in my opinion.

AIX was an interesting project. The hardware didn't exist in final form while AIX was being developed. The challenge for our group was developing/porting a whole OS, the kernel and user space code, that would run on hardware being developed at the same time. IBM's language PL/1 was an important mainframe language, but seemed a poor fit for systems programming. However, IBM had state of the art compilers for it and a strong research interest in compilers for RISC machines (like the POWER processors, the first of which outside of IBM's research processors would run AIX 1); so they took the 80% of PL/1 that seemed useful to systems programming and wrote a compiler for PL.8 (.8 of PL/1) to run on the hypothetical RISC system my group was developing.

We were developing a Unix system on the RISC hardware, but we didn't have a stable target (page table sizes, floating point hardware traps, etc.) and couldn't afford to wait for the hardware before starting development. The approach my group took was to write the lowest level parts of the kernel in PL.8 so that as the hardware changed the compiler could be tweaked to take advantage of it more easily than rewriting low level assembly language code. The high-level parts of the kernel (coming from licensed Unix code) could then be mated to the low level code and wouldn't be affected by the changes in the hardware that happened over time.

I wasn't in charge of these decisions, so I don't really know enough about them to say that this was better or worse than just using C and assembly language as is normally done in most OS development, but I do see some of the trade offs that had to be made.

An aside on higher-level system programming languages: I know that some on HN say that C is a terrible choice for OS development. Perhaps there are better choices (now), but I see things a bit differently. At the time there were no obvious choices that were better. We didn't have Rust or even C++. We had C, Pascal, MODULA, PL/1, and a few other unlikely choices (e.g. ALGOL-68, LISP, JOVIAL). C is a big improvement over assembly language, but it isn't clear to me that PASCAL or MODULA, or LISP or the others available back then were better choices than C. Unix became a kind of proof of C's suitability as an OS development language. Before that, PL/1 had been used to develop Multics, but Multics failed as a commercial OS (despite its subsequent influence on OS design). C was simpler than PL/1. Algol had been used by Burroughs, but it was a non-standard version of Algol specially designed to work with the rather novel hardware.

C is flawed but none of the other candidates for a language higher level than assembly language for system programming was without flaws and they hadn't produced something like Unix. The C used in the Unix kernel was the real K&R C; it was the same language that ran on many platforms. Other attempts at a high level systems programming language based on Lisp, Smalltalk, Pascal, Algol, and IBM's proprietary subsets of PL/1 were all languages modified for the hardware they ran on. C seemed to be just low enough to work for most of the kernel's requirements without special extensions.

I always appreciate pjmlp's comments reminding HN readers about Pascal or Modula. I liked those languages; I'm very familiar with them. I still think C was the correct language for system programming in the past. Today, I'm more interested in seeing what happens with Rust for kernel development and Go for non-kernel systems programming.


Thanks very much for the insightful comment.

Also interesting to learn that PL.8 had a shot at the AIX kernel. I got all the PL.8 papers I could get my hands on.

Regarding UNIX and C's adoption, I think that had Bell Labs been allowed to go commercial with UNIX from day one, the history of C's adoption would have been quite different.


The IF/ELSE stuff is similar to the preprocessor macros people write in C. They basically generate HLASM code on the fly based on certain flags being passed to the program and what not.

If you're curious what a simple program ends up looking like, I've got one I wrote that copies the contents of one file into another file up on GitLab. Lots of loading registers and what not.

https://gitlab.com/thisisnickwilson/zos/blob/master/IO


Thanks, this is pretty interesting to read through, and your comments are very helpful. I didn't realize this language has no comment syntax, but I guess it makes sense since each opcode probably has a fixed number of parameters and anything after that can be safely assumed to be comments. Neat stuff.


Yes, it was quite common in the '70s and '80s.

MS-DOS and Amiga assemblers (TASM, MASM, DevPac) had quite powerful macro capabilities, many of which gas still doesn't support.

Here is the documentation for MASM's high-level constructs; it is still distributed with Visual Studio.

https://msdn.microsoft.com/en-us/library/8t163bt0.aspx


NASM also has macro capability, though I'm not sure how it compares to the others you mentioned (EDIT or to gas). On the plus side, it's available on Linux.

https://www.nasm.us/doc/nasmdoc4.html
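For instance, a small sketch of a NASM multi-line macro (my own hypothetical names; Linux x86-64 write syscall):

  %macro write_str 2              ; %1 = buffer label, %2 = length
          mov     rax, 1          ; sys_write
          mov     rdi, 1          ; stdout
          mov     rsi, %1
          mov     rdx, %2
          syscall
  %endmacro

  ; usage:  write_str msg, msg_len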


Also not sure, they are supposed to be quite good, but by the time NASM came around, my focus was no longer on pure Assembly programming, so I never used it in anger.


If you are developing a compiler you will need to understand the output assembly. Learning to program in it is one way to get to know it. There are also plenty of little code snippets around the run-time libraries and OS that are written directly in assembly (e.g. crt0.o, crt1.o & friends).

Basically: somebody has to provide the infrastructure between assembly and high level languages, so that everyone else can write in HLL.

Optimization of games or HPC workloads is also a valid use-case, but today you probably want to use intrinsics in C instead.


Not necessarily regarding the latter. Intrinsics only help where your code doesn't fit the instruction set, because the ISA can encode operations the source language can't express natively, and the compiler might not always be able to detect such structures. Also, if you use assembler you can take better care about scheduling instructions around loops to get very high throughput; you might find, for example, that k-ary search happens to be faster than binary search because you can hide some of the latency you'd otherwise pay. There are also other issues requiring higher-level restructuring once you see that your first implementation stalls the processor too hard.

Once you reach GPU cores and other non-OoO cores you need to be careful with memory load stalls, and try to use some staging memory: e.g. load the data for the iteration after the next one into staging, then fetch it into registers in the next iteration, so it is ready in the one where it will be processed. You need such optimization to reach, e.g., high throughput in matrix multiplication, because you need to minimize the instructions that use the ALU but don't do actual result-affecting calculations (and instead just do loop/array index manipulation and such). The ability to be good at this is reduced in OoO machines with register renaming, sadly, and I'd like a core that is fast for sequential code due to low-delay, high-clock operation, yet exposes the whole register set to the assembly programmer... One of the recent chip startups seems to target such a processor. Obviously also the Mill, but that's been vaporware since forever...


Some mainframe and workstation architectures never had direct support for assembly instructions, only intrinsics in their system languages.

Burroughs, Xerox PARC workstations, ETHZ workstations as a couple of examples.


For me, because it is interesting. It is neat to understand how things work at a lower-level.

Knowing assembly also helps out when doing capture the flag challenges, specifically reverse engineering and binary exploitation.


Lots of reasons, but one would be processor specific optimizations and/or extensions for performance reasons.

OpenSSL, for example, has a fair amount of ASM in it for various processors to speed things up.

See https://software.intel.com/en-us/articles/improving-openssl-... for some examples of what they specifically did for x86/64.


I've seen it used a lot for I/O on microcontrollers where speed/timing is important. The Arduino C++ library, for instance, does some weird things under the hood when reading from an analog-to-digital converter (reading the voltage at some pin), whereas with assembly you know you can do the read in exactly one instruction. When timing or speed are important, that level of control is useful.


Knowing assembly made me a better C programmer. I still remember this tech interview I had with a Seattle based company (no, not MSFT) many years ago. The interviewer was trying to twist and turn all kinds of C-pointer questions, but knowing how the code translated to assembly made the whole thing a breeze. I got the job but didn't take it, and I somewhat regret that.


Perhaps wanting to know how the computer works.


Firmware and driver development. Not in all cases, but you're very likely to encounter it in those domains.


I work on compilers and I hate assembly.


shellcode, maybe, or really tight inner loops that need to use weird vector instructions.


It's not the vector instructions, it's the careful scheduling of instructions to spend just enough time manipulating pointers when you want to crunch actual data, all while respecting dependency chains and memory stall times. (Hyperthreading helps a lot with the latter; see Nvidia Maxas (now Nervana Systems) for details on how a flexible number of threads helps weigh memory-load-stall hiding against the extra data shuffling caused by register pressure.)


Take a look at projects such as pixman [0]

But other than highly optimized graphics manipulation I can't really think of any other good application for assembly.

[0] http://www.pixman.org/


Not necessarily "write" assembler, but understand it. And, as always, we are not at the end of history: a new dominant CPU architecture could arise, and people who understand and write assembler will always be needed.


Compilers, embedded systems and similar low level work, reverse engineering where you don't have the binary (e.g. malware), and that kind of thing.


Tasks that involve reading assembly are more common, and it's hard to understand it very well if you've never written any.


Cryptography, not only for speed, but also for ensuring implementations are constant time.


Embedded devices?


I have never had the need to write assembly on any embedded device. There are C (or other high-level language) compilers for just about any architecture. For very specific instructions (e.g. interrupt jumps) there is usually a compiler alias/feature available.


If you want to turn an AVR/ATmega32 into a spectrum analyzer, you need to massage the X, Y and Z registers that support implicit increment/decrement addressing. C fails hard for such code.


better performance and millions of dollars of monthly savings in cost (in a bank) - not x86 though


reverse-engineering?


Maybe not that much to write it, but it's very valuable to understand the disassembly when debugging.


To learn what programming really is and how systems really work.

You can't really understand programming without having built a compiler. Or at the very least having done assembly programming. I don't think you can truly understand any basic programming concept like variable, pointer, reference, etc without learning compilers or assembly. Yes, a pointer "points" to something. But what does that really mean? When you realize that it's all just higher level concepts of specific assembly mechanisms of accessing data, you'll have a eureka moment. Don't see how you do that without digging into compilers/assembly.
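To sketch what that means at the machine level (NASM syntax; 'value' is assumed to be a quadword label in .data), a pointer is just an address, and dereferencing it is just a memory access:

  lea     rbx, [value]      ; rbx holds the address of 'value' -- the pointer
  mov     rax, [rbx]        ; dereference: load the 8 bytes at that address
  mov     qword [rbx], 42   ; store through the pointer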

And ultimately, it'll make you a better overall programmer since compilers/assembly is fundamental to every programming language.


> You can't really understand programming without having built a compiler. Or at the very least having done assembly programming. I don't think you can truly understand any basic programming concept like variable, pointer, reference, etc without learning compilers or assembly.

This is just elitism with a little bit of "back in my day" thrown in.

I can assure you that I fully understood the concepts of pointers, variables, and references without knowing anything about assembly language or compilers. The concepts are not that difficult to grasp.


> This is just elitism with a little bit of "back in my day" thrown in.

That wasn't my intention and I apologize if it came off that way. Also, I'm not that old and I certainly didn't start programming with assembly. My programming experience began with OOP languages (Java, C#, C++). I didn't really touch assembly until I went to college. Also, compilers is a basic core course you have to take to get a CS degree. How is something everyone has to learn elitist?

> I can assure you that I fully understood the concepts of pointers, variables, and references without knowing anything about assembly language or compilers. The concepts are not that difficult to grasp.

The concepts were fairly difficult for me. Especially pointers. I thought I understood it but then later on realized I didn't. Maybe other people have their eureka moments sooner. For me it happened when I built a compiler and did some assembly programming.

I just think knowing how things work under the hood will make you are better programmer. Especially something like assembly since all programming languages gets translated to assembly whether you are using C# or Haskell or Lisp or ML. I wasn't trying to offend anyone. But I'm sorry if others were offended by my comment. That was not my intention.


> Also, compilers is a basic core course you have to take to get a CS degree.

I'm not sure if this is true. It wasn't required for my degree.

I wasn't offended by your comment, though. I just disagree with you. No hard feelings about that.

Anyways, I agree that knowing how things work under the hood will make you a better programmer. I just disagree with how you phrased that sentiment in your original comment.


For context, I agree with you that understanding compilers is very important to a deep understanding of the tools used by programmers. Knowing how languages are implemented sheds light into why certain design decisions are made and why certain language features behave the way they do.

That said...

> Also, compilers is a basic core course you have to take to get a CS degree. How is something everyone has to learn elitist?

Not everyone is lucky enough to get a CS degree.

Some people come from backgrounds where CS was not in their worldview growing up. Or they were discouraged from pursuing CS despite the subject being interesting to them. Or they tried taking CS, but they had to deal with external factors that prevented them from continuing.

To discount these individuals from making meaningful contributions to a programming endeavor is short-sighted. Their particular experience may allow them to be better at implementing the correct solution, even if they are not as strong of a programmer. Or they may be a very fast learner, and given more time and mentorship, they will learn about compilers, but they can be strong contributors until then.


Pointers deliberately hide these details from you by obscuring where things are actually located, presenting a unified memory model. To really understand what's going on you need to know how the stack is laid out, how structures work, and how functions are called.


Hmm, this is great. But to understand where variables are placed on the stack and how the stack is managed, you need to play with assembly. Pointers are pointers, but when you play with them in assembly you know why some languages expose only references and lack pointer arithmetic. It sounds easy, but it helps when you go to build compilers or whatever.


Reversing the NSA's latest bad idea to figure out how to stop it from taking over your network.


For those interested...

Up until the early 2000s-ish, Randall Hyde used to develop a ton of teaching materials and libraries for Intel.

You can find most of it here: http://www.plantation-productions.com/Webster/index.html


Thankfully it uses Intel syntax.


I was skeptical but this looks like an amazing resource!


Why Ubuntu? IE why is any of it distro-specific?


syscalls for OS maybe, but why Ubuntu... who knows


Likely because it's the most popular among the target group of students at the time of writing. Choosing a distribution allows one to discuss the compiler, assembler, gdb, and text editor with known versions and known availability, reducing admin overhead so students can get on with it. It's not a terrible way to go, whatever one thinks of the most popular distribution at any given point in time.


The TOC is very tempting. Thanks for the link


Thanks for sharing!


Thanks for this!



