Some of the ARM instruction set manuals seem harder to come by than others. Recently I reverse engineered one of the firmwares of the ImmersionRC Vortex 150 racing quadcopter (it uses an STM32F3 chip, so an ARM Cortex instruction set). It was surprisingly hard to find a copy of the full instruction set manual; I just assumed that kind of thing would be a quick Google away. Eventually I got there and made my changes, but it was harder to get going than I expected, and for different reasons than I was anticipating.
For documentation of the devices in a particular SoC you'll need the reference manual from the SoC manufacturer -- they of course vary in how easy it is to find those docs.
You can also compile assembly with gcc rather than as + ld.
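For example (file names are arbitrary):
gcc -o prog prog.s
gcc invokes the assembler and linker for you. Since it links in the C startup code by default, the assembly should define main; alternatively, pass -nostdlib and provide _start yourself.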
And you can output assembly from C programs
gcc -S <source file>
You can disassemble a binary to see how it actually looks. The result is much larger than your own assembly source, because the linker pulls in startup and library code.
objdump -d <binary file>
Many disassemblers will show you friendlier output than objdump. I use HT editor (packaged in Debian-based distros as ht), an open-source clone of Hiew. In HT, press F6 -> select image, and you will have an easy-to-follow disassembled view of a binary that you can edit, if you happen to know the opcodes.
It must be possible to change instruction sets when branching, in order to call or return from a function in the opposite instruction set.
There are branching instructions that can't change instruction sets for two reasons:
1. A direct branch can have more range if you can assume it's going to a 4-byte aligned address.
2. For compatibility with code written for the ARM7DI and other really old pre-Thumb processors. Since the original direct branch instructions ignored the bottom two bits of the target address, new direct branch instructions were added rather than changing the behavior of the existing ones (see the sketch below).
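A minimal sketch of what that interworking looks like, in GNU as syntax for AArch32 (the labels and register choice are mine):

        .arm                    @ ARM-state code
    arm_caller:
        push  {lr}
        ldr   r12, =thumb_func  @ the linker sets bit 0 of the address for a Thumb target
        blx   r12               @ BLX (register form) branches and switches state per bit 0
        pop   {lr}
        bx    lr                @ interworking return to whoever called us

        .thumb                  @ Thumb-state code
        .thumb_func
    thumb_func:
        bx    lr                @ BX LR returns to ARM state (bit 0 of the saved LR is clear)

A direct bl thumb_func works too in practice: modern linkers rewrite BL to BLX (or insert a veneer) when the target turns out to be in the other instruction set.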
ARM and Thumb are mostly just different encodings of the same instructions. Having the switchover done implicitly via branch instructions is convenient for interworking -- you can link an ARM library with a Thumb application and it all just works, with function addresses having the low bit set for Thumb and clear for ARM. Generated code doesn't need to care beyond making sure it uses the right interworking instructions for call and return.
This is distinct from x86 32 vs 64 bit and ARM 32 vs 64 bit: in both those cases there's really a different processor mode with extra registers and so forth, and switchover is correspondingly more involved.
I've done several embedded projects in the past 5-10 years. Mostly it's a matter of writing just enough assembly to get things running, then flipping into C whenever humanly possible, because life is short.
Talk to old video game veterans [waves]. We wrote tons of assembly because there wasn't much choice: these were largely 8-bit processors, the compilers weren't any good, and code space was constrained. But I was talking with a guy recently who said he'd written hundreds of thousands of lines of 68K assembly, and I have no idea why you would do that, because 68K C compilers, Pascal compilers, anything compilers were pretty good even back in the benighted 80s. Well, better than assembly, anyway.
Of course, once you flip into C you're still not in an environment where you have much of a runtime (I kept having to explain to a contractor why he couldn't do heap operations in an early boot phase, much less expect the results to be addressable later).
Even in a very code-space sensitive project, I started off with a tiny bit of assembly, then made everything more or less functional in C, then went back and hand-coded routines as we needed to get bytes back: http://www.dadhacker.com/blog/?p=1911
My level of fascination with an architecture can be dramatically affected by the quality of the available tooling. If there's nothing, then that's fine; green fields are great fun. But if the tooling sucks (TI and your DSP software, I'm looking at you) or is horridly expensive, then I'm usually going to look for excuses to use something else.
Since you ask: I use lots of assembly-level programming for digital audio signal processing. Some ARM instructions offer DSP-specific features like saturation and fractional arithmetic that have no equivalent in C, which justifies using assembly. Smaller ARM cores without NEON offer a kind of miniature SIMD by operating on two 16-bit or four 8-bit numbers at once, and these can speed things up as well. To take full advantage of these it is best to write some portion of the code in assembly. I usually wrap a processing sequence in a gcc-style inline assembly macro; this way I can easily compare it with a C-only macro and maintain portability, and it saves me from having to deal with the complete instruction set and calling conventions.
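A minimal sketch of that pattern, assuming GCC and a core with the DSP extension (QADD16 saturates two packed 16-bit lanes; available on ARMv6 and e.g. Cortex-M4 class parts). The function names are my own invention, not the poster's:

    #include <stdint.h>

    static inline int16_t sat16(int32_t x) {
        return x > INT16_MAX ? INT16_MAX : (x < INT16_MIN ? INT16_MIN : (int16_t)x);
    }

    /* Add two pairs of packed 16-bit samples, saturating each lane. */
    static inline uint32_t sat_add16x2(uint32_t a, uint32_t b) {
    #if defined(__arm__) && defined(__ARM_FEATURE_DSP)
        uint32_t r;
        __asm__ ("qadd16 %0, %1, %2" : "=r" (r) : "r" (a), "r" (b));
        return r;
    #else
        /* Portable C fallback for comparison and non-ARM builds. */
        int16_t lo = sat16((int32_t)(int16_t)(a & 0xFFFFu) + (int16_t)(b & 0xFFFFu));
        int16_t hi = sat16((int32_t)(int16_t)(a >> 16) + (int16_t)(b >> 16));
        return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
    #endif
    }

The #else branch is exactly the "C-only macro" described above: same semantics, portable, and easy to benchmark against the asm version.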
I've also been working with DSP processors for audio processing. The only time I've used actual inline assembly was to store the stack pointer in order to measure stack usage.
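That one is only a couple of lines -- something like the following, assuming GCC on 32-bit ARM (the function name is mine):

    #include <stdint.h>

    /* Snapshot the current stack pointer. */
    static inline uint32_t read_sp(void) {
        uint32_t sp;
        __asm__ volatile ("mov %0, sp" : "=r" (sp));
        return sp;
    }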
For saturating arithmetic and other stuff we used compiler intrinsics, which freed us from handling register allocation, stack management, etc. by hand. On that processor there weren't special instructions for saturating arithmetic; a status flag was used instead, and the compiler kept track of that one too.
We did read the assembly result and tweaked C code until assembly looked like what was expected, though.
Technically, assembly is not required for measuring stack usage: just take the address of any local variable or function argument in regular C (minus the address of some local variable in main) to get an approximation of stack usage.
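A rough sketch of that trick, assuming a downward-growing stack (true on ARM). Strictly speaking, subtracting pointers to different objects is undefined behavior in C, but as a diagnostic on typical embedded toolchains it works fine; the names are mine:

    #include <stdio.h>

    static char *stack_base;            /* recorded near the bottom of the stack */

    static void deep_in_the_call_tree(void) {
        char marker;                    /* lives near the current top of stack */
        printf("approx stack used: %ld bytes\n", (long)(stack_base - &marker));
    }

    int main(void) {
        char base;
        stack_base = &base;
        deep_in_the_call_tree();
        return 0;
    }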
Like kabdib, I've done far more reading of disassembly than writing it, because for most purposes it's easier to use C as a macro assembler.
I used to work next to someone programming Tilera manycore systems in assembler, because that's a sufficiently weird architecture that you need to do that in order to see any benefit. This is probably why manycore has never really taken off.
I do have to drop little snippets of asm into code for bare-metal stuff, but generally my main use is looking at the output of the compiler to see if it and I are on the same page.
However, as we may well be about to switch our IoT device to ARM from AVR, this might be a useful primer...
I'm only a couple of pages in, but already a lot of this guide is incorrect for ARMv8-A (A64). Much is different: e.g. no Thumb mode, no directly accessible program counter, no load multiple, no PUSH/POP, a different stack pointer, etc. It looks good if you're interested in older ARM ISAs, which is probably more applicable for IoT etc.
Lol, that's because it's about ARMv6. She even points it out in Part 1 or so (not sure where, but I saw it somewhere). If you wanna cover all the differences between ARM versions, that would be a long tutorial.
I think you'll find the only reference to v6 is where the register names are explained, e.g. <= v7: r0, r1, etc. This is another thing it gets wrong for A64: registers are named x0, x1, etc., but the lower 32 bits can also be addressed as w0, w1, etc. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....
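For example, in A64 the register name selects the operand width:

    add   w0, w1, w2    // 32-bit add; writing w0 zeroes the upper 32 bits of x0
    add   x0, x1, x2    // 64-bit add on the full registers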
Page 1, right before the table cross-referencing ISA with ARM family: "The examples in this tutorial were created on a 32-bit ARMv6 (Raspberry Pi 1)."
Easy to miss, particularly if you are scanning through.
What do you mean by "AT&T syntax"? Neither the 68000 nor SPARC uses it, because it's an x86 thing. Do you really mean "operand order"? There's more to an assembly language syntax than just the order of the operands.
AT&T syntax is mainly about operand order, but there are also assembler directives and constructs specific to AT&T, as found in as(1) on any traditional UNIX, including illumos-based ones. That every assembler has its own syntax is nothing new; compare and contrast Master SEKA with ASM-One on the Amiga, or MASM, TASM, and nasm, for example. AT&T as(1) syntax has nothing to do with x86: any code written for SVR4 as(1) will be using AT&T syntax, irrespective of processor architecture. The ISA used with AT&T as(1) will still be that of the processor, of course, but instead of things like a0 or d0, it'll be %a0 and %d0, for example.
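To make the differences concrete, here is the same pair of instructions in both styles -- shown on x86 only because that's where most people meet AT&T syntax; as noted above, the conventions themselves (operand order, % register prefixes, $ immediates) aren't tied to any one ISA:

    # AT&T syntax (SVR4/GNU as): source first; % on registers, $ on immediates
    movl    $1, %eax
    addl    %ebx, %eax

    ; Intel syntax (nasm): destination first, no sigils
    mov     eax, 1
    add     eax, ebx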
I was very surprised when I discovered that the assembly in the Linux kernel sources for x86 was written in AT&T syntax but the assembly for ARM was not. I had always thought that AT&T syntax was supposed to be independent of the architecture.
"Architecture independent assembly language". Just say that out loud and set it sink in for a moment. Anyways, glad that there's no AT&T abomination for ARM, too.