Java bytecode fundamentals

verroq · on Jan 8, 2011

Having never touched Java bytecode, I have to say I was pretty thrown off. Perhaps I need a "Java bytecode fundamentals" fundamentals.

barrkel · on Jan 8, 2011

I don't know what you're specifically missing, but it is important to be aware of the basics of how RPN (Reverse Polish notation) works, since the stack machines in the JVM, .NET etc. are essentially RPN calculators with a lot of extra bits and bobs.

5 + 4 * 2, in RPN with an explicit push opcode, looks like this:

    push 5
    push 4
    push 2
    multiply
    add

Each operation manipulates a stack. Here it is, marked up with the stack:

    push 5
    // now stack is: 5 (top of stack is leftmost)
    push 4
    // now stack is: 4 5
    push 2
    // stack: 2 4 5
    multiply // pop top two on stack, multiply, and push
    // stack: 8 5
    add
    // stack: 13
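
If it helps to see that trace run, here is a minimal sketch of a toy stack machine in Java. The class name RpnDemo and the textual opcodes (push, multiply, add) are made up for illustration; they mirror the pseudo-ops above and are not real JVM instructions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy stack machine; opcode names follow the pseudo-ops in the
// comment above, not the real JVM instruction set.
final class RpnDemo {
    static int run(String... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : program) {
            String[] parts = insn.trim().split("\\s+");
            switch (parts[0]) {
                case "push":
                    // The operand lives in the instruction stream, not on the stack.
                    stack.push(Integer.parseInt(parts[1]));
                    break;
                case "multiply": {
                    // Pop the top two values, multiply, push the result.
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                case "add": {
                    // Pop the top two values, add, push the result.
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown opcode: " + parts[0]);
            }
        }
        // Whatever is left on top of the stack is the result.
        return stack.pop();
    }

    public static void main(String[] args) {
        // 5 + 4 * 2 in RPN order
        System.out.println(run("push 5", "push 4", "push 2", "multiply", "add")); // prints 13
    }
}
```

A real interpreter would dispatch on numeric opcodes rather than strings, but the pop/compute/push shape is the same.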

Add in a few more opcodes, and instead of pushing simple integer constants, you can push an object reference instead; and instead of a multiply or add opcode, you might have a getfield or putfield opcode, which takes an argument (in the opcode stream, rather than on the stack) indicating which field to read or write. Similarly for functions to call, constructors to call when creating an object, etc.
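
As a rough illustration of the getfield/putfield point, here is a small Java class with comments showing approximately what javac emits for each method (you can check with javap -c). FieldDemo is a made-up name, and #n stands in for a constant pool index that will vary between compilers and classes.

```java
// Hypothetical example class; the bytecode in the comments is approximately
// what `javap -c FieldDemo` shows, with #n standing in for a constant pool index.
final class FieldDemo {
    int x;

    int getX() {
        // aload_0        push 'this' (an object reference)
        // getfield #n    pop the reference, push the value of field x
        // ireturn        return the int on top of the stack
        return x;
    }

    void setX(int v) {
        // aload_0        push 'this'
        // iload_1        push the int argument v
        // putfield #n    pop value and reference, store into field x
        // return
        x = v;
    }

    public static void main(String[] args) {
        FieldDemo d = new FieldDemo();
        d.setX(7);
        System.out.println(d.getX()); // prints 7
    }
}
```

Note that the field to access is named by the operand following the opcode, while the object reference (and, for putfield, the value) comes from the stack.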

The JVM uses an explicitly typed instruction set. There are variants of each operation: those starting with a (e.g. aload; the JVM spells push as load) are for object references, those starting with i are for integers, and those starting with f are for floats (so iadd is different from fadd). Part of the verification the JVM does when loading classes is to simulate the stack operations and make sure that the operations match up with their types; the type associated with each opcode is technically redundant, and was probably done that way to help make simple interpreters slightly faster. .NET doesn't do this; it has a single add instruction etc., and instead infers the type of the arguments from what was pushed onto the stack. .NET stack code requires at least an analysis pass to interpret with any efficiency.
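
To make the typed variants concrete, here is a small sketch with the instructions javac typically emits shown in comments (again, javap -c will show the real thing). TypedAddDemo and its method names are invented for the example; because the methods are static, the arguments sit in local slots 0 and 1.

```java
// Hypothetical example; the comments list the bytecode javac typically emits.
final class TypedAddDemo {
    static int addInts(int a, int b) {
        // iload_0, iload_1, iadd, ireturn
        return a + b;
    }

    static float addFloats(float a, float b) {
        // fload_0, fload_1, fadd, freturn
        return a + b;
    }

    static Object same(Object o) {
        // aload_0, areturn
        return o;
    }

    public static void main(String[] args) {
        System.out.println(addInts(5, 8));          // prints 13
        System.out.println(addFloats(1.5f, 2.0f));  // prints 3.5
    }
}
```

The source uses the same + in both methods; only the compiled form distinguishes iadd from fadd.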

Stormbringer · on Jan 8, 2011

Depends what you want to do with it.

Do you just want to pull apart a class and twiddle with its internal strings (e.g. to retrofit some duck-typing) or do you want to manipulate the code?

If the latter, I believe it is a stack-based language, so go learn you some Forth first.

If you want to implement some dynamic language running on the vm, then you need to dive in a lot more deeply than this.

erikb · on Jan 8, 2011

Well, it says nothing more than the original article by Peter Haggar, does it?

http://www.ibm.com/developerworks/ibm/library/it-haggar_byte...

arhan · on Jan 8, 2011

Actually it doesn't. It is just a rewised version of the original article with some updates and other examples.

The generated bytecode is a bit different because Haggar's article is a bit old.

__Joker · on Jan 12, 2011

Maybe a coincidence, but I like the word "rewised". The article getting wised up!!!