Understanding Java Bytecode

By Cameron Aavik

A bit about me

  • Second Year Software Engineering Student
  • Minecraft Modder for about 2 years
  • Developer of the mod sprinkles_for_vanilla which has just over 100,000 downloads.
  • Currently working as an intern at a local startup, Redback Technologies

Why should you learn Java bytecode?

  • Any kind of analysis on compiled Java classes (.class files)
  • JVM Languages
  • Debugging very complex, low-level issues
  • Squeezing out extra performance
  • More comprehensive understanding of the inner workings of Java.
  • Learn what Java really does under the covers when compiling

The .class file

The .class file is generated whenever you compile your Java code. Compiling any other JVM language produces the same kind of file, which is why two JVM languages can interact with each other: they do so at the compiled level.

 

This file is loaded by the Java Virtual Machine and then interpreted.
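As a rough illustration of what is inside a .class file, the sketch below reads the first eight bytes of a compiled class: the magic number, followed by the minor and major version that javap reports. The class name ClassFileHeader and the helper readHeader are made up for this example. It reads java/lang/Object.class from the running JDK only because that file is always available.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {
    /** Returns {magic, minor, major} read from the start of a class file on the classpath. */
    static int[] readHeader(String resourceName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resourceName);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // every class file starts with 0xCAFEBABE
            int minor = data.readUnsignedShort();  // "minor version" in javap output
            int major = data.readUnsignedShort();  // "major version", e.g. 52 for Java 8
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/Object.class");
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The major version printed depends on the JDK you run this with.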

Hello World

public class HelloWorld
{
    public static void main(String[] args)
    {
        System.out.println("Hello World");
    }
}

Run javap -v HelloWorld

public class HelloWorld
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #6.#15         // java/lang/Object."<init>":()V
   #2 = Fieldref           #16.#17        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = String             #18            // Hello World
   #4 = Methodref          #19.#20        // java/io/PrintStream.println:(Ljava/lang/String;)V
   #5 = Class              #21            // HelloWorld
   #6 = Class              #22            // java/lang/Object
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               main
  #12 = Utf8               ([Ljava/lang/String;)V
  #13 = Utf8               SourceFile
  #14 = Utf8               HelloWorld.java
  #15 = NameAndType        #7:#8          // "<init>":()V
  #16 = Class              #23            // java/lang/System
  #17 = NameAndType        #24:#25        // out:Ljava/io/PrintStream;
  #18 = Utf8               Hello World
  #19 = Class              #26            // java/io/PrintStream
  #20 = NameAndType        #27:#28        // println:(Ljava/lang/String;)V
  #21 = Utf8               HelloWorld
  #22 = Utf8               java/lang/Object
  #23 = Utf8               java/lang/System
  #24 = Utf8               out
  #25 = Utf8               Ljava/io/PrintStream;
  #26 = Utf8               java/io/PrintStream
  #27 = Utf8               println
  #28 = Utf8               (Ljava/lang/String;)V
{
  public HelloWorld();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 1: 0

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3                  // String Hello World
         5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 5: 0
        line 6: 8
}
SourceFile: "HelloWorld.java"

How the JVM runs the bytecode

  • Whenever a method is invoked, the JVM creates a new frame for it
  • A frame consists of an array of local variables, an operand stack, and a reference to the class's constant pool
  • The size of the operand stack and the array of local variables are determined at compile time
  • A frame is destroyed when a method invocation completes
  • The JVM stack for a given thread is a stack of frames.
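One way to observe the stack of frames from plain Java is to ask the current thread for its stack trace, which has one element per active frame. The sketch below is only an illustration (the class FrameDepth and the method stackDepthAfter are hypothetical names): each extra level of recursion should add exactly one frame.

```java
public class FrameDepth {
    /** Recurses n times, then reports how many frames are on this thread's stack. */
    static int stackDepthAfter(int n) {
        if (n == 0) {
            // One StackTraceElement per frame currently on the thread's JVM stack.
            return Thread.currentThread().getStackTrace().length;
        }
        // Each call creates a new frame, which is destroyed when the call returns.
        return stackDepthAfter(n - 1);
    }

    public static void main(String[] args) {
        int base = stackDepthAfter(0);
        int deeper = stackDepthAfter(5);
        System.out.println("extra frames: " + (deeper - base));
    }
}
```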

Local Variable Table

  • The local variable table can hold Java primitive values or references to objects.
  • long and double values take up two slots in the local variable table, as each slot is 32 bits wide.
  • Smaller types such as boolean, byte, char and short still occupy a full 32-bit slot.
  • The local variable table is addressed by index, starting at 0.
  • If a method is non-static, index 0 holds a reference to the instance the method was called on (this).
  • If a method takes arguments, they automatically fill the next slots of the local variable table, in declaration order.
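The slot rules above can be sketched as a small calculation. This is not how the JVM or compiler is implemented, just a model of the numbering: the class LocalSlots and the method layout are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalSlots {
    /** Assigns a local variable index to each parameter; long and double use two slots. */
    static Map<String, Integer> layout(boolean isStatic, String[] names, Class<?>[] types) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        if (!isStatic) {
            slots.put("this", next++); // instance methods keep "this" at index 0
        }
        for (int i = 0; i < names.length; i++) {
            slots.put(names[i], next);
            boolean wide = types[i] == long.class || types[i] == double.class;
            next += wide ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // Models an instance method: void m(int a, long b, Object c)
        Map<String, Integer> slots = layout(false,
                new String[] { "a", "b", "c" },
                new Class<?>[] { int.class, long.class, Object.class });
        System.out.println(slots); // prints {this=0, a=1, b=2, c=4}
    }
}
```

Note how c lands at index 4 because the long b occupies indices 2 and 3.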

Operand Stack

  • This is a stack which the bytecode interacts with.
  • It is used to push data around and to load the arguments into the bytecode operations.
  • If you wanted to add two ints, you would push both of them onto the stack and then execute the iadd instruction. It pops the top two values off the stack and pushes their sum back on top.
  • As with the local variable table, each entry on the stack is 32 bits, which means that longs and doubles take up two entries.
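The push, pop, push pattern of iadd can be imitated with an ordinary Java stack. The following is a toy model with made-up names (OperandStackDemo, addOnStack), not the JVM's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandStackDemo {
    /** Mimics the bytecode sequence: iload_0, iload_1, iadd, ireturn. */
    static int addOnStack(int a, int b) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a);                 // iload_0: push the first int
        stack.push(b);                 // iload_1: push the second int
        int value2 = stack.pop();      // iadd pops the top value...
        int value1 = stack.pop();      // ...and the one below it
        stack.push(value1 + value2);   // then pushes the sum
        return stack.pop();            // ireturn: pop the result and return it
    }

    public static void main(String[] args) {
        System.out.println(addOnStack(2, 3)); // prints 5
    }
}
```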

Constant Pool

  • All constants used in the code are stored in the class's constant pool. The constant pool is shared among all methods in that class.
  • A frame will have a reference to that class's constant pool.
  • Examples of constants include:
    • Strings
    • Names of methods
    • Names of classes
    • Method descriptors
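Constant pool entries frequently point at other entries, as the javap listing for HelloWorld shows (#4 refers to #19 and #20, which in turn refer to Utf8 entries). The sketch below copies just those six entries into a map and follows the indices. The class ConstantPoolDemo and its representation of entries are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantPoolDemo {
    // Each entry is {kind, payload...}; the payload is a literal or indices of other entries.
    static final Map<Integer, Object[]> POOL = new HashMap<>();
    static {
        POOL.put(4, new Object[] { "Methodref", 19, 20 });
        POOL.put(19, new Object[] { "Class", 26 });
        POOL.put(20, new Object[] { "NameAndType", 27, 28 });
        POOL.put(26, new Object[] { "Utf8", "java/io/PrintStream" });
        POOL.put(27, new Object[] { "Utf8", "println" });
        POOL.put(28, new Object[] { "Utf8", "(Ljava/lang/String;)V" });
    }

    /** Follows index references until only Utf8 text is left. */
    static String resolve(int index) {
        Object[] entry = POOL.get(index);
        switch ((String) entry[0]) {
            case "Utf8":
                return (String) entry[1];
            case "Class":
                return resolve((Integer) entry[1]);
            case "NameAndType":
                return resolve((Integer) entry[1]) + ":" + resolve((Integer) entry[2]);
            case "Methodref":
                return resolve((Integer) entry[1]) + "." + resolve((Integer) entry[2]);
            default:
                throw new IllegalArgumentException("kind not handled in this sketch: " + entry[0]);
        }
    }

    public static void main(String[] args) {
        // Prints java/io/PrintStream.println:(Ljava/lang/String;)V
        System.out.println(resolve(4));
    }
}
```

The result matches the comment javap prints next to entry #4.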

Frame for Hello World

  • As we saw earlier, we could view the class format in a readable manner using javap.
  • From that we were able to get the following bytecode:

public HelloWorld();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 1: 0

public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3                  // String Hello World
         5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 5: 0
        line 6: 8

The constructor

  • Even though we didn't write one in the Java code, the compiler created a constructor.
  • We see that the stack and local variable table are set to a maximum of 1 value
  • Because a constructor is non-static, it automatically adds a reference to this in the local variable table
  • The bytecode:
    • aload_0 will load the 0th element from the local variable table and put it on the stack. In this case, the 0th element is this
    • invokespecial is an operation which calls a special method. It uses the method reference stored at index 1 in the constant pool, as shown by the #1. The comment to the right shows what that method reference resolves to. In this case it is the constructor of java.lang.Object. This line is the equivalent of calling super();
    • The reason this was loaded by the first instruction is so that super() could be called on it. The call consumes this, popping it from the stack.
    • return will return void.

public HelloWorld();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1 // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 1: 0
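Written out by hand, the constructor the compiler generated corresponds to the source below. The class is named HelloWorldExplicit here only to keep it separate from the original; compiling it should give a constructor with the same three instructions.

```java
public class HelloWorldExplicit {
    // Equivalent of the implicit default constructor.
    public HelloWorldExplicit() {
        super(); // aload_0 followed by invokespecial java/lang/Object."<init>":()V
    }            // return

    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
```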

The main method

  • We can see here that the main method has a stack size of 2 and a local variable size of 1.
  • Because this is a static method, there is no this value in the local variable table, but there is a String array in the arguments. This is now the 0th element of the local variable table
  • The bytecode:
    • getstatic is an operation which retrieves the value of a static field. In this case it reads the field out from System and pushes its value, a reference to a PrintStream, onto the stack
    • ldc loads a constant from the constant pool. In this case it is the constant at index #3, which is the string "Hello World"
    • invokevirtual will call a non-static, non-special method given by a method reference in the constant pool. In this case, it is PrintStream.println. This method takes one argument and is non-static, so it consumes the top two values on the stack. The value we pushed first is the PrintStream instance, and the value we pushed second is the argument.
    • After the method invocation, both values have been popped and the stack is empty
    • return returns void

public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2 // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3 // String Hello World
         5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 5: 0
        line 6: 8
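As a loose analogy only, the three steps of main can be spelled out with Java reflection, which makes the receiver and the argument explicit in the same order they are pushed onto the stack. This is not what the JVM does internally, and ReflectiveHello, loadSystemOut and run are names invented for the sketch.

```java
import java.io.PrintStream;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectiveHello {
    /** Rough counterpart of getstatic #2: read the value of the static field System.out. */
    static Object loadSystemOut() {
        try {
            Field out = System.class.getField("out");
            return out.get(null); // null receiver because the field is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static void run() {
        try {
            Object receiver = loadSystemOut();  // pushed first
            String argument = "Hello World";    // ldc #3, pushed second
            // Rough counterpart of invokevirtual #4: println:(Ljava/lang/String;)V
            Method println = PrintStream.class.getMethod("println", String.class);
            println.invoke(receiver, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run();
    }
}
```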

Some things to note

  • You can see at the bottom there is a LineNumberTable. This is how a stack trace can tell you which line an error or exception occurred on. It maps offsets in the bytecode back to lines of the original Java file
  • The number before an operation in the bytecode, for example 3 before the ldc operation, specifies the offset from the start of the method in the bytecode. This is useful because it allows us to see how many bytes each operation takes up.
  • Every bytecode instruction has one byte for its opcode, so there is a maximum of 256 possible instructions. The range 0xCB to 0xFD is currently unused, however.
  • Each bytecode instruction can have 0 or more operands. In this case, getstatic has 2 bytes for its operands, which together form a 16-bit unsigned index into the constant pool. As a result, a constant pool is limited to just under 65,536 entries (index 0 is reserved)

public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2 // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3 // String Hello World
         5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 5: 0
        line 6: 8
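To make the offsets and operand bytes concrete, here is a minimal disassembler sketch for the nine bytes that make up main. It only knows the four opcodes used in that method (getstatic 0xB2, ldc 0x12, invokevirtual 0xB6, return 0xB1) and rejects anything else. MiniDisassembler and its methods are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniDisassembler {
    /** Decodes a byte array containing only the four opcodes used by HelloWorld.main. */
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0; // offset from the start of the method
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0xB2: // getstatic: opcode + 2-byte constant pool index
                    lines.add(pc + ": getstatic #" + u2(code, pc + 1));
                    pc += 3;
                    break;
                case 0x12: // ldc: opcode + 1-byte constant pool index
                    lines.add(pc + ": ldc #" + (code[pc + 1] & 0xFF));
                    pc += 2;
                    break;
                case 0xB6: // invokevirtual: opcode + 2-byte constant pool index
                    lines.add(pc + ": invokevirtual #" + u2(code, pc + 1));
                    pc += 3;
                    break;
                case 0xB1: // return: opcode only
                    lines.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("opcode not handled in this sketch: " + opcode);
            }
        }
        return lines;
    }

    /** Combines two operand bytes into one unsigned 16-bit value (big-endian). */
    static int u2(byte[] code, int at) {
        return ((code[at] & 0xFF) << 8) | (code[at + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] mainCode = {
            (byte) 0xB2, 0x00, 0x02, 0x12, 0x03, (byte) 0xB6, 0x00, 0x04, (byte) 0xB1
        };
        disassemble(mainCode).forEach(System.out::println);
    }
}
```

The decoded offsets 0, 3, 5 and 8 line up with the numbers javap prints in front of each instruction.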

Some things to note

  • Methods and Types have descriptors.
  • In this case the method descriptor for the main method is ([Ljava/lang/String;)V
  • A method descriptor has the form (ArgumentTypes)ReturnType
  • There is one argument type for this main method, which is String[]. There is also a single return type, which is void.
  • The type descriptor for String[] is given by "[Ljava/lang/String;". The [ at the start indicates it is an array, and "Ljava/lang/String;" indicates that it is a reference to a String

public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2 // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #3 // String Hello World
         5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 5: 0
        line 6: 8

Type        Descriptor
array       [Type
boolean     Z
byte        B
char        C
double      D
float       F
int         I
long        J
reference   LClassname;
short       S
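The JDK can produce these descriptor strings itself through java.lang.invoke.MethodType, which is a convenient way to check your reading of the table. The wrapper class DescriptorDemo and the helper descriptorOf are just names chosen for this sketch.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    /** Builds the JVM method descriptor for the given return and parameter types. */
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[]) prints ([Ljava/lang/String;)V
        System.out.println(descriptorOf(void.class, String[].class));
        // long f(int, double, boolean[]) prints (ID[Z)J
        System.out.println(descriptorOf(long.class, int.class, double.class, boolean[].class));
    }
}
```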

Bytecode instructions: Constants

Instruction Name  Other Bytes       Description
nop                                 Perform no operation
aconst_null                         Push null reference onto stack
iconst_0                            Load int 0 onto stack
iconst_1                            Load int 1 onto stack
iconst_m1                           Load int -1 onto stack
lconst_0                            Load long 0 onto stack
fconst_0                            Load float 0.0f onto stack
bipush            byte              Push a byte onto the stack
sipush            byte1, byte2      Push a short onto the stack
ldc               index             Push constant at #index in constant pool onto stack
ldc_w             index1, index2    Push constant at index, given by two bytes, in the constant pool onto stack

Bytecode instructions: Loads

Instruction Name  Other Bytes       Description
iload             index             Load an int from the local variable table at #index
lload             index             Load a long from the local variable table at #index
fload             index             Load a float from the local variable table at #index
dload             index             Load a double from the local variable table at #index
aload             index             Load a reference from the local variable table at #index
iload_0                             Load an int from local variable table at index 0
iload_1                             Load an int from local variable table at index 1
aload_0                             Load a reference from local variable table at index 0
iaload                              Use array reference and index on the stack to load an int from an array
aaload                              Use array reference and index on the stack to load a reference from an array

Bytecode instructions: Stores

Instruction Name  Other Bytes       Description
istore            index             Store an int in the local variable table at #index
lstore            index             Store a long in the local variable table at #index
fstore            index             Store a float in the local variable table at #index
dstore            index             Store a double in the local variable table at #index
astore            index             Store a reference in the local variable table at #index
istore_0                            Store an int in local variable table at index 0
istore_1                            Store an int in local variable table at index 1
astore_0                            Store a reference in local variable table at index 0
iastore                             Use array reference, index, and value on the stack to store an int in an array
aastore                             Use array reference, index, and value on the stack to store a reference in an array

Bytecode instructions: Stack

Instruction Name  Other Bytes       Description
pop                                 Discard top value on the stack
pop2                                Discard top two values on stack
dup                                 Insert copy of top value into stack
dup_x1                              Insert copy of top value into stack, two values from the top
dup2                                Insert copy of top two values into stack
swap                                Swap top two values on stack

Bytecode instructions: Math

Instruction Name  Other Bytes       Description
iadd                                Add two ints on stack
ladd                                Add two longs on stack
isub                                Subtract two ints on stack
imul                                Multiply two ints on stack
idiv                                Divide two ints on stack
irem                                Remainder of division of two ints on stack
ineg                                Negate an int on the stack
ishl                                Bit shift left an int on the stack
iand                                Bitwise and on two ints on stack
iinc              index, const      Increment local variable at index by signed byte const

Bytecode instructions: Conversions

Instruction Name  Other Bytes       Description
i2l                                 Convert int to long
i2d                                 Convert int to double
l2i                                 Convert long to int
f2i                                 Convert float to int
d2i                                 Convert double to int
i2b                                 Convert int to byte
i2c                                 Convert int to char
i2s                                 Convert int to short
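In Java source these conversion instructions come from casts. The methods below are a small sketch (NarrowingDemo is a made-up class) where each cast is expected to compile to the instruction named in its comment, which you can confirm with javap -c. The results show how narrowing discards high bits or fractions.

```java
public class NarrowingDemo {
    static byte toByte(int x)          { return (byte) x; }   // i2b
    static short toShort(int x)        { return (short) x; }  // i2s
    static char toChar(int x)          { return (char) x; }   // i2c
    static int longToInt(long x)       { return (int) x; }    // l2i
    static int doubleToInt(double x)   { return (int) x; }    // d2i
    static long intToLong(int x)       { return x; }          // i2l (implicit widening)

    public static void main(String[] args) {
        System.out.println(toByte(200));                 // prints -56
        System.out.println(toShort(70000));              // prints 4464
        System.out.println(toChar(65));                  // prints A
        System.out.println(longToInt((1L << 32) + 5));   // prints 5
        System.out.println(doubleToInt(-3.99));          // prints -3
        System.out.println(intToLong(-1));               // prints -1
    }
}
```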

Bytecode instructions: Comparisons

Instruction Name  Other Bytes       Description
lcmp                                Compare two longs (0 if same, 1 if greater, -1 if less)
fcmpl                               Compare two floats (result is -1 if either is NaN)
ifeq              branch1, branch2  If int on stack is 0, branch to instruction at branchoffset made from the two other bytes
ifne              branch1, branch2  If int on stack is not 0, branch to instruction at branchoffset made from the two other bytes
iflt              branch1, branch2  If int on stack is less than 0, branch to instruction at branchoffset
if_icmpeq         branch1, branch2  If top two ints on stack are equal, branch to instruction at branchoffset
if_icmplt         branch1, branch2  If the second int from the top is less than the top int (the first pushed is less than the last pushed), branch to instruction at branchoffset
if_acmpeq         branch1, branch2  If top two references on stack are equal, branch to instruction at branchoffset

Bytecode instructions: Control

Instruction Name  Other Bytes       Description
goto              branch1, branch2  Go to another instruction at branchoffset
ireturn                             Return an int from a method
return                              Return void from a method

Bytecode instructions: References

Instruction Name  Other Bytes               Description
getstatic         index1, index2            Get static field value from field ref in constant pool #index
putstatic         index1, index2            Set static field, given by field ref in constant pool #index, to the value on the stack
invokevirtual     index1, index2            Invoke method on object using constant pool #index
invokespecial     index1, index2            Invoke special method on object using constant pool #index
invokestatic      index1, index2            Invoke static method using constant pool #index
invokeinterface   index1, index2, count, 0  Invoke interface method on object using constant pool #index
invokedynamic     index1, index2, 0, 0      Invoke dynamic method (such as lambdas) using constant pool #index
new               index1, index2            Create new object using constant pool #index
athrow                                      Throw an error or exception
checkcast         index1, index2            Check that object is of the type given at constant pool #index

Example Code

If statements

 0: iconst_1
 1: istore_1
 2: iload_1
 3: ifeq          17
 6: getstatic     #2  // Field java/lang/System.out:Ljava/io/PrintStream;
 9: ldc           #3  // String Success
11: invokevirtual #4  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
14: goto          25
17: getstatic     #2  // Field java/lang/System.out:Ljava/io/PrintStream;
20: ldc           #5  // String Failure
22: invokevirtual #4  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
25: return

public static void main(String[] args)
{
    boolean myBoolean = true;
    if (myBoolean)
    {
        System.out.println("Success");
    }
    else
    {
        System.out.println("Failure");
    }
}

Example Code

For loops

 0: iconst_0
 1: istore_1
 2: iload_1
 3: bipush        20
 5: if_icmpge     21
 8: getstatic     #2    // Field java/lang/System.out:Ljava/io/PrintStream;
11: iload_1
12: invokevirtual #3    // Method java/io/PrintStream.println:(I)V
15: iinc          1, 1
18: goto          2
21: return

public static void main(String[] args)
{
    for (int i = 0; i < 20; i++)
    {
        System.out.println(i);
    }
}

Example Code

While loops

 0: iconst_0
 1: istore_1
 2: iload_1
 3: bipush        20
 5: if_icmpeq     22
 8: getstatic     #2  // Field java/lang/System.out:Ljava/io/PrintStream;
11: iload_1
12: invokevirtual #3  // Method java/io/PrintStream.println:(I)V
15: iload_1
16: iconst_1
17: iadd
18: istore_1
19: goto          2
22: return

public static void main(String[] args)
{
    int i = 0;
    while (i != 20)
    {
        System.out.println(i);
        i = i + 1;
    }
}

Example Code

Different sized integers

public static int addBytes(byte, byte);
  0: iload_0
  1: iload_1
  2: iadd
  3: ireturn

public static int addShorts(short, short);
  0: iload_0
  1: iload_1
  2: iadd
  3: ireturn

public static int addBytes(byte a, byte b)
{
    return a + b;
}

public static int addShorts(short a, short b)
{
    return a + b;
}
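Because both methods use iadd on 32-bit ints, the sum of two bytes is an int and can exceed the byte range. The following sketch (ByteAddition and addBytesWrapped are names invented here) contrasts that with casting the result back to byte, which is expected to add an i2b instruction after the iadd.

```java
public class ByteAddition {
    /** Same as the addBytes example: iload_0, iload_1, iadd, ireturn. */
    static int addBytes(byte a, byte b) {
        return a + b;
    }

    /** The cast is required by the compiler, since a + b has type int. */
    static byte addBytesWrapped(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        System.out.println(addBytes(a, b));        // prints 200
        System.out.println(addBytesWrapped(a, b)); // prints -56
    }
}
```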

A live demonstration of Bytecode manipulation

  • https://github.com/CameronAavik/Universal-Jar-Transformer
  • For this talk I have made a Java program which will load any other Java program and allow you to transform its bytecode as it is loaded in.
  • It makes use of the Java ASM library available here: http://asm.ow2.org/
  • ASM is a Java bytecode manipulation framework which allows you to read the bytes of a class file and interact with it in an object-oriented manner.
  • There are Eclipse and IntelliJ plugins which allow you to view the ASMified version of a class.
