Understanding Java Bytecode
By Cameron Aavik
A bit about me
- Second Year Software Engineering Student
- Minecraft Modder for about 2 years
- Developer of the mod sprinkles_for_vanilla which has just over 100,000 downloads.
- Currently working as an intern at a local startup, Redback Technologies
Why should you learn Java bytecode?
- Any kind of analysis on compiled Java classes (.class files)
- JVM Languages
- Debugging very complex, low-level issues
- Squeezing out extra performance
- More comprehensive understanding of the inner workings of Java.
- Learn what Java really does under the covers when compiling
The .class file
The .class file is generated whenever you compile your Java code. Compiling code written in other JVM languages also produces .class files, which is why two JVM languages can interact with each other: they meet at the compiled level.
This file is loaded by the Java Virtual Machine, which interprets the bytecode (most JVMs also JIT-compile frequently executed code to native machine code).
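Every .class file starts with a fixed header: the magic number 0xCAFEBABE, followed by a minor and a major version number (major version 52 corresponds to Java 8). The sketch below parses only that header from a byte array. The ClassHeader class and its parse method are hypothetical helpers for illustration, not part of any library.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: reads only the first 8 bytes of a class file.
public class ClassHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Reads u4 magic, u2 minor_version and u2 major_version.
    // Class files are big-endian, which is also ByteBuffer's default byte order.
    static ClassHeader parse(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // u2 values are unsigned
        int major = buf.getShort() & 0xFFFF;
        return new ClassHeader(minor, major);
    }

    public static void main(String[] args) {
        // First 8 bytes of a class compiled for Java 8: CAFEBABE, minor 0, major 52 (0x34)
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        ClassHeader h = parse(header);
        System.out.println("minor version: " + h.minorVersion); // minor version: 0
        System.out.println("major version: " + h.majorVersion); // major version: 52
    }
}
```

To try it on a real compiled class, you could pass in the result of Files.readAllBytes(Paths.get("HelloWorld.class")).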
Hello World
public class HelloWorld
{
    public static void main(String[] args)
    {
        System.out.println("Hello World");
    }
}
Compile it with javac HelloWorld.java, then run javap -v HelloWorld to view the compiled class file:
public class HelloWorld
minor version: 0
major version: 52
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #6.#15 // java/lang/Object."<init>":()V
#2 = Fieldref #16.#17 // java/lang/System.out:Ljava/io/PrintStream;
#3 = String #18 // Hello World
#4 = Methodref #19.#20 // java/io/PrintStream.println:(Ljava/lang/String;)V
#5 = Class #21 // HelloWorld
#6 = Class #22 // java/lang/Object
#7 = Utf8 <init>
#8 = Utf8 ()V
#9 = Utf8 Code
#10 = Utf8 LineNumberTable
#11 = Utf8 main
#12 = Utf8 ([Ljava/lang/String;)V
#13 = Utf8 SourceFile
#14 = Utf8 HelloWorld.java
#15 = NameAndType #7:#8 // "<init>":()V
#16 = Class #23 // java/lang/System
#17 = NameAndType #24:#25 // out:Ljava/io/PrintStream;
#18 = Utf8 Hello World
#19 = Class #26 // java/io/PrintStream
#20 = NameAndType #27:#28 // println:(Ljava/lang/String;)V
#21 = Utf8 HelloWorld
#22 = Utf8 java/lang/Object
#23 = Utf8 java/lang/System
#24 = Utf8 out
#25 = Utf8 Ljava/io/PrintStream;
#26 = Utf8 java/io/PrintStream
#27 = Utf8 println
#28 = Utf8 (Ljava/lang/String;)V
{
public HelloWorld();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 1: 0
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String Hello World
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
LineNumberTable:
line 5: 0
line 6: 8
}
SourceFile: "HelloWorld.java"
How the JVM runs the bytecode
- Each time a method is invoked, the JVM creates a new frame for it
- A frame consists of an array of local variables, an operand stack, and a reference to the class's constant pool
- The size of the operand stack and the array of local variables are determined at compile time
- A frame is destroyed when a method invocation completes
- The JVM stack for a given thread is a stack of frames.
Local Variable Table
- The local variable table holds values of Java's primitive types or references to objects.
- Each slot in the local variable table is 32 bits, so long and double values take up two slots.
- Smaller types such as boolean, byte, char and short still occupy a full 32-bit slot.
- Slots are addressed by index, starting at 0.
- If a method is non-static, slot 0 holds a reference to the object the method was called on (this).
- A method's arguments are automatically placed in the slots that follow, in order.
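The slot layout described above can be sketched as a small function. It is a simplification that assumes parameter types are given as plain Java type names rather than descriptors; the LocalSlots class and parameterSlots method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating how arguments are laid out in the local variable table.
public class LocalSlots {
    // Returns the local variable slot where each parameter starts.
    // Instance methods reserve slot 0 for `this`; long and double take two slots.
    static List<Integer> parameterSlots(boolean isStatic, String... paramTypes) {
        List<Integer> slots = new ArrayList<>();
        int next = isStatic ? 0 : 1;
        for (String type : paramTypes) {
            slots.add(next);
            boolean wide = type.equals("long") || type.equals("double");
            next += wide ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // Instance method void f(int a, long b, String c): this = 0, a = 1, b = 2 and 3, c = 4
        System.out.println(parameterSlots(false, "int", "long", "java.lang.String")); // [1, 2, 4]
        // public static void main(String[] args): args = 0
        System.out.println(parameterSlots(true, "java.lang.String[]")); // [0]
    }
}
```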
Operand Stack
- This is a stack that bytecode instructions push values onto and pop values from.
- It is used to move data around and to supply the operands for bytecode instructions.
- To add two ints, you push both of them onto the stack and then execute iadd, which pops the top two values and pushes their sum onto the stack.
- As with the local variable table, each value on the stack is 32 bits, so longs and doubles take up two values.
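A minimal simulation of the operand stack, using an ArrayDeque as the stack. The OperandStack class is hypothetical; it shows the pop order that iadd and isub use (the top value is the second operand), which matters for non-commutative operations such as subtraction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an operand stack holding ints.
public class OperandStack {
    private final Deque<Integer> stack = new ArrayDeque<>();

    // Like iconst_<n>, bipush or iload: push an int onto the stack.
    void push(int value) {
        stack.push(value);
    }

    // iadd: pop value2 (the top), then value1, and push value1 + value2.
    void iadd() {
        int value2 = stack.pop();
        int value1 = stack.pop();
        stack.push(value1 + value2);
    }

    // isub: same pops as iadd, but pushes value1 - value2, so order matters.
    void isub() {
        int value2 = stack.pop();
        int value1 = stack.pop();
        stack.push(value1 - value2);
    }

    int top() {
        return stack.peek();
    }

    int depth() {
        return stack.size();
    }

    public static void main(String[] args) {
        OperandStack s = new OperandStack();
        s.push(10); // stack: 10
        s.push(3);  // stack: 10, 3 (3 is on top)
        s.isub();   // pops 3 and 10, pushes 10 - 3
        System.out.println(s.top() + " " + s.depth()); // 7 1
    }
}
```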
Constant Pool
- All constants used in the code are stored in the class's constant pool, which is shared among all methods in that class.
- A frame will have a reference to that class's constant pool.
- Examples of constants include:
- Strings
- Names of methods
- Names of classes
- Method descriptors
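Constant pool entries often point at other entries by index. The sketch below copies a few entries from the javap output for HelloWorld into a simplified map and follows the references until only Utf8 text is left, producing the same strings javap prints in its comments. The Entry model and resolve method are illustrative assumptions, not how the JVM stores the pool internally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of part of HelloWorld's constant pool.
public class ConstantPoolDemo {
    // Simplified entry: a tag, text for Utf8 entries, and indices of other entries otherwise.
    static final class Entry {
        final String tag;
        final String text;
        final int[] refs;

        Entry(String tag, String text, int... refs) {
            this.tag = tag;
            this.text = text;
            this.refs = refs;
        }
    }

    // A subset of the HelloWorld constant pool, copied from the javap -v output.
    static final Map<Integer, Entry> POOL = new HashMap<>();

    static {
        POOL.put(2, new Entry("Fieldref", null, 16, 17));
        POOL.put(3, new Entry("String", null, 18));
        POOL.put(4, new Entry("Methodref", null, 19, 20));
        POOL.put(16, new Entry("Class", null, 23));
        POOL.put(17, new Entry("NameAndType", null, 24, 25));
        POOL.put(18, new Entry("Utf8", "Hello World"));
        POOL.put(19, new Entry("Class", null, 26));
        POOL.put(20, new Entry("NameAndType", null, 27, 28));
        POOL.put(23, new Entry("Utf8", "java/lang/System"));
        POOL.put(24, new Entry("Utf8", "out"));
        POOL.put(25, new Entry("Utf8", "Ljava/io/PrintStream;"));
        POOL.put(26, new Entry("Utf8", "java/io/PrintStream"));
        POOL.put(27, new Entry("Utf8", "println"));
        POOL.put(28, new Entry("Utf8", "(Ljava/lang/String;)V"));
    }

    // Follows index references until only Utf8 text is left, formatted like javap's comments.
    static String resolve(int index) {
        Entry e = POOL.get(index);
        switch (e.tag) {
            case "Utf8":
                return e.text;
            case "Class":
            case "String":
                return resolve(e.refs[0]);
            case "NameAndType":
                return resolve(e.refs[0]) + ":" + resolve(e.refs[1]);
            case "Fieldref":
            case "Methodref":
                return resolve(e.refs[0]) + "." + resolve(e.refs[1]);
            default:
                throw new IllegalArgumentException("Unsupported tag: " + e.tag);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(2)); // java/lang/System.out:Ljava/io/PrintStream;
        System.out.println(resolve(4)); // java/io/PrintStream.println:(Ljava/lang/String;)V
    }
}
```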
Frame for Hello World
- As we saw earlier, javap lets us view the class file in a readable form.
- From its output, we get the following bytecode for the two methods:
public HelloWorld();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=1, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
LineNumberTable:
line 1: 0
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #3 // String Hello World
5: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
LineNumberTable:
line 5: 0
line 6: 8
The constructor
- Even though we didn't write one in the Java code, the compiler generated a default constructor.
- The stack and the local variable table each have a maximum size of 1 value (stack=1, locals=1).
- Because a constructor is non-static, slot 0 of the local variable table automatically holds a reference to this.
- The bytecode:
- aload_0 loads element 0 of the local variable table and pushes it onto the stack. In this case, element 0 is this.
- invokespecial calls a "special" method, such as a constructor, a private method, or a superclass method. It uses the method reference stored at index 1 in the constant pool, as shown by the #1. The comment on the right shows what that method reference resolves to: here, the constructor of Object. This line is the equivalent of calling super();
- The first instruction loaded this so that super() could be called on it. The call pops this from the stack.
- return returns void from the method.
The main method
- The main method has a maximum stack size of 2 and a local variable table size of 1.
- Because main is static, there is no this in the local variable table. Its String array argument therefore occupies element 0.
- The bytecode:
- getstatic reads a static field, here the field out of System, and pushes its value (a reference to a PrintStream) onto the stack.
- ldc loads a constant from the constant pool, here index #3, which is the String "Hello World", and pushes it onto the stack.
- invokevirtual calls an instance method (selected at run time based on the object's class) using a method reference in the constant pool; here it is PrintStream.println. Because println is an instance method that takes one argument, it pops the top two values from the stack: the value pushed first is the PrintStream instance, and the value pushed second is the argument.
- println returns void, so nothing is pushed back and the stack is empty again after the call.
- return returns void.
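Why is stack=2 for main? One way to see it is to track how each instruction changes the stack depth. This sketch only handles straight-line code such as main (no branches), and maxDepth is a hypothetical helper; a real compiler also has to consider every branch path.

```java
// Hypothetical helper: computes the maximum operand stack depth of branch-free code.
public class StackDepth {
    // Applies each instruction's net effect on the stack depth and
    // returns the deepest point reached.
    static int maxDepth(int... deltas) {
        int depth = 0;
        int max = 0;
        for (int delta : deltas) {
            depth += delta;
            if (depth < 0) {
                throw new IllegalStateException("operand stack underflow");
            }
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] mainDeltas = {
            +1, // getstatic #2: push the PrintStream stored in System.out
            +1, // ldc #3: push the String "Hello World"
            -2, // invokevirtual #4: pop the receiver and the argument; println returns void
            0   // return
        };
        System.out.println("stack=" + maxDepth(mainDeltas)); // stack=2
    }
}
```

The same reasoning gives stack=1 for the constructor: aload_0 pushes this (+1) and invokespecial pops it (-1).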
Some things to note
- At the bottom of each method is a LineNumberTable. It maps offsets in the bytecode to lines of the original Java file, which is how a stack trace can tell you which line an error or exception occurred on.
- The number before each instruction, for example 3 before the ldc instruction, is its offset in bytes from the start of the method's bytecode. Comparing consecutive offsets shows how many bytes each instruction takes up.
- Every bytecode instruction starts with a single opcode byte, so there can be at most 256 distinct instructions. The opcodes from 0xCB to 0xFD are currently unused.
- Each bytecode instruction can have 0 or more operand bytes. For example, getstatic has 2 operand bytes, which is why the following ldc starts at offset 3. Together those two bytes form an unsigned 16-bit index into the constant pool. As a result, a constant pool can hold at most 65,534 usable entries (indices 1 to 65,534; index 0 is never used).
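The offsets and 16-bit indices can be checked by decoding raw bytes. The byte array below is reconstructed from the javap listing of main using the standard opcode values (getstatic = 0xB2, ldc = 0x12, invokevirtual = 0xB6, return = 0xB1). MiniDisassembler is a hypothetical toy that understands only those four opcodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy disassembler for the four opcodes used by HelloWorld.main.
public class MiniDisassembler {
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0xB2: // getstatic indexbyte1 indexbyte2
                    out.add(pc + ": getstatic #" + u2(code, pc + 1));
                    pc += 3;
                    break;
                case 0xB6: // invokevirtual indexbyte1 indexbyte2
                    out.add(pc + ": invokevirtual #" + u2(code, pc + 1));
                    pc += 3;
                    break;
                case 0x12: // ldc index (a single unsigned byte)
                    out.add(pc + ": ldc #" + (code[pc + 1] & 0xFF));
                    pc += 2;
                    break;
                case 0xB1: // return, no operands
                    out.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }

    // Combines two operand bytes into an unsigned 16-bit index: (byte1 << 8) | byte2
    static int u2(byte[] code, int at) {
        return ((code[at] & 0xFF) << 8) | (code[at + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] mainCode = {(byte) 0xB2, 0x00, 0x02, 0x12, 0x03, (byte) 0xB6, 0x00, 0x04, (byte) 0xB1};
        disassemble(mainCode).forEach(System.out::println);
    }
}
```

The printed offsets (0, 3, 5, 8) line up with the javap listing because each case advances pc by the instruction's opcode byte plus its operand bytes.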
Some things to note
- Methods and types have descriptors.
- In this case, the method descriptor for the main method is ([Ljava/lang/String;)V
- A method descriptor has the form (ArgumentTypes)ReturnType
- This main method has a single argument type, String[], and its return type is void, written V.
- The type descriptor for String[] is "[Ljava/lang/String;". The leading [ indicates an array, and "Ljava/lang/String;" indicates a reference to a String.
Type | Descriptor |
---|---|
array | [Type |
boolean | Z |
byte | B |
char | C |
double | D |
float | F |
int | I |
long | J |
reference | LClassName; (with / separating packages, e.g. Ljava/lang/String;) |
short | S |
void (return type only) | V |
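The descriptor grammar in this table is small enough to parse by hand. The sketch below turns a method descriptor back into a Java-like signature; the Descriptors class and its methods are hypothetical helpers, and for simplicity they accept V anywhere rather than only as a return type.

```java
// Hypothetical helper that converts descriptors into readable Java types.
public class Descriptors {
    private static final String PRIMITIVE_CODES = "ZBCDFIJSV";
    private static final String[] PRIMITIVE_NAMES = {
        "boolean", "byte", "char", "double", "float", "int", "long", "short", "void"
    };

    // Parses one type descriptor starting at pos, returns it as a Java type name,
    // and stores the index just past it in end[0].
    static String parseType(String d, int pos, int[] end) {
        char c = d.charAt(pos);
        int p = PRIMITIVE_CODES.indexOf(c);
        if (p >= 0) {
            end[0] = pos + 1;
            return PRIMITIVE_NAMES[p];
        }
        if (c == 'L') {
            int semicolon = d.indexOf(';', pos);
            if (semicolon < 0) {
                throw new IllegalArgumentException("Unterminated class name in " + d);
            }
            end[0] = semicolon + 1;
            return d.substring(pos + 1, semicolon).replace('/', '.');
        }
        if (c == '[') {
            return parseType(d, pos + 1, end) + "[]";
        }
        throw new IllegalArgumentException("Unexpected '" + c + "' in " + d);
    }

    // Turns a method descriptor such as ([Ljava/lang/String;)V into "void (java.lang.String[])".
    static String describeMethod(String descriptor) {
        if (descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        StringBuilder params = new StringBuilder();
        int[] end = new int[1];
        int pos = 1;
        while (descriptor.charAt(pos) != ')') {
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(parseType(descriptor, pos, end));
            pos = end[0];
        }
        String returnType = parseType(descriptor, pos + 1, end);
        return returnType + " (" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeMethod("([Ljava/lang/String;)V"));  // void (java.lang.String[])
        System.out.println(describeMethod("(IJLjava/lang/String;)Z")); // boolean (int, long, java.lang.String)
    }
}
```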
Bytecode instructions: Constants
Instruction Name | Operand Bytes | Description |
---|---|---|
nop | | Perform no operation |
aconst_null | | Push a null reference onto the stack |
iconst_0 | | Push int 0 onto the stack |
iconst_1 | | Push int 1 onto the stack |
iconst_m1 | | Push int -1 onto the stack |
lconst_0 | | Push long 0 onto the stack |
fconst_0 | | Push float 0.0f onto the stack |
bipush | byte | Push a byte onto the stack |
sipush | byte1, byte2 | Push a short onto the stack |
ldc | index | Push the constant at #index in the constant pool onto the stack |
ldc_w | index1, index2 | Push the constant at #index, given by two bytes, in the constant pool onto the stack |
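When pushing an int constant, a compiler can choose between several of these instructions. The sketch below picks the shortest encoding by value range: iconst_m1 and iconst_0 through iconst_5 (the table lists only a few of these), then bipush, sipush, and finally ldc. This matches what javac typically emits (for example bipush 20 in the loop examples), but pushInstruction is a hypothetical helper, not javac's actual code.

```java
// Hypothetical helper: chooses an instruction for pushing an int constant.
public class IntConstants {
    static String pushInstruction(int value) {
        if (value == -1) {
            return "iconst_m1";           // 1 byte
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value;     // 1 byte: iconst_0 .. iconst_5
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;     // 2 bytes: opcode + signed byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;     // 3 bytes: opcode + signed 16-bit value
        }
        // The value must live in the constant pool (ldc_w if its index needs two bytes).
        return "ldc";
    }

    public static void main(String[] args) {
        for (int v : new int[] {-1, 3, 20, -200, 1000, 100000}) {
            System.out.println(v + " -> " + pushInstruction(v));
        }
    }
}
```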
Bytecode instructions: Loads
Instruction Name | Operand Bytes | Description |
---|---|---|
iload | index | Load an int from the local variable table at #index |
lload | index | Load a long from the local variable table at #index |
fload | index | Load a float from the local variable table at #index |
dload | index | Load a double from the local variable table at #index |
aload | index | Load a reference from the local variable table at #index |
iload_0 | | Load an int from the local variable table at index 0 |
iload_1 | | Load an int from the local variable table at index 1 |
aload_0 | | Load a reference from the local variable table at index 0 |
iaload | | Use the array reference and index on the stack to load an int from an array |
aaload | | Use the array reference and index on the stack to load a reference from an array |
Bytecode instructions: Stores
Instruction Name | Operand Bytes | Description |
---|---|---|
istore | index | Store an int in the local variable table at #index |
lstore | index | Store a long in the local variable table at #index |
fstore | index | Store a float in the local variable table at #index |
dstore | index | Store a double in the local variable table at #index |
astore | index | Store a reference in the local variable table at #index |
istore_0 | | Store an int in the local variable table at index 0 |
istore_1 | | Store an int in the local variable table at index 1 |
astore_0 | | Store a reference in the local variable table at index 0 |
iastore | | Use the array reference, index, and value on the stack to store an int in an array |
aastore | | Use the array reference, index, and value on the stack to store a reference in an array |
Bytecode instructions: Stack
Instruction Name | Operand Bytes | Description |
---|---|---|
pop | | Discard the top value on the stack |
pop2 | | Discard the top two values on the stack (or a single long or double) |
dup | | Push a copy of the top value onto the stack |
dup_x1 | | Copy the top value and insert the copy beneath the top two values |
dup2 | | Push a copy of the top two values (or of a single long or double) onto the stack |
swap | | Swap the top two values on the stack |
Bytecode instructions: Math
Instruction Name | Operand Bytes | Description |
---|---|---|
iadd | | Add two ints on the stack |
ladd | | Add two longs on the stack |
isub | | Subtract two ints on the stack |
imul | | Multiply two ints on the stack |
idiv | | Divide two ints on the stack |
irem | | Remainder of the division of two ints on the stack |
ineg | | Negate an int on the stack |
ishl | | Bit shift left an int on the stack |
iand | | Bitwise AND of two ints on the stack |
iinc | index, const | Increment the local variable at #index by the signed byte const |
Bytecode instructions: Conversions
Instruction Name | Operand Bytes | Description |
---|---|---|
i2l | | Convert int to long |
i2d | | Convert int to double |
l2i | | Convert long to int |
f2i | | Convert float to int |
d2i | | Convert double to int |
i2b | | Convert int to byte |
i2c | | Convert int to char |
i2s | | Convert int to short |
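The narrowing conversions never throw; they truncate or saturate. Each helper below contains a single cast on a non-constant argument, so javac emits the matching conversion instruction instead of folding the result at compile time. The printed values show what i2b, i2c, l2i and d2i do. The method names are hypothetical.

```java
// Hypothetical helpers, each compiling to one conversion instruction.
public class Conversions {
    static byte toByte(int v) { return (byte) v; }        // i2b
    static char toChar(int v) { return (char) v; }        // i2c
    static int longToInt(long v) { return (int) v; }      // l2i
    static int doubleToInt(double v) { return (int) v; }  // d2i

    public static void main(String[] args) {
        System.out.println(toByte(200));               // -56: keep the low 8 bits (0xC8), sign-extend
        System.out.println((int) toChar(-1));          // 65535: keep the low 16 bits, unsigned
        System.out.println(longToInt(0x1_0000_0005L)); // 5: keep the low 32 bits
        System.out.println(doubleToInt(-3.7));         // -3: round toward zero
        System.out.println(doubleToInt(1e20));         // 2147483647: saturate at Integer.MAX_VALUE
        System.out.println(doubleToInt(Double.NaN));   // 0: NaN converts to 0
    }
}
```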
Bytecode instructions: Comparisons
Instruction Name | Operand Bytes | Description |
---|---|---|
lcmp | | Compare two longs: push 0 if equal, 1 if the value pushed first is greater, -1 if it is less |
fcmpl | | Compare two floats |
ifeq | branch1, branch2 | If the int on the stack is 0, branch to the instruction at branchoffset (a signed 16-bit offset built from the two bytes, relative to this instruction) |
ifne | branch1, branch2 | If the int on the stack is not 0, branch to the instruction at branchoffset |
iflt | branch1, branch2 | If the int on the stack is less than 0, branch to the instruction at branchoffset |
if_icmpeq | branch1, branch2 | If the top two ints on the stack are equal, branch to the instruction at branchoffset |
if_icmplt | branch1, branch2 | If the second int from the top is less than the top int, branch to the instruction at branchoffset |
if_acmpeq | branch1, branch2 | If the top two references on the stack are equal (refer to the same object), branch to the instruction at branchoffset |
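The operand order for the two-value comparisons is easy to get backwards. In the JVM specification's terms, value1 is the value pushed first and value2 is the one on top of the stack. The sketch models lcmp and the if_icmplt branch condition as plain Java methods; the names are hypothetical.

```java
// Hypothetical models of lcmp and if_icmplt, showing which operand is which.
public class Comparisons {
    // lcmp pops value2 (the top) and then value1, and pushes 1, 0 or -1.
    static int lcmp(long value1, long value2) {
        if (value1 > value2) {
            return 1;
        }
        if (value1 == value2) {
            return 0;
        }
        return -1;
    }

    // if_icmplt pops value2 (the top) and then value1, and branches when value1 < value2,
    // i.e. when the int pushed first is less than the int pushed second.
    static boolean ifIcmpltBranches(int pushedFirst, int pushedSecond) {
        int value2 = pushedSecond; // on top of the stack
        int value1 = pushedFirst;  // just below it
        return value1 < value2;
    }

    public static void main(String[] args) {
        System.out.println(lcmp(5L, 3L));           // 1
        System.out.println(ifIcmpltBranches(1, 2)); // true: 1 < 2
        System.out.println(ifIcmpltBranches(2, 1)); // false
    }
}
```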
Bytecode instructions: Control
Instruction Name | Operand Bytes | Description |
---|---|---|
goto | branch1, branch2 | Go to the instruction at branchoffset |
ireturn | | Return an int from a method |
return | | Return void from a method |
Bytecode instructions: References
Instruction Name | Operand Bytes | Description |
---|---|---|
getstatic | index1, index2 | Push the value of the static field referenced at constant pool #index |
putstatic | index1, index2 | Pop a value and store it in the static field referenced at constant pool #index |
invokevirtual | index1, index2 | Invoke an instance method on an object using constant pool #index |
invokespecial | index1, index2 | Invoke a special method (such as a constructor, private method or superclass method) on an object using constant pool #index |
invokestatic | index1, index2 | Invoke a static method using constant pool #index |
invokeinterface | index1, index2, count, 0 | Invoke an interface method on an object using constant pool #index |
invokedynamic | index1, index2, 0, 0 | Invoke a dynamically linked call site (used, for example, by lambdas) using constant pool #index |
new | index1, index2 | Create a new object of the class at constant pool #index |
athrow | | Throw an error or exception |
checkcast | index1, index2 | Check that an object is of the type at constant pool #index |
Example Code
If statements
0: iconst_1
1: istore_1
2: iload_1
3: ifeq 17
6: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
9: ldc #3 // String Success
11: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
14: goto 25
17: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
20: ldc #5 // String Failure
22: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
25: return
public static void main(String[] args)
{
boolean myBoolean = true;
if (myBoolean)
{
System.out.println("Success");
}
else
{
System.out.println("Failure");
}
}
Example Code
For loops
0: iconst_0
1: istore_1
2: iload_1
3: bipush 20
5: if_icmpge 21
8: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
11: iload_1
12: invokevirtual #3 // Method java/io/PrintStream.println:(I)V
15: iinc 1, 1
18: goto 2
21: return
public static void main(String[] args)
{
for (int i = 0; i < 20; i++)
{
System.out.println(i);
}
}
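To see how the branch offsets drive the loop, here is a toy interpreter for exactly the instructions in the for-loop listing, keyed by byte offset. It is a sketch under simplifying assumptions: javap prints branch targets as absolute offsets, so the sketch uses them directly; getstatic pushes a placeholder instead of System.out; and invokevirtual collects the printed int into a list rather than calling println. LoopInterpreter and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy interpreter for the for-loop bytecode.
public class LoopInterpreter {
    static List<Integer> run(TreeMap<Integer, String> code) {
        int[] locals = new int[2];
        Deque<Integer> stack = new ArrayDeque<>();
        List<Integer> printed = new ArrayList<>();
        int pc = 0;
        while (true) {
            String[] insn = code.get(pc).split(" ");
            switch (insn[0]) {
                case "iconst_0": stack.push(0); break;
                case "istore_1": locals[1] = stack.pop(); break;
                case "iload_1": stack.push(locals[1]); break;
                case "bipush": stack.push(Integer.parseInt(insn[1])); break;
                case "if_icmpge": {
                    int value2 = stack.pop();
                    int value1 = stack.pop();
                    if (value1 >= value2) {
                        pc = Integer.parseInt(insn[1]); // jump to the target javap shows
                        continue;
                    }
                    break;
                }
                case "getstatic": stack.push(-1); break; // placeholder for the System.out reference
                case "invokevirtual": {
                    int argument = stack.pop();
                    stack.pop(); // the receiver placeholder
                    printed.add(argument);
                    break;
                }
                case "iinc":
                    locals[Integer.parseInt(insn[1])] += Integer.parseInt(insn[2]);
                    break;
                case "goto":
                    pc = Integer.parseInt(insn[1]);
                    continue;
                case "return":
                    return printed;
                default:
                    throw new IllegalArgumentException("Unsupported instruction: " + insn[0]);
            }
            pc = code.higherKey(pc); // move on to the next instruction's offset
        }
    }

    // The for-loop bytecode from the listing, with the same offsets.
    static TreeMap<Integer, String> forLoopCode() {
        TreeMap<Integer, String> code = new TreeMap<>();
        code.put(0, "iconst_0");
        code.put(1, "istore_1");
        code.put(2, "iload_1");
        code.put(3, "bipush 20");
        code.put(5, "if_icmpge 21");
        code.put(8, "getstatic");
        code.put(11, "iload_1");
        code.put(12, "invokevirtual");
        code.put(15, "iinc 1 1");
        code.put(18, "goto 2");
        code.put(21, "return");
        return code;
    }

    public static void main(String[] args) {
        System.out.println(run(forLoopCode())); // the ints 0 through 19, as a list
    }
}
```

Note how the loop condition i < 20 is compiled as its opposite, if_icmpge, which jumps past the body to return once i reaches 20.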
Example Code
While loops
0: iconst_0
1: istore_1
2: iload_1
3: bipush 20
5: if_icmpeq 22
8: getstatic #2 // Field java/lang/System.out:Ljava/io/PrintStream;
11: iload_1
12: invokevirtual #3 // Method java/io/PrintStream.println:(I)V
15: iload_1
16: iconst_1
17: iadd
18: istore_1
19: goto 2
22: return
public static void main(String[] args)
{
int i = 0;
while (i != 20)
{
System.out.println(i);
i = i + 1;
}
}
Example Code
Different sized integers
public static int addBytes(byte, byte);
0: iload_0
1: iload_1
2: iadd
3: ireturn
public static int addShorts(short, short);
0: iload_0
1: iload_1
2: iadd
3: ireturn
public static int addBytes(byte a, byte b)
{
return a + b;
}
public static int addShorts(short a, short b)
{
return a + b;
}
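Because byte and short values are loaded and added as ints, the result of addBytes can exceed the byte range. Narrowing back to a byte requires an explicit cast, which adds an i2b instruction before ireturn. A small sketch (the class and method names are hypothetical):

```java
// Hypothetical example contrasting int promotion with explicit narrowing.
public class ByteArithmetic {
    // Same shape as addBytes: iload_0, iload_1, iadd, ireturn. The result is an int.
    static int addBytes(byte a, byte b) {
        return a + b;
    }

    // The cast compiles to an extra i2b between iadd and ireturn.
    static byte addBytesNarrowed(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        System.out.println(addBytes(a, b));         // 200
        System.out.println(addBytesNarrowed(a, b)); // -56
    }
}
```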
A live demonstration of Bytecode manipulation
- https://github.com/CameronAavik/Universal-Jar-Transformer
- For this talk I have made a Java program that loads any other Java program and lets you transform its bytecode as it is loaded.
- It makes use of the Java ASM library available here: http://asm.ow2.org/
- ASM is a Java bytecode manipulation framework that lets you read the bytes of a class file and work with them in an object-oriented manner.
- There are Eclipse and IntelliJ plugins that let you view the ASMified version of a class.