Bytecode manipulations

AR, November 2017

Who am I?

  • Siberian
  • Web developer with 10+ years of experience
  • Python enthusiast (6+ years)
  • Lead Full Stack programmer at

Aleksey Rembish

Python exorcist

And so we code...

  • an LPG operator?
something <|> True

less-pipe-greater operator

And so we code...

  • an LPG operator
<|>

less-pipe-greater operator

needs grammar patches

And so we code...

  • an LPG operator
  • increments and decrements?
x = 1
y = x++ + ++x
print(y)

Hell Quiz Question #3

And so we code...

  • an LPG operator
  • increments and decrements?
In [1]: def increment(value):
   ...:     return ++value
   ...: 

In [2]: from dis import dis

In [3]: dis(increment)
  2           0 LOAD_FAST                0 (value)
              3 UNARY_POSITIVE      
              4 UNARY_POSITIVE      
              5 RETURN_VALUE        
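The disassembly shows why there is no increment here: `++value` is just unary plus applied twice. A minimal sketch that checks this on the source level with `ast` (so it does not depend on opcode names, which differ between CPython versions); the variable names `quiz` and `tree` are made up for illustration.

```python
import ast

x = 1
y = ++x            # parsed as +(+x); x is not modified
quiz = x++ + ++x   # parsed as x + (+(+(+(+x)))), i.e. x + x

# The parser sees two nested unary-plus nodes, not an increment.
tree = ast.parse("++value", mode="eval").body

print(x, y, quiz)  # 1 1 2
print(ast.dump(tree))
```

So the quiz code is valid Python, it simply never changes `x`.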

And so we code...

  • an LPG operator
  • increments and decrements
  • GOTO and labels!!!

WORK

Code formatting

def method():
    label: print("foo")
    goto label

def method():
    label .here
    print("foo")
    goto .here

Code formatting

def method():
    label .here
    goto .here
  2           0 LOAD_GLOBAL              0 (label)
              3 LOAD_ATTR                1 (here)
              6 POP_TOP

  3           7 LOAD_GLOBAL              2 (goto)
             10 LOAD_ATTR                1 (here)
             13 POP_TOP

             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE

label .here == label.here
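A quick way to convince yourself of this equivalence is to compare the syntax trees. This is only a sketch; `spaced` and `dotted` are illustrative names.

```python
import ast

# Whitespace before the dot is ignored by the tokenizer,
# so both spellings are a plain attribute access on the name `label`.
spaced = ast.dump(ast.parse("label .here"))
dotted = ast.dump(ast.parse("label.here"))

print(spaced == dotted)  # True
```

That is why `label .here` compiles without any grammar patches: for the compiler it is an ordinary expression statement.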

Function bytecode

# Python 2.x
def function():
    pass

code = function.func_code

# Python 3.x
def function():
    pass

code = function.__code__
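The code object carries more than raw bytecode. A small Python 3 sketch of the attributes used later in this talk; the function body here is an invented example.

```python
def function(a, b=2):
    total = a + b
    return len(str(total))

code = function.__code__

print(code.co_argcount)   # number of positional arguments
print(code.co_varnames)   # local variable names, arguments first
print(code.co_names)      # global and attribute names used by the bytecode
print(type(code.co_code)) # raw bytecode as a bytes object
```

`co_names` is the table that `LOAD_GLOBAL` and `LOAD_ATTR` arguments index into.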

Pure Bytecode

code.co_code
== 't\x00\x00j ... \x00S'   # Py2
== b't\x00\x00j ... \x00S'  # Py3

from opcode import opname, opmap

# Py3 indexing yields ints; in Py2 wrap the byte in ord(...)
print(opname[code.co_code[0]])  # = 'LOAD_GLOBAL' (Python 3.5)
print(opmap['LOAD_GLOBAL'])  # = 116
opmap['LOAD_GLOBAL'] == code.co_code[0]  # = True (Python 3.5)
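Opcode numbers and even the first instruction of a function vary between CPython versions, so the exact values above should be read as Python 3.5 output. A version-independent sketch using `dis.get_instructions`, which does the decoding for you:

```python
import dis
from opcode import opmap, opname

def function():
    pass

# Each Instruction carries both the numeric opcode and its readable name.
instructions = list(dis.get_instructions(function))
for ins in instructions:
    print(ins.offset, ins.opcode, ins.opname, ins.argval)

# opname and opmap are inverse lookups of each other.
print(opname[opmap['NOP']])
```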

Opcode arguments

from opcode import HAVE_ARGUMENT  # == 90

opcode = code.co_code[i]

if opcode >= HAVE_ARGUMENT:
    lo_byte = code.co_code[i + 1]
    hi_byte = code.co_code[i + 2]
    position = (hi_byte << 8) | lo_byte
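The decoding step can be sketched as a standalone function over a hand-written byte string. This assumes the pre-3.6 layout described on this slide (one opcode byte, then a two-byte little-endian argument); since CPython 3.6 every instruction is two bytes wide, so this does not apply to modern bytecode. `decode` and `sample` are hypothetical names.

```python
HAVE_ARGUMENT = 90  # hardcoded from the slide to keep the sketch version-independent

def decode(co_code, i):
    """Return (opcode, argument or None, offset of the next instruction)."""
    opcode = co_code[i]
    if opcode >= HAVE_ARGUMENT:
        lo_byte = co_code[i + 1]
        hi_byte = co_code[i + 2]
        return opcode, (hi_byte << 8) | lo_byte, i + 3
    return opcode, None, i + 1

# 116 = LOAD_GLOBAL with argument 0x0102, then 1 = POP_TOP (Python 3.5 numbers)
sample = bytes([116, 0x02, 0x01, 1])

print(decode(sample, 0))  # (116, 258, 3)
print(decode(sample, 3))  # (1, None, 4)
```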

Cherchez la femme

opcode = code.co_code[i]
current = opname[opcode]
# `previous` is the name of the preceding opcode,
# `previous_arg_position` is its argument (an index into co_names)
command = code.co_names[previous_arg_position]

if current == 'LOAD_ATTR' \
        and previous == 'LOAD_GLOBAL':

    if command == 'label':
        pass  # store label position
    elif command == 'goto':
        pass  # same for goto

code.co_names
# == ('label', 'here', 'goto')
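Put together, the scan could look roughly like this. It is a sketch, not the talk's actual implementation: the opcode numbers are hardcoded from CPython 3.5, the bytecode is typed in by hand from the disassembly shown earlier, and `find_marks` is a hypothetical helper name.

```python
# CPython 3.5 opcode numbers (assumption: pre-3.6 layout, 3-byte instructions with args)
LOAD_GLOBAL, LOAD_ATTR, POP_TOP = 116, 106, 1
LOAD_CONST, RETURN_VALUE, HAVE_ARGUMENT = 100, 83, 90

def find_marks(co_code, co_names):
    """Return ({label_name: offset}, [(goto_offset, label_name), ...])."""
    labels, gotos = {}, []
    previous = None  # (offset, opcode, arg) of the preceding instruction
    i = 0
    while i < len(co_code):
        opcode, arg, size = co_code[i], None, 1
        if opcode >= HAVE_ARGUMENT:
            arg = co_code[i + 1] | (co_code[i + 2] << 8)
            size = 3
        if opcode == LOAD_ATTR and previous and previous[1] == LOAD_GLOBAL:
            command = co_names[previous[2]]   # 'label' or 'goto'
            target = co_names[arg]            # e.g. 'here'
            if command == 'label':
                labels[target] = previous[0]
            elif command == 'goto':
                gotos.append((previous[0], target))
        previous = (i, opcode, arg)
        i += size
    return labels, gotos

co_code = bytes([
    LOAD_GLOBAL, 0, 0, LOAD_ATTR, 1, 0, POP_TOP,   # label .here
    LOAD_GLOBAL, 2, 0, LOAD_ATTR, 1, 0, POP_TOP,   # goto .here
    LOAD_CONST, 0, 0, RETURN_VALUE,
])
co_names = ('label', 'here', 'goto')

labels, gotos = find_marks(co_code, co_names)
print(labels)  # {'here': 0}
print(gotos)   # [(7, 'here')]
```

The recorded offsets are the positions of the `LOAD_GLOBAL` instructions, which is where the 7-byte statement starts.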

Nope, nope, NOP!

Label:

  • does nothing
  • stores position
  • old code = 3 + 3 + 1 bytes (LOAD_GLOBAL + LOAD_ATTR + POP_TOP)
  • new code = 7 NOPs (1 byte each)

Goto:

  • jumps to label
  • old code = 3 + 3 + 1 bytes (LOAD_GLOBAL + LOAD_ATTR + POP_TOP)
  • new code = JUMP_ABSOLUTE (3 bytes) + 4 NOPs

and JUMP!
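The replacement itself is a fixed-size overwrite, so no offsets elsewhere in the function have to move. A sketch under the same assumptions as before: CPython 3.5 opcode numbers, the pre-3.6 three-byte instruction layout, hand-written bytecode, and a hypothetical `patch` helper that receives the already collected label and goto offsets.

```python
NOP, JUMP_ABSOLUTE = 9, 113   # CPython 3.5 opcode numbers (assumption)
STATEMENT_SIZE = 7            # LOAD_GLOBAL (3) + LOAD_ATTR (3) + POP_TOP (1)

def patch(co_code, labels, gotos):
    """Overwrite label statements with NOPs and goto statements with jumps."""
    code = bytearray(co_code)
    for offset in labels.values():
        code[offset:offset + STATEMENT_SIZE] = bytes([NOP] * STATEMENT_SIZE)
    for offset, name in gotos:
        target = labels[name]
        jump = [JUMP_ABSOLUTE, target & 0xFF, target >> 8]  # little-endian argument
        code[offset:offset + STATEMENT_SIZE] = bytes(jump + [NOP] * 4)
    return bytes(code)

original = bytes([
    116, 0, 0, 106, 1, 0, 1,   # label .here
    116, 2, 0, 106, 1, 0, 1,   # goto .here
    100, 0, 0, 83,             # return None
])
patched = patch(original, {'here': 0}, [(7, 'here')])
print(list(patched))
```

For this particular input the result is an endless loop, exactly as the source `label .here` / `goto .here` asks for.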

Code rebuild

from types import CodeType

new_code = CodeType(
    code.co_argcount, 
    code.co_kwonlyargcount,  # py3 only
    code.co_nlocals, code.co_stacksize, code.co_flags,
    bytes(map(ord, codebytes_list)),  # string in py2
    code.co_consts, code.co_names, code.co_varnames,
    code.co_filename, code.co_name, code.co_firstlineno,
    code.co_lnotab
)
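The positional `CodeType(...)` signature has changed between Python versions, so the call above matches the Python 3.5 era. On Python 3.8 and later, code objects also have a `replace()` method that copies everything except the fields you name. A minimal sketch that swaps a constant instead of the bytecode, so that it runs on any recent version; `greet` and `patched` are invented names.

```python
from types import FunctionType

def greet():
    return "hello"

code = greet.__code__

# Build a new constants tuple with one value swapped, keeping every index in place.
new_consts = tuple("goodbye" if c == "hello" else c for c in code.co_consts)
new_code = code.replace(co_consts=new_consts)   # Python 3.8+

patched = FunctionType(new_code, greet.__globals__)
print(greet(), patched())  # hello goodbye
```

The same call with `co_code=...` would be the place to put rewritten bytecode.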

Function rebuild

from types import FunctionType

rewritten = FunctionType(new_code, function.func_globals)  # py2
rewritten = FunctionType(new_code, function.__globals__)   # py3
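One caveat with the two-argument form: default arguments and closures live on the function object, not on the code object, so they are lost unless passed along. A Python 3 sketch; the names `bare` and `full` are illustrative.

```python
from types import FunctionType

def function(greeting="hello"):
    return greeting

# Only code and globals: defaults are dropped.
bare = FunctionType(function.__code__, function.__globals__)

# Also carry over name, defaults and closure cells.
full = FunctionType(
    function.__code__,
    function.__globals__,
    function.__name__,
    function.__defaults__,
    function.__closure__,
)

print(bare.__defaults__)  # None
print(full())             # hello
```

Calling `bare()` without arguments raises `TypeError`, because `greeting` no longer has a default.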

Implementation

Nice to read

We are hiring!

Python 3.5, async, Tornado, Flask, Docker, Kubernetes, Prometheus, GitLab CI, ...
