Accelerating NumPy with C++

C++ Data Structures for Data Science

Wolf Vollprecht, BAyPIGgies, July 27th 2017

About Me

  • Robotics Student at ETH Zurich
  • Currently visiting the Autonomous Systems Lab at Stanford
  • Researching autonomous cars, drones and other robots

Wolf Vollprecht

Github: @wolfv

Twitter: @wuoulf

 

JET

  • A Python library
  • Translates NumPy to C++
  • Compiles Just In Time (JIT)

XTENSOR

  • Ultra modern C++ library
  • API similar to NumPy
  • Bindings to R, Julia and Python

Motivation

Accelerate math-heavy code!

OBSERVATION

NumPy dtypes and shapes are often known!

 

 

 

 

Combine

  • Beauty & Simplicity of NumPy
  • Speed of C++11
a = np.array([1.1, 2.2, 3.3])              # shape = (3,)
b = np.arange(9, dtype=int).reshape(3, 3)  # shape = (3, 3)
                                           # dtype = int64
c = a * b  
# result ==> shape = (3, 3), dtype = float64

Audience

Python lovers who need speed in

in numerical systems

 

FOR EXAMPLE

Solving ODEs

Optimize with Least Squares

Simulate large dynamic systems

Overview

Python Script

NumPy

NumPy API

calls

Overview

Python Script

JET

NumPy API

calls

Overview

Python Script

JET

NumPy API

calls

Operations

Computation

Graph

C++

Shared Object (.so)

call

register

register

compile

import jet as jt
import numpy as np


ph    = jt.placeholder(name='holder', 
                       shape=(3, 3))
var   = jt.variable(name='variable',
                    value=np.zeros((1, 3)))
const = jt.constant(name='constant',
                    value=1.5)





res1 = ph * var







res2 = res1 + const
builder = JetBuilder(out=[res2], 
                     fun_name='test')

module = builder.build()
instance = test_module.TestClass()

# adjustable class member 'variable':
instance.variable = np.array([1, 2, 3])

# calling our function
computed = instance.test(np.ones((3, 3)))
print(computed)
constexpr auto constant = 2.0;

class TestClass {
public:
    Mat<1, 3> variable = {0.0, 0.0, 0.0};

    Mat<double>::fixed<3, 3> testFunc(Mat<double>::fixed<3, 3> holder) {
        Mat<double>::fixed<3, 3> Add = (holder * variable) + constant;
        return Add;
    }
}


// Python binding code

namespace py = pybind11;
void pyexport(py::module& m) {
    py::class_<TestClass>(m, "TestClass")
        .def(py::init<>())
        .def("testFunc", &TestClass::testFunc)
        .def_readwrite("variable", &TestClass::variable)

        .def("args", [](TestClass&) {
            return vector<string> { "holder" };
        })
    ;
}

The Library

  • Very small ~3’000 LOC (compared to Tensorflow / Cython / Pythran)
  • Transparent conversion to C++ class (not to LLVM IR as Numba)
  • Not trying to reimplement Python (as Cython or Pythran…), just NumPy
  • Piggyback on established LinAlg libraries and incredible compiler optimizations
  • Optionally compile!

 

Migrating from NumPy

  • Replace NumPy operations with JET operations. Usually replacing ‘import numpy as np’ with ‘import jet as np’ is sufficient.
  • You don't have to replace constants such as np.array([1, 2]) with JET-arrays
  • Decorate all top-level functions with the JIT-decorator
import numpy as np


def sub_func(a):
    return a * 2

def func(a, b):
    c = sub_func(a) + b
    return np.dot(c, c)

print(func(np.array([1,2]), 2))
import numpy as np
import jet as jt
from jet.jit import jit

def sub_func(a):
    return a * 2

@jit
def func(a, b):
    c = sub_func(a) + b
    return jt.dot(c, c)

print(func(np.array([1,2]), 2))

Test Case: Quadrotor Simulation

  • Full state model:
    • Quadrotor dynamics
    • Propeller dynamics
    • Sensor model
    • Control system
\dot{x} = f(x, u)
x˙=f(x,u)\dot{x} = f(x, u)

state derivatives

state

inputs

from jet.jit import jit

import scipy.integrate
from simulation_model import \
    state_derivatives, init_state, time_vec

jet_state_derivatives = jit(state_derivatives)

states = scipy.integrate.odeint(
    jet_state_derivatives, 
    init_state,
    time_vec
)

~ 55x speed-up of math-heavy Python/Numpy code

 

The Backend

  • Currently uses Armadillo
  • Armadillo semantics are different from NumPy
  • E.g. column-wise storage, no broadcasting ...

Outlook

  • Change to a “NumPy-compatible” backend → xtensor!
  • While/For-loop operation
  • Parallelization feature

test it / contribute on:
https://github.com/wolfv/pyjet

xtensor + jet

By Wolf Vollprecht

xtensor + jet

  • 481