Accelerating NumPy with C++

C++ Data Structures for Data Science
Wolf Vollprecht, BAyPIGgies, July 27th 2017
About Me
- Robotics Student at ETH Zurich
- Currently visiting the Autonomous Systems Lab at Stanford
- Researching autonomous cars, drones and other robots

Wolf Vollprecht
Github: @wolfv
Twitter: @wuoulf
JET
- A Python library
- Translates NumPy to C++
- Compiles Just In Time (JIT)
XTENSOR
- Ultra modern C++ library
- API similar to NumPy
- Bindings to R, Julia and Python
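
As a flavour of that API, here is a minimal sketch of an xtensor program (assuming xtensor is installed, using its classic include paths); it mirrors the NumPy broadcasting example under Motivation below:

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xbuilder.hpp>
#include <xtensor/xio.hpp>

int main() {
    // roughly np.array([1.1, 2.2, 3.3])
    xt::xarray<double> a = {1.1, 2.2, 3.3};
    // roughly np.arange(9.0).reshape(3, 3)
    xt::xarray<double> b = xt::arange<double>(9);
    b.reshape({3, 3});
    // broadcasts like NumPy: the result has shape (3, 3)
    xt::xarray<double> c = a * b;
    std::cout << c << std::endl;
}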

Motivation
Accelerate math-heavy code!
OBSERVATION
NumPy dtypes and shapes are often known!
Combine
- Beauty & Simplicity of NumPy
- Speed of C++11
a = np.array([1.1, 2.2, 3.3]) # shape = (3,)
b = np.arange(9, dtype=int).reshape(3, 3) # shape = (3, 3)
# dtype = int64
c = a * b
# result ==> shape = (3, 3), dtype = float64
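
Because the shapes and dtypes above are known, the same computation can be written with fixed-size C++ containers, letting the compiler keep everything on the stack and unroll loops. A minimal sketch using xtensor's fixed-shape container (the function name is made up for illustration):

#include <numeric>
#include <xtensor/xfixed.hpp>

// Hypothetical hand-written equivalent: the shapes (3,) and (3, 3) and the
// element type are fixed at compile time, so no heap allocation is needed.
xt::xtensor_fixed<double, xt::xshape<3, 3>> example() {
    xt::xtensor_fixed<double, xt::xshape<3>> a = {1.1, 2.2, 3.3};
    xt::xtensor_fixed<double, xt::xshape<3, 3>> b;
    std::iota(b.begin(), b.end(), 0.0);  // like np.arange(9).reshape(3, 3)
    return a * b;                        // broadcasts to shape (3, 3)
}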
Audience
Python lovers who need speed in numerical systems
FOR EXAMPLE
Solving ODEs
Optimizing with least squares
Simulating large dynamic systems
Overview
[Diagram 1: a plain NumPy program — the Python script makes NumPy API calls that go directly to NumPy.]
[Diagram 2: with JET, the same NumPy API calls are routed to JET instead of NumPy.]
[Diagram 3: JET registers each operation into a computation graph, compiles the graph to a C++ shared object (.so), and the Python script then calls into the compiled .so.]
import jet as jt
import numpy as np
from jet import JetBuilder

ph = jt.placeholder(name='holder', shape=(3, 3))
var = jt.variable(name='variable', value=np.zeros((1, 3)))
const = jt.constant(name='constant', value=1.5)

res1 = ph * var
res2 = res1 + const

builder = JetBuilder(out=[res2], fun_name='test')
module = builder.build()

instance = module.TestClass()
# adjustable class member 'variable':
instance.variable = np.array([1, 2, 3])
# calling our function
computed = instance.test(np.ones((3, 3)))
print(computed)

// The generated C++ code:
constexpr auto constant = 2.0;
class TestClass {
public:
    Mat<1, 3> variable = {0.0, 0.0, 0.0};

    Mat<double>::fixed<3, 3> testFunc(Mat<double>::fixed<3, 3> holder) {
        Mat<double>::fixed<3, 3> Add = (holder * variable) + constant;
        return Add;
    }
};
// Python binding code
namespace py = pybind11;

void pyexport(py::module& m) {
    py::class_<TestClass>(m, "TestClass")
        .def(py::init<>())
        .def("testFunc", &TestClass::testFunc)
        .def_readwrite("variable", &TestClass::variable)
        .def("args", [](TestClass&) {
            return std::vector<std::string>{ "holder" };
        });
}

The Library
- Very small: ~3,000 LOC (compared to TensorFlow / Cython / Pythran)
- Transparent conversion to a C++ class (not to LLVM IR, as Numba does)
- Not trying to reimplement Python (as Cython or Pythran do), just NumPy
- Piggybacks on established linear-algebra libraries and on modern compiler optimizations
- Compilation is optional!
Migrating from NumPy
- Replace NumPy operations with JET operations. Usually replacing ‘import numpy as np’ with ‘import jet as np’ is sufficient.
- You don't have to replace constants such as np.array([1, 2]) with JET arrays
- Decorate all top-level functions with the @jit decorator
# Plain NumPy version
import numpy as np

def sub_func(a):
    return a * 2

def func(a, b):
    c = sub_func(a) + b
    return np.dot(c, c)

print(func(np.array([1, 2]), 2))

# JET version
import numpy as np
import jet as jt
from jet.jit import jit

def sub_func(a):
    return a * 2

@jit
def func(a, b):
    c = sub_func(a) + b
    return jt.dot(c, c)

print(func(np.array([1, 2]), 2))

Test Case: Quadrotor Simulation
- Full state model:
- Quadrotor dynamics
- Propeller dynamics
- Sensor model
- Control system
\dot{x} = f(x, u)
where \dot{x} are the state derivatives, x is the state, and u are the inputs
from jet.jit import jit
import scipy.integrate
from simulation_model import \
    state_derivatives, init_state, time_vec

jet_state_derivatives = jit(state_derivatives)

states = scipy.integrate.odeint(
    jet_state_derivatives,
    init_state,
    time_vec,
)

~ 55x speed-up of math-heavy Python/NumPy code
The Backend
- Currently uses Armadillo
- Armadillo semantics are different from NumPy
- E.g. column-major storage, no broadcasting, ...
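
A minimal sketch of the gap JET has to bridge (assuming Armadillo; the variable names mirror the NumPy example from the Motivation slide):

#include <armadillo>

int main() {
    // Armadillo is column-major and reshape() fills columns first, so a
    // transpose is needed to mimic np.arange(9).reshape(3, 3).
    arma::mat b = arma::reshape(arma::regspace(0, 8), 3, 3).t();
    arma::rowvec a = {1.1, 2.2, 3.3};
    // '*' is matrix multiplication, '%' is element-wise multiplication, and
    // there is no implicit broadcasting: NumPy's `a * b` has to be written
    // out explicitly per row.
    arma::mat c = b.each_row() % a;
    c.print("c:");
}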
Outlook
- Change to a “NumPy-compatible” backend → xtensor! (see the sketch after this list)
- While/for-loop operations
- Parallelization support
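
As a taste of that direction, here is a minimal sketch of a pybind11 module built on xtensor-python (module and function names are made up); NumPy arrays pass into C++ without copies and broadcasting follows NumPy rules:

#include <pybind11/pybind11.h>
#define FORCE_IMPORT_ARRAY
#include <xtensor-python/pyarray.hpp>
#include <xtensor/xmath.hpp>

// Element-wise computation on a NumPy array, evaluated in C++.
xt::pyarray<double> scaled_sin(const xt::pyarray<double>& x, double factor) {
    return factor * xt::sin(x);
}

PYBIND11_MODULE(xt_example, m) {
    xt::import_numpy();  // must run before xt::pyarray is used
    m.def("scaled_sin", &scaled_sin, "Compute factor * sin(x) in C++");
}

From Python this would then look like plain NumPy usage: import xt_example; xt_example.scaled_sin(np.linspace(0, 1, 5), 2.0).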
test it / contribute on:
https://github.com/wolfv/pyjet