Rust Accelerated Pythons

Presented by Dorian Puła @ PyCon Canada 2019

Photo © 2010 Colin - https://www.flickr.com/photos/48625620@N00/4446021223/


Who am I?

Dorian Puła (@dorianpula)

Software Developer @ Points

  • Develops loyalty program e-commerce platforms.
  • Microservice APIs, web frontends + DevOps tooling.

 

Open Source

  • Contributor to Ansible, Fabric, CPython + Flask.
  • Rookeries (Markdown and Web Component static site generator powered by Rust).
  • PyCon US/Canada speaker + sprints coordinator.
  • Working to improve the developer experience for Devs and Ops on Linux by writing about, and building, tooling in Python + Rust.

Why Extensions and Why Rust?


Use Cases for Writing Extensions

  • Performance: speeding up slow Python code.
  • Interoperability with Existing Libraries:
    • Leveraging native libraries built over the past few decades and exposed as C libraries with a stable ABI.
    • Many interesting libraries like SDL, Boost, or database drivers.

Why Rust?

  • Features:
    • Memory Safety Guarantees in the Compiler.
      • Few developers can write non-trivial C/C++ code that is memory safe.
    • Language ergonomics:
      • Easier for a non-systems developer to learn than C++.
      • Language based on Python-like syntax, years of C++ experience, and language design research.
      • Powerful tooling and workflow via cargo.
    • Efficiency on par with C: Rust 1.03x vs. C++ 1.33x (relative to C).

An innovative systems-level language that helps with writing memory safe, fast and reliable code.

Example Library Binding


Unit Conversion Calculator Example

  • Need fast conversions from Celsius to Fahrenheit.
  • Feature of a Django-based search engine for developers, meant to rival the functionality of DuckDuckGo, Google, Bing, and others.
  • The unit conversion needs to be driven from Python.
  • A small proof of concept for more complex enhanced searches, like exchange rates for real and virtual currencies.
    • Rust's WebAssembly tooling is interesting for applications that span the backend and the web frontend.

Rust + Python Code

Rust

fn convert_to_fahrenheit(celsius: f32) -> f32 {
    celsius * 1.8 + 32.0
}

#[cfg(test)]
mod tests {

    // Only import what this snippet defines; `Temperature` is introduced later.
    use super::convert_to_fahrenheit;

    #[test]
    fn conversion_celsius_to_fahrenheit() {
        assert_eq!(convert_to_fahrenheit(25.0), 77.0);
    }
}

Python

import unit_converter


def test_using_unit_converter():
    assert unit_converter.convert_to_fahrenheit(25.0) == 77.0

Rust Structs / Python Objects

Rust

struct Temperature {
    celsius: f32,
}

impl Temperature {
    fn to_fahrenheit(&self) -> f32 {
        self.celsius * 1.8 + 32.0
    }

    // Wind chill index: temperature in °C, wind speed in km/h.
    fn windchill(&self, wind_speed_kph: f32) -> f32 {
        13.12 + (0.6215 * self.celsius) - (11.37 * wind_speed_kph.powf(0.16))
            + (0.3965 * self.celsius * wind_speed_kph.powf(0.16))
    }
}

Python

import unit_converter


def test_using_unit_converter():
    temperature = unit_converter.Temperature(25.0)
    assert temperature.to_fahrenheit() == 77.0


def test_windchill():
    temperature = unit_converter.Temperature(-20.0)
    assert round(temperature.windchill(32.0), 1) == -32.9
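The `windchill` method encodes what appears to be the wind chill index used by Environment Canada (and the US National Weather Service), with air temperature $T$ in °C and wind speed $V$ in km/h. Writing it out, and plugging in the values from the Python test, shows where the expected −32.9 comes from:

```latex
W = 13.12 + 0.6215\,T - 11.37\,V^{0.16} + 0.3965\,T\,V^{0.16}

% Worked check for T = -20, V = 32 (so V^{0.16} \approx 1.7411):
W \approx 13.12 - 12.43 - 19.796 - 13.807 \approx -32.91 \;\rightarrow\; -32.9
```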

Binding Between Python + Rust


How Python Bindings Work

  • Bindings work at the binary level.
    • No RPC, JSON, or other serialization involved.
    • Both sides must agree on the ABI (calling convention, data layout) for code to be callable.
      • Rust & C++ both mangle symbol names, so bindings go through a C-style interface (extern "C" functions, #[no_mangle]).
  • Works with CPython's C API and internal object representation:
    • Calling Py* functions like PyModule_AddObject.
    • Managing reference counts of objects.
    • Acquiring / releasing the GIL.
    • Overriding / implementing operators.
    • Translating between types.
  • Non-trivial work that you don't want to do by hand.
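To make the name mangling point concrete, here is a minimal sketch of exporting the conversion function through a plain C interface by hand, with no PyO3 involved. It assumes the crate is built as a `cdylib`; only plain numbers cross the boundary, so none of the CPython object, refcount, or GIL handling listed above is needed.

```rust
// Hand-written C-ABI export, without PyO3.
// `#[no_mangle]` keeps the exported symbol named exactly `convert_to_fahrenheit`
// instead of a Rust-mangled name (the Rust 2024 edition spells this
// `#[unsafe(no_mangle)]`). `extern "C"` makes the function use the platform's
// C calling convention, so any C-compatible caller can invoke it.
#[no_mangle]
pub extern "C" fn convert_to_fahrenheit(celsius: f32) -> f32 {
    celsius * 1.8 + 32.0
}
```

From Python, such a library could be loaded with `ctypes.CDLL` (the file name, e.g. a `libunit_converter.so` on Linux, is an assumption and varies by platform). The caller must declare `argtypes = [ctypes.c_float]` and `restype = ctypes.c_float`, since ctypes otherwise assumes an `int` return. That manual bookkeeping, multiplied across every type and object, is exactly what PyO3 automates.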

PyO3 - Intro + Installation

  • PyO3
    • Bi-directional bindings between Rust + Python.
    • Uses the Rust nightly toolchain for the procedural macros that wrap functions and structs.
    • Create a library project using cargo new --lib unit_converter.
    • Add the pyo3 dependency to Cargo.toml and set crate-type = ["cdylib"] so cargo builds a C-compatible shared library.
    • Requires Python 3.5+ or PyPy.
  • Maturin
    • Tooling for building Python packages from Rust.
    • pip install maturin

Wiring up the Rust Code

#![feature(specialization)]

#[macro_use] extern crate pyo3;

use pyo3::prelude::*;


#[pyfunction]
pub fn convert_to_fahrenheit(celsius: f32) -> f32 {
    celsius * 1.8 + 32.0
}

#[pymodule]
fn unit_converter(_py: Python, module: &PyModule) -> PyResult<()> {
    // Exposed to Python as unit_converter.convert_to_fahrenheit.
    module.add_wrapped(wrap_pyfunction!(convert_to_fahrenheit))?;

    Ok(())
}

// ... Test code below ...

Wiring up the Rust Structs

#![feature(specialization)]
#[macro_use]
extern crate pyo3;

use pyo3::prelude::*;

#[pyclass(module = "unit_converter")]
struct Temperature { celsius: f32 }

#[pymethods]
impl Temperature {
    #[new]
    fn new(obj: &PyRawObject, temperature: f32) {
        obj.init(Temperature { celsius: temperature });
    }

    fn to_fahrenheit(&self) -> f32 { self.celsius * 1.8 + 32.0 }
    // Windchill function truncated for clarity.
}

#[pymodule]
fn unit_converter(_py: Python, module: &PyModule) -> PyResult<()> {
    module.add_class::<Temperature>()?;
    Ok(())
}

Creating a Python Package from a Rust Bindings Crate

  • maturin
    • maturin develop - builds the crate and installs it into the current virtualenv, for local development.
    • maturin publish - builds and uploads the package to PyPI.
    • maturin build --release - builds optimized wheels (written to target/wheels).

Demo

Performance Benchmarks


Benchmarks

Using pytest-benchmark to compare against a Python implementation...

  • Why would the Rust version be significantly slower?
  • An example of why you need to measure before claiming one approach is faster than another.

Batch Benchmarks

What about batching the calculations?

  • Much better, but crossing the FFI boundary is still expensive.
  • Batching is how NumPy gets much of its performance.
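A minimal sketch of what the batched side might look like on the Rust end. The name `convert_batch_to_fahrenheit` is hypothetical; the point is that one call handles a whole slice, so a Python caller pays the boundary-crossing cost once per batch rather than once per value.

```rust
/// Hypothetical batch variant of `convert_to_fahrenheit`: converts every value
/// in the slice within a single call.
pub fn convert_batch_to_fahrenheit(celsius: &[f32]) -> Vec<f32> {
    celsius.iter().map(|&c| c * 1.8 + 32.0).collect()
}
```

Exposed through PyO3 as a `#[pyfunction]` taking a `Vec<f32>`, this would accept a Python list of floats. Converting that list still touches every element, so the gain comes from doing one crossing and one tight Rust loop instead of thousands of separate calls.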

Better Benchmark Examples

PyO3 has a better example with parallel word count.

  • Sequential Rust - 2x faster than sequential Python.
  • Parallel Rust using threads - 6x faster.
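The sequential vs. parallel idea can be sketched in plain Rust. This is not PyO3's actual example code: it is a minimal sketch that assumes the task is counting whole-word occurrences of a search term, and it uses the standard library's scoped threads (`std::thread::scope`, Rust 1.63+) rather than any particular parallelism crate. The names `search_sequential`, `search_parallel`, and the `workers` parameter are hypothetical.

```rust
use std::thread;

/// Count how many whitespace-separated words in `line` equal `needle`.
fn count_in_line(line: &str, needle: &str) -> usize {
    line.split_whitespace().filter(|word| *word == needle).count()
}

/// Sequential version: one pass over every line.
pub fn search_sequential(contents: &str, needle: &str) -> usize {
    contents.lines().map(|line| count_in_line(line, needle)).sum()
}

/// Parallel version: split the lines into `workers` roughly equal chunks and
/// count each chunk on its own scoped thread.
pub fn search_parallel(contents: &str, needle: &str, workers: usize) -> usize {
    let lines: Vec<&str> = contents.lines().collect();
    if lines.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so every line lands in a chunk (and chunk_size >= 1).
    let chunk_size = (lines.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Collect the handles first so all threads are spawned before any join.
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|line| count_in_line(line, needle))
                        .sum::<usize>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<usize>()
    })
}
```

Wrapped as a PyO3 function, the parallel part would also need to run with the GIL released (PyO3 provides `Python::allow_threads` for this) so other Python threads are not blocked. For small inputs, thread start-up can outweigh the gain: the same measure-first lesson as the benchmarks above.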
----------------------------------------------------------------------------------------------------
Name (time in us)                       Min                    Max                   Mean           
----------------------------------------------------------------------------------------------------
test_pure_python_once_numba        292.0990 (1.0)         317.7590 (1.0)         296.7477 (1.0)     
test_numpy_numba                   326.2470 (1.12)        526.1350 (1.66)        338.1704 (1.14)    
test_rust_bytes_once               336.0620 (1.15)      1,053.0090 (3.31)        342.5122 (1.15)    
test_c_swig_bytes_once             375.6310 (1.29)      1,389.9070 (4.37)        388.9181 (1.31)    
test_rust_once                     986.0360 (3.38)      2,498.5850 (7.86)      1,006.5819 (3.39)    
test_numpy                       1,137.1750 (3.89)      2,000.5430 (6.30)      1,167.2551 (3.93)    
test_rust                        2,555.1400 (8.75)      3,645.3900 (11.47)     2,592.0419 (8.73)    
test_regex                      22,597.1750 (77.36)    25,027.2820 (78.76)    22,851.8456 (77.01)   
test_pure_python_once           32,418.8830 (110.99)   34,818.0800 (109.57)   32,756.3244 (110.38)  
test_pure_python                43,823.5140 (150.03)   45,961.8460 (144.64)   44,367.1028 (149.51)  
test_python_comprehension       46,360.1640 (158.71)   50,578.1740 (159.17)   46,986.8058 (158.34)  
test_itertools                  49,080.8640 (168.03)   51,016.5230 (160.55)   49,405.2562 (166.49)  
----------------------------------------------------------------------------------------------------

The double-letter-pair counting benchmark above gives a more thorough comparison.
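For reference, a minimal sketch of the kind of function that benchmark measures, assuming the task is counting positions where a character is immediately followed by the same character. The `_bytes_` rows in the table presumably compare raw bytes instead of decoded characters; both variants are shown, and the function names are hypothetical.

```rust
/// Count positions where a character is immediately followed by the same
/// character, e.g. "aabb" has two pairs ("aa" and "bb").
/// Works on decoded chars, so it is correct for any UTF-8 input.
pub fn count_doubles(val: &str) -> usize {
    let mut chars = val.chars();
    let mut prev = match chars.next() {
        Some(c) => c,
        None => return 0,
    };
    let mut total = 0;
    for c in chars {
        if c == prev {
            total += 1;
        }
        prev = c;
    }
    total
}

/// Byte-based variant: compares adjacent raw bytes with `windows(2)` and skips
/// UTF-8 decoding. Guaranteed to agree with `count_doubles` only for ASCII input.
pub fn count_doubles_bytes(val: &str) -> usize {
    val.as_bytes().windows(2).filter(|pair| pair[0] == pair[1]).count()
}
```

The byte version trades correctness on non-ASCII text for speed, which is one reason to read benchmark names carefully before comparing numbers.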

Summary


Using PyO3 + Rust + Python 3

  • Great experience writing in Rust.
    • Great tooling, documentation and community.
    • Innovative language design.
    • Expect some struggles learning about memory management, traits, and lifetimes.
    • Ultimately an empowering experience.
  • PyO3 makes native binding creation easy.
    • Maturin streamlines the packaging aspect.
  • Consider it as another tool to improve performance.
    • Check with benchmarks first.

Further Reading

Thanks!

Twitter: @dorianpula

Email: dorian.pula@gmail.com

Web: https://dorianpula.ca/

Questions?


Rust Accelerated Pythons (PyCon Canada 2019)

By Dorian Pula


Sometimes you need to scale the performance of your Python code, or you need to hook into a C API. Wouldn't it be nice not to have to do that in C or C++? This talk walks through how to accelerate Python code using bindings written in Rust (a safe, fast, systems-level programming language).
