Rust Accelerated Pythons

Presented by Dorian Puła @ PyCon Canada 2019

Photo © 2010 Colin - https://www.flickr.com/photos/48625620@N00/4446021223/

Who am I?

Dorian Puła (@dorianpula)

Software Developer @ Points

  • Develops loyalty program e-commerce platforms.
  • Micro-services APIs, web frontend + DevOps tooling

 

Open Source

  • Contributor to Ansible, Fabric, CPython + Flask.
  • Rookeries (Markdown and Web Component static site generator powered by Rust).
  • PyCon US/Canada speaker + sprints coordinator.
  • Working to improve the developer experience for Devs and Ops on Linux by writing about and building tooling with Python + Rust.

Why Extensions and Why Rust?

Use Cases for Writing Extensions

  • Performance:
  • Interoperability with Existing Libraries:
    • Leverage native libraries built over the past few decades, typically C libraries with a stable ABI.
    • Many interesting libraries like SDL, Boost, or database drivers.

Why Rust?

  • Features:
    • Memory Safety Guarantees in the Compiler.
      • Few devs can write safe non-trivial C/C++ code.
    • Ergonomics in Language
      • Easier for a non-system dev to learn than C++.
      • Language based on Python-like syntax, years of C++ experience, and language design research.
      • Powerful tooling and workflow via cargo.
    • Efficiency on par with C: Rust at 1.03x vs C++ at 1.33x.

An innovative systems-level language that helps with writing memory-safe, fast and reliable code.

Example Library Binding

Unit Conversion Calculator Example

  • Need fast conversions from Celsius to Fahrenheit.
  • Feature of a Django based search engine for developers to rival the functionality of DuckDuckGo, Google, Bing, and others.
  • The unit conversion needs to be driven from Python.
  • Small proof of concept for more complex enhanced searches like exchange rates of real and virtual currencies.
    • Rust's WebAssembly tooling is interesting for applications that share code between the backend and the web frontend.

Rust + Python Code

Rust

fn convert_to_fahrenheit(celsius: f32) -> f32 {
    celsius * 1.8 + 32.0
}

#[cfg(test)]
mod tests {

    // Only the function under test is defined in this snippet.
    use super::convert_to_fahrenheit;

    #[test]
    fn conversion_celsius_to_fahrenheit() {
        assert_eq!(convert_to_fahrenheit(25.0), 77.0);
    }
}

Python

import unit_converter


def test_using_unit_converter():
    assert unit_converter.convert_to_fahrenheit(25.0) == 77.0

Rust Structs / Python Objects

Rust

struct Temperature {
    celsius: f32,
}

impl Temperature {
    fn to_fahrenheit(&self) -> f32 {
        self.celsius * 1.8 + 32.0
    }

    fn windchill(&self, wind_speed_kph: f32) -> f32 {
        13.12 + (0.6215 * self.celsius) - (11.37 * wind_speed_kph.powf(0.16))
            + (0.3965 * self.celsius * wind_speed_kph.powf(0.16))
    }
}

Python

import unit_converter


def test_using_unit_converter():
    temperature = unit_converter.Temperature(25.0)
    assert temperature.to_fahrenheit() == 77.0


def test_windchill():
    temperature = unit_converter.Temperature(-20.0)
    assert round(temperature.windchill(32.0), 1) == -32.9

Binding Between Python + Rust

How Python Bindings Work

  • Translates code on the binary level.
    • Not using RPC, JSON, or other serialization.
    • Code must share a binary interface (ABI) to be callable.
      • Rust & C++ both do name mangling, so the binding needs to expose a C-style (extern "C") interface.
  • Works with CPython's internal representation
    • Calling Py* C API functions like PyModule_AddObject.
    • Manage reference counting of objects.
    • Acquiring / releasing the GIL.
    • Override/implement operators.
    • Translate between types.
  • Non-trivial work that you don't want to do by hand.
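
As a small illustration of the "translate between types" step: a Python float is a 64-bit double, while the Rust examples in this talk use f32. The sketch below uses only the standard library's struct module to mimic that narrowing. The helper name to_f32 is hypothetical and is not part of PyO3; it only shows the kind of conversion a binding layer has to perform.

```python
import struct


def to_f32(value: float) -> float:
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    # "<f" packs the value as a little-endian 4-byte IEEE 754 float.
    return struct.unpack("<f", struct.pack("<f", value))[0]


# 25.0 is exactly representable in 32 bits; 0.1 is not.
exact = to_f32(25.0)
narrowed = to_f32(0.1)
```

Values that are not exactly representable come back slightly changed, which is why comparing with round() or a tolerance (as the windchill test does) is safer than strict equality.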

PyO3 - Intro + Installation

  • PyO3
    • Bi-directional bindings between Rust + Python.
    • Uses the Rust nightly release; procedural macros wrap functions and structs.
    • Create a library project using cargo new --lib unit_converter.
    • Add the pyo3 dependency (with the extension-module feature) to Cargo.toml and set the crate type to cdylib.
    • Lib dependencies: Python 3.5+ or PyPy.
  • Maturin
    • Tooling for building Python packages from Rust.
    • pip install maturin

Wiring up the Rust Code

#![feature(specialization)]

#[macro_use] extern crate pyo3;

use pyo3::prelude::*;


#[pyfunction]
pub fn convert_to_fahrenheit(celsius: f32) -> f32 {
    celsius * 1.8 + 32.0
}

#[pymodule]
fn unit_converter(_py: Python, module: &PyModule) -> PyResult<()> {
    // Register the function so Python sees unit_converter.convert_to_fahrenheit.
    module.add_wrapped(wrap_pyfunction!(convert_to_fahrenheit))?;

    Ok(())
}

// ... Test code below ...

Wiring up the Rust Structs

#![feature(specialization)]
#[macro_use]
extern crate pyo3;

use pyo3::prelude::*;

#[pyclass(module = "unit_converter")]
struct Temperature { celsius: f32 }

#[pymethods]
impl Temperature {
    #[new]
    fn new(obj: &PyRawObject, temperature: f32) {
        obj.init(Temperature { celsius: temperature });
    }

    fn to_fahrenheit(&self) -> f32 { self.celsius * 1.8 + 32.0 }
    // Windchill function truncated for clarity.
}

#[pymodule]
fn unit_converter(_py: Python, module: &PyModule) -> PyResult<()> {
    module.add_class::<Temperature>()?;
    Ok(())
}

Creating a Python Package from a Rust Bindings Crate

  • maturin
    • maturin develop - builds the crate and installs it into the current virtualenv for local development.
    • maturin build --release - builds optimized wheels without installing them.
    • maturin publish - builds the package and uploads it to PyPI.

Demo

Performance Benchmarks

Benchmarks

Using pytest-benchmark to compare against a Python implementation...

  • Why would the Rust version be significantly slower?
  • A reminder to measure before claiming that one approach is faster than another.

Batch Benchmarks

What about batching the calculations?

  • Much better, but crossing the FFI boundary is still expensive.
  • Batching is how NumPy increases its performance.
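
The cost model behind batching can be sketched in plain Python. The class below is a hypothetical stand-in for an extension module (it is not the talk's code, and convert_many_to_fahrenheit is an assumed name). It counts how often the Python/native boundary would be crossed: once per value in the naive loop, once in total for a batch call.

```python
class FakeExtension:
    """Hypothetical stand-in for a native module; counts boundary crossings."""

    def __init__(self):
        self.calls = 0

    def convert_to_fahrenheit(self, celsius):
        self.calls += 1  # one crossing per value
        return celsius * 1.8 + 32.0

    def convert_many_to_fahrenheit(self, values):
        self.calls += 1  # one crossing for the whole batch
        return [c * 1.8 + 32.0 for c in values]


readings = [-40.0, 0.0, 25.0, 100.0]

# Naive approach: call into the "extension" once per reading.
one_by_one = FakeExtension()
results_loop = [one_by_one.convert_to_fahrenheit(c) for c in readings]

# Batched approach: hand over the whole list in a single call.
batched = FakeExtension()
results_batch = batched.convert_many_to_fahrenheit(readings)
```

Both approaches produce the same values, but the batched one pays the per-call overhead only once, so the fixed cost of each crossing is spread over the whole list.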

Better Benchmark Examples

PyO3 has a better example with parallel word count.

  • Sequential Rust - 2x faster than sequential Python.
  • Parallel Rust using threads - 6x faster.
----------------------------------------------------------------------------------------------------
Name (time in us)                       Min                    Max                   Mean           
----------------------------------------------------------------------------------------------------
test_pure_python_once_numba        292.0990 (1.0)         317.7590 (1.0)         296.7477 (1.0)     
test_numpy_numba                   326.2470 (1.12)        526.1350 (1.66)        338.1704 (1.14)    
test_rust_bytes_once               336.0620 (1.15)      1,053.0090 (3.31)        342.5122 (1.15)    
test_c_swig_bytes_once             375.6310 (1.29)      1,389.9070 (4.37)        388.9181 (1.31)    
test_rust_once                     986.0360 (3.38)      2,498.5850 (7.86)      1,006.5819 (3.39)    
test_numpy                       1,137.1750 (3.89)      2,000.5430 (6.30)      1,167.2551 (3.93)    
test_rust                        2,555.1400 (8.75)      3,645.3900 (11.47)     2,592.0419 (8.73)    
test_regex                      22,597.1750 (77.36)    25,027.2820 (78.76)    22,851.8456 (77.01)   
test_pure_python_once           32,418.8830 (110.99)   34,818.0800 (109.57)   32,756.3244 (110.38)  
test_pure_python                43,823.5140 (150.03)   45,961.8460 (144.64)   44,367.1028 (149.51)  
test_python_comprehension       46,360.1640 (158.71)   50,578.1740 (159.17)   46,986.8058 (158.34)  
test_itertools                  49,080.8640 (168.03)   51,016.5230 (160.55)   49,405.2562 (166.49)  
----------------------------------------------------------------------------------------------------

The table above comes from a more thorough example: counting double-letter pairings.

Summary

Using PyO3 + Rust + Python 3

  • Great experience writing in Rust.
    • Great tooling, documentation and community.
    • Innovative language design.
    • Expect some struggles learning about memory management, traits, and lifetimes.
    • Ultimately an empowering experience.
  • PyO3 makes native binding creation easy.
    • Maturin streamlines the packaging aspect.
  • Consider it as another tool to improve performance.
    • Check with benchmarks first.

Further Reading

Thanks!

Twitter: @dorianpula

Email: dorian.pula@gmail.com

Web: https://dorianpula.ca/

Questions?
