rminiconda

Clean Python management when using R as an interface to Python

DSC 2019

Interfaces should be convenient

  • Programming a package to use an interface should be straightforward

  • The users of the application should be largely unaware of the interface

From John Chambers (Extending R)

reticulate accomplishes this very nicely

Except for one thing

166 / 429 issues (~39%)

Why is this a problem?

There are plenty of interfaces with system dependencies

  • C, C++, Fortran

  • Misc system requirements

  • SQL databases, Spark, etc.

Usually not a problem on Mac/Linux, CRAN binaries

Motivated user / installed by system admin

Often easy to install / configure (brew, apt, yum, etc.)

What's special about Python?

reticulate enables R to be not just an interface to Python, but many interfaces to many Python packages

  • Different versions of Python

  • Many Python packages with different versions and dependencies

  • Many environment management approaches

  • Python environments are often already configured for other uses outside of your particular R interface use case

It is inevitable that at some point a user will need to do something manually in their system outside of R to get their Python environment installed or configured properly

rminiconda

  • Installs Miniconda Python in an isolated, "namespaced" location that can be fully customized for your particular use case

  • Provides utilities for making this installation and configuration part of an R package setup

  • These installations do not interfere with any other Python installation on your system

  • Works on Linux, macOS, and Windows

Goal: Give R users access to anything in Python without them knowing they are using Python

Why miniconda?

  • Relatively small
  • Self-contained install option
  • Fast install
  • Easy to install on major platforms

"Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages."

Install

# install.packages("remotes")
remotes::install_github("hafen/rminiconda")

Use case 1: I'm a data scientist using reticulate and I want a "clean", separate Python installation

install_miniconda()

Places an isolated miniconda installation in a subdirectory of a base directory housing all installations made through rminiconda

OS       Base directory location
Windows  %APPDATA%\rminiconda
Linux    ~/.rminiconda
macOS    ~/Library/rminiconda
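
As a rough illustration of the table above, the lookup could be sketched like this. The helper names `base_dir_sketch()` and `install_dir_sketch()` are hypothetical and are not part of rminiconda's API; the package itself may resolve these paths differently.

```r
# Hypothetical helper (not rminiconda's API): map an OS name, as reported by
# Sys.info()[["sysname"]], to the base directory listed in the table above.
base_dir_sketch <- function(sysname = Sys.info()[["sysname"]],
                            home = path.expand("~"),
                            appdata = Sys.getenv("APPDATA")) {
  switch(sysname,
    Windows = file.path(appdata, "rminiconda"),
    Linux   = file.path(home, ".rminiconda"),
    Darwin  = file.path(home, "Library", "rminiconda"),  # macOS
    stop("Unsupported operating system: ", sysname)
  )
}

# Hypothetical helper: each named installation gets its own subdirectory.
install_dir_sketch <- function(name, ...) {
  file.path(base_dir_sketch(...), name)
}
```

For example, `install_dir_sketch("my_python", sysname = "Darwin", home = "/Users/hafen")` gives `/Users/hafen/Library/rminiconda/my_python`.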

Installing miniconda

rminiconda::install_miniconda(version = 2, name = "my_python")

# Using path for conda installation:
#   /Users/hafen/Library/rminiconda/my_python
# Downloading miniconda installer...
# Source: https://repo.anaconda.com/miniconda/Miniconda2-latest-MacOSX-x86_64.sh
# Destination: /Users/hafen/Library/rminiconda/my_python
# trying URL 'https://repo.anaconda.com/miniconda/Miniconda2-latest-MacOSX-x86_64.sh'
# Content type 'application/x-sh' length 44325091 bytes (42.3 MB)
# ==================================================
# downloaded 42.3 MB
#
# By installing, you accept the Conda license:
#   https://conda.io/en/latest/license.html
# Installing isolated miniconda distribution...
# ...
# ...
# miniconda installation successful!

Using miniconda

# get the path to the binary and set reticulate to use it
py <- rminiconda::find_miniconda_python("my_python")
reticulate::use_python(py, required = TRUE)
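
A quick sanity check can confirm that reticulate really bound to the isolated interpreter. The sketch below is an assumption about how one might do this, not something rminiconda provides: `check_miniconda_binding()` is a hypothetical name, and the comparison relies on the `python` field reported by `reticulate::py_config()`.

```r
# Hypothetical check (not part of rminiconda): returns TRUE only when
# reticulate reports the interpreter from the named rminiconda installation.
check_miniconda_binding <- function(name = "my_python") {
  have_pkgs <- requireNamespace("rminiconda", quietly = TRUE) &&
    requireNamespace("reticulate", quietly = TRUE)
  if (!have_pkgs || !rminiconda::is_miniconda_installed(name)) {
    message("Skipping check: packages or installation '", name, "' not found.")
    return(FALSE)
  }
  py <- rminiconda::find_miniconda_python(name)
  reticulate::use_python(py, required = TRUE)
  cfg <- reticulate::py_config()  # initializes Python if needed
  # Compare resolved paths so that symlinks do not cause a false mismatch
  normalizePath(cfg$python, mustWork = FALSE) ==
    normalizePath(py, mustWork = FALSE)
}
```

reticulate binds to a single Python per R session, so this is best run in a fresh session before any other Python code is touched.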

Use case 2: I'm an R package developer and I want my package users to not worry about anything related to Python

Case study: kitools

  • Utilities for data scientists working in the "knowledge integration" (ki) group at a large non-profit
  • Mostly R users but need to support Python as well
  • Complex logic -- don't want to maintain two independent codebases
  • Build the package in Python and use reticulate to port it to R
  • Our users won't use it if it's not easy to install

zzz.R

#' @import rminiconda
.onLoad <- function(libname, pkgname) {
  # Clearing these is a session-wide side effect (bad!), but they can lead
  # to trouble if left set
  Sys.setenv(PYTHONHOME = "")
  Sys.setenv(PYTHONPATH = "")
  # make sure Python is configured for this package
  is_configured(msg = packageStartupMessage)
}
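
One way to soften the side effect noted in `.onLoad` is to remember the original values and put them back when the package is unloaded. This is only a sketch under that assumption: `clear_env_vars()`, `restore_env_vars()`, and the `.state` environment are hypothetical names, and kitools may not do this.

```r
# Hypothetical helpers: blank a set of environment variables and return
# their previous values (NA marks a variable that was not set at all).
clear_env_vars <- function(vars = c("PYTHONHOME", "PYTHONPATH")) {
  saved <- Sys.getenv(vars, unset = NA, names = TRUE)
  blank <- stats::setNames(as.list(rep("", length(vars))), vars)
  do.call(Sys.setenv, blank)
  invisible(saved)
}

# Put the saved values back; variables that were unset are unset again.
restore_env_vars <- function(saved) {
  for (nm in names(saved)) {
    if (is.na(saved[[nm]])) {
      Sys.unsetenv(nm)
    } else {
      do.call(Sys.setenv, stats::setNames(list(saved[[nm]]), nm))
    }
  }
  invisible(NULL)
}

# Possible wiring in zzz.R (package-local storage for the saved values)
.state <- new.env(parent = emptyenv())
.onLoad <- function(libname, pkgname) {
  .state$saved <- clear_env_vars()
}
.onUnload <- function(libpath) {
  restore_env_vars(.state$saved)
}
```

Restoring on unload only affects code that runs afterwards; a Python session that reticulate has already started keeps whatever it saw at initialization.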

#' Check to see if the kitools Python environment has been configured
#' @param msg What function to use for messages (could be called at package startup or elsewhere in the package)
is_configured <- function(msg = message) {
  # should also check that the required packages are installed
  if (!rminiconda::is_miniconda_installed("kitools")) {
    msg("It appears that kitools has not been configured...")
    msg("Run 'kitools_configure()' for a one-time setup.")
    return(FALSE)
  } else {
    py <- rminiconda::find_miniconda_python("kitools")
    reticulate::use_python(py, required = TRUE)
    return(TRUE)
  }
}

kitools_configure()

#' One-time configuration of environment for kitools
#'
#' @details This installs an isolated Python distribution along with required dependencies so that the kitools R package can seamlessly wrap the kitools Python package.
#' @export
kitools_configure <- function() {
  # install isolated miniconda
  if (!rminiconda::is_miniconda_installed("kitools"))
    rminiconda::install_miniconda(version = 3, name = "kitools")
  # install python packages
  py <- rminiconda::find_miniconda_python("kitools")
  rminiconda::rminiconda_pip_install("beautifultable", "kitools")
  rminiconda::rminiconda_pip_install("synapseclient", "kitools")
  rminiconda::rminiconda_pip_install("kitools", "kitools",
    "-i https://test.pypi.org/simple/ kitools")

  reticulate::use_python(py, required = TRUE)
}
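
The comment in `is_configured()` notes that it should also check that the required packages are installed. A minimal sketch of such a check, assuming the import names match the pip package names used above (that is not guaranteed in general): `missing_python_modules()` is a hypothetical helper, while `reticulate::py_module_available()` is the reticulate function that does the actual lookup.

```r
# Hypothetical helper: return the subset of `modules` that cannot be imported.
# `is_available` is injectable so the logic can be exercised without Python.
missing_python_modules <- function(modules,
                                   is_available = reticulate::py_module_available) {
  available <- vapply(modules, is_available, logical(1))
  modules[!available]
}

# Possible use inside is_configured(), after reticulate::use_python():
#   missing <- missing_python_modules(
#     c("beautifultable", "synapseclient", "kitools"))
#   if (length(missing) > 0) {
#     msg("Missing Python modules: ", paste(missing, collapse = ", "))
#     return(FALSE)
#   }
```

Note that calling `py_module_available()` initializes Python, so in a real package it may be preferable to defer this check until first use rather than run it at load time.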

User's Experience

> library(kitools)
# It appears that kitools has not been configured...
# Run 'kitools_configure()' for a one-time setup.
#
# Attaching package: ‘kitools’
>
> kitools_configure()
# Using path for conda installation:
#   /Users/hafen/Library/rminiconda/kitools
# Downloading miniconda installer...
# Source: https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
# Destination: /Users/hafen/Library/rminiconda/kitools
# trying URL 'https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh'
# Content type 'application/x-sh' length 54554851 bytes (52.0 MB)
# ==================================================
# downloaded 52.0 MB
#
# By installing, you accept the Conda license:
#   https://conda.io/en/latest/license.html
# Installing isolated miniconda distribution...
# ...

Considerations

  • Disk space: a basic Miniconda install with a few additional packages is ~250 MB
  • If developing many packages with a common theme, use and maintain the same rminiconda "namespace"
  • Versions: support specific versions of Python? (Miniconda versions do not match Python versions)
  • Other convenience functions?
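
To see what a given installation actually costs in disk space, its directory can be summed up from R. This is a rough sketch: `dir_size_mb()` is a hypothetical helper, it counts hard-linked files more than once, and the path in the usage comment is the macOS location from the earlier install log.

```r
# Hypothetical helper: total size of all files under `path`, in megabytes
# (1 MB = 1024^2 bytes here).
dir_size_mb <- function(path) {
  stopifnot(dir.exists(path))
  files <- list.files(path, recursive = TRUE, full.names = TRUE,
                      all.files = TRUE)
  sum(file.info(files)$size, na.rm = TRUE) / 1024^2
}

# Usage, assuming an installation named "kitools" on macOS:
#   dir_size_mb("~/Library/rminiconda/kitools")
```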

Thoughts on reticulate interfaces

  • Build wrapped R-natural interfaces on Python packages? Or import classes and use methods?
    • Class methods are a bit unnatural for R users
      • obj$method(...) <-> method(obj, ...)
    • How to document / discover obj$method()?
    • Wrapping is probably always best
    • A best practices document might help set standards for quality
  • A few notes on reticulate
    • Issues with interactive input in RStudio IDE
    • Handling Python errors / print methods
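
The `obj$method(...)` to `method(obj, ...)` wrapping mentioned above can be sketched as follows. To keep the example runnable without Python, `make_counter()` is a plain R stand-in for an object that would normally come from a reticulate import; all names here are hypothetical.

```r
# Stand-in for a Python object: a list of closures sharing one state,
# called the same way a reticulate object is (obj$method(...)).
make_counter <- function() {
  count <- 0
  list(
    increment = function(by = 1) {
      count <<- count + by
      invisible(NULL)
    },
    value = function() count
  )
}

# R-natural wrappers: validate inputs in R, delegate to the method, and
# re-raise any failure with a message that names the R function.
increment <- function(counter, by = 1) {
  stopifnot(is.numeric(by), length(by) == 1)
  tryCatch(
    counter$increment(by = by),
    error = function(e) {
      stop("increment() failed: ", conditionMessage(e), call. = FALSE)
    }
  )
  invisible(counter)  # returning the object lets calls be chained or piped
}

counter_value <- function(counter) {
  counter$value()
}
```

Wrappers like these can be documented with roxygen and found through R's usual help system, which addresses the discoverability question raised for `obj$method()`.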

Thank You
