rminiconda
Clean Python management
when using R as an interface to Python
DSC 2019
Interfaces should be convenient
- Programming a package to use an interface should be straightforward
- The users of the application should be largely unaware of the interface
From John Chambers (Extending R)
reticulate accomplishes this very nicely
Except for one thing
166 / 429 issues (~39%)
Why is this a problem?
There are plenty of interfaces with system dependencies
- C, C++, Fortran
- Misc system requirements
- SQL databases, Spark, etc.
Usually not a problem on Mac/Linux, CRAN binaries
Motivated user / installed by system admin
Often easy to install / configure (brew, apt, yum, etc.)
What's special about Python?
- Different versions of Python
- Many Python packages with different versions and dependencies
- Many environment management approaches
- Python environments are often already configured for other uses outside of your particular R interface use case
reticulate enables R to be not just an interface to Python, but many interfaces to many Python packages
It is inevitable that at some point a user will need to do something manually on their system, outside of R, to get their Python environment installed or configured properly
rminiconda
- Installs miniconda Python in an isolated, "namespaced" location that can be fully customized for your particular use case
- Provides utilities for making this installation and configuration part of an R package setup
- These installations do not interfere with any other Python installation on your system
- Works on Linux, MacOS, and Windows
Goal: Give R users access to anything in Python without them knowing they are using Python
Why miniconda?
- Relatively small
- Self-contained install option
- Fast install
- Easy to install on major platforms
"Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages."
Another (messier) way: https://github.com/Sage-Bionetworks/PythonEmbedInR
Install
# install.packages("remotes")
remotes::install_github("hafen/rminiconda")
Use case 1:
I'm a data scientist using reticulate and I want a "clean" separate Python installation
install_miniconda()
Places an isolated miniconda installation in a subdirectory of a base directory housing all installations made through rminiconda
OS | Base Directory Location |
---|---|
Windows | %APPDATA%\rminiconda |
Linux | ~/.rminiconda |
MacOS | ~/Library/rminiconda |
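As a rough illustration of the table, the sketch below resolves the base directory for the current OS and appends the installation name. The helpers `rminiconda_base_dir_sketch()` and `install_dir_sketch()` are hypothetical and are not part of rminiconda's API; they only mirror the locations listed above.

```r
# Hypothetical helper (NOT rminiconda's API): mirrors the base directory
# table above for the current operating system.
rminiconda_base_dir_sketch <- function(sysname = Sys.info()[["sysname"]]) {
  if (sysname == "Windows") {
    # %APPDATA% is normally set on Windows
    file.path(Sys.getenv("APPDATA"), "rminiconda")
  } else if (sysname == "Darwin") {
    # MacOS reports its system name as "Darwin"
    path.expand("~/Library/rminiconda")
  } else {
    # Linux and other Unix-alikes
    path.expand("~/.rminiconda")
  }
}

# Hypothetical: each named installation gets its own subdirectory
install_dir_sketch <- function(name, base = rminiconda_base_dir_sketch()) {
  file.path(base, name)
}
```

On the Mac used in this talk, `install_dir_sketch("my_python")` would give `/Users/hafen/Library/rminiconda/my_python`, consistent with the path printed in the installation log below.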
Installing miniconda
rminiconda::install_miniconda(version = 2, name = "my_python")
# Using path for conda installation:
# /Users/hafen/Library/rminiconda/my_python
# Downloading miniconda installer...
# Source: https://repo.anaconda.com/miniconda/Miniconda2-latest-MacOSX-x86_64.sh
# Destination: /Users/hafen/Library/rminiconda/my_python
# trying URL 'https://repo.anaconda.com/miniconda/Miniconda2-latest-MacOSX-x86_64.sh'
# Content type 'application/x-sh' length 44325091 bytes (42.3 MB)
# ==================================================
# downloaded 42.3 MB
#
# By installing, you accept the Conda license:
# https://conda.io/en/latest/license.html
# Installing isolated miniconda distribution...
# ...
# ...
# miniconda installation successful!
Using miniconda
# get the path to the binary and set reticulate to use it
py <- rminiconda::find_miniconda_python("my_python")
reticulate::use_python(py, required = TRUE)
Use case 2:
I'm an R package developer and I want my package users to not worry about anything related to Python
Case study: kitools
- Utilities for data scientists working in the "knowledge integration" (ki) group at a large non-profit
- Mostly R users but need to support Python as well
- Complex logic -- don't want to maintain two independent codebases
- Build the package in Python and use reticulate to port it to R
- Our users won't use it if it's not easy to install
zzz.R
#' @import rminiconda
.onLoad <- function(libname, pkgname) {
# side effects (bad!), but leftover values can cause trouble if not unset
Sys.setenv(PYTHONHOME = "")
Sys.setenv(PYTHONPATH = "")
# make sure Python is configured for this package
is_configured(msg = packageStartupMessage)
}
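Because the `Sys.setenv()` calls in `.onLoad()` are side effects, one option is to remember the previous values so they can be put back later, for example from an `.onUnload()` hook. This is a minimal sketch under that assumption; `clear_python_env()` and `restore_python_env()` are hypothetical helpers, not functions from rminiconda or reticulate.

```r
# Hypothetical helpers: clear Python-related environment variables, but keep
# the old values so the change can be undone.
python_env_vars <- c("PYTHONHOME", "PYTHONPATH")

clear_python_env <- function(vars = python_env_vars) {
  # NA marks variables that were not set at all
  old <- Sys.getenv(vars, unset = NA, names = TRUE)
  # same effect as the Sys.setenv(... = "") calls in .onLoad()
  vals <- rep("", length(vars))
  names(vals) <- vars
  do.call(Sys.setenv, as.list(vals))
  invisible(old)
}

restore_python_env <- function(old) {
  was_set <- !is.na(old)
  # put back values that existed before, unset the ones that did not
  if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
  if (any(!was_set)) Sys.unsetenv(names(old)[!was_set])
  invisible(NULL)
}
```

In a package, the value returned by `clear_python_env()` would need to be stored somewhere that outlives `.onLoad()`, such as a package-level environment.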
#' Check to see if the kitools Python environment has been configured
#' @param msg What function to use for messages (could be called at package startup or elsewhere in the package)
is_configured <- function(msg = message) {
# should also check that the required packages are installed
if (!rminiconda::is_miniconda_installed("kitools")) {
msg("It appears that kitools has not been configured...")
msg("Run 'kitools_configure()' for a one-time setup.")
return (FALSE)
} else {
py <- rminiconda::find_miniconda_python("kitools")
reticulate::use_python(py, required = TRUE)
return (TRUE)
}
}
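The comment in `is_configured()` notes that it should also check that the required packages are installed. A minimal sketch of that check is below. It assumes reticulate's `py_module_available()` is called after `use_python()`, and that the import names of the modules match the pip package names used in `kitools_configure()`. `missing_python_modules()` and `modules_ready()` are hypothetical names; the `available` argument exists only so the check can be tried without Python.

```r
# Assumed import names; they match the pip names installed by kitools_configure()
kitools_modules <- c("beautifultable", "synapseclient", "kitools")

# Which modules cannot be imported by the Python reticulate is bound to?
missing_python_modules <- function(modules,
                                   available = reticulate::py_module_available) {
  ok <- vapply(modules, function(m) isTRUE(available(m)), logical(1))
  modules[!ok]
}

# Hypothetical extra step for is_configured(): report missing modules
modules_ready <- function(modules = kitools_modules, msg = message,
                          available = reticulate::py_module_available) {
  missing <- missing_python_modules(modules, available)
  if (length(missing) > 0) {
    msg("Missing Python modules: ", paste(missing, collapse = ", "))
    msg("Run 'kitools_configure()' to install them.")
    return(FALSE)
  }
  TRUE
}
```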
kitools_configure()
#' One-time configuration of environment for kitools
#'
#' @details This installs an isolated Python distribution along with required dependencies so that the kitools R package can seamlessly wrap the kitools Python package.
#' @export
kitools_configure <- function() {
# install isolated miniconda
if (!rminiconda::is_miniconda_installed("kitools"))
rminiconda::install_miniconda(version = 3, name = "kitools")
# install python packages
py <- rminiconda::find_miniconda_python("kitools")
rminiconda::rminiconda_pip_install("beautifultable", "kitools")
rminiconda::rminiconda_pip_install("synapseclient", "kitools")
rminiconda::rminiconda_pip_install("kitools", "kitools",
"-i https://test.pypi.org/simple/ kitools")
reticulate::use_python(py, required = TRUE)
}
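To make the package-install step concrete, the sketch below builds the command line that runs pip with a given Python binary, optionally pointing it at another package index such as TestPyPI. The helpers `pip_install_args()` and `pip_install()` are hypothetical and are not how `rminiconda_pip_install()` is implemented; they only use pip's documented `--index-url` (`-i`) option.

```r
# Hypothetical helper: argument vector for `python -m pip install`
pip_install_args <- function(package, index_url = NULL) {
  c("-m", "pip", "install",
    # --index-url (-i) replaces PyPI as the index pip searches
    if (!is.null(index_url)) c("--index-url", index_url),
    package)
}

# Run pip with a specific Python binary; TRUE if pip exited with status 0
pip_install <- function(python, package, index_url = NULL) {
  status <- system2(python, pip_install_args(package, index_url))
  identical(as.integer(status), 0L)
}
```

For example, `pip_install(rminiconda::find_miniconda_python("kitools"), "kitools", "https://test.pypi.org/simple/")`. Because `--index-url` replaces PyPI rather than adding to it, installing the dependencies from PyPI first (as `kitools_configure()` does) means pip should find them already satisfied instead of looking for them on TestPyPI.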
User's Experience
> library(kitools)
# It appears that kitools has not been configured...
# Run 'kitools_configure()' for a one-time setup.
#
# Attaching package: ‘kitools’
>
> kitools_configure()
# Using path for conda installation:
# /Users/hafen/Library/rminiconda/kitools
# Downloading miniconda installer...
# Source: https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
# Destination: /Users/hafen/Library/rminiconda/kitools
# trying URL 'https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh'
# Content type 'application/x-sh' length 54554851 bytes (52.0 MB)
# ==================================================
# downloaded 52.0 MB
#
# By installing, you accept the Conda license:
# https://conda.io/en/latest/license.html
# Installing isolated miniconda distribution...
# ...
Considerations
- Disk space: a basic Miniconda install with a few additional packages is ~250 MB
- If developing many packages with a common theme, use and maintain the same rminiconda "namespace"
- Versions: support specific versions of Python? (Miniconda versions do not match Python versions)
- Other convenience functions?
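To check the disk-space figure for a particular installation, a base-R sketch like the one below sums file sizes under a directory. `dir_size_mb()` is a hypothetical helper, not an rminiconda function. conda can hard-link files from its package cache, so a per-file sum like this may overstate the space actually used.

```r
# Hypothetical helper: total size of all files under `path`, in MB
dir_size_mb <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE,
                      all.files = TRUE, no.. = TRUE)
  bytes <- sum(file.info(files)$size, na.rm = TRUE)
  bytes / 1024^2
}
```

For example, `dir_size_mb("~/Library/rminiconda/kitools")` on MacOS (`list.files()` expands the tilde).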
Thoughts on reticulate interfaces
- Build wrapped R-natural interfaces on Python packages? Or import classes and use methods?
- Class methods are a bit unnatural for R users
  - obj$method(...) <-> method(obj, ...)
  - How to document / discover obj$method()?
- Wrapping is probably always best
- A best practices document might help set standards for quality
- A few notes on reticulate
  - Issues with interactive input in RStudio IDE
  - Handling Python errors / print methods
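A minimal sketch of the `obj$method(...) <-> method(obj, ...)` contrast: an R environment stands in for an object returned by `reticulate::import()`, whose methods are reached with `$`. `make_counter()` and `increment()` are hypothetical names used only for illustration. The wrapper is an ordinary R function, so it can carry its own roxygen documentation and be found with `?increment`, which `obj$increment()` cannot.

```r
# Stand-in for a Python object: methods are reached with `$`
make_counter <- function() {
  self <- new.env()
  self$count <- 0
  self$increment <- function(by = 1) {
    # environments are modified in place, like a Python object's state
    self$count <- self$count + by
    invisible(self)
  }
  self
}

# R-natural wrapper: increment(obj, ...) instead of obj$increment(...)
increment <- function(counter, by = 1) {
  counter$increment(by)
  counter$count  # return a plain R value rather than the object
}
```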
Thank You
rminiconda
By Ryan Hafen