MLflow for ML Lifecycle Management


Dr. Srijith Rajamohan

MLflow for ML Experiment Management

Diagram: Discovery -> Post-discovery experimentation -> Deployment


Model Registry

MLflow Tracking

Organized into experiments, with runs inside each experiment
Experiment runs save metrics, parameters, and artifacts (output files, images, etc.)
Models can be saved and version-controlled
Models can also be registered in the Model Registry
- Enables discovery and reuse across an organization
- Tracks model stages, e.g. Staging -> Production
- Registered models can be used directly for inference
ML code can be packaged as MLflow Projects for reproducibility

MLflow

MLflow Storage

Storage consists of two components:
- Backend store: stores experiment and run parameters, metrics, tags, and other metadata
  - File store
    - Path can be ./path_to_store or file:/path_to_store
  - Database store
    - MySQL, SQLite, PostgreSQL, etc.
- Artifact store: stores by-products of a model run such as images, files, etc.
  - Stored in:
    - Local file system
    - Amazon S3, Azure Blob Storage, Google Cloud Storage, FTP, NFS, HDFS, etc.

The Fluent API

High-level and concise API
Starting and managing MLflow runs, e.g.
- Log parameters
- Log metrics
- Save models
Emphasizes productivity
- E.g. autolog() enables automatic logging for supported libraries

import mlflow

# Explicitly start and end a run
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()

# Or use a context manager, which ends the run automatically
with mlflow.start_run() as run:
    mlflow.log_param("my", "param")
    mlflow.log_metric("score", 100)

Tracking API

Low-level API
CRUD interface that translates directly to the REST API
Access run and experiment attributes such as metrics, parameters, etc.

from mlflow.tracking import MlflowClient

# Create an experiment with a name that is unique and case sensitive.
client = MlflowClient()
experiment_id = client.create_experiment("Social NLP Experiments")
client.set_experiment_tag(experiment_id, "nlp.framework", "Spark NLP")

# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Which one should I use?

Fluent API

Use this framework to:
- Minimize boilerplate code
- Manage a single run

Tracking API

Use this framework to:
- Get access to all runs
- Access the full functionality of MLflow

Tracking Server

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root PATH_TO_WORKING_FOLDER/artifacts \
    --host 0.0.0.0 \
    --port 5000

This sets up a tracking server that uses:

The SQLite backend store, so that models can be logged and registered
The default artifact root to store artifacts locally
- Provide the full path for the artifact store
The host address and, optionally, the port

MLflow Demo

https://github.com/sjster/MLflowAnsible/blob/master/MLProject_folder/train_diabetes.py

https://github.com/sjster/MLflowAnsible/blob/master/MLProject_folder/MLflow_training.py

Tracking UI

MLProject

Enables reproducible research
A format for packaging ML code (not models) that can reside in
- Local directories
- Git repositories
MLProjects can be run
- Locally on a system
- Remotely on
  - Amazon SageMaker
  - Databricks clusters
  - Kubernetes (experimental)

MLflow Projects

$  (mlflow_demos)  mlflow_project % ls
MLProject		MLflow_training.py	conda.yaml

The project folder listed above contains:

An MLProject file (YAML) that specifies how to run the project
The ML code
An environment specification
- Conda environment
- Docker

Call 'mlflow run' from one level above this folder, passing the folder name

MLProject file

name: My Project

conda_env: conda.yaml
# Can have a docker_env instead of a conda_env, e.g.
# docker_env:
#    image:  mlflow-docker-example

entry_points:
  main:
    parameters:
      data_file: { type: string, default: "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv" }
    command: "python MLflow_training.py {data_file}"

The MLProject file is a YAML file that specifies the environment file, the entry points, and the command (with its parameters) for each entry point

MLflow Projects - Parameters

$  mlflow run mlflow_project

You can also leave out the parameters
MLflow then uses the default values defined in the MLProject file
A parameter's data type can be
- string
- float
- path
- uri

Note that I used string instead of path as the data type in the example, so that relative paths are passed through unchanged (MLflow converts path parameters to absolute paths)

MLflow Projects - Running a project

$  mlflow run mlflow_project -P data_file=../data/WA_Fn-UseC_-Telco-Customer-Churn.csv

MLflow Projects - Running a project

$  mlflow run mlflow_project -P data_file=data/WA_Fn-UseC_-Telco-Customer-Churn.csv

MLflow Projects from Git

$  mlflow run https://github.com/sjster/MLflowAnsible#MLProject_folder \
  -P data_file=data/WA_Fn-UseC_-Telco-Customer-Churn.csv

MLflow Projects - Additional entrypoints

name: My Project

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      data_file: { type: string, default: "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv" }
    command: "python MLflow_training.py {data_file}"
  validate:
    parameters:
      X_val: { type: string, default: "../data/X_val.csv" }
      y_val: { type: string, default: "../data/y_val.csv" }
    command: "python MLflow_validate.py {X_val} {y_val}"

mlflow run mlflow_project -e validate -P X_val=../data/X_val.csv -P y_val=../data/y_val.csv

Tracking Server - Run a Project

To run the MLProject against this tracking server, prefix the 'mlflow run' command with the MLFLOW_TRACKING_URI environment variable

MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow run mlflow_project \
        --experiment-name="XGBoost_mlflow_validate"  \
        -e validate -P X_val=../data/X_val.csv -P y_val=../data/y_val.csv

MLflow Models

MLflow Models are a standard format for packaging models for reuse in
- Real-time inference using REST APIs
- Batch inference with Apache Spark or other supported frameworks
Models are saved in different flavors, one per supported framework, e.g.
- The python_function flavor runs the model as a generic Python function
- The sklearn flavor loads the model as a scikit-learn object, such as a Pipeline


Model Directory

mlflow_training % ls mlflow_project/my_local_model/
MLmodel			conda.yaml		model.pkl		requirements.txt



mlflow_training % cat mlflow_project/my_local_model/MLmodel
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
utc_time_created: '2021-11-02 19:48:30.135900'

Save and Load Models

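import mlflow.sklearn

mlflow.sklearn.save_model(model, "my_local_model")  # writes the model folder locally
my_model_reload = mlflow.sklearn.load_model("my_local_model")
# Evaluate on the validation set; metric names get the "val_" prefix
mlflow.sklearn.eval_and_log_metrics(my_model_reload, X_val, y_val, prefix="val_")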

Out[27]: {'val_precision_score': 0.7968969091728362,
 'val_recall_score': 0.805170239596469,
 'val_f1_score': 0.799325453841428,
 'val_accuracy_score': 0.805170239596469,
 'val_log_loss': 0.406791751504339,
 'val_roc_auc_score': 0.8524996656137788,
 'val_score': 0.805170239596469}

https://github.com/sjster/MLflowAnsible/blob/master/MLProject_folder/MLflow_validate.py

Logged Models

Using a Logged Model

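import mlflow.sklearn

# URI format: runs:/<run_id>/<artifact_path>; copy the run ID from the Tracking UI
logged_model = 'runs:/314035cfab2245d5ad266b84751dff8a/model'
model_loaded = mlflow.sklearn.load_model(logged_model)
mlflow.sklearn.eval_and_log_metrics(model_loaded, X_val, y_val, prefix="val_")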

Out[27]: {'val_precision_score': 0.7968969091728362,
 'val_recall_score': 0.805170239596469,
 'val_f1_score': 0.799325453841428,
 'val_accuracy_score': 0.805170239596469,
 'val_log_loss': 0.406791751504339,
 'val_roc_auc_score': 0.8524996656137788,
 'val_score': 0.805170239596469}

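Load the logged model from a run
Note that we use mlflow.sklearn.load_model instead of mlflow.pyfunc.load_model, because eval_and_log_metrics needs the native scikit-learn estimator rather than the generic pyfunc wrapper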


https://github.com/sjster/MLflowAnsible/blob/master/MLProject_folder/MLflow_inference.py

Saved vs. Logged Models

Saved models
- Download/copy/share the model folder
- Reusability and portability
Logged models
- Use this when you want to reuse a model from a previous run

Register a Model in the Model Registry

In addition to being logged, the model can be registered in the Model Registry
This enables discoverability and reusability

# Log the model and register it under the name "lr" in one call
mlflow.sklearn.log_model(lr,
                         artifact_path="artifacts",
                         registered_model_name="lr")

# Alternatively, register a model that was logged in an earlier run
# (get the run ID from the UI)
result = mlflow.register_model('runs:/314035cfab2245d5ad266b84751dff8a/model', "XGBoost_sr")

Registered Models

Inspecting the Registered Model

Stage the model

Use the Registered Model

# Get this run ID from the UI
result = mlflow.register_model('runs:/314035cfab2245d5ad266b84751dff8a/model', "XGBoost_sr")

# Load version 1 of the registered model
model_loaded_from_registry = mlflow.sklearn.load_model(
    model_uri="models:/XGBoost_sr/1"
)

mlflow.sklearn.eval_and_log_metrics(model_loaded_from_registry, X_val, y_val, prefix="val_")

Out[31]: {'val_precision_score': 0.7968969091728362,
 'val_recall_score': 0.805170239596469,
 'val_f1_score': 0.799325453841428,
 'val_accuracy_score': 0.805170239596469,
 'val_log_loss': 0.406791751504339,
 'val_roc_auc_score': 0.8524996656137788,
 'val_score': 0.805170239596469}

https://github.com/sjster/MLflowAnsible/blob/master/MLProject_folder/MLflow_inference.py

Serve the Model

mlflow models serve -m my_local_model

2021/11/03 17:09:03 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2021/11/03 17:09:04 INFO mlflow.utils.conda: === Creating conda environment mlflow-a95404aa2487b42dc9f39755daafc1fe62e52876 ===
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
...
...

2021/11/03 17:10:24 INFO mlflow.pyfunc.backend: === Running command 'source /databricks/conda/bin/../etc/profile.d/conda.sh && conda activate mlflow-a95404aa2487b42dc9f39755daafc1fe62e52876 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2021-11-03 17:10:24 +0000] [23870] [INFO] Starting gunicorn 20.1.0
[2021-11-03 17:10:24 +0000] [23870] [INFO] Listening at: http://127.0.0.1:5000 (23870)
[2021-11-03 17:10:24 +0000] [23870] [INFO] Using worker: sync
[2021-11-03 17:10:24 +0000] [23877] [INFO] Booting worker with pid: 23877

Serve the Model - Requests

curl http://127.0.0.1:5000/invocations \
    -H 'Content-Type: application/json; format=pandas-records' \
    -d '[
  {"customerID": "8232-CTLKO",
   "gender": "Female",
   "SeniorCitizen": 0,
   "Partner": "Yes",
   "Dependents": "Yes",
   "tenure": 66,
   "PhoneService": "Yes",
   "MultipleLines": "No",
   "InternetService": "DSL",
   "OnlineSecurity": "Yes",
   "OnlineBackup": "No",
   "DeviceProtection": "No",
   "TechSupport": "No",
   "StreamingTV": "Yes",
   "StreamingMovies": "No",
   "Contract": "Two year",
   "PaperlessBilling": "Yes",
   "PaymentMethod": "Electronic check",
   "MonthlyCharges": 59.75,
   "TotalCharges": 3996.8}
]'
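The curl request above uses the pandas-records content type accepted by MLflow 1.x scoring servers. MLflow 2.0 changed the input format: the body is plain JSON with the rows wrapped in a key such as dataframe_records. A sketch of a Python client for that newer format, assuming pandas is available; to_scoring_payload and score are hypothetical helper names, and the row is a shortened subset of the churn record (a real request must include every column the model expects).

```python
import json
import urllib.request

import pandas as pd


def to_scoring_payload(df: pd.DataFrame) -> bytes:
    """Wrap rows in the 'dataframe_records' envelope (assumes an MLflow 2.x+ server)."""
    # to_json handles NumPy dtypes; json.loads turns the result back into plain Python objects.
    records = json.loads(df.to_json(orient="records"))
    return json.dumps({"dataframe_records": records}).encode("utf-8")


def score(df: pd.DataFrame, url: str = "http://127.0.0.1:5000/invocations"):
    """POST rows to a running `mlflow models serve` endpoint and return the parsed reply."""
    request = urllib.request.Request(
        url, data=to_scoring_payload(df), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Shortened, illustrative subset of the churn record used in the curl example.
row = pd.DataFrame([{"customerID": "8232-CTLKO", "gender": "Female", "SeniorCitizen": 0,
                     "tenure": 66, "MonthlyCharges": 59.75, "TotalCharges": 3996.8}])
payload = to_scoring_payload(row)
print(payload.decode("utf-8"))
# score(row)  # call once the model server is running with the full feature set
```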

Thank you

Questions?