MLflow for ML Lifecycle Management

 

Dr. Srijith Rajamohan

 

MLflow for ML Experiment Management

Post-discovery experimentation

Discovery

Deployment

Model Registry

MLflow Tracking

  • Organized into experiments, and runs within experiments
  • Experiment runs save metrics, parameters and artifacts (output files, images, etc.)
  • Models can be saved and version controlled
  • Models can also be saved in the Model Registry
    • Enables discovery and reuse across an organization
    • Track model stages, e.g. staging -> production
    • Use directly for inference
  • ML code can be packaged as MLflow Projects for reproducibility

 

 

MLflow Storage

  • Storage consists of two components
    • Backend store - stores experiment and run parameters, metrics, tags and other metadata
      • File store
        • Path can be ./path_to_store or file:/path_to_store
      • Database store
        • MySQL, SQLite, PostgreSQL, etc.
    • Artifact store - stores by-products of a model run such as images, files, etc.
      • Stored in:
        • Local file system
        • Amazon S3, Azure Blob Storage, Google Cloud Storage, FTP, NFS, HDFS, etc.
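
To make the file store / database store split concrete, here is a minimal sketch that classifies a backend-store URI by its scheme. The helper name `backend_store_kind` and the set of database schemes are assumptions made for illustration; this is not how MLflow itself resolves URIs.

```python
from urllib.parse import urlparse

# Database dialects named on the slide (assumed list; MLflow may accept others).
DATABASE_SCHEMES = {"mysql", "sqlite", "postgresql"}


def backend_store_kind(uri: str) -> str:
    """Return 'database' or 'file' for a backend-store URI (hypothetical helper)."""
    scheme = urlparse(uri).scheme
    # SQLAlchemy-style URIs may carry a driver suffix, e.g. "mysql+pymysql".
    dialect = scheme.split("+")[0]
    if dialect in DATABASE_SCHEMES:
        return "database"
    if scheme in ("", "file"):
        return "file"
    raise ValueError(f"Unrecognized backend store URI: {uri!r}")


print(backend_store_kind("./path_to_store"))      # file
print(backend_store_kind("file:/path_to_store"))  # file
print(backend_store_kind("sqlite:///mlflow.db"))  # database
```

The artifact store is configured separately, so a database backend can be combined with a local or cloud artifact location.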

 

 

The Fluent API

  • High-level and concise API
  • Starts and manages MLflow runs, e.g.
    • Log parameters
    • Log metrics
    • Save models
  • Emphasizes productivity
    • E.g. mlflow.autolog() enables autologging for supported libraries

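import mlflow

# Option 1: start and end the run explicitly
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()

# Option 2: use a context manager, which ends the run automatically
with mlflow.start_run() as run:
    mlflow.log_param("my", "param")
    mlflow.log_metric("score", 100)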

Tracking API

  • Low-level API
  • CRUD interface that translates directly to the REST API
  • Access run and experiment attributes such as metrics, parameters, etc.

from mlflow.tracking import MlflowClient

# Create an experiment with a name that is unique and case sensitive.
client = MlflowClient()
experiment_id = client.create_experiment("Social NLP Experiments")
client.set_experiment_tag(experiment_id, "nlp.framework", "Spark NLP")

# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

Which one should I use?

Fluent API

  • Use this API to:
    • Minimize boilerplate code
    • Manage a single run

Tracking API

  • Use this API to:
    • Get access to all runs
    • Have access to the full functionality of MLflow

Tracking Server

mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root PATH_TO_WORKING_FOLDER/artifacts \
    --host 0.0.0.0 \
    --port 5000

Set up a tracking server that uses

  • The SQLite backend store, so that models can be logged and registered
  • The default artifact root, to store artifacts locally
    • Provide the full path for the artifact store
  • The host address and, optionally, the port

MLflow Demo

MLProject

  • Enable reproducible research
  • Format for packaging ML code (not models) that can reside in
    •  Local directories
    •  Git repositories
  • MLProjects can be deployed and run
    •  Locally on a system
    •  Remotely on
      • Amazon SageMaker
      • Databricks cluster
      • Kubernetes (experimental)

   

MLflow Projects

The MLflow project structure is shown below

$  (mlflow_demos)  mlflow_project % ls
MLProject		MLflow_training.py	conda.yaml

  • MLProject file (YAML file) that indicates how to run the project
  • ML code
  • Environment file
    • Conda environment
    • Docker

 

Call 'mlflow run' one level above this folder

MLProject file

name: My Project

conda_env: conda.yaml
# Can have a docker_env instead of a conda_env, e.g.
# docker_env:
#    image:  mlflow-docker-example

entry_points:
  main:
    parameters:
      data_file: { type: string, default: "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv" }
    command: "python MLflow_training.py {data_file}"

The MLProject file is a YAML file that names the environment file and defines the entry points, each with its command and parameters

 

MLflow Projects - Parameters

$  mlflow run mlflow_project

  • You can also leave out the parameters
  • This will use the default values in the MLProject file
  • The data type can be
    • string
    • float
    • path
    • uri

Note that I used string instead of path as the data type in the example, so that I can use relative paths
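
To illustrate how declared defaults and -P overrides end up in the command, the sketch below fills the command template from the MLProject example using plain string formatting. The dict layout and the helper `render_command` are assumptions for illustration only and are not MLflow's own implementation.

```python
# Entry point copied from the MLProject example, written as a plain dict.
ENTRY_POINT = {
    "parameters": {
        "data_file": {
            "type": "string",
            "default": "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv",
        }
    },
    "command": "python MLflow_training.py {data_file}",
}


def render_command(entry_point: dict, overrides: dict) -> str:
    """Fill the command template, falling back to declared defaults (hypothetical helper)."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in overrides:
            values[name] = overrides[name]
        elif "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError(f"No value provided for parameter '{name}'")
    return entry_point["command"].format(**values)


# No -P flag: the default from the MLProject file is used.
print(render_command(ENTRY_POINT, {}))

# Equivalent of: -P data_file=data/WA_Fn-UseC_-Telco-Customer-Churn.csv
print(render_command(ENTRY_POINT, {"data_file": "data/WA_Fn-UseC_-Telco-Customer-Churn.csv"}))
```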

MLflow Projects - Running a project

$  mlflow run mlflow_project -P data_file=../data/WA_Fn-UseC_-Telco-Customer-Churn.csv

MLflow Projects - Running a project

$  mlflow run mlflow_project -P data_file=data/WA_Fn-UseC_-Telco-Customer-Churn.csv

MLflow Projects from Git

$  mlflow run https://github.com/sjster/MLflowAnsible#MLProject_folder \
  -P data_file=data/WA_Fn-UseC_-Telco-Customer-Churn.csv

MLflow Projects - Additional entrypoints

name: My Project

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      data_file: { type: string, default: "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv" }
    command: "python MLflow_training.py {data_file}"
  validate:
    parameters:
      X_val: { type: string, default: "../data/X_val.csv" }
      y_val: { type: string, default: "../data/y_val.csv" }
    command: "python MLflow_validate.py {X_val} {y_val}"

$  mlflow run mlflow_project -e validate -P X_val=../data/X_val.csv -P y_val=../data/y_val.csv

Tracking Server - Run a Project

To run the MLflow project against this tracking server, set the MLFLOW_TRACKING_URI environment variable in front of the 'mlflow run' command

MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow run mlflow_project \
        --experiment-name="XGBoost_mlflow_validate"  \
        -e validate -P X_val=../data/X_val.csv -P y_val=../data/y_val.csv
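
The same invocation can be prepared from Python. The sketch below only builds the argument list and environment for the command above; the helper `build_mlflow_run` is hypothetical, and the actual launch is left commented out because it needs the mlflow CLI and a running tracking server.

```python
import os
import subprocess


def build_mlflow_run(project_dir, entry_point, params, tracking_uri, experiment_name):
    """Return (argv, env) for an 'mlflow run' call (hypothetical helper)."""
    argv = [
        "mlflow", "run", project_dir,
        "--experiment-name", experiment_name,
        "-e", entry_point,
    ]
    for key, value in params.items():
        argv += ["-P", f"{key}={value}"]
    # Copy the current environment and add the tracking URI for the child process only.
    env = dict(os.environ, MLFLOW_TRACKING_URI=tracking_uri)
    return argv, env


argv, env = build_mlflow_run(
    "mlflow_project",
    "validate",
    {"X_val": "../data/X_val.csv", "y_val": "../data/y_val.csv"},
    tracking_uri="http://0.0.0.0:5000",
    experiment_name="XGBoost_mlflow_validate",
)
print(" ".join(argv))

# Uncomment to launch; requires the mlflow CLI and a running tracking server.
# subprocess.run(argv, env=env, check=True)
```

Passing the URI through `env` keeps the setting local to the child process, which mirrors prefixing the variable on the shell command line.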

MLflow Models

  • MLflow Models are a format for packaging models for reuse
    • Real-time inference using REST APIs
    • Batch inference with Apache Spark or other supported frameworks
  • Models are saved in different flavors, e.g.
    • Python function flavor to run the model as a Python function
    • Scikit-learn flavor to load the model as a Pipeline object

 

Model Directory

mlflow_training % ls mlflow_project/my_local_model/
MLmodel			conda.yaml		model.pkl		requirements.txt



mlflow_training % cat mlflow_project/my_local_model/MLmodel
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
utc_time_created: '2021-11-02 19:48:30.135900'

Save and Load Models

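import mlflow.sklearn
# Save the fitted model to a local directory, then reload it from there
mlflow.sklearn.save_model(model, "my_local_model")
my_model_reload = mlflow.sklearn.load_model("my_local_model")
# Compute validation metrics; the keys are prefixed with "val_"
mlflow.sklearn.eval_and_log_metrics(my_model_reload, X_val, y_val, prefix="val_")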

Out[27]: {'val_precision_score': 0.7968969091728362,
 'val_recall_score': 0.805170239596469,
 'val_f1_score': 0.799325453841428,
 'val_accuracy_score': 0.805170239596469,
 'val_log_loss': 0.406791751504339,
 'val_roc_auc_score': 0.8524996656137788,
 'val_score': 0.805170239596469}

Logged Models

Using a Logged Model

logged_model = 'runs:/314035cfab2245d5ad266b84751dff8a/model'
model_loaded = mlflow.sklearn.load_model(logged_model)
mlflow.sklearn.eval_and_log_metrics(model_loaded, X_val, y_val, prefix="val_")

Out[27]: {'val_precision_score': 0.7968969091728362,
 'val_recall_score': 0.805170239596469,
 'val_f1_score': 0.799325453841428,
 'val_accuracy_score': 0.805170239596469,
 'val_log_loss': 0.406791751504339,
 'val_roc_auc_score': 0.8524996656137788,
 'val_score': 0.805170239596469}
  • Load the logged model from a run
  • Note that we use mlflow.sklearn.load_model instead of mlflow.pyfunc.load_model

Saved vs. Logged Models

  • Saved models
    • Download/copy/share the model folder
    • Reusability and portability
  • Logged models
    • Use this when you want to reuse a model from a previous run
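
In practice the difference shows up in the string passed to load_model: a saved model is addressed by its directory, a logged model by a runs:/ URI, and a registered model by a models:/ URI. The helpers below are a small sketch of that convention; both function names are hypothetical and are not part of MLflow.

```python
def logged_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build the runs:/ URI for a model logged under artifact_path (hypothetical helper)."""
    return f"runs:/{run_id}/{artifact_path}"


def model_source(model_uri: str) -> str:
    """Classify a model location by its prefix (hypothetical helper)."""
    if model_uri.startswith("runs:/"):
        return "logged"      # artifact of a tracked run
    if model_uri.startswith("models:/"):
        return "registered"  # version in the Model Registry
    return "saved"           # plain directory written by save_model


print(model_source("my_local_model"))
print(model_source(logged_model_uri("314035cfab2245d5ad266b84751dff8a")))
print(model_source("models:/XGBoost_sr/1"))
```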

Register a Model in the Model Registry

  • In addition to being logged, the model can be registered to the model registry
  • Enables discoverability and reusability

mlflow.sklearn.log_model(lr,
                         artifact_path="artifacts",
                         registered_model_name="lr")

Can also register the model from a run

# Get this id from the UI
result = mlflow.register_model('runs:/314035cfab2245d5ad266b84751dff8a/model', "XGBoost_sr")

Registered Models

Inspecting the Registered Model

Stage the model

Use the Registered Model

# Get this id from the UI
result = mlflow.register_model('runs:/314035cfab2245d5ad266b84751dff8a/model', "XGBoost_sr")

# Load version 1 of the registered model
model_loaded_from_registry = mlflow.sklearn.load_model(
    model_uri="models:/XGBoost_sr/1"
)

mlflow.sklearn.eval_and_log_metrics(model_loaded_from_registry, X_val, y_val, prefix="val_")

Out[31]: {'val_precision_score': 0.7968969091728362,
 'val_recall_score': 0.805170239596469,
 'val_f1_score': 0.799325453841428,
 'val_accuracy_score': 0.805170239596469,
 'val_log_loss': 0.406791751504339,
 'val_roc_auc_score': 0.8524996656137788,
 'val_score': 0.805170239596469}

Serve the Model

mlflow models serve -m my_local_model

2021/11/03 17:09:03 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2021/11/03 17:09:04 INFO mlflow.utils.conda: === Creating conda environment mlflow-a95404aa2487b42dc9f39755daafc1fe62e52876 ===
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
...
...

2021/11/03 17:10:24 INFO mlflow.pyfunc.backend: === Running command 'source /databricks/conda/bin/../etc/profile.d/conda.sh && conda activate mlflow-a95404aa2487b42dc9f39755daafc1fe62e52876 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2021-11-03 17:10:24 +0000] [23870] [INFO] Starting gunicorn 20.1.0
[2021-11-03 17:10:24 +0000] [23870] [INFO] Listening at: http://127.0.0.1:5000 (23870)
[2021-11-03 17:10:24 +0000] [23870] [INFO] Using worker: sync
[2021-11-03 17:10:24 +0000] [23877] [INFO] Booting worker with pid: 23877
  

Serve the Model - Requests

curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[
    {"customerID": "8232-CTLKO", "gender": "Female", "SeniorCitizen": 0,
 "Partner": "Yes",
 "Dependents": "Yes",
 "tenure": 66,
 "PhoneService": "Yes",
 "MultipleLines": "No",
 "InternetService": "DSL",
 "OnlineSecurity": "Yes",
 "OnlineBackup": "No",
 "DeviceProtection": "No",
 "TechSupport": "No",
 "StreamingTV": "Yes",
 "StreamingMovies": "No",
 "Contract": "Two year",
 "PaperlessBilling": "Yes",
 "PaymentMethod": "Electronic check",
 "MonthlyCharges": 59.75,
 "TotalCharges": 3996.8}
]'
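
The same request can be prepared from Python with the standard library. This is a sketch that mirrors the curl call above, including its content type; the helper `build_request` is hypothetical, and newer MLflow releases may expect a different payload layout, so treat the format as tied to the server version shown in these slides.

```python
import json
import urllib.request

# One record, identical to the curl payload.
record = {
    "customerID": "8232-CTLKO", "gender": "Female", "SeniorCitizen": 0,
    "Partner": "Yes", "Dependents": "Yes", "tenure": 66,
    "PhoneService": "Yes", "MultipleLines": "No", "InternetService": "DSL",
    "OnlineSecurity": "Yes", "OnlineBackup": "No", "DeviceProtection": "No",
    "TechSupport": "No", "StreamingTV": "Yes", "StreamingMovies": "No",
    "Contract": "Two year", "PaperlessBilling": "Yes",
    "PaymentMethod": "Electronic check",
    "MonthlyCharges": 59.75, "TotalCharges": 3996.8,
}


def build_request(records, url="http://127.0.0.1:5000/invocations"):
    """Build a POST request carrying a list of records (hypothetical helper)."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; format=pandas-records"},
        method="POST",
    )


request = build_request([record])
print(request.get_method(), request.full_url)

# Uncomment once 'mlflow models serve' is running:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```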

Thank you

Questions?