Dr. Srijith Rajamohan
Post-discovery experimentation
Discovery
Deployment
Model Registry
import mlflow
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()
with mlflow.start_run() as run:
    # The run is ended automatically when the with-block exits
    mlflow.log_param("my", "param")
    mlflow.log_metric("score", 100)
from mlflow.tracking import MlflowClient
# Create an experiment with a name that is unique and case sensitive.
client = MlflowClient()
experiment_id = client.create_experiment("Social NLP Experiments")
client.set_experiment_tag(experiment_id, "nlp.framework", "Spark NLP")
# Fetch experiment metadata information
experiment = client.get_experiment(experiment_id)
print("Name: {}".format(experiment.name))
print("Experiment_id: {}".format(experiment.experiment_id))
print("Artifact Location: {}".format(experiment.artifact_location))
print("Tags: {}".format(experiment.tags))
print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))
Fluent API
Tracking API
mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root PATH_TO_WORKING_FOLDER/artifacts \
--host 0.0.0.0 \
--port 5000
Set up a tracking server for your runs to log to
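If you would rather launch the tracking server from a Python script, a minimal sketch is to build the same argument list and hand it to `subprocess`. The helper names `mlflow_server_argv` and `start_mlflow_server` are hypothetical, and the sketch assumes the `mlflow` CLI is on your PATH; it mirrors the flags of the command above rather than any official Python API. Keeping a database-backed store such as SQLite here matters later, because the Model Registry requires one.

```python
from __future__ import annotations

import subprocess


def mlflow_server_argv(artifact_root: str,
                       db_path: str = "mlflow.db",
                       host: str = "0.0.0.0",
                       port: int = 5000) -> list[str]:
    """Build the argument list for the `mlflow server` command shown above."""
    return [
        "mlflow", "server",
        # SQLite file that stores experiments and run metadata (params, metrics, tags)
        "--backend-store-uri", f"sqlite:///{db_path}",
        # Default location where runs write their artifacts (models, files)
        "--default-artifact-root", artifact_root,
        "--host", host,
        "--port", str(port),
    ]


def start_mlflow_server(argv: list[str]) -> subprocess.Popen:
    """Launch the tracking server as a background child process.

    Assumes the `mlflow` executable is on PATH; call .terminate() on the
    returned handle to stop the server.
    """
    return subprocess.Popen(argv)
```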
$ (mlflow_demos) mlflow_project % ls
MLProject MLflow_training.py conda.yaml
The MLflow project structure is shown above: an MLProject file, the training script and the conda environment file
Call 'mlflow run' from the folder one level above this one
name: My Project

conda_env: conda.yaml
# Can have a docker_env instead of a conda_env, e.g.
# docker_env:
#   image: mlflow-docker-example

entry_points:
  main:
    parameters:
      data_file: { type: string, default: "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv" }
    command: "python MLflow_training.py {data_file}"
The MLproject YAML file specifies the environment file and the entry points, each with its command and parameters
$ mlflow run mlflow_project
Note that I used string instead of path as the parameter type in this example so that relative paths are passed through unchanged (MLflow converts path parameters to absolute paths)
$ mlflow run mlflow_project -P data_file=../data/WA_Fn-UseC_-Telco-Customer-Churn.csv
$ mlflow run mlflow_project -P data_file=data/WA_Fn-UseC_-Telco-Customer-Churn.csv
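Conceptually, `mlflow run` takes the defaults declared for the entry point, applies any `-P name=value` overrides, and fills the `{placeholders}` in `command`. The sketch below is a simplified illustration of that substitution, not MLflow's implementation (which also type-checks values and resolves `path` parameters to absolute paths). `MAIN_ENTRY_POINT` is the `main` entry point transcribed by hand, and `resolve_command` is a hypothetical name; only declared parameters are handled.

```python
from __future__ import annotations

# The "main" entry point from the MLproject file, transcribed by hand
MAIN_ENTRY_POINT = {
    "parameters": {
        "data_file": {"type": "string",
                      "default": "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv"},
    },
    "command": "python MLflow_training.py {data_file}",
}


def resolve_command(entry_point: dict, overrides: dict[str, str] | None = None) -> str:
    """Fill the command template from declared defaults plus -P style overrides."""
    # Start from the defaults declared in the MLproject file
    values = {name: spec.get("default")
              for name, spec in entry_point["parameters"].items()}
    # -P name=value overrides win over the defaults
    values.update(overrides or {})
    missing = [name for name, value in values.items() if value is None]
    if missing:
        raise ValueError(f"no value for parameter(s): {missing}")
    # A "string" value is substituted as-is, so relative paths stay relative
    return entry_point["command"].format(**values)
```

With no overrides the declared default is used; passing `{"data_file": "data/WA_Fn-UseC_-Telco-Customer-Churn.csv"}` corresponds to the second `-P` command above.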
$ mlflow run https://github.com/sjster/MLflowAnsible#MLProject_folder \
-P data_file=data/WA_Fn-UseC_-Telco-Customer-Churn.csv
name: My Project

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      data_file: { type: string, default: "../data/WA_Fn-UseC_-Telco-Customer-Churn.csv" }
    command: "python MLflow_training.py {data_file}"
  validate:
    parameters:
      X_val: { type: string, default: "../data/X_val.csv" }
      y_val: { type: string, default: "../data/y_val.csv" }
    command: "python MLflow_validate.py {X_val} {y_val}"
mlflow run mlflow_project -e validate -P X_val=../data/X_val.csv -P y_val=../data/y_val.csv
To log the project's runs to this tracking server, set MLFLOW_TRACKING_URI in front of the 'mlflow run' command
MLFLOW_TRACKING_URI=http://0.0.0.0:5000 mlflow run mlflow_project \
--experiment-name="XGBoost_mlflow_validate" \
-e validate -P X_val=../data/X_val.csv -P y_val=../data/y_val.csv
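The same invocation can be scripted from Python. In this sketch (hypothetical helpers `mlflow_run_argv` and `run_with_tracking_server`, assuming the `mlflow` CLI is installed), `MLFLOW_TRACKING_URI` is set only in the child process's environment, which has the same effect as prefixing it on the shell command line.

```python
from __future__ import annotations

import os
import subprocess


def mlflow_run_argv(project_dir: str,
                    entry_point: str = "main",
                    params: dict[str, str] | None = None,
                    experiment_name: str | None = None) -> list[str]:
    """Build an `mlflow run` command line like the one above."""
    argv = ["mlflow", "run", project_dir, "-e", entry_point]
    if experiment_name is not None:
        argv.append(f"--experiment-name={experiment_name}")
    # Each entry point parameter becomes its own -P name=value pair
    for name, value in (params or {}).items():
        argv += ["-P", f"{name}={value}"]
    return argv


def run_with_tracking_server(argv: list[str], tracking_uri: str) -> subprocess.CompletedProcess:
    """Run the command with MLFLOW_TRACKING_URI set for the child process only."""
    env = {**os.environ, "MLFLOW_TRACKING_URI": tracking_uri}
    return subprocess.run(argv, env=env, check=True)
```

For example, `run_with_tracking_server(mlflow_run_argv("mlflow_project", "validate", {"X_val": "../data/X_val.csv", "y_val": "../data/y_val.csv"}, "XGBoost_mlflow_validate"), "http://0.0.0.0:5000")` corresponds to the shell command above. Passing a list rather than a single string avoids shell quoting issues with paths.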
mlflow_training % ls mlflow_project/my_local_model/
MLmodel conda.yaml model.pkl requirements.txt
mlflow_training % cat mlflow_project/my_local_model/MLmodel
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.1
utc_time_created: '2021-11-02 19:48:30.135900'
mlflow.sklearn.save_model(model, "my_local_model")
my_model_reload = mlflow.sklearn.load_model('my_local_model')
mlflow.sklearn.eval_and_log_metrics(my_model_reload, X_val, y_val, prefix="val_")
Out[27]: {'val_precision_score': 0.7968969091728362,
'val_recall_score': 0.805170239596469,
'val_f1_score': 0.799325453841428,
'val_accuracy_score': 0.805170239596469,
'val_log_loss': 0.406791751504339,
'val_roc_auc_score': 0.8524996656137788,
'val_score': 0.805170239596469}
logged_model = 'runs:/314035cfab2245d5ad266b84751dff8a/model'
model_loaded = mlflow.sklearn.load_model(logged_model)
mlflow.sklearn.eval_and_log_metrics(model_loaded, X_val, y_val, prefix="val_")
Out[27]: {'val_precision_score': 0.7968969091728362,
'val_recall_score': 0.805170239596469,
'val_f1_score': 0.799325453841428,
'val_accuracy_score': 0.805170239596469,
'val_log_loss': 0.406791751504339,
'val_roc_auc_score': 0.8524996656137788,
'val_score': 0.805170239596469}
mlflow.sklearn.log_model(lr,
                         artifact_path="artifacts",
                         registered_model_name="lr")
You can also register a model that was already logged in a run
# Get this run ID from the MLflow UI
result = mlflow.register_model('runs:/314035cfab2245d5ad266b84751dff8a/model', "XGBoost_sr")
model_loaded_from_registry = mlflow.sklearn.load_model(
    model_uri="models:/XGBoost_sr/1"
)
mlflow.sklearn.eval_and_log_metrics(model_loaded_from_registry, X_val, y_val, prefix="val_")
Out[31]: {'val_precision_score': 0.7968969091728362,
'val_recall_score': 0.805170239596469,
'val_f1_score': 0.799325453841428,
'val_accuracy_score': 0.805170239596469,
'val_log_loss': 0.406791751504339,
'val_roc_auc_score': 0.8524996656137788,
'val_score': 0.805170239596469}
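The snippets above address the same model in two ways: `runs:/<run_id>/<artifact_path>` points at an artifact logged in a specific run, while `models:/<name>/<version>` points at a registered model version (a stage name such as `Production` can be used in place of the version). MLflow parses these strings itself; the small helpers below are hypothetical and only make the two formats explicit.

```python
from __future__ import annotations


def runs_uri(run_id: str, artifact_path: str) -> str:
    """URI of a model logged under `artifact_path` in run `run_id`."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_uri(name: str, version: int | str) -> str:
    """URI of a registered model by version number (or stage name)."""
    return f"models:/{name}/{version}"


def parse_runs_uri(uri: str) -> tuple[str, str]:
    """Split 'runs:/<run_id>/<artifact_path>' into (run_id, artifact_path)."""
    prefix = "runs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a runs:/ URI: {uri!r}")
    # Split on the first '/' only; the artifact path may itself contain slashes
    run_id, _, artifact_path = uri[len(prefix):].partition("/")
    if not run_id or not artifact_path:
        raise ValueError(f"malformed runs:/ URI: {uri!r}")
    return run_id, artifact_path
```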
mlflow models serve -m my_local_model
2021/11/03 17:09:03 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2021/11/03 17:09:04 INFO mlflow.utils.conda: === Creating conda environment mlflow-a95404aa2487b42dc9f39755daafc1fe62e52876 ===
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done
...
...
2021/11/03 17:10:24 INFO mlflow.pyfunc.backend: === Running command 'source /databricks/conda/bin/../etc/profile.d/conda.sh && conda activate mlflow-a95404aa2487b42dc9f39755daafc1fe62e52876 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2021-11-03 17:10:24 +0000] [23870] [INFO] Starting gunicorn 20.1.0
[2021-11-03 17:10:24 +0000] [23870] [INFO] Listening at: http://127.0.0.1:5000 (23870)
[2021-11-03 17:10:24 +0000] [23870] [INFO] Using worker: sync
[2021-11-03 17:10:24 +0000] [23877] [INFO] Booting worker with pid: 23877
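Creating the conda environment took over a minute in the log above, so a script that sends requests right after launching the server may want to wait until it is up. A minimal sketch, assuming the scoring server's `/ping` health-check endpoint and the default address `127.0.0.1:5000`; `wait_until_ready` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str = "http://127.0.0.1:5000",
                     timeout_s: float = 300.0,
                     poll_s: float = 2.0) -> bool:
    """Poll GET <base_url>/ping until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ping", timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet
        time.sleep(poll_s)
    return False
```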
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[
{"customerID": "8232-CTLKO",
"gender": "Female",
"SeniorCitizen": 0,
"Partner": "Yes",
"Dependents": "Yes",
"tenure": 66,
"PhoneService": "Yes",
"MultipleLines": "No",
"InternetService": "DSL",
"OnlineSecurity": "Yes",
"OnlineBackup": "No",
"DeviceProtection": "No",
"TechSupport": "No",
"StreamingTV": "Yes",
"StreamingMovies": "No",
"Contract": "Two year",
"PaperlessBilling": "Yes",
"PaymentMethod": "Electronic check",
"MonthlyCharges": 59.75,
"TotalCharges": 3996.8}
]'
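The same request can be sent from Python with only the standard library. This sketch assumes the server started by `mlflow models serve` is listening on `127.0.0.1:5000` and reuses the `format=pandas-records` content type from the curl call; `build_scoring_request` and `score` are hypothetical helper names.

```python
from __future__ import annotations

import json
import urllib.request

SCORING_URL = "http://127.0.0.1:5000/invocations"


def build_scoring_request(records: list[dict], url: str = SCORING_URL) -> urllib.request.Request:
    """Build a POST request carrying `records` as a JSON list of row objects."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; format=pandas-records"},
        method="POST",
    )


def score(records: list[dict], url: str = SCORING_URL):
    """Send the request to a running scoring server and decode its JSON reply."""
    with urllib.request.urlopen(build_scoring_request(records, url), timeout=30) as resp:
        return json.loads(resp.read())
```

Passing the customer record from the curl example to `score` as a one-element list sends the same payload as the curl command.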
Questions?