MLOps

https://renatocf.xyz/cain-2025-slides

2025

Renato Cordeiro Ferreira

Institute of Mathematics and Statistics (IME)
University of São Paulo (USP) – Brazil

Jheronimus Academy of Data Science (JADS)
Technical University of Eindhoven (TUe) / Tilburg University (TiU) – The Netherlands

From Data Science to
ML-Enabled Systems

Former Principal ML Engineer at Elo7 (BR)
4 years of industry experience designing, building, and operating ML products with multidisciplinary teams

B.Sc. and M.Sc. at University of São Paulo (BR)

Theoretical and practical experience with Machine Learning and Software Engineering

Scientific Programmer at JADS (NL)

Currently participating in the MARIT-D European project, using ML techniques for more secure seas

Ph.D. candidate at USP + JADS

Research on SE4AI, in particular MLOps and the software architecture of ML-Enabled Systems

Renato Cordeiro Ferreira

https://renatocf.xyz/contacts

My goal is to
explain the fundamentals of
Continuous Delivery for Machine Learning,
the key idea behind putting
models into production

Continuous Delivery
Machine Learning

Axis of Change for ML

Data: Schema, Sampling, Volume
Model: Algorithms, More Training, Experiments
Code: Business Needs, Bug Fixes, Configuration

Based on "Continuous Delivery for Machine Learning", by Danilo Sato, Arif Wider, and Christoph Windheuser -- https://martinfowler.com/articles/cd4ml.html
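The three axes of change can be made concrete with a small sketch. It assumes each release records one content hash per axis; the `Release` record and the `fingerprint` helper are hypothetical names used for illustration, not part of the CD4ML article. Comparing two releases then shows which axes moved:

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: bytes) -> str:
    """Return a short, stable content hash for an artifact."""
    return hashlib.sha256(content).hexdigest()[:12]


@dataclass(frozen=True)
class Release:
    """Hypothetical release record: one fingerprint per axis of change."""
    data: str
    model: str
    code: str


def changed_axes(old: Release, new: Release) -> list[str]:
    """List the axes (data, model, code) whose fingerprints differ."""
    return [axis for axis in ("data", "model", "code")
            if getattr(old, axis) != getattr(new, axis)]


# Toy artifacts: the data grew and the model was retrained, the code did not change.
v1 = Release(fingerprint(b"rows: 1000"), fingerprint(b"weights-a"), fingerprint(b"train.py v1"))
v2 = Release(fingerprint(b"rows: 5000"), fingerprint(b"weights-b"), fingerprint(b"train.py v1"))
print(changed_axes(v1, v2))
```

In a real system the fingerprints would more likely come from a data versioning tool, a model registry, and the code repository; the byte strings here are placeholders.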

"Continuous Delivery is the ability to get changes of all types -- including new features, configuration changes, bug fixes, and experiments -- into production, or in the hands of users, safely and quickly in a sustainable way."

-- Jez Humble and David Farley

Continuous Delivery

"Continuous Delivery for Machine Learning is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles."

-- Danilo Sato, Arif Wider, Christoph Windheuser

Continuous Delivery for Machine Learning

Software engineering approach
It enables teams to efficiently produce high quality software

Cross-functional team
Experts with different skill sets and workflows across data engineering, data science, machine learning engineering, development, operations, and other knowledge areas are working together in a collaborative way emphasising the skills and strengths of each team member

Producing software based on code, data and models
All artifacts of the ML software production process require different tools and workflows that must be versioned and managed accordingly

Small and safe increments
The release of the software artifacts is divided into small increments, which allows visibility and control around the levels of variance of its outcomes, adding safety into the process
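One possible way to keep increments small and observable is a canary rollout, where only a fixed share of requests reaches the new model. This is an illustrative assumption, not a technique named in the slides; `routes_to_candidate` and the request IDs are hypothetical.

```python
import hashlib


def routes_to_candidate(request_id: str, candidate_share: float) -> bool:
    """Deterministically send a fixed share of requests to the candidate model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < candidate_share


# Simulate traffic and measure how much of it reaches the candidate.
requests = [f"req-{i}" for i in range(10_000)]
share = sum(routes_to_candidate(r, 0.05) for r in requests) / len(requests)
print(f"share routed to candidate: {share:.3f}")
```

Because the routing depends only on the request ID, the same request always reaches the same model, which makes differences in outcomes easier to attribute to the increment being released.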

Reproducible and reliable software release
While the model outputs can be non-deterministic and hard to reproduce, the process of releasing ML software into production is reliable and reproducible, leveraging automation as much as possible
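One ingredient of a reproducible process is pinning the sources of randomness in a pipeline step. The sketch below is a toy under that assumption: `train` is a hypothetical stand-in for a training step, not a real training routine.

```python
import random


def train(data: list[float], seed: int) -> float:
    """Toy 'training': the mean of a seeded bootstrap sample of the data."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    sample = [rng.choice(data) for _ in data]
    return sum(sample) / len(sample)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
first = train(data, seed=42)
second = train(data, seed=42)
print(first == second)  # the same seed and data give the same result
```

Recording the seed next to the code and data versions lets an automated pipeline rerun the step and compare results, even when unseeded runs would differ.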

Software release at any time
It is important that the ML software can be delivered into production at any time. Even if organizations do not want to deliver software all the time, it should always be in a releasable state. This makes the decision about when to release it a business decision rather than a technical one
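The split between "releasable" and "released" can be sketched as two separate checks. The names, the accuracy metric, and the tolerance below are assumptions chosen for illustration; the slides do not prescribe a specific gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of a candidate build produced by the pipeline."""
    version: str
    accuracy: float
    tests_passed: bool


def is_releasable(candidate: Candidate, baseline_accuracy: float,
                  tolerance: float = 0.01) -> bool:
    """Technical gate: tests pass and accuracy has not regressed beyond tolerance."""
    return (candidate.tests_passed
            and candidate.accuracy >= baseline_accuracy - tolerance)


def should_release(candidate: Candidate, baseline_accuracy: float,
                   business_approval: bool) -> bool:
    """Release only when the build is releasable AND the business decides to ship."""
    return is_releasable(candidate, baseline_accuracy) and business_approval


good = Candidate(version="1.4.0", accuracy=0.93, tests_passed=True)
print(is_releasable(good, baseline_accuracy=0.91))            # technical state
print(should_release(good, 0.91, business_approval=False))    # business decision
```

Keeping the two functions apart mirrors the point of the slide: the pipeline keeps every build in a known technical state, and the timing of the release is decided elsewhere.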

Short adaptation cycles
Short cycles mean development cycles are on the order of days or even hours, not weeks, months, or even years. Automation of the process with quality built in is key to achieving this. This creates a feedback loop that allows you to adapt your models by learning from their behavior in production
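The feedback loop can be sketched as a monitoring check that triggers retraining when production behavior degrades. The error metric, the `max_ratio` threshold, and the function name are assumptions for illustration only.

```python
from statistics import mean


def should_retrain(recent_errors: list[float], baseline_error: float,
                   max_ratio: float = 1.2) -> bool:
    """Trigger retraining when the recent production error exceeds
    the baseline error by more than max_ratio."""
    if not recent_errors:
        return False  # no feedback from production yet
    return mean(recent_errors) > baseline_error * max_ratio


# Errors observed in production over the last monitoring window (toy values).
print(should_retrain([0.10, 0.11], baseline_error=0.10))
print(should_retrain([0.20, 0.30], baseline_error=0.10))
```

A scheduler or monitoring service could run a check like this on every window and start the training pipeline when it returns true, which is what keeps the adaptation cycle short.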

"Continuous Delivery for Machine Learning is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles."

-- Danilo Sato, Arif Wider, Christoph Windheuser

Continuous Delivery for Machine Learning

Reference Architecture
for ML-Enabled Systems

[Figure: reference architecture diagram]
Subsystems: Data Acquisition, Continuous Training, Development, Continuous Delivery, Serving, Monitoring
Components: Physical Sensors (IoT), Data Crawlers, Data Collection Application, 3rd party Application (External), Labeling Application (Mechanical Turk), Label Store, Raw Data Store, Data Pipeline, Data Store, Feature Engineering Pipeline, Feature Store, Training Pipeline, Metadata Store, Model Registry, Scheduler Service, Manual Trigger, Code Editor / IDE, Notebooks, Code Repository, CI Pipeline, CD Pipeline, CD4ML Pipeline, Artifact Registry, API Prediction Service, Streaming Prediction Service, Batch Prediction Pipeline, Prediction Store, Queue Broker, Consumer Application, Monitoring Service, Governance Application, Metrics Store, Telemetry Store
Component labels: 1 to 13, I to VII, A to L
Actions: [ trigger ], [ build ], [ train ], [ run ], [ deploy ], [ update ], [ rollback ], [ redeploy ]

Doctoral Symposium - CAIN 2025

A Metrics-Oriented Architectural Model
to Characterize Complexity on
Machine Learning-Enabled Systems

https://renatocf.xyz/cain25-paper

Legend
Components: Service / Application (Service Label: 1), Pipeline (Pipeline Label: I), Storage (Storage Label: A), Subsystem
Arrows: read / write, request / response, start, continuation, [action]
Flows: Data Flow, Execution Flow
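As a rough illustration of how the diagram can be read as data, the sketch below lists the component names from the reference architecture and groups them by the three kinds in the legend. The grouping rule (guessing the kind from the name suffix, and treating "Queue Broker" as storage) is an assumption made here, not the rule used in the paper; the unlabeled "Manual Trigger" is left out.

```python
from collections import Counter

# Component names as they appear in the reference architecture diagram.
COMPONENTS = [
    "Data Store", "Feature Store", "Metadata Store", "Raw Data Store",
    "Prediction Store", "Label Store", "Metrics Store", "Telemetry Store",
    "Model Registry", "Artifact Registry", "Code Repository", "Queue Broker",
    "Data Pipeline", "Feature Engineering Pipeline", "Training Pipeline",
    "CI Pipeline", "CD Pipeline", "CD4ML Pipeline", "Batch Prediction Pipeline",
    "Scheduler Service", "Streaming Prediction Service", "API Prediction Service",
    "Monitoring Service", "Code Editor / IDE", "Notebooks",
    "Physical Sensors (IoT)", "Data Crawlers", "Labeling Application",
    "Consumer Application", "Data Collection Application",
    "Governance Application", "3rd party Application",
]

# Assumption: names with these suffixes are storage components.
STORAGE_SUFFIXES = ("Store", "Registry", "Repository", "Broker")


def kind(name: str) -> str:
    """Guess the legend kind of a component from its name (heuristic)."""
    if name.endswith("Pipeline"):
        return "pipeline"
    if name.endswith(STORAGE_SUFFIXES):
        return "storage"
    return "service/application"


counts = Counter(kind(name) for name in COMPONENTS)
for label, total in sorted(counts.items()):
    print(f"{label}: {total}")
```

Under this guess, the totals appear to line up with the label ranges visible in the diagram (services 1 to 13, pipelines I to VII, storage A to L), which suggests that simple counts like these are one way to start comparing architectures.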

Case Studies

Doctoral Symposium - ECSA 2025

A Tale of Two Systems:
Characterizing Architectural Complexity on
Machine Learning-Enabled Systems

https://renatocf.xyz/ecsa25-paper

[Figure: reference architecture for ML-enabled systems, same diagram as above]

[Figure: case study architecture diagram]
Subsystems: Data Collection, Continuous Training, Developing, Continuous Delivery, Serving, Monitoring
Components: Data Collection App, Data Collection API, Document DB, KV DB (Audio), Label Store, Hospital API, Feature Engineering Pipeline, Feature Store, Training Pipeline, Metadata Store, Model Registry, Scheduler Service, Code Editor / IDE, Code Repository, CI Pipeline, CD Pipeline, CD4ML Pipeline, Artifact Registry, Streaming Prediction Server, Message Queue, Client API, Client App, Governance Application, Telemetry Store
Component labels: 1 to 9, I to V, A to K
Actions: [trigger], [train], [ deploy ], [ update ], [ rollback ]

Data Collection App
Scientific Initiation 2021
Francisco Wernke

Streaming Prediction Server + Client API / App
Capstone Project 2022
Vitor Tamae

High Availability with Kubernetes
Capstone Project 2023
Vitor Guidi

Redesign Continuous Training Subsystem
Capstone Project 2023
Daniel Lawand

CI/CD/CD4ML on Training Pipeline
Capstone Project 2024
Lucas Quaresma + Roberto Bolgheroni

[Figure: case study architecture diagram]
Subsystems: Data Acquisition, Continuous Training, Development, Continuous Delivery, Serving, Monitoring
Components: Physical Sensors (Marine Objects), Data Crawlers, 3rd party Providers, Raw Data Store, Label Store, Data Store, Synthetic Data Gen. Pipeline, Data Augmentation Pipeline, Rule-Based Training Pipeline, ML-Based Training Pipeline, Metadata Store, Model Registry, Scheduler Service, Manual Trigger, Code Editor / IDE, Notebooks, Code Repository, CI Pipeline, CD Pipeline, CD4ML Pipeline, Artifact Registry, Batch Prediction Pipeline, API Prediction Service, Prediction Store, Web Application, Governance Application, Telemetry Store
Component labels: 1 to 9, I to VIII, A to I
Actions: [ build ], [ trigger ], [ train ], [ run ], [ deploy ], [ update ], [ rollback ], [ redeploy ]

Research Team
MSc Students

Innovation Team
PDEng Trainees

UI Dev Team
Hired Developers

Core Dev Team
Scientific Programmers
