Characterizing the Complexity of
Machine Learning-Enabled Systems

https://renatocf.xyz/jads25-slides

2025

Renato Cordeiro Ferreira

Institute of Mathematics and Statistics (IME)
University of São Paulo (USP) – Brazil

Jheronimus Academy of Data Science (JADS)
Technical University of Eindhoven (TUe) / Tilburg University (TiU)  – The Netherlands

Paper

Slides

Former Principal ML Engineer at Elo7 (BR)
4 years of industry experience designing, building, and operating ML products with multidisciplinary teams

B.Sc. and M.Sc. at University of São Paulo (BR)

Theoretical and practical experience with Machine Learning and Software Engineering

Scientific Programmer at JADS (NL)

Currently participating in the MARIT-D European project, which uses ML techniques for more secure seas

Ph.D. candidate at USP + JADS

Research on SE4AI, in particular on MLOps and the software architecture of ML-Enabled Systems

Renato Cordeiro Ferreira

https://renatocf.xyz/contacts

My goal is to
use metrics to identify
where complexity emerges
in the software architecture
of ML-enabled systems

Research Questions

RQ1: What are the measurable dimensions of complexity
in the architecture of ML-enabled systems?

RQ2: How can complexity metrics be operationalized
over the architecture of ML-enabled systems?

RQ3: How can complexity metrics be used to aid
the development, operation, and evolution
of real-world ML-enabled systems?

Research Questions

RQ3: How can complexity metrics be used to aid
the development, operation, and evolution
of real-world ML-enabled systems?

RQ3.1: How can complexity metrics be used to choose between architecture proposals for an ML-enabled system?

RQ3.2: How can complexity metrics be used to identify refactoring opportunities in an ML-enabled system?
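To illustrate RQ2, the sketch below shows one hypothetical way a structural complexity metric (coupling) could be operationalized over an architecture expressed as a dependency graph. The component names and the metric choice are illustrative assumptions, not results from this research.

```python
# Minimal sketch: operationalizing a structural complexity metric (coupling)
# over an architecture dependency graph. Component names are illustrative only.
from collections import defaultdict

# Architecture as a directed graph: component -> components it depends on.
dependencies = {
    "training_pipeline": {"feature_store", "model_registry", "metadata_store"},
    "prediction_service": {"model_registry", "feature_store"},
    "monitoring_service": {"prediction_service", "metrics_store"},
    "feature_store": set(),
    "model_registry": set(),
    "metadata_store": set(),
    "metrics_store": set(),
}

def fan_in(deps: dict[str, set[str]]) -> dict[str, int]:
    """Number of components that depend on each component (afferent coupling)."""
    counts: dict[str, int] = defaultdict(int)
    for targets in deps.values():
        for target in targets:
            counts[target] += 1
    return {component: counts[component] for component in deps}

def fan_out(deps: dict[str, set[str]]) -> dict[str, int]:
    """Number of components each component depends on (efferent coupling)."""
    return {component: len(targets) for component, targets in deps.items()}

if __name__ == "__main__":
    afferent = fan_in(dependencies)
    efferent = fan_out(dependencies)
    for component in dependencies:
        print(f"{component}: fan-in={afferent[component]}, fan-out={efferent[component]}")
```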

Continuous Delivery for Machine Learning

"Continuous Delivery for Machine Learning is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles."

-- Danilo Sato, Arif Wider, Christoph Windheuser

"Continuous Delivery for Machine Learning is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles."

-- Danilo Sato, Arif Wider, Christoph Windheuser

Continuous Delivery for Machine Learning

Axes of Change for ML

Data: Schema, Sampling, Volume
Model: Algorithms, More Training, Experiments
Code: Business Needs, Bug Fixes, Configuration

Based on "Continuous Delivery for Machine Learning", by Danilo Sato, Arif Wider, and Christoph Windheuser -- https://martinfowler.com/articles/cd4ml.html
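As a concrete (hypothetical) illustration of these three axes of change, a release manifest can pin the data, model, and code versions that produced an increment, so that it can be reproduced and reliably released. All identifiers below are assumptions made for the sketch.

```python
# Minimal sketch: a release manifest that pins the three axes of change
# (data, model, code) so an ML application increment can be reproduced.
# All identifiers below are hypothetical.
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class ReleaseManifest:
    data_version: str   # e.g. a dataset snapshot tag
    model_version: str  # e.g. a model registry entry
    code_version: str   # e.g. a git commit SHA

manifest = ReleaseManifest(
    data_version="dataset-2025-03-01",
    model_version="model-registry/42",
    code_version="9f1c2ab",
)

# Serializing the manifest makes the release auditable and repeatable.
print(json.dumps(asdict(manifest), indent=2))
```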

Reference Architecture
for ML-Enabled Systems

Research Track - ECSA 2025

MLOps in Practice: Requirements and a Reference Architecture from Industry

https://renatocf.xyz/ecsa25-paper

Doctoral Symposium - CAIN 2025

A Metrics-Oriented Architectural Model
to Characterize Complexity on
Machine Learning-Enabled Systems

https://renatocf.xyz/cain25-paper

[Reference architecture diagram for ML-enabled systems]

Subsystems: Data Acquisition, Development, Continuous Delivery, Continuous Training, Serving, Monitoring

Components: Data Collection Application, Physical Sensors (IoT), Data Crawlers, Labeling Application (Mechanical Turk), 3rd party Application (External), Raw Data Store, Label Store, Data Pipeline, Data Store, Feature Engineering Pipeline, Feature Store, Code Editor / IDE, Notebooks, Code Repository, CI Pipeline, Artifact Registry, CD Pipeline, CD4ML Pipeline, Scheduler Service, Manual Trigger, Training Pipeline, Metadata Store, Model Registry, Streaming Prediction Service, API Prediction Service, Batch Prediction Pipeline, Prediction Store, Queue Broker, Consumer Application, Governance Application, Monitoring Service, Metrics Store, Telemetry Store

Interactions: [build], [trigger], [train], [run], [redeploy], [deploy] / [update] / [rollback]


Legend

Component types: Subsystem, Service, Application, Pipeline, Storage
Labels: 1 = Service Label, I = Pipeline Label, A = Storage Label
Connectors: Data Flow (read / write), Execution Flow (request / response), [action], start, continuation
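Following the legend above, a slice of the reference architecture could be encoded as data, typing components (service, application, pipeline, storage) and connectors (data flow, execution flow) so that metrics can be computed over the model. The sketch below is hypothetical and uses only a small, illustrative subset of the components.

```python
# Minimal sketch: encoding a slice of the reference architecture as data,
# using the legend's component types (service, application, pipeline, storage)
# and connector types (data flow, execution flow). Only a subset is shown.
from dataclasses import dataclass
from enum import Enum

class ComponentType(Enum):
    SERVICE = "service"
    APPLICATION = "application"
    PIPELINE = "pipeline"
    STORAGE = "storage"

class FlowType(Enum):
    DATA = "data flow"            # read / write
    EXECUTION = "execution flow"  # request / response, triggers

@dataclass(frozen=True)
class Connector:
    source: str
    target: str
    flow: FlowType

components = {
    "Training Pipeline": ComponentType.PIPELINE,
    "Model Registry": ComponentType.STORAGE,
    "Streaming Prediction Service": ComponentType.SERVICE,
    "Monitoring Service": ComponentType.SERVICE,
}

connectors = [
    Connector("Training Pipeline", "Model Registry", FlowType.DATA),
    Connector("Streaming Prediction Service", "Model Registry", FlowType.DATA),
    Connector("Monitoring Service", "Streaming Prediction Service", FlowType.EXECUTION),
]

# Once encoded, structural metrics (e.g., connector counts per component type)
# can be computed directly over this architectural model.
per_type = {
    kind: sum(1 for c in connectors
              if components[c.source] is kind or components[c.target] is kind)
    for kind in set(components.values())
}
print(per_type)
```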

Case Study
SPIRA

Making a Pipeline Production-Ready

https://renatocf.xyz/sadis25-slides

2025

Renato Cordeiro Ferreira

Institute of Mathematics and Statistics (IME)
University of São Paulo (USP) – Brazil

Jheronimus Academy of Data Science (JADS)
Technical University of Eindhoven (TUe) / Tilburg University (TiU)  – The Netherlands

Paper

Slides

Challenges and Lessons Learned in the Healthcare Domain

Our paper describes
challenges and lessons learned
from evolving the training pipeline of SPIRA:
from a BIG BALL OF MUD (v1)
to a MODULAR MONOLITH (v2)
to a set of MICROSERVICES (v3).

Productionized Architecture (v3)
Microservices: Feature Engineer, Trainer, Evaluator, Publisher
Stages: Data Loading, Dataset Generation, Feature Engineering, Model Training, Model Evaluation, Model Publishing
Artifacts: Audios, Dataset, Features, Model, Model + Metrics

Modularized Architecture (v2)
Single Trainer module with stages: Data Loading, Dataset Generation, Feature Engineering, Model Training, Model Evaluation, Model Publishing
Artifacts: Audios, Dataset, Features, Model, Model + Metrics

Experimental Architecture (v1)
Dataset Generation Scripts and Model Training Scripts sharing a Dataset
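As a rough sketch of the v1 -> v2 modularization, each concern can become an explicit pipeline stage behind a common interface. The stage names follow the slides above, but the interface itself is a hypothetical illustration, not the actual SPIRA code.

```python
# Rough sketch of the v1 -> v2 modularization: each concern becomes an explicit
# pipeline stage behind a common interface. Stage names follow the slides;
# the interface itself is hypothetical, not taken from the SPIRA codebase.
from abc import ABC, abstractmethod
from typing import Any

class Stage(ABC):
    @abstractmethod
    def run(self, artifact: Any) -> Any:
        """Consume the previous stage's artifact and produce the next one."""

class DataLoading(Stage):
    def run(self, artifact: Any) -> Any:
        return {"audios": artifact}             # load raw audios

class DatasetGeneration(Stage):
    def run(self, artifact: Any) -> Any:
        return {**artifact, "dataset": []}      # build the dataset

class FeatureEngineering(Stage):
    def run(self, artifact: Any) -> Any:
        return {**artifact, "features": []}     # extract features

class ModelTraining(Stage):
    def run(self, artifact: Any) -> Any:
        return {**artifact, "model": object()}  # train the model

class ModelEvaluation(Stage):
    def run(self, artifact: Any) -> Any:
        return {**artifact, "metrics": {}}      # evaluate the model

class ModelPublishing(Stage):
    def run(self, artifact: Any) -> Any:
        return artifact                         # push model + metrics to a registry

def run_pipeline(stages: list[Stage], initial: Any) -> Any:
    """Run each stage in order, threading the artifact through the pipeline."""
    for stage in stages:
        initial = stage.run(initial)
    return initial

result = run_pipeline(
    [DataLoading(), DatasetGeneration(), FeatureEngineering(),
     ModelTraining(), ModelEvaluation(), ModelPublishing()],
    initial="path/to/audios",
)
```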
v1 -> v2: Bad Separation of Concerns
v2 -> v3: Lack of Automated Testing

Extensibility, Maintainability, Robustness, Resiliency,
Collaboration between DS and MLEs, Testing as a First-Class Concern
[SPIRA deployment architecture diagram]

Subsystems: Data Collection, Developing, Continuous Delivery, Continuous Training, Serving, Monitoring

Components: Data Collection App, Data Collection API, Document DB, KV DB (Audio), Label Store, Hospital API, Code Editor / IDE, Code Repository, CI Pipeline, Artifact Registry, CD Pipeline, CD4ML Pipeline, Scheduler Service, Feature Engineering Pipeline, Feature Store, Training Pipeline, Metadata Store, Model Registry, Streaming Prediction Server, Message Queue, Client API, Client App, Governance Application, Telemetry Store

Interactions: [trigger], [train], [deploy] / [update] / [rollback]

Data Collection App
Scientific Initiation 2021
Francisco Wernke

Streaming
Prediction Server
+ Client API / App

Capstone Project 2022
Vitor Tamae

High Availability
with Kubernetes

Capstone Project 2023
Vitor Guidi

Redesign Continuous Training Subsystem
Capstone Project 2023
Daniel Lawand

CI/CD/CD4ML on
Training Pipeline

Capstone Project 2024
Lucas Quaresma
+ Roberto Bolgheroni

Case Study
OCEAN GUARD

MLOps with Microservices
A Case Study on the Maritime Domain

https://renatocf.xyz/ssoc25-slides

2025

Renato Cordeiro Ferreira

Institute of Mathematics and Statistics (IME)
University of São Paulo (USP) – Brazil

Jheronimus Academy of Data Science (JADS)
Technical University of Eindhoven (TUe) / Tilburg University (TiU)  – The Netherlands

Paper

Slides

Our paper describes
challenges and lessons learned
from building OCEAN GUARD:
a system for anomaly detection in the
maritime domain

Published at SADIS @ ECSA 2025

Making a Pipeline Production-Ready:
Challenges and Lessons Learned
in the Healthcare Domain

https://renatocf.xyz/sadis25-paper

Data Contracts
Document the expected formats of data exchange between two services or pipelines, which interact as consumer and producer via a data storage

Code Contracts
Document the expected protocol of behavior between two services, which interact synchronously or asynchronously via the network

Model Contracts
Document the expected input and output between a trainer and a server, which interact by storing and loading models in a model registry
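A minimal sketch of what a data contract between a producer pipeline and a consumer service might look like, assuming records are exchanged through a shared data store. Field names and the validation helper are hypothetical, not taken from SPIRA or OCEAN GUARD.

```python
# Minimal sketch of a data contract between a producer pipeline and a consumer
# service that exchange records via a data store. Field names and the
# validation helper are hypothetical.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class PredictionRecord:
    """Expected format of each record written by the producer."""
    record_id: str
    timestamp: datetime
    score: float          # must lie in [0.0, 1.0]
    model_version: str

def validate(record: PredictionRecord) -> None:
    """Consumer-side check that a record honors the contract."""
    if not 0.0 <= record.score <= 1.0:
        raise ValueError(f"score out of range: {record.score}")
    if not record.record_id:
        raise ValueError("record_id must be non-empty")

validate(PredictionRecord("r-001", datetime.now(), 0.87, "model-registry/42"))
```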

[OCEAN GUARD deployment architecture diagram]

Subsystems: Data Acquisition, Development, Continuous Delivery, Continuous Training, Serving, Monitoring

Components: Physical Sensors (Marine Objects), 3rd party Providers, Data Crawlers, Raw Data Store, Label Store, Data Store, Synthetic Data Gen. Pipeline, Data Augmentation Pipeline, Code Editor / IDE, Notebooks, Code Repository, CI Pipeline, Artifact Registry, CD Pipeline, CD4ML Pipeline, Scheduler Service, Manual Trigger, Rule-Based Training Pipeline, ML-Based Training Pipeline, Metadata Store, Model Registry, API Prediction Service, Batch Prediction Pipeline, Prediction Store, Web Application, Governance Application, Telemetry Store

Interactions: [build], [trigger], [train], [run], [redeploy], [deploy] / [update] / [rollback]

Research Team
MSc Students

Innovation Team
PDEng Trainees

UI Dev Team
Hired Developers

Core Dev Team
Scientific Programmers

Research Methodology

State of the art about metrics regarding ML-Enabled Systems

Industry- and academic-based case study on complexity metrics for
ML-Enabled Systems

Mixed-method approach to assess the impact of complexity in development tasks for
ML-Enabled Systems

[Research design diagram 1]
Data from knowledge bases, Researcher, Ontology Design, Choice of Case Studies, Selection of Metrics, Constructed Examples, Sampling of a Population of Practitioners

[Research design diagram 2]
Established Publication Databases, Guidelines from Literature, Inclusion Criteria for Case Studies, Exploratory + Confirmatory Case Studies, Sampling of a Population of Practitioners, Constructed Examples

Threats to Validity

Construct Validity
The study can measure what it proposed to measure

Internal Validity
The study can produce the results it reported

External Validity
The study can be generalized to other contexts

Reliability
The study can be replicated by other researchers


A Metrics-Oriented Architectural Model
to Characterize Complexity on
Machine Learning-Enabled Systems

https://renatocf.xyz/phd-quali-live

2025

Renato Cordeiro Ferreira

Supervisor: Prof. Dr. Alfredo Goldman

Co-Supervisor: Prof. Dr. Damian Tamburri

IME-USP

[JADS 2025] Characterizing the Complexity of ML-Enabled Systems

By Renato Cordeiro Ferreira


How can the complexity of ML-enabled systems be managed effectively? This research investigates how complexity affects ML-Enabled Systems (MLES). To address this question, it aims to introduce a metrics-based architectural model that characterizes the complexity of MLES. These metrics are intended to support architectural decisions, providing a guideline for the inception and evolution of such systems.
