Making a Pipeline Production-Ready

https://renatocf.xyz/sadis25-slides

2025

Renato Cordeiro Ferreira

Institute of Mathematics and Statistics (IME)
University of São Paulo (USP) – Brazil

Jheronimus Academy of Data Science (JADS)
Technical University of Eindhoven (TUe) / Tilburg University (TiU)  – The Netherlands

Paper

Slides

Challenges and Lessons Learned in the Healthcare Domain

Former Principal ML Engineer at Elo7 (BR)

4 years of industry experience designing, building, and operating ML products with multidisciplinary teams

B.Sc. and M.Sc. at University of São Paulo (BR)

Theoretical and practical experience with Machine Learning and Software Engineering

Scientific Programmer at JADS (NL)

Currently participating in the MARIT-D European project, using ML techniques for more secure seas

Ph.D. candidate at USP + JADS (BR + NL)

Research about SE4AI, in particular about MLOps and the software architecture of ML-Enabled Systems

Renato Cordeiro Ferreira

https://renatocf.xyz/contacts

Renato Cordeiro Ferreira

Lucas Quaresma Lam

Daniel Lawand

renatocf@ime.usp.br

lucasqml08@alumni.usp.br

daniel.lawand@alumni.usp.br

Alfredo Goldman

gold@ime.usp.br

Marcelo Finger

mfinger@ime.usp.br

Roberto Oliveira Bolgheroni

robertobolgheroni@alumni.usp.br

Our paper describes
challenges and lessons learned
on evolving the training pipeline of SPIRA:
from a BIG BALL OF MUD (v1)
to a MODULAR MONOLITH (v2)
to a set of MICROSERVICES (v3).

v1 -> v2
Challenge: Bad Separation of Concerns
Lesson: Collaboration between DS and MLEs

v2 -> v3
Challenge: Lack of Automated Testing
Lesson: Testing as a First-Class Concern

Quality attributes: Extensibility, Maintainability, Robustness, Resiliency

System Architecture

Research Track - ECSA 2025

MLOps in Practice: Requirements and a Reference Architecture from Industry

https://renatocf.xyz/ecsa25-paper

Doctoral Symposium - CAIN 2025

A Metrics-Oriented Architectural Model
to Characterize Complexity on
Machine Learning-Enabled Systems

https://renatocf.xyz/cain25-paper

[Figure: system architecture diagram]
Legend: Service / Application, Pipeline, Storage, Subsystem; Data Flow (read / write, request / response); Execution Flow (start, continuation, [action])
Subsystems: Data Acquisition, Continuous Training, Development, Continuous Delivery, Serving, Monitoring
Pipelines: Data Pipeline, Feature Engineering Pipeline, Training Pipeline, CI Pipeline, CD Pipeline, CD4ML Pipeline, Batch Prediction Pipeline
Stores and registries: Raw Data Store, Data Store, Feature Store, Label Store, Metadata Store, Prediction Store, Metrics Store, Telemetry Store, Model Registry, Artifact Registry
Other components: Physical Sensors (IoT), Data Crawlers, Data Collection Application, Labeling Application (Mechanical Turk), 3rd party Application (External), Code Editor / IDE, Notebooks, Code Repository, Scheduler Service, Manual Trigger, Queue Broker, API Prediction Service, Streaming Prediction Service, Consumer Application, Monitoring Service, Governance Application
Actions: [ trigger ], [ build ], [ train ], [ run ], [ deploy ], [ update ], [ rollback ], [ redeploy ]

Data Collection App
Scientific Initiation 2021
Francisco Wernke

Streaming
Prediction Server
+ Client API / App

Capstone Project 2022
Vitor Tamae

High Availability
with Kubernetes

Capstone Project 2023
Vitor Guidi

Redesign Continuous Training Subsystem
Capstone Project 2023
Daniel Lawand

CI/CD/CD4ML on
Training Pipeline

Capstone Project 2024
Lucas Quaresma
+ Roberto Bolgheroni

Pipeline
Reimplementation

Productionized Architecture (v3)
Components: Trainer, Feature Engineer, Publisher, Evaluator
Steps: Data Loading, Dataset Generation, Feature Engineering, Model Training, Model Evaluation, Model Publishing
Artifacts: Audios, Features, Model, Dataset, Model + Metrics

Modularized Architecture (v2)
Components: Trainer
Steps: Data Loading, Dataset Generation, Feature Engineering, Model Training, Model Evaluation, Model Publishing
Artifacts: Audios, Features, Model, Dataset, Model + Metrics

Experimental Architecture (v1)
Components: Dataset Generation Scripts, Model Training Scripts
Artifacts: Dataset

random.seed(c.train_config["seed"])
torch.manual_seed(c.train_config["seed"])
torch.cuda.manual_seed(c.train_config["seed"])
np.random.seed(c.train_config["seed"])
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

self.c = c
self.ap = ap
self.train = train
self.test = test
self.test_insert_noise = test_insert_noise
self.num_test_additive_noise = num_test_additive_noise
self.num_test_specaug = num_test_specaug
self.dataset_csv = \
  c.dataset["train_csv"] if train else c.dataset["eval_csv"]

assert os.path.isfile(self.dataset_csv), \
  "Test or Train CSV file don't exists! Fix it in config.json"

accepted_tc = [ 'overlapping', 'padding', 'one_window' ]
assert self.c.dataset['temporal_control'] in accepted_tc, \
  "You cannot use the padding_with_max_length option with the \
   split_wav_using_overlapping option, disable one of them !!"

self.control_class = c.dataset['control_class']
self.patient_class = c.dataset['patient_class']

self.dataset_list = \
  pd.read_csv(self.dataset_csv, sep=',') \
  .replace({'?': -1}) \
  .replace({'negative': self.control_class}, regex=True) \
  .replace({'positive': self.patient_class}, regex=True) \
  .values

Misplaced Responsibilities

Lines 1-7 handle random number generation

Lines 8-16 assign values to attributes

Lines 18-19 and 22-24 handle assertions

Line 30 handles data loading
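
One way to untangle the four responsibilities listed above is to give each its own small function. The following is a minimal sketch, not SPIRA's actual code: `DatasetConfig`, `seed_everything`, `validate`, and `load_rows` are hypothetical names, only the standard library is seeded, and labels are matched exactly rather than by regular expression as in the original snippet.

```python
import csv
import io
import random
from dataclasses import dataclass
from typing import List

ACCEPTED_TEMPORAL_CONTROLS = ("overlapping", "padding", "one_window")


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical, simplified stand-in for the `c.dataset` dictionary."""
    temporal_control: str
    control_class: int
    patient_class: int


def seed_everything(seed: int) -> None:
    """Random number generation, kept in one place.

    Only the standard library is seeded here; the original snippet also
    seeds NumPy and PyTorch, which would be added to this function.
    """
    random.seed(seed)


def validate(config: DatasetConfig) -> None:
    """Configuration checks, raised as explicit errors instead of asserts."""
    if config.temporal_control not in ACCEPTED_TEMPORAL_CONTROLS:
        raise ValueError(
            f"temporal_control must be one of {ACCEPTED_TEMPORAL_CONTROLS}, "
            f"got {config.temporal_control!r}"
        )


def load_rows(csv_text: str, config: DatasetConfig) -> List[list]:
    """Data loading: parse CSV text and normalize labels and missing values."""
    labels = {
        "negative": config.control_class,
        "positive": config.patient_class,
    }
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        # "?" marks a missing value; known labels become class ids.
        rows.append([-1 if cell == "?" else labels.get(cell, cell) for cell in record])
    return rows
```

With this split, each function can be called and tested on its own, for example `load_rows("a.wav,positive\n", DatasetConfig("padding", 0, 1))`, without constructing the whole dataset object.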

# Setup

config_path = ValidPath.from_str("/app/spira/spira.json")
config = load_config(config_path)

operation_mode = OperationMode.TRAIN
randomizer = initialize_random(config, operation_mode)

# Data Loading

patients_paths = read_valid_paths_from_csv(config.patients_csv)
controls_paths = read_valid_paths_from_csv(config.controls_csv)
noises_paths = read_valid_paths_from_csv(config.noises_csv)

patients_inputs = Audios.load(
  patients_paths, config.audio, config.dataset
)
controls_inputs = Audios.load(
  controls_paths, config.audio, config.dataset
)
noises = Audios.load(
  noises_paths, config.audio, config.dataset
)

# Feature Engineering

audio_processor = create_audio_processor(config.audio)

patient_feature_transformer = create_audio_feature_transformer(
    randomizer, audio_processor, config, noises,
)
control_feature_transformer = create_audio_feature_transformer(
    randomizer, audio_processor, config, noises,
)

Design-Pattern Modularization

Lines 15-23 handle data loading by using the Audio ADAPTER

Line 27 builds an audio_processor via a CHAIN OF RESPONSIBILITY

Lines 29-34 build two feature transformers via a STRATEGY
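
To illustrate how the two behavioral patterns named above can fit together, here is a small sketch under stated assumptions. It is not the SPIRA implementation: `AudioHandler`, `DropSilence`, `Normalize`, `Augmentation`, `AdditiveNoise`, and `FeatureTransformer` are hypothetical names, and signals are plain lists of floats. The chain is the linked-handler form of CHAIN OF RESPONSIBILITY, in which every handler transforms the signal and forwards it to its successor; the augmentation is a STRATEGY injected into the transformer.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional

Signal = List[float]


class AudioHandler(ABC):
    """One link in the chain: transforms the signal, then forwards it."""

    def __init__(self, successor: Optional["AudioHandler"] = None) -> None:
        self._successor = successor

    def handle(self, signal: Signal) -> Signal:
        result = self.apply(signal)
        return self._successor.handle(result) if self._successor else result

    @abstractmethod
    def apply(self, signal: Signal) -> Signal:
        ...


class DropSilence(AudioHandler):
    def apply(self, signal: Signal) -> Signal:
        # Simplistic on purpose: removes every zero-valued sample.
        return [s for s in signal if s != 0.0]


class Normalize(AudioHandler):
    def apply(self, signal: Signal) -> Signal:
        # Scale so the largest absolute value becomes 1.0.
        peak = max((abs(s) for s in signal), default=0.0)
        return [s / peak for s in signal] if peak else list(signal)


class Augmentation(ABC):
    """Strategy interface: interchangeable ways to augment a signal."""

    @abstractmethod
    def augment(self, signal: Signal) -> Signal:
        ...


class NoAugmentation(Augmentation):
    def augment(self, signal: Signal) -> Signal:
        return list(signal)


class AdditiveNoise(Augmentation):
    def __init__(self, noise: Signal, randomizer: random.Random, amplitude: float = 0.1) -> None:
        if not noise:
            raise ValueError("noise must not be empty")
        self._noise, self._randomizer, self._amplitude = noise, randomizer, amplitude

    def augment(self, signal: Signal) -> Signal:
        # Start reading the noise at a random offset and wrap around.
        offset = self._randomizer.randrange(len(self._noise))
        size = len(self._noise)
        return [s + self._amplitude * self._noise[(offset + i) % size]
                for i, s in enumerate(signal)]


class FeatureTransformer:
    """Runs the processing chain, then applies the chosen strategy."""

    def __init__(self, processor: AudioHandler, augmentation: Augmentation) -> None:
        self._processor = processor
        self._augmentation = augmentation

    def transform(self, signal: Signal) -> Signal:
        return self._augmentation.augment(self._processor.handle(signal))
```

A chain is assembled from the outside in, for example `DropSilence(Normalize())`, and two transformers can share that chain while differing only in the strategy they receive.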

By bringing ML Engineers to work with Data Scientists and employing automated testing from the beginning, projects may reach production sooner.
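
As a rough illustration of what testing from the beginning can look like for a pipeline step, the sketch below pairs a small function with unit tests written alongside it. The function `split_dataset` and its parameters are hypothetical and do not come from the SPIRA codebase; only the standard library `unittest` module is used.

```python
import random
import unittest
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], test_ratio: float, seed: int) -> Tuple[List[str], List[str]]:
    """Hypothetical pipeline step: shuffle deterministically, then split."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    # Returns (train, test).
    return shuffled[n_test:], shuffled[:n_test]


class SplitDatasetTest(unittest.TestCase):
    ITEMS = [f"audio_{i}.wav" for i in range(10)]

    def test_same_seed_gives_same_split(self):
        self.assertEqual(
            split_dataset(self.ITEMS, 0.2, seed=1),
            split_dataset(self.ITEMS, 0.2, seed=1),
        )

    def test_split_is_disjoint_and_complete(self):
        train, test = split_dataset(self.ITEMS, 0.2, seed=1)
        self.assertEqual(len(test), 2)
        self.assertFalse(set(train) & set(test))
        self.assertEqual(sorted(train + test), sorted(self.ITEMS))

    def test_invalid_ratio_is_rejected(self):
        with self.assertRaises(ValueError):
            split_dataset(self.ITEMS, 1.5, seed=1)
```

Saved as a module, these tests can be run with `python -m unittest`, which makes them easy to wire into a CI pipeline.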

[SADIS - ECSA 2025] Making a Pipeline Production-Ready: Challenges and Lessons Learned in the Healthcare Domain

By Renato Cordeiro Ferreira

Deploying a Machine Learning (ML) training pipeline into production requires good software engineering practices. Unfortunately, the typical data science workflow often leads to code that lacks critical software quality attributes. This experience report investigates this problem in SPIRA, a project whose goal is to create an ML-Enabled System (MLES) to pre-diagnose respiratory insufficiency via speech analysis. This paper presents an overview of the architecture of the MLES, then compares three versions of its Continuous Training subsystem: from a proof-of-concept Big Ball of Mud (v1), to a design pattern-based Modular Monolith (v2), to a test-driven set of Microservices (v3). Each version improved its overall extensibility, maintainability, robustness, and resiliency. The paper shares challenges and lessons learned in this process, offering insights for researchers and practitioners seeking to productionize their pipelines.