Implementation Details

LAB

BECM33MLE - Ing. David Pařil

Introduction + Agenda

Building End-2-End ML apps

Your projects are not just about the models. Today we will focus on implementation details, that may bring you a step further to production release.

Configuration

Secrets

Auth

Storage

Accessibility

UI/UX design

Observability

Performance tests

CI/CD

Privacy/GDPR

Resilience

LLM implementation

Configuration And Secrets

Do not hard-code configurations or secrets in your source code.

Externalize configs (e.g. use environment variables or config files) and load them at runtime.

This makes an application more portable and prevents leaking of sensitive data. Make sure to use versioning in your configs.

Separate config from code:

Never hard-code yours secrets!

Store sensitive credentials (API keys, DB passwords, etc.) in secure vaults or key management services instead of plain text files.

Use environment variables or secured config files separated between

dev/test/prod.

Secure secrets storage:

Limit who/what can access secrets.

Apply the principle of least privilege so components only see the secrets they absolutely need.

Implement different user roles and regularly review who has access.

Least privilege access:

Huge advantage of Flet ->

Use tooling to scan and detect secrets in code and rotate any that leak.

Never commit secrets to version control

Rotate secrets

Change secrets periodically to reduce impact of leaks.

Implement secret rotation - ideally automated.

Log all secret access attempts for monitoring - detect outliers.

Authentication and authorization

Use well tested authentication frameworks (OAuth 2.0) instead of DIY.

Use standard protocols:

If managing passwords, always hash with a strong algorithm. (bcrypt)

Use salt to stop password crackers (John the Ripper).

Never store plaintext.

Enforce strong passwords and consider MFAs.

Prefer short-lived tokens for client-server auth.

Secure password handling:

Implement authorization checks on the server for every request.

RBAC - permissions based on user roles (easy to implement)
ABAC - permissions based on user attributes (more flexible, complex to build and maintain) - can utilize time, location, ...

Role-based access control:

Always use HTTPS to protect credentials in transit.

Validate all inputs to authentication flows (injection attacks).

Rate-limit authentication attempt detector (brute force).

Best practices:

Secure local storage

If your application stores sensitive data on the client (mobile device, browser, user's machine), use encryption.

Leverage OS-level keystores or encrypted databases so that data like tokens or personal info isn't just in plaintext on the device.

Encrypt data at rest:

Flet client storage:

Flet session storage:

Use platform best practices which often provide encryption and sandboxing by default.

MacOs - Keychain
Windows - Credential Manager / DPAPI
Linux - Secret Service / GNOME Keyring / libsecret
IOs - KeyChain
Android - KeyStore

Platform specific secure storage:

Use envelope encryption:

Generate a random DEK (data-encryption key) per secret, encrypt the secret with DEK, then wrap (encrypt) the DEK with a KEK (key-encryption key) stored in the OS keychain.

Accessibility

Use sufficient color contrast for

text - aim for at least 4.5:1.

Support resizing of at least up-to 200% without breaking the layout.

Ensure text readability:

Keyboard navigation:

All functionality must be accessible via keyboard alone. Controls should be focusable with TAB.

Avoid seizures:

Do not auto-play video or audio without user consent. May trigger seizure.

Accessible controls:

Interactive controls should be large enough.

22px for pointer targets
44px for touch targets

Clear error messages:

When users make mistakes (i.e. form errors), provide helpful feedback.

UI/UX Design best practices

What is wrong with the first design?

Try to achieve visual and functional consistency across your app.

Use familiar icons, terminology, and

layouts so user do not have to relearn

UI patterns on each screen.

Follow platform conventions.

Maintain consistency in navigation.

Consistency & Standards:

Always keep user informed of what is happening

Loading spinner
Confirmation message on success/failure
Visible system status
Progress bar

Never leave the user wondering if something is working.

Feedback and visibility:

Simplify interfaces by removing unnecessary complexity.

Reduce cognitive load by using clear information hierarchy and focusing on one primary action per screen if possible.

Favor recognition over recall - use menus or prompts rather than expecting users to remember commands.

A clean, uncluttered UI with plenty of whitespace can help users focus on the task.

Simplicity and minimalism:

Not all cultures are the same :)

Image source: www.scmp.com

European Alibaba app

European Aliexpress app

Czech Alza store app

Allow users to easily correct mistakes of change their mind.

Undo button or snackbar
Always offer an exit or back

User control:

Before they happen: input validation

After they happen:

Polite error message with guidance
Never blame the user:
"Your password is too simple" vs "Password must contain a number"

Handle errors:

?

What are we still missing?

Observability

Collect Logs, Metrics, Traces:

Three pillars of observability:

logs - record discrete events/errors
metrics - numeric KPIs (Key Performance Indicator) (latency, CPU usage, # of users)
traces - end-to-end request flows
- timestamp
- payload
- input node
- output node

Centralized monitoring:

You can use a centralized observability platform stack (i.e. Prometheus, cloud monitoring service, ...) to aggregate data from all

components/instances.

Alert on key indicators:

Critical tresholds (ML model latency) -> alert

Testing for ML Applications

Test the Data:

Include tests for dataset quality to ensure that no garbage is fed into your model. The same tests can be used for inference.

Inputs are within an expected range
There are no NaNs or unexpected categories
The file is correctly formatted (1/3/4 channel image, resolution, ...)

End-2-end tests:

Write tests that simulate a full flow with known input and verify the output.

Testing in production:

You can perform few user-based tests during deployment:

Canary tests - deploy to small subset of users first
A/B testing - compare new models to current ones

And of course unit tests :)

Python Performance and profiling

Test the Data:

Use profiling tools to find bottlenecks in your Python code. Don’t guess! Built-in modules like:

cProfile
profile

or third-party profilers:

pyinstrument (image source)
py-spy

can show which functions are consuming the most time.

Vectorization and optimization:

Leverage optimized libraries: NumPy, SciPy, PyTorch, pandas, ...

Avoid repetitive Python-loops by using vectorized operations.

-> also consider offloading them to the GPU :).

Consider caching repetitive and expensive operations.

Understand python's GIL (Global Interpreter Lock) - CPU-bound python code will not speed up with threads. For heavy-duty tasks, use multiprocessing (or C/C++).

Background Jobs & Queues

Test the Data:

Use background job processing for tasks that are too slow or not suitable to handle during a user request.

Examples:

training an ML model
sending emails
generating reports
lengthy computations

You can use queue managers:

Celery
RQ
...

Reliability:

Make sure the queue messages are stored persistently until processed.

Ensure the queue is configured to not loose tasks even if workers restart or crash.

It is common to have separate worker processes that pull tasks from the queue and execute them.

Monitor and timeout background jobs.

CI/CD

Automate build and test:

Continuous Integration (CI) so that whenever code is pushed, an automated pipeline runs your tests.
Continuous Delivery (CD) automates deployment, runs final tests, recomputes hashes, updates urls, ...

Quality gates - define the criteria, that your app needs to meet:

"All tests pass, code coverage >= 80%, no critical issues in static analysis, model's AUC (ROC curve) is above 50%."

Privacy, GDPR

Data minimization principle:

Only collect and retain personal data that is truly necessary.

Under GDPR, this is a legal requirement - you shouldn't hoard data "just in case". For example if age/gender is not needed for your ML model or features, do not ask for it. Less data == less risk.

Deleting user data:

You should scan for unused data after X months. You are obliged to delete user data if they ask you to do so (under GDPR's right to be forgotten).

Their data (including logs) must be removed.

This induces a "fun" challenge of deleting data from databases, which were never designed to do so. :)

PS: Due to automated "Right to be forgotten" services, you may run into more requests, than you'd expect.

Be transparent:

Be clear with users about what data you collect and why.

Obtain explicit consent for sensitive data.

Allow user's to opt out of data collection, that isn't essential.

Under GDPR you need:

consent
contract
legitimate interest
etc...

Privacy by design:

Include privacy measures into your system architecture:

Techniques include pseudonymizing or anonymizing data (i.e. use IDs instead of names)
Apply least privilege to data access
Consider using techniques like differential privacy (compute statistics on noisy data, thus protecting one's privacy)

Key GDPR principles:

Right to access - users can request their data
Right to rectification - users can correct data
Right to deletion

You must ensure, that any third-party services you use also comply with these requirements.

Minimize ML Model Data Leakage:

If you train a ML model on personal data, be cautious about what the model could reveal. Services like OpenAI may use your prompts for training!

Internationalization

Externalize GUI text:

Design your app for translation by extracting all user-facing strings into a resource file.

Time zones / numerics

Internally used your local / global time. Be sure to translate values.

Resilience engineering

Isolate failures:

Architect your system so that a failure of one component has a limited blast radius.

Timeouts and Retries:

Graceful degradation:

Have fallbacks in case of partial failure. (i.e. fallback server)

Never wait indefinitely. Set timeouts. Handle failures, attempt limited retries, alert administrator.

Chaos engineering:

Test the system with injected failures to test your safeguards.

Model Optimization

Quantization:

Convert model weights from high precision (32-bit floats) to a lower precission (8-bit integers) to reduce model size.

Knowledge Distillation:

Prunning:

Remove unnecessary weights or neurons (near-zero importance).

Train a smaller "student" model to replicate the "teacher" model. Aim: similar accuracy, fewer parameters.

Optimize architecture:

Simplify model architecture for faster inference.

LLM Specifics

Prompt injection safeguards:

Concatenate user input into system prompts. User must be clearly separated. Validate user input (malicious directives).

Manage long context and cost:

Content filtering:

Guardrail input and output for toxicity/bias/banned content.

Implement strategies to manage costs, summarize prior conversation. Use context window wisely.

Hallucination handling:

You can use Retrieval-Augmented Generation (RAG)

Implementation Details

Introduction + Agenda

Building End-2-End ML apps

Configuration And Secrets

Authentication and authorization

Secure local storage

Accessibility

UI/UX Design best practices

?

Observability

Testing for ML Applications

Python Performance and profiling

Background Jobs & Queues

CI/CD

Privacy, GDPR

Internationalization

Resilience engineering

Model Optimization

LLM Specifics

See you Next week :)

See you
Next week :)