LAB
BECM33MLE - Ing. David Pařil
Your projects are not just about the models. Today we will focus on implementation details, that may bring you a step further to production release.
Configuration
Secrets
Auth
Storage
Accessibility
UI/UX design
Observability
Performance tests
CI/CD
Privacy/GDPR
Resilience
LLM implementation
Do not hard-code configurations or secrets in your source code.
Externalize configs (e.g. use environment variables or config files) and load them at runtime.
This makes an application more portable and prevents leaking of sensitive data. Make sure to use versioning in your configs.
Separate config from code:
Never hard-code yours secrets!
Store sensitive credentials (API keys, DB passwords, etc.) in secure vaults or key management services instead of plain text files.
Use environment variables or secured config files separated between
dev/test/prod.
Secure secrets storage:
Limit who/what can access secrets.
Apply the principle of least privilege so components only see the secrets they absolutely need.
Implement different user roles and regularly review who has access.
Least privilege access:
Huge advantage of Flet ->
Use tooling to scan and detect secrets in code and rotate any that leak.
Never commit secrets to version control
Rotate secrets
Change secrets periodically to reduce impact of leaks.
Implement secret rotation - ideally automated.
Log all secret access attempts for monitoring - detect outliers.
Use well tested authentication frameworks (OAuth 2.0) instead of DIY.
Use standard protocols:
If managing passwords, always hash with a strong algorithm. (bcrypt)
Use salt to stop password crackers (John the Ripper).
Never store plaintext.
Enforce strong passwords and consider MFAs.
Prefer short-lived tokens for client-server auth.
Secure password handling:
Implement authorization checks on the server for every request.
Role-based access control:
Always use HTTPS to protect credentials in transit.
Validate all inputs to authentication flows (injection attacks).
Rate-limit authentication attempt detector (brute force).
Best practices:
If your application stores sensitive data on the client (mobile device, browser, user's machine), use encryption.
Leverage OS-level keystores or encrypted databases so that data like tokens or personal info isn't just in plaintext on the device.
Encrypt data at rest:
Flet client storage:
Flet session storage:
Use platform best practices which often provide encryption and sandboxing by default.
Platform specific secure storage:
Use envelope encryption:
Generate a random DEK (data-encryption key) per secret, encrypt the secret with DEK, then wrap (encrypt) the DEK with a KEK (key-encryption key) stored in the OS keychain.
Use sufficient color contrast for
text - aim for at least 4.5:1.
Support resizing of at least up-to 200% without breaking the layout.
Ensure text readability:
Keyboard navigation:
All functionality must be accessible via keyboard alone. Controls should be focusable with TAB.
Avoid seizures:
Do not auto-play video or audio without user consent. May trigger seizure.
Accessible controls:
Interactive controls should be large enough.
Clear error messages:
When users make mistakes (i.e. form errors), provide helpful feedback.
What is wrong with the first design?
What is wrong with the first design?
Try to achieve visual and functional consistency across your app.
Use familiar icons, terminology, and
layouts so user do not have to relearn
UI patterns on each screen.
Follow platform conventions.
Maintain consistency in navigation.
Consistency & Standards:
Always keep user informed of what is happening
Never leave the user wondering if something is working.
Feedback and visibility:
Simplify interfaces by removing unnecessary complexity.
Reduce cognitive load by using clear information hierarchy and focusing on one primary action per screen if possible.
Favor recognition over recall - use menus or prompts rather than expecting users to remember commands.
A clean, uncluttered UI with plenty of whitespace can help users focus on the task.
Simplicity and minimalism:
Not all cultures are the same :)
Image source: www.scmp.com
European Alibaba app
European Aliexpress app
Czech Alza store app
Allow users to easily correct mistakes of change their mind.
User control:
Before they happen: input validation
After they happen:
Handle errors:
What are we still missing?
Collect Logs, Metrics, Traces:
Three pillars of observability:
Centralized monitoring:
You can use a centralized observability platform stack (i.e. Prometheus, cloud monitoring service, ...) to aggregate data from all
components/instances.
Alert on key indicators:
Critical tresholds (ML model latency) -> alert
Test the Data:
Include tests for dataset quality to ensure that no garbage is fed into your model. The same tests can be used for inference.
End-2-end tests:
Write tests that simulate a full flow with known input and verify the output.
Testing in production:
You can perform few user-based tests during deployment:
And of course unit tests :)
Test the Data:
Use profiling tools to find bottlenecks in your Python code. Don’t guess! Built-in modules like:
or third-party profilers:
can show which functions are consuming the most time.
Vectorization and optimization:
Leverage optimized libraries: NumPy, SciPy, PyTorch, pandas, ...
Avoid repetitive Python-loops by using vectorized operations.
-> also consider offloading them to the GPU :).
Consider caching repetitive and expensive operations.
Understand python's GIL (Global Interpreter Lock) - CPU-bound python code will not speed up with threads. For heavy-duty tasks, use multiprocessing (or C/C++).
Test the Data:
Use background job processing for tasks that are too slow or not suitable to handle during a user request.
Examples:
You can use queue managers:
Reliability:
Make sure the queue messages are stored persistently until processed.
Ensure the queue is configured to not loose tasks even if workers restart or crash.
It is common to have separate worker processes that pull tasks from the queue and execute them.
Monitor and timeout background jobs.
Automate build and test:
Quality gates - define the criteria, that your app needs to meet:
"All tests pass, code coverage >= 80%, no critical issues in static analysis, model's AUC (ROC curve) is above 50%."
Data minimization principle:
Only collect and retain personal data that is truly necessary.
Under GDPR, this is a legal requirement - you shouldn't hoard data "just in case". For example if age/gender is not needed for your ML model or features, do not ask for it. Less data == less risk.
Deleting user data:
You should scan for unused data after X months. You are obliged to delete user data if they ask you to do so (under GDPR's right to be forgotten).
Their data (including logs) must be removed.
This induces a "fun" challenge of deleting data from databases, which were never designed to do so. :)
PS: Due to automated "Right to be forgotten" services, you may run into more requests, than you'd expect.
Be transparent:
Be clear with users about what data you collect and why.
Obtain explicit consent for sensitive data.
Allow user's to opt out of data collection, that isn't essential.
Under GDPR you need:
Privacy by design:
Include privacy measures into your system architecture:
Key GDPR principles:
You must ensure, that any third-party services you use also comply with these requirements.
Minimize ML Model Data Leakage:
If you train a ML model on personal data, be cautious about what the model could reveal. Services like OpenAI may use your prompts for training!
Externalize GUI text:
Design your app for translation by extracting all user-facing strings into a resource file.
Time zones / numerics
Internally used your local / global time. Be sure to translate values.
Isolate failures:
Architect your system so that a failure of one component has a limited blast radius.
Timeouts and Retries:
Graceful degradation:
Have fallbacks in case of partial failure. (i.e. fallback server)
Never wait indefinitely. Set timeouts. Handle failures, attempt limited retries, alert administrator.
Chaos engineering:
Test the system with injected failures to test your safeguards.
Quantization:
Convert model weights from high precision (32-bit floats) to a lower precission (8-bit integers) to reduce model size.
Knowledge Distillation:
Prunning:
Remove unnecessary weights or neurons (near-zero importance).
Train a smaller "student" model to replicate the "teacher" model. Aim: similar accuracy, fewer parameters.
Optimize architecture:
Simplify model architecture for faster inference.
Prompt injection safeguards:
Concatenate user input into system prompts. User must be clearly separated. Validate user input (malicious directives).
Manage long context and cost:
Content filtering:
Guardrail input and output for toxicity/bias/banned content.
Implement strategies to manage costs, summarize prior conversation. Use context window wisely.
Hallucination handling:
You can use Retrieval-Augmented Generation (RAG)