Building Production-Ready GenAI on AWS
Ivan Casco - linktr.ee/icasco
COM202 - AWS Summit Amsterdam 2025
Over 97% have difficulties showing GenAI’s business value
67% couldn't move half of their GenAI pilots into production
Informatica’s CDO Insights 2025 survey
87% of organizations adopting GenAI expect increased investment by 2025
88% of AI pilots fail to reach production (33 PoCs ➡ 4 Production)
IDC CIO Playbook 2025 Survey
🇪🇸 ✈️ 🇮🇪
Principal Solutions Architect
@ StratusGrid
10+ years building cloud solutions
All opinions expressed are my own.
Real-World Applications of AI
⚠️ 1152 pages
Saves me over 5 hours a week on average!
A problem well-stated is a problem half-solved.
– Charles Kettering
Head of Research at General Motors from 1920 to 1947
Goals
Desired Outcomes
Domain Complexity
Data Requirements
Resource Constraints
Classify in many categories
Needs Reply
Notifications
Marketing
Summarize multiple emails in timeframe
Draft a reply
Sounds like the user
Open-ended text understanding
Complex classification
Generative responses
Adaptive and Extensible System
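The classification requirement can be sketched as a prompt builder plus a strict parser for the model's answer. This is a minimal illustration, not the implementation from the talk: the function names, the prompt wording, and the fallback category are assumptions, and the actual model call is left out.

```python
# Category names come from the slide; the real list has many more entries.
CATEGORIES = ["Needs Reply", "Notifications", "Marketing"]


def build_classification_prompt(email_body: str) -> str:
    """Build a prompt that asks the model for exactly one category name."""
    options = "\n".join(f"- {name}" for name in CATEGORIES)
    return (
        "Classify the email below into exactly one of these categories:\n"
        f"{options}\n"
        "Answer with the category name only.\n\n"
        f"Email:\n{email_body.strip()}"
    )


def parse_category(model_output: str, fallback: str = "Needs Reply") -> str:
    """Map raw model text onto a known category, ignoring case and quotes."""
    cleaned = model_output.strip().strip(".\"'").lower()
    for name in CATEGORIES:
        if cleaned == name.lower():
            return name
    # Hypothetical policy: unknown labels fall back to a category a human
    # will look at, so no email is silently dropped.
    return fallback
```

Keeping the parser strict means a new category only requires a change to `CATEGORIES`, which fits the "adaptive and extensible" goal.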
It did work...
right?
GenAI App Development
Collect Data
100,000 emails
Data Pre-Processing
Remove duplicates
Fill missing values
Label data
Analyze
Data quality metrics
Validation
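The pre-processing and analysis steps could look roughly like the sketch below. It assumes each email is a dict with hypothetical `subject`, `body`, and `label` keys; the duplicate rule (same subject and body, ignoring case) and the chosen metrics are illustrative assumptions.

```python
from collections import Counter


def preprocess(emails: list[dict]) -> list[dict]:
    """Remove exact duplicates and fill missing fields with explicit defaults."""
    seen = set()
    cleaned = []
    for email in emails:
        body = (email.get("body") or "").strip()
        subject = (email.get("subject") or "(no subject)").strip()
        key = (subject.lower(), body.lower())
        if key in seen:
            continue  # same subject and body already kept once
        seen.add(key)
        cleaned.append(
            {"subject": subject, "body": body, "label": email.get("label")}
        )
    return cleaned


def quality_metrics(raw: list[dict], cleaned: list[dict]) -> dict:
    """Report simple data quality numbers for the cleaned set."""
    total = len(cleaned)
    labeled = [e for e in cleaned if e["label"]]
    empty = [e for e in cleaned if not e["body"]]
    return {
        "duplicate_rate": 1 - total / len(raw) if raw else 0.0,
        "empty_body_rate": len(empty) / total if total else 0.0,
        "labeled_rate": len(labeled) / total if total else 0.0,
        "label_counts": dict(Counter(e["label"] for e in labeled)),
    }
```

A skewed `label_counts` or a low `labeled_rate` is a signal to collect or label more data before validating any model on it.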
Available in the US (N. Virginia, Ohio, Oregon) and the EU (Ireland, Frankfurt, Paris, Stockholm)
Demo video: Nova Micro with an unoptimized prompt, then with an optimized prompt, comparing the results
| Model | Cost per 1M input tokens | Cost per 1M output tokens |
|---|---|---|
| Claude 3.7 | $3.00 | $15.00 |
| Nova Lite | $0.06 | $0.24 |
| Nova Micro | $0.035 | $0.140 |
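Per-token pricing turns into a per-request cost with one line of arithmetic. The sketch below uses the prices listed in the table; the function name and the token counts in the usage note are assumptions for illustration, not measurements from the talk.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "Claude 3.7": (3.00, 15.00),
    "Nova Lite": (0.06, 0.24),
    "Nova Micro": (0.035, 0.140),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given its input and output token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

For a hypothetical request with 1,000 input tokens and 10 output tokens, `request_cost("Nova Micro", 1000, 10)` is about $0.0000364, while the same request on Claude 3.7 is about $0.00315. Because output tokens cost four to five times more than input tokens for these models, short answers such as a single category name keep the bill dominated by input.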
| Model | Tokens | Cost | Latency (ms) |
|---|---|---|---|
| Claude 3.7 + Reasoning | 656 | $0.024798 | 6,541 |
| Claude 3.7 | 322 | $0.006747 | 664 |
| Nova Lite | 298 | $0.000080 | 144 |
| Nova Micro | 298 | $0.000046 | 131 |
Imagine we have 100,000 emails to process
| Model | Tokens | Cost per 100K emails | Cumulative latency |
|---|---|---|---|
| Claude 3.7 + Reasoning | 65.6 M | $570.00 | 7.57 days |
| Claude 3.7 | 32.2 M | $102.60 | 18.4 hours |
| Nova Lite | 29.8 M | $1.806 | 4 hours |
| Nova Micro | 29.8 M | $1.054 | 3.63 hours |
Nova Micro vs Claude 3.7 + Reasoning
540x Cheaper
50x Faster
Nova Micro vs Nova Lite
1.7x Cheaper
10% Faster
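The cumulative latency and the comparison ratios can be reproduced from the per-email latency and the batch cost figures. This sketch assumes the 100,000 emails are processed strictly one after another; running requests in parallel would shorten wall-clock time but not change the cost. The function names are illustrative.

```python
def cumulative_latency_hours(latency_ms: float, n_requests: int) -> float:
    """Total time in hours if requests run sequentially, one at a time."""
    return latency_ms * n_requests / 1000 / 3600


def ratio(baseline: float, candidate: float) -> float:
    """How many times larger the baseline value is than the candidate."""
    return baseline / candidate


N_EMAILS = 100_000

# Per-email latency (ms) and batch cost (USD) taken from the slides.
reasoning_days = cumulative_latency_hours(6541, N_EMAILS) / 24  # about 7.57
nova_lite_hours = cumulative_latency_hours(144, N_EMAILS)       # 4.0

cheaper_vs_reasoning = ratio(570.00, 1.054)  # about 540x
faster_vs_reasoning = ratio(6541, 131)       # about 50x
cheaper_vs_lite = ratio(1.806, 1.054)        # about 1.7x
faster_vs_lite = ratio(144, 131)             # about 1.1x, i.e. roughly 10%
```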
Check model cards
Run evaluations and benchmarks
Consider operational constraints
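Running an evaluation can start as small as a labeled sample and an accuracy check. The harness below is a sketch under assumptions: `classify` stands for any callable that wraps a model (not shown here), and `keyword_baseline` is a hypothetical rule-based stand-in used only to make the example runnable.

```python
from typing import Callable


def evaluate(classify: Callable[[str], str],
             labeled: list[tuple[str, str]]) -> dict:
    """Score a classifier on (text, expected_label) pairs and keep the misses."""
    correct = 0
    errors = []
    for text, expected in labeled:
        predicted = classify(text)
        if predicted == expected:
            correct += 1
        else:
            errors.append((text, expected, predicted))
    accuracy = correct / len(labeled) if labeled else 0.0
    return {"accuracy": accuracy, "errors": errors}


def keyword_baseline(text: str) -> str:
    """Hypothetical rule-based stand-in for a real model call."""
    lowered = text.lower()
    if "unsubscribe" in lowered or "% off" in lowered:
        return "Marketing"
    if "?" in text:
        return "Needs Reply"
    return "Notifications"


SAMPLE = [
    ("Can you send the report by Friday?", "Needs Reply"),
    ("Your build finished successfully.", "Notifications"),
    ("50% off this week, unsubscribe here", "Marketing"),
    ("Please confirm the meeting time.", "Needs Reply"),
]

report = evaluate(keyword_baseline, SAMPLE)
```

Swapping `keyword_baseline` for a wrapper around each candidate model gives comparable accuracy numbers on the same sample, and the `errors` list shows which emails to inspect when tuning the prompt.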
https://www.anthropic.com/research/building-effective-agents
AI is the new electricity and will transform and improve nearly all areas of human lives.
– Dr. Andrew Ng
DeepLearning.AI
Thank you for coming, get in touch!
I’m always happy to connect with the AWS community, chat about new ideas, and offer guidance. Got feedback? Questions? Or just want to say hi? I’m down for a matcha latte anytime!
Ivan Casco
linktr.ee/icasco