Beyond PoCs

Building Production-Ready GenAI

on AWS

Ivan Casco - linktr.ee/icasco

COM202 - AWS Summit Amsterdam 2025

Over 97% have difficulties showing GenAI’s business value

67% couldn't move half of their GenAI pilots into production

Informatica’s CDO Insights 2025 survey

87% of organizations adopting GenAI expect
increased investment by 2025

Reliability & Trust

Reliability of the results
Lack of trust in data quality
Issues scaling in Production
Model drift

Security & Privacy

Responsible AI Usage
Cybersecurity threats
Data Privacy
Regulatory Compliance

Costs & ROI

Unclear ROI
High Infrastructure cost
Complex operations
Lack of Skills

88% of AI pilots fail to reach production (33 PoCs ➡ 4 Production)

IDC CIO Playbook 2025 Survey

About me - Ivan Casco

🇪🇸 ✈️ 🇮🇪
Principal Solutions Architect
@ StratusGrid
10+yr building Cloud Solutions

All opinions expressed are my own.

Real-World Applications of AI

Intelligent Agents
Problem Solving
Knowledge, reasoning, and planning
Uncertainty
Learning
Communicating, perceiving, and acting

⚠️ 1152 pages

Agenda

Can GenAI Help Solve My Problem?

From PoC to Production: Blueprint

Best Practices and Patterns

Can GenAI Help Solve My Problem?

Feeling Overwhelmed?

GenAI can help!

AI Powered Efficiency

Automated Categorization
Intelligent Summaries
Draft Replies

Saves me over 5 hours a week on average!

〞

A problem well-stated is a problem half-solved.

– Charles Kettering
Head of Research at General Motors from 1920-1947

Goals
Desired Outcomes
Domain Complexity
Data Requirements
Resource Constraints

Understanding
the Problem

One Component at a Time

Classify in many categories
- Needs Reply
- Notifications
- Marketing
Summarize multiple emails in timeframe
Draft a reply
- Sounds like the user

Breaking down the Features

Open-ended text understanding
Complex classification
Generative responses
Adaptive and Extensible System

GenAI

Well-defined rules
Low error tolerance
Prediction/forecasting

Something else

Is GenAI the right tool?

Quick Prototype in Playground

Evaluate the Results

It did work...

right?

From PoC to Production: Blueprint

GenAI App
Development

Collect Data
- 100,000 emails
Data Pre-Processing
- Remove duplicates
- Fill missing values
- Label data
Analyze
- Data quality metrics
- Validation

Data Processing

Choosing a Model

Claude 3.7 on Amazon Bedrock

Hybrid Reasoning: standard mode and reasoning.
Adjustable Reasoning Budget: more control.
Longer Outputs: Up to 128K tokens, 15x > than 3.5 Sonnet
State-of-the Art for Coding: 70.3% SWE-bench Verified

Available in the US (N. Virginia, Ohio, Oregon) and the EU (Ireland, Frankfurt, Paris, Stockholm)

I show a video with Nova Micro with unoptimized prompt, then optimized, compare results

Amazon Nova: Best Price Performance

Model Pricing Overview

Model	Cost 1M Input Tokens	Cost 1M Output Tokens
Claude 3.7	$3.00	$15.00
Nova Lite	$0.06	$0.24
Nova Micro	$0.035	$0.140

Prompting Demo Results

Model	Tokens	Cost	Latency ms
Claude 3.7 + Reasoning	656	$0.024798	6,541
Claude 3.7	322	$0.006747	664
Nova Lite	298	$0.000080	144
Nova Micro	298	$0.000046	131

Prompting Demo Results

Model	Tokens	Cost / 100K	Cum. Latency
Claude 3.7 + Reasoning	65.6 M	$570.00	7.57 days
Claude 3.7	32.2 M	$102.60	18.4 hours
Nova Lite	29.8 M	$1.806	4 hours
Nova Micro	29.8 M	$1.054	3.63 hours

Imagine we have 100,000 emails to process

Prompting Demo Results

Model	Tokens	Cost / 100K	Cum. Latency
Claude 3.7 + Reasoning	65.6 M	$570.00	7.57 days
Claude 3.7	32.2 M	$102.60	18.4 hours
Nova Lite	29.8 M	$1.806	4 hours
Nova Micro	29.8 M	$1.054	3.63 hours

Nova Micro vs Claude 3.7 + R
540x Cheaper

50x Faster

Nova Micro vs Nova Lite
1.7x Cheaper

10% Faster

Check model cards
Run evaluations and benchmarks
Consider operational constraints

Do

Pick based on hype
Neglect data privacy and regulatory reqs
Use just one for all problems
Start with fine-tuning

Don't

Choosing the right Model

Zero-shot: perform a task with no examples
Few-shot: give a few examples
Chain-of-Thought (CoT): intermediate reasoning steps or ask to think step by step
Meta Prompting:
- High level instructions
- Provide structure
- Abstract examples

Prompting Techniques

Prompt Optimization

XML Syntax Delimiters
Be Specific
Provide Context
- Get in the LLM shoes
Keep it simple
Handle edge cases in Guardrails
Don't forget Evals!

Always Perform Evaluations

Key Success Metrics
Test Driven Development
- Models
- Prompts
- Retrieval-Augmented Generation
- Agents
Automated or Humans

Security and Guardrails

Security

Red-Teaming LLMs
- Privacy violations
- Prompt injections
- Jailbreaking
- Unwanted content
Remember best practices:
- OWASP LLM Top 10
- Secure coding, input validation
- Continuous testing and monitoring

Guardrails

Check every output
Toxicity: Content filter
Data Leakage: Sensitive information (PII)
Hallucination:
- Grounding
- Relevance
- Automated Reasoning checks
Edge cases
Test the guardrails too, Evals!

GenAI Observability

Real-time monitoring
Quality metrics & Guardrails
Logging
End to end Tracing
Alerting
Add metadata!
- Version / Commit

Feedback Loop

User Input & Feedback

Prompt optimization
Fine-tuning
Data augmentation

Refinement

Evaluation & Benchmark

Deployment

Version Control
- Pin the versions!
Continuous Integration
- Automated testing
Continuous Deployment
Infrastructure as Code
Human in the Loop

GenAI Ops

Best Practices and Patterns

https://www.anthropic.com/research/building-effective-agents

Building Effective Agents

Make sure you're solving a real business need
Break down the problem first, tackle one component at a time
Start simple, then move to more complex
Make sure you use the right tool for the job
Evals, Evals, Evals: scientific rigor & engineering excellence
Quality Data & Feedback loop are the foundation

My GenAI Principles

〞

AI is the new electricity and will transform and improve nearly all areas of human lives.

– Dr. Andrew Ng
DeepLearning.AI

AWS Community

Thank you for coming, get in touch!

I’m always happy to connect with the AWS community, chat about new ideas, and offer guidance. Got feedback? Questions? Or just want to say hi? I'm down for a matcha latte anytime!

Ivan Casco
linktr.ee/icasco

Beyond PoCs: Building Production-Ready GenAI on AWS - AWS Summit Amsterdam 2025 - COM202

By Ivan Casco

Beyond PoCs: Building Production-Ready GenAI on AWS - AWS Summit Amsterdam 2025 - COM202

Over 80% of AI projects fail and never reach production at scale. Break the cycle with real-world tactics and patterns for deploying generative AI solutions on AWS. In this session, discover where GenAI shines (and doesn't), learn prompting techniques, how to leverage Amazon Bedrock, and choose the right approach to solve your business problem. We'll highlight key architecture decisions and how to avoid common pitfalls around cost optimization, scaling, and reliability drawn from real-world deployments. Walk away with practical techniques to design resilient, cost-effective architectures and a clear roadmap for transforming POCs into production-ready GenAI solutions on AWS that scale.

Ivan Casco PRO

AWS Community Builder, Ambassador, and 14x AWS certified with 10+ years in startups, banking, media, and consulting. I specialize in high-scale solutions that balance innovation and business needs and share insights on Cloud, DevOps, and GenAI.

Beyond PoCs

Reliability & Trust

Security & Privacy

Costs & ROI

About me - Ivan Casco

Agenda

Can GenAI Help Solve My Problem?

Feeling Overwhelmed?

GenAI can help!

AI Powered Efficiency

〞

Understanding the Problem

One Component at a Time

Breaking down the Features

GenAI

Something else

Is GenAI the right tool?

Quick Prototype in Playground

Quick Prototype in Playground

Evaluate the Results

From PoC to Production: Blueprint

Data Processing

Choosing a Model

Claude 3.7 on Amazon Bedrock

Amazon Nova: Best Price Performance

Model Pricing Overview

Prompting Demo Results

Prompting Demo Results

Prompting Demo Results

Do

Don't

Choosing the right Model

Prompting Techniques

Prompt Optimization

Always Perform Evaluations

Security and Guardrails

Security

Guardrails

GenAI Observability

Feedback Loop

User Input & Feedback

Refinement

Evaluation & Benchmark

Deployment

GenAI Ops

Best Practices and Patterns

Building Effective Agents

My GenAI Principles

〞

AWS Community

Beyond PoCs: Building Production-Ready GenAI on AWS - AWS Summit Amsterdam 2025 - COM202

Beyond PoCs: Building Production-Ready GenAI on AWS - AWS Summit Amsterdam 2025 - COM202

Ivan Casco PRO

Understanding
the Problem