Beyond PoCs

Building Production-Ready GenAI

on AWS

Ivan Casco - linktr.ee/icasco

COM202 - AWS Summit Amsterdam 2025

Over 97% have difficulties showing GenAI’s business value

67% couldn't move half of their GenAI pilots into production

Informatica’s CDO Insights 2025 survey

87% of organizations adopting GenAI expect
increased investment by 2025

Reliability & Trust

  • Reliability of the results
  • Lack of trust in data quality
  • Issues scaling in Production
  • Model drift

Security & Privacy

  • Responsible AI Usage
  • Cybersecurity threats
  • Data Privacy
  • Regulatory Compliance

Costs & ROI

  • Unclear ROI
  • High Infrastructure cost
  • Complex operations
  • Lack of Skills

88% of AI pilots fail to reach production (33 PoCs  4 Production) 

IDC CIO Playbook 2025 Survey

About me - Ivan Casco

  • 🇪🇸 ✈️ 🇮🇪

  • Principal Solutions Architect
    @ StratusGrid

  • 10+yr building Cloud Solutions

All opinions expressed are my own.

Real-World Applications of AI

  • Intelligent Agents
  • Problem Solving
  • Knowledge, reasoning, and planning
  • Uncertainty
  • Learning
  • Communicating, perceiving, and acting

⚠️ 1152 pages

Agenda

Can GenAI Help Solve My Problem?

Feeling Overwhelmed?

GenAI can help!

AI Powered Efficiency

  • Automated Categorization
  • Intelligent Summaries
  • Draft Replies

Saves me over 5 hours a week on average!

A problem well-stated is a problem half-solved.

– Charles Kettering
Head of Research at General Motors from 1920-1947

  • Goals

  • Desired Outcomes

  • Domain Complexity

  • Data Requirements

  • Resource Constraints

Understanding
the Problem

One Component at a Time

  • Classify in many categories

    • Needs Reply

    • Notifications

    • Marketing

  • Summarize multiple emails in timeframe

  • Draft a reply

    • Sounds like the user

Breaking down the Features

  • Open-ended text understanding

  • Complex classification

  • Generative responses

  • Adaptive and Extensible System

GenAI

  • Well-defined rules
  • Low error tolerance
  • Prediction/forecasting

Something else

Is GenAI the right tool?

Quick Prototype in Playground

Quick Prototype in Playground

Evaluate the Results

It did work...

right?

From PoC to Production:  Blueprint

GenAI App
Development

  • Collect Data

    • ​100,000 emails

  • Data Pre-Processing

    • ​Remove duplicates

    • Fill missing values

    • Label data

  • Analyze

    • Data quality metrics

    • Validation

Data Processing

Choosing a Model

Claude 3.7 on Amazon Bedrock

  • Hybrid Reasoning: standard mode and reasoning.
  • Adjustable Reasoning Budget: more control.
  • Longer Outputs: Up to 128K tokens, 15x > than 3.5 Sonnet
  • State-of-the Art for Coding: 70.3% SWE-bench Verified

Available in the US (N. Virginia, Ohio, Oregon) and the EU (Ireland, Frankfurt, Paris, Stockholm)

I show a video with Nova Micro with unoptimized prompt, then optimized, compare results

Amazon Nova: Best Price Performance

Model Pricing Overview

Model Cost 1M Input Tokens Cost 1M Output Tokens
Claude 3.7 $3.00 $15.00
Nova Lite $0.06 $0.24
Nova Micro $0.035 $0.140

Prompting Demo Results

Model Tokens Cost Latency ms
Claude 3.7 + Reasoning 656 $0.024798 6,541
Claude 3.7 322 $0.006747 664
Nova Lite 298 $0.000080 144
Nova Micro 298 $0.000046 131

Prompting Demo Results

Model Tokens Cost / 100K Cum. Latency
Claude 3.7 + Reasoning 65.6 M $570.00 7.57 days
Claude 3.7 32.2 M $102.60 18.4 hours
Nova Lite 29.8 M $1.806 4 hours
Nova Micro 29.8 M $1.054 3.63 hours

Imagine we have 100,000 emails to process

Prompting Demo Results

Model Tokens Cost / 100K Cum. Latency
Claude 3.7 + Reasoning 65.6 M $570.00 7.57 days
Claude 3.7 32.2 M $102.60 18.4 hours
Nova Lite 29.8 M $1.806 4 hours
Nova Micro 29.8 M $1.054 3.63 hours

Nova Micro vs Claude 3.7 + R
540x
 Cheaper

50x Faster

Nova Micro vs Nova Lite
1.7x
 Cheaper

10% Faster

  • Check model cards

  • Run evaluations and benchmarks

  • Consider operational constraints

Do

  • Pick based on hype
  • Neglect data privacy and regulatory reqs
  • Use just one for all problems
  • Start with fine-tuning

Don't

Choosing the right Model

  • Zero-shot: perform a task with no examples
  • Few-shot: give a few examples
  • Chain-of-Thought (CoT): intermediate reasoning steps or ask to think step by step
  • Meta Prompting:
    • High level instructions
    • Provide structure
    • Abstract examples

Prompting Techniques

Prompt Optimization

  • XML Syntax Delimiters 
  • Be Specific
  • Provide Context
    • Get in the LLM shoes
  • Keep it simple
  • Handle edge cases in Guardrails
  • Don't forget Evals!

Always Perform Evaluations

  • Key Success Metrics
  • Test Driven Development
    • Models
    • Prompts
    • Retrieval-Augmented Generation
    • Agents
  • Automated or Humans

Security and Guardrails

Security

  • Red-Teaming LLMs
    • Privacy violations
    • Prompt injections
    • Jailbreaking
    • Unwanted content
  • Remember best practices:
    • OWASP LLM Top 10
    • Secure coding, input validation
    • Continuous testing and monitoring

Guardrails

  • Check every output
  • Toxicity: Content filter
  • Data Leakage: Sensitive information (PII)
  • Hallucination:
    • Grounding
    • Relevance
    • Automated Reasoning checks
  • Edge cases
  • Test the guardrails too, Evals!

GenAI Observability

  • Real-time monitoring
  • Quality metrics & Guardrails
  • Logging
  • End to end Tracing
  • Alerting
  • Add metadata!
    • Version / Commit

Feedback Loop

User Input & Feedback

  • Prompt optimization
  • Fine-tuning
  • Data augmentation

Refinement

Evaluation & Benchmark

Deployment

  • Version Control
    • Pin the versions!
  • Continuous Integration
    • Automated testing
  • Continuous Deployment
  • Infrastructure as Code
  • Human in the Loop

GenAI Ops

Best Practices and Patterns

https://www.anthropic.com/research/building-effective-agents

Building Effective Agents

  1. Make sure you're solving a real business need
  2. Break down the problem first, tackle one component at a time
  3. Start simple, then move to more complex
  4. Make sure you use the right tool for the job
  5. Evals, Evals, Evals: scientific rigor & engineering excellence
  6. Quality Data & Feedback loop are the foundation

My GenAI Principles

AI is the new electricity and will transform and improve nearly all areas of human lives.

– Dr. Andrew Ng
DeepLearning.AI

AWS Community

Thank you for coming, get in touch!

I’m always happy to connect with the AWS community, chat about new ideas, and offer guidance. Got feedback? Questions? Or just want to say hi? I'm down for a matcha latte anytime!

Ivan Casco
linktr.ee/icasco

Beyond PoCs: Building Production-Ready GenAI on AWS - AWS Summit Amsterdam 2025 - COM202

By Ivan Casco

Beyond PoCs: Building Production-Ready GenAI on AWS - AWS Summit Amsterdam 2025 - COM202

Over 80% of AI projects fail and never reach production at scale. Break the cycle with real-world tactics and patterns for deploying generative AI solutions on AWS. In this session, discover where GenAI shines (and doesn't), learn prompting techniques, how to leverage Amazon Bedrock, and choose the right approach to solve your business problem. We'll highlight key architecture decisions and how to avoid common pitfalls around cost optimization, scaling, and reliability drawn from real-world deployments. Walk away with practical techniques to design resilient, cost-effective architectures and a clear roadmap for transforming POCs into production-ready GenAI solutions on AWS that scale.

  • 261