GENAI: PROTOTYPE TO PRODUCTION
Chris Price & Fanis Vlachos
A Typical GenAI Project (Nightmare)





Problem statement
- LLM
- Booking data extractor
- Email sender


Booking data extractor, Email sender, LLM
+ Evaluation Strategy
- Evaluation Criteria
Start building
- LLM
- Booking data extractor
- Email sender

Breaking down complexity
- LLM
  - Generate itinerary
  - Draft email
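Splitting the flow into two LLM steps lets each prompt be written and tested on its own. Below is a minimal sketch of that decomposition. It assumes a hypothetical `callLLM(prompt)` function that returns the model's text; the function is passed in as a parameter so that a stub can stand in for a real model. The names `generateItinerary`, `draftEmail` and `planTrip` are illustrative assumptions, as is the itinerary shape with an `items` array. The email prompt is abbreviated.

```javascript
// Hypothetical two-step pipeline: each LLM call is its own function, so each can be tested alone.
// `callLLM` is injected, so tests can pass a stub instead of a real model client.

function generateItinerary(tripPlans, callLLM) {
  const prompt =
    `Create a trip itinerary using the following details: ${JSON.stringify(tripPlans)}. ` +
    'The itinerary should be formatted as a JSON string.';
  // Step 1 must return parseable JSON; JSON.parse throws a SyntaxError otherwise.
  return JSON.parse(callLLM(prompt));
}

function draftEmail(itinerary, callLLM) {
  // Abbreviated version of the email-drafting prompt.
  const prompt =
    'You are working for a travel agency. Based on this trip itinerary ' +
    `${JSON.stringify(itinerary)}, write an email to the customer, presenting the suggested trip plan.`;
  return callLLM(prompt);
}

function planTrip(tripPlans, callLLM) {
  const itinerary = generateItinerary(tripPlans, callLLM);
  return { itinerary, email: draftEmail(itinerary, callLLM) };
}

// Stub model: returns canned output depending on which step is calling it.
function stubLLM(prompt) {
  if (prompt.startsWith('Create a trip itinerary')) {
    return '{"items": [{"title": "Statue of Liberty tour"}]}';
  }
  return 'Dear John, here is your plan:\n- Statue of Liberty tour (book with us!)';
}

const result = planTrip({ name: 'John Doe', destination: 'New York' }, stubLLM);
console.log(result.itinerary.items.length); // 1
```

Because the model is injected, each step can be checked in isolation. For example, `generateItinerary` alone can be tested against a stub that returns malformed JSON, without ever drafting an email.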
Systematic evaluation
- Test-driven prompt engineering
- Non-deterministic outputs
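A model can return different outputs for the same prompt, so a single passing run proves little. One way to apply test-driven prompt engineering to non-deterministic output is to repeat each test and compare the pass rate with a threshold. The sketch below is minimal. Everything in it is hypothetical: `passRate`, the cycling stub that stands in for a model, the destination check, and the 0.9 threshold.

```javascript
// Hypothetical helper: run a generator several times and report the share of passing outputs.
function passRate(generate, assertion, runs) {
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    if (assertion(generate())) passes++;
  }
  return passes / runs;
}

// Deterministic stand-in for a non-deterministic model: cycles through canned outputs.
function makeCyclingStub(outputs) {
  let i = 0;
  return () => outputs[i++ % outputs.length];
}

// Illustrative assertion: the draft must mention the destination.
const mentionsDestination = (text) => text.includes('New York');

// One canned output in four forgets the destination.
const stub = makeCyclingStub([
  'Your New York trip starts on 1 August.',
  'Welcome to New York!',
  'Here is your New York itinerary.',
  'Here is your itinerary.',
]);

const rate = passRate(stub, mentionsDestination, 100);
const THRESHOLD = 0.9; // illustrative acceptance threshold
console.log(rate); // 0.75
console.log(rate >= THRESHOLD ? 'PASS' : 'FAIL'); // FAIL
```

A prompt that passes three runs out of four would look fine in a one-off manual check. Measured over repeated runs, it falls short of the threshold.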
promptfoo
prompts:
  - 'Create a trip itinerary using the following details: {{trip_plans}}. The itinerary should be formatted as a JSON string.'
providers:
  - openai:gpt-4
tests:
  - description: "Output is a valid JSON string"
    vars:
      trip_plans: |
        {
          "name": "John Doe",
          "destination": "New York",
          "arrival": "2024-08-01",
          "departure": "2024-08-04"
        }
    assert:
      - type: is-json
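An `is-json` assertion passes when the model output parses as JSON. The sketch below is a rough standalone equivalent. `isValidJson` is a hypothetical name, and promptfoo's built-in check may differ in its details.

```javascript
// Sketch of the check behind an is-json style assertion: does the text parse as JSON?
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

const good = '{"name": "John Doe", "destination": "New York"}';
// A trailing comma after the last property is not valid JSON.
const trailingComma = '{"name": "John Doe", "destination": "New York",}';
// Models sometimes wrap the JSON in prose, which also fails to parse.
const withProse = 'Sure! {"name": "John Doe"}';

console.log(isValidJson(good)); // true
console.log(isValidJson(trailingComma)); // false
console.log(isValidJson(withProse)); // false
```

Both the trailing-comma case and the prose case fail. A prompt that sometimes produces either one will therefore show up as failing test runs rather than slipping through.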
prompts:
  - "You are working for a travel agency. Based on this trip itinerary {{itinerary}}, write an email to the customer, presenting the suggested trip plan. The items on the itinerary should be formatted in a list. With each item on the list, invite the customer to use our booking services."
# ...
tests:
  - description: "Booking services are suggested for each item on the list"
    assert:
      - type: llm-rubric
        value: "Does each item on the list promote the booking services?"
  - description: "All items from the itinerary are included in the email"
    assert:
      - type: javascript
        value: file://./emailContainsAllItems.js
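A `javascript` assertion loaded with `file://` points at a file that exports a function. promptfoo calls that function with the model output and a context object whose `vars` holds the test variables. The talk does not show `emailContainsAllItems.js`, so the sketch below is one possible version. It assumes (hypothetically) that the `itinerary` variable is a JSON string with an `items` array, where each item has a `title` field. The check passes only if every title appears in the email. It returns an object with `pass`, `score` and `reason`.

```javascript
// emailContainsAllItems.js: hypothetical sketch; the real file is not shown in the talk.
// Assumes context.vars.itinerary is a JSON string shaped like:
//   { "items": [ { "title": "..." }, ... ] }
function emailContainsAllItems(output, context) {
  const itinerary = JSON.parse(context.vars.itinerary);
  const email = output.toLowerCase();

  // Collect itinerary items whose title never appears in the email (case-insensitive).
  const missing = itinerary.items
    .map((item) => item.title)
    .filter((title) => !email.includes(title.toLowerCase()));

  const total = itinerary.items.length;
  return {
    pass: missing.length === 0,
    // Fraction of itinerary items that made it into the email.
    score: total === 0 ? 1 : (total - missing.length) / total,
    reason:
      missing.length === 0
        ? 'All itinerary items are included in the email'
        : `Missing items: ${missing.join(', ')}`,
  };
}

module.exports = emailContainsAllItems;
```

Consider an itinerary with the items "Statue of Liberty" and "Broadway show", and an email that mentions only the first. The function returns `pass: false`, `score: 0.5` and the reason `Missing items: Broadway show`. Plain substring matching is a deliberate simplification here: it misses paraphrased items, which is one reason such a check complements, rather than replaces, the `llm-rubric` test.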
Other Benefits of promptfoo
- Caching
- Cost-tracking
- Local history
- Concurrency
- CI/CD integration
Conclusion
The QA
- "Shift-left"
The Product Owner
- Minimise scope creep
- Report transparently
The Dev
- Tackle complexity early
- Break down complexity
- Automate testing
Succeeding with GenAI
- Define an evaluation strategy
- Focus on the GenAI part
- Utilise tools like promptfoo