GENAI: PROTOTYPE TO PRODUCTION

Chris Price & Fanis Vlachos

A Typical GenAI Project

(Nightmare)

Problem statement

(Diagram: LLM + booking data extractor + email sender)

Evaluation Strategy

Evaluation Criteria

Start building

(Diagram: LLM + booking data extractor + email sender)

Breaking down complexity

(Diagram: LLM split into two steps: generate itinerary → draft email)
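The split can be sketched as two chained LLM calls, each small enough to test on its own. This is a minimal sketch, not the authors' implementation: `callLLM` is a hypothetical async function (prompt in, text out) standing in for whichever model client is used, and `render`, `generateItinerary`, `draftEmail` and `planTripEmail` are illustrative names. The prompt texts are the ones from the promptfoo configs in this deck.

```javascript
// Prompt texts taken from the promptfoo configs in this deck.
const ITINERARY_PROMPT =
  "Create a trip itinerary using the following details: {{trip_plans}}. " +
  "The itinerary should be formatted as a JSON string.";
const EMAIL_PROMPT =
  "You are working for a travel agency. Based on this trip itinerary {{itinerary}}, " +
  "write an email to the customer, presenting the suggested trip plan. " +
  "The items on the itinerary should be formatted in a list. " +
  "With each item on the list, invite the customer to use our booking services.";

// Substitute {{name}} placeholders, mirroring the template syntax in the configs.
function render(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, name) => String(vars[name]));
}

// Step 1: turn the raw trip plans into a JSON itinerary.
async function generateItinerary(callLLM, tripPlans) {
  return callLLM(render(ITINERARY_PROMPT, { trip_plans: tripPlans }));
}

// Step 2: turn the itinerary into a customer email.
async function draftEmail(callLLM, itinerary) {
  return callLLM(render(EMAIL_PROMPT, { itinerary }));
}

// HYPOTHETICAL: callLLM is any async (prompt) => string model client.
async function planTripEmail(callLLM, tripPlans) {
  const itinerary = await generateItinerary(callLLM, tripPlans);
  return draftEmail(callLLM, itinerary);
}
```

Because each step is a separate call with its own prompt, each can be evaluated in isolation, which is what the two promptfoo configs in this deck do.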

Systematic evaluation

Test-driven prompt engineering

Non-deterministic
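Because the same prompt can produce different outputs on different runs, a single passing run says little. One way to reason about this is to repeat a check and look at its pass rate. The sketch assumes a caller-supplied async `generate` function and a boolean `check` (both hypothetical); `isJson` is a simple JSON-validity check, similar in intent to promptfoo's `is-json` assertion, not its implementation.

```javascript
// Estimate how reliably a non-deterministic step passes a check.
// HYPOTHETICAL: generate is an async () => string (e.g. one LLM call),
// check is a (string) => boolean assertion on its output.
async function passRate(generate, check, runs) {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError("runs must be a positive integer");
  }
  let passed = 0;
  for (let i = 0; i < runs; i++) {
    const output = await generate(); // runs sequentially; each output may differ
    if (check(output)) passed++;
  }
  return passed / runs;
}

// Example check: does the output parse as JSON?
function isJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}
```

The resulting rate could then be compared against a threshold agreed as part of the evaluation criteria, instead of expecting every single run to pass.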

Promptfoo

prompts:
  - 'Create a trip itinerary using the following details: {{trip_plans}}. The itinerary should be formatted as a JSON string.'

providers:
  - openai:gpt-4

tests:
  - description: "Output is a valid JSON string"
    vars:
      trip_plans: |
        {
          "name": "John Doe",
          "destination": "New York",
          "arrival": "2024-08-01",
          "departure": "2024-08-04"
        }
    assert:
      - type: is-json

prompts:
  - "You are working for a travel agency. Based on this trip itinerary {{itinerary}}, write an email to the customer, presenting the suggested trip plan. The items on the itinerary should be formatted in a list. With each item on the list, invite the customer to use our booking services."

# ...

tests:
  - description: "Booking services are suggested for each item on the list"
    assert:
      - type: llm-rubric
        value: "Does each item on the list promote the booking services?"

  - description: "All items from the itinerary are included in the email"
    assert:
      - type: javascript
        value: file://./emailContainsAllItems.js
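The deck does not show the contents of `emailContainsAllItems.js`. A possible version is sketched here, assuming promptfoo's convention that a `file://` JavaScript assertion exports a function taking `(output, context)` and returning a boolean, a score, or a `{ pass, score, reason }` object, with test variables available as `context.vars`. The itinerary shape (a JSON array of objects with an `activity` string field) is a hypothetical assumption; adapt it to whatever the itinerary prompt actually returns.

```javascript
// emailContainsAllItems.js: sketch of a promptfoo JavaScript assertion.
// ASSUMPTION (hypothetical shape): context.vars.itinerary is a JSON string
// holding an array of objects, each with an "activity" string field.
function emailContainsAllItems(output, context) {
  const items = JSON.parse(context.vars.itinerary);
  const email = output.toLowerCase();

  // Collect every itinerary activity the email never mentions (case-insensitive).
  const missing = items
    .map((item) => item.activity)
    .filter((activity) => !email.includes(activity.toLowerCase()));

  const found = items.length - missing.length;
  return {
    pass: missing.length === 0,
    // Fraction of itinerary items mentioned in the email (1 for an empty itinerary).
    score: items.length === 0 ? 1 : found / items.length,
    reason:
      missing.length === 0
        ? "All itinerary items appear in the email"
        : `Missing items: ${missing.join(", ")}`,
  };
}

module.exports = emailContainsAllItems;
```

Returning a fractional score rather than a bare boolean makes partial failures visible in the results, which helps when a prompt drops only one or two items.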

Other Benefits of promptfoo

  • Caching
  • Cost-tracking
  • Local history
  • Concurrency
  • CI/CD integration

Conclusion

The QA

  • "Shift-left"

The Product Owner

  • Minimise scope creep
  • Report transparently

The Dev

  • Tackle complexity early
  • Break down complexity
  • Automate testing

Succeeding with GenAI

  • Define an evaluation strategy
  • Focus on the GenAI part
  • Utilise tools like promptfoo
