DS Ops day Nov 2024
Go drink too much and have fun
Recall:
The LLM's knowledge is frozen at training time; that means we have to inject information it doesn't know if we want it to do well
In AVA right now we use "few-shot learning": a fixed set of examples baked into the prompt
Instead, inject only the examples relevant to the query (see the sketch below)
Knowledge is now decoupled from the prompt, so we can add/update it at any time
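To make the decoupling concrete, a minimal sketch (hypothetical Q/A examples and a stub `select_examples`; not AVA's actual code):

```python
# Dynamic few-shot prompting: instead of a fixed set of examples baked into
# the prompt, pick examples per query. `select_examples` is a placeholder
# for the retrieval step covered next.

EXAMPLES = [  # hypothetical knowledge base of Q/A pairs
    {"q": "How do I reset my password?", "a": "Settings > Security > Reset."},
    {"q": "How do I export a report?", "a": "Use Export on the report page."},
]

def select_examples(query: str, k: int = 2) -> list[dict]:
    return EXAMPLES[:k]  # placeholder; see the embedding retrieval sketch below

def build_prompt(query: str) -> str:
    shots = "\n\n".join(f"Q: {ex['q']}\nA: {ex['a']}" for ex in select_examples(query))
    return f"Answer in the style of the examples.\n\n{shots}\n\nQ: {query}\nA:"

print(build_prompt("How do I change my email address?"))
```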
How do we pick the right examples?
Lots of ways to do this:
Idea is you have an embedding model which: maps text to vectors so that similar meanings land close together. Embed the candidate examples once up front, embed the incoming query, and take the nearest neighbours as your examples
quick demo
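A sketch of what the demo shows, assuming the sentence-transformers package and a small open model (any embedding endpoint slots in the same way):

```python
# Embedding-based example selection: embed the candidate examples once,
# embed each query, return the k nearest neighbours by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

examples = [
    "How do I reset my password?",
    "How do I export a report?",
    "Why was my payment declined?",
]
example_vecs = model.encode(examples, normalize_embeddings=True)  # embed up front

def top_k(query: str, k: int = 2) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = example_vecs @ q  # cosine similarity, since vectors are normalized
    best = np.argsort(scores)[::-1][:k]
    return [examples[i] for i in best]

print(top_k("I can't log in to my account"))
```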
Pros: only relevant examples go in the prompt (smaller, cheaper, less noise); knowledge can be added/updated without retraining or rewriting prompts
Cons: adds a retrieval step that can miss; quality now depends on the embedding model; one more component to build, run and evaluate
How should we evaluate this?
We should evaluate end-to-end; this is what the user actually sees
We also want to evaluate the components; this is how we spot weak points and improve: e.g. is retrieval actually returning the right examples? (see the sketch below)
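One way to check the retrieval component in isolation: given labelled (query, expected example) pairs, measure how often the expected example shows up in the top-k. `top_k` carries over from the sketch above, and the labelled pairs are assumptions, not our real eval set:

```python
# recall@k for the retrieval component: did the known-good example
# come back for each labelled query?

labelled = [
    ("I can't log in to my account", "How do I reset my password?"),
    ("my card was rejected", "Why was my payment declined?"),
]

def recall_at_k(k: int = 2) -> float:
    hits = sum(expected in top_k(query, k) for query, expected in labelled)
    return hits / len(labelled)

print(f"recall@2 = {recall_at_k(2):.2f}")
```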
Our systems use code to decide what happens when; they use LLMs as one component in that system
We can invert this and give an LLM system "agency", i.e. control over what is called when - these are Agents
Agents use LLMs to make decisions, including which other LLMs or code to call
Typically you chain these together into a graph of Agents ("Agentic")
Let's dive into tool use
With Agents we can call functions (tools) directly; the LLM decides when it needs to do this to complete its task
note: you don't need Agents to call tools - some LLMs support this natively - but Langchain (and others) do this and more for you (a library-free sketch of the loop follows the demo)
quick demo
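A library-free sketch of the loop the demo shows: the LLM (stubbed here as `fake_llm`, with a hypothetical `get_weather` tool) replies either with a plain answer or with a structured tool call, and our code dispatches it. Tool-calling LLM APIs and Langchain wrap exactly this loop:

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # hypothetical tool

TOOLS = {"get_weather": get_weather}

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: emit a tool call first, then answer
    # once the tool result has been fed back into the prompt.
    if "Tool result:" in prompt:
        return "It's sunny in London today."
    return json.dumps({"tool": "get_weather", "args": {"city": "London"}})

def run(prompt: str) -> str:
    reply = fake_llm(prompt)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain answer, no tool needed
    result = TOOLS[call["tool"]](**call["args"])  # dispatch the tool call
    return fake_llm(f"{prompt}\nTool result: {result}")

print(run("What's the weather in London?"))
```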
How to evaluate?
End to end: as above, judge the final answer the user sees
Tool use: did the model pick the right tool, with the right arguments, at the right time? (see the sketch below)
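A sketch of that component-level check. `fake_llm` and the single case here carry over from the sketch above; a real eval set would cover every tool, plus prompts where no tool should be called:

```python
# Tool-use eval: for labelled prompts, check the model chose the
# expected tool with the expected arguments.
import json

cases = [("What's the weather in London?", "get_weather", {"city": "London"})]

def tool_choice_accuracy() -> float:
    hits = 0
    for prompt, tool, args in cases:
        call = json.loads(fake_llm(prompt))
        hits += call["tool"] == tool and call["args"] == args
    return hits / len(cases)

print(f"tool-choice accuracy = {tool_choice_accuracy():.2f}")
```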
Idea is to create a graph of agents which talk to one another to solve problems
These graphs can be cyclic, so if a result isn't good enough the system can dynamically retry (see the sketch below)
Agents typically have access to tools (e.g. we can build them to call KAPI) and RAG
End-to-end evaluation becomes very important
Agents can and should be evaluated individually
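A minimal, library-free sketch of the cyclic pattern (stub agents and a hypothetical task; agent frameworks do this wiring for you, plus state, tools and RAG):

```python
# A two-node cyclic graph: a worker agent drafts, a critic agent scores,
# and we loop back until the draft passes or we hit a retry cap.

def worker(task: str, feedback: str = "") -> str:
    return f"draft for {task!r}" + (" (revised)" if feedback else "")  # stub agent

def critic(draft: str) -> tuple[bool, str]:
    # Stub quality gate; in practice an LLM judge, tests, or schema checks.
    return "revised" in draft, "needs revision"

def run_graph(task: str, max_loops: int = 3) -> str:
    draft, feedback = worker(task), ""
    for _ in range(max_loops):
        ok, feedback = critic(draft)
        if ok:
            break
        draft = worker(task, feedback)  # the cycle: retry with feedback
    return draft

print(run_graph("summarise Q3 incidents"))
```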
Miro