RAG in 2024:

Advancing to Agents

2024-10-30 ODSC West

What are we talking about?

  • What is LlamaIndex
  • Why you should use it
  • What can it do
    • Retrieval augmented generation
    • World class parsing
    • Agents and multi-agent systems

What is LlamaIndex?

Python: docs.llamaindex.ai

TypeScript: ts.llamaindex.ai

LlamaParse

World's best parser of complex documents

Free for 1000 pages/day!

cloud.llamaindex.ai

LlamaCloud

Turn-key RAG API for Enterprises

Available as SaaS or private cloud deployment

1. Sign up:

cloud.llamaindex.ai

2. Get on the waitlist!

bit.ly/llamacloud-waitlist

LlamaHub

llamahub.ai

  • Data loaders
  • Embedding models
  • Vector stores
  • LLMs
  • Agent tools
  • Pre-built strategies
  • More!

Why LlamaIndex?

  • Build faster
  • Skip the boilerplate
  • Avoid early pitfalls
  • Get best practices for free
  • Go from prototype to production

What can LlamaIndex

do for me?

Why RAG

is necessary

How RAG works

Basic RAG pipeline

5 line starter

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()

index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

response = query_engine.query("What did the author do growing up?")

print(response)

npx create-llama

Limitations of RAG

  1. Summarization
  2. Comparison
  3. Multi-part questions

Naive RAG failure points:

RAG is necessary

but not sufficient

Two ways

to improve RAG:

  1. Improve your data
  2. Improve your querying

What is an agent anyway?

  • Semi-autonomous software
  • Accepts a goal
  • Uses tools to achieve that goal
  • Exact steps to resolution not specified

RAG pipeline

⚠️ Single-shot
⚠️ No query understanding/planning
⚠️ No tool use
⚠️ No reflection, error correction
⚠️ No memory (stateless)

Agentic RAG

✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization

From simple to advanced agents

Routing

Conversation memory

Query planning

Tool use

Tools unleash the power of LLMs

Combine agentic strategies

and then go further

  • Routing
  • Memory
  • Planning
  • Tool use

Agentic strategies

  • Multi-turn
  • Reasoning
  • Reflection

Full agent

3 agent

reasoning loops

  1. Sequential
  2. DAG-based
  3. Tree-based

Sequential reasoning

DAG-based reasoning

Self reflection

Tree-based reasoning

Exploration vs exploitation

Workflows

Why workflows?

Workflows primer

from llama_index.llms.openai import OpenAI

class OpenAIGenerator(Workflow):
    @step()
    async def generate(self, ev: StartEvent) -> StopEvent:
        query = ev.get("query")
        llm = OpenAI()
        response = await llm.acomplete(query)
        return StopEvent(result=str(response))

w = OpenAIGenerator(timeout=10, verbose=False)
result = await w.run(query="What's LlamaIndex?")
print(result)

Visualization

draw_all_possible_flows()

Workflows enable arbitrarily complex applications

Multi-agent concierge

Deploying agents

to production

pip install llama-deploy

Agents as microservices

Try out llama-deploy

Recap

  • What is LlamaIndex
    • LlamaCloud, LlamaHub, create-llama
  • Why RAG is necessary
  • How to build RAG in LlamaIndex
  • Limitations of RAG
  • Agentic RAG
    • Routing, memory, planning, tool use
  • Reasoning patterns
    • Sequential, DAG-based, tree-based
  • Workflows
    • Loops, state, customizability
  • Deploying workflows

What's next?

Thanks!

Follow me on BlueSky:

@seldo.com

RAG in 2024: Advancing to Agents (ODSC West)

By seldo

RAG in 2024: Advancing to Agents (ODSC West)

  • 391