Advanced RAG Techniques with LlamaIndex

2024-05-08 Pulumi Webinar

What are we talking about?

What is RAG?
What is LlamaIndex?
The stages of RAG
- Ingestion
- Indexing
- Storing
- Querying
Advanced querying strategies x7
Getting into production

RAG recap

Retrieve most relevant data
Augment query with context
Generate response
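The three steps above can be sketched in plain Python. This is a toy illustration, not LlamaIndex code: the corpus is invented, the retriever is a crude word-overlap ranking, and `generate` is a hypothetical stand-in for a real LLM call.

```python
import re

# Toy corpus; the texts are invented for illustration only.
DOCS = [
    "LlamaIndex is a framework for connecting data to LLMs.",
    "Vector search finds text by meaning rather than exact words.",
    "Pulumi is an infrastructure as code tool.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Retrieve: rank documents by how many query words they share."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def augment(query: str, context: list[str]) -> str:
    """Augment: place the retrieved context ahead of the question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


def generate(prompt: str) -> str:
    """Generate: hypothetical stand-in for an LLM call (no model is used)."""
    return f"[an LLM would answer here, given a {len(prompt)}-character prompt]"


query = "What is vector search?"
context = retrieve(query, DOCS)
prompt = augment(query, context)
print(generate(prompt))
```

Only the retrieved text is sent to the model, which is why the quality of the retrieval step matters so much.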

A solution to limited context windows

You have to be selective

and that's tricky

RAG challenges:

- Accuracy
- Faithfulness
- Recency
- Provenance

How do we do RAG?

1. Keyword search
2. Structured queries
3. Vector search

Vector embeddings

Turning words into numbers

Search by meaning
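A minimal sketch of "search by meaning", assuming hand-made three-dimensional vectors in place of real embeddings (a real embedding model would produce the vectors, with far more dimensions). The sentences and numbers are invented. The point is only that similar meanings sit close together, so the query matches the cat and kitten sentences despite sharing no keywords with them.

```python
import math

# Invented 3-dimensional "embeddings"; a real model would compute these.
EMBEDDINGS = {
    "The cat sat on the mat": [0.9, 0.1, 0.0],
    "A kitten rested on the rug": [0.8, 0.2, 0.1],
    "Quarterly revenue rose 12%": [0.0, 0.1, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector: list[float], top_k: int = 2) -> list[str]:
    """Return the top_k texts whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in EMBEDDINGS.items()
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


# Pretend this is the embedding of "feline on a carpet".
query_vector = [0.85, 0.15, 0.05]
results = search(query_vector)
print(results)
```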

What is LlamaIndex?

llamaindex.ai

OSS libraries in Python and TypeScript
LlamaParse - PDF parsing as a service
LlamaCloud - managed ingestion service

Supported LLMs

5 line starter

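from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every file in ./data, embed and index it, then ask a question
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What's up?")
print(response)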

LlamaHub

llamahub.ai

LlamaParse

part of LlamaCloud

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

parser = LlamaParse(
    result_type="markdown"
)

file_extractor = {".pdf": parser}
reader = SimpleDirectoryReader(
    "./data",
    file_extractor=file_extractor
)
documents = reader.load_data()

cloud.llamaindex.ai

Supported embedding models

OpenAI
Langchain
CohereAI
Qdrant FastEmbed
Gradient
Azure OpenAI

Elasticsearch
Clarifai
LLMRails
Google PaLM
Jina
Voyage

...plus everything on Hugging Face!

Supported Vector databases

Apache Cassandra
Astra DB
Azure Cognitive Search
Azure CosmosDB
BaiduVector DB
ChatGPT Retrieval Plugin
Chroma
DashVector
Databricks
Deeplake
DocArray
DuckDB
DynamoDB
Elasticsearch

FAISS
Jaguar
LanceDB
Lantern
Metal
MongoDB Atlas
MyScale
Milvus / Zilliz
Neo4jVector
OpenSearch
Pinecone
Postgres
pgvecto.rs

Qdrant
Redis
Rockset
Simple
SingleStore
Supabase
Tair
TiDB
TencentVectorDB
Timescale
Typesense
Upstash
Weaviate

Advanced query strategies

SubQuestionQueryEngine

Problems with precision

Small-to-big retrieval

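from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# Assumes `index` was built from nodes that store their surrounding
# sentences under the "window" metadata key
# (for example with SentenceWindowNodeParser)
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
response = query_engine.query(
    "What happened on August 3rd?"
)
print(response)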
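To make the idea concrete, here is a plain-Python sketch of small-to-big retrieval. It is not the LlamaIndex implementation. Each sentence becomes a small node that is matched precisely, and the node carries a wider window of neighbouring sentences in its metadata, which is swapped in before the text reaches the LLM. The helper names `build_window_nodes` and `replace_with_window` are hypothetical, and the sample text is invented.

```python
import re


def build_window_nodes(text: str, window_size: int = 1) -> list[dict]:
    """Split text into sentence nodes, each storing its neighbours as a window."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "text": sentence,  # small unit used for matching
                "metadata": {"window": " ".join(sentences[start:end])},
            }
        )
    return nodes


def replace_with_window(node: dict, key: str = "window") -> str:
    """Return the wider window if present, otherwise the node's own text."""
    return node["metadata"].get(key, node["text"])


text = (
    "The team met on August 2nd. "
    "On August 3rd the server failed. "
    "Service was restored the next morning."
)
nodes = build_window_nodes(text)

# Pretend retrieval matched the single sentence that mentions August 3rd.
hit = next(n for n in nodes if "August 3rd" in n["text"])
print(replace_with_window(hit))
```

Matching on one sentence keeps retrieval precise, while the window gives the model enough surrounding context to answer.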

Precision through preprocessing

Metadata filtering

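from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Only nodes whose "year" metadata equals "2021" are considered
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="year", value="2021")]
    )
)
response = query_engine.query(
    "What was the annual profit in 2021?"
)
print(response)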
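Conceptually, an exact-match filter narrows the candidate set before any similarity ranking happens. A plain-Python sketch of that step follows. The node texts and figures are invented, and `exact_match` is a hypothetical helper rather than a LlamaIndex function.

```python
# Invented nodes; each carries a "year" tag in its metadata.
NODES = [
    {"text": "Annual profit was $1.2M.", "metadata": {"year": "2020"}},
    {"text": "Annual profit was $3.4M.", "metadata": {"year": "2021"}},
    {"text": "Headcount grew to 50.", "metadata": {"year": "2021"}},
]


def exact_match(nodes: list[dict], key: str, value: object) -> list[dict]:
    """Keep only nodes whose metadata[key] equals value exactly."""
    return [n for n in nodes if n["metadata"].get(key) == value]


# Only these candidates would go on to similarity ranking.
candidates = exact_match(NODES, "year", "2021")
for node in candidates:
    print(node["text"])
```

Because the comparison is exact, the stored type matters: in this sketch the string "2021" matches, while the integer 2021 would not.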

Auto-retrieval

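from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo

# Describe the available metadata so the LLM can infer filters from the query
vector_store_info = VectorStoreInfo(
    content_info="Brief summary of a movie",
    metadata_info=[
        MetadataInfo(
            name="year",
            description="The year the movie was released",
            type="integer",
        ),
        MetadataInfo(
            name="director",
            description="The name of the movie director",
            type="string",
        ),
    ],
)
retriever = VectorIndexAutoRetriever(
    index, vector_store_info=vector_store_info
)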

Metadata support

Apache Cassandra
Astra DB
Azure AI Search
BaiduVector DB
Chroma
DashVector
Databricks
Deeplake
DocArray
DuckDB
Elasticsearch

Qdrant
Redis
Simple
SingleStore
Supabase
Tair
TiDB
TencentVectorDB
Timescale
Typesense
Weaviate

Jaguar
LanceDB
Lantern
Metal
MongoDB Atlas
MyScale
Milvus / Zilliz
OpenSearch
Pinecone
Postgres
pgvecto.rs

Hybrid Search

Hybrid search

query_engine = index.as_query_engine(
  vector_store_query_mode="hybrid",
  similarity_top_k=2,
  alpha=0.5
)
response = query_engine.query(
    "What did the author do growing up?",
)
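One common way to combine the two signals is a weighted sum controlled by `alpha`. The sketch below assumes the convention that `alpha=1` means pure vector search and `alpha=0` means pure keyword search, and that both scores are already normalised to the same range. Individual vector stores may implement the blend differently. The scores are invented.

```python
def hybrid_score(vector_score: float, keyword_score: float, alpha: float = 0.5) -> float:
    """Blend two scores: alpha weights the vector side, (1 - alpha) the keyword side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


# Invented (vector_score, keyword_score) pairs for three documents.
candidates = {
    "doc_a": (0.9, 0.2),  # strong semantic match, weak keyword match
    "doc_b": (0.4, 0.8),  # weak semantic match, strong keyword match
    "doc_c": (0.7, 0.7),  # decent on both
}


def rank(alpha: float) -> list[str]:
    """Order document ids by blended score, best first."""
    return sorted(
        candidates,
        key=lambda doc_id: hybrid_score(*candidates[doc_id], alpha=alpha),
        reverse=True,
    )


for alpha in (0.0, 0.5, 1.0):
    print(alpha, rank(alpha))
```

With an even blend, the document that does reasonably well on both signals comes out ahead of the documents that excel on only one.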

Hybrid search support

Azure Cognitive Search
BaiduVector DB
DashVector
Elasticsearch
Jaguar
Lantern
MyScale

OpenSearch
Pinecone
Postgres
pgvecto.rs
Qdrant
TencentVectorDB
Weaviate

Text to SQL

Querying SQLDatabase

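from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# connect to database
# (the city_stats table must already exist on this engine)
engine = create_engine("sqlite:///:memory:")
sql_database = SQLDatabase(
    engine,
    include_tables=["city_stats"]
)
# create SQL query engine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
# the LLM translates the question into SQL, runs it, and summarizes the result
query_str = "Which city has the highest population?"
response = query_engine.query(query_str)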
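The snippet above presupposes a populated `city_stats` table. As a self-contained illustration, the sketch below builds one with Python's standard `sqlite3` module (a separate in-memory database, not the SQLAlchemy engine above) and runs the kind of SQL an LLM might plausibly generate for that question. The column names, the sample rows, and the `generated_sql` string are all assumptions made for illustration.

```python
import sqlite3

# Separate in-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE city_stats (
        city_name  TEXT PRIMARY KEY,
        population INTEGER,
        country    TEXT NOT NULL
    )
    """
)
# Sample rows; the figures are illustrative.
conn.executemany(
    "INSERT INTO city_stats VALUES (?, ?, ?)",
    [
        ("Toronto", 2930000, "Canada"),
        ("Tokyo", 13960000, "Japan"),
        ("Berlin", 3645000, "Germany"),
    ],
)

# The kind of query a text-to-SQL step might produce for
# "Which city has the highest population?" (assumed, not model output).
generated_sql = (
    "SELECT city_name, population FROM city_stats "
    "ORDER BY population DESC LIMIT 1"
)
top_city = conn.execute(generated_sql).fetchone()
print(top_city)
```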

SQLTableRetrieverQueryEngine

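from llama_index.core import VectorStoreIndex
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)
from llama_index.core.objects import (
    ObjectIndex,
    SQLTableNodeMapping,
    SQLTableSchema,
)

# Index the table schemas so only the relevant tables are put in the prompt
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats"))
]

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)
query_engine = SQLTableRetrieverQueryEngine(
    sql_database, obj_index.as_retriever(similarity_top_k=1)
)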

Manually add table metadata

city_stats_text = (
    "This table gives information regarding the population and country of a"
    " given city. The user will query with codewords, where 'foo' corresponds"
    " to population and 'bar' corresponds to city."
)

table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats", context_str=city_stats_text))
]

Multi-document agents

SECinsights.ai

Create query engines

documents = SimpleDirectoryReader("2020").load_data()
index2020 = VectorStoreIndex.from_documents(documents)
query_engine_2020 = index2020.as_query_engine()

documents = SimpleDirectoryReader("2021").load_data()
index2021 = VectorStoreIndex.from_documents(documents)
query_engine_2021 = index2021.as_query_engine()

documents = SimpleDirectoryReader("2022").load_data()
index2022 = VectorStoreIndex.from_documents(documents)
query_engine_2022 = index2022.as_query_engine()

Define tools

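from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Wrap each query engine as a tool; the agent chooses tools by description
query_engine_tools = [
  QueryEngineTool(
    query_engine=query_engine_2020,
    metadata=ToolMetadata(
      name="2020_facts_tool",
      description=(
        "Contains facts about filings "
        "about the company from the year 2020"
      ),
    ),
  ),
  # ... etc ...
]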

Define agent

from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI

function_llm = OpenAI(model="gpt-4")
agent = OpenAIAgent.from_tools(
  query_engine_tools,
  llm=function_llm,
  system_prompt="""\
You are a specialized agent designed to answer queries about financial filings.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
)

Composability

"2024 is the year of LlamaIndex in production"

– Shawn "swyx" Wang, Latent.Space podcast

npx create-llama

npmjs.com/package/create-llama

What next?

Python: docs.llamaindex.ai

TypeScript: ts.llamaindex.ai

Follow me on Twitter: @seldo

Advanced RAG techniques (Pulumi webinar)

By seldo
