Advanced RAG Techniques with LlamaIndex

2024-05-08 Pulumi Webinar

What are we talking about?

  • What is RAG?
  • What is LlamaIndex
  • The stages of RAG
    • Ingestion
    • Indexing
    • Storing
    • Querying
  • Advanced querying strategies x7
  • Getting into production

RAG recap

  • Retrieve most relevant data
  • Augment query with context
  • Generate response

A solution to limited context windows

You have to be selective

and that's tricky

Accuracy

RAG challenges:

Faithfulness

RAG challenges:

Recency

RAG challenges:

Provenance

RAG challenges:

How do we do RAG?

1. Keyword search

How do we do RAG?

2. Structured queries

How do we do RAG?

3. Vector search

Vector embeddings

Turning words into numbers

Search by meaning

What is LlamaIndex?

llamaindex.ai

  • OSS libraries in Python and TypeScript
  • LlamaParse - PDF parsing as a service
  • LlamaCloud - managed ingestion service

Supported LLMs

5 line starter

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What's up?")
print(response)

LlamaHub

LlamaParse

part of LlamaCloud

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

parser = LlamaParse(
    result_type="markdown"
)

file_extractor = {".pdf": parser}
reader = SimpleDirectoryReader(
  "./data", 
  file_extractor=file_extractor
)
documents = reader.load_data()

Supported embedding models

  • OpenAI 
  • Langchain 
  • CohereAI 
  • Qdrant FastEmbed 
  • Gradient 
  • Azure OpenAI
  • Elasticsearch 
  • Clarifai
  • LLMRails 
  • Google PaLM 
  • Jina 
  • Voyage 

...plus everything on Hugging Face!

Supported Vector databases

  • Apache Cassandra
  • Astra DB
  • Azure Cognitive Search
  • Azure CosmosDB
  • BaiduVector DB
  • ChatGPT Retrieval Plugin
  • Chroma
  • DashVector
  • Databricks
  • Deeplake
  • DocArray
  • DuckDB
  • DynamoDB
  • Elasticsearch
  • FAISS
  • Jaguar
  • LanceDB
  • Lantern
  • Metal
  • MongoDB Atlas
  • MyScale
  • Milvus / Zilliz
  • Neo4jVector
  • OpenSearch
  • Pinecone
  • Postgres
  • pgvecto.rs
  • Qdrant
  • Redis
  • Rockset
  • Simple
  • SingleStore
  • Supabase
  • Tair
  • TiDB 
  • TencentVectorDB
  • Timescale
  • Typesense
  • Upstash
  • Weaviate

Advanced query strategies

SubQuestionQueryEngine

Problems with precision

Small-to-big retrieval

Small-to-big retrieval

query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
response = query_engine.query(
    "What happened on August 3rd?"
)
print(response)

Precision through preprocessing

Metadata filtering

query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="year", value="2021")]
    )
)
response = query_engine.query(
    "What was the annual profit in 2021?"
)
print(response)

Auto-retrieval

vector_store_info = VectorStoreInfo(
    content_info="Brief summary of a movie",
    metadata_info=[
        MetadataInfo(
            name="year",
            description="The year the movie was released",
            type="integer",
        ),
        MetadataInfo(
            name="director",
            description="The name of the movie director",
            type="string",
        ),
    ],
)
retriever = VectorIndexAutoRetriever(
    index, vector_store_info=vector_store_info
)

Metadata support

  • Apache Cassandra
  • Astra DB
  • Azure AI Search
  • BaiduVector DB
  • Chroma
  • DashVector
  • Databricks
  • Deeplake
  • DocArray
  • DuckDB
  • Elasticsearch
  • Qdrant
  • Redis
  • Simple
  • SingleStore
  • Supabase
  • Tair
  • TiDB
  • TencentVectorDB
  • Timescale
  • Typesense
  • Weaviate
  • Jaguar
  • LanceDB
  • Lantern
  • Metal
  • MongoDB Atlas
  • MyScale
  • Milvus / Zilliz
  • OpenSearch
  • Pinecone
  • Postgres
  • pgvecto.rs

Hybrid Search

Hybrid search

query_engine = index.as_query_engine(
  vector_store_query_mode="hybrid",
  similarity_top_k=2,
  alpha=0.5
)
response = query_engine.query(
    "What did the author do growing up?",
)

Hybrid search support

  • Azure Cognitive Search
  • BaiduVector DB
  • DashVector
  • Elasticsearch
  • Jaguar
  • Lantern
  • MyScale
  • OpenSearch
  • Pinecone
  • Postgres
  • pgvecto.rs
  • Qdrant
  • TencentVectorDB
  • Weaviate

Text to SQL

Querying SQLDatabase

# connect to database
engine = create_engine("sqlite:///:memory:")
sql_database = SQLDatabase(
  engine, 
  include_tables=["city_stats"]
)
# create SQL query engine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
query_str = "Which city has the highest population?"
response = query_engine.query(query_str)

SQLTableRetrieverQueryEngine

table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats"))
]

obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)
query_engine = SQLTableRetrieverQueryEngine(
    sql_database, obj_index.as_retriever(similarity_top_k=1)
)

Manually add table metadata

city_stats_text = (
    "This table gives information regarding the population and country of a"
    " given city. The user will query with codewords, where 'foo' corresponds"
    " to population and 'bar'corresponds to city."
)

table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats", context_str=city_stats_text))
]

Multi-document agents

SECinsights.ai

Create query engines

documents = SimpleDirectoryReader("2020").load_data()
index2020 = VectorStoreIndex.from_documents(documents)
query_engine_2020 = index2020.as_query_engine()

documents = SimpleDirectoryReader("2021").load_data()
index2021 = VectorStoreIndex.from_documents(documents)
query_engine_2021 = index2021.as_query_engine()

documents = SimpleDirectoryReader("2022").load_data()
index2022 = VectorStoreIndex.from_documents(documents)
query_engine_2022 = index2022.as_query_engine()

Define tools

query_engine_tools = [
  QueryEngineTool(
    query_engine=query_engine_2020,
    metadata=ToolMetadata(
      name="2020_facts_tool",
      description=(
        "Contains facts about filings "
        "about the company from the year 2020"
      ),
    ),
  ),
  # ... etc ...
]

Define agent

function_llm = OpenAI(model="gpt-4")
agent = OpenAIAgent.from_tools(
  query_engine_tools,
  llm=function_llm,
  system_prompt=f"""\
You are a specialized agent designed to answer queries about financial filings.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
)

Composability

"2024 is the year of LlamaIndex in production"

– Shawn "swyx" Wang, Latent.Space podcast

npx create-llama

What next?

Follow me on Twitter: @seldo

Python:

TypeScript:

Advanced RAG techniques (Pulumi webinar)

By seldo

Advanced RAG techniques (Pulumi webinar)

  • 442