Advanced RAG Techniques with LlamaIndex
2024-05-08 Pulumi Webinar
What are we talking about?
- What is RAG?
- What is LlamaIndex?
- The stages of RAG
- Ingestion
- Indexing
- Storing
- Querying
- Advanced querying strategies x7
- Getting into production
RAG recap
- Retrieve most relevant data
- Augment query with context
- Generate response
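The three steps above can be sketched without any library. This is a toy illustration, not LlamaIndex's implementation: `retrieve` ranks by shared words as a stand-in for real retrieval, and `generate` is a hypothetical placeholder for an LLM call.

```python
# Toy corpus; in a real pipeline these would be chunks from your documents.
DOCS = [
    "LlamaIndex is a framework for connecting data to LLMs.",
    "Pulumi is an infrastructure as code platform.",
    "Vector search finds text by meaning rather than exact words.",
]


def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank docs by how many query words they share (stand-in for real retrieval)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().rstrip(".").split())),
        reverse=True,
    )
    return ranked[:top_k]


def augment(query: str, context: list[str]) -> str:
    """Build a prompt that puts the retrieved context in front of the question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}\nAnswer:"


def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real system would send the prompt to a model."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


query = "what is pulumi"
context = retrieve(query, DOCS)
prompt = augment(query, context)
answer = generate(prompt)
```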
A solution to limited context windows
You have to be selective
and that's tricky
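One way to picture "being selective": given scored chunks and a fixed budget, keep only the best ones that fit. This is a minimal sketch under stated assumptions. The function name, the toy chunks, and the scores are invented for illustration, and token counts are approximated by whitespace-split words.

```python
def select_chunks(scored_chunks, budget):
    """Greedily keep the highest-scoring chunks that still fit in the budget.

    scored_chunks: list of (score, text) pairs.
    budget: maximum number of "tokens", approximated here by word count.
    """
    selected = []
    used = 0
    for score, text in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = len(text.split())
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected, used


# Toy data, invented for this example.
chunks = [
    (0.91, "Revenue grew 12% in 2021."),
    (0.40, "The company was founded in 1998 in a garage."),
    (0.85, "Annual profit in 2021 was $3M."),
    (0.10, "Our office has a nice view."),
]
selected, used = select_chunks(chunks, budget=12)
```

The tricky part is the scoring itself: the budget logic is easy, but everything depends on the relevance scores being right.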
RAG challenges:
- Accuracy
- Faithfulness
- Recency
- Provenance
How do we do RAG?
1. Keyword search
2. Structured queries
3. Vector search
Vector embeddings
Turning words into numbers
Search by meaning
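"Search by meaning" usually comes down to comparing vectors. A minimal sketch, assuming cosine similarity as the distance measure and hand-made 3-dimensional vectors standing in for real embeddings (real models produce hundreds or thousands of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy "embeddings"; related words are given similar vectors by hand.
EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.3, 0.0],
    "car": [0.0, 0.2, 0.9],
}


def nearest(word, embeddings):
    """Return the other word whose vector is most similar to `word`'s vector."""
    query = embeddings[word]
    others = {w: v for w, v in embeddings.items() if w != word}
    return max(others, key=lambda w: cosine_similarity(query, others[w]))


closest_to_dog = nearest("dog", EMBEDDINGS)
```

Here "dog" lands next to "puppy" even though the two strings share no characters, which is the point of searching by meaning rather than by keyword.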
What is LlamaIndex?
- OSS libraries in Python and TypeScript
- LlamaParse - PDF parsing as a service
- LlamaCloud - managed ingestion service
Supported LLMs
5 line starter
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What's up?")
print(response)
LlamaHub
LlamaParse
part of LlamaCloud
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# Parse PDFs into markdown with LlamaParse
parser = LlamaParse(
    result_type="markdown"
)
# Route .pdf files through the parser; other file types use the default readers
file_extractor = {".pdf": parser}
reader = SimpleDirectoryReader(
    "./data",
    file_extractor=file_extractor
)
documents = reader.load_data()
Supported embedding models
- OpenAI
- Langchain
- CohereAI
- Qdrant FastEmbed
- Gradient
- Azure OpenAI
- Elasticsearch
- Clarifai
- LLMRails
- Google PaLM
- Jina
- Voyage
...plus everything on Hugging Face!
Supported Vector databases
- Apache Cassandra
- Astra DB
- Azure Cognitive Search
- Azure CosmosDB
- BaiduVector DB
- ChatGPT Retrieval Plugin
- Chroma
- DashVector
- Databricks
- Deeplake
- DocArray
- DuckDB
- DynamoDB
- Elasticsearch
- FAISS
- Jaguar
- LanceDB
- Lantern
- Metal
- MongoDB Atlas
- MyScale
- Milvus / Zilliz
- Neo4jVector
- OpenSearch
- Pinecone
- Postgres
- pgvecto.rs
- Qdrant
- Redis
- Rockset
- Simple
- SingleStore
- Supabase
- Tair
- TiDB
- TencentVectorDB
- Timescale
- Typesense
- Upstash
- Weaviate
Advanced query strategies
SubQuestionQueryEngine
Problems with precision
Small-to-big retrieval
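from llama_index.core.postprocessor import MetadataReplacementPostProcessor

# Assumes `index` was built from nodes that carry a "window" metadata key.
# Retrieval matches on the small node; the postprocessor swaps in the window.
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
response = query_engine.query(
    "What happened on August 3rd?"
)
print(response)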
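The snippet above depends on each retrieved node carrying a larger "window" of surrounding text in its metadata. A library-free sketch of that idea follows. The helper names and the word-overlap scoring are assumptions for illustration, not how LlamaIndex builds or ranks nodes.

```python
def build_nodes(sentences, window_size=1):
    """One node per sentence, with neighbouring sentences stored as metadata."""
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        window = " ".join(sentences[start:end])
        nodes.append({"text": sentence, "metadata": {"window": window}})
    return nodes


def words(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve_then_expand(query, nodes, top_k=1):
    """Match on the small sentence, but return the bigger window."""
    ranked = sorted(
        nodes,
        key=lambda n: len(words(query) & words(n["text"])),
        reverse=True,
    )
    return [n["metadata"]["window"] for n in ranked[:top_k]]


sentences = [
    "The team arrived on August 2nd.",
    "On August 3rd the launch was delayed.",
    "The delay was caused by bad weather.",
]
nodes = build_nodes(sentences)
expanded = retrieve_then_expand("What happened on August 3rd?", nodes)
```

The match is made on one precise sentence, but the LLM receives the sentence plus its neighbours, so it also sees why the launch was delayed.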
Precision through preprocessing
Metadata filtering
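from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Only nodes whose "year" metadata equals "2021" are considered
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="year", value="2021")]
    )
)
response = query_engine.query(
    "What was the annual profit in 2021?"
)
print(response)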
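Why filtering helps can be shown with a toy example: two near-identical passages from different years score almost the same on similarity, so the wrong year can win unless it is excluded first. The nodes and scores below are invented for illustration, and `filtered_top_k` is a hypothetical helper.

```python
# Invented nodes with pre-computed similarity scores for the 2021 profit query.
NODES = [
    {"text": "Annual profit was reported in the 2020 filing.",
     "metadata": {"year": "2020"}, "score": 0.92},
    {"text": "Annual profit was reported in the 2021 filing.",
     "metadata": {"year": "2021"}, "score": 0.90},
    {"text": "Headcount grew during 2021.",
     "metadata": {"year": "2021"}, "score": 0.55},
]


def filtered_top_k(nodes, key, value, top_k=1):
    """Drop nodes whose metadata does not match, then rank the rest by score."""
    candidates = [n for n in nodes if n["metadata"].get(key) == value]
    return sorted(candidates, key=lambda n: n["score"], reverse=True)[:top_k]


unfiltered_best = max(NODES, key=lambda n: n["score"])
filtered_best = filtered_top_k(NODES, "year", "2021")[0]
```

Without the filter the 2020 passage ranks first; with it, only 2021 passages compete.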
Auto-retrieval
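from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo

# Describe the available metadata so the LLM can infer filters from the query
vector_store_info = VectorStoreInfo(
    content_info="Brief summary of a movie",
    metadata_info=[
        MetadataInfo(
            name="year",
            description="The year the movie was released",
            type="integer",
        ),
        MetadataInfo(
            name="director",
            description="The name of the movie director",
            type="string",
        ),
    ],
)
retriever = VectorIndexAutoRetriever(
    index, vector_store_info=vector_store_info
)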
Metadata support
- Apache Cassandra
- Astra DB
- Azure AI Search
- BaiduVector DB
- Chroma
- DashVector
- Databricks
- Deeplake
- DocArray
- DuckDB
- Elasticsearch
- Qdrant
- Redis
- Simple
- SingleStore
- Supabase
- Tair
- TiDB
- TencentVectorDB
- Timescale
- Typesense
- Weaviate
- Jaguar
- LanceDB
- Lantern
- Metal
- MongoDB Atlas
- MyScale
- Milvus / Zilliz
- OpenSearch
- Pinecone
- Postgres
- pgvecto.rs
Hybrid search
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=2,
    alpha=0.5
)
response = query_engine.query(
    "What did the author do growing up?",
)
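The `alpha` parameter above balances the two kinds of search. A minimal sketch of one common blending scheme, assuming alpha weights the vector score and 1 - alpha weights the keyword score (the exact convention and score normalization vary by vector store, so check your store's documentation). The function name and the scores are invented for illustration.

```python
def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend two score dicts (doc id -> score in [0, 1]) into one ranking."""
    doc_ids = set(keyword_scores) | set(vector_scores)
    blended = {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Highest blended score first
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


keyword_scores = {"doc_a": 1.0, "doc_b": 0.2}  # exact-term matches
vector_scores = {"doc_b": 0.9, "doc_c": 0.7}   # semantic matches
ranking = hybrid_scores(keyword_scores, vector_scores, alpha=0.5)
```

With an even blend, a document that scores on both signals (`doc_b`) can outrank one that is perfect on only one of them (`doc_a`).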
Hybrid search support
- Azure Cognitive Search
- BaiduVector DB
- DashVector
- Elasticsearch
- Jaguar
- Lantern
- MyScale
- OpenSearch
- Pinecone
- Postgres
- pgvecto.rs
- Qdrant
- TencentVectorDB
- Weaviate
Text to SQL
Querying SQLDatabase
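from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# connect to database (the city_stats table must already exist in it)
engine = create_engine("sqlite:///:memory:")
sql_database = SQLDatabase(
    engine,
    include_tables=["city_stats"]
)
# create SQL query engine
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
)
query_str = "Which city has the highest population?"
response = query_engine.query(query_str)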
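To make the flow concrete, here is a standard-library sketch of what happens underneath: a table exists, the model turns the question into SQL, and the SQL is executed. The column names and rows are assumptions for illustration, and `generated_sql` is a hand-written example of the kind of query a model might produce, not captured output.

```python
import sqlite3

# In-memory database with an assumed schema for city_stats.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city_stats ("
    "city_name TEXT PRIMARY KEY, population INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO city_stats VALUES (?, ?, ?)",
    [
        ("Toronto", 2930000, "Canada"),
        ("Tokyo", 13960000, "Japan"),
        ("Berlin", 3645000, "Germany"),
    ],
)

# Hand-written stand-in for the SQL an LLM might generate for
# "Which city has the highest population?"
generated_sql = "SELECT city_name FROM city_stats ORDER BY population DESC LIMIT 1"
top_city = conn.execute(generated_sql).fetchone()[0]
```

The query engine's final step, not shown here, is passing the SQL result back to the LLM to phrase a natural-language answer.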
SQLTableRetrieverQueryEngine
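from llama_index.core import VectorStoreIndex
from llama_index.core.indices.struct_store.sql_query import (
    SQLTableRetrieverQueryEngine,
)
from llama_index.core.objects import ObjectIndex, SQLTableNodeMapping, SQLTableSchema

# Index the table schemas so the relevant table is retrieved at query time
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats"))
]
obj_index = ObjectIndex.from_objects(
    table_schema_objs,
    table_node_mapping,
    VectorStoreIndex,
)
query_engine = SQLTableRetrieverQueryEngine(
    sql_database, obj_index.as_retriever(similarity_top_k=1)
)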
Manually add table metadata
city_stats_text = (
    "This table gives information regarding the population and country of a"
    " given city. The user will query with codewords, where 'foo' corresponds"
    " to population and 'bar' corresponds to city."
)
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    (SQLTableSchema(table_name="city_stats", context_str=city_stats_text))
]
Multi-document agents
SECinsights.ai
Create query engines
documents = SimpleDirectoryReader("2020").load_data()
index2020 = VectorStoreIndex.from_documents(documents)
query_engine_2020 = index2020.as_query_engine()
documents = SimpleDirectoryReader("2021").load_data()
index2021 = VectorStoreIndex.from_documents(documents)
query_engine_2021 = index2021.as_query_engine()
documents = SimpleDirectoryReader("2022").load_data()
index2022 = VectorStoreIndex.from_documents(documents)
query_engine_2022 = index2022.as_query_engine()
Define tools
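from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Wrap each per-year query engine as a tool; the agent picks by description
query_engine_tools = [
    QueryEngineTool(
        query_engine=query_engine_2020,
        metadata=ToolMetadata(
            name="2020_facts_tool",
            description=(
                "Contains facts about filings "
                "about the company from the year 2020"
            ),
        ),
    ),
    # ... etc ...
]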
Define agent
from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI

function_llm = OpenAI(model="gpt-4")
agent = OpenAIAgent.from_tools(
    query_engine_tools,
    llm=function_llm,
    system_prompt="""\
You are a specialized agent designed to answer queries about financial filings.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
)
Composability
"2024 is the year of LlamaIndex in production"
– Shawn "swyx" Wang, Latent.Space podcast
npx create-llama
What next?
Follow me on Twitter: @seldo
Python:
TypeScript:
Advanced RAG techniques (Pulumi webinar)
By seldo