A deep dive into Retrieval-Augmented Generation with LlamaIndex
2023-12-12 AI.Dev Conference
What are we talking about?
- What is LlamaIndex
- The stages of RAG
- Ingestion
- Indexing
- Storing
- Querying
- 7 advanced querying strategies
What you'll learn
- Practical guide to using LlamaIndex
- Overview of RAG application development
- Techniques for advanced RAG apps
What's RAG again?
- Retrieve most relevant data
- Augment query with context
- Generate response
A solution to limited context windows
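In pseudocode, the whole loop fits in a few lines. This is a minimal sketch of the three stages; the vector_store and llm objects are hypothetical stand-ins, not LlamaIndex APIs (LlamaIndex wires these stages together for you):

def rag_query(query, vector_store, llm):
    # 1. Retrieve: find the chunks most relevant to the query
    chunks = vector_store.similarity_search(query, top_k=5)  # hypothetical store
    # 2. Augment: pack the retrieved context into the prompt
    context = "\n".join(chunk.text for chunk in chunks)
    prompt = f"Context:\n{context}\n\nQuery: {query}\nAnswer:"
    # 3. Generate: the LLM answers from the supplied context
    return llm.complete(prompt)  # hypothetical LLM client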
What is LlamaIndex?
llamaindex.ai
Available in Python and TypeScript
LlamaHub
npx create-llama
Supported LLMs
Supported Vector databases
- Apache Cassandra
- Astra DB
- Azure Cognitive Search
- Azure CosmosDB
- ChatGPT Retrieval Plugin
- Chroma
- DashVector
- Deeplake
- DocArray
- DynamoDB
- Elasticsearch
- FAISS
- LanceDB
- Lantern
- Metal
- MongoDB Atlas
- MyScale
- Milvus / Zilliz
- Neo4jVector
- OpenSearch
- Pinecone
- Postgres
- pgvecto.rs
- Qdrant
- Redis
- Simple
- SingleStore
- Supabase
- Tair
- TencentVectorDB
- Timescale
- Typesense
- Weaviate
Hundreds of connectors
LlamaIndex is batteries-included
Basic ingestion
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
Ingestion pipeline
# create the pipeline with transformations
pipeline = IngestionPipeline(
transformations=[
SentenceSplitter(chunk_size=25, chunk_overlap=0),
TitleExtractor(),
OpenAIEmbedding(),
]
)
# run the pipeline
nodes = pipeline.run(
documents=SimpleDirectoryReader("data").load_data()
)
index = VectorStoreIndex(nodes)
Ingestion caching
# save
pipeline.persist("./pipeline_storage")
# load and restore state
new_pipeline = IngestionPipeline(
transformations=[
SentenceSplitter(chunk_size=25, chunk_overlap=0),
TitleExtractor(),
],
)
new_pipeline.load("./pipeline_storage")
# will run instantly due to the cache
nodes = new_pipeline.run(
documents=SimpleDirectoryReader("data").load_data()
)
index = VectorStoreIndex(nodes)
Supported embedding models
- OpenAI
- LangChain
- CohereAI
- Qdrant FastEmbed
- Gradient
- Azure OpenAI
- Elasticsearch
- Clarifai
- LLMRails
- Google PaLM
- Jina
- Voyage
VectorStoreIndex
KnowledgeGraphIndex
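A VectorStoreIndex embeds chunks and retrieves them by similarity; a KnowledgeGraphIndex instead extracts (subject, predicate, object) triples from the text. A minimal sketch of building the latter from the documents loaded earlier:

from llama_index import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,  # cap extracted triples per chunk
)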
Storing
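Indexes live in memory by default; persisting to disk and reloading avoids re-ingesting on every run (the path here is illustrative):

from llama_index import StorageContext, load_index_from_storage

# save
index.storage_context.persist(persist_dir="./storage")
# load
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)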
Querying
We don't talk about prompting (much)
See prompts
query_engine = index.as_query_engine()
prompts_dict = query_engine.get_prompts()
Modify prompts
# shakespeare!
qa_prompt_tmpl_str = (
"Context information is below.\n"
"---------------------\n"
"{context_str}\n"
"---------------------\n"
"Given the context information and not prior knowledge, "
"answer the query in the style of a Shakespeare play.\n"
"Query: {query_str}\n"
"Answer: "
)
qa_prompt_tmpl = PromptTemplate(qa_prompt_tmpl_str)
query_engine.update_prompts(
{"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)
Basic querying
query_engine = index.as_query_engine()
response = query_engine.query(
"What is the capital of South Dakota?"
)
print(response)
Configure retriever, synthesizer, and query engine
index = VectorStoreIndex.from_documents(documents)
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=5,
)
response_synthesizer = get_response_synthesizer(
response_mode="tree_summarize",
)
query_engine = RetrieverQueryEngine(
retriever=retriever,
response_synthesizer=response_synthesizer,
)
response = query_engine.query(
"What is the capital of South Dakota?"
)
print(response)
Configure synthesizer
index = VectorStoreIndex.from_documents(documents)
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=5,
)
response_synthesizer = get_response_synthesizer(
response_mode="tree_summarize",
)
query_engine = RetrieverQueryEngine(
retriever=retriever,
response_synthesizer=response_synthesizer,
)
response = query_engine.query(
"What is the capital of South Dakota?"
)
print(response)
Create query engine
index = VectorStoreIndex.from_documents(documents)
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=5,
)
response_synthesizer = get_response_synthesizer(
response_mode="tree_summarize",
)
query_engine = RetrieverQueryEngine(
retriever=retriever,
response_synthesizer=response_synthesizer,
)
response = query_engine.query(
"What is the capital of South Dakota?"
)
print(response)
Advanced query strategies
SubQuestionQueryEngine
# setup base query engine as tool
query_engine_tools = [
QueryEngineTool(
query_engine=simple_query_engine,
metadata=ToolMetadata(
name="pg_essay",
description="Paul Graham essay on What I Worked On",
),
),
]
query_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools
)
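The engine decomposes a question into sub-questions, routes each to a tool, and synthesizes the answers; usage looks like any other query engine (the question text is illustrative):

response = query_engine.query(
    "How did the author's work change before and after YC?"
)
print(response)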
Problems with precision
Small-to-big retrieval
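The idea: embed small, sentence-level chunks for precise matching, then hand the LLM the larger window of text around each match. The query engine below assumes the index was built with a sentence-window parser, roughly like this:

from llama_index.node_parser import SentenceWindowNodeParser

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,  # sentences of surrounding context kept per side
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)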
query_engine = index.as_query_engine(
similarity_top_k=2,
node_postprocessors=[
MetadataReplacementPostProcessor(target_metadata_key="window")
],
)
response = query_engine.query(
"What happened on August 3rd?"
)
print(response)
Precision through preprocessing
Metadata filtering
query_engine = index.as_query_engine(
filters=MetadataFilters(
filters=[ExactMatchFilter(key="year", value="2021")]
)
)
response = query_engine.query(
"What was the annual profit in 2021?"
)
print(response)
Auto-retrieval
vector_store_info = VectorStoreInfo(
content_info="Brief summary of a movie",
metadata_info=[
MetadataInfo(
name="year",
description="The year the movie was released",
type="integer",
),
MetadataInfo(
name="director",
description="The name of the movie director",
type="string",
),
],
)
retriever = VectorIndexAutoRetriever(
index, vector_store_info=vector_store_info
)
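Given that schema, the retriever infers metadata filters from the question itself; a hedged usage sketch (the question is illustrative):

nodes = retriever.retrieve(
    "Tell me about science fiction movies released after 2010"
)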
Metadata support
- Apache Cassandra
- Astra DB
- Chroma
- DashVector
- Deeplake
- DocArray
- Elasticsearch
- LanceDB
- Lantern
- Metal
- MongoDB Atlas
- MyScale
- Milvus / Zilliz
- OpenSearch
- Pinecone
- Postgres
- pgvecto.rs
- Qdrant
- Redis
- Simple
- SingleStore
- Supabase
- Tair
- TencentVectorDB
- Timescale
- Typesense
- Weaviate
Hybrid search
query_engine = index.as_query_engine(
vector_store_query_mode="hybrid",
similarity_top_k=2,
    alpha=0.5,  # 0 = pure keyword (sparse), 1 = pure vector (dense)
)
response = query_engine.query(
"What did the author do growing up?",
)
Hybrid search support
- Azure Cognitive Search
- Elasticsearch
- Lantern
- MyScale
- Pinecone
- Postgres
- pgvecto.rs
- TencentVectorDB
- Weaviate
Complex document strategies
PandasQueryEngine
reader = PyMuPDFReader()
table_dfs = []  # ...parse tables into pandas DataFrames...
df_query_engines = [
PandasQueryEngine(table_df)
for table_df in table_dfs
]
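Each engine translates natural language into pandas operations over its own DataFrame; for example (the question is illustrative, matching the billionaires tables used below):

response = df_query_engines[0].query(
    "What's the net worth of the richest billionaire in 2023?"
)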
IndexNodes
# define index nodes
summaries = [
(
"This node provides information about the world's richest billionaires"
" in 2023"
),
(
"This node provides information on the number of billionaires and"
" their combined net worth from 2000 to 2023."
),
]
df_nodes = [
IndexNode(text=summary, index_id=f"pandas{idx}")
for idx, summary in enumerate(summaries)
]
df_id_query_engine_mapping = {
f"pandas{idx}": df_query_engine
for idx, df_query_engine in enumerate(df_query_engines)
}
Sub-Retriever
doc_nodes = service_context.node_parser.get_nodes_from_documents(docs)
vector_index = VectorStoreIndex(doc_nodes + df_nodes)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)
RecursiveRetriever
recursive_retriever = RecursiveRetriever(
"vector",
retriever_dict={"vector": vector_retriever},
query_engine_dict=df_id_query_engine_mapping,
)
response_synthesizer = get_response_synthesizer()
query_engine = RetrieverQueryEngine.from_args(
recursive_retriever, response_synthesizer=response_synthesizer
)
response = query_engine.query(
"What's the net worth of the second richest billionaire in 2023?"
)
Text to SQL
SQLDatabase
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")
sql_database = SQLDatabase(
engine,
include_tables=["city_stats"]
)
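SQLDatabase expects the table to already exist, so in a real script you'd create and populate it right after create_engine; a minimal sketch (schema and rows are illustrative):

from sqlalchemy import text

# runs before SQLDatabase(...) is constructed
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE city_stats "
        "(city_name TEXT, population INTEGER, country TEXT)"
    ))
    conn.execute(text(
        "INSERT INTO city_stats VALUES "
        "('Toronto', 2930000, 'Canada'), "
        "('Tokyo', 13960000, 'Japan')"
    ))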
Querying SQLDatabase
query_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=["city_stats"],
)
query_str = "Which city has the highest population?"
response = query_engine.query(query_str)
SQLTableRetrieverQueryEngine
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    SQLTableSchema(table_name="city_stats")
]
obj_index = ObjectIndex.from_objects(
table_schema_objs,
table_node_mapping,
VectorStoreIndex,
)
query_engine = SQLTableRetrieverQueryEngine(
sql_database, obj_index.as_retriever(similarity_top_k=1)
)
Manually add table metadata
city_stats_text = (
"This table gives information regarding the population and country of a"
" given city. The user will query with codewords, where 'foo' corresponds"
" to population and 'bar'corresponds to city."
)
table_node_mapping = SQLTableNodeMapping(sql_database)
table_schema_objs = [
    SQLTableSchema(table_name="city_stats", context_str=city_stats_text)
]
Multi-document agents
SECinsights.ai
Create query engines
documents = SimpleDirectoryReader("2020").load_data()
index2020 = VectorStoreIndex.from_documents(documents)
query_engine_2020 = index2020.as_query_engine()
documents = SimpleDirectoryReader("2021").load_data()
index2021 = VectorStoreIndex.from_documents(documents)
query_engine_2021 = index2021.as_query_engine()
documents = SimpleDirectoryReader("2022").load_data()
index2022 = VectorStoreIndex.from_documents(documents)
query_engine_2022 = index2022.as_query_engine()
Define tools
query_engine_tools = [
QueryEngineTool(
query_engine=query_engine_2020,
metadata=ToolMetadata(
name="2020_facts_tool",
description=(
"Contains facts about filings "
"about the company from the year 2020"
),
),
),
# ... etc ...
]
Define agent
function_llm = OpenAI(model="gpt-4")
agent = OpenAIAgent.from_tools(
query_engine_tools,
llm=function_llm,
system_prompt=f"""\
You are a specialized agent designed to answer queries about financial filings.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
)
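The agent then picks the right year's tool (or several) per question; a hedged usage sketch with an illustrative question:

response = agent.chat(
    "How did the company's risk factors change between 2020 and 2022?"
)
print(str(response))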
Composability
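The pattern nests: LlamaIndex agents also implement the query-engine interface, so the filings agent above can itself become a tool for a higher-level agent. A hedged sketch (names and descriptions are illustrative):

top_agent = OpenAIAgent.from_tools(
    [
        QueryEngineTool(
            query_engine=agent,  # the agent doubles as a query engine
            metadata=ToolMetadata(
                name="filings_agent",
                description="Answers questions about the company's filings",
            ),
        ),
    ],
    llm=function_llm,
)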
Recap
- What LlamaIndex is
- Orchestration
- A registry of software
- A path to production
- Stages of RAG
- Ingestion
- Indexing
- Storing
- Querying
Advanced retrieval strategies
- SubQuestionQueryEngine
- Small-to-big retrieval
- Metadata filtering
- Hybrid search
- Recursive retrieval
- Text-to-SQL
- Multi-document agents
What next?
Follow me on Twitter: @seldo