Retrieval-Augmented Generation with LlamaIndex

and Azure Cosmos DB

2024-01-09 Azure Cosmos DB User Group

Who is this guy?

What are we talking about?

What is AI?
What is LlamaIndex?
Retrieval-Augmented Generation (RAG)
RAG with Azure Cosmos DB
7 advanced RAG strategies

What is AI?

Machine Learning (ML)

AI = ML + Marketing

Large Language Models (LLMs)

Are LLMs

"completing prompts" or "thinking"?

Retrieval-Augmented Generation (RAG)

Context

Selection

Hallucinations

Provenance

RAG

Retrieve context

Augment prompt

Generate answer

Vector embeddings

Turning words into numbers

Search by meaning

What is LlamaIndex?

llamaindex.ai

LlamaHub

llamahub.ai

Data loaders
Agent tools
Llama packs
Llama datasets

npx create-llama

Supported LLMs

Supported Vector databases

Apache Cassandra
Astra DB
Azure Cognitive Search
Azure CosmosDB
ChatGPT Retrieval Plugin
Chroma
DashVector
Deeplake
DocArray
DynamoDB
Elasticsearch

FAISS
LanceDB
Lantern
Metal
MongoDB Atlas
MyScale
Milvus / Zilliz
Neo4jVector
OpenSearch
Pinecone
Postgres

pgvecto.rs
Qdrant
Redis
Simple
SingleStore
Supabase
Tair
TencentVectorDB
Timescale
Typesense
Weaviate

LlamaIndex is the batteries-included framework

Get started in 6 lines

from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(reponse)

Get started in 6 lines

from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(reponse)

Get started in 6 lines

from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(reponse)

Get started in 6 lines

from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(reponse)

Get started in 6 lines

from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(reponse)

Azure Cosmos DB demo repo

https://github.com/run-llama/azure-cosmos-db-demo

Architecture

vCore Cluster creation

Import data

json_file = 'tinytweets.json'

# Load environment variables from local .env file
from dotenv import load_dotenv
load_dotenv()

import os
import json
from pymongo.mongo_client import MongoClient

# Load the tweets from a local file
with open(json_file, 'r') as f:
    tweets = json.load(f)

# Create a new client and connect to the server
client = MongoClient(os.getenv('MONGODB_URI'))
db = client[os.getenv("MONGODB_DATABASE")]
collection = db[os.getenv("MONGODB_COLLECTION")]

# Insert the tweets into mongo
collection.insert_many(tweets)

Load

query_dict = {}
reader = SimpleMongoReader(uri=os.getenv("MONGODB_URI"))
documents = reader.load_data(
    os.getenv("MONGODB_DATABASE"),
    os.getenv("MONGODB_COLLECTION"),
    field_names=["full_text"],
    query_dict=query_dict
)

Index

# Create a new client and connect to the server
client = MongoClient(os.getenv("MONGODB_URI"))

# create Azure Cosmos as a vector store
store = AzureCosmosDBMongoDBVectorSearch(
    client,
    db_name=os.getenv('MONGODB_DATABASE'),
    collection_name=os.getenv('MONGODB_VECTORS'),
    index_name=os.getenv('MONGODB_VECTOR_INDEX')
)

Store

storage_context = StorageContext.from_defaults(vector_store=store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context,
    show_progress=True
)

Query!

query_engine = index.as_query_engine(similarity_top_k=20)
response = query_engine.query("What does the author think of web frameworks?")
print(response)

Demo!

Going beyond

naive RAG

Why?

Scale
Precision of provenance
Complexity

SubQuestionQueryEngine

Small-to-big retrieval

Metadata filtering

Hybrid search

Recursive retrieval

Text to SQL

Multi-document agents

SECinsights.ai

Recap

What is AI?
What is RAG?
Vector search
What is LlamaIndex?
LlamaHub
create-llama
Building RAG with Azure Cosmos DB
7x Advanced query strategies

What next?

docs.llamaindex.ai

Retrieval-Augmented Generation with LlamaIndex

and Azure Cosmos DB

Who is this guy?

What are we talking about?

What is AI?

Machine Learning (ML)

Large Language Models (LLMs)

Are LLMs

"completing prompts" or "thinking"?

Retrieval-Augmented Generation (RAG)

Context

Selection

Hallucinations

Provenance

RAG

Vector embeddings

Search by meaning

What is LlamaIndex?

LlamaHub

npx create-llama

Supported LLMs

Supported Vector databases

LlamaIndex is the batteries-included framework

Get started in 6 lines

Get started in 6 lines

Get started in 6 lines

Get started in 6 lines

Get started in 6 lines

Azure Cosmos DB demo repo

Architecture

Architecture

Architecture

vCore Cluster creation

Import data

Load

Index

Store

Query!

Demo!

Going beyond

naive RAG

SubQuestionQueryEngine

Small-to-big retrieval

Metadata filtering

Hybrid search

Recursive retrieval

Text to SQL

Multi-document agents

SECinsights.ai

Recap

What next?

RAG with Azure Cosmos DB

More from seldo