Introduction to RAG
Representation of Words
Training Paradigm
word embedding
Vector length = vocabulary size (e.g., length: 10,000)
Network Architecture
Input layer
Hidden layer
Output layer
Transformer, 2017/06
Word2Vec, 2013/01
Pre-trained, 2010/10
Sutskever, I., Vinyals, O., & Le, Q.V. (2014). Sequence to Sequence Learning with Neural Networks. ArXiv, abs/1409.3215.
seq2seq model, 2014/09
Encoder-Decoder Model for NLP tasks
Text-based tasks
Encoder
Decoder
attention, 2014/09
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473.
Attention: focus on specific parts of input while generating output
Cognitive science: selective attention
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems (p./pp. 5998--6008), .
Transformer, 2017/06
Self-Attention: the input tokens interact with one another
Context
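A minimal sketch of the self-attention idea above, in plain Python. It assumes identity projections (queries, keys, and values are all the raw input vectors), which a real Transformer replaces with learned weight matrices; the 2-dimensional token vectors are made up for illustration.

```python
import math


def softmax(row):
    # Subtract the maximum for numerical stability before exponentiating
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (an assumption)."""
    d = len(X[0])
    outputs, all_weights = [], []
    for q in X:
        # Score this token against every token in the same sequence
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Context vector: attention-weighted average of all token vectors
        context = [sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)]
        outputs.append(context)
        all_weights.append(weights)
    return outputs, all_weights


tokens = ["I", "played", "the", "piano"]
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 2.0]]  # illustrative vectors
contexts, weights = self_attention(X)
for tok, w in zip(tokens, weights):
    print(tok, [round(v, 2) for v in w])
```

Each printed row shows how much one token attends to every token in the sentence; each row sums to 1.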
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics.
BERT, 2018/10
token
embedding
(Encoder-only)
(BERT base)
Pre-training
prediction (probability)
Downstream task
FNN
(Fine-tuning)
dim: 768
Transfer learning: pre-training, then collecting a target-domain dataset and training on it (fine-tuning)
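Traditional machine learning approach
Dataset 1
Dataset 2
Dataset 3
Task 1
Task 2
Task 3
Two stages: unsupervised, then supervised
Different tasks, different datasets
Transfer Learning
Source-domain task
Dataset S
Target-domain task
Large
Dataset T
Small
Pre-training (unsupervised)
Fine-tuning (supervised)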
Pan, S.J., & Yang, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345-1359.
pre-trained, 2010
One-hot encoding
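Encoding: represent each word as a vector
[Figure: vocabulary table with columns for index (1 to 8), word, and encoding; example input "<SOS> I played the piano"; sample encoding row 0 0 0 0 0 0 0 0]
[Figure: network with input layer, hidden layer, and output layer]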
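One-hot encoding can be sketched as follows. This is an illustration only: the vocabulary is built from the example sentence itself, so it has 5 entries rather than the 8 shown on the slide, and the helper names are hypothetical.

```python
def build_vocab(tokens):
    # Assign each distinct token the next free index, in order of appearance
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(index, size):
    # A vector of zeros with a single 1 at the token's index
    vec = [0] * size
    vec[index] = 1
    return vec


sentence = "<SOS> I played the piano".split()
vocab = build_vocab(sentence)
encoded = [one_hot(vocab[tok], len(vocab)) for tok in sentence]
for tok, vec in zip(sentence, encoded):
    print(f"{tok:>7} {vec}")
```

The vector length equals the vocabulary size, which is why one-hot vectors become very long for realistic vocabularies.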
Word Embedding
Words with similar meanings must be encoded so that they lie close together (a small "distance").
Mikolov, T., Chen, K., Corrado, G.S., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. International Conference on Learning Representations.
Word2Vec, 2013/01
Word Embedding
Words
Dimensions
Word2Vec
Word2Vec learns from a large vocabulary and maps each word into a space of 100 to 300 dimensions.
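The notion of "distance" between embeddings is often measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors (real Word2Vec vectors have 100 to 300 dimensions and are learned from data), so the numbers only illustrate the idea.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, chosen by hand for illustration
embeddings = {
    "piano": [0.9, 0.8, 0.1],
    "violin": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

sim_related = cosine_similarity(embeddings["piano"], embeddings["violin"])
sim_unrelated = cosine_similarity(embeddings["piano"], embeddings["banana"])
print(round(sim_related, 3), round(sim_unrelated, 3))
```

With these toy vectors, the two instruments score much higher than the instrument and the fruit.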
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.
GPT-2, 2019
decoder-only
117M Parameters
1,542M Parameters
Pre-training
Fine-tuning
Transformer-based Model
pre-trained data sets
NN1
Task 1
NN2
Task 2
NNn
Task N
target domain data sets
LLM
Nathan Bos, Ph.D., Embeddings Are Kind of Shallow, 2024
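Difficulty capturing higher-level semantic concepts
Difficulty performing logical operations and causal reasoning
Lack of contextual understanding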
Capabilities of Pattern Matching
Mirzadeh, I., Alizadeh-Vahid, K., Shahrokhi, H., Tuzel, O., Bengio, S., & Farajtabar, M. (2024). GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.
One of the solutions: RAG
Hallucination
Prompt Engineering
Maleki, N., Padmanabhan, B., & Dutta, K. (2024, June). AI hallucinations: a misnomer worth clarifying. In 2024 IEEE conference on artificial intelligence (CAI) (pp. 133-138). IEEE.
Definitions diverge!
Alternative terms suggested by some studies:
Conclusions and recommendations
Prompt Engineering, 2018/06
Lewis, Patrick, et al. "Retrieval-augmented generation for knowledge-intensive nlp tasks." Advances in Neural Information Processing Systems 33 (2020): 9459-9474.
RAG, 2020/05
RAG for knowledge intensive tasks
1. parametric memory: a pre-trained seq2seq model
2. non-parametric memory: a dense vector index of Wikipedia
e.g. word embedding
embedding of x
Vector search
embedding of document z_i
Top-n results of the vector search
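The retrieval step can be sketched as a brute-force top-n search: score every document embedding against the query embedding and keep the best n. The vectors, document ids, and the choice of inner product as the score are illustrative assumptions; production systems use a learned encoder and an approximate nearest-neighbor index.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_n(query_vec, doc_vecs, n=2):
    # Score every document, then keep the n highest-scoring ones
    scored = [(inner_product(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:n]]


# Hypothetical embeddings of documents z1..z3 and of a query x
doc_vecs = {
    "z1": [1.0, 0.0, 0.0],
    "z2": [0.7, 0.7, 0.0],
    "z3": [0.0, 0.0, 1.0],
}
query_vec = [0.9, 0.1, 0.0]

results = top_n(query_vec, doc_vecs, n=2)
print(results)
```

The retrieved documents are then passed to the generator as additional context alongside the query.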
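Parameters that can be tuned when a language model generates text (not supported by every model), balancing correctness and creativity:
1. Temperature (using OpenAI as an example)
Ranges from 0 to 2; the higher the value, the more random the generated output.
2. Top-P (nucleus sampling, using OpenAI as an example)
p ranges from 0 to 1. With p = 0.1, for example, the next token is chosen only from the smallest set of highest-ranked tokens whose cumulative probability reaches 10%.
3. Top-K (top-k sampling)
k is a positive integer. With k = 32, for example, the next token is chosen only from the 32 highest-ranked tokens.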
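The three parameters can be illustrated with a small sampler over a toy distribution. This is a sketch under assumptions: the logits are invented, the function names are hypothetical, and the temperature must be greater than 0 here (an API may special-case 0 as greedy decoding).

```python
import math
import random


def renormalize(probs, keep):
    # Rescale the kept probabilities so they sum to 1 again
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def softmax_with_temperature(logits, temperature=1.0):
    # Higher temperature flattens the distribution; lower sharpens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k(probs, k):
    # Keep only the k most probable token ids
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def filter_top_p(probs, p):
    # Keep the smallest set of top tokens whose cumulative probability reaches p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
probs = softmax_with_temperature(logits, temperature=0.7)
candidates = filter_top_p(probs, p=0.9)
token_ids = list(candidates)
next_token = random.choices(token_ids, weights=[candidates[i] for i in token_ids])[0]
print(candidates, next_token)
```

Note that top-p filters by cumulative probability mass, not by a fixed share of the vocabulary, so the number of surviving tokens changes from step to step.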
non-parametric memory: a dense vector index of Wikipedia
Internal knowledge
External knowledge
Language model
vector embedding
or
graph embedding
Question
Answer
parametric memory: a pre-trained seq2seq model
Implementation approach
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. ArXiv, abs/2210.03629.
ReAct, 2022/10
Chain-of-Thought + Act
Act 1: Search [Q]
Act 2: Search [Obs 1]
Act 3: Search [Obs 2]
Act 4: Finish [answer]
context
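Chain-of-Thought (CoT)
Example: logical reasoning
Question: If today is Tuesday, what day of the week will it be in 3 days?
Typical model answer (no reasoning):
Friday (without a reasoning process, the answer may be wrong)
CoT reasoning process:
Today is Tuesday.
After 1 day it is Wednesday.
After 2 days it is Thursday.
After 3 days it is Friday.
The answer is Friday.
CoT final answer:
Friday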
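The step-by-step reasoning in this example corresponds to simple modular arithmetic over the days of the week. The sketch below reproduces the intermediate steps explicitly; the function name is hypothetical.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def days_later(start, n):
    # Record every intermediate step, mirroring a chain-of-thought trace
    idx = DAYS.index(start)
    steps = [f"After {i} day(s): {DAYS[(idx + i) % 7]}" for i in range(1, n + 1)]
    return DAYS[(idx + n) % 7], steps


answer, steps = days_later("Tuesday", 3)
for line in steps:
    print(line)
print("Answer:", answer)
```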
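Have the model interleave "reasoning" and "acting" so that, during decision-making, it can dynamically update its plan, adapt to changes, and gather external information.
The ReAct method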
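Decisions are made through a "reason → act → observe" loop; the reasoning and action at each step are based on the results observed so far:
ReAct's reasoning and adjustment mechanism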
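Adjustment strategy | Example |
---|---|
Change the retrieval strategy | If searching for "Apple Remote" returns nothing, search for "Apple TV remote" instead |
Revise the reasoning | If the model reasons that "A is the founder of B" but finds no supporting data, revise to "A may have been involved in B's early development" |
Re-plan the order of actions | If the plan "check the drawer first, then the desk" fails, switch to "check the desk first, then the drawer" |
Try a different knowledge source | If Wikipedia cannot provide the answer, use Google Search instead (if the environment allows) |
Adjustment strategies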
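Example: ALFWorld (interactive environment)
Task: "Put the pepper shaker in the drawer"
Initial reasoning: "The pepper shaker may be in the kitchen cabinet or on the dining table; I should go to the kitchen first."
Action 1: "Go to the kitchen"
Observation: "There is a cabinet in the kitchen."
Action 2: "Open the cabinet"
Observation: "The pepper shaker is not in the cabinet."
Revised reasoning: "The pepper shaker may be on the dining table; I should go and look."
Action 3: "Go to the dining table"
Observation: "The pepper shaker is on the dining table."
Action 4: "Pick up the pepper shaker and put it in the drawer"
Final result: "Task complete!"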
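The reason → act → observe loop in this example can be sketched with a toy environment. Everything here is a stand-in: `environment` is a scripted replacement for ALFWorld, and `policy` is a hypothetical rule-based substitute for the language model that would normally produce each thought and action.

```python
def environment(action, state):
    """Toy stand-in for an interactive environment such as ALFWorld."""
    if action == "go to kitchen":
        return "There is a cabinet in the kitchen."
    if action == "open cabinet":
        return "The cabinet does not contain the pepper shaker."
    if action == "go to dining table":
        return "The pepper shaker is on the dining table."
    if action == "put pepper shaker in drawer":
        state["done"] = True
        return "Task complete."
    return "Nothing happens."


def policy(observation):
    """Hypothetical stand-in for the LLM: latest observation -> (thought, action)."""
    if observation is None:
        return ("It may be in the kitchen cabinet or on the dining table.", "go to kitchen")
    if "cabinet in the kitchen" in observation:
        return ("Check the cabinet.", "open cabinet")
    if "does not contain" in observation:
        return ("It may be on the dining table instead.", "go to dining table")
    if "on the dining table" in observation:
        return ("Found it; put it in the drawer.", "put pepper shaker in drawer")
    return ("No useful observation; stop.", "finish")


def react_loop(max_steps=10):
    state = {"done": False}
    observation, trace = None, []
    for _ in range(max_steps):
        thought, action = policy(observation)      # reason
        if action == "finish":
            break
        observation = environment(action, state)   # act, then observe
        trace.append((thought, action, observation))
        if state["done"]:
            break
    return trace


for thought, action, observation in react_loop():
    print(f"Thought: {thought}\nAction: {action}\nObservation: {observation}\n")
```

The key design point is that each new thought depends on the latest observation, which is what lets the agent revise its plan when the cabinet turns out to be empty.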
Settings, Resources
Interactive UI for Workflow Design
Context
Query
Prompt
LLM
Output
Vector DB
❶ Dataset
❸ Embedding
➍ Similarity
❷
➎ Reranking algorithm
➏
web crawler
RAG
Care theory
Scenario: legal aid
Accountability, compliance
Sources, transparency
Expert review
defining an appropriate workflow
1. Install the Python package
pip install llama-index
2. Prepare the knowledge base
Supports a variety of file formats (see the official documentation)
data folder: 5 PDF files
Sample program
Knowledge base
3. Set the OpenAI API key (another LLM can be substituted)
Enter the API key you applied for
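One common way to supply the key without writing it into the source file is an environment variable; the OpenAI client library reads `OPENAI_API_KEY` by default. The helper below is a hypothetical sketch that fails early with a clear message when the variable is missing.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    # Read the key from the environment instead of hard-coding it
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the example.")
    return key
```

Calling `require_api_key()` at the top of the sample program makes a missing key obvious before any documents are loaded or embedded.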
import os

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
)


def setCurrentWD():
    abspath = os.path.abspath(__file__)
    dname = os.path.dirname(abspath)
    os.chdir(dname)


# Set the working directory so that the data folder can be found
setCurrentWD()

# 1. Loading & Parsing
documents = SimpleDirectoryReader("data").load_data()

# 2. Indexing & vector store
index = VectorStoreIndex.from_documents(documents)

# 3. Query
query_engine = index.as_query_engine()
response = query_engine.query("Tell me about rag")
print(response)
response = query_engine.query("Tell me about rag")
RAG models leverage a retriever to retrieve text documents based on an input query and use them as additional context when generating a target sequence. These models have been shown to achieve state-of-the-art results on various tasks such as open Natural Questions, WebQuestions, CuratedTrec, MS-MARCO, Jeopardy question generation, and FEVER fact verification. RAG models generate responses that are more factual, specific, and diverse compared to baseline models like BART. The retrieval mechanism in RAG plays a key role in improving results across different tasks.
Actual response
Source
$0.12
<$0.01
Embeddings (indexes) only need to be computed once
Was the embedding model pre-trained on multilingual data? The results differ considerably!
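Computing embeddings only once usually means saving the index to disk and reloading it on later runs. The sketch below shows the pattern with a JSON file; `fake_embed` is a hypothetical stand-in for a real (paid or GPU-bound) embedding call, and LlamaIndex's own persistence mechanism is not used here.

```python
import hashlib
import json
import os
import tempfile


def fake_embed(text):
    """Hypothetical stand-in for a real embedding model call."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


def load_or_build_index(texts, path, embed=fake_embed):
    # Reuse the saved index when it exists; otherwise embed once and save
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f), False
    index = {t: embed(t) for t in texts}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index, True


path = os.path.join(tempfile.mkdtemp(), "index.json")
texts = ["RAG combines retrieval and generation.", "vLLM serves local models."]
first, built_first = load_or_build_index(texts, path)
second, built_second = load_or_build_index(texts, path)
print(built_first, built_second)
```

The second call returns the stored vectors without calling the embedding function again, which is where the cost saving comes from.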
# if not installed
pip install vllm
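1. CUDA environment: GPU compute capability, driver, CUDA Toolkit
2. Python environment: set up a virtual environment
3. Start the local model using vLLM
python -m vllm.entrypoints.openai.api_server --model=MODEL_NAME
Available models
4. Launch the model
Some models require a token: first register a Hugging Face account
Create a new token, making sure to enable the inference permission
Give the token a name
After creating the token, copy it to the clipboard for later use
Terms of use
Model information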
Paste the token at the prompt:
pip install -U "huggingface_hub[cli]" # if necessary
huggingface-cli login
Paste the token here
n
Token is valid
Or set the HF_TOKEN environment variable:
export HF_TOKEN=<paste your token>
python -m vllm.entrypoints.openai.api_server --model=meta-llama/Llama-3.2-1B-Instruct --max_model_len 4096
Launch the model
Successfully started at http://localhost:8000
# Example: using the Meta Llama 3 8B Instruct model
# For local testing, --host can also be localhost or 127.0.0.1
# Optional: --gpu-memory-utilization adjusts the GPU memory utilization
# Optional: --tensor-parallel-size 1 can be set when multiple GPUs are available
# Optional: --max-model-len 4096 sets the maximum model length
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --gpu-memory-utilization 0.90
LlamaIndex core and the OpenAI client integration package (the vLLM API is compatible with the OpenAI API)
pip install llama-index llama-index-llms-openai openai
Local embedding model, using a Hugging Face model as an example
pip install llama-index-embeddings-huggingface sentence-transformers
Local embedding model (optional), using an Ollama model as an example
pip install llama-index-embeddings-ollama
# Make sure the Ollama service is running and the embedding model has been pulled,
# e.g., ollama pull nomic-embed-text
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    # Placeholder value: do not hard-code a real token in source files
    api_key="EMPTY",
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[
        {"role": "system", "content": "Always reply in Traditional Chinese as commonly used in Taiwan"},
        {"role": "user", "content": "What is a language model?"},
    ],
    max_tokens=512,
)
print(completion.choices[0].message.content)
Exercise: switch to another model, such as https://huggingface.co/facebook/m2m100_1.2B, and inspect the results
Central to many NLP, recommendation, and search algorithms.
Numeric values
Objects, text, images, ...
Vector Space: semantic similarity
Barančíková, P., & Bojar, O. (2019). In search for linear relations in sentence embedding spaces.
Pavan Belagatti, Vector Embeddings Explained for Developers!
pip install -U transformers torch
from transformers import AutoTokenizer, AutoModel
import torch


def get_huggingface_embedding(text, model_name='sentence-transformers/all-MiniLM-L6-v2'):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # You can choose how to derive the final embeddings, e.g., mean pooling
    embeddings = outputs.last_hidden_state.mean(dim=1).squeeze().numpy()
    return embeddings


# Example usage
text = "Pavan is a developer evangelist."
embedding_huggingface = get_huggingface_embedding(text)
print(embedding_huggingface)
Source: Jayita Bhattacharyya, A Brief Comparison of Vector Databases
1. Download and install PostgreSQL
1.1 Update the package list
sudo apt update
1.2 Install PostgreSQL
sudo apt install postgresql postgresql-contrib -y
1.3 Start and check the PostgreSQL service
sudo systemctl start postgresql
sudo systemctl enable postgresql
2.1 Set up the repository
# Install the repository's public key (if not done previously):
curl -fsS https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo gpg --dearmor -o /usr/share/keyrings/packages-pgadmin-org.gpg
# Create the repository configuration file:
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/packages-pgadmin-org.gpg] https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/$(lsb_release -cs) pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list && apt update'
2.2 Install pgAdmin
# Install both the desktop and web server versions
sudo apt install pgadmin4
# Configure the web server, if pgadmin4-web is installed:
sudo /usr/pgadmin4/bin/setup-web.sh
This account is the database "super administrator"
Enter the email and password set on the previous page
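1. Download and install PostgreSQL
Database management tool (GUI)
1. Stack Builder can be deselected
2. One DB server is created by default
3. You may be asked to set the [super administrator] password during installation
2. Open pgAdmin 4 to create and manage servers
Additional servers can also be added
One server is already installed
If no password has been set, you must set the [super administrator] password after clicking
Custom name
Address
pgAdmin master password
Enter the [super administrator] password
[Super administrator]: postgres
3. Create a user and set the relevant permissions
Right-click to create a user
Set the account name
Set the password
Enable this option and keep the other defaults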