AI應用研究教育中心
真理大學
王柳鋐
2024/11
Last modified
2024/11/5
☞ Revolution of LLM
☞ Blooming of LLM
☞ More Than You Think, Lower Than Expected
☞ Practice using LlamaIndex & GPT
Representation of Words
Training Paradigm
word embedding
Vector length = vocabulary size (e.g., length: 10,000)
Network Architecture
Input layer
Hidden layer
Output layer
Transformer, 2017/06
Word2Vec, 2013/01
Pre-trained, 2010/10
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8), 2554-2558.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. Nature 323, 533–536 (1986). https://doi.org/10.1038/323533a0
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4), 541-551
Sutskever, I., Vinyals, O., & Le, Q.V. (2014). Sequence to Sequence Learning with Neural Networks. ArXiv, abs/1409.3215.
seq2seq model, 2014/09
Encoder-Decoder Model for NLP tasks
Text-based tasks
Encoder
Decoder
attention, 2014/09
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473.
seq2seq model, 2014/09
Attention: focus on specific parts of input while generating output
Cognitive science: selective attention
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 5998-6008.
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
Self-Attention: input tokens interact with each other
context
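A minimal sketch of scaled dot-product self-attention in NumPy (illustration only; real Transformer layers add learned Q/K/V projections, multiple heads, and positional information):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # X: (sequence_length, d_model); here Q, K and V are simply X itself
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)        # every token attends to every other token
    weights = softmax(scores, axis=-1)   # attention weights per token
    return weights @ X                   # context-aware representations

X = np.random.rand(5, 8)                 # 5 tokens, 8-dimensional embeddings
print(self_attention(X).shape)           # (5, 8)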
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics.
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
BERT, 2018/10
token
embedding
(Encoder-only)
(BERT base)
pre-training
prediction (probability)
downstream task
FNN
(Fine-tuning)
dim: 768
Transfer learning: pre-training + collecting a target-domain dataset and training on it (fine-tuning)
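A minimal sketch of this pre-train + fine-tune pattern, assuming the Hugging Face transformers library (not named in the slides), with bert-base (hidden dim 768) plus a small classification head for a two-class downstream task:

# Hedged sketch: library, model name and label count are illustrative assumptions
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # pre-trained encoder, hidden dim 768
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)                           # adds a small FNN head for the downstream task

inputs = tokenizer("I played the piano", return_tensors="pt")
outputs = model(**inputs)                                        # logits for the downstream labels
print(outputs.logits.shape)                                      # torch.Size([1, 2])
# Fine-tuning then trains this head (and optionally the encoder) on a small target-domain dataset.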
Traditional machine learning approach
Dataset 1
Dataset 2
Dataset 3
Task 1
Task 2
Task 3
Two stages: unsupervised, then supervised
Different tasks, different datasets
Transfer Learning
Source-domain task
Dataset S (large)
Target-domain task
Dataset T (small)
Pre-training: unsupervised
Fine-tuning: supervised
Pan, S.J., & Yang, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345-1359.
pre-trained, 2010
One-hot encoding
Encoding: each word is represented as a vector
Table: vocabulary words (e.g. <SOS>) numbered 1-8; each word's code is a length-8 vector of 0s with a single 1 at its index
Example input: <SOS> I played the piano
Encoding
Input layer
Hidden layer
Output layer
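A minimal one-hot encoding sketch in Python; the tiny five-word vocabulary is an assumption for illustration:

import numpy as np

vocab = ["<SOS>", "I", "played", "the", "piano"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # vector length equals the vocabulary size, with a single 1 at the word's index
    v = np.zeros(len(vocab), dtype=int)
    v[index[word]] = 1
    return v

print(one_hot("<SOS>"))   # [1 0 0 0 0]
print(one_hot("piano"))   # [0 0 0 0 1]
# With a realistic vocabulary (e.g. 10,000 words) these vectors become long and sparse,
# and every pair of different words sits at the same distance.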
Word Embedding
Semantically similar words must be encoded with a smaller "distance" between them
Mikolov, T., Chen, K., Corrado, G.S., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. International Conference on Learning Representations.
Word2Vec, 2013/01
Word Embedding
Words
Dimensions
Word2Vec
Word2Vec learns from a large vocabulary, mapping each word into a 100-300 dimensional space.
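A hedged Word2Vec sketch, assuming the gensim library (not named in the slides); a real model needs a much larger corpus than this toy one:

from gensim.models import Word2Vec

sentences = [["i", "played", "the", "piano"],
             ["she", "played", "the", "violin"],
             ["they", "watched", "the", "movie"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)  # 100-dim embeddings
print(model.wv["piano"].shape)                 # (100,)
print(model.wv.similarity("piano", "violin"))  # with enough data, similar words end up closer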
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
GPT-2, 2019
BERT, 2018/10
decoder-only
117M Parameters
1,542M Parameters
Pre-training
Pre-training
Pre-training
Pre-training
Fine-tuning
Fine-tuning
Fine-tuning
Fine-tuning
Fine-tuning
Transformer-based Model
●●●
pre-training datasets
NN1
Task 1
NN2
Task 2
NNn
Task N
target domain data sets
●●●
●●●
●●●
LLM
Nathan Bos, Ph.D., Embeddings Are Kind of Shallow, 2024
Difficult to capture higher-level semantic concepts
Difficult to perform logical operations and causal reasoning
Lack of contextual understanding
Gemini Pro 1.5
GPT-4o
Claude 3.5 Sonnet
LLaMa 3.2
LLaMa 3.1 Sonar Large
Microsoft 365 Copilot
OpenAI
o1-preview
Capabilities of Pattern Matching
Mirzadeh, I., Alizadeh-Vahid, K., Shahrokhi, H., Tuzel, O., Bengio, S., & Farajtabar, M. (2024). GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.
One of the solutions: RAG
Hallucination
Prompt Engineering
McCann, B., Keskar, N.S., Xiong, C., & Socher, R. (2018). The Natural Language Decathlon: Multitask Learning as Question Answering. ArXiv, abs/1806.08730.
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
GPT-2, 2019
BERT, 2018/10
Prompt Engineering, 2018/06
A prompt is natural language text describing the task that an AI should perform.
It can be a query, specify a style, provide relevant context, or assign a role.
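For illustration, a prompt assembled from those elements (the wording is hypothetical, not from the slides):

role    = "You are a helpful legal-aid assistant."                 # assigning a role
context = "Context: the user rents an apartment in Taipei."        # providing relevant context
query   = "Question: Can the landlord raise the rent mid-lease?"   # the query
style   = "Answer briefly, in plain language."                     # specifying a style

prompt = "\n".join([role, context, query, style])
print(prompt)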
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
GPT-2, 2019
BERT, 2018/10
Brown, T.B., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot Learners. ArXiv, abs/2005.14165.
Prompt Engineering, 2018/06
Meta-learning: the model learns in context from examples in the prompt; no actual training (no weight updates) takes place.
Few-Shot Learner, 2020/05
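A sketch of a few-shot prompt in the style of Brown et al. (2020): the examples sit in the context itself, and the model's weights are not updated.

few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "plush giraffe => girafe peluche\n"
    "cheese =>"   # the model completes this line from the in-context examples
)
print(few_shot_prompt)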
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
GPT-2, 2019
BERT, 2018/10
Few-Shot Learner, 2020/05
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E.H., Xia, F., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. ArXiv, abs/2201.11903.
Prompt Engineering, 2018/06
Chain-of-Thought, 2022/01
Step-by-step reasoning examples, given as input
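A sketch of a chain-of-thought prompt adapted from Wei et al. (2022): a worked, step-by-step example is supplied as input so the model imitates the reasoning pattern before giving its answer.

cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. How many apples do they have?\n"
    "A:"   # the model is expected to produce step-by-step reasoning, then the final answer
)
print(cot_prompt)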
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
GPT-2, 2019
BERT, 2018/10
Prompt Engineering, 2018/06
Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
RAG, 2020/05
RAG for knowledge-intensive tasks
1. parametric memory: a pre-trained seq2seq model
2. non-parametric memory: a dense vector index of Wikipedia
e.g. word embedding
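A rough sketch of the idea: a non-parametric memory (a dense vector index of documents) supplies context to the parametric generator. The embed() function and the documents are placeholders for illustration, not a specific library's API.

import numpy as np

def embed(text):
    # placeholder dense encoder (deterministic pseudo-random vectors)
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

docs = ["RAG combines retrieval with generation.",
        "Wikipedia can serve as the retrieval corpus."]
doc_vectors = np.stack([embed(d) for d in docs])           # the non-parametric memory

def retrieve(query, k=1):
    q = embed(query)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "What is RAG?"
context = retrieve(query)
prompt = f"Context: {context}\nQuestion: {query}"           # fed to the parametric generator
print(prompt)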
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
GPT-2, 2019
BERT, 2018/10
Prompt Engineering, 2018/06
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. ArXiv, abs/2210.03629.
RAG, 2020/05
ReAct, 2022/10
Chain-of-Thought
Act 1: Search [Q]
Act 2: Search [Obs 1]
Act 3: Search [Obs 2]
Act 4: Finish [answer]
context
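A hedged sketch of the ReAct loop, with hypothetical llm and search callables (not a specific library's API): the model alternates reasoning and acting, and each observation is appended to the context.

def react(question, llm, search, max_steps=4):
    context = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        thought_and_action = llm(context)           # e.g. "Thought: ... Action: Search[...]"
        context += thought_and_action + "\n"
        if "Finish[" in thought_and_action:
            return thought_and_action               # Act N: Finish[answer]
        observation = search(thought_and_action)    # execute the chosen action
        context += f"Observation {step}: {observation}\n"
    return context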
attention, 2014/09
seq2seq model, 2014/09
Transformer, 2017/06
GPT-2, 2019
BERT, 2018/10
Prompt Engineering, 2018/06
RAG, 2020/05
ReAct, 2022/10
Setting, Resources
Workflow
LangChain + LangSmith
Ji, S., Pan, S., Cambria, E., Marttinen, P., & Yu, P.S. (2020). A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Transactions on Neural Networks and Learning Systems, 33, 494-514.
Zou, X. (2020). A Survey on Application of Knowledge Graph. Journal of Physics: Conference Series, 1487.
web crawler
Context
Query
Prompt
LLM
Output
Vector DB
RAG
Care theory
Scenario: legal aid
Accountability, compliance
Source transparency
Expert review
defining an appropriate workflow
Context
Query
Prompt
LLM
Output
Vector DB
❶ Dataset
❸ Embedding
➍ Similarity
❷
➎ Reranking algorithm
➏
pip install llama-index
1. Install the Python package
●●●
2. Prepare the knowledge base
Supports various file formats (see the official documentation)
3. Set the OpenAI API Key (a different LLM can be used instead)
2. Prepare the knowledge base
data folder: 5 PDF files
Example program
Knowledge base
3. Set the OpenAI API Key (a different LLM can be used instead)
Enter the API key you applied for
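One common way to supply the key is via the OPENAI_API_KEY environment variable, which LlamaIndex reads by default (the key string below is a placeholder):

import os
os.environ["OPENAI_API_KEY"] = "sk-..."   # replace with your own key, or set it in the shell instead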
import os
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
)

def setCurrentWD():
    abspath = os.path.abspath(__file__)
    dname = os.path.dirname(abspath)
    os.chdir(dname)

setCurrentWD()  # set the working directory so the data folder can be found

# 1. Loading & Parsing
documents = SimpleDirectoryReader("data").load_data()

# 2. Indexing & vector store
index = VectorStoreIndex.from_documents(documents)

# 3. Query
query_engine = index.as_query_engine()
response = query_engine.query("Tell me about rag")
print(response)
response = query_engine.query("Tell me about rag")
RAG models leverage a retriever to retrieve text documents based on an input query and use them as additional context when generating a target sequence. These models have been shown to achieve state-of-the-art results on various tasks such as open Natural Questions, WebQuestions, CuratedTrec, MS-MARCO, Jeopardy question generation, and FEVER fact verification. RAG models generate responses that are more factual, specific, and diverse compared to baseline models like BART. The retrieval mechanism in RAG plays a key role in improving results across different tasks.
Actual response
Source
$0.12
<$0.01
Embeddings (the index) only need to be computed once
Whether the embedding model was pre-trained to be multilingual makes a big difference in quality!
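A sketch of computing the embeddings once and reusing them with LlamaIndex's persistence API, reusing the index built in the example above (the "storage" directory name is an arbitrary choice):

from llama_index.core import StorageContext, load_index_from_storage

index.storage_context.persist(persist_dir="storage")          # save the computed index to disk

# On later runs, load the persisted index instead of re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir="storage")
index = load_index_from_storage(storage_context)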