Practical Application of Chatbot

RAG Application

Application

Prerequisites

  • Windows system with an RTX 4060 8 GB GPU
    • Install the graphics card driver
    • Confirm that the GPU's Compute Capability is >= 7 (a quick check sketch follows this list)
  • Windows Subsystem for Linux version 2 (WSL 2)
  • Install the CUDA Toolkit for WSL 2
    • Confirm the Linux environment settings: nvcc and the CUDA libraries must be discoverable
  • Install Python
  • Install VS Code (with the Python and WSL extensions) (optional)
  • Others
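
A quick check sketch for the GPU prerequisite, assuming PyTorch with CUDA support is already installed in the WSL 2 environment:

import torch

if torch.cuda.is_available():
    # Compute Capability is reported as a (major, minor) tuple
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)}, Compute Capability: {major}.{minor}")
    print("OK" if major >= 7 else "Compute Capability is below 7")
else:
    print("CUDA is not available; check the driver and CUDA Toolkit installation.")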

Using word embeddings in spaCy

Industrial-Strength Natural Language Processing in Python

Word Embedding in spaCy

  • 300-dimensional word vectors for several languages, learned from large corpora

The downloaded language models

Word Embedding in spaCy


pip install spacy

Install the spaCy Python module

python -m spacy download en_core_web_lg

Install a language model (English, web, large)

python -m spacy download zh_core_web_md

Or another language model (Chinese, web, medium)

# Check the spaCy installation info
python -m spacy info

Word Embedding in spaCy: ipynb example

# (1)Import spacy
import spacy

# Load the language model
nlp_lg = spacy.load('en_core_web_lg')
# (2)Define example sentence
text = "The Shiba Inu is a dog that is more like a cat."

# Feed example sentence to the language model under 'nlp_lg'
doc = nlp_lg(text)

# Call the variable to examine the output
doc
# (3)Retrieve the second Token in the Doc at index 1, and 
# the first 30 dimensions of its vector representation
doc[1].vector[:30]

Example: computing similarity with spaCy

Word Embedding in spaCy: ipynb example

# (4)Assign the Tokens at indices 5 and 11 in the Doc to variables
dog = doc[5]
cat = doc[11]

# Compare the similarity between Tokens 'dog' and 'cat'
dog.similarity(cat)
# (5)Feed "snake" to the model; store result under 'snake'
snake = nlp_lg("snake")

# Compare the similarity of 'snake' and 'dog'
snake.similarity(dog)
# (6)Feed "car" to the model and calculate similarity
snake.similarity(nlp_lg("car"))

Word Embedding in spaCy: ipynb example

# (7)Call the variable to examine the output again
doc
# (8)The vector for this *Doc* object
doc.vector
# (9)Retrieve the 'shape' attribute for the vector
doc.vector.shape
# (10)Get the noun chunks under the attribute 'noun_chunks'
# and cast the output into a list named 'n_chunks'.
n_chunks = list(doc.noun_chunks)

# Call the variable to examine the output
n_chunks

Word Embedding in spaCy: ipynb example

# (11)Get the shape of the vector for the first noun 
# chunk in the list
n_chunks[0].vector.shape
# (12)Compare the similarity of the two noun chunks
n_chunks[0].similarity(n_chunks[1])

Word Embedding in spaCy

import spacy
import json

nlp = spacy.load('en_core_web_lg') # the language model
# Read the files: five English text files in total
pdf_list = ['Product_1.txt','Product_2.txt','Product_3.txt','Product_4.txt','Product_5.txt']
descriptions = []
for file in pdf_list:
    with open(f'data/{file}','r', encoding='utf-8') as f:
        text = f.read()
    descriptions.append(text) 

description_vectors_list = []

for description in descriptions:
    doc = nlp(description)    # run the pipeline to produce the analysis result
    reduced_vector = doc.vector[:128].tolist()  # keep only the first 128 dimensions of the vector
    entry = {"vector": reduced_vector, "text": description}
    description_vectors_list.append(entry)

with open('Product_data.json', 'w') as json_file:
    json.dump(description_vectors_list, json_file, indent=2)

print("JSON file created: Product_data.json")

spacy_embedding.py
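
A quick sanity check, as a sketch, assuming spacy_embedding.py has already produced Product_data.json in the working directory:

import json

# Reload the JSON written by spacy_embedding.py and confirm its structure
with open('Product_data.json', 'r') as f:
    entries = json.load(f)

print(len(entries), "entries")                 # expected: 5, one per product file
print(len(entries[0]['vector']), "dimensions") # expected: 128
print(entries[0]['text'][:80])                 # preview of the first description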


Running Milvus in Docker Desktop (WSL 2 environment)

  1. Open Docker Desktop and run it as administrator
  2. In PowerShell, download the install script and save it as standalone.bat
  3. Run the downloaded script to start Milvus in a Docker container (a connectivity check sketch follows the commands)

Invoke-WebRequest https://raw.githubusercontent.com/milvus-io/milvus/refs/heads/master/scripts/standalone_embed.bat -OutFile standalone.bat
.\standalone.bat start
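
Once the container is running, a minimal connectivity check from Python, as a sketch assuming pymilvus is installed and Milvus is listening on the default port 19530:

from pymilvus import connections, utility

# Connect to the local Milvus instance, print its version, then disconnect
connections.connect(host="localhost", port="19530")
print(utility.get_server_version())
connections.disconnect("default")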

Word Embedding in spaCy

RAG Standalone Example

pip install pymilvus
pip install ctransformers
pip install spacy

# (1) importing
from pymilvus import CollectionSchema, FieldSchema, DataType, Collection, connections, MilvusException
import json
from ctransformers import AutoModelForCausalLM
import spacy

Reference code, with the llm call modified

Word Embedding in spaCy

# (2) Collection config
id = FieldSchema(
  name="id",
  dtype=DataType.INT64,
  is_primary=True,
  auto_id=True
)
text = FieldSchema(
  name="text",
  dtype=DataType.VARCHAR,
  max_length=5000,
  # The default value will be used if this field is left empty during data inserts or upserts.
  # The data type of `default_value` must be the same as that specified in `dtype`.
  default_value="Unknown"
)
vector = FieldSchema(
  name="vector",
  dtype=DataType.FLOAT_VECTOR,
  dim=128
)
schema = CollectionSchema(
  fields=[id, text, vector],
  enable_dynamic_field=True
)
collection_name = "Products"

Word Embedding in spaCy

# (3)connect to milvus server
connections.connect(host="localhost", port="19530")

Docker Desktop is installed and Milvus has been started

# (4) Create collection
collection = Collection(
    name=collection_name,
    schema=schema,
    using='default', # Milvus server alias
    shards_num=2 # Number of data nodes to use
    )
# (5) Add index to vector
index_params = {
  "metric_type":"L2",
  "index_type":"IVF_FLAT",
  "params":{"nlist":2}
}
from pymilvus import Collection, utility
collection = Collection("Products") # Get an existing collection.     
collection.create_index(
  field_name="vector", 
  index_params=index_params
)
utility.index_building_progress("Products")

Word Embedding in spaCy

# (6) Get an existing collection and load
collection = Collection("Products")      
collection.load()

# Check the loading progress and loading status
utility.load_state("Products")
# Output: <LoadState: Loaded>

utility.loading_progress("Products")
# Output: {'loading_progress': 100%}
# (7) Prepare data for the database and upload it
with open('Product_data.json','r') as f:
    products = json.load(f)  
data = []
text = []
vector = []
for ele in products:
    text.append(ele['text'])
    vector.append(ele['vector'])
data.append(text)
data.append(vector)
from pymilvus import Collection
collection = Collection("Products")      # Get an existing collection.
mr = collection.insert(data)

Word Embedding in spaCy

# (8) Lists existing collections
utility.list_collections()
# (9) Load LLM model
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
# llm = AutoModelForCausalLM.from_pretrained("llama-2-7b-chat.Q2_K.gguf", model_file="llama-2-7b-chat.Q2_K.gguf",model_type="llama", gpu_layers=0,context_length=4096)
from vllm import LLM
llm = LLM(model="facebook/opt-125m")
# Load the spaCy model for English language

spacy_model = spacy.load('en_core_web_lg')

# Connect to Milvus server
connections.connect(host="localhost", port="19530")

Run inference

Word Embedding in spaCy

## Run inference in a loop for different queries
try:
    while True:
        user_input = input("\nDescribe query (or type 'exit' to quit):\n")
        print(f"User input : {user_input}\n")

        # Exit loop if user types 'exit'
        if user_input.lower() == 'exit':
            break

        # Process user input using spaCy model to get embedding vector
        user_input_doc = spacy_model(user_input)
        user_vector = user_input_doc.vector[:128].tolist()

        # Define search parameters for similarity search
        search_params = {
            "metric_type": "L2",
            "offset": 0, # Number of entities to skip during the search
            "ignore_growing": False, # Whether to ignore growing segments during similarity searches
            "params": {"nprobe": 1} # Number of clusters to search
        }

        # Connect to the Milvus collection named "Products"
        collection = Collection("Products")

        # Perform similarity search using Milvus
        similarity_search_result = collection.search(
            data=[user_vector],
            anns_field="vector",
            param=search_params,
            limit=1,
            output_fields=['text']
        )

        # Display search results to the user
        # for idx, hit in enumerate(similarity_search_result[0]):
        score = similarity_search_result[0][0].distance
        description = similarity_search_result[0][0].entity.text
        print(f"{description} \ndistance: {score}\n")

        prompt = f"Use context to answer the query.\n context : {description}\n query:{user_input}\n answer : "
        input_text = f"[INST] <<SYS>>You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>[/INST] \n{prompt}"
        # print(f'LLM response : {llm(prompt)}\n')
        # Generate with the full instruction-formatted prompt
        outputs = llm.generate(input_text)
        
        for output in outputs:
            prompt = output.prompt
            generated_text = output.outputs[0].text
            print(f'Prompt: {prompt!r}, Answer: {generated_text!r}')

except MilvusException as e:
    # Handle Milvus exceptions
    print(e)
finally:
    # Disconnect from Milvus server
    connections.disconnect(alias="default")

query = 'What is capacity and price of FrostBite Pro 3000 - Smart Refrigerator'

Application

RAG using Milvus

%pip install llama_index
# (1) Read the files
from llama_index.core import SimpleDirectoryReader

# Set the folder path
path = "output_intros"
# Load all .txt files in the folder
reader = SimpleDirectoryReader(input_dir=path, recursive=False, required_exts=[".txt"])
documents = reader.load_data()

# Show the number of loaded documents and preview the content
print(f"Loaded {len(documents)} documents")
print(documents[0].text)
print(documents[0].metadata)

RAG using Milvus

%pip install torch
%pip install sentence_transformers
# (2) Use a free embedding model from Hugging Face
import torch
from sentence_transformers import SentenceTransformer

# Set DEVICE; use the GPU if one is available
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Use the multilingual E5 model
model_name = "intfloat/multilingual-e5-large"
encoder = SentenceTransformer(model_name)
encoder.to(DEVICE)

# Model parameters
EMBEDDING_DIM = encoder.get_sentence_embedding_dimension()
MAX_SEQ_LENGTH = encoder.get_max_seq_length()

# Show the model info
print(f'model_name: {model_name}')
print(f'EMBEDDING_DIM: {EMBEDDING_DIM}')
print(f'MAX_SEQ_LENGTH: {MAX_SEQ_LENGTH}')

RAG using Milvus

# (3) Split the data into chunks and build embedding vectors
# multilingual-e5-large has a max_seq_length of 512, so the chunk size must not exceed it
# Set the chunk size to 512, with a 10% overlap
import numpy as np
from llama_index.core.node_parser import SentenceSplitter
from torch.nn.functional import normalize
from sentence_transformers import SentenceTransformer

# Chunking parameters
CHUNK_SIZE = 512
CHUNK_OVERLAP = int(CHUNK_SIZE * 0.10)
print(f"chunk_size: {CHUNK_SIZE}, chunk_overlap: {CHUNK_OVERLAP}")

# Build the chunk splitter (node parser)
splitter = SentenceSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
nodes = splitter.get_nodes_from_documents(documents)
print(f"{len(documents)} docs split into {len(nodes)} chunks.")

# Convert to a list of texts
texts = [node.text for node in nodes]

# Compute the embedding vectors and normalize them
embeddings = encoder.encode(sentences=texts, convert_to_tensor=True, device=DEVICE)
embeddings = normalize(embeddings, p=2, dim=1)
embeddings = embeddings.cpu().numpy().astype(np.float32)

# Collect into a list of dicts (chunk text, source, vector)
dict_list = []
for node, vector in zip(nodes, embeddings):
    chunk_dict = {
        'chunk': node.text,
        'source': node.metadata.get('file_path', ""),
        'vector': vector.tolist(),  # convert the vector to a list for serialization/storage
    }
    dict_list.append(chunk_dict)

RAG using Milvus

%pip install pymilvus
# (4) Store the data in the database
from pymilvus import MilvusClient
import time
import math
# Set up the MilvusClient
mc = MilvusClient(uri="http://localhost:19530", token="")

# Define the collection name and the embedding dim
COLLECTION_NAME = "MilvusDocs"

# EMBEDDING_DIM and dict_list are assumed to exist already
mc.create_collection(
    collection_name=COLLECTION_NAME,
    dimension=EMBEDDING_DIM,
    consistency_level="Eventually",
    auto_id=True,
    overwrite=True
)
# Limit the number of rows inserted per batch (e.g. 500 or 1000)
BATCH_SIZE = 500

num_vectors = len(dict_list)
num_batches = math.ceil(num_vectors / BATCH_SIZE)

print(f"Inserting {num_vectors} vectors in {num_batches} batches (batch size = {BATCH_SIZE})")

print("Start inserting entities...")
start_time = time.time()

# Insert the data (dict_list should contain the 'chunk', 'source', and 'vector' fields)
for i in range(0, num_vectors, BATCH_SIZE):
    batch = dict_list[i: i + BATCH_SIZE]
    mc.insert(
        collection_name=COLLECTION_NAME,
        data=batch,
        progress_bar=False  # avoid printing many small progress bars
    )
    print(f"Inserted batch {i // BATCH_SIZE + 1}/{num_batches}")
end_time = time.time()
print(f"Total insert time: {round(end_time - start_time, 2)} seconds")

RAG using Milvus

# (5) Query
SAMPLE_QUESTION = "王建民的生平?"  # "Wang Chien-ming's biography?"

query_embeddings = torch.tensor(encoder.encode(SAMPLE_QUESTION))
query_embeddings = normalize(query_embeddings, p=2, dim=0)
query_embeddings = query_embeddings.cpu().numpy().astype(np.float32)

# Select the output fields (excluding the vector)
OUTPUT_FIELDS = ["chunk", "source"]  # or your own custom fields

# Search for the top-K most similar results
TOP_K = 3
results = mc.search(
    collection_name=COLLECTION_NAME,
    data=[query_embeddings],               # must be a list of vectors
    output_fields=OUTPUT_FIELDS,
    limit=TOP_K,
    consistency_level="Eventually"
)
# Show the results
for i, hit in enumerate(results[0]):
    print(f"\n🔹 Top {i+1} result:")
    print(f"Score: {hit['distance']:.4f}")
    print(f"Chunk: {hit['entity']['chunk']}")
    print(f"Source: {hit['entity'].get('source', '')}")

RAG using Milvus

%pip install vllm transformers torch
import pprint
import vllm, torch
from vllm import LLM, SamplingParams

# Clear the GPU memory cache
torch.cuda.empty_cache()
# Check the GPU.
!nvidia-smi

Log in to Hugging Face and obtain a new access token

from huggingface_hub import login
import os

# Paste your own Hugging Face access token here
hf_token = "hf_fYVWiFyGWpYnj---------PNtahgSHubNASn"

# Log in to Hugging Face
login(token=hf_token)
# login(token=os.environ["HF_TOKEN"])

RAG using Milvus

# 1. Choose a model
MODELTORUN = "p208p2002/llama-traditional-chinese-120M"


# 2. Clear the GPU memory cache, you're going to need it all!
torch.cuda.empty_cache()


# 3. Instantiate a vLLM model instance.
llm = LLM(model=MODELTORUN,
         enforce_eager=True,
         dtype=torch.bfloat16,
         gpu_memory_utilization=0.5,
         max_model_len=1000,
         seed=415,
         max_num_batched_tokens=3000)

Launch https://huggingface.co/p208p2002/llama-traditional-chinese-120M with vLLM

RAG using Milvus

# Join all the contexts together, separated by spaces.
# Lance Martin, LangChain, says put the best contexts at the end.
contexts_combined = ' '.join(reversed(contexts))


# Join all the unique sources together, separated by spaces.
source_combined = ' '.join(reversed(list(dict.fromkeys(sources))))


SYSTEM_PROMPT = f"""First, check if the provided Context is relevant to
the user's question.  Second, only if the provided Context is strongly relevant, answer the question using the Context.  Otherwise, if the Context is not strongly relevant, answer the question without using the Context. 
Be clear, concise, relevant.  Answer clearly, in fewer than 2 sentences.
Grounding sources: {source_combined}
Context: {contexts_combined}
User's question: {SAMPLE_QUESTION}
"""

prompts = [SYSTEM_PROMPT]

# Sampling parameters
sampling_params = SamplingParams(temperature=0.2, top_p=0.95)

# Invoke the vLLM model.
outputs = llm.generate(prompts, sampling_params)


# Print the outputs.
for output in outputs:
   prompt = output.prompt
   generated_text = output.outputs[0].text
   # !r calls repr(), which prints a string inside quotes.
   print()
   print(f"Question: {SAMPLE_QUESTION!r}")
   pprint.pprint(f"Generated text: {generated_text!r}")

Using the RAG query results (illustrative, not directly runnable)
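
The snippet above assumes two lists, contexts and sources, gathered from the Milvus search results of step (5). A minimal sketch of how they could be built from the results object and the output fields ('chunk', 'source') used earlier:

# Collect the retrieved chunk texts and their source files from the top-K hits
contexts = [hit['entity']['chunk'] for hit in results[0]]
sources = [hit['entity'].get('source', '') for hit in results[0]]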

Practical Application of Chatbot

By Leuo-Hong Wang
