Understanding and Applying Embedding Vectors in AI Solutions

Rainer Stropek

Passionate software developer
IT-Entrepreneur, CoderDojo-Mentor, Teacher

software architects gmbh
rainer@software-architects.at
https://rainerstropek.me

https://bit.ly/oop-embeddings

What are Embedding Vectors?

  • On a scale between 0 and 10, how open are you to new experiences?
  • What about Agreeableness ("Verträglichkeit")?
  • Conscientiousness ("Pflichtbewusstsein")?
  • Extraversion ("Extrovertiertheit")?

  • We can use these personality embeddings for:
    • Sorting based on similarity
    • Clustering
    • etc.

Example: Personality Embeddings
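
A minimal Python sketch of sorting by similarity, assuming each person is described by four scores on the 0 to 10 scale from the questions above. The names and scores are made up for illustration, and Euclidean distance is only one possible similarity measure.

```python
import math

# Hypothetical personality embeddings on a 0-10 scale:
# (openness, agreeableness, conscientiousness, extraversion)
people = {
    "Anna": (9.0, 7.0, 4.0, 8.0),
    "Ben": (8.0, 6.0, 5.0, 9.0),
    "Carla": (2.0, 9.0, 9.0, 1.0),
}


def most_similar(name, people):
    """Return all other names, ordered from most to least similar."""
    target = people[name]
    others = [other for other in people if other != name]
    # A smaller Euclidean distance means a more similar personality.
    return sorted(others, key=lambda other: math.dist(target, people[other]))


if __name__ == "__main__":
    print(most_similar("Anna", people))
```

The same vectors could be fed into a clustering algorithm; only the distance function matters.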

Working With Vectors

  • Cosine similarity
    • OpenAI embeddings are normalized to length 1, so cosine similarity is just the dot product
  • Getting an embedding vector with OpenAI
POST https://oai-4.openai.azure.com/openai/deployments/embeddings/embeddings?api-version=2023-03-15-preview
api-key: {{$dotenv OAI_API_KEY}}
Content-Type: application/json

{
    "input": "Here is the text for which you want to calculate the vector",
    "model": "text-embedding-3-large"
}
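
For vectors of length 1 both norms in the cosine formula are 1, which is why the dot product is enough. A small Python sketch with made-up two-dimensional vectors (real embeddings have far more dimensions):

```python
import math


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to length 1."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; works for any non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors, not real embeddings.
a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

if __name__ == "__main__":
    # For normalized vectors both values agree (up to rounding).
    print(cosine_similarity(a, b), dot(a, b))
```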

Working With Vectors

Word/Phrase 1   Word/Phrase 2          Cosine Similarity
king            queen                  0.9154
king            König                  0.8854
king            frog                   0.8209
king            programming language   0.7570

Embedding Vectors

  • LLMs can convert text into vectors
    • OpenAI: 1536 dimensions (text-embedding-ada-002, text-embedding-3-small) or 3072 dimensions (text-embedding-3-large)
  • Similarity of the vectors reflects similarity of the texts
    • Independent of language
    • Based on meaning, not on character-level differences
  • Areas of application (examples)
    • Find duplicates
    • Clustering
    • Search for relevant data in databases (vector search)
  • Can be combined with other search techniques
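
A rough Python sketch of two of these applications, vector search and duplicate detection, using brute force over a tiny in-memory index. The document IDs, the three-dimensional unit vectors, and the 0.95 threshold are made up for illustration. Real vector databases use dedicated index structures instead of comparing against every vector.

```python
import heapq


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = ((doc_id, dot(query, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def find_duplicates(index, threshold=0.95):
    """Return pairs of document IDs whose similarity reaches the threshold."""
    ids = sorted(index)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if dot(index[a], index[b]) >= threshold
    ]


# Hypothetical index of unit-length vectors.
index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.6, 0.8, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
    "doc-d": [0.8, 0.6, 0.0],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], index))
    print(find_duplicates(index))
```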

[Diagram: OpenAI, ChatGPT, End User]

  • Rebuilding ChatGPT is boring and meaningless
    • Exception: Customized ChatGPT version embedded in company network
  • What we want:
    • Specific behavior
    • Integration of app-specific logic and data (structured or unstructured)
  • Smart automation of processes

[Diagram: Your Application, OpenAI API 🔗]

Application Programming Interface (API)

  • Request
    • Model
    • Messages (chat history)
      • Roles: System, User, Assistant, Function
    • Functions
    • Options
  • Response from OpenAI
    • Choices
    • Usage
    • Content filtering data (Azure OpenAI 🔗)

POST https://oai-4.openai.azure.com/openai/deployments/complete-4/chat/completions?api-version=2023-09-01-preview
api-key: {{$dotenv OAI_API_KEY}}
Content-Type: application/json

{
    "messages": [
        {
            "role": "system",
            "content": "You are a salesperson at a car dealer."
        },
        {
            "role": "user",
            "content": "Hi! I am unsure if I should buy a van or a sports car. What do you think?"
        }
    ]
}
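
A Python sketch of how an application might assemble the messages for such a request and read choices and usage from the response. The helper names and the sample response are hypothetical. The sample only mirrors the general shape of a chat completions response and was not produced by a real call.

```python
import json


def build_chat_request(system_prompt, history, user_message):
    """Build the JSON body: system message, chat history, new user message."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    return json.dumps({"messages": messages})


def read_chat_response(body):
    """Return the first choice's text and the total token usage."""
    data = json.loads(body)
    answer = data["choices"][0]["message"]["content"]
    return answer, data["usage"]["total_tokens"]


# Hypothetical response body for illustration only.
sample_response = json.dumps({
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "How many seats do you need?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 40, "completion_tokens": 8, "total_tokens": 48},
})

if __name__ == "__main__":
    print(build_chat_request("You are a salesperson at a car dealer.", [], "Hi!"))
    print(read_chat_response(sample_response))
```

The chat history has to be sent again with every request; the API itself does not remember earlier messages.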

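Retrieval Augmented Generation (RAG)

  • End User sends a question, query, command, ... to the Application
  • Application retrieves relevant data from the DB
    • Query
    • Full-text search
    • Vector search
    • ...
  • Application builds the prompt from the results and sends it to OpenAI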

RAG With C#

  • NuGet Packages
    • Azure.AI.OpenAI
    • Azure.Identity
    • Azure.Search.Documents
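using Azure.AI.OpenAI;

// Generate the embedding vector for a text.
// embeddingsDeployment (name of the embeddings deployment) is assumed to be defined elsewhere.
async Task<float[]> GenerateEmbeddings(string text, OpenAIClient openAIClient)
{
    var response = await openAIClient.GetEmbeddingsAsync(
        new EmbeddingsOptions(embeddingsDeployment, [text]));
    return response.Value.Data[0].Embedding.ToArray();
}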

RAG With C#

// Requires: using Azure.Search.Documents; using Azure.Search.Documents.Models;

// Generate the embedding for the query
var queryEmbeddings = await GenerateEmbeddings(query, openAIClient);

// Perform the vector similarity search
var searchOptions = new SearchOptions
{
    VectorSearch = new()
    {
        Queries =
        {
            new VectorizedQuery(queryEmbeddings)
            {
                Fields = { "contentVector" },
                Exhaustive = true,          // exact search over all vectors instead of approximate search
                KNearestNeighborsCount = 3  // return the three nearest neighbors
            }
        },
    }
};

SearchResults<SearchDocument> response =
    await searchClient.SearchAsync<SearchDocument>(null, searchOptions);

Build prompt from results
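
One way this last step could look, as a Python sketch. The function name, the prompt wording, and the character budget are assumptions for illustration; the chunks stand in for the text of the documents returned by the vector search.

```python
def build_prompt(question, chunks, max_chars=4000):
    """Combine retrieved text chunks and the user's question into one prompt."""
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk}"
        # Stop adding sources once the (assumed) character budget is used up.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the sources below. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    chunks = ["The ID.4 has a 77kWh battery.", "WLTP range is 328 miles."]
    print(build_prompt("What is the range?", chunks))
```

Numbering the sources makes it possible to ask the model to cite which chunk an answer is based on.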

Document Processing

Why Document Pre-Processing?

  • Split documents into segments to avoid hitting token limits
    • Large documents would otherwise exceed the token limit of the model
  • Split documents into segments to reduce costs
    • Feed smaller text chunks to the LLM
    • Completion requests use fewer tokens, i.e. cost less
  • Format conversion
    • Convert non-text formats (e.g. scans, PDFs) into text
    • ⚠️ Can be difficult because of complex layouts 🔗

Segmentation Strategies

  • Fixed-length Segmentation
    • Simple, but naive
  • Semantic Segmentation
    • Sentences, paragraphs, sections
  • Topic-based segmentation
    • Split based on identified topics or thematic shifts
  • Query-based segmentation
    • Useful if typical queries are known upfront
  • Hybrid approaches
    • Combine some of the strategies mentioned above
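
The first two strategies can be sketched in a few lines of Python. This is a simplified illustration: it counts characters rather than tokens and treats blank lines as paragraph boundaries, both of which are assumptions.

```python
import re


def split_fixed(text, size):
    """Fixed-length segmentation: cut every `size` characters, ignoring meaning."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_paragraphs(text):
    """Semantic segmentation by paragraph: split at blank lines."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


if __name__ == "__main__":
    print(split_fixed("abcdefghij", 4))
    print(split_paragraphs("One.\n\nTwo.\n \nThree."))
```

The fixed-length variant may cut in the middle of a word or sentence, which is what makes it simple but naive.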

Challenges and Solutions

  • Context Preservation
    • Include key information at the beginning of each new segment
    • E.g. titles, tags, keywords, summaries
Living with the Volkswagen ID.4

[...]

First impressions first

The ID.4 we’ve been testing was in the entry-level Life trim level but fitted with the bigger 77kWh 
battery pack and the mid-level 204hp single electric motor of the Pro Performance drivetrain. 
At £44,480 on the road – and £47,595 with options as tested – it’s towards the lower end of 
what you can spend on one of these.

[...]

How difficult was it to keep charged?

[...] This in combination with the increasingly colder weather – no EV likes the cold – saw 
the theoretical real-world range drop from around 260 miles when the ID.4 arrived to around 220 miles 
after a few weeks of my custodianship. A far cry from the WLTP claim of 328 miles. [...]
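
A small Python sketch of this idea, assuming the document has already been split into (heading, body) pairs. The function name is hypothetical and the bodies are shortened excerpts of the article above.

```python
def add_context(title, sections):
    """Prefix every segment with the document title and its section heading."""
    return [f"{title}\n{heading}\n\n{body}" for heading, body in sections]


title = "Living with the Volkswagen ID.4"
sections = [
    ("First impressions first",
     "The ID.4 we've been testing was in the entry-level Life trim level [...]"),
    ("How difficult was it to keep charged?",
     "[...] A far cry from the WLTP claim of 328 miles. [...]"),
]

if __name__ == "__main__":
    for segment in add_context(title, sections):
        print(segment)
        print("---")
```

Without the prefix, the second segment would never mention which car the range figures belong to.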

Challenges and Solutions

  • Anaphora 🔗 Preservation
    • Preprocess using LLMs (example 🔗)
[...]

What is the ID.4 like to live with then?

Day-to-day, it’s largely been brilliant. There is something undeniably conscience-easing 
about pottering about the place in an electric vehicle, especially in Cambridge, among all 
the cyclists and electric scooters. I do a lot of short urban journeys ferrying the kiddo 
around, and not having to fire up a combustion engine – particularly on trips so short they 
never even get through the warm-up phase – has been wonderful. But that, of course, is a 
plus point you can apply to almost any EV.

[...]

How difficult was it to keep charged?

[...]

Journeys outside of this routine required a little more thinking ahead. Given I figured it 
best to try and avoid stopping for 40 minutes to charge while travelling with the tiny terror, 
this often meant making a dedicated trip to charge the car at Cambridge Services ahead of time.

[...]
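
One possible way to preprocess with an LLM is to ask it to rewrite each segment so that references like "it" or "this routine" are spelled out. The Python sketch below only builds the chat messages for such a request. The function name and the prompt wording are assumptions, and the call to the model itself is left out.

```python
def build_anaphora_messages(title, segment):
    """Build chat messages asking an LLM to resolve references in a segment."""
    system = (
        "Rewrite the text so that it can be understood on its own. "
        "Replace pronouns and vague references with the thing they refer to. "
        "Do not add or remove facts."
    )
    user = f"Document title: {title}\n\nText:\n{segment}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


if __name__ == "__main__":
    messages = build_anaphora_messages(
        "Living with the Volkswagen ID.4",
        "Day-to-day, it's largely been brilliant.",
    )
    for message in messages:
        print(message["role"], ":", message["content"])
```

The rewritten segment, not the original, would then be embedded and stored.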

Challenges and Solutions

  • Add overlaps
    • End of one segment slightly overlaps with the beginning of the next
  • Knowledge Base Integration
    • Reference external knowledge base with essential information
  • Document-level processing
    • Possible because of high token limits of new LLMs
    • Resource and cost-intensive
  • Pre-selecting documents based on titles, abstracts, and/or keywords
    • Make document-level processing possible
    • Reduce computational load
  • Manual pre-processing
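
Overlapping segments can be sketched in Python as a sliding window. The sketch works on a list of words for simplicity; a real implementation would more likely count tokens or sentences, and the function name and sizes are made up.

```python
def split_with_overlap(words, size, overlap):
    """Split a list of words into windows of `size` that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap
    segments = []
    for start in range(0, len(words), step):
        segments.append(words[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(words):
            break
    return segments


if __name__ == "__main__":
    words = "a b c d e f g".split()
    print(split_with_overlap(words, size=4, overlap=1))
```

A larger overlap preserves more context across segment boundaries but also stores and embeds more duplicate text.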

What About OpenAI GPTs and Assistants?

Thank you for your attention!