A context window is the maximum span of tokens (text and code) an LLM can consider at once, including both the input prompt and the output generated by the model. When a conversation grows past this window, earlier tokens silently fall out of scope, which can cause the model to lose context and hallucinate.
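To see how close a prompt is to the limit, count tokens before sending the request. A minimal sketch using the `tiktoken` library (an assumption; any tokenizer matched to the target model works), with the 8,192-token window of the base GPT-4 model:

```python
import tiktoken

# Pick the tokenizer that matches the target model.
enc = tiktoken.encoding_for_model("gpt-4")

prompt = "Summarize the following document: ..."
n_prompt_tokens = len(enc.encode(prompt))

CONTEXT_WINDOW = 8192  # base GPT-4 context size
MAX_OUTPUT = 512       # tokens reserved for the reply

if n_prompt_tokens + MAX_OUTPUT > CONTEXT_WINDOW:
    print("Prompt too long: trim history or summarize earlier turns.")
```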
| Case | Example | Model | Pros | Cons |
|---|---|---|---|---|
| Quality & Reasoning | Advanced research assistants, complex code generation, highly nuanced content creation, medical/legal text analysis, strategic decision support, ... | OpenAI GPT-4o; OpenAI GPT-4 Turbo / Microsoft Azure GPT-4 Turbo | Highest quality; large context size | High price; high latency |
| Complex Data & Long Conversations | Document summarization (long papers, books), legal discovery, large codebase analysis, ... | Google Gemini 1.5 Pro; Google Gemini 1.5 Flash | Massive context size; high quality | Medium price; very high latency |
| Real-Time Responses & Budget-Conscious | Chatbots (especially high-volume customer service), interactive content generation, real-time code suggestions, dynamic content creation, ... | OpenAI GPT-4o mini | Very low price; low latency | Small context size; medium quality |
Three levels of personalization:
- Baseline: no personalization at all; one model for everyone.
- Prompt-based: give the model a short description of who the user is.
- Fine-tuning: adjust the model's internal behavior based on real user data, training it to understand a specific user's preferences more deeply. Parameter-efficient fine-tuning (PEFT) techniques make this practical by updating only a small fraction of the model's weights (see the sketch below).
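As a rough illustration of parameter-efficient fine-tuning, the sketch below wraps a base model in LoRA adapters using the Hugging Face `peft` library; the model name and hyperparameters are assumptions for demonstration, not recommendations:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load a base model (name is illustrative).
base = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA: train small low-rank adapter matrices instead of all weights.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # adapter rank (assumed)
    lora_alpha=16,    # scaling factor (assumed)
    lora_dropout=0.05,
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```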
| Method | Required Input | Concept | Advantages | Disadvantages |
|---|---|---|---|---|
| Baseline | Basic prompt | No personalization | Simple, fast | Not tailored to individual users |
| Prompt-based personalization | Prompt + user attributes | Use personal information inside the prompt | Easy to implement, no retraining needed | Limited effectiveness |
| Fine-tuning personalization | Personal data + model | Train a model specifically for each user | High effectiveness | Resource-intensive, requires personal data |
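Prompt-based personalization can be as simple as prepending user attributes to the system message. A minimal sketch using the same legacy `openai.ChatCompletion` API as the examples below (the profile fields are hypothetical):

```python
import openai

user_profile = {
    "name": "Alex",
    "role": "data analyst",
    "preferred_style": "concise, bullet points",
}

# Fold the user attributes into the system prompt.
system_prompt = (
    "You are a helpful assistant. The user is "
    f"{user_profile['name']}, a {user_profile['role']} who prefers "
    f"answers that are {user_profile['preferred_style']}."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Explain what a context window is."},
    ],
)
```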
Criteria for stopping generated responses:
- Efficiency: avoid wasting computational resources on generating unnecessary tokens.
- Quality: keep the output relevant and coherent, and avoid excessive repetition.
- User experience: controlling the length and format of responses significantly improves usability.
Limit the number of tokens the model can generate.
import openai

# Hard cap: generation stops once 512 tokens have been produced.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=chat_history,
    max_tokens=512,
)
✅ Pros
Simple and effective.
Prevents infinite or excessively long outputs.
⚠️ Cons
Might cut off meaningful output mid-sentence.
Doesn’t adapt based on context or content.
Define one or more string sequences that, when generated, immediately stop further output.
# Generation halts as soon as either stop sequence is emitted.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=chat_history,
    stop=["foo", "bar"],
)
✅ Pros
Gives a precise, deterministic cut-off at known delimiters.
Works well with tools/functions integration.
⚠️ Cons
Must know or enforce these sequences in the prompt.
Fragile if the model never emits the exact string.
Stop when the generated content semantically completes the task (e.g., completes a paragraph, finishes a function, etc.).
# continuation_needed is an application-defined heuristic (sketched below).
if response.endswith((".", "!", "?")) and not continuation_needed(response):
    stop_generation = True
✅ Pros
Feels natural and human-like.
More flexible for open-ended generation.
⚠️ Cons
Harder to implement — requires post-processing or heuristics.
May need a second model or logic to evaluate “completion.”
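One way to implement such a check is a lightweight heuristic over the text itself. The function below is a hypothetical example of `continuation_needed`, not a standard API:

```python
def continuation_needed(text: str) -> bool:
    """Rough heuristic: does the text look unfinished?"""
    stripped = text.rstrip()
    # An odd number of code fences suggests an unfinished code block.
    if stripped.count("```") % 2 == 1:
        return True
    # Unbalanced parentheses suggest an unfinished expression.
    if stripped.count("(") > stripped.count(")"):
        return True
    # A trailing connective usually means the sentence will continue.
    words = stripped.split()
    last_word = words[-1].lower() if words else ""
    return last_word in {"and", "or", "but", "because", "so", "then"}
```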
Some LLMs can be guided with a schema — they stop generating when the function/tool-call is formed correctly.
{
  "functions": [
    {
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        }
      }
    }
  ]
}
✅ Pros
Great for building agentic systems.
Guarantees structured outputs.
⚠️ Cons
Overhead in defining function schemas.
Not for freeform text generation.
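In the legacy OpenAI chat API used throughout these examples, the schema is passed via the `functions` parameter, and the model stops once it has emitted a well-formed call. A sketch:

```python
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Hanoi?"}],
    functions=[
        {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
            },
        }
    ],
    function_call="auto",  # let the model decide whether to call the function
)

# If the model chose to call the function, the arguments arrive as JSON text.
message = response["choices"][0]["message"]
if message.get("function_call"):
    print(message["function_call"]["name"],
          message["function_call"]["arguments"])
```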
Sampling controls: repetition penalties, top-p (nucleus sampling), temperature, ...
# Softer control: shape the sampling distribution instead of forcing a hard stop.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=chat_history,
    temperature=0.7,       # lower values make output more deterministic
    top_p=0.9,             # nucleus sampling: keep the top 90% of probability mass
    presence_penalty=0.6,  # discourage tokens that have already appeared
)
✅ Pros
More natural termination in creative writing or poetry.
Encourages diverse but coherent outputs.
⚠️ Cons
Indirect — you’re guiding rather than explicitly stopping.
Needs fine-tuning and experimentation.