Discussion points
How AI tools work, and possible security and privacy issues.
Large Language Models (LLMs)
input prompt --> predict next token (~word)
Iterates until the next predicted token is a special "stop" token.
Tokens can be thought of as words, parts of words, or other special characters that carry meaning.
What an LLM produces can be controlled by the system prompt and the conversation so far:
System: You are an assistant expert statistician. Users ask you questions and you are to answer professionally and accurately.
User: Can you explain propensity score matching to me?
Assistant: Propensity Score Matching (PSM) is a statistical technique used to estimate the effect of a treatment or intervention in observational studies where randomization isn't possible. It works by matching treated individuals with untreated individuals who have similar propensity scores—the probability of receiving the treatment based on observed characteristics. StopToken
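The predict-next-token loop above can be sketched in a few lines of Python. `toy_model` is a stand-in for a real model, invented here for illustration; the point is only the shape of the loop.

```python
def generate(model, prompt_tokens, stop_token="StopToken", max_tokens=100):
    """Generation loop: repeatedly predict the next token from everything
    so far, and append it, until the model emits the stop token."""
    output = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = model(output)  # predict next token from prompt + output so far
        if next_token == stop_token:
            break
        output.append(next_token)
    return output[len(prompt_tokens):]  # only the newly generated tokens

# Toy "model": always continues a canned answer, then stops.
def toy_model(tokens, _answer=("Propensity", "score", "matching", "...", "StopToken")):
    generated_so_far = len(tokens) - 1  # tokens beyond the one-token prompt
    return _answer[generated_so_far]

print(generate(toy_model, ["User-question"]))
```

A real model returns a probability distribution over tokens rather than a single token; sampling from that distribution is what makes answers vary between runs.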
AI Chat
The entire chat history (of the current session) is passed to the LLM each time it is asked to answer.
System: You are an assistant expert statistician. Users ask you questions and you are to answer professionally and accurately.
User: Can you explain propensity score matching to me?
Assistant: Propensity Score Matching (PSM) is a statistical technique used to estimate the effect of a treatment or intervention in observational studies where randomization isn't possible. It works by matching treated individuals with untreated individuals who have similar propensity scores—the probability of receiving the treatment based on observed characteristics.
User: I have a study here comparing patient care at Sahlgrenska and Karolinska, but a lot of the patients at Karolinska are very obese. Is PSM a good match?
Assistant: Well, that depends on whether BMI is a possible confounder for the treatment and outcome you are interested in.
User: Ahh, the outcome is myocardial infarction. I'm not sure if BMI could affect which hospital they get treated at. It could just be that people in Stockholm are more obese. Can you check and compare the average BMI in Gothenburg and Stockholm around 2011, when the study was performed?
Assistant:
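The "whole history every turn" behaviour can be sketched as follows; `call_llm` is a hypothetical stand-in for an API call, not a real client library. The privacy-relevant point is that the full list, including everything you said earlier, is sent on every turn.

```python
def chat_turn(history, user_message, call_llm):
    """Append the user's message, send the FULL history to the model,
    then append and return the reply. The history list IS the memory."""
    history.append({"role": "user", "content": user_message})
    reply = call_llm(history)  # the entire conversation goes over the wire
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are an assistant expert statistician."}]
echo_llm = lambda msgs: f"(answering turn {len(msgs) // 2} of the conversation)"
chat_turn(history, "Can you explain propensity score matching?", echo_llm)
chat_turn(history, "Is PSM a good match for my study?", echo_llm)
print(len(history))  # 5 messages: system + 2 user + 2 assistant
```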
Retrieval Augmented Generation (RAG)
Documents, images, tables, etc. are uploaded.
They are converted to raw text, sliced into chunks, stored on a server, and embedded into a vector database.
When the user asks a question, relevant parts of the document are inserted into the input prompt as context/relevant information:
Uploaded files (or at least their raw-text versions) are stored for the duration of the conversation (until you delete it), or for some maximum time (~30 days).
System: You are an assistant expert statistician. Users ask you questions and you are to answer professionally and accurately.
User: Can you explain propensity score matching to me?
Assistant: Propensity Score Matching (PSM) is a statistical technique used to estimate the effect of a treatment or intervention in observational studies where randomization isn't possible. It works by matching treated individuals with untreated individuals who have similar propensity scores—the probability of receiving the treatment based on observed characteristics.
Context: A study comparing cardiovascular outcomes in two hospitals, Sahlgrenska and Karolinska.
--------------------------------------------------
Characteristic  | Sahlgrenska | Karolinska
Age, mean (SD)  | 55 (3)      | 65 (6)
BMI, mean (SD)  | 29 (5)      | 34 (4)
...
User: Would PSM be a good option for this study?
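The chunk-retrieval step can be sketched with a toy scoring function. Real systems embed chunks and questions as vectors and rank by cosine similarity in a vector database; here plain word overlap stands in for that, and all names and chunk texts are illustrative.

```python
import re

def tokenize(text):
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(chunks, question, top_k=1):
    """Toy retrieval: score each chunk by word overlap with the question."""
    q_words = tokenize(question)
    return sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)[:top_k]

def build_prompt(system, chunks, question):
    """Insert the best-matching chunk(s) into the prompt as context."""
    context = "\n".join(retrieve(chunks, question))
    return f"System: {system}\nContext: {context}\nUser: {question}"

chunks = [
    "A study comparing cardiovascular outcomes in two hospitals.",
    "BMI, mean (SD): Sahlgrenska 29 (5), Karolinska 34 (4).",
    "Acknowledgements and funding statements.",
]
prompt = build_prompt("You are an expert statistician.", chunks,
                      "Is BMI different between Sahlgrenska and Karolinska?")
```

Note the privacy implication: whichever chunks score highest, confidential or not, are copied verbatim into the prompt that is sent to the model.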
Tool calling
AI can generate calls to registered tools.
The server interacting with the LLM recognises a tool call and uses the tool, returning the results to the LLM, which then uses this information to answer the user's question.
Example: enabled "web search".
The tool is called invisibly to the user. If web search is enabled, what you write in the prompt may be sent to Google or other search engines. The LLM can also use context from uploaded documents in its searches.
System: You are an assistant expert statistician. Users ask you questions and you are to answer professionally and accurately.
You only have information up to 2022. If the user asks you about anything past 2022, or if you don't have sufficient knowledge about what is asked, you may use the following tool to search the web.
{
  "Tool": "GoogleSearch",
  "Description": "A tool to search the web using Google.",
  "Arguments": {
    "SearchPrompt": "Your search prompt goes here"
  },
  "ResponseFormat": "List of objects containing http-address and text-snippets"
}
User: What were the biggest advancements in propensity score matching in 2024?
Assistant: UseToolToken
{
  "Tool": "GoogleSearch",
  "Arguments": {
    "SearchPrompt": "Advancements in propensity score matching 2024"
  }
}
// Recognising tool call, making search request, returning response
Tool:
{
  "Response": [
    {
      "Addr": "http://propensity.score.haters.org/2024.index",
      "Text": "There were no advances... It's a dead field."
    },
    {
      "Addr": "http://propensity.score.news.org/2024.index",
      "Text": "Many advances were made in 2024, among other things..."
    }
  ]
}
Assistant:
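The server-side loop in the exchange above can be sketched as follows. `fake_llm`, `google_search`, and the `UseToolToken` prefix convention mirror the slide's example; none of this is a real API, it only shows the recognise-call-execute-reinject cycle.

```python
import json

def run_turn(llm, tools, messages):
    """If the model's reply is a tool call, execute the tool, append the
    result to the conversation, and ask the model again; otherwise the
    reply is the final answer."""
    while True:
        reply = llm(messages)
        if not reply.startswith("UseToolToken"):
            return reply  # plain answer, done
        call = json.loads(reply[len("UseToolToken"):])
        result = tools[call["Tool"]](**call["Arguments"])  # run the tool
        messages.append({"role": "tool", "content": json.dumps(result)})

# Toy pieces: a fake search tool and a model that first calls it, then answers.
def google_search(SearchPrompt):
    return {"Response": [{"Addr": "http://example.org", "Text": f"results for: {SearchPrompt}"}]}

replies = ['UseToolToken{"Tool": "GoogleSearch", "Arguments": {"SearchPrompt": "PSM 2024"}}',
           "Based on the search results, several advances were reported in 2024."]
fake_llm = lambda messages: replies.pop(0)

answer = run_turn(fake_llm, {"GoogleSearch": google_search},
                  [{"role": "user", "content": "PSM in 2024?"}])
```

Everything the model puts in `Arguments`, possibly including text from your prompt or uploaded documents, leaves the LLM provider and reaches the tool's backend.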
Code assistants
Integrated into IDEs (integrated development environments) such as Cursor or VS Code.
Typically inserts the code of the current document as context.
Allows LLMs to perform actions within the IDE through tool calling.
Tools can include:
System: You are a coding assistant. You provide suggestions for new code.
Context:
#########
# John Smith - Comparison of cardiovascular outcomes between two Swedish hospitals
# 2024-02-23
# Jens Michelsen
#########
proj_root <- "D:/ClientPath/Prog/"
out_path <- "C:/ClientPath/Out/"
# read data
data1 <- read(proj_root/"Confidential_data1.xlsx")
data2 <- read(proj_root/"Confidential_data2.xlsx")
names(data1)
# patient_ID, PNR, Age, Sex, BMI, Outcome
names(data2)
# patient_ID, PNR, Age, Sex, BMI, Outcome
# patient with PNR 1981-04-23-2916 should be removed due to erectile dysfunction
data1 <- filter(data1, PNR != "1981-04-23-2916")
User: Write a function that merges two medical datasets and sorts the result by patient ID.
Assistant:
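A minimal version of what the assistant might produce for the request above, sketched in Python with lists of dicts standing in for the R data frames in the context:

```python
def merge_datasets(data1, data2, key="patient_ID"):
    """Concatenate two datasets with the same columns and
    sort the combined data by patient ID."""
    return sorted(data1 + data2, key=lambda row: row[key])

d1 = [{"patient_ID": 3, "BMI": 29}, {"patient_ID": 1, "BMI": 34}]
d2 = [{"patient_ID": 2, "BMI": 31}]
merged = merge_datasets(d1, d2)
```

Note that to produce even this, the whole context block, including file paths, client names, and the personal identity number in the comment, was sent to the model.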
Coding Agents
Programs that can generate entire projects based on the user's specification.
One LLM plans the project and calls other LLMs to do specific tasks (creating the folder structure, files of code, etc.).
In principle, coding agents have access to your entire system, including mapped drives.
Guardrails are implemented in different ways for different coding agents. Some specify what the LLM can/cannot do in the system prompt.
To facilitate work across multiple sessions, some context is stored (locally or on AI servers) as long-term memory.
System: You are a coding agent. You plan and execute entire projects based on the user's requests.
Break the project into smaller subtasks. You can call other LLMs to do subtasks by using the following tools:
{
...
}
Context:
Current working directory: Z:/ClientPath/Projectpath/Prog/
Current folder structure:
-- Prog
 |-- R
 |-- Data
 |-- Doc
 |-- Mail
 |-- Confidential communication
User: Write a function that merges two medical datasets and sorts the result by patient ID.
Assistant:
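The plan-and-delegate pattern described above can be sketched with stub functions in place of real LLM calls; `plan` and `worker` are hypothetical stand-ins for the planner model and the task-specific models it invokes.

```python
def plan(request):
    """Stub planner LLM: break a request into subtasks.
    A real agent would ask a model to produce this list."""
    return [f"create folder structure for: {request}",
            f"write code files for: {request}",
            f"write documentation for: {request}"]

def worker(subtask):
    """Stub worker LLM: 'execute' one subtask. A real worker would
    read and write files, i.e. act on your system."""
    return f"done: {subtask}"

def run_agent(request):
    return [worker(task) for task in plan(request)]

results = run_agent("merge-and-sort function")
```

Because the workers act on the file system, every directory the agent can see (here including "Confidential communication") is potentially both readable context and writable territory.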
Conclusion
When using AI tools, we need to be aware of what information the AI is given access to.
User agreements with AI companies do not necessarily ensure that information is not stored, intercepted, or shared with third parties.
Alternatives:
Discussion points