Intelligent services and users can often collaborate efficiently to achieve the user's goals.
The paper lists 12 critical factors for the effective integration of this collaboration.
"LookOut Service"
When invoked, Lookout parses the text in the body and subject of the email message in focus and attempts to identify a date and time associated with an event implied by the sender. The system then invokes Outlook's calendaring subsystem, brings up the user's online appointment book, and attempts to fill in relevant fields of an appointment record. The system displays its guesses to the user and allows the user to edit them and save the final result.
If Lookout cannot identify an implied date and time, the system degrades its goal to identifying the span of time most relevant to the text of the message, and then displays a scoped view of the calendar to the user. The user can directly manipulate the proposed view and, if appropriate, go on to schedule appointments manually.
This tries to reduce the number of interactions and the complexity of navigation for the user.
Lookout processes the header, subject, and body of the message and, based on this information, assigns a probability that the user would like to view the calendar or schedule an appointment, employing a probabilistic classification system that is trained by watching the user work with email.
Depending on the inferred probability, and on an assessment of the expected costs and benefits of action, the system decides to either
(1) do nothing but simply wait for continued direct manipulation of Outlook or manual invocation of Lookout,
(2) engage the user in a dialog about his or her intentions with regard to providing a service, or
(3) go ahead and attempt to provide its service by invoking its second phase of analysis.
Multiple Interaction Modalities
1) Manual Mode
2) Basic Automated Assistance Mode
3) Social-Agent Modality
Also handles invocation failures gracefully.
Inferring Beliefs about the User's Goals
Uses an SVM for text classification / intent classification, training the system on a set of messages that are calendar-relevant and calendar-irrelevant. At runtime, for each email message being reviewed, the linear SVM approximation procedure outputs the likelihood that the user will wish to bring up a calendar or schedule an appointment. The current version of Lookout was trained initially on approximately 1000 messages, divided evenly into 500 relevant and 500 irrelevant messages.
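The relevance classifier above can be sketched in miniature. This is not LookOut's actual model (which used a linear SVM over a much larger corpus); it is a stdlib-only stand-in that trains a linear bag-of-words model with logistic updates and squashes its score through a sigmoid to get a likelihood, and the training messages are invented for illustration:

```python
import math
import re
from collections import defaultdict

# Toy stand-in for LookOut's calendar-relevance classifier: a linear
# bag-of-words model whose score is passed through a sigmoid to
# approximate p(calendar-relevant | message). Training data is made up.
TRAIN = [
    ("shall we meet tuesday at 3pm to discuss the budget", 1),
    ("lunch next friday noon works for me", 1),
    ("can we schedule a call tomorrow morning", 1),
    ("here is the report you asked for", 0),
    ("thanks for the great dinner last week", 0),
    ("attached are the updated slides", 0),
]

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def train(data, epochs=20, lr=0.5):
    w = defaultdict(float)
    for _ in range(epochs):
        for text, label in data:
            score = sum(w[t] for t in tokens(text))
            p = 1.0 / (1.0 + math.exp(-score))
            for t in tokens(text):      # logistic-loss gradient step
                w[t] += lr * (label - p)
    return w

def p_calendar(w, text):
    """Approximate likelihood that the message is calendar-relevant."""
    score = sum(w[t] for t in tokens(text))
    return 1.0 / (1.0 + math.exp(-score))

w = train(TRAIN)
print(p_calendar(w, "can we meet friday at 3pm"))  # should be high
print(p_calendar(w, "thanks for the slides"))      # should be low
```

LookOut calibrated SVM margins into probabilities; the sigmoid here plays the same role in a much cruder way.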
Taking Actions Based on Beliefs
Autonomous actions should be taken only when the agent believes they will have greater expected value for the user than inaction.
Prob. of goal given evidence: the best decision to make at any value of p(G|E) is the action associated with the greatest expected utility at that likelihood of the user having the goal.
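The decision rule above can be sketched directly. The three actions match the paper's options (wait, dialog, act), but the utility numbers are illustrative, not taken from the paper:

```python
# Sketch of the expected-utility decision rule: pick the action with
# the greatest expected utility at the inferred likelihood p = p(G|E)
# that the user has the goal. Utility values are made up for illustration.
UTILITIES = {
    # action: (utility if user has goal G, utility if not)
    "do_nothing":       (0.0, 1.0),
    "dialog":           (0.7, 0.6),
    "act_autonomously": (1.0, 0.0),
}

def best_action(p):
    """Return the action maximizing expected utility at p = p(G|E)."""
    def eu(action):
        u_goal, u_no_goal = UTILITIES[action]
        return p * u_goal + (1 - p) * u_no_goal
    return max(UTILITIES, key=eu)

print(best_action(0.1))   # low p: wait for direct manipulation
print(best_action(0.5))   # intermediate p: engage the user in dialog
print(best_action(0.95))  # high p: act autonomously
```

Note how dialog wins only in a middle band of p(G|E): its utility is decent whether or not the user has the goal, which is exactly why engaging the user is the rational hedge under uncertainty.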
Dialogue as an Option
Timing of Service
They found that the relationship between message size and the preferred time for deferring offers of service can be approximated by a sigmoid function. In the general case, we can construct a model of attention from such timing studies and make the utility of outcomes time-dependent functions of message length. Alternatively, we can use timing information separately to defer service until the user is likely ready to receive it.
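A minimal sketch of such a sigmoid timing model, with the midpoint, slope, and maximum delay chosen arbitrarily for illustration (the paper fit these from user timing studies):

```python
import math

# Preferred delay before offering service, as a sigmoid in message
# length. Parameters (max_delay, midpoint, slope) are assumptions,
# not the paper's fitted values.
def preferred_delay(n_chars, max_delay=10.0, midpoint=600, slope=0.01):
    """Seconds to defer the offer of service for a message of n_chars."""
    return max_delay / (1.0 + math.exp(-slope * (n_chars - midpoint)))

for n in (50, 600, 3000):
    print(n, preferred_delay(n))  # short messages: near-immediate offer;
                                  # long messages: delay saturates at max_delay
```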
Lifelong Learning
Stores user emails and actions to retrain the model periodically.
Questions?
Consider This Situation....
I am aiming to install ape, a simple code for pseudopotential generation. I am having this error message....
To answer questions related to the error, you might need some extra information so that you can answer faithfully.
You can ask....
a) What version of Ubuntu do you have?
b) What is the make of your Wi-Fi card?
c) Are you running Ubuntu 14.10, kernel 4.4.0-59.....?
Different Questions Elicit Different Information
I am aiming to install ape, a simple code for pseudopotential generation. I am having this error message....
Suppose there is some utility gain, U(·), from adding the elicited answer to the problem.
Expected Value of Perfect Information
Method
1. Generate a set of candidate questions Q and candidate answers A.
2. Given post p and question q_i, estimate how likely the question is to be answered by one of our answer candidates.
3. Given post p and answer candidate a_j, calculate the utility U(p + a_j).
Method
1. Generate a set of candidate questions Q and candidate answers A.
Using a VDB, retrieve the 10 most similar posts; consider the questions asked on these 10 posts as our set of question candidates Q, and the edits made to the posts in response to those questions as our set of answer candidates A.
Method
2. Given post p and question q_i, estimate how likely the question is to be answered by one of our answer candidates. The probability is high if a similar answer came from a similar question. How is θ obtained? From Step 1.
Method
3. Given post p and answer candidate a_j, calculate the utility U(p + a_j).
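The three steps combine into the expected value of perfect information: the value of asking q_i on post p is the utility of each candidate answer, weighted by how likely that answer is to resolve q_i. The probabilities and utilities below are illustrative stand-ins for the learned models of steps 2 and 3, and the question strings are hypothetical:

```python
# EVPI(q_i | p) = sum_j P(a_j | p, q_i) * U(p + a_j)
def evpi(p_answer_given_pq, utility):
    """Expected value of asking one question, given answer probabilities."""
    return sum(p_a * utility[a] for a, p_a in p_answer_given_pq.items())

def best_question(candidates, utility):
    """Rank candidate questions by EVPI and return the best one."""
    return max(candidates, key=lambda q: evpi(candidates[q], utility))

# Toy example: two candidate questions over three candidate answers.
utility = {"a1": 0.9, "a2": 0.4, "a3": 0.1}   # U(p + a_j), step 3
candidates = {                                 # P(a_j | p, q_i), step 2
    "What Ubuntu version?": {"a1": 0.7, "a2": 0.2, "a3": 0.1},
    "What Wi-Fi card?":     {"a1": 0.1, "a2": 0.3, "a3": 0.6},
}
print(best_question(candidates, utility))
```

The first question wins because it concentrates probability mass on the high-utility answer, which is exactly what EVPI rewards.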
Some Specifics...
Evaluation
Introduces a Taxonomy Encompassing Three Dimensions
1. Epistemic Misalignment
2. Linguistic Ambiguity
3. Aleatoric Output
Epistemic Misalignment: the inherent knowledge stored within the LLM conflicts with its understanding of the query.
Subtypes: Unfamiliar, Contradiction.
The ALCUNA dataset contains new entities fabricated by modifying existing ones. Queries containing new entities are classified as ambiguous, while the rest are unambiguous. GPT-4 is then instructed to generate a clarifying question for each ambiguous query, focusing on the ambiguity of the new entities.
The AmbiTask dataset provides ambiguous queries that encode contradictions between the queries and the provided examples. Clarifying questions for ambiguous queries are created with rule-based templates, and ambiguous queries are manually transformed into unambiguous ones by resolving the contradictions.
Linguistic Ambiguity: when a word, phrase, or statement can be interpreted in multiple ways due to its imprecise or unclear meaning. Subtypes: Lexical, Semantic.
The AmbER (Chen et al., 2021) and AmbiPun (Mittal et al., 2022) datasets, which contain ambiguous entity names and ambiguous polysemous words.
The AmbiCoref dataset, which consists of minimal pairs featuring ambiguous and unambiguous referents.
Aleatoric Output: when the input is well-formed but the output contains potential confusion due to the lack of essential elements: Who, Where, When, What.
Evaluation: Task 1 Identifying Ambiguity
Task 2: Asking Clarifying Questions
ChatGPT demonstrates superior capabilities in generating clarifying questions compared to small-scale LLMs.
Please solve the math problem: Janet had some eggs (variable x_0) and ate one (variable x_1). How many eggs does she have now (target variable y)?
We need to know the value of x_0 to compute y.
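The underspecification here can be rendered as a tiny constraint problem. This is a sketch of the idea, not QuestBench's implementation: the target y = x_0 − x_1 needs both variables, x_1 is given, and the one missing variable is exactly what should be asked about:

```python
# y = x0 - x1, with x1 known ("ate one") and x0 unknown. The variable
# that must be asked for is the one still missing from the knowns.
def missing_variables(required, known):
    """Variables we still need before the target y can be computed."""
    return sorted(set(required) - set(known))

required = ["x0", "x1"]      # y = x0 - x1 depends on both
known = {"x1": 1}            # "ate one egg"
print(missing_variables(required, known))  # ask about x0

# Once the user answers (say, x0 = 12), the target is determined:
known["x0"] = 12
if not missing_variables(required, known):
    y = known["x0"] - known["x1"]
    print(y)
```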
Logic-Q
Constructed by using backwards search to obtain 1) the set of all possible variable assignments that would imply y, and 2) another set for ¬y. We take the cross product between the sets and identify pairs which differ on a single variable assignment, meaning that assigning that variable deterministically implies either y or ¬y.
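The construction can be sketched for a toy boolean formula. This replaces the paper's backwards search with brute-force enumeration (feasible only for tiny formulas), but the cross-product-and-single-difference step is the same idea:

```python
from itertools import product

# Sketch of the Logic-Q construction: enumerate assignments implying y
# and those implying not-y, take the cross product, and keep pairs that
# differ on a single variable -- that variable's value alone decides y.
def single_decider_pairs(variables, formula):
    assignments = [dict(zip(variables, vals))
                   for vals in product([False, True], repeat=len(variables))]
    implies_y    = [a for a in assignments if formula(a)]
    implies_noty = [a for a in assignments if not formula(a)]
    pairs = []
    for pos in implies_y:
        for neg in implies_noty:
            diff = [v for v in variables if pos[v] != neg[v]]
            if len(diff) == 1:   # one unknown variable settles y vs. not-y
                pairs.append((pos, neg, diff[0]))
    return pairs

# Example: y = a AND b. With a known True, asking for b decides y
# (and symmetrically for a with b known True).
pairs = single_decider_pairs(["a", "b"], lambda s: s["a"] and s["b"])
for pos, neg, var in pairs:
    print(f"differ only on {var}: y-side {pos}, not-y-side {neg}")
```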
Planning-Q
The goal is to rearrange a set of blocks from an initial state to a goal state. Constructed by deriving, through backwards search, all possible initial states from which there is a single shortest path to the goal, then removing up to one atom.
GSM-Q and GSME-Q
To construct the GSM-Q/GSME-Q datasets from GSM-Plus, human annotators 1) check word problems for semantic ambiguity, and 2) translate each word problem into a CSP.
Results
Generally, all models tested struggled to perform beyond 50% on the Logic-Q and Planning-Q domains. Chain-of-thought and few-shot prompting did not help.
Results
We can approximately quantify the difficulty of each problem in QuestBench based on the runtime of each search algorithm on that problem. A mostly negative correlation between this difficulty measure and model accuracy is observed.
Results
Is asking the right question harder than solving the problem?
Results
Can LLMs detect underspecification?