Enhancing Long-Term Memory using Hierarchical Aggregate Tree in
Large Language Models

Aadharsh Aadhithya

CB.EN.U4AIE20001

Project Review: Mini Project

Guides: Dr. Sachin Kumar S & Dr. Soman K.P

Problem Statement

  • Large Language Models are trained to complete sentences.
  • These models are fine-tuned on "instruction-following" datasets so that they follow instructions and respond in a human-like manner.
  • Figuring out the right input to give the model in order to get the desired output has turned out to be a skill of its own, often termed "prompt engineering".

Problem Statement

[Figure: Prompt → LLM → Output]

LLMs are "closed". The only way to feed in information about the external world is through the prompt.

However, the amount of information you can put in a prompt is limited by the model's "context length".

Hence there is a need to choose efficiently and intelligently what to put in the prompt, based on the user's input and the past conversation.

For example, in a long open-domain conversation we cannot put the entire conversation in the context (it might go on for years).

Existing Methods

Retrieval Augmented

- Some form of retrieval system is employed to retrieve the "relevant chunk" and inject it into the prompt.
- Drawback: retrieval has to go over the entire corpus.

Memory Augmented

- "Train" a memory-management model.
- Drawback: we need training data, which is expensive.

[1] Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In ICLR.
[2] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In NeurIPS.
[3] Xinchao Xu, Zhibin Gou, Wenquan Wu, Zheng-Yu Niu, Hua Wu, Haifeng Wang, and Shihang Wang. 2022. Long time no see! Open-domain conversation with long-term persona memory. In Findings of ACL.

Related Work

* Preprint released on 29 Aug 2023; today is November 2023.

Many of these papers and preprints get accepted at conferences such as NeurIPS, ACL, and ICLR, which underscores the relevance of the current work.

This work proposes keeping a sort of "running summary" of the information.

However, it is prone to forgetting finer details when they are needed, and it does not scale to data such as books, novels, etc.

Methodology

I propose to use a new data structure called "Hierarchical Aggregate Tree" (HAT)

The tree consists of "nodes"

[Figure: a HAT with memory length 2]

Nodes in HAT

Attributes

  1. ID - a unique identifier (UID), generated from information about the node's children
  2. text - the content of this node
  3. prev_states - a hashmap/dictionary of all the child nodes this node has seen
  4. parent - a pointer to the node's parent
  5. aggregator - a function that sets this node's text to an aggregate of its children's text
  6. children - the list of child nodes

Operations

  1. Update Parent
  2. Add Child
  3. Remove Child
  4. Update Text
  5. Hash Children
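To make the node structure concrete, below is a minimal Python sketch of such a node. It is an illustration only, not the project's actual implementation: the class name HATNode, the _refresh helper, and the default whitespace-joining aggregator are assumptions.

import hashlib

class HATNode:
    """Minimal sketch of a HAT node (illustrative, not the actual implementation)."""

    def __init__(self, text="", aggregator=None):
        self.text = text              # content of this node
        self.aggregator = aggregator  # e.g. an LLM summariser; any callable works here
        self.parent = None            # pointer to the parent node
        self.children = []            # list of child nodes
        self.prev_states = {}         # children-hash -> previously computed text

    @property
    def id(self):
        # UID generated from information about the current children.
        return self.hash_children()

    def hash_children(self):
        key = "||".join(child.text for child in self.children)
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        self._refresh()

    def remove_child(self, child):
        self.children.remove(child)
        child.parent = None
        self._refresh()

    def update_text(self, text):
        self.text = text
        self.update_parent()

    def update_parent(self):
        # Propagate a change upward so every ancestor stays consistent.
        if self.parent is not None:
            self.parent._refresh()

    def _refresh(self):
        # Re-derive this node's text from its children, reusing a cached
        # aggregate if this exact child set has been seen before.
        h = self.hash_children()
        if h not in self.prev_states:
            texts = [child.text for child in self.children]
            self.prev_states[h] = self.aggregator(texts) if self.aggregator else " ".join(texts)
        self.update_text(self.prev_states[h])

With this layout, Hash Children gives every child set a stable key, and prev_states lets a node fall back to a cached summary instead of calling the aggregator again.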


HAT

Attributes

  1. memory length - the number of nodes aggregated under one parent in each layer
  2. aggregator - a function that takes in texts, aggregates them, and returns a text
  3. layers - a list of nodes, layer by layer
  4. num layers - the number of layers currently in the tree

Operations

  1. Insert Node

Insert(node, L)

  1. Append the node to layer L.
  2. If the number of nodes in layer L is not a multiple of the memory length, add the node as a child of the last node in the layer above.
  3. If it is a multiple, detach the most recent memory-length nodes from the last node in the layer above, re-home them under a new node np, and call Insert(np, L-1).
  4. The recursion continues upward. When L == 0, create a new layer on top and add the existing top-level nodes as children of a new root.

Note: because children are hashed, removing children does not force the summaries to be recomputed; a previously computed aggregate for the same child set is simply reused.
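To illustrate the full insert routine, here is a minimal sketch of the HAT container under one plausible reading of the steps above. It reuses the HATNode sketch from earlier; the add_leaf helper, the lazy creation of parents, and the exact re-parenting at the root are assumptions for illustration, not the actual implementation.

class HAT:
    """Minimal sketch of the Hierarchical Aggregate Tree (illustrative only)."""

    def __init__(self, memory_length, aggregator=None):
        self.memory_length = memory_length  # nodes grouped under one parent per layer
        self.aggregator = aggregator        # e.g. an LLM summariser; any callable works here
        self.layers = [[], []]              # layer 0 is the root layer; the last layer holds leaves

    @property
    def num_layers(self):
        return len(self.layers)

    def add_leaf(self, text):
        """Insert one new leaf (e.g. a single chat turn) at the deepest layer."""
        self.insert(HATNode(text, aggregator=self.aggregator), self.num_layers - 1)

    def insert(self, node, L):
        layer = self.layers[L]
        layer.append(node)

        if L == 0:
            # Root layer: once more than one node accumulates here, create a
            # new layer on top and re-parent the current top nodes under it.
            if len(layer) > 1:
                new_root = HATNode("", aggregator=self.aggregator)
                for n in layer:
                    new_root.add_child(n)
                self.layers.insert(0, [new_root])
            return

        # Keep the running aggregate current: the newest node is attached to
        # the last node in the layer above (if that layer has a node yet).
        above = self.layers[L - 1]
        if above:
            above[-1].add_child(node)

        if len(layer) % self.memory_length == 0:
            # The latest group of `memory_length` nodes is complete: detach it
            # from its tentative parent, re-home it under a fresh node np, and
            # push np one layer up. Because child sets are hashed, the detached
            # parent falls back to a cached earlier summary instead of recomputing.
            group = layer[-self.memory_length:]
            if above:
                for n in reversed(group):
                    above[-1].remove_child(n)
            np_node = HATNode("", aggregator=self.aggregator)
            for n in group:
                np_node.add_child(n)
            self.insert(np_node, L - 1)

For conversations, add_leaf would be called once per chat turn, with the aggregator being a summarisation call to the LLM; the last node of each layer then keeps a running aggregate, so the top of the tree always summarises the whole dialogue.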

HAT for Conversations

[Figure: a HAT built over a conversation. Individual Speaker 1 and Speaker 2 chat turns form the leaves; these are aggregated into Level 1 summaries, then Level 2 summaries, up to a summary of the entire conversation at the root.]

More Generally

Given a task T, we now need to find the best strategy to traverse the tree.

Actions at each state: Up, Down, Left, Right

States: nodes

Train an agent?

Baseline: GPT as the agent.

Baseline: GPT as Agent

[Figure: the agent loop. The user query q and the context retrieved from the memory module (the HAT) are passed to the GPT agent; the agent outputs an action A; the memory manager applies A to move within the tree; when A == "OK", the retrieved context c is returned.]

Note: this framework can be used for a variety of tasks; for now, we concentrate on open-domain conversations.

Strategy

Start from the root node. If you need finer details, go down; go left for older information and right for more recent information.
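To make this concrete, the sketch below shows how a GPT agent could drive the traversal using a prompt like the one on the next slides. Everything here is an assumption for illustration: ask_llm, build_agent_prompt, parse_action, and the memory helpers get_text and in_bounds stand in for the actual prompt template and the HAT-backed memory module.

MAX_STEPS = 10  # assumed cap, in the spirit of the "7-8 traversals" guideline in the prompt

def parse_action(reply):
    """Pull the single-letter action out of an 'Action: X / Reason: ...' style reply."""
    for line in reply.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("action") and ":" in stripped:
            letters = [ch for ch in stripped.split(":", 1)[1].upper() if ch.isalpha()]
            return letters[0] if letters else ""
    return ""

def retrieve_context(memory, user_message, ask_llm, build_agent_prompt):
    """Let the LLM agent walk the HAT until it judges the context sufficient."""
    layer, node, err = 0, 0, ""   # start at the root: high-level summary of everything so far
    for _ in range(MAX_STEPS):
        context = memory.get_text(layer, node)  # text of the node we are currently standing on
        action = parse_action(ask_llm(build_agent_prompt(context, user_message, layer, node, err)))
        if action == "O":                       # current information sufficient
            return context
        if action == "S":                       # reset to the root
            layer, node, err = 0, 0, ""
            continue
        moves = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}  # 'U' treated as Up here
        if action in moves:
            dl, dn = moves[action]
            if memory.in_bounds(layer + dl, node + dn):
                layer, node, err = layer + dl, node + dn, ""
            else:
                err = f"Move {action} leaves the tree; choose another action."
        else:
            break                               # anything else is treated as "insufficient"
    return None  # caller falls back to answering without retrieved memory

In the actual system the traversal history and the reason behind each move, which the prompt exposes to the agent, would also be recorded at every step; that bookkeeping is omitted here.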


Prompt Used (tree-traversal agent)

You are an agent traversing a tree. Given a user message and details/personas from a conversation, your task is to determine if the context provided is sufficient to answer the user's question by the other speaker. You begin at the root node, which should contain a high-level summary of all information up to this point. If more information is needed, you can explore the tree, following the traversal history provided. Avoid circular paths and make sure to explore effectively based on the traversal history. If after 7-8 traversals the information remains unavailable, use the 'U' action. If you reach the bottom or edge of the tree, reset with the 'S' action. The tree's structure is as follows:

1) Tree depth corresponds to information resolution. Leaf nodes are individual sentences from chats, and higher nodes summarize their children. Descend for more details, ascend for high-level information.
2) Horizontal position reflects time. The rightmost node is the most recent. Depending on the query and information, decide whether to search in the past or future. Move left for past information, right for future information.

Layer '0' is the root. Layer number increases with depth, and node number increases to the right. Stay within the permitted limits of layers and nodes.

Pay attention to the 'err' field for feedback on your moves.

Your actions can only be:
- U: Up
- D: Down
- R: Right
- L: Left
- S: Reset (Return to root node)
- O: Current Information Sufficient to Respond
- U: Current Information Insufficient to Respond

#############
Traversal History: {self.memory.get_current_traversal()}
Latest Moves and Reasons: {self.memory.traversals[:-3]}
#############
Current Location: Layer: {self.memory.layer_num}, Node: {self.memory.node_num}
Total Layers: {len(self.memory.hst.layers)},
Total Nodes in Current Layer: {len(self.memory.hst.layers[self.memory.layer_num]) if (self.memory.layer_num < len(self.memory.hst.layers)) else 0}
       
#############
Details/Personas: {self.memory._get_memory()}
#############
User Message: {user_message}
############
Errors: {err}
############

Action:
Reason: (Tell reason why you chose the action)
 

Prompt Used (response generation)

You are in a natural conversation. Given with a memory and user query, respond (You are Speaker 2) such that you continue the conversation and answer the user query if any from the memory. If memory is not sufficient to answer, tell i dont know.

 

###########

memory: {mem}

###########

user query: {user_query}

##########

Speaker 2:

Dataset and Evaluation

Dataset

We use the "Multi-Session Chat" (MSC) dataset [5].

[5] Xu, Jing, Arthur Szlam, and Jason Weston. "Beyond goldfish memory: Long-term open-domain conversation." arXiv preprint arXiv:2107.07567 (2021).

Metrics 

BLEU (Bilingual Evaluation Understudy)

\text{BLEU} = \text{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)

where the w_n are the n-gram weights, p_n is the ratio of n-grams in the candidate that match n-grams in any reference, and the brevity penalty is

\text{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{(1-r/c)} & \text{if } c \leq r \end{cases}

with c the candidate length and r the reference length.
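For intuition, here is a small self-contained sketch of sentence-level BLEU following the formula above (uniform weights w_n, clipped n-gram matches for p_n, and the brevity penalty); it is illustrative only and not necessarily the exact implementation used for the reported numbers.

import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())        # clipped matches
        precisions.append(max(overlap, 1e-9) / max(sum(cand_counts.values()), 1))   # p_n, smoothed
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))   # brevity penalty BP
    w = 1.0 / max_n                                       # uniform weights w_n
    return bp * math.exp(sum(w * math.log(p) for p in precisions))

print(round(bleu("i love walking and running", "i love walking and running daily"), 3))  # 0.819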

Metrics 

F1

\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

Precision:  proportion of words (or n-grams) in the candidate sentence that are also found in the reference sentence.

Recall: proportion of words (or n-grams) in the reference sentence that are also found in the candidate sentence
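A minimal sketch of word-overlap F1, matching the precision and recall definitions above (illustrative; tokenisation details may differ from the evaluation scripts actually used):

from collections import Counter

def f1_score(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())          # words shared by candidate and reference
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())      # shared / candidate length
    recall = overlap / sum(ref.values())          # shared / reference length
    return 2 * precision * recall / (precision + recall)

print(round(f1_score("yes i went for a run today", "yes i actually was able to run today"), 3))  # 0.533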

Metrics 

DISTINCT (Distinct-N)

Measures the diversity of the response

\text{DISTINCT-N} = \frac{|\{\text{Unique N-grams in the generated text}\}|}{|\{\text{Total N-grams in the generated text}\}|}
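And a matching sketch for DISTINCT-N (unique n-grams over total n-grams in the generated text), again illustrative:

def distinct_n(text, n=1):
    tokens = text.split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0

reply = "i love to grill i love meat especially steak"
print(round(distinct_n(reply, 1), 2), round(distinct_n(reply, 2), 2))  # 0.78 0.88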

Results

  1. The MSC dataset has about 500 conversations in its test set, spanning 5 sessions.
  2. We follow the methodology described above and compare the summary generated at the end of every session with the gold summary given in the dataset.

         F1       BLEU-1/2       DISTINCT-1/2
Ours     38.88    37.98/16.18    0.76/0.88
[7]      25.35    28.52/11.63    -

[7] Wang, Qingyue, et al. "Recursively summarizing enables long-term dialogue memory in large language models." arXiv preprint arXiv:2308.15022 (2023).

* Results are reported only for Session 2; results for the other sessions are yet to be obtained. DISTINCT-1/2 is reported for our method only.


Beyond Numbers: An Example

Session 2

Speaker 1: What kind of car do you own? I have a jeep.

Speaker 2: I don't own my own car! I actually really enjoying walking and running, but then again, I live in a small town and semi-close to work.

Speaker 1: Ah I see! I like going to the gym to work out.

Speaker 2: I'm a computer programmer. What do you do for work?

Speaker 1: I work in marketing. Do you have any hobbies?

Speaker 2: I love to grill! I love meat, especially steak.

Speaker 1: Oh neat! I don't eat meat but I do like some grilled veggies.

Speaker 2: My family is from Alaska. Where are you from?

Speaker 1: I'm from New England. It must be cold up there.

Speaker 2: It's so cold! I think that's why I love wearing warm pants!

Speaker 1: Haha I feel you. I love wearing warm pants in the winter.

Speaker 2: Do you play any sports?


Session 3

Speaker 1: Did you manage to go for a run today?

Note: nowhere up to this point has Speaker 2 explicitly mentioned going for a run today.

[Figure: the usual LLM behaviour on this query]

Our Method

The agent's traversal for this query:

  1. Action: 'D' (Descending the tree)

    • Reason: The user is asking whether the other speaker went out on a run today. This requires exploring the more recent nodes in the tree for relevant information.
  2. Action: 'S' (Resetting and exploring the tree)

    • Reason: The current location in the tree does not contain specific information related to running, necessitating a reset to find relevant information.
  3. Action: 'D' (Descending the tree)

    • Reason: To find information about going for a run, descending the tree is necessary to explore previous messages.
  4. Action: '' (No specific action)

    • Reason: Not provided.
  5. Action: 'D' (Descending the tree)

    • Reason: To check if there is any mention of the other speaker going for a run, descending the tree is required to explore information in lower layers.
  6. Action: 'R' (Moving right in the tree)

    • Reason: Having already explored downwards and reset, moving to the right is necessary to find information possibly in the future.
  7. Action: 'D' (Descending the tree)

    • Reason: To find related information about going out on a run today, it's best to explore the lower nodes.
  8. Action: 'O' (OK with the current context)

    • Reason: Given the user's query about going out for a run and Speaker 2's persona, which includes enjoyment in walking and running, it is assumed Speaker 2 might have gone out for a run.
  • Layer: 3
  • Node: 1
  • Final Memory: "Speaker 2: I don't own my own car! I actually really enjoy walking and running, but then again, I live in a small town and am semi-close to work."


When we return this reasoning and context, the LLM is more likely to answer that yes, it went for a run.


Actual Response: 'Yes I actually was able too. I am considering joining the local gym. Do you prefer going to the gym?' (From Dataset)


Future Directions

  • Can we "train" our own agent for sequential decision making, instead of relying on GPT?
  • So far we have compared only the summaries; we should also compare the actual responses and report those results.
  • Try different strategies for exploring the tree.

Thank You
