Introduction
Contributions
Problem Formulation Sequence generation and refinement
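Markov decision process $(\mathcal{Y}, \mathcal{A}, \mathcal{E}, \mathcal{R}, y^0)$
Possible sequences: $\mathcal{Y} = \mathcal{V}^{N_{\max}}$
Vocabulary: $\mathcal{V}$
Max seq length: $N_{\max}$
Actions $\mathcal{A}$: insert or delete
Environment $\mathcal{E}$: applies the chosen action and returns the modified sequence
Reward function $\mathcal{R}$: reward $\mathcal{R}(y) = -\mathcal{D}(y, y^*)$, the negative distance $\mathcal{D}$ to the ground truth $y^*$
(e.g. Levenshtein distance)
Input sequence $y^0$ (empty or incomplete)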
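The reward mentioned above can be illustrated with a small sketch. It assumes the reward is simply the negative Levenshtein distance between the current sequence and a reference; the function names `levenshtein` and `reward` are hypothetical and chosen for illustration only.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def reward(y, y_star):
    """Hypothetical reward: negative distance to the reference y_star."""
    return -levenshtein(y, y_star)
```

An identical pair of sequences gets reward 0, and every missing or wrong token lowers the reward by at most one.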
Agent is modelled by a policy $\pi$, which maps
the current sequence to a probability distribution over actions: $\pi: \mathcal{Y} \to P(\mathcal{A})$
Problem Formulation Actions: Deletion & Insertion
Sequence $y = (y_1, \dots, y_n)$ with boundary tokens $y_1 =$ <s> and $y_n =$ </s>
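Deletion policy: $\pi^{\mathrm{del}}(d \mid i, y)$
For each token $y_i$, $d$ is a binary decision (0 = keep, 1 = delete)
Can't delete start or end tokens (positions 1 and $n$): $\pi^{\mathrm{del}}(0 \mid 1, y) = \pi^{\mathrm{del}}(0 \mid n, y) = 1$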
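A minimal sketch of how deletion decisions could be applied to a token list, assuming one 0/1 decision per token and that the first and last positions hold the start and end tokens. The helper name `apply_deletion` is hypothetical.

```python
def apply_deletion(tokens, decisions):
    """Drop tokens whose decision is 1; the first and last token always stay."""
    if len(tokens) != len(decisions):
        raise ValueError("need exactly one decision per token")
    last = len(tokens) - 1
    return [
        tok
        for i, (tok, d) in enumerate(zip(tokens, decisions))
        if i in (0, last) or d == 0  # boundary tokens ignore their decision
    ]
```

Forcing the boundary positions to "keep" in code plays the same role as fixing the keep probability to 1 for the start and end tokens.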
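Insertion is a 2-phase process:
Predict the number of insertions between each pair of adjacent tokens $(y_i, y_{i+1})$: placeholder policy $\pi^{\mathrm{plh}}(p \mid i, y)$
Predict the token for each inserted placeholder: token policy $\pi^{\mathrm{tok}}(t \mid i, y)$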
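The two insertion phases can be sketched as two small list operations. This is an illustration under assumptions: the placeholder symbol `<plh>` and the helper names are hypothetical, and the predicted counts and tokens are passed in directly instead of coming from a model.

```python
PLH = "<plh>"  # hypothetical placeholder symbol


def insert_placeholders(tokens, counts):
    """Phase 1: put counts[i] placeholders between tokens[i] and tokens[i + 1]."""
    if len(counts) != len(tokens) - 1:
        raise ValueError("need one count per gap between adjacent tokens")
    out = []
    for tok, k in zip(tokens, counts):  # zip stops before the last token
        out.append(tok)
        out.extend([PLH] * k)
    out.append(tokens[-1])
    return out


def fill_placeholders(tokens, predicted):
    """Phase 2: replace placeholders, left to right, with predicted tokens."""
    if tokens.count(PLH) != len(predicted):
        raise ValueError("need one predicted token per placeholder")
    it = iter(predicted)
    return [next(it) if tok == PLH else tok for tok in tokens]
```

For example, counts `[1, 2]` on `["<s>", "a", "</s>"]` open one slot before `a` and two after it, which the second phase then fills.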
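Policy combination: each iteration alternates deletion, placeholder insertion and token prediction
$\pi(a \mid y) = \prod_{d_i \in d} \pi^{\mathrm{del}}(d_i \mid i, y) \cdot \prod_{p_i \in p} \pi^{\mathrm{plh}}(p_i \mid i, y') \cdot \prod_{t_i \in t} \pi^{\mathrm{tok}}(t_i \mid i, y'')$, with $y' = \mathcal{E}(y, d)$ and $y'' = \mathcal{E}(y', p)$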
Levenshtein Transformer Model
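States from l-th block: $h_i^{(l+1)} = E_{y_i} + P_i$ if $l = 0$, else the $i$-th output of $\mathrm{TransformerBlock}_l(h^{(l)})$
$E \in \mathbb{R}^{|\mathcal{V}| \times d_{\mathrm{model}}}$: token embedding
$P \in \mathbb{R}^{N_{\max} \times d_{\mathrm{model}}}$: position embedding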
Deletion classifier outputs 0 or 1
Placeholder classifier outputs number of tokens to insert
Token classifier predicts the likelihood of vocabulary tokens
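The three classifiers can be written as softmax heads over the decoder states. The notation below is an assumption for illustration: $h_i$ is the final-layer state at position $i$, $A$, $B$ and $C$ are the head weight matrices, and $K_{\max}$ is the maximum number of placeholders allowed per gap.

```latex
\begin{aligned}
\pi_\theta^{\mathrm{del}}(d \mid i, y) &= \mathrm{softmax}\left(h_i A^\top\right), & A &\in \mathbb{R}^{2 \times d_{\mathrm{model}}} \\
\pi_\theta^{\mathrm{plh}}(p \mid i, y) &= \mathrm{softmax}\left(\mathrm{concat}(h_i, h_{i+1})\, B^\top\right), & B &\in \mathbb{R}^{(K_{\max}+1) \times 2 d_{\mathrm{model}}} \\
\pi_\theta^{\mathrm{tok}}(t \mid i, y) &= \mathrm{softmax}\left(h_i C^\top\right), & C &\in \mathbb{R}^{|\mathcal{V}| \times d_{\mathrm{model}}}
\end{aligned}
```

The placeholder head looks at two adjacent states because it predicts how many tokens go into the gap between them, while the token head is only evaluated at placeholder positions.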
Levenshtein Transformer Dual-policy Learning
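Learn to imitate an expert policy $\pi^*$
Objective: maximise $\mathbb{E}_{y_{\mathrm{del}} \sim d_{\tilde{\pi}_{\mathrm{del}}},\, d^* \sim \pi^*} \sum_{d_i^* \in d^*} \log \pi_\theta^{\mathrm{del}}(d_i^* \mid i, y_{\mathrm{del}}) + \mathbb{E}_{y_{\mathrm{ins}} \sim d_{\tilde{\pi}_{\mathrm{ins}}},\, p^*, t^* \sim \pi^*} \left[ \sum_{p_i^* \in p^*} \log \pi_\theta^{\mathrm{plh}}(p_i^* \mid i, y_{\mathrm{ins}}) + \sum_{t_i^* \in t^*} \log \pi_\theta^{\mathrm{tok}}(t_i^* \mid i, y'_{\mathrm{ins}}) \right]$, with $y'_{\mathrm{ins}} = \mathcal{E}(y_{\mathrm{ins}}, p^*)$
$d_{\tilde{\pi}_{\mathrm{del}}}$ and $d_{\tilde{\pi}_{\mathrm{ins}}}$ are the state distributions induced by the roll-in policies $\tilde{\pi}_{\mathrm{del}}$ and $\tilde{\pi}_{\mathrm{ins}}$
States are drawn from these distributions, the expert suggests actions for them, then we optimise the model on these actions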
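Learn to delete: states are drawn from a mixture, with $u \sim \mathrm{Uniform}[0, 1]$ and mixture factor $\alpha \in [0, 1]$
$y^0$ if $u < \alpha$
else $\mathcal{E}(\mathcal{E}(y', p^*), \tilde{t})$ with $p^* \sim \pi^*$, $\tilde{t} \sim \pi_\theta$, i.e. the model's own insertion output on a sequence $y'$ that is ready for insertion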
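Learn to insert: states are drawn from a mixture, with $u \sim \mathrm{Uniform}[0, 1]$ and mixture factor $\beta \in [0, 1]$
$\mathcal{E}(y^0, d^*)$ with $d^* \sim \pi^*$ if $u < \beta$, i.e. expert deletion applied to the initial sequence
else $\mathcal{E}(y^*, \tilde{d})$ with $\tilde{d} \sim \pi^{\mathrm{RND}}$, i.e. random word dropping from the ground truth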
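Expert policy $\pi^*$
Oracle: $a^* = \arg\min_a \mathcal{D}(y^*, \mathcal{E}(y, a))$
$\mathcal{D}$: Levenshtein distance without substitution
Distillation: teacher model as expert policy; replace the ground truth $y^*$ by the teacher's beam-search results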
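One way to sketch such an oracle: without substitution, the optimal edits keep a longest common subsequence (LCS) of the current sequence and the ground truth, delete everything else from the current sequence, and insert everything else from the ground truth. The code below is an illustration under that assumption, not the authors' implementation; all function names are hypothetical.

```python
def lcs_table(a, b):
    """dp[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def distance_without_substitution(y, y_star):
    """Number of deletions plus insertions needed to turn y into y_star."""
    return len(y) + len(y_star) - 2 * lcs_table(y, y_star)[0][0]


def oracle_deletions(y, y_star):
    """Label each token of y: 0 = keep (part of one LCS), 1 = delete."""
    dp = lcs_table(y, y_star)
    labels, i, j = [], 0, 0
    while i < len(y):
        if j < len(y_star) and y[i] == y_star[j]:
            labels.append(0)  # matched token is kept
            i += 1
            j += 1
        elif j < len(y_star) and dp[i][j + 1] > dp[i + 1][j]:
            j += 1            # skip a target token that must be inserted later
        else:
            labels.append(1)  # token of y is not needed
            i += 1
    return labels


def oracle_insertions(y, y_star):
    """Placeholder count per gap of y and the tokens to insert, left to right.

    Assumes y is a subsequence of y_star with the same first and last token.
    """
    counts, tokens, j = [], [], 0
    for i, tok in enumerate(y):
        gap = []
        while y_star[j] != tok:  # target tokens missing before tok
            gap.append(y_star[j])
            j += 1
        j += 1
        if i > 0:
            counts.append(len(gap))
            tokens.extend(gap)
    return counts, tokens
```

The deletion labels train the deletion classifier, while the counts and tokens serve as targets for the placeholder and token classifiers respectively.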
Levenshtein Transformer Inference
Greedy decoding
Exit condition (whichever comes first):
two consecutive iterations with same result
max number of iterations
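Penalty for empty placeholders: subtract a penalty $\gamma \in [0, 3]$ from the logit of predicting 0 placeholders, to discourage overly short outputs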
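The decoding loop and the placeholder penalty can be sketched as follows. This is a minimal illustration: `refine_step` stands in for one full refinement iteration of a trained model, and `decode`, `pick_placeholder_count`, `gamma` and `max_iters` are hypothetical names and defaults.

```python
def pick_placeholder_count(logits, gamma=0.0):
    """Greedy choice of the placeholder count after penalising the count 0."""
    adjusted = [logits[0] - gamma] + list(logits[1:])
    return max(range(len(adjusted)), key=adjusted.__getitem__)


def decode(y0, refine_step, max_iters=10):
    """Refine y0 until two consecutive results agree or max_iters is reached.

    Returns the final sequence and the number of iterations that were run.
    """
    y = list(y0)
    for iteration in range(1, max_iters + 1):
        y_next = refine_step(y)
        if y_next == y:  # same result twice in a row: stop
            return y, iteration
        y = y_next
    return y, max_iters


# Toy stand-in for a model: adds one missing target token per iteration.
TARGET = ["<s>", "a", "b", "</s>"]


def toy_step(y):
    if len(y) < len(TARGET):
        return y[:-1] + [TARGET[len(y) - 1]] + y[-1:]
    return list(y)
```

With the toy step, decoding from `["<s>", "</s>"]` needs two iterations to reach the target and a third to detect that nothing changes any more.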
Experiments