How do LLMs Predict?
Jason
The HOOK
HOOK IDEA 1
HOOK IDEA 2
HOOK IDEA 3
HOOK IDEA 4
HOOK IDEA 5
HOOK IDEA 6
HOOK IDEA 7
HOOK IDEA 8
HOOK IDEA 9
HOOK IDEA 10
Generate at least 10 possible hooks. You generally don't know what will work until you've exhausted the obvious and started to explore uncharted territory

Storyboard Your Narrative

the hook
scene 1
scene 2
scene 3
scene 4
the takeaway
Review Transformers:
Transformer
Phrase 1
Phrase 2
Phrase 3
Phrase n
.........
Token 1
Token 2
Token 3
Token n
.........
Logits 1
Logits 2
Logits 3
Logits m
.........
Now, Logits...?
- the number of logits ("m") matches the vocabulary size of the model
- each logit contains information of how possible a specific phrase should show up in the next position
- the greater the logit value, the more possible the phrase will take place in the next position
The SoftMax Function

source: https://www.geeksforgeeks.org/deep-learning/the-role-of-softmax-in-neural-networks-detailed-explanation-and-applications/
- normalizes the logits to values 0-1 that add up to 1, each representing the probability of specific vocabulary to occur
- brings logits to the same scale for comparison and analyses
- ensures positive values for manipulation and optimization (Cross-entropy loss, for example)
The Complete Transformer Architecture:
Transformer
Phrase 1
Phrase 2
Phrase 3
Phrase n
.........
Token 1
Token 2
Token 3
Token n
.........
Logits 1
Logits 2
Logits 3
Logits m
.........
Prob* 1
Prob 2
Prob 3
Prob m
.........
*Prob as in "Probability"
SoftMax
max( ) = Prob x
Prob 1, Prob 2, Prob 3,..., Prob m
Token x will be the next token!
Phrase x will be the next phrase of the sentence!
Yes, we have the next word...
but what about the word after that?
Autoregression: What makes transformers strong
The model makes the next prediction based on the outcome of the previous one
N * Phrases

(transformer)
The new predicted phrase
N * Phrases += new predicted phrase
(N+1) * Phrases

(transformer)
Another new predicted phrase
①
②
③
The Autoregression goes on and on, predicting one new phrase at a time, and...
Voila! We have predicted an entire sentence!



Jason
By Dan Ryan
Jason
- 7