Mistral AI
What's the Hype?
Apart from their logo
A LOT of money has been raised
Seed:
$112M
Series A:
$415M
European Rival to OpenAI
Focus on being truly "Open"
Mixtral 8x7B
Flagship Open Source model
Decoder-only Transformer
Released under the Apache 2.0 license
Outperforms Llama 2 70B and GPT-3.5 on most benchmarks
Instruction Tuned
Sparse Mixture of Experts
At its core is a Mixture of Experts (MoE).
Each layer has 8 experts.
Each expert is based on the 7B Mistral architecture.
Hence, 8x7B.
Sparse Mixture of Experts
Uses a gating mechanism for routing.
The router is just a linear layer followed by a softmax.
Top-K routing with K = 2: each token is processed by 2 of the 8 experts.
46.7B total parameters, but only ~13B are active per token (each token only uses 2 of the 8 experts).
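A minimal sketch of the sparse top-2 routing described above, written in PyTorch. The dimensions (d_model=4096, d_ff=14336) follow Mixtral's published config, but the class names and the per-token loop are illustrative assumptions, not the official implementation.

```python
# Sketch of a sparse MoE layer with top-2 routing (not Mixtral's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=4096, d_ff=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one linear layer producing a logit per expert
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # 8 expert feed-forward networks (SwiGLU-style, as in Mistral 7B)
        self.experts = nn.ModuleList([
            nn.ModuleDict({
                "w1": nn.Linear(d_model, d_ff, bias=False),
                "w2": nn.Linear(d_ff, d_model, bias=False),
                "w3": nn.Linear(d_model, d_ff, bias=False),
            })
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                           # (tokens, n_experts)
        top_vals, top_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)             # softmax over the top-2 logits only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e              # tokens whose slot-th pick is expert e
                if mask.any():
                    h = expert["w2"](F.silu(expert["w1"](x[mask])) * expert["w3"](x[mask]))
                    out[mask] += weights[mask, slot].unsqueeze(-1) * h
        return out
```

Because each token activates only 2 of the 8 expert FFNs, the compute per token corresponds to roughly 13B parameters even though all 46.7B must be held in memory.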
Performance
Comparisons primarily against Llama 2 70B and GPT-3.5
The much lower active parameter count (~13B) makes inference faster