Mistral AI

What's the Hype?

Apart from their logo:

  • A LOT of money has been raised

    • Seed: $112M

    • Series A: $415M

  • European Rival to OpenAI

  • Focus on being truly "Open"

Mixtral 8x7B

  • Flagship open-source model

  • Decoder-only Transformer

  • Released under the Apache 2.0 license

  • Outperforms Llama 2 70B and matches or beats GPT-3.5 on most benchmarks

  • An instruction-tuned variant (Mixtral 8x7B Instruct) is also available


Sparse Mixture of Experts

  • At the core is a sparse Mixture of Experts (MoE) architecture.

  • Each feedforward layer holds 8 expert networks.

  • The base architecture is Mistral 7B.

  • Hence the name: 8x7B (a rough parameter count follows below).
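
A rough sanity check on the name (a sketch in Python using the hyperparameters published in the Mixtral paper; the shared-parameter figure is backed out of the reported 46.7B total, so treat the exact split as approximate):

    # Back-of-the-envelope parameter count for Mixtral 8x7B.
    n_layers, dim, ffn_dim, n_experts, top_k = 32, 4096, 14336, 8, 2

    # Each expert is a SwiGLU feedforward block: three dim x ffn_dim matrices.
    expert_params = 3 * dim * ffn_dim                    # ~176M per expert per layer
    expert_total = n_layers * n_experts * expert_params  # ~45.1B across all experts

    # Attention, embeddings, and norms are shared, not replicated: ~1.6B.
    shared = 46.7e9 - expert_total

    # Only top_k = 2 experts run per token, so the active count is far lower.
    active = n_layers * top_k * expert_params + shared
    print(f"active per token ~ {active / 1e9:.1f}B")     # ~12.9B

Only the feedforward blocks are replicated 8x; everything else is shared, which is why the total is ~47B rather than 8 x 7B = 56B.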

Sparse Mixture of Experts

  • A gating mechanism routes each token to a subset of the experts.

  • The router is simply a linear layer followed by a softmax over the expert scores.

  • Top-K routing with K = 2: each token is processed by its 2 highest-scoring experts (sketched below).

  • 46.7B total parameters, but only ~13B are active for any given token.
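
A minimal sketch of top-2 routing in PyTorch (illustrative only: the module names and sizes here are made up, and Mixtral's real experts are SwiGLU blocks inside a full Transformer):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Top2MoELayer(nn.Module):
        """Sparse MoE feedforward layer: a linear router picks 2 of 8 experts per token."""
        def __init__(self, dim=128, ffn_dim=512, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(dim, n_experts, bias=False)  # one logit per expert
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, ffn_dim), nn.SiLU(), nn.Linear(ffn_dim, dim))
                for _ in range(n_experts)
            )

        def forward(self, x):                         # x: (n_tokens, dim)
            logits = self.router(x)                   # (n_tokens, n_experts)
            # Keep only the top-k logits per token and softmax over just those,
            # so each token's compute touches only k experts.
            topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(topk_logits, dim=-1)  # (n_tokens, top_k)
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = topk_idx[:, slot] == e     # tokens whose slot-th pick is expert e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * expert(x[mask])
            return out

    tokens = torch.randn(4, 128)
    print(Top2MoELayer()(tokens).shape)               # torch.Size([4, 128])

The softmax is taken over only the selected logits, matching the bullets above: the router is a softmax over expert scores, and each token pays the compute cost of just 2 of the 8 experts.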

Performance

  • Comparisons are primarily against Llama 2 70B and GPT-3.5

  • With only ~13B active parameters per token, inference is much faster than a dense 70B model
