Transformer vs RNN
-
Transformer:
-
High memory and compute requirements.
-
The lack of a single summary state entails slow inference, since the whole context must be attended to at every step.
-
RNN:
-
Summarize an arbitrarily long context in a single hidden state.
-
Costly to train since training cannot be parallelized across time steps.
-
Struggle with long-distance relationships, which the hidden state captures to only a limited extent.
-
State Space Model
-
SSMs are more efficient to train than RNNs and better at handling long-distance relationships.
-
Still lag behind the performance of comparably sized Transformer LMs.
Jamba
-
Jamba combines Transformer layers with Mamba layers as well as MoE.
-
Jamba has 12B active parameters and 52B total parameters.
Memory
-
When scaling Transformers to long contexts, the KV cache becomes a bottleneck.
-
Trading off attention layers for Mamba layers reduces the size of the KV cache.
-
Jamba provides an 8x smaller KV cache compared to a vanilla Transformer.
-
Jamba maintains a small KV cache even with 256K-token contexts (see the estimate below).
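To make the 8x figure concrete, here is a back-of-the-envelope KV-cache estimate in Python. The layer count, KV-head count, and head dimension below are illustrative assumptions rather than Jamba's exact settings; the point is that only attention layers contribute to the cache, so keeping 4 attention layers instead of 32 shrinks it by 8x.
```python
# Rough KV-cache size estimate (illustrative dimensions, not Jamba's exact ones).
# Only attention layers store keys/values, so swapping them for Mamba layers
# shrinks the cache proportionally.
def kv_cache_gib(n_attn_layers, context_len, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # factor 2 = keys + values; bytes_per_elem=2 assumes fp16/bf16
    return 2 * n_attn_layers * context_len * n_kv_heads * head_dim * bytes_per_elem / 2**30

ctx = 256 * 1024
print(kv_cache_gib(n_attn_layers=32, context_len=ctx))  # all-attention Transformer: ~32 GiB
print(kv_cache_gib(n_attn_layers=4, context_len=ctx))   # Jamba keeps 1 in 8 attention layers: ~4 GiB
```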

Throughput
-
With long sequences, attention dominates the compute cost.
-
Mamba layers are more compute-efficient.
-
Increasing the ratio of Mamba layers improves throughput.
Jamba Blocks
-
Each Jamba block is a combination of Mamba and attention layers.
-
Each layer contains either an attention or a Mamba module, followed by an MLP (see the sketch after this list).
-
A Jamba block contains l layers, which are mixed at a ratio of a:m.
-
That is, a attention layers for every m Mamba layers.
-
The MoE module may be applied to MLPs every e layers.
-
Positional embeddings or mechanisms like RoPE are not necessary.
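A minimal PyTorch sketch of the layer structure described above: a mixer (attention or Mamba) followed by an MLP or MoE, each with a residual connection. The pre-norm placement, LayerNorm choice, and placeholder modules are assumptions for illustration, not the paper's actual implementation.
```python
import torch
import torch.nn as nn

class JambaLayer(nn.Module):
    """One layer: an attention-or-Mamba mixer, then an MLP (or MoE), each with a residual."""
    def __init__(self, d_model: int, mixer: nn.Module, ffn: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)  # normalization choice here is an assumption
        self.norm2 = nn.LayerNorm(d_model)
        self.mixer = mixer  # an attention or Mamba block; any module mapping (B, T, D) -> (B, T, D)
        self.ffn = ffn      # a plain MLP, or an MoE module on every e-th layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.mixer(self.norm1(x))  # residual around attention / Mamba
        x = x + self.ffn(self.norm2(x))    # residual around MLP / MoE
        return x

# Usage with stand-in modules (nn.Identity replaces a real attention/Mamba block):
d = 64
layer = JambaLayer(d, mixer=nn.Identity(),
                   ffn=nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)))
out = layer(torch.randn(2, 16, d))  # (batch, seq, d_model)
```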

Jamba Configuration
-
l=8: the number of layers in each Jamba block.
-
a:m=1:7: the ratio of attention to Mamba layers (expanded into a concrete layout below).
-
e=2: how often to use MoE instead of a single MLP.
-
n=16: total number of experts.
-
K=2: number of top experts used at each token.
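The configuration above expands into a concrete per-layer layout. The helper below is an illustrative sketch (which index holds the attention layer within each block is an assumption): it yields 1 attention and 7 Mamba layers per block, with MoE replacing the MLP on every 2nd layer.
```python
# Expand (l, a:m, e) into a per-layer layout for one Jamba block (illustrative only).
def jamba_block_layout(l=8, a=1, m=7, e=2):
    period = a + m  # a attention layers for every m Mamba layers
    layout = []
    for i in range(l):
        mixer = "attention" if i % period < a else "mamba"
        ffn = "moe" if (i + 1) % e == 0 else "mlp"  # MoE replaces the MLP every e layers
        layout.append((i, mixer, ffn))
    return layout

for i, mixer, ffn in jamba_block_layout():
    print(f"layer {i}: {mixer:9} + {ffn}")
# -> 1 attention layer, 7 Mamba layers; 4 of the 8 MLPs are MoE modules
```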


Throughput Analysis
-
Varying batch size, a single 80 GB GPU, int8, 8K context, 512 output tokens.
-
Jamba allows processing of large batches, leading to a 3x increase in throughput over Mixtral despite having a similar number of active parameters.

Throughput Analysis
-
Single batch, 4 A100 GPUs, FP16, varying context lengths, output 512 tokens.
-
With a 128K-token context, Jamba's throughput is 3x that of Mixtral.

Training Infrastructure and Dataset
-
The model was trained on NVIDIA H100 GPUs.
-
Used an in-house proprietary framework that enables efficient large-scale training, including FSDP and tensor, sequence, and expert parallelism.
-
In-house training dataset that contains text data from the Web, books, and code, with the last update in March 2024.
-
Data processing pipeline includes quality filters and deduplication.
-
Trained on context lengths of up to 1M tokens; the released model supports up to 256K tokens.
Academic Benchmarks


Needle-In-A-Haystack
-
Only 4 attention layers are enough.

Naturalistic Long-Context Evaluation
-
Evaluates the ability to handle long contexts using question-answering benchmarks with long inputs.

Ablation - Attention & Mamba (1B)
-
1B models trained for 250B tokens.
-
The hybrid Jamba model outperforms the pure attention or Mamba models.

Training Loss (1B)

Ablation - Attention & Mamba (7B)
-
7B models trained for 50B tokens.
-
The pure Mamba model lags slightly behind the pure attention model.

Training Loss (7B)

Why does the Combination Work?
-
The pure Mamba model often does not follow the correct format.
-
Labels: "Positive" or "Negative"; Mamba outputs: "Funny", "Bad", etc.
-
The conjecture is that the lack of an attention mechanism in the pure Mamba model makes in-context learning difficult.

The Effect of Mixture-of-Experts
-
7B-parameter models trained on 50B tokens.
-
The MoE variant has n=16 total experts, K=2 experts used at each token, and MoE applied every e=2 layers (a minimal routing sketch follows).
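A minimal top-K routing sketch in PyTorch (a simplified assumption, not Jamba's actual MoE code): a linear router scores the n experts for each token and only the K highest-scoring experts are evaluated, which is why only a fraction of the total parameters is active per token.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Route each token to its top-K experts and mix their outputs by softmaxed router scores."""
    def __init__(self, d_model: int, n_experts: int = 16, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                        # (batch, seq, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)     # keep only the top-K experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for ei, expert in enumerate(self.experts):
                mask = idx[..., slot] == ei            # tokens routed to expert ei in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE(d_model=64, n_experts=16, k=2)           # n=16 experts, K=2 active per token
y = moe(torch.randn(2, 8, 64))
```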

Stabilizing Mamba at Large Scale
-
Large loss spikes were encountered when scaling to larger models; adding RMSNorm to the internal activations of the Mamba layers stabilizes training.

Jamba Does Not Require Explicit Positional Information
-
1.3B parameter models, 250B tokens.
-
Explicit positional information may not be required for the hybrid architecture.
-
The Mamba layers provide implicit position information.

Jamba: A Hybrid Transformer-Mamba Language Model
By Penut Chen (陳威廷)