New model: Qwen2-VL
An open-source multimodal vision-language model (VLM) that reads handwriting well.
Marshall Diary
Qwen has read the diary and given us good text to work from.
On to Istmina (EAP)
For the Istmina court materials, we will likely need to fine-tune the model on examples drawn from the collection itself. What does that mean? Why do it? How do we do it?
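As a rough intuition for what "fine-tuning" means: start from weights that were already learned on general data, then keep training on a small set of in-domain examples so the model's errors on that material shrink. The sketch below is a toy with one parameter, not Qwen2-VL code; the starting weight, the data points, the learning rate, and the function names are all made up for illustration.

```python
# Toy illustration of fine-tuning. This is NOT Qwen2-VL code.
# A "pretrained" one-parameter model y = w * x is nudged toward new data.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, data, lr=0.01, steps=200):
    """Plain gradient descent on w, starting from the pretrained value."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

pretrained_w = 2.0  # hypothetical weight learned on general data
our_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # hypothetical in-domain pairs

tuned_w = fine_tune(pretrained_w, our_data)
print("error before:", mse(pretrained_w, our_data))
print("error after: ", mse(tuned_w, our_data))
```

A real VLM has billions of weights rather than one, and the examples would be page images paired with transcriptions, but the loop has the same shape: measure the error on our examples, adjust the weights slightly, repeat.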
Attention and next-word prediction
Vision Transformer (ViT)
mensual para accionistas ("monthly for shareholders")
Vision encoder
Text decoder
Intermediate representations (hidden state)
[1.2223, 0.3343, 3.2314, 5.3343, ...]
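The pipeline above (image patches become hidden-state vectors, and the decoder attends over them to pick the next word) can be sketched in a few lines. This is a minimal single-step, single-head sketch with made-up two-dimensional vectors and a three-word vocabulary. A real model uses learned projections, many heads, and many layers, so treat every number and name here as hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Hypothetical hidden states for three image patches (vision encoder output).
patch_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder state asking "what should come next?"
query = [1.0, 0.0]

weights, context = attend(query, patch_states, patch_states)

# Hypothetical output embeddings for a three-word vocabulary.
vocab = {"mensual": [1.0, 0.2], "para": [0.1, 1.0], "accionistas": [0.5, 0.5]}
logits = [sum(c * e for c, e in zip(context, emb)) for emb in vocab.values()]
probs = dict(zip(vocab, softmax(logits)))
next_word = max(probs, key=probs.get)
print(weights, next_word)
```

The attention weights say how much each patch contributes to the current step, and the final softmax turns the resulting context vector into a probability for each candidate word.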
https://huggingface.co/spaces/GanymedeNil/Qwen2-VL-7B
2-billion-parameter model
Also available in 7-billion- and 72-billion-parameter variants
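Parameter count translates fairly directly into the memory needed just to hold the weights: parameters times bytes per parameter. The back-of-the-envelope sketch below uses the nominal sizes from this slide (exact counts differ slightly) and counts weights only. Activations, and the gradients and optimizer state needed for fine-tuning, come on top. The function and table names are illustrative.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

def weight_memory_gb(n_params, dtype="bfloat16"):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

# Nominal parameter counts for the three variants named above.
for name, n_params in [("2B", 2e9), ("7B", 7e9), ("72B", 72e9)]:
    print(name, weight_memory_gb(n_params), "GB in bfloat16")
```

This is why the smallest variant is the practical starting point on a single GPU, while the largest needs several GPUs or aggressive quantization.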
Fine-tuning Qwen2-VL