Dataset
miromind-ai/MiroMind-M1-RL-62K: (question, answer) pairs for math reasoning
Model
deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B: a 1.5B-parameter Qwen model distilled from DeepSeek-R1
GRPO (Group Relative Policy Optimization)
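GRPO (Group Relative Policy Optimization) samples a group of responses per question and scores each response relative to the rest of its group, so no learned value model is needed. Below is a minimal sketch of that group-relative advantage, assuming a binary correctness reward against the dataset's reference answer. The `\boxed{}` answer extraction, exact string match, and function names are hypothetical illustrations, not the setup used here. The clipped policy-ratio objective and KL penalty of full GRPO are omitted.

```python
import re
from statistics import mean, pstdev


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical answer format; does not handle nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def correctness_reward(response, reference):
    # Hypothetical binary reward: 1.0 if the extracted answer
    # exactly matches the reference answer string, else 0.0.
    answer = extract_boxed(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    # Normalize each reward against its own group:
    # A_i = (r_i - mean(r)) / (std(r) + eps).
    # Uses the population std; eps avoids division by zero when
    # every response in the group gets the same reward.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One question, a group of four sampled responses.
    responses = [
        r"so the answer is \boxed{42}",
        r"so the answer is \boxed{41}",
        r"\boxed{42}",
        "I am not sure.",
    ]
    rewards = [correctness_reward(r, "42") for r in responses]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Correct responses get positive advantages and incorrect ones negative; if every response in a group is equally right or wrong, all advantages are zero and that group contributes no learning signal.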
Training on 4K examples matches the performance of training on the full 62K set.
By Yao