Artyom Sorokin\(^{1,2}\) Nazar Buzun\(^{1,3}\) Alexander Anokhin\(^{2}\) Egor Vedernikov\(^{2}\) Petr Anokhin\(^{1}\) Mikhail Burtsev\(^{4}\) Evgeny Burnaev\(^{1,2}\)
\(^1\)AXXX, Moscow, Russia
\(^2\)Applied AI Institute, Moscow, Russia
\(^3\)Innopolis University, Innopolis, Russia
\(^4\)London Institute for Mathematical Sciences, London, UK
Long-Context is still challenging for Large Language Models
Common approaches to addressing the long-context challenge:
A popular direction for multi-step RAG is to fine-tune an LLM to use retrieval as a tool
But fine-tuning LLMs can be expensive :(
Main Idea:
Multi-step retrieval as RL problem:
We train embedders to approximate the \(Q\)-function with the inner product between the state embedding and the chunk embedding:
NCE training: \(\langle s , a \rangle \rightarrow\) semantic similarity
Q-value approximation: \(\langle s , a \rangle \rightarrow\) usefulness of \(a\) given query \(s = \mathrm{prompt}(q, a_{t-1}, \ldots, a_{0})\)
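The inner-product Q-value approximation can be sketched as follows; the embedding dimensions and random vectors here are purely illustrative, not taken from the paper:

```python
import numpy as np

def q_values(state_emb: np.ndarray, chunk_embs: np.ndarray) -> np.ndarray:
    """Approximate Q(s, a) for every candidate chunk as the inner
    product between the state embedding and each chunk embedding."""
    return chunk_embs @ state_emb

# Toy example with random embeddings (sizes are an assumption).
rng = np.random.default_rng(0)
state = rng.normal(size=128)         # embedding of prompt(q, a_{t-1}, ..., a_0)
chunks = rng.normal(size=(10, 128))  # embeddings of 10 candidate chunks
q = q_values(state, chunks)          # one Q-value per candidate chunk
best = int(np.argmax(q))             # greedy choice of the next chunk
```

In practice the state embedding comes from the trained embedder applied to the current prompt, so each retrieval step re-embeds the query together with the chunks retrieved so far.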
Max entropy value functions that encourage exploration:
Given \(Q_\theta\), the chunk selection probability is computed using a Boltzmann policy:
\(\pi(a \mid s) = \dfrac{\exp\left(Q_\theta(s, a)/\tau\right)}{\sum_{a'} \exp\left(Q_\theta(s, a')/\tau\right)}\), where \(\tau\) is the temperature.
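The Boltzmann policy is a temperature-scaled softmax over Q-values; a minimal sketch (the temperature value is illustrative):

```python
import numpy as np

def boltzmann_policy(q_vals: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Selection probabilities pi(a|s) proportional to exp(Q(s,a)/tau)."""
    z = q_vals / tau
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

probs = boltzmann_policy(np.array([2.0, 1.0, 0.5]), tau=0.5)
# probabilities sum to 1; higher Q-values get higher selection probability,
# and lowering tau makes the policy closer to greedy argmax
```

Sampling from `probs` instead of taking the argmax gives the exploration that the max-entropy formulation encourages.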
Additional Q-RAG Details:
Relative position encoding: maps each chunk index to a continuous value, which is injected into the chunk embedding through RoPE.
Properties:
Q-RAG assigns each candidate chunk a continuous relative position \(\rho_t(i)\):
where \(b_{j} < i < b_{j+1}\)
\(b_{j}\) is the closest retrieved chunk to the left
\(b_{j+1}\) is the closest retrieved chunk to the right
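One simple way to realize such a mapping is linear interpolation between the continuous positions already assigned to the neighboring retrieved chunks \(b_j\) and \(b_{j+1}\). The sketch below is an assumption for illustration; the exact form of \(\rho_t(i)\) in Q-RAG may differ:

```python
def relative_position(i: int, b_left: int, b_right: int,
                      p_left: float, p_right: float) -> float:
    """Map chunk index i (with b_left < i < b_right) to a continuous
    position by linearly interpolating between the positions p_left and
    p_right of its nearest retrieved neighbors.
    NOTE: linear interpolation is an assumed scheme for illustration."""
    frac = (i - b_left) / (b_right - b_left)
    return p_left + frac * (p_right - p_left)

# e.g. candidate chunk 7 sits between retrieved chunks at indices 5 and 10,
# which occupy continuous positions 2.0 and 3.0
rho = relative_position(7, 5, 10, 2.0, 3.0)
```

The continuous value can then be fed into RoPE the same way an integer position would be, since rotary embeddings accept arbitrary real-valued positions.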
BabiLong is a commonsense and temporal reasoning benchmark for ultra-long contexts
RULER benchmark:
Results:
Results on Open-Domain QA
Q-RAG fine-tunes only the retriever/embedder and keeps the answering LLM frozen.
An open question for future work: