Factored LP

Factored MDP -- Non-factored Value Function

Approximate!

V(s) = \sum_i w_i h_i(s)
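As a minimal sketch of the approximation above, assume a factored state is a tuple of binary variables and that each basis function h_i reads only a few of them. The basis functions `h0`, `h1`, `h2` and the weights are made up for illustration and are not taken from the slides.

```python
# Hypothetical basis functions over a factored state s = (s1, s2, s3).
# Each h_i reads only a small subset of the state variables.
def h0(s):
    return 1.0                   # constant (bias) feature

def h1(s):
    return float(s[0] == s[1])   # depends on s1 and s2 only

def h2(s):
    return float(s[2])           # depends on s3 only

BASIS = [h0, h1, h2]

def value(s, w):
    """Linear value function: V(s) = sum_i w_i * h_i(s)."""
    return sum(w_i * h_i(s) for w_i, h_i in zip(w, BASIS))

w = [0.5, 2.0, -1.0]             # illustrative weights, not learned
v = value((1, 1, 0), w)          # 0.5*1 + 2.0*1 + (-1.0)*0
```

Only the k weights are free parameters. The basis functions themselves are chosen by the user.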

Linear Regression

user defined

vars: w_1, ..., w_k, \phi

VI, PI try to move V outside of H (the space spanned by the h_i)

Bring it Back!

min: \phi
subj: \phi \geq | b_i - \sum_k w_k c_{ik} | \quad i = 1, \dots, |S|
  • k+1 vars good!
  • 2 |S| constraints bad!
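To make the two counts concrete, here is a small sketch that writes out the constraint rows of this LP for the variable vector x = (w_1, ..., w_k, \phi). Each absolute-value constraint splits into two linear rows, which is where the 2 |S| comes from. The helper names and the toy numbers are assumptions for illustration only, and no LP solver is involved.

```python
def chebyshev_lp_rows(b, C):
    """Return rows (a, rhs) meaning a . x >= rhs, with x = (w_1, ..., w_k, phi).

    phi >= |b_i - sum_k w_k c_ik| is split into two linear rows per state:
        phi + sum_k c_ik w_k >=  b_i
        phi - sum_k c_ik w_k >= -b_i
    """
    rows = []
    for b_i, c_i in zip(b, C):
        rows.append((list(c_i) + [1.0], b_i))
        rows.append(([-c for c in c_i] + [1.0], -b_i))
    return rows

def is_feasible(x, rows, tol=1e-9):
    """Check every row a . x >= rhs, up to a small tolerance."""
    return all(
        sum(a_j * x_j for a_j, x_j in zip(a, x)) >= rhs - tol
        for a, rhs in rows
    )

# Toy problem: |S| = 3 states, k = 2 weights.
b = [1.0, 2.0, 4.0]
C = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
rows = chebyshev_lp_rows(b, C)
```

For w = (1, 1.5) the residuals b_i - \sum_k w_k c_{ik} are 0, -0.5 and 0, so the point (1, 1.5, 0.5) satisfies every row, while lowering \phi to 0.4 violates one of them.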
\phi \geq | b_i - \sum_k w_k c_{ik} | \quad i = 1, \dots, |S|
\phi \geq \max_i | b_i - \sum_k w_k c_{ik} |
\phi \geq \max_i \sum_l b_l(\tilde{i}) - \sum_k w_k c_{k}(\tilde{i})
\phi \geq \max_i \sum_{m=l+k} f_m(\tilde{i})

Variable Elimination!

\phi \geq \max_{S_{i/1}} \sum_{n} f_n(\tilde{S}_{i/1}) + \max_{S_1} \sum_q f_q(\tilde{S})

Example:

u_{s_2^1} \geq f_1(s_1^1, s_2^1)
u_{s_2^1} \geq f_1(s_1^2, s_2^1)
u_{s_2^2} \geq f_1(s_1^1, s_2^2)
u_{s_2^2} \geq f_1(s_1^2, s_2^2)
\phi \geq \max_{S_{i/1-2}} \sum_{n} f_n(\tilde{S}_{i/1-2}) + \max_{S_2} \sum_q f_q(\tilde{S}_{i/1})
u_{s_3^1} \geq f_2(s_2^1, s_3^1) + u_{s_2^1}
u_{s_3^1} \geq f_2(s_2^2, s_3^1) + u_{s_2^2}
u_{s_3^2} \geq f_2(s_2^1, s_3^2) + u_{s_2^1}
u_{s_3^2} \geq f_2(s_2^2, s_3^2) + u_{s_2^2}
\phi \geq \max_{S_{i/1-2-3}} \sum_{n} f_n(\tilde{S}_{i/1-2-3}) + \max_{S_3} \sum_q f_q(\tilde{S}_{i/1-2})
\phi \geq u_{|S|}^{*}
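The elimination order in the example can be checked numerically. The sketch below assumes three binary variables and two local functions f_1(s_1, s_2) and f_2(s_2, s_3) with made-up integer values. It computes the u terms as plain maxima, whereas in the LP they are variables bounded from below by the constraints above. The function names and tables are hypothetical.

```python
from itertools import product

def max_by_elimination(f1, f2, domain):
    """max over (s1, s2, s3) of f1(s1, s2) + f2(s2, s3), one variable at a time."""
    # Eliminate s1: u2[s2] = max_{s1} f1(s1, s2)
    u2 = {s2: max(f1[(s1, s2)] for s1 in domain) for s2 in domain}
    # Eliminate s2: u3[s3] = max_{s2} f2(s2, s3) + u2[s2]
    u3 = {s3: max(f2[(s2, s3)] + u2[s2] for s2 in domain) for s3 in domain}
    # Eliminate s3: the remaining maximum is the bound on phi
    return max(u3.values())

def max_by_enumeration(f1, f2, domain):
    """Same maximum by brute force over all |domain|^3 joint states."""
    return max(
        f1[(s1, s2)] + f2[(s2, s3)]
        for s1, s2, s3 in product(domain, repeat=3)
    )

DOMAIN = (0, 1)
F1 = {(0, 0): 1, (1, 0): 3, (0, 1): -2, (1, 1): 0}   # keyed by (s1, s2)
F2 = {(0, 0): 0, (0, 1): 2, (1, 0): 5, (1, 1): -1}   # keyed by (s2, s3)

eliminated = max_by_elimination(F1, F2, DOMAIN)
enumerated = max_by_enumeration(F1, F2, DOMAIN)
```

Both routes give the same maximum, but elimination only ever looks at the local tables. That is why each elimination step contributes a number of constraints tied to the size of the local scopes rather than to |S|.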

By svalorzen
