Factored LP
Factored MDP -- Non-factored Value Function
Approximate!
V(s) = \sum_i w_i h_i(s)
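A minimal sketch of the linear form above, assuming a state is a tuple of small integers (one per state variable). The basis functions and weights here are hypothetical, chosen only to illustrate; each basis function looks at just one or two state variables.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a state is a tuple with one integer per state variable.
State = Tuple[int, ...]


def linear_value(weights: Sequence[float],
                 basis: Sequence[Callable[[State], float]],
                 state: State) -> float:
    """V(s) = sum_i w_i * h_i(s)."""
    return sum(w * h(state) for w, h in zip(weights, basis))


# Hypothetical user-defined basis functions h_i.
basis = [
    lambda s: 1.0,                  # constant offset
    lambda s: float(s[0]),          # depends on variable 0 only
    lambda s: float(s[1] == s[2]),  # depends on variables 1 and 2
]
weights = [0.5, 2.0, -1.0]          # illustrative values for w_i

value = linear_value(weights, basis, (1, 0, 0))
```

Only the k weights are free parameters; the basis functions stay fixed.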
Linear Regression
h_i: user defined
vars: w_1, ..., w_k, \phi
VI, PI try to move V outside of H
Bring it Back!
min: \phi
subj: \phi \geq | b_i - \sum_k w_k c_{ik} | \quad \forall i \in \{1, \dots, |S|\}
- k+1 vars good!
- 2 |S| constraints bad!
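To make the counts concrete, here is a sketch that writes the LP out explicitly in the form "minimize objective · x subject to rows · x <= rhs", with x = (w_1, ..., w_k, phi). Each absolute-value constraint splits into two linear ones, which is where the 2|S| comes from. The function names and the small c, b values are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def build_max_norm_lp(c: Sequence[Sequence[float]],
                      b: Sequence[float]
                      ) -> Tuple[List[float], List[List[float]], List[float]]:
    """Explicit LP for: min phi  s.t.  phi >= |b_i - sum_k w_k c_ik| for all i."""
    k = len(c[0])
    objective = [0.0] * k + [1.0]          # minimize phi only
    rows: List[List[float]] = []
    rhs: List[float] = []
    for c_i, b_i in zip(c, b):
        # phi >= b_i - c_i.w   <=>   -c_i.w - phi <= -b_i
        rows.append([-x for x in c_i] + [-1.0])
        rhs.append(-b_i)
        # phi >= c_i.w - b_i   <=>    c_i.w - phi <=  b_i
        rows.append(list(c_i) + [-1.0])
        rhs.append(b_i)
    return objective, rows, rhs


def is_feasible(rows, rhs, x, tol=1e-12) -> bool:
    """Check rows . x <= rhs for a candidate x = (w_1, ..., w_k, phi)."""
    return all(sum(a * v for a, v in zip(row, x)) <= r + tol
               for row, r in zip(rows, rhs))


# Illustrative data: |S| = 3 states, k = 2 weights.
c = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b = [1.0, 2.0, 0.5]
objective, rows, rhs = build_max_norm_lp(c, b)
```

The triple could be handed to any LP solver; with `scipy.optimize.linprog`, for example, it would be passed as `c`, `A_ub`, `b_ub`, with `bounds` set so the weights may be negative. The number of rows grows with |S|, which is exponential in the number of state variables.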
\phi \geq | b_i - \sum_k w_k c_{ik} | \quad \forall i \in \{1, \dots, |S|\}
\phi \geq \max_i | b_i - \sum_k w_k c_{ik} |
\phi \geq \max_i \sum_l b_l(\tilde{i}) - \sum_k w_k c_{k}(\tilde{i})
\phi \geq \max_i \sum_{m=l+k} f_m(\tilde{i})
Variable Elimination!
\phi \geq \max_{S_{i/1}} \sum_{n} f_n(\tilde{S}_{i/1}) + \max_{S_1} \sum_q f_q(\tilde{S})
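The idea of pushing the max inside the sum can be sketched numerically. The code below assumes each local function is stored as a pair (scope, table), where the table maps a tuple of values for the scope variables to a number. It eliminates one variable at a time and compares the result with plain enumeration over all joint states. The representation and all names are assumptions for illustration.

```python
from itertools import product


def eliminate_max(factors, var, domains):
    """Replace every factor mentioning `var` by u = max over var of their sum."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    new_scope = tuple(sorted({v for scope, _ in touching for v in scope
                              if v != var}))
    table = {}
    for values in product(*(domains[v] for v in new_scope)):
        assignment = dict(zip(new_scope, values))
        best = float("-inf")
        for x in domains[var]:
            assignment[var] = x
            total = sum(t[tuple(assignment[v] for v in scope)]
                        for scope, t in touching)
            best = max(best, total)
        table[values] = best
    return rest + [(new_scope, table)]


def max_by_elimination(factors, order, domains):
    for var in order:
        factors = eliminate_max(factors, var, domains)
    # Every remaining factor now has an empty scope.
    return sum(t[()] for _, t in factors)


def max_by_enumeration(factors, domains):
    names = sorted(domains)
    best = float("-inf")
    for values in product(*(domains[v] for v in names)):
        a = dict(zip(names, values))
        best = max(best, sum(t[tuple(a[v] for v in scope)]
                             for scope, t in factors))
    return best


# Illustrative chain: f1(s1, s2) + f2(s2, s3), all variables binary.
domains = {"s1": [0, 1], "s2": [0, 1], "s3": [0, 1]}
f1 = (("s1", "s2"), {(0, 0): 1.0, (0, 1): -2.0, (1, 0): 0.5, (1, 1): 3.0})
f2 = (("s2", "s3"), {(0, 0): 0.0, (0, 1): 4.0, (1, 0): -1.0, (1, 1): 0.5})

fast = max_by_elimination([f1, f2], ["s1", "s2", "s3"], domains)
slow = max_by_enumeration([f1, f2], domains)
```

Both calls return the same maximum, but elimination only ever enumerates the variables that appear together in the touched functions.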
Example:
u_{s_2^1} \geq f_1(s_1^1, s_2^1)
u_{s_2^1} \geq f_1(s_1^2, s_2^1)
u_{s_2^2} \geq f_1(s_1^1, s_2^2)
u_{s_2^2} \geq f_1(s_1^2, s_2^2)
\phi \geq \max_{S_{i/1-2}} \sum_{n} f_n(\tilde{S}_{i/1-2}) + \max_{S_2} \sum_q f_q(\tilde{S}_{i/1})
u_{s_3^1} \geq f_2(s_2^1, s_3^1) + u_{s_2^1}
u_{s_3^1} \geq f_2(s_2^2, s_3^1) + u_{s_2^2}
u_{s_3^2} \geq f_2(s_2^1, s_3^2) + u_{s_2^1}
u_{s_3^2} \geq f_2(s_2^2, s_3^2) + u_{s_2^2}
\phi \geq \max_{S_{i/1-2-3}} \sum_{n} f_n(\tilde{S}_{i/1-2-3}) + \max_{S_3} \sum_q f_q(\tilde{S}_{i/1-2})
\phi \geq u_{|S|}^{*}
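A rough count of how many constraints the elimination produces, compared with one constraint per joint state. The sketch only tracks scopes: eliminating a variable adds one constraint for every joint assignment of the variables in the functions it touches, and a final constraint ties phi to what is left. The chain structure f_j(s_j, s_{j+1}) with binary variables is an assumption matching the example; the count covers one side of the absolute value.

```python
from math import prod


def count_elimination_constraints(scopes, order, domain_sizes):
    """Count LP constraints generated by eliminating variables in `order`."""
    scopes = [frozenset(s) for s in scopes]
    total = 0
    for var in order:
        touching = [s for s in scopes if var in s]
        if not touching:
            continue  # variable appears in no function, nothing to do
        rest = [s for s in scopes if var not in s]
        joint = frozenset().union(*touching)
        # One constraint u_{...} >= sum(...) per assignment of `joint`.
        total += prod(domain_sizes[v] for v in joint)
        scopes = rest + [joint - {var}]
    return total + 1  # final constraint: phi >= remaining constant terms


def chain_count(n):
    """Chain of n binary variables with functions on neighbouring pairs."""
    names = [f"s{j}" for j in range(1, n + 1)]
    scopes = [(names[j], names[j + 1]) for j in range(n - 1)]
    sizes = {name: 2 for name in names}
    return count_elimination_constraints(scopes, names, sizes)


small = chain_count(3)    # the three-variable example
large = chain_count(20)   # compare against 2 ** 20 joint states
```

For a chain the count grows linearly in the number of variables, while the number of joint states doubles with each added variable. Other structures and elimination orders can give much larger intermediate scopes.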
By svalorzen