-
Spotlight - On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
-
Contextual RNNs for Recommendation
Recommendations can greatly benefit from good representations of the user state at recommendation time. Recent approaches that leverage Recurrent Neural Networks (RNNs) for session-based recommendation have shown that deep learning models can provide useful user representations. However, current RNN modeling approaches summarize the user state using only the sequence of items the user has interacted with in the past, ignoring other essential types of context information such as the type of each user-item interaction, the time gaps between events, and the time of day of each interaction. To address this, we propose a new class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that take contextual information into account both in the input and output layers, by combining the context embedding with the item embedding, and more explicitly in the model dynamics, by parametrizing the hidden unit transitions as a function of the context information. We compare our CRNNs approach with RNNs and non-sequential baselines and show good improvements on the next-event prediction task.
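To make the two conditioning mechanisms concrete, the following is a minimal, hypothetical sketch (plain NumPy, with illustrative shapes and parameter names that are not taken from the paper): context enters once by concatenation with the item embedding at the input, and once by gating the hidden-to-hidden transition.

import numpy as np

# Illustrative sketch of a context-conditioned recurrent cell; the exact CRNN
# parametrization in the paper may differ. Context enters in two places:
# (1) concatenated with the item embedding at the input layer, and
# (2) as a multiplicative modulation of the hidden-state transition.

rng = np.random.default_rng(0)
d_item, d_ctx, d_hid = 32, 8, 64          # hypothetical embedding sizes

W_in = rng.normal(scale=0.1, size=(d_hid, d_item + d_ctx))   # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))            # hidden-to-hidden
W_ctx = rng.normal(scale=0.1, size=(d_hid, d_ctx))           # context gate

def step(h_prev, item_emb, ctx_emb):
    """One recurrent step conditioned on context."""
    x = np.concatenate([item_emb, ctx_emb])         # context in the input
    gate = 1.0 / (1.0 + np.exp(-W_ctx @ ctx_emb))   # context-dependent gate
    h = np.tanh(W_in @ x + gate * (W_hh @ h_prev))  # context modulates dynamics
    return h

# Run over a toy session of 5 events
h = np.zeros(d_hid)
for _ in range(5):
    item_emb = rng.normal(size=d_item)   # embedding of the interacted item
    ctx_emb = rng.normal(size=d_ctx)     # e.g. interaction type, time gap, hour
    h = step(h, item_emb, ctx_emb)
print(h.shape)                           # (64,) -> user-state representation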
-
Safe Reinforcement Learning
A reinforcement learning agent learns to maximize the cumulative reward in dynamic environments without prior knowledge. During the learning process, the agent acts under uncertainty because it has only a finite amount of interaction with the environment. As a result, it is likely to execute sub-optimal actions that may lead to unsafe or poor states of the system. Safety of RL algorithms is therefore a primary concern for deployment in real-world systems. In this talk, I will present a new family of safe RL algorithms that provide finite-time guarantees against poor decisions. We establish convergence of the proposed algorithms to the optimal policy and show that the safety guarantee is achieved in exchange for a slower convergence rate. Practically, the proposed algorithms scale to infinite state-action spaces and amount to a simple modification of the standard policy iteration scheme. We demonstrate a significant reduction in low-performing states visited during learning on continuous control tasks.
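As an illustration of what a "simple modification to the standard policy iteration scheme" can look like, here is a small, hypothetical tabular sketch that replaces the fully greedy policy update with a conservative mixture of the current and greedy policies. This particular mixing rule is an assumption made for the sketch, not the algorithm presented in the talk, but it shows the general mechanism of limiting how far each update can move the policy in exchange for slower improvement.

import numpy as np

# Minimal tabular sketch: policy iteration with a conservative update step.
# The mixing coefficient alpha is a stand-in safety mechanism (assumed for
# illustration), trading convergence speed for smaller per-step policy changes.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions
R = rng.normal(size=(n_states, n_actions))                        # rewards

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

pi = np.full((n_states, n_actions), 1.0 / n_actions)  # start uniform
alpha = 0.3                                           # conservative step size

for _ in range(50):
    v = evaluate(pi)
    q = R + gamma * np.einsum("sat,t->sa", P, v)      # one-step lookahead
    greedy = np.eye(n_actions)[q.argmax(axis=1)]      # greedy policy (one-hot)
    pi = (1 - alpha) * pi + alpha * greedy            # partial, "safe" update

print(np.round(evaluate(pi), 3))                      # values of the final policy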