  • Spotlight - On the Convergence of Smooth Regularized Approximate Value Iteration Schemes

  • Contextual RNNs for Recommendation

    Recommendations can greatly benefit from good representations of the user state at recommendation time. Recent approaches that leverage Recurrent Neural Networks (RNNs) for session-based recommendation have shown that deep learning models can provide useful user representations. However, current RNN models summarize the user state only from the sequence of items the user has interacted with in the past, ignoring other essential types of context such as the type of each user-item interaction, the time gaps between events, and the time of day of each interaction. To address this, we propose a new class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that take contextual information into account in both the input and output layers: the context embedding is combined with the item embedding at the input, and, more explicitly, the model dynamics are modified by parametrizing the hidden-unit transitions as a function of the context. We compare our CRNN approach against RNN and non-sequential baselines and show clear improvements on the next-event prediction task.
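
    The sketch below gives a concrete, minimal picture of the two mechanisms named in the abstract: combining the context embedding with the item embedding at the input, and letting the context parametrize the hidden-state transition. It is an illustrative assumption, not the authors' implementation; the class name ContextualRecCell, the sigmoid gating choice, and all layer sizes are hypothetical.

    ```python
    # Minimal sketch of a context-conditioned recurrent cell (PyTorch).
    # Hypothetical names and architecture; for illustration only.
    import torch
    import torch.nn as nn


    class ContextualRecCell(nn.Module):
        def __init__(self, n_items, n_contexts, emb_dim, hidden_dim):
            super().__init__()
            self.item_emb = nn.Embedding(n_items, emb_dim)
            self.ctx_emb = nn.Embedding(n_contexts, emb_dim)
            # Input transform on the combined (item, context) representation.
            self.input_proj = nn.Linear(2 * emb_dim, hidden_dim)
            # Context-dependent gate that modulates the hidden-state transition.
            self.ctx_gate = nn.Linear(emb_dim, hidden_dim)
            self.hidden_proj = nn.Linear(hidden_dim, hidden_dim)

        def forward(self, item_ids, ctx_ids, h_prev):
            x_item = self.item_emb(item_ids)             # (batch, emb_dim)
            x_ctx = self.ctx_emb(ctx_ids)                # (batch, emb_dim)
            # (1) Combine item and context embeddings at the input layer.
            x = torch.cat([x_item, x_ctx], dim=-1)
            # (2) Context modulates the recurrent transition via a sigmoid gate.
            gate = torch.sigmoid(self.ctx_gate(x_ctx))   # (batch, hidden_dim)
            h_new = torch.tanh(self.input_proj(x) + gate * self.hidden_proj(h_prev))
            return h_new
    ```

    In a full model, the cell would be unrolled over the session and the final hidden state scored against item (and context) embeddings to predict the next event.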

  • Safe Reinforcement Learning

    Reinforcement learning agents learn to maximize cumulative reward in dynamic environments without prior knowledge. During learning, agents act under uncertainty because they have seen only a finite number of interactions with the environment, and they often execute suboptimal actions that lead to unsafe system states. Consequently, ensuring safety is necessary for deploying RL algorithms in real-world systems. In this talk, I present safe RL algorithms that provide finite-time guarantees on the agent's performance. We establish convergence of the proposed algorithms to the optimal policy and show that these safety guarantees come at the cost of a slower convergence rate. The proposed algorithms scale to infinite state-action spaces and require only a simple modification of the standard policy iteration scheme. We demonstrate a significant reduction in low-performing states during training on continuous control tasks.
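
    The abstract does not spell out the modification to policy iteration, so the sketch below is only one plausible illustration of a "safe" improvement step on a tabular MDP: the greedy action is adopted in a state only when its estimated advantage over the current (baseline) policy clears a margin. The function name, the margin parameter, and the acceptance rule are assumptions for illustration, not the speaker's algorithm.

    ```python
    # Illustrative sketch: tabular policy iteration with a conservative
    # improvement step. All quantities (P, R, gamma, margin) are assumed
    # inputs; this is not the algorithm presented in the talk.
    import numpy as np


    def safe_policy_iteration(P, R, gamma=0.99, margin=0.01, n_iters=100):
        """P: (S, A, S) transition probabilities, R: (S, A) rewards."""
        n_states, n_actions, _ = P.shape
        policy = np.zeros(n_states, dtype=int)           # baseline policy
        for _ in range(n_iters):
            # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
            P_pi = P[np.arange(n_states), policy]        # (S, S)
            R_pi = R[np.arange(n_states), policy]        # (S,)
            V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
            # Action values under the current value estimate.
            Q = R + gamma * P @ V                        # (S, A)
            greedy = Q.argmax(axis=1)
            # Safe improvement: switch actions only when the estimated
            # advantage clears the margin; otherwise keep the baseline action.
            advantage = Q[np.arange(n_states), greedy] - Q[np.arange(n_states), policy]
            new_policy = np.where(advantage > margin, greedy, policy)
            if np.array_equal(new_policy, policy):
                break
            policy = new_policy
        return policy, V
    ```

    The conservative acceptance test is the "simple modification" in this sketch; everything else is standard policy iteration.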