Reinforcement Learning

Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

We prove the convergence of actor-critic reinforcement learning algorithms under commonly used assumptions. Published in Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII.

Align-RUDDER -- Learning From Few Demonstrations by Reward Redistribution

We generalise steerable E(3) equivariant graph neural networks such that node and edge updates are able to leverage covariant information. Published at ICLR 2022 (Oral).

RUDDER -- Return Decomposition for Delayed Rewards

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes. Published at NeurIPS 2019.