We prove the convergence of actor-critic reinforcement learning algorithms under commonly used assumptions. Published in Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII.
We generalise steerable E(3)-equivariant graph neural networks so that node and edge updates can leverage covariant information. Published at ICLR 2022 (Oral).
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes. Published at NeurIPS 2019.