Reinforcement Learning

Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

We prove the convergence of actor-critic reinforcement learning algorithms under commonly used assumptions. Published in Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII.

Align-RUDDER -- Learning From Few Demonstrations by Reward Redistribution

We generalise steerable E(3) equivariant graph neural networks such that node and edge updates are able to leverage covariant information. Published at ICLR 2022 (Oral).

General Deep Learning

After switching from High Energy Physics to Deep Learning, I started working in Reinforcement Learning before pivoting towards Associative Memories and modern Transformer networks. Recent years have shown that scalable ideas, improved datasets, and clever engineering are the ingredients for ever better Deep Learning models. This matches my own experience, and I will continue working on general large-scale Deep Learning directions.

RUDDER -- Return Decomposition for Delayed Rewards

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes. Published at NeurIPS 2019.