\begin{abstract}
How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent?
Standard RL algorithms assume a secure reward function, and can thus perform poorly in settings where agents can tamper with the reward-generating mechanism.
We present a principled solution to the problem of learning from influenceable feedback, which combines approval with a decoupled feedback collection procedure.
For a natural class of corruption functions, decoupled approval algorithms have aligned incentives both at convergence and for their local updates.
Empirically, they also scale to complex 3D environments where tampering is possible.
\end{abstract}