\begin{abstract}
Online approximation of an optimal station keeping strategy for a
fully actuated six degrees-of-freedom autonomous underwater vehicle
is considered. The developed controller approximates the
solution to a two-player zero-sum game in which the controller is the
minimizing player and an external disturbance is the maximizing player.
The solution is approximated using a reinforcement learning-based
actor-critic framework. The result guarantees uniformly ultimately
bounded (UUB) convergence of the states and UUB convergence of the
approximated policies to the optimal policies without requiring
persistence of excitation.
\end{abstract}