1: \begin{abstract}
2: %
3: The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not efficiently capture the setting where the set of available decisions (actions) at each time step is stochastic.
4: %
5: %
6: Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which better captures the concept of a stochastic action set.
7: %
8: In this paper we argue that existing RL algorithms for SAS-MDPs can suffer from potential divergence issues, and present new policy gradient algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence.
9: %
10: We conclude with experiments that demonstrate the practicality of our approaches on tasks inspired by real-life use cases wherein the action set is stochastic.
11: %
12: \end{abstract}
13: