
\begin{abstract}
We consider the problem of finding a control policy for a Markov
Decision Process (MDP) that maximizes the probability of reaching some
states while avoiding some other states. This problem is motivated by
applications in robotics, where it arises naturally when
probabilistic models of robot motion are required to satisfy temporal
logic task specifications. We transform this problem into a Stochastic
Shortest Path (SSP) problem and develop a new approximate dynamic
programming algorithm to solve it. This algorithm is of the actor-critic
type and uses a least-squares temporal difference learning method. It
operates on sample paths of the system and optimizes the policy within a
pre-specified class described by a parsimonious set of
parameters. We show its convergence to a policy corresponding to a
stationary point in the parameter space. Simulation results confirm
the effectiveness of the proposed solution.
\end{abstract}
