\begin{abstract}
This brief paper presents simple simulation-based algorithms for obtaining
an approximately optimal policy from a given finite set of policies in large finite constrained Markov
decision processes.
The algorithms are adapted from playing strategies for the ``sleeping
experts and bandits'' problem, and their computational complexities
are independent of the state and action space sizes if the given policy set is relatively small.
We establish convergence of their expected performances to the value of an optimal policy,
together with convergence rates,
and also almost-sure convergence to an optimal policy at an exponential rate for
the algorithm adapted within the context of sleeping experts.
\end{abstract}