\begin{abstract}
This brief paper presents simple simulation-based algorithms for obtaining
an approximately optimal policy within a given finite policy set for large finite constrained Markov
decision processes.
The algorithms are adapted from playing strategies for the ``sleeping
experts and bandits'' problem, and their computational complexities
are independent of the state and action space sizes if the given policy set is relatively small.
We establish convergence of their expected performances to the value of an optimal policy,
together with convergence rates,
and also almost-sure convergence to an optimal policy at an exponential rate for
the algorithm adapted within the context of sleeping experts.
\end{abstract}