\begin{abstract}
This brief paper presents simple simulation-based algorithms for obtaining
an approximately optimal policy within a given finite policy set for large finite constrained Markov
decision processes.
The algorithms are adapted from playing strategies for the ``sleeping
experts and bandits'' problem, and their computational complexities
are independent of the sizes of the state and action spaces if the given policy set is relatively small.
We establish that their expected performances converge to the value of an optimal policy,
and we derive the corresponding convergence rates.
For the algorithm adapted from the sleeping-experts setting, we also establish
almost-sure convergence to an optimal policy at an exponential rate.
\end{abstract}