\begin{abstract}
We consider a dynamic programming (DP) approach to approximately solving
an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed
initial state, under the expected total discounted-reward criterion
and a uniform-feasibility constraint on the expected total discounted cost,
over the set of deterministic, history-independent, stationary policies.
We derive a DP-equation that holds recursively
for a CMDP problem and its sub-CMDP problems, where
each problem, induced from the parameters of the original CMDP problem,
admits a uniformly optimal feasible policy in the policy set associated
with the inputs to that problem.
A policy constructed
from the DP-equation is shown to achieve, at all states, the optimal values
defined for the CMDP problem that the policy solves.
Based on this result, we discuss off-line and on-line computational algorithms,
motivated by policy iteration for MDPs, whose output sequences
exhibit local convergence for the original CMDP problem.
\end{abstract}