\begin{abstract}
This paper studies constrained continuous-time Markov
decision processes (MDPs) in the class of randomized policies depending
on \textit{state histories}. The transition rates may be \textit{unbounded},
the reward and cost functions are allowed to be \textit{unbounded from above and
from below}, and the state and action spaces are Polish spaces. The
optimality criterion to be maximized is the expected discounted
reward, and the constraints can be imposed on the expected discounted
costs. First, we give conditions for the nonexplosion of the underlying
processes and the finiteness of the expected discounted rewards and costs.
Second, using a technique of occupation measures, we prove that the
constrained optimality problem for continuous-time MDPs can be transformed into an
\textit{equivalent} (optimality) problem over a class of probability
measures. Based on the equivalent problem and a so-called
\textit{$\bar{w}$-weak convergence} of probability measures developed in this paper,
we show the existence of a constrained optimal policy. Third, by
providing a linear programming formulation of the equivalent
problem, we show the solvability of constrained optimal policies.
Finally, we use two \textit{computable} examples to illustrate our main
results.
\end{abstract}