\begin{abstract}
Let $A$ be a transition probability kernel on a finite state space $\Delta^o = \{1, \ldots, d\}$ such that $A(x,y)>0$ for all $x,y \in \Delta^o$. Consider a reinforced chain given as a sequence $\{X_n, \; n \in \NN_0\}$ of $\Delta^o$-valued random variables, defined recursively by
$$L^n = \frac{1}{n}\sum_{i=0}^{n-1} \delta_{X_i}, \;\; P(X_{n+1} \in \cdot \mid X_0, \ldots, X_n) = L^{n+1} A(\cdot).$$
We establish a large deviation principle for $\{L^n\}$. The rate function differs strikingly in form from the Donsker--Varadhan rate function associated with the empirical measure of the Markov chain with transition kernel $A$. It is described in terms of a novel deterministic infinite-horizon discounted-cost control problem with linear controlled dynamics and a nonlinear running cost involving the relative entropy function. The proofs are based on an analysis of the time reversal of controlled dynamics in representations for log-transforms of exponential moments, and on weak convergence methods.
\end{abstract}