1: \begin{abstract}
2:
3: This paper studies the synthesis of an active perception policy that maximizes the information leakage about the initial state of a stochastic system modeled as a hidden Markov model (HMM). Specifically, the emission function of the HMM is controllable through a set of perception, or sensor query, actions. Because the goal is to infer the initial state from partial observations of the HMM, we use Shannon conditional entropy as the planning objective and develop a novel policy gradient method with convergence guarantees. By leveraging a variant of observable operators in HMMs, we prove several important properties of the gradient of the conditional entropy with respect to the policy parameters. These properties enable efficient computation of the policy gradient and yield stable, fast convergence. We demonstrate the effectiveness of our solution by applying it to an inference problem in a stochastic grid world environment.
4:
5: % This paper studies a class of active perception planning problems in hidden Markov processes with controllable observations. In a scenario where an active perception agent (observer) can query sensors and a moving object whose dynamics are known to the agent. The agent's objective is to identify the object's initial state, represented by a random variable. The stochastic system is modeled as a hidden Markov model (HMM) with a controllable emission function. To quantify the uncertainty of the initial states, we use Shannon conditional entropy, which captures the information revealed by the observations.
6: % We then develop a novel policy gradient method to optimize the agent's perception policy. The policy gradient of conditional entropy is derived using observable operators, and we prove the applicability of the gradient descent algorithm when we have a proper policy parameterization. This gradient computation allows for stable and fast convergence.
7: \end{abstract}
8: