\begin{abstract}
For an agent to perform well in partially observable domains, its
actions usually need to depend on the history of
observations. In this paper, we explore a {\it stigmergic} approach,
in which the agent's actions include the ability to set and clear bits
in an external memory, and the external memory is included as part of
the input to the agent. In this setting, the agent must learn a reactive
policy in a highly non-Markovian domain. We explore two algorithms:
{\sc sarsa}$(\lambda)$, which has had empirical success in partially
observable domains, and {\sc vaps}, a new algorithm due to Baird and
Moore, with convergence guarantees in partially observable domains.
We compare the performance of these two algorithms on benchmark problems.
\end{abstract}
