\begin{abstract}
In order for an agent to perform well in partially observable domains, it is
usually necessary for actions to depend on the history of
observations.  In this paper, we explore a {\it stigmergic} approach,
in which the agent's actions include the ability to set and clear bits
in an external memory, and the external memory is included as part of
the input to the agent.  In this case, we need to learn a reactive
policy in a highly non-Markovian domain.  We explore two algorithms:
{\sc sarsa}$(\lambda)$, which has had empirical success in partially
observable domains, and {\sc vaps}, a new algorithm due to Baird and
Moore, with convergence guarantees in partially observable domains.
We compare the performance of these two algorithms on benchmark problems.
\end{abstract}
