\begin{abstract}
Existing automatic 3D image segmentation methods usually fail to meet the requirements of clinical use.
Many studies have therefore explored interactive strategies that improve segmentation performance by iteratively incorporating user hints.
However, the dynamics of successive interactions have been largely ignored.
Here, we propose to model the dynamic process of iterative interactive image segmentation as a Markov decision process (MDP) and to solve it with reinforcement learning (RL).
Unfortunately, single-agent RL is intractable for voxel-wise prediction because of the large exploration space.
To reduce the exploration space to a tractable size, we treat each voxel as an agent with a shared voxel-level behavior strategy, so that the problem can be solved with multi-agent reinforcement learning.
An additional advantage of this multi-agent model is that it captures the dependencies among voxels in the segmentation task.
Meanwhile, to enrich the information carried over from previous segmentations, we retain the prediction uncertainty in the state space of the MDP and derive an adjustment action space, which leads to more precise and finer segmentations.
In addition, to improve exploration efficiency, we design a reward based on the relative cross-entropy gain, which updates the policy in a constrained direction.
Experimental results on various medical datasets show that our method significantly outperforms existing state-of-the-art methods while requiring fewer interactions and converging faster.
\end{abstract}