
\begin{abstract}


Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world model to simulate the potential outcomes of various actions. Inspired by this, we propose a novel multi-step reasoning framework for LLMs, referred to as \textbf{S}tructure-a\textbf{wa}re \textbf{P}lanning with Accurate World Model (\methodname{}). Unlike previous approaches that rely solely on Chain-of-Thought (CoT) reasoning in natural language, \methodname{} incorporates structural information to guide the reasoning process via a world model and provides a soft verification mechanism over the steps. Moreover, \methodname{} overcomes the challenge of accurate world-state prediction in complex reasoning tasks by introducing a Generator-Discriminator architecture, which enables more reliable world modeling. Specifically, the generator predicts the next state, and the discriminator ensures that the prediction is logically consistent with the problem context. \methodname{} also encourages the policy model to explore a broad range of potential actions to prevent premature convergence. By resolving the bottleneck of generation diversity for both actions and states using diversity-based modeling (DBM) and improving discrimination accuracy through contrastive ranking (CR), \methodname{} significantly enhances the reasoning performance of LLMs. We evaluate \methodname{} on diverse reasoning-intensive benchmarks covering math reasoning, logical reasoning, and coding tasks. Extensive experiments demonstrate that \methodname{} achieves substantial improvements over the baselines and consistently outperforms existing methods.\footnote{Code and data are available at \url{https://github.com/xiongsiheng/SWAP}.}


\end{abstract}
