\begin{abstract}
This paper considers a Min-Max Multiple Traveling Salesman Problem (MTSP), where the goal is to find a set of tours, one for each agent, that collectively visit all the cities while minimizing the length of the longest tour.
Though MTSP has been widely studied, obtaining near-optimal solutions for large-scale problems remains challenging due to its NP-hardness.
Recent data-driven methods face two obstacles: they need supervision that is hard to obtain, and their gradient estimates have high variance, which leads to slow convergence and highly sub-optimal solutions.
We address these issues by reformulating MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). This involves introducing an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour among these TSP solutions is then used to self-supervise the allocation network, resulting in a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle high-variance gradients during optimization, we introduce a control variate-based gradient estimation algorithm.
Our experiments show that these designs enable our gradient estimator to converge $20\times$ faster than an advanced reinforcement learning baseline, and enable iMTSP to find tours up to $80\%$ shorter than those of the Google OR-Tools MTSP solver, especially in large-scale problems (e.g., $1000$ cities and $15$ agents).
\end{abstract}
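For concreteness, the bilevel problem summarized in the abstract can be sketched as follows. The notation is introduced only for illustration and is an assumption, not the paper's own: $\mathcal{V}=\{v_1,\dots,v_N\}$ is the set of cities, $M$ the number of agents, $p_\theta(a)$ the distribution over allocations $a\in\{1,\dots,M\}^N$ produced by an allocation network with parameters $\theta$, $\mathcal{V}_k(a)=\{v_j : a_j=k\}$ the cities assigned to agent $k$, and $L(\pi;\mathcal{V}_k(a))$ the length of a closed tour $\pi$ through those cities. The upper level learns the allocation, and the lower level solves one single-agent TSP per agent:
\begin{align}
\min_{\theta}\quad & J(\theta) = \mathbb{E}_{a\sim p_\theta}\big[C(a)\big], \qquad C(a) = \max_{k=1,\dots,M} L\big(\pi_k^{*}(a);\mathcal{V}_k(a)\big),\\
\text{s.t.}\quad & \pi_k^{*}(a) \in \arg\min_{\pi} L\big(\pi;\mathcal{V}_k(a)\big), \qquad k=1,\dots,M.
\end{align}
Because the lower-level TSP solutions are not differentiable with respect to $a$, one generic option is a score-function gradient with a control variate $b$ that does not depend on the sampled $a$. Since $\mathbb{E}_{a\sim p_\theta}[\nabla_\theta\log p_\theta(a)] = \nabla_\theta \sum_a p_\theta(a) = 0$, subtracting $b$ keeps the estimator unbiased while it can reduce its variance:
\begin{equation}
\nabla_\theta J(\theta) = \mathbb{E}_{a\sim p_\theta}\big[\big(C(a)-b\big)\,\nabla_\theta \log p_\theta(a)\big].
\end{equation}
The specific control variate used in iMTSP is not stated in the abstract, so $b$ here is only a placeholder.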
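A minimal, self-contained sketch of a control-variate gradient estimator for the allocation step is given below. It is not the iMTSP implementation, and every name in it is hypothetical: the allocation network is replaced by independent per-city softmax logits, each single-agent TSP is solved by brute force on a toy instance, all agents are assumed to start and end at a shared depot at the origin, and the control variate is a generic leave-one-out sample-mean baseline rather than the estimator proposed in the paper.

```python
import itertools
import math
import random

# Assumption: every agent starts and ends its tour at one shared depot at the origin.
DEPOT = (0.0, 0.0)


def tour_length(cities):
    """Length of the shortest closed tour depot -> cities -> depot (brute force, tiny inputs only)."""
    best = math.inf
    for order in itertools.permutations(cities):  # yields one empty tuple when cities is empty
        path = [DEPOT, *order, DEPOT]
        best = min(best, sum(math.dist(p, q) for p, q in zip(path, path[1:])))
    return best


def makespan(cities, allocation, num_agents):
    """Min-max objective: the longest of the per-agent TSP tours (lower level solved exactly)."""
    groups = [[] for _ in range(num_agents)]
    for city, agent in zip(cities, allocation):
        groups[agent].append(city)
    return max(tour_length(group) for group in groups)


def softmax(row):
    shift = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sample_allocation(logits, rng):
    """Hypothetical allocation policy: an independent categorical over agents for each city."""
    return [rng.choices(range(len(row)), weights=softmax(row))[0] for row in logits]


def grad_log_prob(logits, allocation):
    """Gradient of log p(allocation) w.r.t. the logits: one-hot(agent) - softmax(row)."""
    return [
        [(1.0 if k == agent else 0.0) - p for k, p in enumerate(softmax(row))]
        for row, agent in zip(logits, allocation)
    ]


def control_variate_gradient(logits, cities, num_agents, num_samples, rng):
    """Score-function estimate of d E[makespan] / d logits with a leave-one-out baseline.

    Requires num_samples >= 2. Returns (gradient, mean sampled makespan).
    """
    samples = [sample_allocation(logits, rng) for _ in range(num_samples)]
    costs = [makespan(cities, a, num_agents) for a in samples]
    total = sum(costs)
    grad = [[0.0] * len(row) for row in logits]
    for allocation, cost in zip(samples, costs):
        # Control variate: mean cost of the *other* samples, independent of this allocation.
        baseline = (total - cost) / (num_samples - 1)
        for j, row in enumerate(grad_log_prob(logits, allocation)):
            for k, value in enumerate(row):
                grad[j][k] += (cost - baseline) * value / num_samples
    return grad, total / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    cities = [(1.0, 0.0), (3.0, 0.0), (0.0, 2.0), (-2.0, 1.0)]
    num_agents = 2
    logits = [[0.0] * num_agents for _ in cities]
    for step in range(200):
        grad, mean_cost = control_variate_gradient(logits, cities, num_agents, 8, rng)
        # Gradient descent on the logits to lower the expected length of the longest tour.
        logits = [[w - 0.1 * g for w, g in zip(row, g_row)] for row, g_row in zip(logits, grad)]
        if step % 50 == 0:
            print(f"step {step}: mean makespan {mean_cost:.3f}")
```

Because each baseline is computed only from the other samples, it does not depend on the allocation it is paired with, so the estimator stays unbiased; the price is that at least two samples are needed per step.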