\begin{abstract}
We propose a compositional approach for synthesizing policies for networks of continuous-space stochastic control systems with unknown dynamics using model-free reinforcement learning (RL).
The approach is based on \emph{implicitly} abstracting each subsystem in the network with a finite Markov decision process with \emph{unknown} transition probabilities, synthesizing a strategy for each abstract model in an assume-guarantee fashion using RL, and then mapping the results back to the original network with \emph{approximate optimality} guarantees.
We provide lower bounds on the satisfaction probability of the overall network based on those over individual subsystems.
A key contribution is to leverage the convergence results for adversarial RL (minimax Q-learning) on finite stochastic arenas to provide control strategies maximizing the probability of satisfaction over the network of continuous-space systems.
We consider \emph{finite-horizon} properties expressed in the syntactically co-safe fragment of linear temporal logic.
These properties can readily be converted into automata-based reward functions, providing scalar reward signals suitable for RL.
Since such reward functions are often sparse, we supply a potential-based \emph{reward shaping} technique to accelerate learning by producing dense rewards.
The effectiveness of the proposed approaches is demonstrated on two physical benchmarks: regulation of a room temperature network and control of a road traffic network.
\end{abstract}