1: \begin{abstract}
2: We focus on energy harvesting (EH) two-hop communications since they are the essential building blocks of more complicated multi-hop networks.
3: The scenario consists of three nodes, where an EH transmitter wants to send data to a receiver through an EH relay.
4: The harvested energy is used exclusively for data transmission, and we address the problem of how to use it efficiently.
5: As in practical scenarios, we assume only causal knowledge at the EH nodes, i.e., in each time interval, the transmitter and the relay each know only the current and past values of their own incoming energy, battery level, data buffer level, and transmit channel coefficients.
6: Our goal is to find transmission policies that maximize the throughput when the EH nodes fully cooperate by exchanging their causal knowledge during a signaling phase.
7: We model the problem as a Markov game and propose a multi-agent reinforcement learning algorithm to find the transmission policies.
8: Furthermore, we show the trade-off between the achievable throughput and the signaling required, and provide convergence guarantees for the proposed algorithm.
9: Results show that even when the signaling overhead is taken into account, the proposed algorithm outperforms other approaches that do not consider cooperation among the nodes.
10: \end{abstract}
11: