\begin{abstract}
Several \gls{6g} use cases, in particular \gls{td}, have tight reliability and latency requirements. % require connected vehicles to continuously share sensory perception data to a remote driver, and latency requirements are in the order of a few tens of ms.
To meet these requirements, \gls{pqos}, possibly combined with \gls{rl}, has emerged as a viable approach to dynamically adapt the configuration of the \gls{td} application (e.g., the compression level of automotive data) to the experienced network conditions.
In this work, we explore different classes of \gls{rl} algorithms for \gls{pqos}, namely MAB (stateless), SARSA (stateful, on-policy), Q-Learning (stateful, off-policy), and DSARSA and DDQN (with \gls{nn} approximation). We train the agents in a \gls{fl} setup to reduce convergence time, improve fairness, and promote privacy and security. The goal is to optimize the trade-off between \gls{qos}, measured in terms of end-to-end latency, and \gls{qoe}, measured in terms of the quality of the compressed data.
We show that %stateful off-policy algorithms outperform stateless on-policy algorithms, and \gls{nn} approximation does not always improve linear approximators. Finally, we prove
Q-Learning, which uses a small number of learnable parameters, is the best approach for \gls{pqos} in the \gls{td} scenario in terms of average reward, convergence, and computational cost.
\end{abstract}
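To make the on-policy/off-policy distinction between SARSA and Q-Learning concrete, the following is a minimal Python sketch of the textbook tabular updates, not the implementation evaluated in this work. SARSA bootstraps on the action actually selected in the next state, whereas Q-Learning bootstraps on the greedy action. The function names, the list-of-lists Q-table, and the FedAvg-style unweighted mean used to aggregate the agents' tables in the \gls{fl} setup are illustrative assumptions; the aggregation rule used in this work may differ.

```python
from typing import List

QTable = List[List[float]]  # Q[state][action]


def q_learning_update(Q: QTable, s: int, a: int, r: float, s_next: int,
                      alpha: float, gamma: float) -> None:
    # Off-policy: bootstrap on the greedy (max-value) action in s_next,
    # regardless of which action the behaviour policy takes next.
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q: QTable, s: int, a: int, r: float, s_next: int,
                 a_next: int, alpha: float, gamma: float) -> None:
    # On-policy: bootstrap on the action a_next actually selected
    # by the behaviour policy (e.g., epsilon-greedy) in s_next.
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def federated_average(tables: List[QTable]) -> QTable:
    # Hypothetical FedAvg-style aggregation: unweighted element-wise mean
    # of the agents' Q-tables (all tables must have the same shape).
    n = len(tables)
    return [[sum(t[s][a] for t in tables) / n
             for a in range(len(tables[0][s]))]
            for s in range(len(tables[0]))]


# Same transition (s=0, a=0, r=1, s_next=1), two different update rules.
Q_ql = [[0.0, 0.0], [1.0, 3.0]]
q_learning_update(Q_ql, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9)
# target = 1 + 0.9 * max(1.0, 3.0) = 3.7, so Q_ql[0][0] = 0.5 * 3.7 = 1.85

Q_sa = [[0.0, 0.0], [1.0, 3.0]]
sarsa_update(Q_sa, 0, 0, 1.0, 1, a_next=0, alpha=0.5, gamma=0.9)
# target = 1 + 0.9 * Q[1][0] = 1.9, so Q_sa[0][0] = 0.5 * 1.9 = 0.95

avg = federated_average([Q_ql, Q_sa])  # element-wise mean of both tables
```

With the same transition, the two rules produce different estimates (1.85 vs. 0.95) because the greedy next action has value 3.0, while the action actually taken has value 1.0.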