\begin{abstract}

  Several \gls{6g} use cases, in particular \gls{td}, have tight requirements in terms of reliability and latency. % require connected vehicles to continuously share sensory perception data to a remote driver, and latency requirements are in the order of a few tens of ms.
  To address these requirements, \gls{pqos}, possibly combined with \gls{rl}, has emerged as a valid approach to dynamically adapt the configuration of the \gls{td} application (e.g., the level of compression of automotive data) to the experienced network conditions.
  In this work, we explore different classes of \gls{rl} algorithms for \gls{pqos}, namely MAB (stateless), SARSA (stateful on-policy), Q-Learning (stateful off-policy), and DSARSA and DDQN (with \gls{nn} approximation). We train the agents in a \gls{fl} setup to improve the convergence time and fairness, and to promote privacy and security. The goal is to optimize the trade-off between \gls{qos}, measured in terms of the end-to-end latency, and \gls{qoe}, measured in terms of the quality of the resulting compression operation.
  We show that %stateful off-policy algorithms outperform stateless on-policy algorithms, and \gls{nn} approximation does not always improve linear approximators. Finally, we prove
  Q-Learning, which uses a small number of learnable parameters, is the best approach to perform \gls{pqos} in the \gls{td} scenario in terms of average reward, convergence, and computational cost.

\end{abstract}