1: \begin{abstract}
2: %We study the use of reinforcement learning to adapt the modulation and power scheme of a jammer seeking to disrupt a wireless communications system. To achieve this, we make use of a linear contextual bandit to learn to jam the victim system. Prior work has shown that with the use of linear bandits, improved convergence is achieved to jam a single-carrier system using time-domain jamming schemes. However, communications systems today typically employ orthogonal frequency division multiplexing (OFDM) to transmit data, particularly in 4G/5G networks. This work explores the use of linear Thompson Sampling (TS) to jam OFDM-modulated signals.
3: We study jamming of an OFDM-modulated signal that employs forward error correction coding. We then use reinforcement learning, in the form of a contextual bandit, to jam a 5G-based system that implements some aspects of the 5G protocol. In this model, the jammer receives unreliable reward feedback in the form of ACK/NACK observations, which lets us examine how imperfect observations of errors affect the jammer's ability to learn. We gain insights into the convergence time of the jammer and its ability to jam a victim 5G waveform, as well as into the vulnerabilities of wireless communications to reinforcement learning-based jamming.
4: \end{abstract}
5: