\begin{abstract}
In this letter, we investigate the discrete phase-shift design of an intelligent reflecting surface (IRS) in a time-division duplexing (TDD) multi-user multiple-input multiple-output (MIMO) system. We develop a modified deep reinforcement learning (DRL) scheme that maximizes the average downlink data rate without requiring the channel state information (CSI) of the individual sub-channels. To tackle the resulting non-convex optimization problem, we tailor the proximal policy optimization (PPO) algorithm to the characteristics of the system model and integrate a gated recurrent unit (GRU) into it. Simulation results show that the proposed PPO-GRU scheme outperforms the benchmarks in terms of average downlink rate, convergence speed, and training stability.
\end{abstract}