1: \begin{abstract}
2: In many real-world scenarios with high stakes and safety implications, a human decision-maker (HDM) may receive recommendations from an artificial intelligence system while retaining ultimate responsibility for the decisions. %This protocol involving both an autonomous agent and an HDM is known as ``expert in the loop." %where an HDM receives recommendations from an algorithm but ultimately decides which actions need to be taken.
3: In this letter, we develop an ``adherence-aware Q-learning'' algorithm to address this problem. The algorithm learns the ``adherence level,'' which captures how frequently an HDM follows the recommended actions, and derives the best recommendation policy in real time. We prove that the proposed Q-learning algorithm converges to the optimal value and evaluate its performance across various scenarios.
4: \end{abstract}
5: