\begin{abstract}
%The nature of trial and error in reinforcement learning introduces variances to the learning process. 
High variance in reinforcement learning has been shown to impede convergence and degrade task performance. Because the reward signal plays an important role in shaping learned behavior, multi-step methods have been considered as a way to mitigate this problem and are believed to be more effective than single-step methods.
However, comprehensive and systematic studies demonstrating the effectiveness of multi-step methods on highly complex continuous control problems are lacking.
%that require deep reinforcement learning (DRL) solutions.  
In this study, we introduce a new long $N$-step surrogate stage (LNSS) reward approach that effectively accounts for complex environment dynamics, whereas previous methods are usually feasible only for a limited number of steps. LNSS is simple, has low computational cost, and is applicable to both value-based and policy-gradient reinforcement learning methods. 
%In results, we show significantly improved task performance scores, reduced learning time, and variance reduction.  With a systematic evaluation on complex benchmark tasks in OpenAI Gym and DeepMind Control Suite, 
We systematically evaluate LNSS on complex benchmark environments from OpenAI Gym and the DeepMind Control Suite on which deep reinforcement learning (DRL) methods have generally struggled to obtain good results. We demonstrate that LNSS improves performance in terms of total reward, convergence speed, and coefficient of variation (CV). We also provide analytical insights into how LNSS exponentially reduces the upper bound on the variance of the $Q$ value relative to the corresponding single-step method. 
\end{abstract}