1: \begin{abstract}
2: 
3:     One way to approach end-to-end autonomous driving is to learn a policy
4:     function that maps from a sensory input, such as an image frame from a
5:     front-facing camera, to a driving action, by imitating an expert driver, or
6:     a reference policy.  This can be done by supervised learning, where a policy
7:     function is tuned to minimize the difference between the predicted and
8:     ground-truth actions. A policy function trained in this way, however, is
9:     known to suffer from unexpected behaviours due to the mismatch between the
10:     states reachable by the reference policy and by the trained policy.  More
11:     advanced algorithms for imitation learning, such as DAgger, address this
12:     issue by iteratively collecting training examples from both reference and
13:     trained policies. These algorithms often require a large number of queries
14:     to a reference policy, which is undesirable as the reference policy is often
15:     expensive to query. In this paper, we propose an extension of DAgger, called
16:     SafeDAgger, that is query-efficient and more suitable for end-to-end
17:     autonomous driving. We evaluate the proposed SafeDAgger in a car racing
18:     simulator and show that it indeed requires fewer queries to a reference
19:     policy. We observe a significant speedup in convergence, which we
20:     conjecture to be due to the effect of automated curriculum learning.
21: 
22: \end{abstract}
23: