1: \begin{abstract}
2:
3: One way to approach end-to-end autonomous driving is to learn a policy
4: function that maps from a sensory input, such as an image frame from a
5: front-facing camera, to a driving action, by imitating an expert driver, or
6: a reference policy. This can be done by supervised learning, where a policy
7: function is tuned to minimize the difference between the predicted and
8: ground-truth actions. A policy function trained in this way, however, is known
9: to suffer from unexpected behaviours due to the mismatch between the states
10: reachable by the reference policy and by the trained policy function. More
11: advanced algorithms for imitation learning, such as DAgger, address this
12: issue by iteratively collecting training examples from both reference and
13: trained policies. These algorithms often require a large number of queries
14: to a reference policy, which is undesirable as the reference policy is often
15: expensive to query. In this paper, we propose an extension of DAgger, called
16: SafeDAgger, that is query-efficient and more suitable for end-to-end
17: autonomous driving. We evaluate the proposed SafeDAgger in a car racing
18: simulator and show that it indeed requires fewer queries to a reference
19: policy. We observe a significant speedup in convergence, which we
20: conjecture to be due to the effect of automated curriculum learning.
21:
22: \end{abstract}
23: