\begin{abstract}
We propose a weakly supervised deep temporal encoding-decoding approach to anomaly detection in surveillance videos based on multiple instance learning. Training uses both normal and abnormal video clips within the multiple instance framework: each video is treated as a bag and its clips as the instances in that bag. Our main contribution is to model the temporal relations between video instances. We treat the instances (clips) of a video as sequential visual data rather than as a set of independent instances, and we employ a deep temporal encoding-decoding network designed to capture their spatio-temporal evolution over time.
We also propose a new loss function that maximizes the mean distance between normal and abnormal instance predictions. This loss function ensures a low false alarm rate, which is crucial in practical surveillance applications. The proposed temporal encoding-decoding approach with the modified loss is benchmarked against the state of the art in simulation studies. The results show that the proposed method performs similarly to or better than state-of-the-art solutions for anomaly detection in video surveillance applications and achieves a state-of-the-art false alarm rate on the UCF-Crime dataset.
\end{abstract}