1: \begin{abstract}
2: %
3: Semantic segmentation is important for scene understanding.
4: To handle scenes in which adverse illumination conditions degrade natural images, thermal infrared (TIR) images are introduced.
5: Most existing RGB-T semantic segmentation methods follow three cross-modal fusion paradigms, \ie encoder fusion, decoder fusion, and feature fusion.
6: Some methods, unfortunately, ignore the properties of RGB and TIR features or the properties of features at different levels.
7: %
8: In this paper, we propose a novel feature fusion-based network for RGB-T semantic segmentation, named \emph{LASNet}, which follows three steps of location, activation, and sharpening.
9: The highlight of LASNet is that we fully consider the characteristics of cross-modal features at different levels, and accordingly propose three specific modules for better segmentation.
10: %
11: Concretely, we propose a Collaborative Location Module (CLM) for high-level semantic features, aiming to locate all potential objects.
12: We propose a Complementary Activation Module (CAM) for middle-level features, aiming to activate exact regions of different objects.
13: %
14: We propose an Edge Sharpening Module (ESM) for low-level texture features, aiming to sharpen the edges of objects.
15: Furthermore, in the training phase, we attach a location supervision and an edge supervision after CLM and ESM, respectively, and impose two semantic supervisions in the decoder part to facilitate network convergence.
16: %
17: Experimental results on two public datasets demonstrate the superiority of our LASNet over relevant state-of-the-art methods.
18: %
19: The code and results of our method are available at https://github.com/MathLee/LASNet.
20: \end{abstract}
21: