abstract:168fb3aad61b253a.tex

1: \begin{abstract}

2: %

3: Semantic segmentation is important for scene understanding.

4: To handle scenes in which natural images suffer from adverse illumination conditions, thermal infrared (TIR) images are introduced.

5: Most existing RGB-T semantic segmentation methods follow three cross-modal fusion paradigms, \ie encoder fusion, decoder fusion, and feature fusion.

6: Some methods, unfortunately, ignore the properties of RGB and TIR features or the properties of features at different levels.

7: %

8: In this paper, we propose a novel feature fusion-based network for RGB-T semantic segmentation, named \emph{LASNet}, which proceeds in three steps: location, activation, and sharpening.

9: The highlight of LASNet is that we fully consider the characteristics of cross-modal features at different levels, and accordingly propose three specific modules for better segmentation.

10: %

11: Concretely, we propose a Collaborative Location Module (CLM) for high-level semantic features, aiming to locate all potential objects.

12: We propose a Complementary Activation Module (CAM) for middle-level features, aiming to activate exact regions of different objects.

13: %

14: We propose an Edge Sharpening Module (ESM) for low-level texture features, aiming to sharpen the edges of objects.

15: Furthermore, in the training phase, we attach a location supervision and an edge supervision after CLM and ESM, respectively, and impose two semantic supervisions in the decoder part to facilitate network convergence.

16: %

17: Experimental results on two public datasets demonstrate the superiority of our LASNet over relevant state-of-the-art methods.

18: %

19: The code and results of our method are available at https://github.com/MathLee/LASNet.

20: \end{abstract}

21: