\begin{abstract}

We consider the problem of developing an efficient multi-threaded
implementation of the matrix-vector multiplication algorithm for sparse
matrices with structural symmetry. Matrices are stored in the
\textit{compressed sparse row-column} (CSRC) format, which is designed to exploit
the symmetric non-zero pattern observed in global finite element matrices.
Unlike with classical compressed storage formats, computing the sparse
matrix-vector product with CSRC requires thread-safe access to the
destination vector. To avoid race conditions, we have implemented two
partitioning strategies. In the first, each thread allocates an array for
storing its contributions, which are later combined in an accumulation step.
We analyze four different ways of performing this accumulation.
The second strategy employs a coloring
algorithm to group rows that can be processed concurrently by threads. Our
results indicate that, although it increases the working set size,
the first approach leads to the best performance improvements for
most matrices.

\bigskip

\noindent{\bf Keywords}: structurally symmetric matrix; sparse matrix-vector product; compressed sparse
row-column; parallel implementation; multi-core architectures; finite element method

\end{abstract}
