\begin{abstract}
Data-parallel domain re-organization and thread-mapping techniques are relevant
topics, as they can increase the efficiency of GPU computations on spatial
discrete domains with non-box-shaped geometry. In this work we study the
potential benefits of applying a succinct data re-organization to a tetrahedral
data-parallel domain of size $\mathcal{O}(n^3)$, combined with an efficient
block-space GPU map of the form $g(\lambda):\mathbb{N} \rightarrow
\mathbb{N}^3$. The analysis suggests that, in theory, the combination of these
two optimizations produces a significant performance improvement: the
block-based data re-organization allows a coalesced one-to-one correspondence
in local thread-space, while $g(\lambda)$ produces an efficient block-space
spatial correspondence between groups of data and groups of threads, reducing
the number of unnecessary threads from $\mathcal{O}(n^3)$ to
$\mathcal{O}(n^2\rho^3)$, where $\rho$ is the linear block-size and typically
$\rho^3 \ll n$. The analysis also indicates that a block-based succinct data
re-organization can provide up to $2\times$ better performance than a linear
data organization, while the map can be up to $6\times$ more efficient than a
bounding-box approach. These results can serve as a useful guide for more
efficient GPU computation on tetrahedral domains found in spin-lattice,
finite-element and special $n$-body problems, among others.
\end{abstract}