\begin{abstract}
Data-parallel domain re-organization and thread-mapping techniques are relevant
topics of study, as they can increase the efficiency of GPU computations on
spatial discrete domains with non-box-shaped geometry. In this work we
study the potential benefits of applying a succinct data re-organization to a
tetrahedral data-parallel domain of size $\mathcal{O}(n^3)$, combined with an
efficient block-space GPU map of the form $g(\lambda):\mathbb{N} \rightarrow
\mathbb{N}^3$. The analysis suggests that, in theory, the combination
of these two optimizations produces a significant performance improvement:
the block-based data re-organization allows a coalesced one-to-one correspondence in
local thread-space, while $g(\lambda)$ produces an efficient block-space spatial
correspondence between groups of data and groups of threads, reducing the number
of unnecessary threads from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2\rho^3)$, where $\rho$ is the
linear block size and typically $\rho^3 \ll n$. The analysis shows
that a block-based succinct data re-organization can provide up to $2\times$ better
performance than a linear data organization, while the map can be up to
$6\times$ more efficient than a bounding-box approach. The results of this
work can serve as a useful guide for more efficient GPU computation on
tetrahedral domains found in spin-lattice, finite-element, and special n-body
problems, among others.
\end{abstract}
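
As a rough sanity check of the $6\times$ figure quoted in the abstract, the
following sketch assumes that a tetrahedral domain of linear size $n$ holds
exactly the $n$-th tetrahedral number of elements, denoted here by $T(n)$, and
that a bounding-box approach launches one thread per cell of the enclosing
$n \times n \times n$ cube. Both the exact element count and the symbol $T(n)$
are assumptions made for illustration only; the abstract states just the
asymptotic size $\mathcal{O}(n^3)$.
\[
T(n) = \frac{n(n+1)(n+2)}{6},
\qquad
\frac{n^3}{T(n)} = \frac{6n^2}{(n+1)(n+2)} \longrightarrow 6
\quad \text{as } n \rightarrow \infty .
\]
Under the same assumptions, the bounding box wastes
\[
n^3 - T(n) = \frac{5n^3 - 3n^2 - 2n}{6} = \mathcal{O}(n^3)
\]
threads, which is consistent with the $\mathcal{O}(n^3)$ count of unnecessary
threads given for that approach. For example, $n = 1024$ gives
$n^3 / T(n) \approx 5.98$, already close to the limiting factor of $6$.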