1: \begin{abstract}
2:
3: Large scale graph processing is a major research area for Big Data exploration. Vertex centric
4: programming models like Pregel are gaining traction due to their simple abstraction that allows for
5: scalable execution on distributed systems naturally. However, there are limitations to this
6: approach which cause vertex centric algorithms to under-perform due to poor compute to
7: communication overhead ratio and slow convergence of iterative superstep. In this paper we introduce \emph{GoFFish} a
8: scalable sub-graph centric framework co-designed with a distributed persistent graph storage
9: for large scale graph analytics on commodity clusters. We introduce a \emph{sub-graph centric programming
10: abstraction} that combines the scalability of a vertex centric approach with the flexibility of
11: shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to
12: this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real
13: world graphs and demonstrate its significant performance improvement, \emph{orders of magnitude in
14: some cases}, compared to Apache Giraph, the leading open source vertex centric implementation.
15:
16: \end{abstract}
17: