Large Graph Processing
I am writing this post to share the slides of the seminar I gave today here at University of Pisa about Large Graph Processing. The abstract is the following:
The main topic of this seminar is large graph processing. We will try to understand what are the problems that arise when dealing with large datasets and which are the most used solutions today. Specifically we will introduce the Bulk Synchronous Parallel (BSP) model that allows us to process extremely large graphs by distributing the computation to cluster of machines. We will then move our attention to graph compression. We will survey two interesting graph compression techniques, namely K2-trees and WebGraph. The former approach exploits the sparsity of adjacency matrices to achieve compression through a recursive spatial decomposition. The latter technique look for repetitions in the adjacency list in order to achieve a good compression ratio and fast decompression. At the end we will briefly discuss how the WebGraph compression algorithm coupled with HyperLogLog counters was used to obtain the famous “4 degree of separation” in the Facebook graph.
You can view the slides online following this link.