Research Post #5
Just finished another week at my internship. As you may remember, I have been working on a data visualization and am continuing to do so. This week I experimented with graph sparsification. Unfortunately, my computer only has 8 GB of RAM, so I can't visualize all the data without it crashing or slowing to a crawl. I am supposed to get access to the university's High Performance Computing (HPC) servers soon, but in the meantime my professor suggested I try graph sparsification. Graph sparsification is a technique that shrinks a graph, and therefore its memory footprint, by pruning edges and nodes. I tried it out on smaller graphs and it works! There are many different algorithms, each of which can be tuned for different results. I started with a basic randomized sparsification algorithm built into the software I am using, NetworkX. This algorithm merged nearby high-density nodes, reducing the total number of edges. In the end, the sparsified graph had about 200 fewer edges than the original graph (which started with 1,000 edges).
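To make this concrete, here is a minimal sketch of randomized edge sparsification in NetworkX. This is not the exact built-in algorithm I used (NetworkX also ships `nx.spanner`, a randomized spanner sparsifier, which is another option); the stand-in graph and the `keep_fraction` value below are just illustrative:

```python
import random
import networkx as nx

# Stand-in graph with 1,000 edges, roughly the size of my test graph.
G = nx.gnm_random_graph(n=300, m=1000, seed=42)

def random_edge_sparsify(graph, keep_fraction=0.8, seed=None):
    """Keep a random fraction of edges; the simplest randomized sparsifier.

    Note: this makes no attempt to preserve connectivity or any
    particular graph metric.
    """
    rng = random.Random(seed)
    edges = list(graph.edges())
    kept = rng.sample(edges, k=int(len(edges) * keep_fraction))
    H = nx.Graph()
    H.add_nodes_from(graph.nodes(data=True))  # keep all nodes and attributes
    H.add_edges_from(kept)
    return H

H = random_edge_sparsify(G, keep_fraction=0.8, seed=0)
print(G.number_of_edges(), "->", H.number_of_edges())  # 1000 -> 800
```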
Graph sparsification is a really interesting concept since there are so many different algorithms for it. This paper does an in-depth analysis of different graph sparsification algorithms. The study takes 14 real-world graphs with varying characteristics, runs 12 common sparsification algorithms (sparsifiers) on them, and assesses how well each one preserves 16 graph metrics. The results found that no single sparsifier performs best at preserving all graph characteristics; each algorithm does well in different cases. The paper covers the math behind each algorithm, the significance of each metric, and the impact of each algorithm on each metric.
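Since no single sparsifier preserves everything, it seems worth measuring the metrics you actually care about before and after sparsifying. Here's a small sketch of that kind of check (the metrics below are my own picks for illustration, not the paper's full list of 16):

```python
import networkx as nx

def compare_metrics(G, H):
    """Print a few graph metrics for the original G and the sparsified H."""
    metrics = {
        "nodes": (G.number_of_nodes(), H.number_of_nodes()),
        "edges": (G.number_of_edges(), H.number_of_edges()),
        "density": (nx.density(G), nx.density(H)),
        "avg clustering": (nx.average_clustering(G), nx.average_clustering(H)),
    }
    for name, (orig, sparse) in metrics.items():
        print(f"{name}: original={orig:.4g}, sparsified={sparse:.4g}")

# Using G and H from the sparsification sketch above:
# compare_metrics(G, H)
```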
Apart from graph sparsification and the larger data visualization, I have also been tasked with finding external platforms and software for graph visualization. NetworkX's drawing tools are mainly for displaying graphs aesthetically; they don't leverage deeper analysis of the data itself. For this reason, all of its layouts (which can be seen in my last post) are more or less symmetrical and do not offer informative qualitative results. By using external software to position the nodes and then feeding those coordinates into NetworkX, the result may be a more asymmetrical graph that reveals the communities that form and helps us better predict the unknown genes. We tried community-based clustering, but that did not work well: the algorithm kept reading the entire graph as one giant community, which is uninformative.
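The nice part is that NetworkX's drawing functions accept any `{node: (x, y)}` dictionary for positions, so coordinates computed elsewhere can be dropped straight in. Here's a sketch of the idea (the `read_positions` helper and `layout.csv` file are hypothetical placeholders, and I fake the "external" layout with `spring_layout`):

```python
import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()  # stand-in for our gene network

# Positions from an external tool would be loaded into a {node: (x, y)} dict,
# e.g. pos = {node: (x, y) for node, x, y in read_positions("layout.csv")}
# (read_positions / layout.csv are hypothetical). As a stand-in, use a
# force-directed layout computed by NetworkX itself:
pos = nx.spring_layout(G, seed=7)

nx.draw(G, pos=pos, node_size=50, with_labels=False)
plt.savefig("network.png", dpi=200)
```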
Next steps are continuing with graph sparsification, trying other software to visualize the network, and hopefully starting to get some informative visualizations. See you next week.