Research Post # 4

I just finished my fourth week at the NYU Internship. Because of the July 4th weekend, this week was cut short, but we still did a lot. I continued to work on the visualization of my data and here are the updates.

I decided to color the edges of the graph by interaction category (physical or genetic) and to set each edge's thickness by its interaction count (the number of organisms the interaction is present in). Once this was done, I experimented with the different layouts that the graphing software (NetworkX) offers: circular and spiral layouts, force-directed layouts, a layout clustered by color, and a spectral layout. The results are linked here:
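For readers who want to try something similar, here is a minimal sketch of the edge styling and layout step. The four sample rows, the gene names, the `category` and `count` attribute names, and the 0.5 width scale are all made up for illustration and are not my real data. I am also assuming that NetworkX's `spectral_layout` is the closest built-in match for the spectral option mentioned above.

```python
import networkx as nx

# Hypothetical sample rows: (gene_a, gene_b, category, interaction_count)
rows = [
    ("G1", "G2", "physical", 3),
    ("G2", "G3", "genetic", 1),
    ("G3", "G1", "physical", 2),
    ("G3", "G4", "genetic", 4),
]

G = nx.Graph()
for a, b, category, count in rows:
    # Store the styling inputs as edge attributes (attribute names are assumptions)
    G.add_edge(a, b, category=category, count=count)

# Orange for physical interactions, green for genetic ones
palette = {"physical": "orange", "genetic": "green"}
edge_colors = [palette[d["category"]] for _, _, d in G.edges(data=True)]

# Thicker edges for higher interaction counts (0.5 is an arbitrary scale factor)
edge_widths = [0.5 * d["count"] for _, _, d in G.edges(data=True)]

# Each layout function returns a dict mapping node -> (x, y) position
layouts = {
    "circular": nx.circular_layout(G),
    "spiral": nx.spiral_layout(G),
    "spectral": nx.spectral_layout(G),
    "spring": nx.spring_layout(G, seed=42),  # seed makes the result repeatable
}
```

With matplotlib installed, any of these positions can be passed to `nx.draw_networkx_edges(G, pos, edge_color=edge_colors, width=edge_widths)` to draw the styled edges.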

These are the five relevant layouts that I found. Each graph has 10,000 edges and 2,061 nodes. The nodes are colored red for disease-associated, blue for not disease-associated, and gray for unknown. The edges are colored orange for physical gene interactions and green for genetic gene interactions, and their thicknesses are based on interaction count. Out of all the layouts, force-directed is likely the most informative because it directly shows which nodes are the most connected, which lets us make predictions about whether a node is disease-associated. To elaborate, if a node is primarily connected to disease-associated nodes, it is likely also disease-associated, and if it is primarily connected to non-disease-associated nodes, it is likely not disease-associated. The force-directed layout helps visualize this.
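The neighbor-based reasoning above can be written down as a simple majority vote. This is only a sketch of the idea, not the model I will end up using. The function name `guess_label`, the label strings, and the tiny example graph are hypothetical.

```python
from collections import Counter

def guess_label(node, neighbors, labels):
    """Majority vote over the known labels of a node's neighbors."""
    votes = Counter(
        labels[n]
        for n in neighbors.get(node, ())
        if labels.get(n) in ("disease", "non-disease")  # skip unknown neighbors
    )
    if not votes:
        return "unknown"  # no labeled neighbors to vote
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "unknown"  # tie between the two categories
    return ranked[0][0]

# Hypothetical adjacency lists and known labels
neighbors = {"G1": ["G2", "G3", "G4"], "G5": ["G2", "G6"]}
labels = {"G2": "disease", "G3": "disease", "G4": "non-disease", "G6": "non-disease"}

print(guess_label("G1", neighbors, labels))  # two disease votes vs. one non-disease
print(guess_label("G5", neighbors, labels))  # one vote each, so no prediction
```

A tie or a node with no labeled neighbors is left as unknown rather than guessed, which matches how the gray nodes are treated in the graphs.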

There were two force-directed layouts I was experimenting with, and each uses a different algorithm. The force-directed (spring layout) graph that is shown here is also known as Fruchterman-Reingold. It treats the edges like springs and the nodes like charged particles, so there are both attractive and repulsive forces. The attractive force comes into play when nodes are connected by an edge, pulling connected nodes closer together. In contrast, there is a repulsive force between every pair of nodes, pushing all nodes away from each other. The process is iterative: it begins with random node placement, then slightly adjusts the position of each node based on these forces, and repeats this over the whole graph a number of times. Ultimately the result is a clear graph, requiring relatively low computational power, with connected nodes closer together and disconnected nodes further apart.
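To make the iteration concrete, here is a simplified, pure-Python sketch of the Fruchterman-Reingold idea. It is not NetworkX's implementation. The function name `spring_layout_sketch`, the starting temperature of 0.1, and the linear cooling schedule are my own choices for illustration.

```python
import math
import random

def spring_layout_sketch(nodes, edges, iterations=50, seed=0):
    """Simplified Fruchterman-Reingold layout; `nodes` must be a list."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}  # random start
    k = math.sqrt(1.0 / len(nodes))  # ideal distance between nodes
    temperature = 0.1                # caps how far a node moves per step
    cooling = temperature / (iterations + 1)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction: each edge pulls its endpoints together with force d^2 / k
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node along its net force, limited by the temperature
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature -= cooling  # smaller moves as the layout settles
    return pos

# Hypothetical toy graph: two connected pairs with no edge between them
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "d")]
pos = spring_layout_sketch(nodes, edges, seed=1)
```

The pairwise repulsion loop is the expensive part, since it visits every pair of nodes on every iteration. In practice I just call `nx.spring_layout(G, seed=...)`, which does this work for me.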

The second layout is called the Kamada-Kawai layout. This layout is also force-directed and, similar to the spring layout, it uses the idea of springs between nodes. However, instead of only pulling connected nodes together, the Kamada-Kawai layout computes the shortest-path distance between every pair of nodes and then positions the nodes to minimize the total energy of springs whose ideal lengths match those distances. That makes it much more computationally intensive and impractical to use on larger graphs (which is why it is not in the pictures above).

Moving forward, I will most likely be using the spring layout, since it is informative and scales to larger graphs. Other than the visualization work, I played around with the data a bit more, making plots and charts. Next week I will continue with the visualization and hopefully get to some modeling.

See you next week!
