Visualizing correlations with graphs

Yesterday I found a statistics textbook for geologists [1] for $1 at a library book sale. When I thumbed through the book, an image similar to the one below caught my eye.

This image approximates Figure 15.2 in [1].

The nodes represent six factors of the thickness of rock formations and the edges are labeled with the correlations between factors. Only large correlations are shown. For example, in theory everything is correlated with “total” but carbonates are not significantly correlated with the total. Nonclastics divide into evaporites and carbonates; apparently nearly all the nonclastics in this data set were evaporites.

Notice that this example illustrates that correlation is not transitive. That is, if A is correlated with B and B is correlated with C, it does not follow that A is necessarily correlated with C.
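In the graph, sand is correlated with both total and carbonates, yet carbonates and total share no edge. Here is a minimal sketch of the same phenomenon in Python. The data are hypothetical, chosen by hand so that the numbers come out cleanly; they are not taken from the book. The helper `pearson` is written out only to keep the example self-contained.

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a and c are uncorrelated by construction, and b = a + c.
a = [1, -1, 1, -1]
c = [1, 1, -1, -1]
b = [ai + ci for ai, ci in zip(a, c)]

print(pearson(a, b))  # about 0.71
print(pearson(b, c))  # about 0.71
print(pearson(a, c))  # 0.0
```

Here a is correlated with b and b is correlated with c, but a and c are not correlated at all.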

Making the graph

I made the graph above with GraphViz using the following code.

    graph G {
    T [label="Total"      , pos="2.50, 5.00!"]
    S [label="Sand"       , pos="4.66, 3.75!"]
    C [label="Carbonates" , pos="4.66, 1.25!"]
    E [label="Evaporites" , pos="2.50, 0.00!"]
    N [label="Nonclastics", pos="0.39, 1.25!"]
    H [label="Shale"      , pos="0.39, 3.75!"]
    T -- S [label=" 0.24 "]
    T -- H [label=" 0.89 "]
    T -- N [label=" 0.84 "]
    T -- E [label=" 0.82 "]
    H -- N [label=" 0.69 "]
    H -- E [label=" 0.70 "]
    S -- C [label=" 0.45 "]
    N -- E [label=" 0.99 "]
    }

I’ve mostly used GraphViz to make graphs when I didn’t care much about the layout. I’ve experimented with a few layout engines, but I hadn’t tried specifying the node positions before.

The nodes in the original graph were arranged in a circle, so I tried the circo layout engine. This did not position the nodes in a circle. I also tried specifying the positions without the bang on the end, giving the positions as layout hints. GraphViz did not appreciate my suggestions and was certain that it knew better how to lay out the graph. But when I added the exclamation marks GraphViz acquiesced to my wishes.
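The pinned positions can also be computed rather than typed by hand. The following is a rough sketch, not necessarily how the coordinates above were obtained: it assumes six points evenly spaced on a circle of radius 2.5, starting at the top and going clockwise. The function names `circle_positions` and `node_lines` are made up for this illustration. The computed values are close to, but not identical to, the hand-entered ones (for example 4.67 rather than 4.66, and 0.33 rather than 0.39).

```python
from math import cos, sin, radians

def circle_positions(names, radius=2.5):
    """Return {name: (x, y)} for points evenly spaced on a circle.

    The first name goes at the top and the rest follow clockwise.
    The circle is centered at (radius, radius), so no coordinate is negative
    apart from floating point rounding.
    """
    n = len(names)
    positions = {}
    for i, name in enumerate(names):
        angle = radians(90 - 360 * i / n)  # start at 12 o'clock, go clockwise
        positions[name] = (radius + radius * cos(angle),
                           radius + radius * sin(angle))
    return positions

def node_lines(labels, radius=2.5):
    """Format DOT node statements with pinned positions (trailing '!')."""
    pos = circle_positions(list(labels), radius)
    return [
        f'{name} [label="{label}", pos="{pos[name][0]:.2f}, {pos[name][1]:.2f}!"]'
        for name, label in labels.items()
    ]

# Node identifiers and labels copied from the DOT listing above.
labels = {"T": "Total", "S": "Sand", "C": "Carbonates",
          "E": "Evaporites", "N": "Nonclastics", "H": "Shale"}

for line in node_lines(labels):
    print(line)
```

The printed lines can be pasted into the graph definition in place of the node statements. As far as I know, the `pos` attribute with a trailing `!` is honored by the neato and fdp layout engines, so a command along the lines of `neato -Tsvg graph.gv -o graph.svg` (the file names here are placeholders) should render the result.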

GraphViz will create output in a variety of formats. I tried PNG and SVG. The SVG image above was 11 times smaller than the PNG output. One reason I started using SVG images more often is that they often result in smaller files. They also look very nice at multiple resolutions, i.e. on a desktop and on a mobile device.

[1] Krumbein and Graybill. An Introduction to Statistical Models in Geology. McGraw-Hill, 1965.
