Analysing Network Visualization Statistics

As mentioned in a previous post, there are many statistics that can be derived from the network visualizations that I have been generating from the course data I have been collecting. At the moment, these are the particular numbers that I have been paying attention to:

  • Mean Degree of Nodes – The mean number of connections per node in the graph.
  • Mean Weighted Degree of Nodes – The mean total weight of the connections attached to each node in the graph.
  • Graph Density – The ratio of the number of edges in the graph to the number of possible edges.
  • Modularity – A measure of the strength of division of a network into modules. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.
  • Mean Clustering Coefficient – The degree to which nodes in the graph tend to cluster together.

So, in terms of applying these to the networks generated from the awards data (a rough sketch of how they can be computed follows the list):

  • Mean Degree of Nodes – The mean number of connections for each award, i.e. the mean number of other awards that each award is connected to.
  • Mean Weighted Degree of Nodes – The mean weight of connections for each award, i.e. the mean number of modules that each award shares with other awards.
  • Graph Density – The number of connections between awards compared to the number of connections that would be possible given the total number of awards in the network (this measure is affected more than the others by an increase in the number of awards offered).
  • Modularity – A higher modularity suggests that awards are very highly connected to specific other awards, but have very few ‘odd’ connections to other awards in the network. A very high modularity would suggest that a group of awards shared a lot of modules between themselves.
  • Mean Clustering Coefficient – A low coefficient would suggest that awards did not group together, and therefore did not share modules between them. A high coefficient would suggest that most of the awards in the network formed clusters with other awards.
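
As a rough illustration (a sketch, not the actual pipeline used for this project), these statistics could be computed with Python’s networkx library, treating each award as a node and weighting each edge by the number of modules the two awards share. The award names and weights below are made up purely for the example.

```python
import networkx as nx
from networkx.algorithms import community

# A tiny, made-up awards network: nodes are awards,
# edge weights count the modules shared by a pair of awards.
G = nx.Graph()
G.add_weighted_edges_from([
    ("BSc Computing", "BSc Games Computing", 6),   # hypothetical: 6 shared modules
    ("BSc Computing", "BSc Web Technology", 4),
    ("BSc Games Computing", "BSc Web Technology", 2),
    ("BA Fine Art", "BA Graphic Design", 3),
])

n = G.number_of_nodes()
mean_degree = sum(d for _, d in G.degree()) / n
mean_weighted_degree = sum(d for _, d in G.degree(weight="weight")) / n
density = nx.density(G)
mean_clustering = nx.average_clustering(G)

# Modularity is measured against a partition of the graph into communities;
# a simple greedy partition is used here.
partition = community.greedy_modularity_communities(G, weight="weight")
modularity = community.modularity(G, partition, weight="weight")

print(mean_degree, mean_weighted_degree, density, mean_clustering, modularity)
```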

The numbers generated for the weighted connections between awards for the academic years 2006/07 through to 2012/13 are as follows:

| Academic Year | Mean Degree | Mean Weighted Degree | Graph Density | Modularity | Mean Clustering Coefficient |
|---------------|-------------|----------------------|---------------|------------|-----------------------------|
| 2006 – 2007   | 0.804       | 1.821                | 0.069         | 0.657      | 0.357                       |
| 2007 – 2008   | 0.763       | 1.711                | 0.041         | 0.726      | 0.408                       |
| 2008 – 2009   | 0.500       | 1.324                | 0.030         | 0.588      | 0.224                       |
| 2009 – 2010   | 0.405       | 1.432                | 0.023         | 0.574      | 0.124                       |
| 2010 – 2011   | 0.720       | 1.880                | 0.029         | 0.777      | 0.212                       |
| 2011 – 2012   | 0.716       | 2.486                | 0.020         | 0.810      | 0.259                       |
| 2012 – 2013   | 0.651       | 4.349                | 0.021         | 0.847      | 0.267                       |

So what do these numbers show and are they actually useful? Well….

Mean degree shows the number of awards that each award is connected to, on average. If we look at mean weighted degree instead, we also take into consideration the weight of the connection between a pair of nodes, i.e. the number of joins between them, rather than just the fact that a join exists. Plotting this graphically helps to show the pattern that emerges.


Mean weighted degree of awards, 2006/07 - 2012/13
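
As a minimal sketch (using matplotlib, which is not necessarily the tool that produced the chart above), the mean weighted degree figures from the table could be plotted as follows:

```python
import matplotlib.pyplot as plt

# Mean weighted degree values taken directly from the table above.
years = ["06/07", "07/08", "08/09", "09/10", "10/11", "11/12", "12/13"]
mwd = [1.821, 1.711, 1.324, 1.432, 1.880, 2.486, 4.349]

plt.plot(years, mwd, marker="o")
plt.xlabel("Academic year")
plt.ylabel("Mean weighted degree")
plt.title("Mean weighted degree of awards, 2006/07 - 2012/13")
plt.show()
```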

From the graph above it becomes clear that there is a definite drop in MWD (mean weighted degree) from the academic year 07/08 to the year 08/09 (around 22%), showing that the average number of links between awards dropped fairly considerably. Looking back at the university’s history, this can be explained: this was the point at which the number of points per module of study was altered, meaning that, essentially, multiple versions of the same award were running in tandem, some with the old module weighting and some with the new. This also explains the steady increase in MWD up to 11-12, which is the first year in which awards under the old weighting would not have been running at all. From the highest point under the old weighting to this point under the new weighting, there is an increase of over 36% in the number of joins between awards offered at the university. This shows (assuming that increased modularity, in the sense of shared modules, is a good thing in terms of curriculum design) that the provision has been improved through the alteration of module weightings. Taking into account the overall increase in the number of awards offered, this also shows that the restructuring of the modules had a significant impact on the sharing of teaching and assessment across different awards.
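
For reference, the two percentage changes quoted above can be reproduced directly from the table; a quick check, taking 2006/07 as the highest point under the old weighting:

```python
mwd_07_08, mwd_08_09 = 1.711, 1.324
mwd_11_12 = 2.486
highest_old = 1.821  # 2006/07, taken as the highest point under the old weighting

drop = (mwd_07_08 - mwd_08_09) / mwd_07_08 * 100      # ~22.6% fall from 07/08 to 08/09
rise = (mwd_11_12 - highest_old) / highest_old * 100  # ~36.5% rise by 11/12
print(f"Drop: {drop:.1f}%, rise: {rise:.1f}%")
```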

The number given for the ‘modularity’ of the graphs shows a couple of interesting things.

Modularity values for awards, 2006/07 - 2012/13

As noted above, the modularity shows how well the nodes on the graph (i.e. the awards) form into self-contained clusters. A value of 1 would suggest that the awards form perfectly into self-contained clusters, having lots of connections between themselves but no connections with other clusters; a value of 0 would suggest the opposite. As you can see from the graph above, in 06/07 the modularity was reasonably high, quite possibly due to the smaller number of awards offered at the university. This figure rises over the next year, and then drops for two consecutive years as the weighting of modules at the university goes through a period of change. As the change is fully implemented, the modularity rises significantly and continues to rise at an almost constant rate from 2010-11 through to 2012-13. This would suggest (though it is not necessarily the case) that, either by design or good fortune, the awards offered at the university are starting to form into self-contained groups or areas of specialism. This is interesting to note, as the university has recently gone through an organizational restructuring whereby three colleges were formed: could these clusters be contained within the colleges?
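
To start answering that question, one could list which awards the community detection places in each cluster and compare those lists against the colleges’ portfolios. A minimal sketch, reusing the made-up awards graph from the earlier example:

```python
import networkx as nx
from networkx.algorithms import community

# The same made-up awards graph as in the earlier sketch.
G = nx.Graph()
G.add_weighted_edges_from([
    ("BSc Computing", "BSc Games Computing", 6),
    ("BSc Computing", "BSc Web Technology", 4),
    ("BSc Games Computing", "BSc Web Technology", 2),
    ("BA Fine Art", "BA Graphic Design", 3),
])

# List which awards fall into which detected cluster.
clusters = community.greedy_modularity_communities(G, weight="weight")
for i, awards in enumerate(clusters, start=1):
    print(f"Cluster {i}: {sorted(awards)}")

# Comparing each cluster's membership with the awards offered by each of the
# three colleges would show whether the clusters sit within college boundaries.
```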

Though this has only looked at two of the series of numbers generated for each of these visualizations, it does show that visualizing course data produces extra information that cannot easily be gathered when the data is in its raw form. Further to this, it also shows that this data accurately reflects historical changes in provision within the university. If these principles can be applied retrospectively to show changes, in what ways can they be applied to decision-making processes, to help assess the impact of potential changes?

Data Visualization (1)

As a large part of this project is based around the presentation and visualization of the data, I’ve started reading around the ‘art’ of visualizing information in order to ground my choices in accepted theories and principles. As I’ll be consulting a number of sources, I’ll be blogging about data visualization over multiple posts. This first post is based on my reading of Edward Tufte’s ‘Envisioning Information’.

The book itself covers a wide range of areas and gives a lot of examples of how the principles it discusses have (or haven’t) been applied over the course of history, from Galileo to the present day. If you’re interested in the visualization of information, then the book is definitely worth reading, but I have taken a few quotes from it that strike me as being particularly pertinent to my work on this project.

A grave sin of information design – Pridefully Obvious Presentation. Presenting in such a way that the focus is on the method of presentation, rather than the information being presented.

These two sentences struck me as being very important when it comes to presenting information, especially in software / web design. When using new and interesting libraries and code bases, it becomes very easy to get trapped in the ‘excitement’ of all the new ways you *could* present the information. If the purpose of the project is, as in this case, to present information in such a way that it is useful and informative to the user of the system or service, then the focus has to be on the best way to present the information to the user, not the best way to show off how you *could* present the information to the user.

…promoters imagine that numbers and details are boring, dull and tedious, requiring ornament to enliven. Cosmetic decoration, which frequently distorts the data, will never salvage an underlying lack of data. If the numbers are boring, you’ve got the wrong numbers.

It is often tempting to ‘decorate’ a presentation of information in order to make the visualization more visually appealing. Here Tufte makes the point that cosmetically decorating a visualization can often lead to a distortion of the data. If there is not enough meaningful data there, then making the visualization look pretty will do nothing to increase how useful it is. Further to this, if the embellishment is being done because the data itself is boring, then what is the point in visualizing it? The data being presented must, if it is to be useful, be interesting and relevant to the ‘users’ (for want of a better word) who will be viewing the visualization.

Worse is contempt for our audience, designing as if readers were obtuse and uncaring. In fact, consumers of graphics are often more intelligent about the information at hand than those who fabricate the data decoration.

…no matter what, the operating moral premise of information design should be that our readers are alert and caring.  They may be busy, eager to get on with it, but they are not stupid.

Visualizations of data sets should be done with a target audience in mind. If someone is interested in the data, then the chances are that they have a reasonably sound base of knowledge surrounding the concepts the data deals with. As such, they shouldn’t be treated as though they are two years old and need even the most basic of concepts explained to them.

What E. B. White said of writing is equally true for information design: “No one can write decently who is distrustful of the reader’s intelligence, or whose attitude is patronizing”

Again, this relates back to understanding your target audience. If you don’t understand the audience, you can’t target the visualizations at them and present the data in a useful and meaningful way.

Display of closely-read data surely requires the skilled craft of good graphic and poster design: typography, object representation, layout, color, production techniques and visual principles that inform criticism and revision. Too often those skills are accompanied by the ideology of chartjunk and data posterization; excellence in presenting information requires mastering the craft and spurning the ideology.

This point refers back to one made earlier regarding the over-embellishment of visualizations. There are many areas within information visualization that require the graphic/design elements to be considered, such as layout, colour choices or how the data is represented. This is (more or less) where the graphic design process should stop; there is no need to embellish the presentation of the visualization itself. Attaching superfluous images around the periphery does nothing to improve the visualization of information and only serves to distract the viewer.

To clarify, add detail.

This point is really simple and yet very important and useful: if a point needs clarifying, add more detail to the visualization. Simple.

Visual displays rich with data are not only an appropriate and proper complement to human capabilities, but also such designs are frequently optimal. If the visual task is contrast, comparison and choice – as so often it is, then the more relevant information within eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, requires the viewers to rely on visual memory – a weak skill, to make a contrast, a comparison, a choice.

The human brain can process quite a lot of information when shown it, but visual memory is generally weak; this should be remembered when deciding how to present information to the user – show them what you can on one screen, don’t make them remember details as they move from screen to screen to screen – they’ll forget!

These are just a few of the many points made in Tufte’s book, but they all apply in one way or another to this particular project. As I read more around data visualization I’ll be writing more blog posts on the subject.