Data Visualization (2)

This blog post is my second on ‘Data Visualization’ and is based on quotes and notes that I’ve made around Edward Tufte’s ‘The Visual Display of Quantitative Information’. The following quote from the book is an excellent way of describing the simplicity of data graphics (or visualizations) whilst recognising how useful they can be.

Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers – even a very large set – is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful.

In his book, Tufte dedicates individual chapters to certain aspects of graphics:

Graphical Excellence

Tufte discusses achieving ‘graphical excellence’ by adhering to, or adopting, a series of key tenets. Obviously some of these tenets are more easily achieved than others, but by aiming to adhere to these principles, hopefully the use of data graphics in this project will be a good example of creating quality visualizations.

Firstly, data graphics should:

  • Show the data
  • Induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production or something else
  • Avoid distorting what the data have to say
  • Present many numbers in a small space
  • Make large datasets coherent
  • Encourage the eye to compare different pieces of data
  • Reveal the data at several levels of detail, from a broad overview to a fine structure
  • Serve a reasonably clear purpose: description, exploration, tabulation or decoration
  • Be closely integrated with the statistical and verbal descriptions of a data set

These principles lay a simple groundwork for constructing data graphics: show the data, don’t twist what the data shows, make it easy to look at and don’t let fancy technology and pretty baubles get in the way of the true purpose of the graphic: to convey information to the viewer.

Tufte goes on to describe ‘graphical excellence’ as being: the well-designed presentation of interesting data; complex ideas communicated with clarity, precision and efficiency; giving the viewer the greatest number of ideas in the shortest time, with the least ink, in the smallest space; and being multivariate – showing more than one variable.

In terms of this project, the relevance of graphical excellence is simple. The process of selecting and applying to university is a complex one, so we should be presenting information with clarity and efficiency. We shouldn’t be making applicants remember details for a long time while they try to search for information, and we should be showing them as much information as they require, in as short a time as is reasonable and effective. As mentioned in my previous blog post, the human brain is far better at taking in a lot of information at once than it is at remembering multiple pieces of information over a longer period.

Graphical Integrity

Obviously we have to be truthful and not misleading when representing this data. Tufte lays out a few simple steps to ensure this integrity:

The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.

Basically, don’t distort the numbers being shown by playing around with how they’re visualized. If value B is twice that of value A, the graphical representation of B should be twice the size of the representation of A.
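Tufte puts a number on this idea with what he calls the ‘Lie Factor’: the size of the effect shown in the graphic divided by the size of the effect in the data. Below is a minimal Python sketch of that ratio, assuming the ‘size’ of each mark (a bar height in pixels, say) has already been measured. The function names and the example figures are my own and purely illustrative.

```python
def effect_size(first, second):
    """Relative change from first to second, e.g. 10 -> 20 gives 1.0 (a 100% rise)."""
    return abs(second - first) / abs(first)


def lie_factor(data_a, data_b, graphic_a, graphic_b):
    """Size of effect shown in the graphic divided by size of effect in the data.

    A result close to 1 means the graphic is proportional to the numbers.
    """
    return effect_size(graphic_a, graphic_b) / effect_size(data_a, data_b)


# Hypothetical figures: value B is twice value A.
honest = lie_factor(10, 20, graphic_a=50, graphic_b=100)     # B drawn twice as tall
distorted = lie_factor(10, 20, graphic_a=50, graphic_b=200)  # B drawn four times as tall
print(honest, distorted)
```

The further the result drifts from 1, the more the graphic exaggerates (or understates) what the numbers actually say.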

Clear, detailed, and thorough labelling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.

Where appropriate, particularly in this project, it may be necessary to label and/or provide a textual description of some of the data being presented in order to avoid ambiguity. This will be particularly important in datasets such as the KIS, where the source of the data, or how a particular figure has been derived, can differ from one instance to another.

Show data variation, not design variation.

When presenting various datasets that are similar, the design should remain consistent so that any variation in the view shown to the viewer is due to variations in the data being presented, rather than the method of presentation.
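One practical way of keeping the design constant is to give every chart in a set the same axis range, so that a taller bar always means a bigger number. A rough Python sketch of the idea follows; the helper name, the award names and the scores are all made up for illustration and don’t come from any real dataset.

```python
def shared_axis_limits(datasets, padding=0.05):
    """Return one (low, high) range that covers every series, plus a small margin."""
    values = [value for series in datasets.values() for value in series]
    low, high = min(values), max(values)
    margin = (high - low) * padding
    return low - margin, high + margin


# Hypothetical scores (%) for three awards; each would be drawn as its own chart.
scores = {
    "Award A": [78, 82, 85],
    "Award B": [60, 64, 71],
    "Award C": [88, 90, 93],
}

# With no padding the shared range is simply the overall minimum and maximum.
limits = shared_axis_limits(scores, padding=0.0)
print(limits)
```

Every chart would then be drawn using `limits` for its value axis, rather than letting each chart pick a range to suit its own data.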

‘Data Ink’ and Graphical Redesign

Tufte outlines five core principles, in terms of ‘data ink’ – the amount of ink (or pixels) used to visualize data.

  • Above all else show the data
  • Maximise the data-ink ratio
  • Erase non-data-ink
  • Erase redundant data-ink
  • Revise and edit

The first point is self-explanatory: the purpose of the graphic is to show the data. The data-ink ratio is the proportion of a graphic’s ink (or pixels) that is used to present the data itself, so the aim is to show as much data as possible using as small an amount of ink as possible. Where possible, the amount of ink/pixels used for non-data should also be reduced, i.e. pointless gridlines, needless embellishments etc. Further to this, one should erase data-ink that is redundant and serves no real purpose, and revise and edit the graphic in order to achieve a more optimal use of data-ink.
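Tufte defines the data-ink ratio as the data-ink divided by the total ink used to print the graphic. A small Python sketch of that calculation is below. It assumes that the pixels belonging to data marks have somehow been counted separately from everything else, and the pixel counts are invented for the sake of the example.

```python
def data_ink_ratio(data_ink, total_ink):
    """Proportion of a graphic's ink (or pixels) that presents data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must be between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical pixel counts for the same chart before and after tidying up.
before = data_ink_ratio(data_ink=1200, total_ink=4800)  # heavy gridlines and borders
after = data_ink_ratio(data_ink=1200, total_ink=1600)   # gridlines and borders removed
print(before, after)
```

The data hasn’t changed between the two versions; only the non-data ink has been erased, which is what pushes the ratio up.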

In terms of how applicable these concepts are to this project, I think there has to be a happy medium. Whilst removing surplus ‘ink’ is useful for preventing the over-embellishment of data visualizations, being overzealous could result in a similar problem: making the graphic harder to read because there’s almost nothing there to look at.

Chartjunk

This term is exactly what it sounds like: filling a graphic with junk for the sake of doing so. A quote from Tufte’s book says it all:

Occasionally designers seem to seek credit merely for possessing a new technology, rather than using it to make better designs. … at least a few computer graphics only evoke the response ‘Isn’t it remarkable that my computer can be programmed to draw like that?’ instead of ‘My, what interesting data.’

 

These notes form just a tiny portion of what Tufte discusses in this particular book, but they are the points that are most applicable to this particular project. As I bring this post to a conclusion, I leave you with one last quote:

What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult, that is – the revelation of the complex.

Data Visualization (1)

As a vast part of this project is based around the presentation and visualization of the data, I’ve started reading around the ‘art’ of visualizing information in order to ground my choices in accepted theories and principles. As I’ll be consulting a number of sources, I’ll be blogging about data visualization over multiple posts. This first post is based on my reading of Edward Tufte’s ‘Envisioning Information’.

The book itself covers a wide range of areas and gives a lot of examples of how the principles it discusses have (or haven’t) been applied over the course of history, from Galileo to the current day. If you’re interested in the visualization of information, then the book is definitely worth reading, but I have taken a few quotes from the book that strike me as being particularly pertinent when it comes to my work on this particular project.

A grave sin of information design – Pridefully Obvious Presentation. Presenting in such a way that the focus is on the method of presentation, rather than the information being presented.

These two sentences struck me as being very important when it comes to presenting information, especially in software/web design. When using new and interesting libraries and code bases, it becomes very easy to get trapped in the ‘excitement’ of all the new ways you *could* present the information. If the purpose of the project is, as in this case, to present information in such a way that it is useful and informative to the user of the system or service, then the focus has to be on the best way to present the information to the user, not the best way to show off how you *could* present the information to the user.

…promoters imagine that numbers and details are boring, dull and tedious, requiring ornament to enliven. Cosmetic decoration, which frequently distorts the data, will never salvage an underlying lack of data. If the numbers are boring, you’ve got the wrong numbers.

It is often tempting to ‘decorate’ a presentation of information, in order to increase how visually appealing the visualization is. Here Tufte makes a point that cosmetically decorating a visualization can often lead to a distortion of the data. If there’s not enough meaningful data there, then making the visualization look pretty will do nothing to increase how useful the visualization is. Further to this, if the embellishing is being done because the data itself is boring, then what is the point in visualizing it? The data being presented must, by definition of being useful, be interesting and of relevance to the ‘users’ (for want of a better word) that will be viewing the visualization.

Worse is contempt for our audience, designing as if readers were obtuse and uncaring. In fact, consumers of graphics are often more intelligent about the information at hand than those who fabricate the data decoration.

…no matter what, the operating moral premise of information design should be that our readers are alert and caring.  They may be busy, eager to get on with it, but they are not stupid.

Visualizations of data sets should be done with a target audience in mind. If someone is interested in the data, then the chances are that they have a reasonably sound base of knowledge surrounding the concept the data deals with. As such, they shouldn’t be treated as though they are 2 years old and need even the most basic of concepts explaining to them.

What E.B. White said of writing is equally true for information design: “No one can write decently who is distrustful of the reader’s intelligence, or whose attitude is patronizing.”

Again, this relates back to understanding your target audience. If you don’t understand the audience, you can’t target the visualizations at them and present the data in a useful and meaningful way.

Display of closely-read data surely requires the skilled craft of good graphic and poster design: typography, object representation, layout, color, production techniques and visual principles that inform criticism and revision. Too often those skills are accompanied by the ideology of chartjunk and data posterization; excellence in presenting information requires mastering the craft and spurning the ideology.

This point refers back to one made earlier regarding the over-embellishment of visualizations. There are many areas within information visualization that require the graphic/design elements to be considered, such as layout, colour choices or how the data is represented. This is (more or less) where the graphic design process should stop. There’s no need to embellish the presentation of the visualization itself; attaching superfluous images around the periphery does nothing to improve the visualization of information and only serves to distract the viewer.

To clarify, add detail.

This point is really simple and yet very important and useful: if a point needs clarifying, add more detail to the visualization. Simple.

Visual displays rich with data are not only an appropriate and proper complement to human capabilities, but also such designs are frequently optimal. If the visual task is contrast, comparison and choice – as so often it is, then the more relevant information within eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, requires the viewers to rely on visual memory – a weak skill, to make a contrast, a comparison, a choice.

The human brain can process quite a lot of information when shown it, but visual memory is generally weak; this should be remembered when deciding how to present information to the user – show them what you can on one screen, don’t make them remember details as they move from screen to screen to screen – they’ll forget!

These are just a few of the many points made in Tufte’s book, but they all apply in one way or another to this particular project. As I read more around data visualization I’ll be writing more blog posts on the subject.

Creating *Valid* Dummy Data

As this project is ‘piggybacking’ off another project being carried out in the university, we will be taking most of our data from another system that is in the process of being implemented. As such, it has been necessary to create some dummy datasets so that I can continue with this project while the other system is being implemented. Using data that has been taken from various sources, I have been able to create dummy data that resembles the data that will be made available through KIS and the XCRI-CAP feed.

At this stage, the data that I am dealing with focuses on five awards offered at the university, rather than trying to encompass every award currently offered. To try to get a grasp of how this information is presented in different departments and areas of the university, I have selected a range of awards, rather than focusing on a specific school or college within the university. Hopefully this will mean that there won’t be too many surprises when I can get all of the ‘real’ data.

As part of the process of validating the data that I have collected, I’ve been using Craig Hawker’s XCRI-CAP 1.2 validator, which has proved invaluable in ensuring that the ‘test’ XCRI feed that I’ve been working with is actually valid, again reducing the number of surprises I should get when I can use the full XCRI feed being made available by the university.
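Alongside a proper validator, a quick sanity check can catch obvious slips in hand-made dummy data before it goes anywhere near validation. The sketch below is only that: a rough check using Python’s standard library, not a substitute for schema validation. The namespace URIs and element names reflect my understanding of XCRI-CAP 1.2 and should be treated as assumptions, and the feed itself is a made-up fragment rather than real university data.

```python
import xml.etree.ElementTree as ET

# Namespaces as I understand them from XCRI-CAP 1.2; treat these as assumptions.
NS = {
    "xcri": "http://xcri.org/profiles/1.2/catalog",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A tiny hand-made dummy feed. The second course deliberately has no title.
DUMMY_FEED = """<catalog xmlns="http://xcri.org/profiles/1.2/catalog"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <provider>
    <dc:title>Example University</dc:title>
    <course>
      <dc:title>BSc (Hons) Example Award</dc:title>
      <presentation/>
    </course>
    <course>
      <presentation/>
    </course>
  </provider>
</catalog>"""


def check_feed(xml_text):
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is not well-formed
    if root.tag != "{%s}catalog" % NS["xcri"]:
        problems.append("root element is not an XCRI catalog")
    courses = root.findall("xcri:provider/xcri:course", NS)
    if not courses:
        problems.append("no courses found")
    for index, course in enumerate(courses, start=1):
        if course.find("dc:title", NS) is None:
            problems.append("course %d has no dc:title" % index)
        if course.find("xcri:presentation", NS) is None:
            problems.append("course %d has no presentation" % index)
    return problems


problems = check_feed(DUMMY_FEED)
print(problems)
```

A check like this only confirms that the pieces I expect are present; whether the feed is actually valid XCRI-CAP is still a question for the validator.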

On top of the data being made available through the KIS and XCRI feed, the implementation of a new ‘Academic Programme Management System’ means that I should be able to easily get data regarding the modules/units that make up each of the courses offered by the university. This, in combination with the data available in the KIS and XCRI, should be more than enough data to produce services that are useful to students and present the information in meaningful ways.

Next step, the APIs to get at the data and documentation!!!!!!!

Key Information Sets (KIS) – A Summary!

One of the collections of data that is going to become available during the course of this project is the Key Information Sets (KIS) data. To quote HEFCE:

Key Information Sets are comparable sets of standardised information about undergraduate courses. They are designed to meet the information needs of prospective students and will be published ‘in context’ on the web-sites of universities and colleges.

As part of our wider work with partner organisations in response to the increasing importance of information about higher education, universities and colleges will be expected to publish these information sets on their web-sites from September 2012.

Students will be able to access facts and figures about undergraduate courses that will be drawn from a range of sources, including the NSS, DLHE and institutions. Some of these facts and figures will centre on these topics: student satisfaction, course information, salary figures for graduates, cost of accommodation, fees etc. With such a wide range of data being provided by such a wide range of institutions, there really is going to be a lot of data for developers to utilise to provide services.

A very quick summary of what data is used by the KIS, and where it is taken from, can be downloaded from the HEFCE site, along with a mock-up of how the KIS data may be presented (in the form of a widget) on institutions’ websites. For those of you who want to read about the KIS in detail, are gluttons for punishment or need to read it, there’s a handy 108-page document which can also be downloaded. As part of the process of getting to grips with exactly what data will be available from the KIS, I’ve created a handy summary, which condenses those 108 pages into a spreadsheet that is just a couple of pages long and is available as a Google doc here. Some of the Google doc relates specifically to the University of Lincoln, but the main summary of the KIS data should be useful to anybody in a similar situation to myself. Obviously the summary is not intended to be a complete view of the KIS, so please don’t regard it as ‘The Definitive Guide to KIS’, but rather as a snapshot of just what might be available.

In terms of when this data *should* be submitted by institutions and when it *might* be available, I’ve taken the following table from HESA:

Date | Action
June 2011 | Provision of information about higher education: Outcomes of consultation and next steps, published by HEFCE, UUK and GuildHE (HEFCE 2011/18).
September 2011 | Publication of technical guidance available on HESA web-site.
December 2011 | Publication of updated technical guidance available on HESA web-site.
22 February/1 March 2012 | HESA training seminars (see HESA training).
February/March 2012 | Submission system opens and validation kits issued for KISs to be published in September 2012. Institutions will have until August 2012 to finalise their data. The system will remain open throughout this period to allow institutions to manage their workloads and undertake quality assurance work.
March 2012 | Publication of final update to technical guidance available on HESA web-site.
July 2012 | NSS 2012 and DLHE 2012 (C10018) data added by HEFCE to KIS submissions.
August 2012 | Final deadline for submission of KIS data and sign off by head of institution. Institutions will be able to preview their full KIS data at this point.
September 2012 | Institutions preview new official web-site with KIS and associated widgets.
 | New official web-site goes live; Unistats web-site closes.
 | KIS widgets to be embedded and visible on all institutional web-sites.
 | UCAS to include KIS data into course search web-site.
October 2012 | KIS system re-opens for updates to September 2012 KIS, where necessary, for example to confirm fee information. Updates will be routinely published via the KIS widgets and official web-site.

 

Well, I hope that this summary of KIS is helpful! I’ll be posting something about XCRI-CAP soon.