How Not to Visualize Course Data

This blog post will probably be the first of several, detailing what hasn’t worked in terms of mixing various datasets and the resulting visualisations. We learn from our (and other people’s) mistakes, as well as our successes, after all!

As mentioned in a previous blog post, one of the other datasets that the University of Lincoln makes available through data.lincoln is space data – relating to rooms, buildings and campuses. As I have been able to determine which courses are offered by which departments, I decided to see how a visualisation of the modules shared between departments would work when overlaid on a map of the campus. Frankly, it didn’t work very well.

Map showing sharing of modules across campus

Whilst it is useful to know which departments share modules, which may indicate the teaching of interdisciplinary courses, representing this data on a map doesn’t work very well. The first problem was where to situate each department. Each department has a building that it is based in, but the teaching of its courses may spread across different buildings, and even different campuses. To simplify things, I placed each department in the building in which it is primarily based. The weight of each line then represents the number of courses and/or modules shared between the two departments it connects.
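The line weights described above could be derived along these lines. This is a minimal Python sketch, assuming that a module counts as shared between two departments when awards run by both departments include it; the award titles, module codes and the award-to-department mapping are all hypothetical stand-ins for the real course data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (award, module) relationships
award_modules = [
    ("BSc Computer Science", "CMP1001"),
    ("BSc Games Computing", "CMP1001"),
    ("BA Journalism", "JOU1001"),
    ("BA Journalism and Media", "JOU1001"),
    ("BA Journalism and Media", "MED1001"),
]

# Hypothetical mapping from each award to the department that runs it
award_department = {
    "BSc Computer Science": "Computer Science",
    "BSc Games Computing": "Computer Science",
    "BA Journalism": "Journalism",
    "BA Journalism and Media": "Media",
}

def department_weights(rows, award_dept):
    """Count, for each pair of departments, how many modules their awards share."""
    depts_per_module = defaultdict(set)
    for award, module in rows:
        depts_per_module[module].add(award_dept[award])
    weights = Counter()
    for depts in depts_per_module.values():
        # sorted() gives each unordered department pair one canonical key
        for pair in combinations(sorted(depts), 2):
            weights[pair] += 1
    return weights

print(department_weights(award_modules, award_department))
```

Here CMP1001 only links awards within Computer Science, so it adds no line; JOU1001 links Journalism and Media and becomes a line of weight 1.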

The result? A mess of lines that don’t really show anything meaningful. As staff and students won’t necessarily attend lectures in the building in which the department is based, the map doesn’t show movement across campus or anything tangible. It’s not the buildings that are related, but the courses and modules organised and run by the people who work in the departments that are (more or less) situated, or at least based, within those buildings. Further to this, the buildings on the Brayford campus are very close together, while Riseholme is some distance away. If you zoom out to see where the red line on the left of the image above ends up, you get a single line between the two campuses and a big red blob on the Brayford campus, which makes it even harder to gain any information whatsoever from the visualisation.

So, it is safe to say that, at least in this particular context, mixing course data with space data and overlaying it on a map doesn’t work particularly well. It may be that, when combined with timetable data and done at a more granular level (perhaps just showing the modules within one award), overlaying the information on a map would be more suitable; but in this particular case, the visualisation of the data leaves a lot to be desired.

What to Do with Six Years of Course Data?!?!

After asking colleagues in Planning, I came across some stored reports that contain information about the various awards/courses offered at the university, along with the modules that constitute those awards – from short certificates to full undergraduate and postgraduate degrees. Whilst the reports date back to the 90s, the data within them is substantial enough to be used from 2006-07 onwards; in total this comes to around 50,000 individual award->module relationships spread over the 6 academic years represented in the data.

The first question that arose was: ‘What to do with six years of course data?!?!?!’.

After speaking with Tony Hirst last week, we concluded that, as well as being used to present course information (and related datasets) to current and prospective students, this data could be of real benefit if used in new ways within the university itself. The first way I decided to look at all of this information was to visualise the relationships between the modules and courses offered at the university.

The data shows how different awards share certain modules in common; this can be seen in small-scale examples within the raw data itself, but how would the entire dataset for a year look? To find out, I extracted the pertinent information from everything that was being stored, and eventually narrowed it down to a set of data showing the relationships between modules: pairs of modules offered on the same award. Modules formed the nodes of the graph, and the edges between them represented the awards on which both modules are offered.
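As a rough illustration of that extraction step, the Python sketch below turns (award, module) rows into one edge per pair of modules on the same award and writes them as a CSV edge table. The sample rows and module codes are hypothetical, and the Source / Target / Label header assumes Gephi's spreadsheet (CSV) import, which recognises Source and Target columns for edges; this is not necessarily how the original data was prepared.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Hypothetical (award, module) rows standing in for the Planning reports
rows = [
    ("BSc Computer Science", "CMP1001"),
    ("BSc Computer Science", "CMP1002"),
    ("BSc Games Computing", "CMP1001"),
    ("BSc Games Computing", "CMP1003"),
]

def module_pair_edges(rows):
    """Return one (module_a, module_b, award) edge per pair of modules on the same award."""
    modules_per_award = defaultdict(set)
    for award, module in rows:
        modules_per_award[award].add(module)
    edges = []
    for award, modules in sorted(modules_per_award.items()):
        for a, b in combinations(sorted(modules), 2):
            edges.append((a, b, award))
    return edges

def to_gephi_csv(edges):
    """Serialise edges as CSV with Source/Target/Label headers for a Gephi edge import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Label"])
    writer.writerows(edges)
    return buffer.getvalue()

edges = module_pair_edges(rows)
print(to_gephi_csv(edges))
```

Note that an award with n modules produces n(n-1)/2 edges, so awards with many modules contribute most of the edges in the graph.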

With this dataset prepared, I loaded the data into Gephi, selected an appropriate layout algorithm and let Gephi work its magic. As a result, we get graphs like this: allmodules_11_12. (Each node is a module, each edge is an award on which both of the modules it connects are available, and each edge colour represents a single award.) From these graphs we can see that clusters of courses that share many modules form, mainly around joint degrees (which makes sense!); we can also see that many courses ‘float away’ from these hubs, as they are entirely self-contained and share no modules with any other award offered at the university. The other graphs can be seen here: all modules 06 07, all modules 07 08, all modules 08 09, all modules 09 10 and all modules 10 11.
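The ‘floating’ awards can also be found without drawing anything. A minimal sketch, assuming the same kind of hypothetical (award, module) rows: an award is self-contained if every one of its modules appears on that award alone.

```python
from collections import defaultdict

# Hypothetical (award, module) rows
rows = [
    ("BA Journalism", "JOU1001"),
    ("BA Journalism and Media", "JOU1001"),
    ("BSc Chemistry", "CHE1001"),
    ("BSc Chemistry", "CHE1002"),
]

def self_contained_awards(rows):
    """Return awards none of whose modules appear on any other award."""
    awards_per_module = defaultdict(set)
    modules_per_award = defaultdict(set)
    for award, module in rows:
        awards_per_module[module].add(award)
        modules_per_award[award].add(module)
    return sorted(
        award
        for award, modules in modules_per_award.items()
        if all(awards_per_module[m] == {award} for m in modules)
    )

print(self_contained_awards(rows))  # ['BSc Chemistry']
```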

So, apart from making pretty pictures with course data, what purpose has this served? Firstly, I now know that I can get a vast amount of data covering the past six years of courses and modules offered at the university. Secondly, I now have a better understanding of the inner workings of Gephi, which will no doubt serve me well over the rest of the project. Thirdly, I now know just who to pester in the right departments to get even more data. Finally, we now have A0 printouts of these graphs plastered around the office walls; I certainly didn’t envisage using course data as wallpaper when I started on this project.

Being able to quickly see the connections between modules, particularly where one module is used on multiple awards, could be very useful for those involved in curriculum planning. Obviously I’m not suggesting that they consult one of these A0 posters to assess the impact of changing one module, but being able to find that impact quickly would be useful. Take, for instance, a module that contains an element of group work. Five courses use this module: four are run by one particular college, and the fifth is run by a completely separate college (College B). It is decided that the four courses contain far too much group work, so the group work element is removed from the module. Do those involved in the decision know that the module is also used by a course in College B, and that it is the only element of group work within a year’s study on that course? Removing it would mean that the course no longer contains all of the elements required for it to be re-validated, causing problems further down the line. Combining the data used to produce the visualisations above with other data sources could help to resolve this issue.
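A first step towards that kind of impact check might look like the following sketch, which lists every award using a given module, grouped by the college that runs it, and then picks out the colleges other than the one proposing the change. The course names, the module code GRP2001 and the college mapping are hypothetical and simply mirror the group work scenario above.

```python
from collections import defaultdict

# Hypothetical (award, module) rows and award -> college mapping
rows = [
    ("Course 1", "GRP2001"), ("Course 2", "GRP2001"), ("Course 3", "GRP2001"),
    ("Course 4", "GRP2001"), ("Course 5", "GRP2001"), ("Course 5", "XYZ2002"),
]
award_college = {
    "Course 1": "College A", "Course 2": "College A",
    "Course 3": "College A", "Course 4": "College A",
    "Course 5": "College B",
}

def module_impact(module, rows, award_college):
    """Group the awards that include `module` by the college that runs them."""
    impact = defaultdict(list)
    for award, m in rows:
        if m == module:
            impact[award_college[award]].append(award)
    return dict(impact)

def other_colleges(module, proposing_college, rows, award_college):
    """Awards outside the proposing college that a change to `module` would also affect."""
    impact = module_impact(module, rows, award_college)
    return {c: awards for c, awards in impact.items() if c != proposing_college}

print(module_impact("GRP2001", rows, award_college))
print(other_colleges("GRP2001", "College A", rows, award_college))
# {'College B': ['Course 5']}
```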

So where to go from here? Well, abstracting slightly further from the course->module level, we (I) can start to compare inter-departmental and inter-disciplinary sharing of modules at department, faculty or college level within the university. Combining this with other data that we make available through data.lincoln, we can look at how departments share modules across the physical space of the campuses that make up the university (more on that in another blog post). Combining the data with student numbers, we can look at subscription levels for the modules that form a focal point for multiple awards. If / when I can get hold of full datasets for learning outcomes and module descriptors, I can start to look at modules that don’t share any course in common, but may be similar in terms of the learning outcomes they address or the topics they cover (as described in the module descriptions). There really are many ways to combine the information that I’m starting to stumble across; it is just a case of finding interesting combinations of datasets and assessing how useful the results are.
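For the module description idea, one crude way to flag similar modules is word overlap between their descriptions. The sketch below uses Jaccard similarity (words in common divided by distinct words overall) as a simple stand-in for proper text similarity measures; the module codes, descriptions and the 0.2 threshold are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical module descriptions
descriptions = {
    "CMP2001": "Introduces statistics and data visualisation for large datasets.",
    "JOU2003": "Data journalism: finding stories in large datasets with visualisation.",
    "CHE2002": "Organic chemistry laboratory techniques.",
}

def words(text):
    """Lower-cased set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of distinct words two sets have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def similar_modules(descriptions, threshold=0.2):
    """Pairs of modules whose descriptions overlap by at least `threshold`."""
    bags = {code: words(text) for code, text in descriptions.items()}
    return [
        (x, y, round(jaccard(bags[x], bags[y]), 2))
        for x, y in combinations(sorted(bags), 2)
        if jaccard(bags[x], bags[y]) >= threshold
    ]

print(similar_modules(descriptions))  # [('CMP2001', 'JOU2003', 0.31)]
```

Common words such as "and" or "with" count here too; a real version would at least drop such stop words.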

As a result of this digging around and tidying up of various data sources, all of the data that can be made accessible through data.lincoln will be made available – in a nice format, unlike the multitude of document types and messy data that I’ve been dealing with recently.

If you have any suggestions of ways to mash up some data, or ideas for new visualisations, feel free to leave me a comment (or three) below!

Data Visualization (2)

This blog post is my second on ‘Data Visualization’ and is based on quotes and notes that I’ve made while reading Edward Tufte’s The Visual Display of Quantitative Information. The following quote from the book is an excellent description of the simplicity of data graphics (or visualizations), while recognising how useful they can be.

Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers – even a very large set – is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful.

In his book, Tufte dedicates individual chapters to certain aspects of graphics:

Graphical Excellence

Tufte discusses achieving ‘graphical excellence’ by adhering to, or adopting, a series of key tenets. Some of these tenets are more easily achieved than others, but by aiming to adhere to these principles, I hope that the data graphics in this project will be good examples of quality visualizations.

Firstly, data graphics should:

  • Show the data
  • Induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production or something else
  • Avoid distorting what the data have to say
  • Present many numbers in a small space
  • Make large datasets coherent
  • Encourage the eye to compare different pieces of data
  • Reveal the data at several levels of detail, from a broad overview to a fine structure
  • Serve a reasonably clear purpose: description, exploration, tabulation or decoration
  • Be closely integrated with the statistical and verbal descriptions of a data set

These principles lay a simple groundwork for constructing data graphics: show the data, don’t twist what the data shows, make it easy to look at and don’t let fancy technology and pretty baubles get in the way of the true purpose of the graphic: to convey information to the viewer.

Tufte goes on to describe ‘graphical excellence’ as being: the well-designed presentation of interesting data; complex ideas communicated with clarity, precision and efficiency; giving the viewer the greatest number of ideas in the shortest time, with the least ink, in the smallest space; and being multivariate – showing more than one variable.

In terms of this project, the relevance of graphical excellence is simple. The process of selecting and applying to university is a complex one, so we should be presenting information with clarity and efficiency; we shouldn’t make applicants remember details for a long time while they search for other information, and we should show them as much information as they require, in as short a time as is reasonable and effective. As mentioned in my previous blog post, the human brain is far better at taking in a lot of information at once than at remembering multiple pieces of information over a longer period.

Graphical Integrity

Obviously we have to be truthful and not misleading when representing this data. Tufte lays out a few simple principles for ensuring this integrity:

The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.

Basically, don’t distort the numbers being shown by playing around with how they’re visualized. If value B is twice that of value A, the graphical representation of B should be twice the size of the representation of A.
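Tufte puts a number on this kind of distortion with his ‘Lie Factor’: the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the relative change (second value - first value) / first value. A Lie Factor of 1 means the graphic is faithful to the data. A minimal Python sketch, using hypothetical values:

```python
def size_of_effect(first, second):
    """Relative change between two values: (second - first) / first."""
    return (second - first) / first

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect shown in the graphic divided by the effect in the data (1.0 = faithful)."""
    return size_of_effect(shown_first, shown_second) / size_of_effect(data_first, data_second)

# Hypothetical data: value B (20) is twice value A (10)
print(lie_factor(10, 20, 1, 2))  # bars 1 cm and 2 cm tall -> 1.0
# Drawing B as a circle with twice A's radius quadruples its area (1 -> 4)
print(lie_factor(10, 20, 1, 4))  # -> 3.0
```

The second case is the classic trap: doubling a shape's width and height to show a doubled value makes the difference look three times larger than it really is.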

Clear, detailed, and thorough labelling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.

Where appropriate, particularly in this project, it may be necessary to label and/or provide a textual description of some of the data being presented in order to avoid ambiguity. This will be particularly important in datasets such as the KIS, where the source of data, or how a particular figure has been derived can differ from one instance to another.

Show data variation, not design variation.

When presenting several similar datasets, the design should remain consistent, so that any variation the viewer sees is due to variation in the data being presented rather than in the method of presentation.
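One simple way to keep the design constant across similar charts is to give them all the same axis range, so that a longer bar always means a bigger number. A minimal sketch, assuming hypothetical yearly figures for two courses and an axis maximum rounded up to a hypothetical step of 50:

```python
import math

# Hypothetical yearly figures for two courses, each drawn as its own bar chart
datasets = {
    "Course A": [120, 135, 143],
    "Course B": [60, 72, 81],
}

def shared_axis_max(datasets, step=50):
    """Largest value across every chart, rounded up to a multiple of `step`."""
    largest = max(max(values) for values in datasets.values())
    return math.ceil(largest / step) * step

# Every chart in the series uses the same y-axis range
y_range = (0, shared_axis_max(datasets))
print(y_range)  # (0, 150)
```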

‘Data Ink’ and Graphical Redesign

Tufte outlines five core principles, in terms of ‘data ink’ – the amount of ink (or pixels) used to visualize data.

  • Above all else show the data
  • Maximise the data-ink ratio
  • Erase non-data-ink
  • Erase redundant data-ink
  • Revise and edit

The first point is self-explanatory: the purpose of the graphic is to show the data. The data-ink ratio is the proportion of the ink (or pixels) in a graphic that is used to show data; maximising it means showing as much data as possible with as little ink as possible. Where possible, the ink/pixels used for non-data, such as pointless gridlines and needless embellishments, should be reduced. Further to this, one should erase data-ink that is redundant and serves no real purpose, and keep revising and editing the graphic to make better use of data-ink.
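Tufte’s definition of the ratio can be written out as follows (the second form is his alternative phrasing of the same idea):

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

So, as a hypothetical example, if gridlines, borders and decoration account for 40% of a chart’s ink and could all be erased without losing any data, its data-ink ratio is 1 - 0.4 = 0.6.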

In terms of how applicable these concepts are to this project, I think there is a happy medium to be found. Whilst removing surplus ‘ink’ is useful for preventing the over-embellishment of data visualizations, being overzealous could cause a similar problem: making the graphic harder to read because there’s almost nothing there to look at.

Chartjunk

The term is exactly what it sounds like: filling a graphic with junk for the sake of doing so. A quote from Tufte’s book says it all:

Occasionally designers seem to seek credit merely for possessing a new technology, rather than using it to make better designs. … at least a few computer graphics only evoke the response ‘Isn’t it remarkable that my computer can be programmed to draw like that?’ instead of ‘My, what interesting data.’

These notes form just a tiny portion of what Tufte discusses in this book, but they are the points most applicable to this particular project. As I bring this post to a conclusion, I leave you with one last quote:

What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and the difficult, that is – the revelation of the complex.

Focus Group – 14th March 2012

We met with a group of students yesterday to better understand the application process from individuals that have just gone through it, and to formulate some loose user requirements in terms of datasets and visualizations / applications.

A group of eight students attended: four from Games Computing, two from Computer Science and two from Journalism. The session followed a loose agenda based around the application process, the data that is / is not available during that process, and the various factors that potential students take into account when considering different universities and degree programmes.

The feedback from the session can be grouped into three categories: that gained through general discussion, written responses to a set of questions, and ‘post-it notes’ that allowed the students to state the factors that they did / would take into account when applying to university, along with the importance they placed on each. This post breaks the feedback down into these three categories.

Feedback from General Discussion

When applying to university, social circles tend to have a heavy influence on the choice of university and/or degree. It was stated that schools/colleges (at least in this particular group’s experience) tended to encourage going to university, but took no active role in helping to determine which university and degree course was best for the individual student.

In terms of the process undertaken to research the various universities and courses available to them, the common starting point was the UCAS website. From here various universities’ websites were viewed, along with related Google (or similar) searches. Social media also played a role in gaining information: students joined various Facebook groups, ‘liked’ various universities’ Facebook pages and, in some cases, joined the UCAS ‘UGoFurther’ social network, although the latter was rarely used.

The discussion then covered the various factors that the students may have taken into account when choosing their university and degree. The main points are summarised below:

  • Open days are beneficial – gives the students an opportunity to meet staff and visit the campus.
  • Distance from home is a factor, although not necessarily meaning that the closer to home the better.
  • University league tables played a part – Lincoln seen as rising through the tables.
  • Course content was highlighted as being particularly important.
  • Extra-curricular/entertainment provision was discussed, with the Engine Shed being mentioned.
  • Entry requirements to courses were obviously a big part in course selection – although it was not made entirely clear which courses ‘counted’ and there was some confusion over the conversion of grades to UCAS points.
  • Course accreditation did not play a role in selection, primarily because it was not made clear what it actually meant. The need for accreditation was often explained after students had started their courses.
  • The experience (in industry or academic) of teaching staff was not a strong factor, but it would have been good to see alongside the course information.

The students then discussed what information would have been useful, but wasn’t always readily available:
  • ‘Real’ careers info – statistics, case studies of former students etc.
  • Being able to easily compare courses within the same institution and across institutions.
  • More information on how their grades related to the entry requirements.
  • More information about optional modules and how they affect the structure of the course.
  • More information regarding the optional placement year – some cited this option as a reason for selecting the course, but stated that more information about it would have been useful.

The discussion then moved on to prospectuses:
  • Most, if not all, used a collection of physical prospectuses, as well as viewing course and institution information online.
  • The prospectus gives a wider range of information, based on the university and the wider city.
  • It was noted, however, that a lot of the prospectus is wasted on individual students as they will very rarely (if at all) need or want to read about all the other courses offered at the university.

Feedback from ‘Questionnaires’

A ‘questionnaire’ of six questions was used to help structure further discussion. The information below shows each question, along with a summary of the various answers provided by those taking part in the focus group. (Students will have different views, so some bullet points may appear to contradict others.)
    1. What sources of information did you consult when deciding on a university and course?
  • League tables
  • University websites
  • UCAS websites
  • Open days
  • Prospectuses
  • Event sites – music gigs etc.
  • Information about the city
  • Google
  • Social Networks
  • Word of mouth, recommendations etc.
    2. What information would you have liked but couldn’t find, or wasn’t readily available?
  • Lecturer profiles
  • Course comparison – within an institution and across the sector
  • Detailed information on optional modules
  • What A-Levels were accepted
  • Case studies
  • Contact time
  • Placement year information
  • Accommodation options
  • Shopping / Night life in the city
    3. What role did the prospectus have in selecting a course? Is having the full prospectus useful or would a more custom prospectus be more useful?
  • Full prospectus preferable, not wanting to miss any information.
  • Custom prospectus better – already knew chosen area of study
  • Custom prospectus preferable, but still printed (as opposed to online version)
  • Full prospectus useful for generalised info, customised prospectus could offer more detailed information about chosen area.
  • Prospectus useful for showing family / friends.
    4. Whilst at university have you been made aware of the impact that your choice of optional modules could have on the structure of your course later on? Would this information be useful earlier on in the course?
  • Module information useful for selecting course
  • Optional modules were made clear on open day
  • Aware of options, not aware of possible implications (pre-requisites at later levels of study, for example)
  • Currently a first year student, haven’t been given information yet.
  • Not particularly aware – given full text description of course but information wasn’t overly clear.
  • Only aware as a student rep and attend meetings.
    5. How much of an impact does / would the assessment structure of a module or course have in your selection process? Would you avoid a module because of how it is assessed? Would you be drawn to a particular module because of how it is assessed?
  • Did not affect decision, but would be interesting to know.
  • Don’t think it would have too much impact unless they were all presentations.
  • Hates writing with a passion, more likely to do practical modules.
  • Prefers practical assessment, assessment structure mentioned but not overly clear, assessment structure would influence them somewhat.
  • Quite important – it would be useful to see a breakdown of assessment types for each module; would influence module choice.
  • Not bothered about assessment structure, course has to be assessed, doesn’t matter how it’s done.
  • Avoided universities and modules that had a lot of exams, drawn to modules with more practical assessment, big impact on university and course selection.
  • Lots of exams (with a high weighting) at end of year is bad.

Question six was ‘What factors did or would affect your choice of university and degree course?’ and was used as a prelude to the ‘Post It’ task, which is described below.

Post It Notes

The final section of the focus group involved the students writing any and all factors that may influence their choice of university or degree, along with features / information that are important to them, on post-it notes. A scale on the wall (0 – 10) was then used to allow them to prioritise how important each of these things was to them as an individual applicant. There were two main reasons for doing the task like this. Firstly, it allowed a priority to be established, with the more important aspects appearing closer to 10 on the scale (to the right of the picture, below); secondly, these post-it notes of individual pieces of information are somewhat akin to user stories within agile development. For instance, one note reads ‘Pass / Fail Rate’ and was placed at around 7 on the scale. This shows us that knowing the pass / fail rate of a given course is fairly important to potential applicants, and the note could be transformed into a user story similar to ‘Users are able to view the pass / fail rate of a course’.

The graph below quickly summarises the responses given in the task above. It shows that the top ten overall responses given are:

  1. Course content information
  2. Provision of specialist equipment
  3. Case studies (or similar) of previous students
  4. Accommodation
  5. Lecturer profiles
  6. Similar courses
  7. Information about the city
  8. League tables
  9. Fees
  10. Relationship between grades and UCAS points
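One way a ranking like this could be produced from the wall is to average each factor’s positions on the 0 – 10 scale and sort. The sketch below assumes exactly that; apart from the ‘Pass / Fail Rate’ note at around 7, the placements are hypothetical, and this is not necessarily how the list above was compiled.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placements: (text on the post-it note, position on the 0-10 scale)
notes = [
    ("Course content information", 10),
    ("Course content information", 9),
    ("Pass / Fail Rate", 7),
    ("Fees", 6),
    ("Fees", 4),
]

def rank_factors(notes):
    """Average each factor's placements and sort, most important first."""
    positions = defaultdict(list)
    for factor, position in notes:
        positions[factor].append(position)
    averages = [(factor, mean(values)) for factor, values in positions.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)

for rank, (factor, score) in enumerate(rank_factors(notes), start=1):
    print(f"{rank}. {factor} ({score:.1f})")
```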

The focus group was very successful and for the time-being has given us enough information to work with and an insight into what services and visualizations may be useful for potential applicants. Another point that was made as a result of this group was that, while the majority of this work would be very useful for potential applicants, there would still be some use in it for current students. This has given us another area of work to consider.

We intend to hold future ‘user groups’ aimed at receiving feedback (‘requirements’) on the work we do as a result of this focus group.

Data Visualization (1)

As a vast part of this project is based around the presentation and visualization of the data, I’ve started reading around the ‘art’ of visualizing information in order to ground my choices in accepted theories and principles. As I’ll be consulting a number of sources, I’ll be blogging about data visualization over multiple posts. This first post is based on my reading of Edward Tufte’s ‘Envisioning Information’.

The book itself covers a wide range of areas and gives a lot of examples of how the principles it discusses have (or haven’t) been applied over the course of history, from Galileo to the present day. If you’re interested in the visualization of information, the book is definitely worth reading, but here I have picked out a few quotes that strike me as particularly pertinent to my work on this project.

A grave sin of information design – Pridefully Obvious Presentation. Presenting in such a way that the focus is on the method of presentation, rather than the information being presented.

These two sentences struck me as being very important when it comes to presenting information, especially in software / web design. When using new and interesting libraries and code bases, it becomes very easy to get caught up in the ‘excitement’ of all the new ways you *could* present the information. If the purpose of the project is, as in this case, to present information in a way that is useful and informative to the user of the system or service, then the focus has to be on the best way to present the information to the user, not the best way to show off how you *could* present it.

…promoters imagine that numbers and details are boring, dull and tedious, requiring ornament to enliven. Cosmetic decoration, which frequently distorts the data, will never salvage an underlying lack of data. If the numbers are boring, you’ve got the wrong numbers.

It is often tempting to ‘decorate’ a presentation of information in order to make the visualization more visually appealing. Here Tufte points out that cosmetic decoration can often distort the data. If there isn’t enough meaningful data there, making the visualization look pretty will do nothing to make it more useful. Further to this, if the embellishment is being added because the data itself is boring, then what is the point in visualizing it? If the data being presented is to be useful, it must be interesting and relevant to the ‘users’ (for want of a better word) who will be viewing the visualization.

Worse is contempt for our audience, designing as if readers were obtuse and uncaring. In fact, consumers of graphics are often more intelligent about the information at hand than those who fabricate the data decoration.

…no matter what, the operating moral premise of information design should be that our readers are alert and caring. They may be busy, eager to get on with it, but they are not stupid.

Visualizations of data sets should be designed with a target audience in mind. If someone is interested in the data, the chances are that they have a reasonably sound base of knowledge surrounding the concept the data deals with. As such, they shouldn’t be treated as though they are two years old and need even the most basic of concepts explaining to them.

What E. B. White said of writing is equally true for information design: “No one can write decently who is distrustful of the reader’s intelligence, or whose attitude is patronizing.”

Again, this relates back to understanding your target audience. If you don’t understand the audience, you can’t target the visualizations at them and present the data in a useful and meaningful way.

Display of closely-read data surely requires the skilled craft of good graphic and poster design: typography, object representation, layout, color, production techniques and visual principles that inform criticism and revision. Too often those skills are accompanied by the ideology of chartjunk and data posterization; excellence in presenting information requires mastering the craft and spurning the ideology.

This point refers back to one made earlier regarding the over-embellishment of visualizations. There are many areas within information visualization that require graphic design elements to be considered, such as layout, colour choices or how the data is represented. This is (more or less) where the graphic design process should stop: there’s no need to embellish the presentation of the visualization itself, and attaching superfluous images around the periphery does nothing to improve the visualization of information; it only serves to distract the viewer.

To clarify, add detail.

This point is really simple, and yet very important and useful: if a point needs clarifying, add more detail to the visualization. Simple.

Visual displays rich with data are not only an appropriate and proper complement to human capabilities, but also such designs are frequently optimal. If the visual task is contrast, comparison and choice – as so often it is, then the more relevant information within eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, requires the viewers to rely on visual memory – a weak skill, to make a contrast, a comparison, a choice.

The human brain can process quite a lot of information when shown it, but visual memory is generally weak; this should be remembered when deciding how to present information to the user – show them what you can on one screen, don’t make them remember details as they move from screen to screen to screen – they’ll forget!

These are just a few of the many points made in Tufte’s book, but they all apply in one way or another to this particular project. As I read more around data visualization I’ll be writing more blog posts on the subject.