In Search of Similar Courses

Or, ‘How I went round and round in circles… and then round and round some more’

One of the ideas suggested at the beginning of the project was to mine course data in order to suggest similar courses. Since we now have access to the APMS data (a lot more data than my dummy set), this is something that I can now attempt properly.

The first step in the process was deciding on a method of finding keywords within the various text descriptions of programmes and modules. OpenCalais, which was used in a previous project at the university (JISCPress), was one such option.

OpenCalais, which has an easy-to-use API, takes a body of text and returns a series of keywords, broken down by type, each with a relevancy score that indicates how strongly the keyword relates to the body of text. Initially I looked at using an existing PHP library to interface with the API (reinventing the wheel and all that), but found that the libraries available did not return all of the data I wanted in an easy-to-use manner, so I wrote my own code, which will be available on Github.
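As a rough illustration of the kind of data described above (keywords, each with a type and a relevancy score), here is a minimal Python sketch. It is not the PHP code mentioned in this post, and it does not call OpenCalais: the input is a hand-written list of dictionaries whose field names (`name`, `type`, `relevance`) are assumptions made for the example, as is the idea that scores lie between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    name: str
    type: str
    relevance: float  # assumed to lie between 0 and 1; higher means more relevant


def parse_keywords(entities, min_relevance=0.0):
    """Turn a list of entity dicts into Keyword records, most relevant first.

    Entities scoring below min_relevance are dropped.
    """
    keywords = [
        Keyword(e["name"], e["type"], float(e["relevance"]))
        for e in entities
        if float(e["relevance"]) >= min_relevance
    ]
    return sorted(keywords, key=lambda k: k.relevance, reverse=True)


# Hypothetical, hand-written input; NOT a real API response.
sample = [
    {"name": "software engineering", "type": "Technology", "relevance": 0.71},
    {"name": "Lincoln", "type": "City", "relevance": 0.12},
    {"name": "forensic science", "type": "IndustryTerm", "relevance": 0.48},
]

top = parse_keywords(sample, min_relevance=0.3)
for keyword in top:
    print(keyword.type, keyword.name, keyword.relevance)
```

Filtering on a minimum score is one simple way of discarding keywords that are only loosely related to a description; the threshold of 0.3 is arbitrary.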

When I first started this process, I had access to data relating to 6436 modules of study, which are part of 878 individual programmes of study for a total of 349 courses. (If one course has two different years’ intake, it will be represented by two programmes. A similar situation exists with modules.)

In the first instance, I looked at generating keywords for all programmes of study (a mistake, which I cover later) and modules. I decided to use the following fields to generate keywords for programmes:

  • ‘Aims and Objectives’
  • ‘Introduction’
  • ‘Specialist Facilities’
  • ‘Career Opportunities’

I also used the ‘Synopsis’ field for modules.

This process generated 3,335 keywords, broken down into 36 types of keywords. 17,835 links were generated between programmes of study and keywords and 19,255 links between keywords and modules.
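To show how links between programmes and keywords could feed a ‘similar courses’ suggestion, here is a minimal Python sketch. It is only one possible approach (a weighted Jaccard overlap of keyword scores) and is not necessarily the method used in this project. The programme identifiers, keywords and scores are invented for the example.

```python
from collections import defaultdict

# Hypothetical (programme, keyword, relevance) links; values are invented.
links = [
    ("prog_a", "software engineering", 0.7),
    ("prog_a", "databases", 0.5),
    ("prog_b", "software engineering", 0.6),
    ("prog_b", "games", 0.4),
    ("prog_c", "criminology", 0.8),
]


def build_index(links):
    """Group links into {programme: {keyword: relevance}}."""
    index = defaultdict(dict)
    for programme, keyword, relevance in links:
        index[programme][keyword] = relevance
    return index


def similarity(a, b):
    """Weighted Jaccard: sum of minimum scores over sum of maximum scores."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    total = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return shared / total if total else 0.0


def similar_programmes(index, target, limit=5):
    """Return up to `limit` (programme, score) pairs that share keywords with target."""
    scores = [
        (other, similarity(index[target], keywords))
        for other, keywords in index.items()
        if other != target
    ]
    scores = [(other, score) for other, score in scores if score > 0]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]


index = build_index(links)
suggestions = similar_programmes(index, "prog_a")
print(suggestions)
```

Here `prog_b` is suggested for `prog_a` because the two share a keyword, while `prog_c` shares none and is left out.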


Dev8eD and the XCRI Course Aggregator

On the 29th and 30th of May I attended the dev8eD conference in Birmingham, which was organised for “…developers, educational technologists and users working throughout education on the development of tools, widgets, apps and resources aimed at staff in education and enhancing the student learning experience.”

There were several organised sessions of direct relevance to the ON Course project, including sessions on XCRI-CAP and the XCRI-CAP Aggregator currently being developed.

One of the challenges at dev8eD was to make use of the data available through the XCRI-CAP Aggregator and present it in useful, meaningful or interesting ways. With some help from Dale Mckeown, who also works at the University of Lincoln, I created a mashup of data from multiple sources to make a rudimentary course search engine.

The mashup uses data from the course aggregator (currently searching only by keywords) with geo-location data, university league table data and pub location data. The course finder also links to local crime data for institutions and ‘cost of living’ data. These latter two data sources were used to show how external data sets can be used to enrich the searching experience, providing further context for the wider surroundings and environment of universities.
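The general idea of enriching course results with other datasets can be sketched in a few lines of Python. This is not the code behind the website: the record fields, the choice of institution name as the joining key, and all of the values below are placeholders made up for the example.

```python
def enrich(courses, extras):
    """Attach extra per-institution data to each course record.

    `extras` maps a field name to a {institution: value} lookup.
    Institutions missing from a lookup get None for that field.
    """
    enriched = []
    for course in courses:
        record = dict(course)  # copy so the input records are not modified
        for field, by_institution in extras.items():
            record[field] = by_institution.get(course["institution"])
        enriched.append(record)
    return enriched


# Placeholder data standing in for aggregator results and external datasets.
courses = [
    {"title": "Example Course A", "institution": "Example University"},
    {"title": "Example Course B", "institution": "Sample College"},
]
extras = {
    "league_rank": {"Example University": 10},
    "location": {
        "Example University": (53.0, -0.5),
        "Sample College": (52.0, -1.0),
    },
}

results = enrich(courses, extras)
for record in results:
    print(record)
```

Keeping each external dataset as its own lookup means that a new source can be added without changing the course records themselves.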

When more XCRI feeds are added to the aggregator, the quality and quantity of data available through the aggregator’s API will obviously increase, meaning that such a search engine would become more useful. In its current state, the website acts as a good prototype for search functionality, and also demonstrates the potential for ‘mashing up’ XCRI data with numerous other datasets.

The code (such as it is) for the website is available on Github.