Course Finder – Complete!

After a break in development and finally fixing a few *minor* niggles, Course Finder is now complete and is available at coursefinder.lncd.lincoln.ac.uk.

An earlier blog post describing (briefly) the process / logic behind the application can be found here.

First off, a quick tour through the application.

[Screenshots: Screen Shot 2013-01-24 at 14.34.53 and Screen Shot 2013-01-24 at 14.35.05]

Search parameters are broken down into three categories: subjects studied, subjects interested in, and general keywords. The first two are based on the benchmark subjects used in QAA validation of programmes. This makes it easier to tie subjects to programmes offered by the university, but it does present a problem: some words that a potential user may expect are not recognized. For instance, ‘religion’ is not recognized; the QAA benchmark subject is ‘theology’. The third input field is based on keywords identified by Open Calais (explained in a previous post). This offers a wider range of keywords, but each keyword often links to quite a large range of programmes offered by the university. Some kind of middle ground would be ideal, but that is not explored in this initial application.

[Screenshot: Screen Shot 2013-01-24 at 14.36.02]

Entering the following parameters generates the results shown above: studied ‘education’, interested in ‘psychology’, keyword ‘teacher’. The results are broken down by how many of the specified criteria they meet. In this example, the PGCE course meets two of the specified criteria, which seems reasonable. Selecting the result from the list takes the user through to the screen shown below.
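As a rough illustration of grouping results by the number of criteria met, here is a minimal Python sketch. It is not the application's actual code: the `LINKS` data, the course names other than PGCE, and the function name `group_by_criteria_met` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical links from search terms to courses, one dict per category.
LINKS = {
    "studied": {"education": {"PGCE", "Education Studies"}},
    "interested": {"psychology": {"Psychology"}},
    "keywords": {"teacher": {"PGCE"}},
}


def group_by_criteria_met(criteria, links):
    """Group matching courses by how many of the given criteria they meet.

    `criteria` is a list of (category, term) pairs. The result maps a count
    to a sorted list of course names, with the highest count first.
    """
    counts = defaultdict(int)
    for category, term in criteria:
        for course in links.get(category, {}).get(term, set()):
            counts[course] += 1

    groups = defaultdict(list)
    for course, met in counts.items():
        groups[met].append(course)

    return {met: sorted(groups[met]) for met in sorted(groups, reverse=True)}


criteria = [
    ("studied", "education"),
    ("interested", "psychology"),
    ("keywords", "teacher"),
]
for met, courses in group_by_criteria_met(criteria, LINKS).items():
    print(met, courses)
```

With this made-up data, PGCE lands in the "two criteria" group and the other courses in the "one criterion" group.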

[Screenshot: Screen Shot 2013-01-24 at 14.36.15]

This screen should contain whatever data is considered pertinent to helping the user decide whether the course is suitable for them. At the moment it contains only the aims and objectives of the programme, but it can easily be extended (and should be, if the application is developed and used properly). The screen also allows the user to recommend the course as being relevant to their search parameters, and shows similar courses. Similar courses are (at the moment) based on the keywords identified by Open Calais, with suitable limits and restrictions put in place. These restrictions are discussed in an earlier blog post, but there is a fine line between having every course linked through keywords such as ‘education’ (every course matches this keyword, as they are all university courses) and having very few courses identified as similar, even when they obviously are. Further work would be needed to improve this.
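The trade-off described above can be sketched in Python. This is only one plausible way to apply "limits and restrictions", not necessarily the one the application uses: keywords linked to too large a share of courses are ignored, and a minimum number of shared keywords is required. The function name, both thresholds, and the sample data are assumptions.

```python
def similar_courses(course, course_keywords, max_keyword_share=0.5, min_shared=2):
    """Return courses sharing at least `min_shared` useful keywords with `course`.

    A keyword is treated as useful only if it is linked to no more than
    `max_keyword_share` of all courses (both thresholds are assumptions).
    """
    total = len(course_keywords)

    # Count how many courses each keyword is linked to.
    usage = {}
    for keywords in course_keywords.values():
        for keyword in keywords:
            usage[keyword] = usage.get(keyword, 0) + 1

    useful = {kw for kw, n in usage.items() if n / total <= max_keyword_share}
    own = course_keywords[course] & useful

    scores = {}
    for other, keywords in course_keywords.items():
        if other == course:
            continue
        shared = own & keywords
        if len(shared) >= min_shared:
            scores[other] = len(shared)

    # Most shared keywords first, ties broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical keyword sets; 'education' is linked to every course.
COURSE_KEYWORDS = {
    "PGCE": {"education", "teaching", "schools"},
    "Education Studies": {"education", "teaching", "schools", "policy"},
    "Psychology": {"education", "behaviour", "research"},
    "Journalism": {"education", "media", "research"},
}

print(similar_courses("PGCE", COURSE_KEYWORDS))
print(similar_courses("PGCE", COURSE_KEYWORDS, max_keyword_share=1.0, min_shared=1))
```

With the stricter defaults only Education Studies is returned for PGCE; loosening both thresholds links PGCE to every other course through ‘education’, which is the over-linking problem described above.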

Recommending a course as matching your criteria improves the search process for anyone else who searches with the same criteria. If the same criteria are entered in a new search, the results look slightly different: recommended results have a star next to them.

[Screenshot: Screen Shot 2013-01-24 at 14.36.54]

As mentioned previously, this application (as it currently stands) is merely a proof of concept, and would require further development before being used properly. Points to consider for further development include:

  • How should we allow users to enter search criteria? A free-form text field would allow users to search for exactly what they want, but searching all of the relevant data sources would prove challenging and very time-consuming. Using high-level JACS codes means that far too many potential matches are returned, with only a few truly relevant courses surrounded by a large amount of what is essentially noise.
  • If the OpenCalais keywords are to be used, more work would have to be done on determining the correct level of filtering. At the moment, courses that are obviously similar are not being shown as such. This is, however, an improvement over an earlier stage, at which a ridiculously high number of courses were being identified as similar when they obviously were not.
  • A lot of data is being stored relating to click-throughs for search results. This *could* be integrated into the search rankings, so that courses that are often clicked on are moved further up the rankings.
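To make the last point concrete, here is a minimal Python sketch of one way click-through counts could feed into the rankings. It keeps the number of criteria met as the primary sort key and uses clicks only to order courses within the same group. The function name and the data are assumptions; nothing like this is implemented in the application yet.

```python
def rank_with_clicks(results, click_counts):
    """Order (course, criteria_met) pairs by criteria met, then by clicks.

    `click_counts` maps a course name to the number of recorded
    click-throughs; courses with no recorded clicks count as zero.
    """
    return sorted(
        results,
        key=lambda r: (-r[1], -click_counts.get(r[0], 0), r[0]),
    )


# Hypothetical search results and click-through totals.
results = [("Education Studies", 1), ("PGCE", 2), ("Psychology", 1)]
clicks = {"Psychology": 14, "Education Studies": 3}

for course, met in rank_with_clicks(results, clicks):
    print(course, met)
```

Keeping criteria met as the first key means a popular but weakly matching course cannot jump above a strong match; whether that is the right balance would need testing with real usage data.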

There are many ways in which this application could, and should, be improved, but as a proof of concept it demonstrates what can be done with course data to potentially improve the student experience.

Designing a Course Finder Application

Since we now have access to a very large amount of course data, it is possible to look at ways of improving the presentation of, and access to, this data for audiences such as potential students. As such, I’m looking at building a prototype ‘Course Finder’ application.

Building on the work I outlined in my previous post, we can now identify keywords for all of the courses offered at the university. This offers one way of identifying suitable courses for users of the application, who are likely to be potential students. These courses can also be linked to JACS codes, representing the subjects covered by the courses. Courses are also delivered at a particular level: foundation degree, bachelor’s degree, master’s etc.

The criteria that I am currently considering using to identify potential courses for users are: subjects previously studied, subjects interested in, and keywords (identified with Open Calais).

As well as using these parameters to produce search results, I have also included features within the application to record ‘click-through’ on search results, as well as the ability to ‘recommend’ a search result as being appropriate and relevant to the search parameters outlined above. As such, the application should ‘learn’ as more and more searches are carried out. If parameters A, B and C are specified and one of the courses is recommended as being relevant, then the next time a search is carried out using parameters A, B and C, that course should be highlighted as being potentially more useful and relevant to the user.
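The ‘learning’ behaviour described above can be sketched in Python as a lookup keyed on the normalised set of search parameters. This is an in-memory illustration under assumed names (`search_key`, `RecommendationStore`), not the application's implementation, which stores this data in database tables.

```python
from collections import defaultdict


def search_key(studied, interested, keywords):
    """Build an order-insensitive, case-insensitive key for a search."""
    def normalise(terms):
        return frozenset(term.strip().lower() for term in terms)

    return (normalise(studied), normalise(interested), normalise(keywords))


class RecommendationStore:
    """Remembers which courses were recommended for which search key."""

    def __init__(self):
        self._recommended = defaultdict(set)

    def recommend(self, key, course):
        self._recommended[key].add(course)

    def is_recommended(self, key, course):
        return course in self._recommended.get(key, set())


store = RecommendationStore()
first = search_key(["Education"], ["Psychology"], ["teacher"])
store.recommend(first, "PGCE")

# A later search with the same parameters sees the recommendation.
later = search_key(["education"], ["psychology"], ["Teacher "])
print(store.is_recommended(later, "PGCE"))
```

Normalising the terms means that the same parameters entered with different capitalisation or ordering still map to the same recommendations.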

Database Design

Most of the data required to execute the searches is available from our Academic Programme Management System, through our Nucleus data store. The data relating to the individual search instances, along with the recorded click-throughs and recommendations, will need to be stored within the application’s database. It may also be necessary to store some of the data from Nucleus locally, in order to improve performance by caching views on the data that are unlikely to change often, such as links between keywords and courses.
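The caching idea can be illustrated with a small time-based cache in Python. This is a generic sketch, not the approach the application takes (which stores cached data in local tables); `TimedCache` and `fetch_keyword_links` are hypothetical names, and the fetch function merely stands in for a call to Nucleus.

```python
import time


class TimedCache:
    """Cache the results of `fetch` for `max_age_seconds` per key."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]
        value = self._fetch(key)  # only hit the remote store when stale
        self._entries[key] = (now, value)
        return value


def fetch_keyword_links(course_id):
    # Stand-in for a remote API call; returns made-up data.
    return ["education", "teaching"]


cache = TimedCache(fetch_keyword_links, max_age_seconds=24 * 60 * 60)
print(cache.get("hypothetical-course-id"))
print(cache.get("hypothetical-course-id"))  # served from the cache
```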

Tables storing data locally include:

  • keyword course links
  • search instances
  • search click-throughs
  • search interests
  • search keywords
  • search studied
  • search recommendations
  • subjects
  • similar courses

The majority of the data stored in the tables listed above relates to Course Finder-specific functionality; some data has been ‘cached’ from Nucleus, purely to save multiple API calls for data that will change very rarely.
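As an illustration of how some of the search-related tables might fit together, here is a sketch using Python's built-in `sqlite3` module. The table names follow the list above, but every column name and the `record_search` helper are assumptions; the real schema and database engine may differ.

```python
import sqlite3

# Hypothetical columns for a subset of the tables listed above.
SCHEMA = """
CREATE TABLE search_instances (
    id INTEGER PRIMARY KEY,
    searched_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE search_studied (
    search_id INTEGER NOT NULL REFERENCES search_instances(id),
    subject TEXT NOT NULL
);
CREATE TABLE search_interests (
    search_id INTEGER NOT NULL REFERENCES search_instances(id),
    subject TEXT NOT NULL
);
CREATE TABLE search_keywords (
    search_id INTEGER NOT NULL REFERENCES search_instances(id),
    keyword TEXT NOT NULL
);
CREATE TABLE search_click_throughs (
    search_id INTEGER NOT NULL REFERENCES search_instances(id),
    course_id TEXT NOT NULL
);
CREATE TABLE search_recommendations (
    search_id INTEGER NOT NULL REFERENCES search_instances(id),
    course_id TEXT NOT NULL
);
"""


def record_search(conn, studied, interests, keywords):
    """Store one search instance and its parameters; return its id."""
    search_id = conn.execute(
        "INSERT INTO search_instances DEFAULT VALUES"
    ).lastrowid
    conn.executemany(
        "INSERT INTO search_studied VALUES (?, ?)",
        [(search_id, s) for s in studied],
    )
    conn.executemany(
        "INSERT INTO search_interests VALUES (?, ?)",
        [(search_id, s) for s in interests],
    )
    conn.executemany(
        "INSERT INTO search_keywords VALUES (?, ?)",
        [(search_id, k) for k in keywords],
    )
    return search_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
search_id = record_search(conn, ["education"], ["psychology"], ["teacher"])
conn.execute(
    "INSERT INTO search_click_throughs VALUES (?, ?)", (search_id, "PGCE")
)
print(conn.execute("SELECT COUNT(*) FROM search_click_throughs").fetchone()[0])
```

Linking every parameter, click-through and recommendation back to a search instance is what would allow later searches with the same parameters to be matched against earlier ones.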

In a follow-up post, I’ll show the created application and describe the benefits and limitations.

 

In Search of Similar Courses

Or, ‘How I went round and round in circles….. and then round and round some more’

One of the initial ideas that was suggested back at the beginning of the project was a way of mining course data in order to provide suggestions for similar courses. Since we now have access to the APMS data (a lot more data than my dummy set), finding ways of suggesting similar courses is now something that I can attempt properly.

The first step in the process was deciding on a method of finding keywords from within the various text descriptions of programmes and modules. OpenCalais, which was used in a previous project at the university (JISCPress), was one such option.

OpenCalais, which has an easy-to-use API, takes a body of text and returns a series of keywords, broken down by type, each with a relevance score indicating how strongly the keyword relates to the body of text. Initially I looked at using existing PHP libraries to interface with the API (to avoid reinventing the wheel and all that), but found that they did not return all of the data I wanted in an easy-to-use manner, so I wrote my own code, which will be available on Github.
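To show the kind of post-processing involved, here is a minimal Python sketch that groups extracted keywords by type and drops those below a relevance threshold. It does not call OpenCalais and does not reproduce its response format: the list-of-dicts shape, the keys `name`, `type` and `relevance`, the sample values, and the threshold are all assumptions, and my own code for this was written in PHP.

```python
from collections import defaultdict


def keywords_by_type(entities, min_relevance=0.3):
    """Group keywords by type, keeping those at or above `min_relevance`.

    Each entity is assumed to be a dict with 'name', 'type' and
    'relevance' keys (a hypothetical, already-parsed shape).
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["relevance"] >= min_relevance:
            grouped[entity["type"]].append((entity["name"], entity["relevance"]))

    # Within each type, list the most relevant keywords first.
    for keyword_type in grouped:
        grouped[keyword_type].sort(key=lambda pair: -pair[1])
    return dict(grouped)


# Made-up example data for a single programme description.
entities = [
    {"name": "teacher", "type": "Position", "relevance": 0.71},
    {"name": "education", "type": "IndustryTerm", "relevance": 0.45},
    {"name": "lecturer", "type": "Position", "relevance": 0.12},
    {"name": "head teacher", "type": "Position", "relevance": 0.80},
]

for keyword_type, keywords in keywords_by_type(entities).items():
    print(keyword_type, keywords)
```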

When I first started this process, I had access to data relating to 6,436 modules of study, which are part of 878 individual programmes of study for a total of 349 courses. (If one course has two different years’ intake, it will be represented by two programmes. A similar situation exists with modules.)

In the first instance, I looked at generating keywords for all programmes of study (a mistake, which I cover later) and modules. I decided to use the following fields to generate keywords for programmes:

  • ‘Aims and Objectives’
  • ‘Introduction’
  • ‘Specialist Facilities’
  • ‘Career Opportunities’

I also used the ‘Synopsis’ field for modules.

This process generated 3,335 keywords, broken down into 36 types of keywords. 17,835 links were generated between programmes of study and keywords and 19,255 links between keywords and modules.
