Gamification in metadata creation – how do we show “quality” and encourage improvement?

Encouraging the creation of good metadata can be a challenging exercise. Systems for metadata creation need to allow blank fields so that incremental or in-progress content can be saved, yet some fields may be semantically recommended, or even mandatory for standardisation. So, while a metadata editing tool needs to be flexible enough to allow evolving content, it also needs to provide feedback that drives improvement.

This leads to three questions:

  1. Can we automatically measure metadata quality?
  2. Can we use this data to encourage metadata editors to more actively participate and improve content?
  3. How can we best show the “quality” of metadata to an editor to achieve better content?

Gamification is a recognised term for encouraging changes in user behaviour through small incentives. The question I’d like to pose for Aristotle is: how can these principles be used to encourage the creation of good metadata? Obviously, ‘good’ is very subjective, but for metadata objects such as Data Elements, having an attached “Data Element Concept” and “Value Domain” is, at a bare minimum, a prerequisite for a quality item. Likewise, a piece of metadata with a description is good, and a piece of metadata with a longer description is probably better (though not always, which leads to further challenges).

For the moment, let’s assume that basic metrics for “good” metadata can be constructed and applied to a piece of metadata, and that these can be summed together to give a raw score against the total possible. This assumption means we can grade metadata completion and get a raw score like “77 passes out of a possible 82”. From these figures we can derive all sorts of graphics or numbers that can influence user behaviour, and it’s that presentation which I’m interested in right now.
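As a very rough sketch of what I mean (the checks, field names and thresholds here are invented for illustration, not how Aristotle currently scores anything), a raw score could just be a tally of passed checks:

# A minimal, hypothetical sketch of summing metric checks into a raw score.
def quality_checks(item):
    return [
        item.data_element_concept is not None,  # has an attached Data Element Concept
        item.value_domain is not None,          # has an attached Value Domain
        bool(item.description),                 # has any description at all
        len(item.description or "") >= 100,     # description meets a minimum length
    ]

def raw_score(item):
    checks = quality_checks(item)
    return sum(checks), len(checks)  # e.g. (77, 82) for a much richer set of checks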

First of all, from these figures we can derive a percentage or rank – 77 out of 82 is about 94%, “9/10”, “4.5/5” or “A-”. This may mean a metadata item has all of its relations and all fields filled out, but one or two are a little shorter than our metrics would like. Perhaps, though, the item is described perfectly – in which case adding more text would actually make it worse.
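For illustration, all of these presentations can be derived from the same raw figures; the grade cut-offs below are an assumption made for the sake of the example, not a settled scale:

def presentations(passed, possible):
    percent = 100.0 * passed / possible      # 77, 82 -> ~93.9%
    out_of_ten = round(percent / 10.0, 1)    # -> 9.4
    stars = round(percent / 20.0 * 2) / 2.0  # -> 4.5, in half-star steps
    grade = 'A' if percent >= 95 else 'A-' if percent >= 90 else 'B'  # invented cut-offs
    return percent, out_of_ten, stars, grade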

Secondly, there is the issue of how to present this visually once we’ve determined a score. There are probably many ways to present this, but for now I want to focus on two – symbols and progress bars. A symbol can be any repeated small graphic, but the best example is a star ranking where metadata is given a rank out of 5 stars.

Once a raw score is computed, we can then normalise this to a score out of 5 and show stars (or other symbols). However, initial discussions suggest that this presents a more abrupt ranking that discourages work in progress, rather than highlighting the work still to do.

An alternative is the use of progress bars to show the completion of an item. Again, this is computed from the raw score, normalised to a percentage and then shown to the user. The images show different possible options, including a percentage complete and integer or rounded-decimal rankings out of 10. Initial discussions suggest that percentages may encourage over-work: users on 94% might strive for 100% by ‘gaming the system’, as opposed to users whose metadata is ranked 9.5/10. For example, if a metadata item has a well-written short description that falls under a predefined length limit, resulting in a score of 94%, we need a design pattern that discourages an editor from adding content-free text just to ‘score’ 100%. The use of colour is another possible way to gauge progress, analogous to the stars many users are familiar with, but it raises the question of how to define ‘cut-offs’ for quality.
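One such pattern, purely as a sketch, is to give the length component of a score diminishing returns and to only ever display a coarse bucket, so that padding a description with filler barely moves the number and never shows up in the displayed rank (the target length and formula here are invented):

import math

def length_score(text, target=100):
    # Diminishing returns: the closer the text is to the target length,
    # the less each extra character adds to the score.
    if not text:
        return 0.0
    return 1.0 - math.exp(-3.0 * len(text) / target)

def coarse_display(score):
    # Show a coarse bucket (e.g. "9.5/10") rather than an exact percentage,
    # so small score changes don't invite gaming.
    return "{0}/10".format(round(score * 10 * 2) / 2)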

Metadata quality tracking in practice

In this section we look at a number of possible options for presenting the quality metrics of metadata using the Aristotle Metadata Registry. At the moment these are just mock-ups, but sample work has shown that dynamic analysis of metadata to quantify “quality” is possible, so here we will address the matter of how to show this.

First of all, it is important to note that these quality rankings can be shown at every stage, from first edit all the way through to final publication, so a one-size-fits-all approach may not be the best way. In the simple case, we can look at the difference between a progress bar and a star rating. Alongside all the basic details, metadata can be given a ranking right on the page, as well as a status, to give users immediate feedback on its fitness for use.

An example of how stars or bars may look for a user.

Secondly, we can look at simple presentation options. Here it’s important to note that only one rating would be shown out of all the possible options. Stars offer fewer levels of granularity, and when coloured are bright and distracting. Progress bars, however, blend quite well, even when coloured, and give more options for embedding textual representations.

Lastly, we can see how these would look when a number of items are shown together. Using a ‘sparklines’ approach, we can use stars or bars to quickly highlight trouble spots in metadata when looking at a large number of items.


For the professional context of a registry, initial feedback suggests there are strengths in the progress bar style that make it more suited for use; however, more feedback is required to make a conclusive argument.

Conclusion

This is intended to be the first in a number of articles that address the issue of best practices in presenting “quality” in the creation of metadata, and the implementation of these practices in the Aristotle Metadata Registry. As such, I welcome and encourage comments and feedback on this design process, both from Aristotle users and the broader community.

Key questions for feedback

  1. How can we textually show a metadata quality rank to encourage more participation? Possible options: raw values (77/82), percent (94%), normalised (9/10), graded (A-) or something else?
  2. How can we visually show a metadata rank to encourage participation? Possible options: stars (or other symbols), progress bars, colours, text only or something else?
  3. How do we positively encourage completion without adversely encouraging “gaming the system”?
  4. Future questions:
     - How do we programmatically measure metadata quality?
     - Based on a set of sub-components of quality, how and when can we show a user how to improve metadata quality?

Adding advanced user tracking and security to the Aristotle Metadata Registry ecosystem

The Aristotle Metadata Registry is already built on the strong, secure web framework provided by Django and includes a vast suite of tests that ensure the security of metadata at all stages of the data lifecycle.

But to enhance this, work has begun on a new extension to Aristotle, called Pythias, that incorporates additional user tracking and security to provide peace of mind when deploying large scale metadata systems.

Powered by django-axes and django-user-sessions, this will give Aristotle site administrators the power to track login successes and failures, block access at an IP level, automatically lock out user accounts and block concurrent logins. It will also give users the ability to view their current sessions and remotely log out of them.
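For those curious what that looks like in practice, the configuration is only a few settings. This is a sketch only – the setting names are the ones the two libraries document, but check them against the versions you install, and the values here are placeholders:

# settings.py
INSTALLED_APPS += [
    'axes',           # django-axes: login tracking and lockouts
    'user_sessions',  # django-user-sessions: per-user session listing and remote logout
]

SESSION_ENGINE = 'user_sessions.backends.db'  # store sessions so users can list them

AXES_FAILURE_LIMIT = 5   # lock out after this many failed login attempts
AXES_COOLOFF_TIME = 1    # hours before a locked-out user can try again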

Hiding ModelForm fields in a Django modelformset_factory

A short and sweet one. I had trouble hiding a field in a Django modelformset_factory and none of the usual places were any help.

It turns out the simple answer is using the undocumented widgets parameter when creating it:

from django.forms import HiddenInput, modelformset_factory

ValuesFormSet = modelformset_factory(
    models.MyModel,
    fields=('id', 'order', 'maximum_occurances'),
    widgets={'order': HiddenInput},  # render 'order' with a hidden input
)

This also means that in a template it shows up as a hidden input, and it isn’t included in form.visible_fields, but is included in form.hidden_fields – the more you know!

Make sure your continuous testing is continuous

One of the key features of the Aristotle Metadata Registry is its extensive test suite. Every time code is checked in, the test suites are run and about 20 minutes later I get a notification saying everything is fine… or so it should be.

I recently made a small change to the test suite that altered no code and just changed some of the reporting. This shouldn’t have changed how the tests were run, so they should have completed without problems, but this wasn’t the case.

After a short investigation, I discovered that a library used in the Aristotle admin interface had changed in a big way. Unfortunately, I haven’t been able to work on Aristotle as frequently as I’d like over the past few months, so this had gone completely unnoticed. Since the test environment is rebuilt every time the tests are run, it was using the most recent version of the library, while my code depended on an earlier version.

Since Aristotle is still in beta, the result wasn’t disastrous, but it still highlights (for me at least) an issue with relying on a green tick from the test suite saying everything is alright – because while the tests might be alright at that point in time, that’s prone to change.

So if you have to put down a project for a few weeks, or longer, make sure to nudge your code periodically, just to make sure everything is still running ok.

As for how it was fixed, a short alteration to the requirements file got the tests passing again, and a newer version that incorporates the updated library will be coming shortly.
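The lesson, in requirements terms, is to pin the versions your code was actually tested against and upgrade deliberately rather than by accident. The package name below is a placeholder, not Aristotle’s actual dependency:

# requirements.txt
# some-admin-library           # unpinned: a surprise release can break the build
some-admin-library==1.6.2      # pinned: tests run against a known version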

Django-spaghetti-and-meatballs now available on pypi and through pip

The title covers most of it: django-spaghetti-and-meatballs (a little library I was making for producing entity-relationship diagrams from Django models) is now packaged and available on PyPI, which means it can be installed via pip super easily:

pip install django-spaghetti-and-meatballs

There is even documentation for django-spaghetti-and-meatballs available on ReadTheDocs, so it’s all super stable and ready to use. So get it while it’s still fresh!

There is a live demo on the Aristotle Metadata Registry site, or you can check out the static version below:

A sample ERD

Two new projects for data management with django

I’ve recently been working on two new projects through work that I’ve been able to make open source. These are designed to make data and metadata management with Django much easier to do. While I’m not in a position to talk about the main work yet, I can talk about the libraries that have sprung out of it:

The first is the “Django Data Interrogator”, a Django app for 1.7 and up that allows anyone to create tables of information from a database that stores Django models. I can see this being handy when you are storing lists of people, products or events and want to produce ad-hoc reports like “people with the number of sales made” or “products with the highest sales, grouped by region”. At this stage this is done by giving a list of relations from a base ‘class’; more information is available on the Git repo. I should apologise to a more well-known project with the same acronym – I didn’t pick the name, and will never acronymise this project.

The second is “Django Spaghetti and Meatballs”, a tool to produce ERD-like diagrams from Django projects that, depending on the colours and number of models, looks kind of like a plate of spaghetti. Once given a list of Django apps, it mines the Django content types table and produces an interactive JavaScript representation using the lovely VisJs library. This has been really useful for prototyping the database: while Django code is very readable, as the number of models and cross-app connections grew, this gave us a good understanding of how the wider picture looked. The other big advantage is that it uses Python docstrings, Django help text and field definitions to produce all the text in the diagrams. The example below shows a few models in three apps: Django’s built-in Auth models, and the Django notifications and revisions apps:

A graph of django models

A sample plate of spicy meatballs – Ingredients: Django Auth, Notifications and Revisions
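For anyone who wants to try it, configuration is a small settings dictionary listing the apps to graph. This is from memory, so double-check the key names against the documentation on ReadTheDocs; the app labels below match the example above:

SPAGHETTI_SAUCE = {
    'apps': ['auth', 'notifications', 'reversion'],  # app labels to include in the diagram
    'show_fields': False,  # hide individual fields to keep larger graphs readable
    'exclude': {},         # optionally exclude specific models per app
}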

Database caching and unexplained errors in Django tests

Unit testing is a grand idea, and Django offers the capability to test code to an extraordinary degree. But no-one ever wrote a blog post to talk about how great things are…

This is a post about the worst kind of test failure: the intermittent ones that are hard to replicate because they sometimes show up and sometimes don’t.

Below is a test on a Django item we’ve just retrieved from the database:

from django.test import TestCase
from myapp import models  # wherever Foo is defined

class TestFoo(TestCase):
    def test_foo(self):
        foo = models.Foo.objects.create(bar=1)  # create and save an item
        call_some_method_that_sets_bar(foo, bar=2)
        self.assertTrue(foo.bar == 2)

def call_some_method_that_sets_bar(foo_obj, bar):
    f = models.Foo.objects.get(pk=foo_obj.pk)  # a new Python object for the same row
    f.bar = bar
    f.save()

Everything here seems OK, and looks like it should work. But it might not, for unclear reasons. In test_foo an item is created and saved in the database, and instantiated as a Python object. The call to call_some_method_that_sets_bar then changes the field in the database, setting the value of bar to 2, and finally we assert that foo.bar is 2.

Again it seems correct, but despite representing the same row in the database, the Python object f in call_some_method_that_sets_bar is a different Python object to foo in test_foo, and since we haven’t refetched foo from the database, it still has the original value of bar it was instantiated with.

The solution, if you are getting unexplained assertion failures when manipulating objects? Re-instantiate them. You might increase your test lengths, but it means you are dealing with fresh objects:

class TestFoo(TestCase):
    def test_foo(self):
        foo = models.Foo.objects.create(bar=1)  # create an item
        call_some_method_that_sets_bar(foo, bar=2)
        foo = models.Foo.objects.get(pk=foo.pk)  # refetch a fresh copy of the row
        self.assertTrue(foo.bar == 2)

Here we use Django’s internal pk field to refetch it, regardless of what other fields exist.
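As an aside, if you’re on a Django version that has it (1.8 or newer, if memory serves), Model.refresh_from_db() does the same refetch in place without creating a new object:

foo.refresh_from_db()          # reload foo's fields from the database
self.assertTrue(foo.bar == 2)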

The unspoken financial benefits of open-source software

Stone soup (image: https://commons.wikimedia.org/wiki/File:Stone-soup-ii-pawn-nitichan.jpg)

I have recently been applying for support from an employer for travel funding to attend the 2015 IASSIST conference to present the Aristotle Metadata Registry, and after adding up the cost I started thinking about the benefits that would justify this expense.

Since the call for comments went out, I’ve had two people offer to provide translations for Aristotle-MDR, and I started considering the unaccounted-for benefits I’ve already received. For argument’s sake, let’s consider a typical conference registration cost of $500 (AUD or USD), with accommodation and travel being another $1000. For attendance to be beneficial, you’d want to see at least $1500 in return.

I started by looking at professional translation costs, which can run as high as $100 per hour. So, if translating a portion of the project takes an hour, then for the two languages that are (or will soon be) available, I can say that Aristotle has received about $200 of volunteer effort. With this in mind, I started thinking about how little support needs to be rallied to quickly provide a return on an investment in attending a conference.

If we consider that freelance developers can be hired for about $50 an hour, this means that for our conference we’d need to get around 30 hours of work – not a small amount, especially when done for free. But broken down across multiple attendees this shrinks dramatically. If a talk is able to encourage moderate participation from as few as 3 people in an audience, this becomes 10 hours of work each. Spread again across the course of a year, that is under an hour a month!

Given the rough numbers above, convincing 3 attendees to provide an hour of work a month gives a very rough approximation of $1800 of service – a 20% return on investment.
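Laying that back-of-an-envelope arithmetic out explicitly, with all figures being the rough assumptions above:

conference_cost = 500 + 1000   # registration plus travel and accommodation
hourly_rate = 50               # rough freelance rate
hours_to_break_even = conference_cost / hourly_rate  # 30 hours
volunteers = 3
hours_per_volunteer_per_month = 1
yearly_value = volunteers * hours_per_volunteer_per_month * 12 * hourly_rate  # $1800
roi = (yearly_value - conference_cost) / float(conference_cost)               # 0.2, i.e. 20%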

Along with programming or user interface development, there are other metrics to consider when calculating the value generated from open source. As a developer, I know the intrinsic value of a well-written bug report, so even discovered bugs that lead to improvements are highly valuable for a project. This means that the number of filed and closed bugs can be used as a rough metric (albeit a very, very rough one) for positive contributions.

Ultimately, while there are strong ideological reasons for contributing to open-source software, when developing open-source projects within a business context these need to be backed by a solid financial rationale.

Request for comments/volunteers for the Aristotle Metadata Registry

This is a request for comments and volunteers for an open source ISO 11179 metadata registry I have been working on, called the Aristotle Metadata Registry (Aristotle-MDR). Aristotle-MDR is a Django/Python application that provides an authoring environment for a wide variety of 11179-compliant metadata objects, with a focus on being multilingual. As such, I’m hoping to raise interest from bug checkers, translators, experienced HTML and Python programmers, and data modellers for mapping ISO 11179 to DDI3.2 (and potentially other formats).

For the eager:

Background

Aristotle-MDR is based on the Australian Institute of Health and Welfare’s METeOR Registry, an ISO 11179-compliant authoring tool that manages several thousand metadata items for tracking health, community services, hospital and primary care statistics. I have undertaken the Aristotle-MDR project to build upon the ideas behind METeOR and extend it to improve compliance with 11179, but also to allow for access and discovery using other standards, including DDI and GSIM.

Aristotle-MDR is built on a number of existing open source frameworks, including Django, Haystack, Bootstrap and jQuery, which allows it to easily scale from mobile to desktop on the client side, and from small shared hosting to full-scale enterprise environments on the server side. Alongside the in-built authoring suite is the Haystack search platform, which allows for a range of search solutions, from enterprise search engines such as Solr or Elasticsearch to smaller-scale options.
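As an illustration of that search flexibility, switching Haystack backends is a settings change rather than a code change; the engine paths below are Haystack’s own, while the index path and URL are placeholders for a deployment to fill in:

# A small deployment using the pure-Python Whoosh backend
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': '/var/search/whoosh_index',
    },
}

# A larger deployment can point the same setting at Elasticsearch (or Solr) instead:
# HAYSTACK_CONNECTIONS = {
#     'default': {
#         'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
#         'URL': 'http://127.0.0.1:9200/',
#         'INDEX_NAME': 'aristotle',
#     },
# }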

The goal of Aristotle-MDR is to conform to the ISO/IEC 11179 standard as closely as possible, so while it has a limited range of metadata objects, much like the 11179 standard it allows for the easy extension and inclusion of additional items. Among those already available are extensions for:

Information on how to create custom objects can be found in the documentation: http://aristotle-metadata-registry.readthedocs.org/en/latest/extensions/index.html

Because users need to access information in many different ways, there is a download extension API that allows for the creation of a wide variety of download formats. Included is the ability to generate PDF versions of content from simple HTML templates, and an additional module allows for the creation of DDI3.2 (at the moment this supports only a small number of objects): https://github.com/aristotle-mdr/aristotle-ddi-utils

As mentioned, this is a call for comments and volunteers. First and foremost, I’d appreciate as much help as possible with my mapping of 11179 objects to DDI3.2 (or earlier versions), but also with translations for the user interface – which is currently available in English and Swedish (thanks to Olof Olsson). Partial translations into other languages are available thanks to translations in the Django source code, but additional translations of technical terms would be appreciated. More information on how to contribute translations is available on the wiki: https://github.com/aristotle-mdr/aristotle-metadata-registry/wiki/Providing-translations.

To aid with this I’ve added a few blank translation files in common languages. Once the repository is forked, it should be relatively straightforward to edit these in Github and send a pull request back without having to pull down the entire codebase. These are listed by ISO 639-1 code, and if you don’t see your own listed let me know and I can quickly pop a boilerplate translation file in.

https://github.com/aristotle-mdr/aristotle-metadata-registry/tree/master/aristotle_mdr/locale

If you find bugs or identify areas of work, feel free to raise them either by emailing me or by raising a bug on Github: https://github.com/aristotle-mdr/aristotle-metadata-registry/issues

Aristotle MetaData Registry now has a Github organisation

This weekend’s task has been upgrading Aristotle from a single-user repository to a Github organisation. The new Aristotle-MDR organisation holds the main code for the Aristotle Metadata Registry, but alongside that it also has the DDI Utilities codebase and some additional extensions, along with the new “Aristotle Glossary” extension.

This new extension pulls the Glossary code out of the core codebase to improve its status as a “pure” ISO/IEC 11179 implementation, as stated in the Aristotle-MDR mission statement. It will also provide additional Django post-save hooks to provide easy look-ups from glossary items to any item that requires the glossary item in its definition.

If you are curious about the procedure for migrating an existing project from a personal repository to an organisation, I’ve written a step-by-step guide on StackExchange that runs through all of the steps and potential issues.