Make sure your continuous testing is continuous

One of the key features of the Aristotle Metadata Registry is its extensive test suite. Every time code is checked in, the test suites are run, and about 20 minutes later I get a notification saying everything is fine… or so it should be.

I recently made a small change to the test suite that altered no code and just changed some of the reporting. This shouldn’t have changed how the tests were run, so they should have completed without problems, but this wasn’t the case.

After a short investigation, I discovered that a library used in the Aristotle admin interface had changed in a big way. Unfortunately, I haven’t been able to work on Aristotle as frequently as I’d like over the past few months, so this had gone completely unnoticed. Since the test environment is rebuilt every time the tests are run, it was using the most recent version of the library, while my code depended on an earlier version.

Since Aristotle is still in beta, the result wasn’t disastrous. However, it still highlights (for me at least) an issue with relying on a green tick in the test suite saying everything is alright: while the tests might be alright at that point in time, that is prone to change.

So if you have to put down a project for a few weeks, or longer, make sure to nudge your code periodically, just to make sure everything is still running ok.

As for how it was fixed, a short alteration to the requirements file got the tests passing again, and a newer version that incorporates the updated library will be coming shortly.
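One way to catch this kind of drift earlier is to check that every line in the requirements file pins an exact version. The following is a minimal sketch rather than what Aristotle actually does: the function name and the package names in the example are made up for illustration, and it only recognises simple `name==version` lines.

```python
import re

# Matches a requirement pinned to one exact version, e.g. "somepackage==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+==[^=\s]+$")


def unpinned_requirements(requirements_text):
    """Return the requirement lines that are not pinned with '=='."""
    loose = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical requirements file contents, not Aristotle's real ones.
example = """
django>=1.7
some-admin-library
another-library==0.4.2  # pinned, so a new release cannot change it silently
"""

print(unpinned_requirements(example))
```

Anything this reports is a dependency that a freshly rebuilt test environment is free to upgrade without warning.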

Django-spaghetti-and-meatballs now available on PyPI and through pip

The title covers most of it: django-spaghetti-and-meatballs (a little library I was making for producing entity-relationship diagrams from Django models) is now packaged and available on PyPI, which means it can be installed via pip super easily:

pip install django-spaghetti-and-meatballs

There is even documentation for django-spaghetti-and-meatballs available on ReadTheDocs, so it’s all super stable and ready to use. So get it while it’s still fresh!

There is a live demo on the Aristotle Metadata Registry site, or you can check out the static version below:

A sample erd

Two new projects for data management with Django

I’ve recently been working on two new projects through work that I’ve been able to make open source. These are designed to make data and metadata management with Django much easier to do. While I’m not in a position to talk about the main work yet, I can talk about the libraries that have sprung out of it:

The first is the “Django Data Interrogator”, which is a Django app for 1.7 and up that allows anyone to create tables of information from a database that stores Django models. I can see this being handy when you are storing lists of people, products or events and want to be able to produce ad-hoc reports similar to “People with the number of sales made” or “Products with the highest sales, grouped by region”. At this stage this is done by giving a list of relations from a base ‘class’; more information is available on the Git repo. I should give apologies to a more well known project with the same acronym – I didn’t pick the name, and will never acronymise this project.

The second is “Django Spaghetti and Meatballs”, which is a tool to produce ERD-like diagrams from Django projects that, depending on the colors and number of models, look kind of like a plate of spaghetti. Once given a list of Django apps, this mines the Django content types table and produces an interactive JavaScript representation, using the lovely VisJs library. This has been really useful for prototyping the database: while Django code is very readable, as the number of models and cross-app connections grew, this gave us a good understanding of how the wider picture looked. The other big advantage is that this uses Python docstrings, Django help text and field definitions to produce all the text in the diagrams. The example below shows a few models in three apps: Django’s built-in Auth models, and the Django notifications and revision apps:

A graph of django models

A sample plate of spicy meatballs – Ingredients: Django Auth, Notifications and Revisions

Database caching and unexplained errors in Django tests

Unit testing is a grand idea, and Django makes it possible to test code to an extraordinary degree. But no-one ever wrote a blog post to talk about how great things are…

This is a post about the worst kind of test failure: the intermittent kind, which is hard to replicate as it sometimes shows up and sometimes doesn’t.

Below is a test on a Django item we’ve just retrieved from the database:

from django.test import TestCase
from myapp import models  # "myapp" is a placeholder for the app that defines Foo

class TestFoo(TestCase):
    def test_foo(self):
        foo = models.Foo.objects.create(bar=1)  # create an item
        call_some_method_that_sets_bar(foo, bar=2)
        self.assertEqual(foo.bar, 2)

def call_some_method_that_sets_bar(foo_obj, bar):
    f = models.Foo.objects.get(pk=foo_obj.pk)  # a second object for the same row
    f.bar = bar
    f.save()

Everything here seems ok, and seems like it should work. But it might not, for unclear reasons. The first statement in test_foo creates an item, saves it in the database and instantiates it as a Python object. The second changes a field in the database, setting the value of bar to 2, and the assertion then checks for that value.

Again it seems correct, but despite representing the same row in the database, the Python object f in call_some_method_that_sets_bar is a different Python object to foo in test_foo. Since we haven’t refetched foo from the database, it still has the original value for bar that it was instantiated with.

The solution, if you are getting unexplained assertion failures when you are manipulating objects? Re-instantiate them. You might increase the length of your tests, but it means you are dealing with fresh objects:

class TestFoo(TestCase):
    def test_foo(self):
        foo = models.Foo.objects.create(bar=1)  # create an item
        call_some_method_that_sets_bar(foo, bar=2)
        foo = models.Foo.objects.get(pk=foo.pk)  # refetch the row as a fresh object
        self.assertEqual(foo.bar, 2)

Here we use Django’s internal pk field to refetch it, regardless of what other fields exist.
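The same effect can be reproduced without Django at all. The sketch below uses Python's built-in sqlite3 module and a hand-rolled Foo class as a stand-in for a model instance; the create and get helpers are illustrative substitutes for the ORM calls, not Django's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (pk INTEGER PRIMARY KEY, bar INTEGER)")


class Foo:
    """A tiny stand-in for a model instance: a snapshot of one row."""

    def __init__(self, pk, bar):
        self.pk = pk
        self.bar = bar


def create(bar):
    # Insert a row and return a Python object describing it.
    cursor = conn.execute("INSERT INTO foo (bar) VALUES (?)", (bar,))
    return Foo(cursor.lastrowid, bar)


def get(pk):
    # Build a brand new Python object from the current row contents.
    row = conn.execute("SELECT pk, bar FROM foo WHERE pk = ?", (pk,)).fetchone()
    return Foo(*row)


def call_some_method_that_sets_bar(foo_obj, bar):
    # Changes the row in the database without touching foo_obj itself.
    conn.execute("UPDATE foo SET bar = ? WHERE pk = ?", (bar, foo_obj.pk))


foo = create(bar=1)
call_some_method_that_sets_bar(foo, bar=2)
stale_bar = foo.bar          # still the value foo was instantiated with
fresh_bar = get(foo.pk).bar  # refetched, so it reflects the update
print(stale_bar, fresh_bar)
```

On Django 1.8 and later, calling foo.refresh_from_db() reloads the fields on the existing object, which saves rebinding the variable.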

The unspoken financial benefits of open-source software

Stone soup

I have recently been applying for support from an employer for travel funding to attend the 2015 IASSIST conference to present the Aristotle Metadata Registry, and after adding up the cost, I started thinking about the benefits that would justify this expense.

Since the call for comments went out I’ve had two people offer to provide translation for Aristotle-MDR, and I started considering the unaccounted-for benefits I’ve already received. For argument’s sake, let’s consider a typical conference registration cost of $500 (AUD or USD) with accommodation and travel being another $1000. For attendance to be beneficial, you’d want to be able to see at least $1500 in return.

I started by looking at professional translation costs, which can be as high as $100 per hour. So, if translating a portion of the project takes an hour, for the two languages that are (or will soon be) available, I can say that Aristotle has received about $200 of volunteer effort. With this in mind, I started thinking about how little support needs to be rallied to quickly provide a return on an investment in attending a conference.

If we consider that freelance developers can be hired for about $50 per hour, this means that for our conference, we’d need to get around 30 hours of work – not a small amount, especially when done for free. But broken down across multiple attendees this shrinks dramatically. If a talk is able to encourage moderate participation from as few as 3 people in an audience, this becomes 10 hours of work each. Spread again across the course of a year, this is under an hour a month!

Given the rough numbers above, convincing 3 attendees to provide an hour of work a month gives a very rough approximation of $1800 of service – a 20% return on investment.
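The arithmetic behind those figures, written out as a short calculation. All of the inputs are the rough estimates used above; the variable names are only there for illustration.

```python
# Figures taken from the rough estimates in this post.
conference_cost = 500 + 1000   # registration plus travel and accommodation
hourly_rate = 50               # freelance developer rate, dollars per hour
volunteers = 3
hours_per_month = 1
months = 12

# Hours of donated work needed to cover the cost of attending.
break_even_hours = conference_cost / hourly_rate

# Value of three people each giving an hour a month for a year.
value_of_work = volunteers * hours_per_month * months * hourly_rate

# Return expressed as a fraction of the original outlay.
return_on_investment = (value_of_work - conference_cost) / conference_cost

print(break_even_hours)        # 30.0
print(value_of_work)           # 1800
print(return_on_investment)    # 0.2
```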

Along with programming or user interface development, there are other metrics when calculating the value generated from open-source. As a developer, I know the intrinsic value of a well written bug report, so even discovered bugs that lead to improvements are highly valuable for a project. This means that numbers of filed and closed bugs can be used as a rough metric (albeit a very, very rough metric) for positive contributions.

Ultimately, while there are strong ideological reasons for contributing to open-source, when developing for open-source projects within a business context these need to be offset with a solid financial rationale.

Request for comments/volunteers for the Aristotle Metadata Registry

This is a request for comments and volunteers for an open source ISO 11179 metadata registry I have been working on called the Aristotle Metadata Registry (Aristotle-MDR). Aristotle-MDR is a Django/Python application that provides an authoring environment for a wide variety of 11179 compliant metadata objects, with a focus on being multilingual. As such, I’m hoping to raise interest among bug checkers, translators, experienced HTML and Python programmers, and data modelers for mapping of ISO 11179 to DDI3.2 (and potentially other formats).

For the eager:


Aristotle-MDR is based on the Australian Institute of Health and Welfare’s METeOR Registry, an ISO 11179 compliant authoring tool that manages several thousand metadata items for tracking health, community services, hospital and primary care statistics. I have undertaken the Aristotle-MDR project to build upon the ideas behind METeOR and extend it, both to improve compliance with 11179 and to allow for access and discovery using other standards, including DDI and GSIM.

Aristotle-MDR is built on a number of existing open source frameworks, including Django, Haystack, Bootstrap and jQuery, which allows it to easily scale from mobile to desktop on the client side, and from small shared hosting to full-scale enterprise environments on the server side. Along with the in-built authoring suite is the Haystack search platform, which allows for a range of searching solutions from enterprise search such as Solr or Elasticsearch, to smaller scale search engines.

The goal of Aristotle-MDR is to conform to the ISO/IEC 11179 standard as closely as possible, so while it has a limited range of metadata objects, much like the 11179 standard it allows for the easy extension and inclusion of additional items. Among those already available are extensions for:

Information on how to create custom objects can be found in the documentation:

Due to the wide variety of needs for users to access information, there is a download extension API that allows for the creation of a wide variety of download formats. Included is the ability to generate PDF versions of content from simple HTML templates, but an additional module allows for the creation of DDI3.2 (at the moment this supports a small number of objects only):

As mentioned, this is a call for comments and volunteers. First and foremost I’d appreciate as much help as possible with my mapping of 11179 objects in DDI3.2 (or earlier versions), but also with the translations for the user interface – which is currently available in English and Swedish (thanks to Olof Olsson). Partial translations into other languages are available thanks to translations in the Django source code, but additional translations around technical terms would be appreciated. More information on how to contribute to translating is available on the wiki:

To aid with this I’ve added a few blank translation files in common languages. Once the repository is forked, it should be relatively straightforward to edit these in GitHub and send a pull request back without having to pull down the entire codebase. These are listed by ISO 639-1 code, and if you don’t see your own listed let me know and I can quickly pop a boilerplate translation file in.

If you find bugs or identify areas of work, feel free to raise them either by emailing me or by raising a bug on GitHub:

Aristotle Metadata Registry now has a GitHub organisation

This weekend’s task has been upgrading Aristotle from a single user repository to a GitHub organisation. The new Aristotle-MDR organisation holds the main code for the Aristotle Metadata Registry, but alongside that it also has the DDI Utilities codebase and some additional extensions, along with the new “Aristotle Glossary” extension.

This new extension pulls the Glossary code base out of the core code to improve its status as a “pure” ISO/IEC 11179 implementation, as stated in the Aristotle-MDR mission statement. It will also provide additional Django post-save hooks to provide easy look-ups from Glossary items to any item that requires the glossary item in its definition.

If you are curious about the procedure for migrating an existing project from a personal repository to an organisation, I’ve written a step-by-step guide on StackExchange that runs through all of the steps and potential issues.

Aristotle-Metadata-Registry – My worst kept secret

About 6 months ago I stopped frequently blogging, as I began work on a project that was not quite ready for a wider audience, but today that period comes to a close.

Over the past year, I have been working on a new piece of open-source software – an ISO/IEC 11179 metadata registry. This originally began from my experiences working on the Meteor Metadata Registry, which gave me an in-depth understanding of the systems and governance issues around the management of metadata across large scale organisations. I believe Aristotle-MDR provides one of the closest open-source implementations of the information model of Part 6 and the registration workflows of Part 3, in an easy to use and install piece of open-source software.

In that time, Aristotle-MDR has grown to several thousand lines of code, most substantially over 5000 lines of rigorously tested Python code, tested using a suite of over 500 regression tests, and rich documentation covering installation, configuration and extension. From a front-end perspective, Aristotle-MDR uses the Bootstrap, CKEditor and jQuery libraries to provide a seamless, responsive experience; the use of the Haystack search engine provides scalable and accurate search capability, while custom wizards encourage the discovery and reuse of metadata at the point of content creation.

One of the guiding principles of Aristotle-MDR has been to not only model 11179 in a straightforward fashion, but to do so in a way that complies with the extension principles of the standard itself. To this end, while the data model of Aristotle-MDR is and will remain quite bare-bones, it provides a robust, tested framework on which extensions can be built. Already a number of such extensions are being built, including those for the management of datasets, questionnaires, and performance indicators and for the sharing of information in the Data Documentation Initiative XML Format.

In the last 12 months, I have learned a lot as a systems developer, had the opportunity to contribute to several Django-based projects and look forward to sharing Aristotle, especially at IASSIST 2015 where I aim to present Aristotle-MDR as a stable 1.0 release. In the interim, there is a demonstration server for Aristotle available, with two guest accounts and a few hundred example items for people to use, test and possibly break.

The public release of “A Case Against the Skip Statement”

A few years ago I wrote a paper titled “A Case Against the Skip Statement” on the logic construction of questionnaires that was awarded second place in the 2012 Young Statisticians Awards of the International Association for Official Statistics.

It went through two or three rounds of review over the course of a year, but due to shifting organisational aims, I was never able to get the time to polish it to the point of publication before changing jobs. So for the past few years I have quietly emailed it around, received some positive feedback and gotten a few requests to have it published so it could be cited. I have even referred back to it myself in conferences and other papers, but never formally cited it. I have also used this article as a reason why study of ‘classical’ articles in computer science is still important, for the simple fact that while Dijkstra’s “Go To Statement Considered Harmful” is outdated in traditional computer science, its methods and mathematical and logical reasoning can still be useful, as seen in the comparison of programming languages and the logic of questionnaires.

As a compromise to those requests, I released the full text online, with references and a ready-to-use BibTeX citation for those who are interested. The abstract follows the BibTeX reference:

    title = {A Case Against the Skip Statement},
    author ={Samuel Spencer},
    year = 2012,
    howpublished = {\url{}},
    note = {[Date downloaded]}

or using BibLaTeX:

   author ={Samuel Spencer},
   title ={A Case Against the Skip Statement},
   year = 2012,
   url ={},
   urldate ={[Date downloaded]}

With statistical agencies facing shrinking budgets and a desire to support evidence-based policy in a rapidly changing world, statistical surveys must become more agile. One possible way to improve productivity and responsiveness is through the automation of questionnaire design, reducing the time necessary to produce complex and valid questionnaires. However, despite computer enhancements to many facets of survey research, questionnaire logic is often managed using templates that are interpreted by specialised staff, reducing efficiency. It must then be asked why, in spite of such benefits, is automation so difficult?

This paper suggests that the stalling point of further automation within questionnaire design is the ‘skip statement’. An artifact of paper questionnaires, skip statements are still used in the specification of computer-aided instruments, complicating the understanding of questionnaires and impeding their transition to computer systems. By examining questionnaire logic in isolation we can analyse the structural similarity to computer programming and examine the applicability of hierarchical patterns described in the structured programming theorem, laying a foundation for more structured patterns in questionnaire logic, which in time will help realise the benefits of automation.

Making a login badge with font-awesome

On a new site I’m building I was looking for a way to include a nice login badge, similar to those on Google login pages. In fact I found some nice looking Bootstrap login templates that actually included the Google login image below directly.

Google login guy

Turns out the image is actually a square with a CSS border-radius applied, so given that the page already loads the whole set of font-awesome icons, I wondered if it was possible to replicate this without loading an image… and it is.

Font-awesome login guy

There were issues getting the user silhouette to sit nicely over a standard font-awesome circle, so I went down the route of using the border radius. The colours don’t match exactly, but the advantage of this approach is the colours can be customised to match any theme very quickly. The complete code is pretty straightforward and the gist is below: