Posts Tagged ‘ python

Make sure your continuous testing is continuous

One of the key features of the Aristotle Metadata Registry is its extensive test suite. Every time code is checked-in, the test suites are run and about 20 minutes later I get a notification saying everything is fine… or so it should be.

I recently made a small change to the test suite, that altered no code, and just changed some of the reporting. This shouldn’t have changed how the last tests were run, so they should have completed without problems, but this wasn’t the case.

After a short investigation, I discovered that a library that is used in the Aristotle admin interface had changed in a big way. Unfortunately, I haven’t been able to work on Aristotle as frequently as I’d like over the past few months, so this had gone completely unnoticed. Since the test environment is rebuilt every time the test are run, it was using the most recent version, while my code depended on an earlier version.

Since Aristotle is still in beta, the result wasn’t disastrous, however it still highlights (for me at least) an issue with relying on a green tick in the test suite saying everything is alright – because while the tests might be alright at that point in time, its prone to change.

So if you have to put down a project for a few weeks, or longer, make sure to nudge your code periodically, just to make sure everything is still running ok.

As for how it was fixed, a short alteration to the requirements file got the tests passing again, and a newer version that incorporates the updated library will be coming shortly.

Two new projects for data management with django

I’ve recently been working on two new projects through work that I’ve been able to make open source. These are designed to make data and metadata management with Django much easier to do. While I’m not in a position to talk about the main work yet, I can talk about the libraries that have sprung out of it:

The first is the “Django Data Interrogator” which is a Django app for 1.7 and up that allows anyone to create tables of information from a database that stores django models. I can see this being is handy when you are storing lists of people, products or events and want to be able to produce ad-hoc reports similar to “People with the number of sales made”, “Products with the highest sales, grouped by region”. At this stage this is done by giving a list of relations from a base ‘class’, more information is available on the Git repo. I should give apologies to a more well known project with the same acronym – I didn’t pick the name, and will never acronymise this project.

The second is “Django Spaghetti and Meatballs” which is a tool to produce ERD-like diagrams from Django projects – that depending on the colors, and number of models ,looks kind of like a plate of spaghetti. Once given a list of Django apps, this mines the Django content types table and produces an interactive javascript representation, using the lovely VisJs library. This has been really useful for prototyping the database, as while Django code is very readable, as the number of models and cross-app connections grew, this gave us a good understanding of how the wider picture looked. The other big advantage is that this uses Python docstrings, Django help text and field definitions to produce all the text in the diagrams. The example below shows a few models in three apps: Django’s build in Auth models, and the Django notifications and revision apps:

A graph of django models

A sample plate of spicy meatballs – Ingredients: Django Auth, Notifications and Revisions

Database caching and unexplained errors in Django tests

Unit testing is a grand idea, and Django offers an extraordinary capability to test code to an extraordinary degree. But no-one ever wrote a blog post to talk about how great things are…

This is a post about the worst kind of test failure. The intermittent ones, that are hard to replicate as they sometimes show up and sometimes don’t.

Below is a test on an jango item we’ve just retrieved from the database:

class TestFoo:
    def test_foo(self):
        foo = models.Foo.objects.create(bar=1) # create an item
        call_some_method_that_sets_bar(foo,bar = 2)
        self.assertTrue( == 2)
def call_some_method_that_sets_bar(foo_obj,bar):
    f = models.Foo.objects.get( = bar

Everything here seems ok, and seems like it should work. But it might not, for unclear reasons. On line 3 an item is made and saved in the database, and then instantiated as a python object. On line 4, we change a field in the database that sets the value of bar to 2 and then we check that on line 5.

Again it seems correct, but, despite representing the same row in the database, the python object f in
call_some_method_that_sets_bar is a different python object to foo in test_foo and since we haven’t refetched foo from the database, it still has the original value for bar it was instantiated with.

The solution, if you are getting unexplained assertion failures when you are manipulating objects? Re-instantiate them, you might increase your test lengths but it means you are dealing with fresh objects:

class TestFoo:
    def test_foo(self):
        foo = models.Foo.objects.create(bar=1) # create an item
        call_some_method_that_sets_bar(foo,bar = 2)
        foo = models.Foo.objects.get(
        self.assertTrue( == 2)

Here we use Djangos internal pk field, to refetch it, regardless of what other fields exist.

Request for comments/volunteers for the Aristotle Metadata Registry

This is a request for comments and volunteers for an open source ISO 11179 metadata registry I have been working on called the Aristotle Metadata Registry (Aristotle-MDR). Aristotle-MDR is a Django/Python application that provides an authoring environment for a wide variety of 11179 compliant metadata objects with a focus to being multilingual. As such, I’m hoping to raise interest around bug checkers, translators, experienced HTML and Python programmers and data modelers for mapping of ISO 11179 to DDI3.2 (and potentially other formats).

For the eager:


Aristotle-MDR is based on the Australian Institute of Health and Welfare’s METeOR Registry, an ISO 11179 compliant authoring tool that manages several thousand metadata items for tracking health, community services, hospital and primary care statistics. I have undertaken the Aristotle-MDR project to build upon the ideas behind Meteor, and extend it to improve compliance with 11179, but to also allow for access and discovery using other standards, including DDI and GSIM.

Aristotle-MDR is build on a number of existing open source frameworks, including Django, Haystack, Bootstrap and jQuery which allows it to easily scale from mobile to desktop on the client side, and scale from small shared hosting to full-scale enterprise environments on the server side. Along with the in-built authoring suite is the Haystack search platform which allows for a range of searching solutions from enterprise search such as Solr or Elastisearch, to smaller scale search engines.

The goal of the Aristotle-MDR is to conform to the ISO/IEC 11179 standard as closely as possible, so while it has a limited range of metadata objects, much like the 11179 standard it allows for the easy extension and inclusion of additional items. Among those already available, are extensions for:

Information on how to create custom objects can be found in the documentation:

Due to the wide variety of needs for users to access information, there is a download extension API that allows for the creation of a wide variety of download formats. Included is the ability to generate PDF versions of content from simple HTML templates, but an additional module allows for the creation of DDI3.2 (at the moment this supports a small number of objects only):

As mentioned, this is a call for comments and volunteers. First and foremost I’d appreciate as much help as possible with my mapping of 11179 objects in DDI3.2 (or earlier versions), but also with the translations for the user interface – which is currently available in English and Swedish (thanks to Olof Olsson). Partial translations into other languages are available thanks to translations in the Django source code, but additional translations around technical terms would be appreciated. More information on how to contribute to translating is available on the wiki:

To aid with this I’ve added a few blank translation files in common languages. Once the repository is forked, it should be relatively straightforward to edit these in Github and send a pull request back without having to pull down the entire codebase. These are listed by ISO 639-1 code, and if you don’t see your own listed let me know and I can quickly pop a boilerplate translation file in.

If you find bugs or identify areas of work, feel free to raise them either by emailing me or by raising a bug on Github:

Beginning the soft launch of SQBL and Canard

Over the past week I’ve start finalising a version of Canard and SQBL ready for early-Beta testing and public review ahead of IASSIST2013. While I’ll be putting together more documentation later in the week, the first of a series of short tutorials on how Canard will eventually be used.

Also, later this week will see the source code for Canard as shown in the below video released on GitHub, as well as a beta binary for easy of use during testing. For now the SQBL schemas can be seen on GitHub and the main SQBL website contains more information. For now, enjoy the two videos below to see how a strict structure can make questionnaire design easier than ever before!

How you can and why you should learn to program.

Often people ask me how I learned to program and why I did. The answer is simple, lots of practice because its a useful skill – and it also helps that I enjoy it.

Not everyone will find programming fun or interesting, however at more than one time people will come up against a problem that computers were made to do. Contrary to popular believe computers are dumb – at least in the sense that they can only do what they are told. What they can do, is they can do these dumb things very, very quickly. So much so, that they can fool you into believing they are smart – more than smart, magic even. In fact, if you are even a mediocre programmer people can become convinced you are a magician.

So why should you learn to program?

Mostly, because your time is valuable. Not to me, but to you. Unlike a computer your time is finite, and if you can make a machine that can give you more time to do something, isn’t that in your interest? Even if it isn’t in work, if its just sorting your taxes, or writing a script to check your email for you, there are plenty of small, repetative tasks that you probably do that a machine can do quicker. If you enjoy doing repetitive tasks, then there isn’t much I can do for you. But if you want to spend more time understanding why you do these things, then read on…

Where do you begin?

Well, I think, there are 3 programs every one must be able to write. Because if you can write these 3 programs and adapt them to your needs, you can do most big, boring tasks that will come your way.

There are 3 programs you need to learn to start to become a programmer and as I explain them, I’ll show you a brief example to edit and play with and ultimately understand what is going on. These examples are written in Python, a free programming language with a very user friendly syntax.

Hello, World!

“Hello world” is traditionally the first program many new programmers will write. It is simple, when the program is run the computer prints “Hello, World!”. In essence, this simple program introduces programming syntax and demonstrates how to display text to a user.

print "Hello, World!"

There isn’t much to this, but its a starting point. It teaches you some basic syntax and with a lot of languages understanding syntax is important – computers don’t speak English and to make them useful you need to learn to talk to them, more than the other way around.

Simple user interaction

The second is a simple string manipulator. This goal is to create a simple, persistent user interface, with some error checking that fulfills a task. Here we see an example that does actions on a given string based on a command given to it.

while True:
        input = raw_input("> ")
                cmd,text = input.split(":",1)
                if cmd == "uc":
                        print text.upper()
                elif cmd == "lc":
                        print text.lower()
                elif cmd == "rev":
                        print text
                elif cmd == "quit":
                        print "Bye"
                        print "Command not recognised"
                print "Syntax error: enter a command, a colon (:), then a string"

Firstly, we start the loop and set it to never stop looping. As long as the user wants to play with strings this program will keep going. Next we ask for some input from the user.
Now things get a little more complex, first we try and split the string around a colon, into a command and the text. If the user doesn’t enter a colon, we throw an error, and give them some help text (after the line that says “except:”.
If they do enter a command and text, we set the text to upper case, lower case or reverse it. If they tell us to “quit:” we quit by breaking out of the loop (the break command) or we tell them we didn’t recognise the command.

Its not perfect, but it gives us an understanding of errors, handling user input and basic user interaction – not bad for 18 lines of code.

File manipulation

The last and most important program is a simple file manipulation tool. Again, what the tool does to the file is irrelevant- it might merge files, look for spelling errors, count lines, anything. Perhaps, we are looking for entries in a diary that start with numbers (like dates) in a large file, and only want to view these.

file = open('test')
for line in file:
    if line[0].isdigit():
        print line

Here we open the file, and then line by line we search through it. When we find a line whose first character is a digit we print the line (line[0] essentially means the 0th character from the start – its complicated but its how almost everyone deals with subscripting in lists). Lastly, we clean everything up by closing the file.

Again, by no means the best implementation, but easy enough to read and alter. This time in 5 lines we have a simple script that could help use find our tax information, search our diary, lookup phone numbers, or anything like this.

I’ve done this, what now?

Do whatever task it is that needs doing. Odds are having read these short code snippets you can get an idea of what can be done. Its just a matter of getting out and doing it. You may not be the next Bill Gates/Mark Zuckerburg/whoever, but if you learn to paint a wall you won’t be the next Da Vinci either. Learn how to do what you need to do, and keep plugging away. Programming is about pulling together pieces of logic, to help us do simple tasks easily and reproducibly. So think lazy, and find all the tasks that you can automate and get something else to do them for you.

Virgil UI 0.0.1 Beta now live!!

After months of development, testing, coding and crying…. Virgil UI version 0.0.1b is now available for public beta testing.

This release sees the first public testing of a full-functional, classification and codelist specific editor based on and supporting the DDI Lifecycle XML format (DLML).

Features in this release of Virgil include:

Known issues in the 0.0.1b release that  will be fixed in a future release:

  • Codes or languages cannot be removed once added.
  • New CodeSchemes cannot be added manually, only when importing from CSV.
Also new is an updated version of the standalone CSV to DDI converter tool that fixes some outstanding bugs in multilingual imports and corrects a few mistakes when writing the DLML.
For more information on Virgil-UI there is a list of blog post outlining the development process, or you can checkout the Google Code page, view all the downloads, or submit bugs.

Updates to the Virgil CSV to DDI Converter

A short and sweet update:

There was an oversight with the CSV converter not converting coded values to the proper place in the created DDI XML. This has been fixed and the changes have been pushed into SVN and a new version (0.0.2b) of the executable has been released on Google Code.