Posts Tagged ‘eddi’

Always double check the standard before writing code

A few weeks ago, I had the privilege of presenting at a collection of DDI Developers in Gothenburg at EDDI. There I presented one of my larger pieces of work, the Virgil-UI DDI Codelist Editor, for critique. While there I received advice, praise and most importantly constructive criticism for which I am grateful. However, this has brought to light a rather large problem.

It was pointed out that I had made a small error when dealing with <Code> elements in DDI: I had given them @id attributes, which the standard does not allow. It was also noted that this should be an easy fix. Unfortunately, because I missed this very early in the development of Virgil, the underlying model relies on Codes having ids to make connections between the hierarchical user interface, the <Code>s and the <Category>s that give them meaning.

What this means is that the DDI coming out of Virgil is invalid, and valid DDI from other sources cannot be read by Virgil. Essentially, the Virgil model for handling DDI is broken and needs to be almost entirely rewritten, and this might take quite a while.
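To make the problem concrete, here is a minimal sketch rather than Virgil's actual code. It assumes a simplified, namespace-free fragment loosely modelled on DDI 3.1, in which each <Code> points at its <Category> through a CategoryReference; the sample data and both function names are hypothetical. It flags any <Code> that carries an id, and links codes to categories without relying on one.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment (an assumption for illustration);
# real DDI 3.1 uses namespaces and additional wrapper elements.
SAMPLE = """
<LogicalProduct>
  <CategoryScheme id="cats">
    <Category id="cat_m"><Label>Male</Label></Category>
    <Category id="cat_f"><Label>Female</Label></Category>
  </CategoryScheme>
  <CodeScheme id="codes">
    <Code>
      <CategoryReference><ID>cat_m</ID></CategoryReference>
      <Value>1</Value>
    </Code>
    <Code id="bad">
      <CategoryReference><ID>cat_f</ID></CategoryReference>
      <Value>2</Value>
    </Code>
  </CodeScheme>
</LogicalProduct>
"""


def codes_with_ids(root):
    """Return the Value text of every Code that carries an id attribute."""
    return [
        code.findtext("Value")
        for code in root.iter("Code")
        if "id" in code.attrib
    ]


def link_codes_to_categories(root):
    """Map each code value to its category label via the CategoryReference."""
    labels = {
        cat.get("id"): cat.findtext("Label")
        for cat in root.iter("Category")
    }
    return {
        code.findtext("Value"): labels.get(code.findtext("CategoryReference/ID"))
        for code in root.iter("Code")
    }


root = ET.fromstring(SAMPLE)
print(codes_with_ids(root))            # codes that would need fixing
print(link_codes_to_categories(root))  # code value -> category label
```

The link itself survives without Code ids. What is lost is a stable handle on each individual Code for the user interface, which is the part of the Virgil model that has to be rethought.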

Unfortunately, at this stage rewriting also means re-examining a lot of the initial ideas about what Virgil should be and has highlighted some interesting questions about the DDI model and DDI software, such as:

  1. Is abstracting the DDI model away from a user a good approach to software design? Yes.
    This was the crux of my talk at EDDI, and I still feel that abstracting the DDI model away from day-to-day users is necessary. The DDI model is complex and covers a wide range of tasks. I believe that designing software that helps users relate the model to specific tasks they are trying to do is a key to getting people to use DDI and think about how they can make their metadata support themselves and those around them.
  2. Is DDI a standard that is suitable to use for day to day management of information? Probably.
    In practice, DDI needs to be able to be passed between software if it is to move from an archival standard to a practical statistical metadata standard. One of the things I wanted to achieve with Virgil was a tool that not only produced DDI, but could also consume it from other sources. To me, the simplest case meant being able to take a DDI file and edit part of its contents, leaving the rest untouched, and in a lot of cases this is possible with DDI. However, since having to rethink how to manage classifications using DDI, I have realised that some objects are not captured well within DDI, and unfortunately classifications are one such example.
  3. Is the DDI model for managing codelists and classifications good enough? Sadly not.
    One of the reasons I relied so heavily on the invalid <Code> @ids was that I needed a hook to tie codes and categories together and without this it becomes very difficult to manage what a ‘classification’ is in DDI. Furthermore, classifications don’t exist in DDI per se, but are a rather loose agreement that if you combine <CodeScheme>s and <CategoryScheme>s you get a good approximation. However, this falls apart when we try to document the classification itself.
    For example, where do you store the name of a whole classification? There are three viable places (excuse the XPath) – as a //CodeScheme/Label (being the label of the hierarchy), as a //CategoryScheme/Label (being the label of the collection of classifying categories) or as a //LogicalProduct/Label (the label of the immediate parent that contains both the hierarchies and the categories).
    However, each of these approaches has inherent issues, as none of them is the documented way to manage this information, and if three different agencies approached the problem in different ways, their metadata would become incomparable. This needs to be discussed further, as it will become a bigger issue as more tools start to try to manage such an important piece of metadata, one that sits conceptually early in the lifecycle.
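The ambiguity over where a classification's name lives can be sketched in a few lines. This is an illustration only: the fragment is namespace-free and simplified, the <Instance> wrapper and the sample labels are made up, and `candidate_names` is a hypothetical helper. It simply checks the three locations listed above and reports what it finds in each.

```python
import xml.etree.ElementTree as ET

# The three places a classification name might plausibly be stored.
CANDIDATE_PATHS = [
    ".//CodeScheme/Label",
    ".//CategoryScheme/Label",
    ".//LogicalProduct/Label",
]

# Simplified, namespace-free sample (an assumption for illustration).
SAMPLE = """
<Instance>
  <LogicalProduct>
    <Label>Sex classification</Label>
    <CategoryScheme><Label>Sex categories</Label></CategoryScheme>
    <CodeScheme><Label>Sex codes</Label></CodeScheme>
  </LogicalProduct>
</Instance>
"""


def candidate_names(root):
    """Return every non-empty label found at the candidate locations."""
    found = {}
    for path in CANDIDATE_PATHS:
        element = root.find(path)
        if element is not None and element.text:
            found[path] = element.text.strip()
    return found


root = ET.fromstring(SAMPLE)
for path, name in candidate_names(root).items():
    print(path, "->", name)
```

A tool reading DDI from several agencies would have to try all three paths and then guess which label is "the" classification name, which is exactly the comparability problem described above.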

It should be noted that these issues don’t excuse overlooking the actual standard, which is what led to this predicament. However, the chance to re-examine how to correct the problem in Virgil also gives me a chance to examine some of the issues I came across while trying to maintain classifications within DDI. Over the coming month or so I am going to continue writing up some of the issues I identified with classifications within DDI 3.1, how to work around these in the short term, and ways to correct the problem in future versions of the standard.

Lastly, in the short term there will be an update to correct the Code/id problem in the CSV to DDI conversion, so the original use case of mining legacy systems to produce valid DDI will still be met.

Thanks again to everyone at EDDI for their input and company.

Farewell to Europe (and EDDI) for another year

Here I sit in Helsinki Airport, awaiting a bittersweet flight home. While it is always good to go home and be with my family and friends, I know I am leaving quite a few behind here in Europe and beyond.

By all accounts, the European DDI Users Group meetings were a great success. Along with seeing all the work people have done over the last year, we were able to sit and discuss and debate for several days, and we now have a solid plan for future work.

While I was only at the Developers meetings, we covered improvements to the website, new ways of managing large DDI instances in relational and non-relational databases, examined new (and forgotten) ways to design software, debated the best ways to handle automated ID creation, listened to the results of the semantic DDI workshops, learned about the DDI Agency Registry, debated reducing or removing namespaces from DDI, raised the possibility of a shared DDI Blog/News aggregator and started work on not one, but two major additions to the DDI community: a new transport element nicknamed “The DDI Bucket”, and the groundwork for a DDI RESTful web interface standard.

And that was in just 3 days! And I am still eagerly waiting to see how the “Data Without Borders” and “Longitudinal DDI” workshops went.

The week was made even more productive by the use of Google Docs to create a single, living record of the event. Watching everyone type up their notes in real time was great. Over the next few weeks I (and hopefully the rest of the DDI Developers community) will continue to clean up our collaborative notes, and I look forward to presenting information and recommendations to the whole DDI community in the new year.

We also discussed upcoming meetings for the DDI Developers group, and three possibilities were raised: IASSIST in June, RC33 in July and EDDI next December. While meetings will most likely go ahead at all of these events, I strongly encourage those who can come to RC33, to be held in Sydney next July, to speak up or at least contact me in private. There is a wealth of talent in Australia and New Zealand that is well worth getting in contact with, and with a large enough group of DDI members in Australia I think a “DDI Developers Down-under” would be well attended and well worth the trip.

So with that in mind, thank you to everyone in the DDI Community for a great week – and especially to Olof Olsson of SND for kindly offering me a place to stay during the week. It was a fantastic week, and served to remind me that if you work hard you can contribute to a community. Being called upon to answer questions during the meetings (and once during the question time of someone else’s talk!) was especially flattering. This has truly re-invigorated my love of metadata (I spent the better part of my evenings in Rome madly writing ideas for tutorials and examples I foolishly volunteered for during the meetings).

So, with that I wish the entire DDI Community a Merry Christmas, Happy Holidays and Happy New Year and look forward to seeing everyone again in the new year, be it in Washington for IASSIST 2012, Sydney for RC33 or wonderful Bergen for EDDI 2012!!!

Arrivederci

https://lh6.googleusercontent.com/-Bt7M3EmQOVo/TuY2PWCS8hI/AAAAAAAAEOc/nmKN-M6tK-M/s512/IMG_20111211_184959.jpg

Upcoming improvements to the DDI Website

EDDI has generated a lot of discussion around DDI, and the area I have been most interested in, and have been guiding discussions around, is how to improve the DDI Alliance website. As the Web Maintenance Chair, it would be great to rest on my laurels, declare the website perfect and leave it at that.

However, it isn’t and I won’t.

So throughout EDDI, I have been compiling a list of gripes and grumbles (as well as positive remarks and suggestions) regarding the website. In the new year I will be sending out a short survey to DDI Users looking at how people use the website, their experiences (both positive and negative) and how they think the DDI Website should look in the future.

One main issue that will definitely be addressed as a part of this exercise is the lack of positive examples of DDI available on the web, because the survey itself, and all its metadata, will be made available for people to download and study. This will not be an easy job, but I look forward to contacting researchers and developers across the DDI community to help piece this together and make improvements for everyone.

Microupdate – Virgil-UI now has improved multilingual support

After a weekend spent literally fighting to get multilingual support working for me in Python and Qt, Virgil-UI now has wide-ranging support for multiple languages – both in the editor and importing from CSVs with unusual character sets.

With this, and the previously unmentioned drag and drop support for reordering classifications, Virgil is approaching a point where it is almost ready for beta testing. In the coming week, I’ll be making a few small tweaks, along with a demonstration video. Hopefully, by early September it should be packaged up as a beta, for widespread testing amongst the DDI community – just in time for the close of submissions for this year’s European DDI Users meeting.

Below are a few screenshots showing off the two main multilingual support features in Virgil-UI – the ability to add and edit the labels and descriptions of codes, and the ability to view the classification tree in any language that has been added.

Along with all of these changes a number of bugs in the CSV to DDI import tool have been corrected and I’ll be pushing out an updated Windows binary of that alongside the main release of Virgil-UI.

Showing off the basic language editing functionality

Users can even select which language they want the tree structure of the classification to be displayed in.