Posts Tagged ‘DDI’

Managing Questions in DDI3.1 – “Other, please specify”

A problem that remains difficult when managing complex questions in DDI is the question that asks a respondent to pick from a list of options and, if none is suitable, to write in their own. Below are examples of this kind of question from the US, UK and Australian Censuses (censii/census/censes?):

UK Census Extract
USA Census Extract

ABS Census Extract

In all three questions, respondents are asked about their origins, and are given the option to select from a list of common responses or provide a write-in response. The easiest way to manage this is through the use of a DDI <MultipleQuestionItem>. A <MultipleQuestionItem> is a way to capture a complex question that asks two or more separate questions that are closely linked.

In the above examples we can split the questions into two, as illustrated in the generic answer below:

<MultipleQuestionItem>
    <SubQuestions>
        <QuestionItem>
            <QuestionText>
                <LiteralText>
                    What is your ancestral origin?
                </LiteralText>
            </QuestionText>
            <CodeDomain>
                <!-- This CodeDomain would include a reference to the list of countries or races -->
            </CodeDomain>
        </QuestionItem>
        <QuestionItem>
            <QuestionText>
                <LiteralText>
                    Please Specify:
                </LiteralText>
            </QuestionText>
            <TextDomain/>
        </QuestionItem>
    </SubQuestions>
</MultipleQuestionItem>

Here we have been able to split the question, while still managing it in a single item. This is needed as without each other, each subquestion is incomplete. This is not a new concept, and is quite an obvious solution to many people who have tried to solve this issue.

However, there is still the problem that this metadata doesn’t contain the restriction that a respondent should only be able to enter a free text option if the “other” option is selected. While there have been a number of published and attempted solutions, none have been satisfactory. Splitting the question outside of a MultipleQuestionItem and using IfThenElse clauses complicates the structure, while leaving the restriction out makes it difficult to drive self-interviewed computer systems directly from the metadata.

A possible solution that resolves both of these issues is the use of the <SubQuestionSequence>. This is illustrated in the DDI fragment below:

<MultipleQuestionItem>
    <SubQuestions>
        <QuestionItem>
            <!-- Ancestral origin QuestionItem -->
        </QuestionItem>
        <QuestionItem>
            <!-- Please Specify QuestionItem -->
        </QuestionItem>
    </SubQuestions>
    <SubQuestionSequence>
        <ItemSequenceType>Other</ItemSequenceType>
        <AlternateSequenceType formalLanguage="Name Of Language Here" >
            <!-- Proprietary command to control logic -->
        </AlternateSequenceType>
    </SubQuestionSequence>
</MultipleQuestionItem>

In this we have used the SubQuestionSequence to hold the logic that indicates when the “Other” field should be available. This element normally controls the order in which the SubQuestions are shown; here we are still controlling that ordering, only to specify when a member is not shown, which is an excusable use of the field. The choice can be further rationalised: an unfamiliar agent, for example a new piece of software, can still interpret the bulk of the metadata, although when presenting the above question it would allow a respondent to fill in both sections. This is no different to how a respondent to a paper-based survey may answer, so it is no great loss of granularity.

How any given agency chooses to populate the commands contained in the AlternateSequenceType will be an individual choice, and a standard way of expressing this may be needed. Still, this should help other groups solve this problem more easily by indicating where the solution can go and reducing the size of the problem.
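As a rough sketch of how a consuming tool might honour this, the Python below reads a simplified, namespace-free version of the fragment above. The language name `ExamplePseudoCode`, the rule text, the `id` values and the helper `other_field_rule` are all hypothetical and are only there for illustration; a real agency would substitute its own command language.

```python
import xml.etree.ElementTree as ET

# Simplified fragment: namespaces omitted, ids and rule text are made up.
FRAGMENT = """
<MultipleQuestionItem>
    <SubQuestions>
        <QuestionItem id="origin"/>
        <QuestionItem id="origin_other"/>
    </SubQuestions>
    <SubQuestionSequence>
        <ItemSequenceType>Other</ItemSequenceType>
        <AlternateSequenceType formalLanguage="ExamplePseudoCode">SHOW origin_other IF origin == OTHER</AlternateSequenceType>
    </SubQuestionSequence>
</MultipleQuestionItem>
"""


def other_field_rule(xml_text, understood_languages):
    """Return the sequencing command if its language is understood, else None."""
    root = ET.fromstring(xml_text)
    alt = root.find("./SubQuestionSequence/AlternateSequenceType")
    if alt is None:
        return None
    if alt.get("formalLanguage") not in understood_languages:
        # Unfamiliar agent: no rule to apply, so both sub-questions are shown,
        # much as they would be on a paper form.
        return None
    return (alt.text or "").strip()


known = other_field_rule(FRAGMENT, {"ExamplePseudoCode"})
unknown = other_field_rule(FRAGMENT, {"SomeOtherLanguage"})
```

A tool that recognises the language gets the command back and can enforce it; any other tool gets `None` and simply presents both sub-questions.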

In the next day or two I will be putting a more solid example up into the DDI Examples Repository for people to work with. As always critiques of these ideas and examples are welcome.

Always double check the standard before writing code

A few weeks ago, I had the privilege of presenting at a collection of DDI Developers in Gothenburg at EDDI. There I presented one of my larger pieces of work, the Virgil-UI DDI Codelist Editor, for critique. While there I received advice, praise and most importantly constructive criticism for which I am grateful. However, this has brought to light a rather large problem.

It was pointed out that I made a small error when dealing with <Code> elements in DDI and accidentally gave them @id attributes, and it was noted that this should be an easy fix. Unfortunately, because I missed this very early on in the development of Virgil, the underlying model relies on Codes having ids to be able to easily make connections between the hierarchical user interface, the <Code>s and the <Category>s that give them meaning.

What this means is that the DDI coming out of Virgil is invalid, and that valid DDI cannot actually be read by Virgil. Essentially, the Virgil model for handling DDI is broken and needs to be almost entirely rewritten, which might take quite a while.
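A check of this kind could have flagged the problem early. The sketch below is illustrative only: it ignores DDI namespaces, the sample fragment is made up, and `codes_with_ids` is a hypothetical helper rather than part of Virgil. Validating against the published schema remains the reliable test.

```python
import xml.etree.ElementTree as ET

# Made-up, namespace-free sample: the first Code wrongly carries an id.
SAMPLE = """
<CodeScheme id="cs1">
    <Code id="c1"><Value>1</Value></Code>
    <Code><Value>2</Value></Code>
</CodeScheme>
"""


def codes_with_ids(xml_text):
    """List the id of every Code element that carries an id attribute."""
    root = ET.fromstring(xml_text)
    return [code.get("id") for code in root.iter("Code") if code.get("id") is not None]


offending = codes_with_ids(SAMPLE)
```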

Unfortunately, at this stage rewriting also means re-examining a lot of the initial ideas about what Virgil should be and has highlighted some interesting questions about the DDI model and DDI software, such as:

  1. Is abstracting the DDI model away from a user a good approach to software design? Yes.
    This was the crux of my talk at EDDI, and I still feel that abstracting the DDI model away from day-to-day users is necessary. The DDI model is complex and covers a wide range of tasks. I believe that designing software that helps users relate the model to specific tasks they are trying to do is a key to getting people to use DDI and think about how they can make their metadata support themselves and those around them.
  2. Is DDI a standard that is suitable to use for day to day management of information? Probably.
    In practice, DDI needs to be able to be passed between software if it is to move from an archival standard to a practical statistical metadata standard. One of the things I wanted to achieve with Virgil was a tool that not only produced DDI, but could also consume it from other sources. To me, in the simplest case, this meant being able to take a DDI file and edit part of its contents while leaving the rest untouched, and in a lot of cases this is possible with DDI. However, since having to rethink how to manage classifications using DDI, I have realised that there are some objects that are not captured well within DDI, and unfortunately classifications are one such example.
  3. Is the DDI model for managing codelists and classifications good enough? Sadly not.
    One of the reasons I relied so heavily on the invalid <Code> @ids was that I needed a hook to tie codes and categories together and without this it becomes very difficult to manage what a ‘classification’ is in DDI. Furthermore, classifications don’t exist in DDI per se, but are a rather loose agreement that if you combine <CodeScheme>s and <CategoryScheme>s you get a good approximation. However, this falls apart when we try to document the classification itself.
    For example, where do you store the name of a whole classification? There are three viable places (excuse the XPath) – as a //CodeScheme/Label (being the label of the hierarchy), as a //CategoryScheme/Label (being the label of the collection of classifying categories) or as a //LogicalProduct/Label (the label of the immediate parent that contains both the hierarchies and the categories).
    However, each of these approaches has inherent issues, as none of them is the documented way to manage this information, and if three different agencies approached the problem in different ways, their metadata would become incomparable. This needs to be discussed further, as it will become a bigger issue as more tools start to try to manage such an important piece of metadata, one that sits conceptually early in the lifecycle.
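To make the ambiguity concrete, here is a small sketch that looks for a classification name in each of the three places mentioned above. It is a simplification under stated assumptions: namespaces are dropped, the sample document and its label text are invented, and `candidate_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented, namespace-free sample with a Label in all three candidate places.
SAMPLE = """
<LogicalProduct>
    <Label>Label on the LogicalProduct</Label>
    <CategoryScheme><Label>Label on the CategoryScheme</Label></CategoryScheme>
    <CodeScheme><Label>Label on the CodeScheme</Label></CodeScheme>
</LogicalProduct>
"""

# Paths are relative to the LogicalProduct root element.
CANDIDATE_PATHS = [".//CodeScheme/Label", ".//CategoryScheme/Label", "./Label"]


def candidate_names(xml_text):
    """Map each candidate path to the label text found there, if any."""
    root = ET.fromstring(xml_text)
    found = {}
    for path in CANDIDATE_PATHS:
        node = root.find(path)
        if node is not None and node.text:
            found[path] = node.text.strip()
    return found


names = candidate_names(SAMPLE)
```

A consuming tool that gets back more than one candidate has no documented rule for choosing between them, which is exactly the comparability problem described above.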

It should be noted that these issues don’t excuse my overlooking the actual standard, which is what led to this predicament. However, having to re-examine how to correct the problem in Virgil also gives me a chance to examine some of the issues I came across while trying to maintain classifications within DDI. Over the coming month or so I am going to continue writing up some of the issues I identified with classifications within DDI3.1, how to work around these in the short term, and ways to correct the problem in future versions of the standard.

Lastly, in the short-term there will be an update to correct the Code/id problem in the CSV to DDI conversion, so the original use case of being able to mine legacy systems to produce valid DDI will still be filled.

Thanks again to everyone at EDDI for their input and company.

Farewell to Europe (and EDDI) for another year

Here I sit in Helsinki Airport, awaiting a bitter sweet flight home. While it is always good to go home and be with my family and friends, I know I am leaving quite a few behind here in Europe and beyond.

By all accounts, the European DDI Users Group meetings were a great success. Along with seeing all the work people have done over the last year, we were able to sit and discuss and debate for several days, and we now have a solid plan for future work.

While I was only at the Developers meetings, we covered improvements to the website, new ways of managing large DDI instances in relational and non-relational databases, examined new (and forgotten) ways to design software, debated the best ways to handle automated ID creation, listened to the results of the semantic DDI workshops, learned about the DDI Agency Registry, debated reducing or removing namespaces from DDI, raised the possibility of a shared DDI Blog/News aggregator and started the creation of not one, but two major additions to the DDI community – a new transport element nicknamed “The DDI Bucket” and started laying the groundwork for a DDI RESTful web interface standard.

And that was in just 3 days! I am still eagerly waiting to see how the “Data Without Borders” and “Longitudinal DDI” workshops went.

The week was made even more productive by the use of Google Docs to create a single, living recollection of the event. Watching everyone type up their notes in real time was great. Over the next few weeks I (and hopefully the rest of the DDI Developers community) will continue to clean up our collaborative notes, and I look forward to presenting information and recommendations to the whole DDI community in the new year.

We also discussed upcoming meetings for the DDI Developers group, and three possibilities were raised: IASSIST in June, RC33 in July and EDDI next December. While meetings will most likely go ahead at all of these events, I strongly encourage those who can come to RC33, to be held in Sydney next July, to speak up or at least contact me in private. There is a wealth of talent in Australia and New Zealand that is well worth getting in contact with, and with a large enough group of DDI members in Australia I think a “DDI Developers Down-under” would be well attended and well worth the trip.

So with that in mind, thank you to everyone in the DDI Community for a great week, and especially to Olof Olsson of SND for kindly offering me a place to stay during the week. It was a fantastic week, and served to remind me that if you work hard you can contribute to a community. Being called upon to answer questions during the meetings (and once during the question time of someone else’s talk!) was especially flattering. This has truly re-invigorated my love of metadata (I spent the better part of my evenings in Rome madly writing ideas for the tutorials and examples I foolishly volunteered for during the meetings).

So, with that I wish the entire DDI Community a Merry Christmas, Happy Holidays and Happy New Year and look forward to seeing everyone again in the new year, be it in Washington for IASSIST 2012, Sydney for RC33 or wonderful Bergen for EDDI 2012!!!

Arrivederci

https://lh6.googleusercontent.com/-Bt7M3EmQOVo/TuY2PWCS8hI/AAAAAAAAEOc/nmKN-M6tK-M/s512/IMG_20111211_184959.jpg

Virgil UI 0.0.1 Beta now live!!

After months of development, testing, coding and crying…. Virgil UI version 0.0.1b is now available for public beta testing.

This release sees the first public testing of a fully functional, classification and codelist specific editor based on and supporting the DDI Lifecycle XML format (DLML).

Features in this release of Virgil include:

Known issues in the 0.0.1b release that will be fixed in a future release:

  • Codes or languages cannot be removed once added.
  • New CodeSchemes cannot be added manually, only when importing from CSV.

Also new is an updated version of the standalone CSV to DDI converter tool that fixes some outstanding bugs in multilingual imports and corrects a few mistakes when writing the DLML.

For more information on Virgil-UI there is a list of blog posts outlining the development process, or you can check out the Google Code page, view all the downloads, or submit bugs.

Virgil UI – Beta demo video

Just a quick update that was supposed to have gone up last night. There is a video up on YouTube now, showing off some of the more finalised features of Virgil-UI.

This shows three big features: CSV import, drag-and-drop reordering of classifications and multilingual support for editing. This means that for a classification with a multilingual component, for example a Canadian Industry Classification, the English and French components could be edited simultaneously.

As stated in the last post, a Windows binary release of a beta version of Virgil-UI and an updated version of the converter tool should be released in early September.

Microupdate – Virgil-UI now has improved multilingual support

After a weekend spent literally fighting to get multilingual support working for me in Python and Qt, Virgil-UI now has wide-ranging support for multiple languages, both in the editor and when importing from CSVs with unusual character sets.

With this, and the previously unmentioned drag and drop support for reordering classifications, Virgil is approaching a point where it is almost ready for beta testing. In the coming week, I’ll be making a few small tweaks, along with a demonstration video. Hopefully, by early September it should be packaged up as a beta for widespread testing amongst the DDI community, just in time for the close of submissions for this year’s European DDI Users meeting.

Below are a few screenshots showing off the two main multilingual support features in Virgil-UI: the ability to add and edit the labels and descriptions of codes, and the ability to view the classification tree in any language that has been added.

Along with all of these changes a number of bugs in the CSV to DDI import tool have been corrected and I’ll be pushing out an updated Windows binary of that alongside the main release of Virgil-UI.

Showing off the basic language editing functionality

Users can even select which language they want the tree structure of the classification to be displayed in.

 

Updates to the Virgil CSV to DDI Converter

A short and sweet update:

There was an oversight with the CSV converter not converting coded values to the proper place in the created DDI XML. This has been fixed and the changes have been pushed into SVN and a new version (0.0.2b) of the executable has been released on Google Code.

420 convert classifications everyday

With the recent release of the new Australian Standard Classification of Drugs of Concern from the ABS, there was an opportunity to field test the Virgil CSV to DDI converter with real data to see how it held up. Fortunately, the classification was released as an Excel data cube that conformed almost entirely with the structures that Virgil supports. After a little cleaning of the CSV, it ran through the converter with few issues at all. Incidentally, the most major error highlighted a massive oversight: the converter failed to add values for the codes! However, this has been corrected and the changes have been pushed into SVN, and a new version of the Windows tool will be pushed out this weekend.

A screen shot of Virgil with the converted classification

Opening the newly created DDI file in the Virgil DDI CodeList Editor was another story and pointed out a few flaws with how it handles empty data. With the structure from the Excel file not containing descriptions for any category or any labels for the CodeScheme, there were a few small corrections made to accommodate freshly created DDI, but many of these problems will be ironed out by the time the CodeList editor is available for download.

While the converter hasn’t been fully integrated into the CodeList Editor, it will shortly be possible to create a single DDI file and import numerous CSV files to create a series of classificatory codelists in a single package. A practical and soon to be realised example would be the Australian Standard Classification of Drugs of Concern with the lists of drugs of concern, forms of drug and methods of consumption codelists all contained in a single machine processable DDI package.

For those who haven’t been able to download or run the converter, the output from this example is available for testing.

Virgil UI – CSV Converter UI Files now up

Over the week I’ve been coding away, wrapping the CSV to DDI converter module with a nice user interface. Well, after a weekend of work it has a user interface; whether it is nice is in the eye of the beholder. As with the rest of the Virgil project, the Python code for this tool is available on Google Code. Unfortunately I haven’t had time to compile this into a Windows executable suitable for novice use, but interested parties are again welcome to download and test the tool from source.

For the curious, I’ve again recorded a demonstration and put it up on YouTube, which is embedded below:

Again there is no audio, but I’ve included a brief transcription below so people can get a better idea of what the demonstration is trying to illustrate:

  • Open the anzsic.csv file to briefly view the contents of the CSV holding the labels and some descriptions of categories in the 2006 Australian and New Zealand Standard Industrial Classification.
  • Execute the conversion tool, and load the anzsic.csv file.
  • Select the correct structure options for the CSV, as per the allowed structures described in a previous post.
  • Add a default language code and ID prefix for the DDI Instance and all codes and categories.
  • Demonstrate the preview table, showing how the header row can be ignored.
  • Convert the file, in the background you can see debug text for each code encountered.
  • Open a folder to save, and confirm the folder is empty.
  • Open the newly created file.
  • Add some line breaks to the automatically created XML and search for a term from the original CSV.
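For readers who cannot watch the video, the steps above can be sketched in a few lines of Python. This is not the Virgil converter itself: the two-column `code,label` layout, the sample rows, the `csv_to_schemes` helper and the simplified, namespace-free element names are all assumptions made for illustration, and the output is not schema-valid DDI.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample data with a header row, standing in for a real CSV file.
CSV_TEXT = "code,label\nA,First example category\nB,Second example category\n"


def csv_to_schemes(csv_text, lang="en", id_prefix="demo", skip_header=True):
    """Build a simplified CategoryScheme and CodeScheme from two-column CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if skip_header:
        rows = rows[1:]  # ignore the header row, as in the preview table
    root = ET.Element("LogicalProduct")
    categories = ET.SubElement(root, "CategoryScheme", id=f"{id_prefix}_cats")
    codes = ET.SubElement(root, "CodeScheme", id=f"{id_prefix}_codes")
    for value, label in rows:
        cat_id = f"{id_prefix}_cat_{value}"
        category = ET.SubElement(categories, "Category", id=cat_id)
        label_el = ET.SubElement(category, "Label")
        label_el.set(XML_LANG, lang)  # default language code
        label_el.text = label
        code = ET.SubElement(codes, "Code")  # deliberately no id attribute
        ref = ET.SubElement(code, "CategoryReference")
        ET.SubElement(ref, "ID").text = cat_id
        ET.SubElement(code, "Value").text = value  # the coded value itself
    return root


root = csv_to_schemes(CSV_TEXT)
xml_output = ET.tostring(root, encoding="unicode")
```

Each row becomes one Category carrying the label and one Code that points back to it and holds the coded value, with the ID prefix applied to everything that is given an id.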

Hopefully by this time next week there will be a fully downloadable Windows executable available for people to try.

Monday Funday – Challenge: De-obfuscate some bad DDI

The solution is now available below

DDI can be a harsh mistress sometimes, and mistakes can sometimes be made when trying to use it. As a data format it is flexible enough to handle most situations, but this flexibility can sometimes be a shortcoming.

Below is a poorly written chunk of DDI I’ve written that forms part of a survey instrument. The good news is that it can be written in a much better way. The challenge is how to rewrite it:

<d:Sequence id="MainSequence">
    <d:ComputationItem id="CompItem1">
        <d:Code>
            <r:Code programmingLanguage="pseudoCode">SET X = X + 1</r:Code>
        </d:Code>
    </d:ComputationItem>
    <d:IfThenElse id="ifblock1">
        <d:IfCondition>
            <d:Code>
                <r:Code programmingLanguage="pseudoCode">X == Y</r:Code>
            </d:Code>
        </d:IfCondition>
        <d:ThenConstructReference>
            <r:ID>A_different_sequence</r:ID>
            <r:ID>MainSequence</r:ID>
        </d:ThenConstructReference>
    </d:IfThenElse>
</d:Sequence>

If you think you can correct this code, email a solution to theodore.therone at gmail.com. At the end of the week (When I wake up this Saturday AEST) I’ll select a solution at random and give away a $15 voucher for 5senses coffee.

If you need clarification on anything in the example code, post it in the comments and I’ll clear it up.

 


Unfortunately, there were no correct responses, so the voucher will go to the next challenge, but the answer is still available below.

Solution – this was a simple computer science riddle wrapped in a layer of DDI. It was a loop rewritten as a recursive if-branch. Rewritten as a loop it comes out as this:

<d:Loop id="MainLoop">
    <d:LoopWhile>
        <d:Code>
            <r:Code programmingLanguage="pseudoCode">X == Y</r:Code>
        </d:Code>
    </d:LoopWhile>
    <d:StepValue>
        <d:Code>
            <r:Code programmingLanguage="pseudoCode">SET X = X + 1</r:Code>
        </d:Code>
    </d:StepValue>
    <d:ControlConstructReference>
        <r:ID>A_different_sequence</r:ID>
    </d:ControlConstructReference>
</d:Loop>

A much cleaner solution!
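For anyone who wants to convince themselves that the two forms do the same work, here is a small Python sketch. It follows the ordering of the original Sequence (step, then test, then body); the order in which a DDI processor evaluates StepValue and LoopWhile is not stated here, so treat that as an assumption. The stand-in body returned by `make_body`, which plays the part of A_different_sequence, is hypothetical.

```python
def run_recursive(state, body):
    """The obfuscated form: step, test, run the body, then call itself again."""
    state["X"] += 1
    if state["X"] == state["Y"]:
        body(state)
        run_recursive(state, body)


def run_loop(state, body):
    """The same control flow written as a plain loop."""
    state["X"] += 1
    while state["X"] == state["Y"]:
        body(state)
        state["X"] += 1


def make_body(limit):
    """Hypothetical stand-in for A_different_sequence: counts visits, advances Y up to limit."""
    def body(state):
        state["visits"] += 1
        if state["Y"] < limit:
            state["Y"] += 1
    return body


recursive_state = {"X": 0, "Y": 1, "visits": 0}
loop_state = {"X": 0, "Y": 1, "visits": 0}
run_recursive(recursive_state, make_body(3))
run_loop(loop_state, make_body(3))
```

Both versions leave the state identical, but only the loop makes the repetition visible to a reader (or a tool) scanning the instrument.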