A Request for Comments on a new XML Questionnaire Specification Format (SQBL)

This is an announcement and Request for Comments on SQBL, a new
open-source XML format for the cross-platform development of questionnaire
specifications. The design decisions behind SQBL and additional details are the
subject of a paper to be presented in 2 weeks at the 2013 IASSIST conference in
Cologne, Germany:
– Do We Need a Perfect Metadata Standard or is “Good Enough” Good Enough?
http://www.iassist2013.org/program/sessions/session-c4/#c220
However, to ensure people are well-informed ahead of time, I am releasing
details ahead of the conference.

The gist

SQBL, the Structured (or Simple) Questionnaire Building Language, is an
emerging XML format designed to allow survey researchers of all fields to
easily produce questionnaire specifications with the required structure to
enable deployment to any questionnaire platform – including, but not limited
to, Blaise, DDI, LimeSurvey, XForms and paper surveys.

The problem

Analysing the current state of questionnaire design and development shows that
there are relatively few tools available that are capable of allowing a survey
designer to easily create questionnaire specifications in a simple manner,
whilst providing the structure necessary to verify respondent routing and
provide a reliable input to the automation of questionnaire deployment.

The questionnaire creation tools currently available either:
* prevent the sharing of content (such as closed tools like SurveyMonkey)
* require extensive programming experience (such as Blaise or CASES)
* or use formats that make transformation difficult (such as those based on DDI)
Given the high cost of questionnaire design across the creation, testing and
deployment of final questionnaires, a format that can reduce the cost in any or
all of these areas will have positive effects for researchers.

Furthermore, giving researchers easy-to-use tools to create questionnaires
means they will create structured metadata as a by-product, thus reducing
the well-understood documentation burden for archivists.

Structured questionnaire design

Last year, I wrote a paper, “The Case Against the Skip Statement”, that
described the computational theory of questionnaire logic – namely the
structures used to describe skips and routing logic in questionnaires. This
paper was awarded 3rd place in the International Association of Official
Statistics ‘2013 Young Statistician Prize’ http://bit.ly/IAOS2012. This paper
is awaiting publication, but can be made available for private reading on
request. It proposed that this routing logic in questionnaires is structurally
identical to that of computer programs. Following this assertion, it stated
that a higher-order language can be created that acts as a “high-level
questionnaire specification logic” that can be compiled to any questionnaire
platform, in much the same way that computer programming languages can be
compiled to machine language. Unfortunately, while some existing formats
incorporate some of the principles of Structured Questionnaire Design, they are
incomplete or too complex to provide the proposed benefits.
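
To make the idea concrete, here is a minimal sketch in Python of what "one
structured specification, many outputs" could look like. It is not SQBL and
does not use the SQBL schema: the class names, fields and functions
(Question, Conditional, to_paper, route) are hypothetical and exist only to
show that nested conditionals, rather than skip statements, can be rendered
as paper-style instructions and also executed as routing logic.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Question:
    name: str
    text: str


@dataclass
class Conditional:
    """Ask the questions in `then` only if `depends_on` was answered `equals`."""
    depends_on: str
    equals: str
    then: List["Node"] = field(default_factory=list)


Node = Union[Question, Conditional]


def flatten(nodes, guards=()):
    """Walk the tree, yielding (question, guards) pairs in asking order."""
    for node in nodes:
        if isinstance(node, Question):
            yield node, guards
        else:
            inner = guards + ((node.depends_on, node.equals),)
            yield from flatten(node.then, inner)


def to_paper(nodes):
    """Render the structure as paper-style instructions (one target format)."""
    lines = []
    for question, guards in flatten(nodes):
        prefix = "".join(f"[If {dep} = {val}] " for dep, val in guards)
        lines.append(f"{prefix}{question.name}. {question.text}")
    return lines


def route(nodes, answers):
    """Execute the same structure: which questions would a respondent see?"""
    return [
        question.name
        for question, guards in flatten(nodes)
        if all(answers.get(dep) == val for dep, val in guards)
    ]


# A hypothetical three-question instrument with one conditional block.
survey = [
    Question("Q1", "Do you own a car?"),
    Conditional("Q1", "yes", [Question("Q2", "What make is it?")]),
    Question("Q3", "How old are you?"),
]

paper_form = to_paper(survey)
path_for_non_owner = route(survey, {"Q1": "no"})
```

Because the conditional block is a nested structure rather than a "go to Q3"
jump, both outputs are derived from the same tree, and neither needs to know
about the other.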

SQBL – The Structured (or Simple) Questionnaire Building Language

SQBL http://sqbl.org is an XML format that acts as a high-level language for
describing questionnaire logic. Small and simple, but powerful, it incorporates
XML technologies to reduce the barrier to entry and make the description of
questionnaire specifications readable, even in raw XML. Underlying this
simplicity is a strict schema that enforces single solutions to problems,
meaning SQBL can be transformed into a format for any survey tool that has a
published specification.

Furthermore, because of its small schema and incorporation of XML and HTTP core
technologies, it is easier for developers to work with. In turn, this makes
survey design more comprehensible through the creation of easier tools, and
will help remove the need for costly, specialised instrument programmers
through automation.

Canard – the SQBL Question Module Editor

Announced alongside the Request for Comments on SQBL is an early beta release of
the SQBL-based Canard Question Module Editor http://bit.ly/CANARD. Canard is
designed as a proof-of-concept tool to illustrate how questionnaire
specifications can be generated in an easy-to-use drag-and-drop interface. This
is achieved by providing designers with instant feedback on changes to
specifications through its two-panel design, which allows researchers to see the
logical specification, routing paths and example questionnaires all within the
same tool.

SQBL and other standards

SQBL is not a competitor to any existing standard, mainly because a structured
approach to questionnaire design based on solid theory has never been attempted
before. SQBL fills a niche that other standards don’t yet serve well.

For example, while DDI can archive any questionnaire as is, this is because
of the loose structure necessary for being able to archive uncontrolled
metadata. However, if we want to be able to make questionnaire specifications
that can be used to drive processes, what is needed is the strict structure of
SQBL.

Similarly, SQBL has loose couplings to other information through standard HTTP
URIs, allowing linkages to any networked standard. For example, Data Elements may
be described in a DDI registry, which a SQBL question can reference via its
DDI-URI. Additionally, to support automation, a survey instrument described
inside a DDI Data Collection, rather than pointing to a DDI Sequence containing
the Instrument details, can use existing linkages to external standards to point
to a SQBL document via a standard URL. Once data collection is complete,
harmonisation can be performed, as each SQBL module has questions pointing to
variables, so data has comparability downstream.

SQBL in action

The SQBL XML schemas are available in a GitHub repository
http://bit.ly/sqbl-schema that also contains examples and files from video
tutorials. There is also a website http://sqbl.org with more information on the
format and on some of the principles of Structured Questionnaire Design.

If you don’t like getting your hands dirty with XML, you can download the
Windows version of the Canard Question Module Editor from Dropbox
http://bit.ly/canardexe and start producing questionnaire specifications
immediately. All that needs to be done is to unzip the file and run the file
named . Due to dependencies, flowcharts may not be immediately
available; however, this can be fixed by installing the free third-party
graphing tool Graphviz http://www.graphviz.org/

Lastly, there is a growing number of tutorial videos on how to use Canard on YouTube.

Video 1 – Basic Questions http://www.youtube.com/watch?v=ijk00SqoBGk (2:17 min)
Video 2 – Complex Responses http://www.youtube.com/watch?v=d3Vrn2B4EO4 (2:17 min)
Video 3 – Simple Logic http://www.youtube.com/watch?v=GrAWbOF-UW8 (4:11 min)

There is also an early beta video that runs through creating an entire
questionnaire showing the side-by-side preview.
http://www.youtube.com/watch?v=_FImaXn7EYk (13:21 mins)

Joining the SQBL community

First of all there is a mailing list for SQBL hosted by Google Groups:
https://groups.google.com/forum/?fromgroups#!forum/sqbl.

Along with this, each of the GitHub repositories (http://bit.ly/sqbl-schema,
http://bit.ly/CANARD) includes an issue tracker. Both Canard and SQBL are in
early design stages, so there is an opportunity for feedback and input to ensure
both SQBL and Canard support the needs of all questionnaire designers.

Lastly, while there are initial examples of conversion tools to transform SQBL
into DDI-Lifecycle 3.1 and XForms, there is room for growth. Given the
proliferation of customised solutions to deploy both paper and web forms, there
is a need for developers to support the creation of transformations from SQBL
into formats such as Blaise, LimeSurvey, CASES and more.

If you have made it this far, thank you for reading all the way through, and I
look forward to all the feedback people have to offer.

Cheers and I look forward to feedback now or at IASSIST,

Samuel Spencer.
SQBL & Canard Lead Developer
IASSIST Asia/Pacific Regional Secretary

http://about.me/legostormtroopr

http://au.linkedin.com/in/legostormtroopr/

Beginning the soft launch of SQBL and Canard

Over the past week I’ve started finalising a version of Canard and SQBL ready for early-Beta testing and public review ahead of IASSIST2013. While I’ll be putting together more documentation later in the week, here is the first of a series of short tutorials on how Canard will eventually be used.

Also, later this week will see the source code for Canard, as shown in the video below, released on GitHub, as well as a beta binary for ease of use during testing. In the meantime, the SQBL schemas can be seen on GitHub and the main SQBL website contains more information. For now, enjoy the two videos below to see how a strict structure can make questionnaire design easier than ever before!

“FingerTabs” – Horizontal Tabs with Horizontal Text in PyQt

On advice from someone far more experienced in user interfaces, I was given some feedback on Canard (a questionnaire specification editor) and was pointed in the direction of FingerTabs. Although it’s not a widely used term (unless you are an archer), I couldn’t find any other term for vertically stacked tabs (in PyQt, west- or east-positioned tabs) with horizontal labels. FingerTabs are so called because they look like a bunch of long little fingers, although this visual metaphor breaks down if you have more than 5 tabs, or a Lovecraftian imagination. For example:

Normal (or top aligned) tabs.

Left (or west) aligned tabs with PyQt Default text orientation.

Left aligned tabs with normal (or horizontal) oriented text.

If you want to go down the path of stacked tabs, the last way is probably the best to go for, as it is much easier to read and you can fit more tabs in the same vertical space with little loss of horizontal space, as the examples above illustrate. Interestingly enough, getting the last one is quite easy, although not well publicised.

Changing from the top example to the bottom one is just a matter of extending the QTabBar of the QTabWidget and overriding the default paintEvent and sizeHint. This allows you to override the original text orientation and insert the text in a more readable fashion. The difficult bit was determining how to reuse the default tab styling (lines 10 and 17 in FingerTabs.py below).

For what it’s worth, those 38 lines of code took about 4 hours to write, for a staggering 1 line of code every 6 minutes (and 20 seconds).

Thanks go to the two threads from StackOverflow, where the first answer got me close enough to implement the above:

Book Review – Fight Club

I’ve had a chance to read some of Chuck Palahniuk’s other work, and the only word for what I’ve read so far is visceral. Given his pride at always being able to have people faint during his readings of Guts, many people would say the same. Yet despite his gripping writing, and despite Fight Club being the book that launched him into the spotlight, I had never gotten around to reading it. Having seen the movie, and read that it takes a lot of its content from the book, I was curious to see how it played out, despite knowing the twist at the end. I’m also going to assume that most people do know it; if you do not, I don’t spoil anything, but read on with caution.

Knowing the twist does take some of the sting out of the book. There are a few clever little hints through the book that a reader with fresh eyes might gloss over, but having seen the movie you are just kind of waiting for that “loss of cabin pressure” moment. That said, if you go in knowing the end, it is still an excellent read. Palahniuk’s writing is disjointed and punctuated by whitespace – lots of whitespace, so you read it faster than you expect. If I was told that the book was written not after the movie, but during it, by someone given only one chance to write down what was happening, I would not be surprised. Dialogue is hard to follow and I found myself rereading to track who said what, but this fits with the theme of the unreliable narrator quite well. Some of the most fleshed out passages are around fight scenes, and even they are short and brutal, with Palahniuk’s ability to write gore and brutality given a chance to shine. I couldn’t tell you how many times I grimaced while reading it.

While this review sounds conflicted, it’s really hard to compliment Palahniuk’s work – it’s like trying to compliment a really great painting of shit. I mean it could be a really great painting, with excellent composition that speaks to the subject matter, and incorporates the texture of the paint in the final design. But it is still a painting of shit, and the better the depiction the more it is going to fill you with revulsion.

Fight Club is a gripping read that flows well and doesn’t waste time with its message, and I can’t recommend it enough – but it’s still a book about guys beating each other in an attempt to escape rampant consumerism, all of them led by someone with mental health issues.

Improving performance in technical interviews by overcoming Dunning-Kruger

At the moment I’m going through some technical interviews that are designed to see if I have the right stuff to join the ranks of Silicon Valley. The problem is that I tend not to think of myself as an exceptional programmer, which has a negative impact on my self-esteem and gives me the jitters, making me less than perfect in these technical interviews. Very self-fulfilling indeed!

Now the reasons for this are probably the Dunning–Kruger effect and Impostor Syndrome. The short version of both of these is that “Impostor Syndrome is a psychological phenomenon in which people are unable to internalize their accomplishments”, whereas the “Dunning–Kruger effect is a cognitive bias in which unskilled individuals mistakenly rate their ability much higher than average.” Even more briefly, if you are skilled in an area you mistakenly think you aren’t, and if you aren’t you think you are. Before it’s pointed out, I already get the irony that, as someone unskilled in psychology, I am trying to write about these with any authority!

Paradoxically, it gets even worse. If you don’t know if you are skilled or unskilled, you don’t know if you underrate or overrate your performance, which means knowing about the above two phenomena might even mean you inflate or deflate your perceived skill even more!

But help is at hand, and it really just comes in the form of getting some perspective on your skills. First of all, get a good friend who has some level of knowledge in your field and talk with them. Just getting an outside perspective on your skills, or even just a blank stare, can go a long way when you are explaining to someone why you are beating yourself up over not knowing when to use a Binary Search Tree vs a Binary Heap, or when you know you can optimise better than O(n*k) to O(n*log(k)), but you just couldn’t quite explain it.

Secondly, build a list of the cool things you’ve done. Good recruiters will want to hear about the projects you’ve done both at work and for fun (if you aren’t building code for fun you should probably start). Not only will this help refresh your memory so you can call on work you did a long time ago, it also gives you time to review your own work, pick the cream of the crop and see how much you have done.

It wasn’t until I sat down and looked at all of the things I’ve done, mostly for fun in the past few years, such as:

Looking back, I can say I’ve kept myself busy, remaining on top of certain trends and having fun. That is important, but an honest personal stocktake is also important to give yourself the confidence necessary to not underestimate your skills, or to identify weaknesses that can be addressed, avoided or discussed as potential for growth.

So those are my two tips for success when going into a technical interview – internal and external perspective, and a solid understanding of your ability, strengths and weaknesses. Pretty simple really, but it’s one of those things that doesn’t quite clarify until you’ve already gone through it.

Sir Roland Wilson – The man who burned the census

“Bloody well do what we tell you and you’ll be fine.”
Roland Wilson (Secretary of the Treasury) speaking to Billy Wilson (Treasurer)

Sir Roland Wilson headed up the Commonwealth Bureau of Census and Statistics (precursor to the Australian Bureau of Statistics), the Treasury and the Department of Labour and National Service, amongst others. If you are interested, you can read about his wide career (including his history as an early student of the Chicago School of Economics) in his well written obituary. While I won’t try and rewrite this, what it doesn’t capture is my personal favourite story of Sir Wilson’s career, during his time as Commonwealth Statistician.

This story comes from Informing a Nation: the Evolution of the Australian Bureau of Statistics, which details the core principles at the heart of the public service: trust and service to the community. The quote below goes into detail; however, the short version is that when faced with having to relinquish confidential information on individuals, violating the privacy of the Bureau and the public, Roland Wilson chose to torch the records and defy parliament rather than violate the trust of the public.

That, I think, is the level of bravery and commitment that should stand at the heart of all public servants. It’s just a shame that they don’t sell those little silicone wristbands branded WWRWD, that remind us all to ask “What Would Roland Wilson Do?”

Throughout the history of the Bureau, its statisticians have preserved the confidentiality of the information provided by individuals and businesses. Today, the Census and Statistics Act protects the confidentiality of data reported to the Bureau. However its statisticians through the decades have always ensured that the data reported to them by individual respondents remained confidential.

For example, Sir Roland Wilson (Commonwealth Statistician 1936–1940 and 1946–1948) once told the story of how legislation for a Census of Wealth was hastily drawn up in the early days of World War II. The legislation was badly drafted and mentioned that the Commissioner of Taxation could have access to the data – without making it clear that he could only access the collated information.

Subsequently, during a tax evasion case, the Commissioner of Taxation formed the view that he could win the case by accessing the defendant’s individual Census of Wealth data.

‘[He] … came storming into my office one day and demanded this bloke’s wealth card and I said he couldn’t have it. “Why?” “Because they are confidential and if it was used in a court case it could wreck our reputation”.

The Commissioner of Taxation, not content with this reply, took the matter to Cabinet and convinced it to approve his access to the individual’s data. Then he went back to Wilson to collect the information.

‘Oh, he was on the seventh heaven of delight and he came storming along with his two Deputies, waved the Cabinet decision at me and said, “You’ve got to hand those cards over to me”. “I’m sorry … I can’t.” [Said Wilson] “What do you mean? I’ve got a Cabinet decision!” [The Commissioner exclaimed]. ‘[Wilson replied] “You’re about a week too late. I piled them onto two trucks last week, sent them down to Sydney and incinerated them”.

- Sir Roland Wilson, interviewed in 1984.

Why I’ve chosen to make a new XML standard for questionnaires

XKCD #927

Normally I don’t like XKCD, but this is so true.

I’ve made no secret of the fact that I’ve been working on a new format for questionnaires. I recently registered a domain for the Structured Questionnaire Building Language, and have been releasing screenshots and a video of a new tool for questionnaire design that I’m working on. Considering that I’ll be covering this work at at least one conference this year, and given my close ties in a few technical communities I felt that it would be good to discuss why this is the case, and answer a few questions that people may have.

Why is a new format for questionnaire design necessary?

Over the past few years I’ve done a lot of research analysing how questionnaires are structured in a very generic sense. Given the simplistic nature of the logic traditionally found in paper and electronic questionnaires and their logical similarity to computer programming, I’ve theorised that it should be possible to use the same methods (and thus the same tools) to support all questionnaires – including the oft-ignored paper questionnaire. Unfortunately, attempts to improve questionnaires have focused on proprietary or limited use cases, which is why tools and formats such as Blaise, CASES and queXML exist, but generally only support telephone or web surveys. Likewise, all of these attempts have ignored the logical structure in various ways and discouraged questionnaire designers from becoming intimately, and necessarily, familiar with the logic of their questionnaires.

SQBL on the other hand is an attempt at designing a specialised format to support the capture of the generic information that describes a questionnaire. Likewise, Canard is a parallel development of a tool to allow a researcher to quickly create this information, as a way to help them create their questionnaire, rather than just document it afterwards.

As a quick aside, if you are interested in this research on Structured Questionnaire Design, I’m still awaiting publication, but if you email me directly, I’ll be glad to forward you as much as you care to read – and probably more.

Why not just use DDI?

Given the superficial overlap between SQBL and DDI, this is not an uncommon question even at this early stage. I’ve written previously that writing software for DDI isn’t easy. Trying to write software that is user friendly, can handle all of the edge cases that DDI does, and operates using the referential structures that make DDI so powerful is hard. Really hard. Given that a format is nothing without the tools to support it, I wrote a three-part essay on how to extend DDI in the necessary ways to support complex questionnaires. However, even this is fraught with trouble, as software that writes these extensions would have trouble reading “un-extended” DDI. What is needed is a tool that is powerful enough to capture the content required of well structured questionnaires in a user-friendly way, and it seemed increasingly unlikely that this was possible in DDI.

A counterpoint is to also ask “why DDI?” DDI 2 and 3 are exemplary formats when looking at archival and discovery; however, this is because both are very flexible, and can capture any and every possible use case – which is absolutely vital when working in an archive to capture what was done. However, when we turn this around and look at formats that can be predictably and reliably written and read, what is needed is rigidity and strict structures. While such rigidity could be applied to DDI, it risks fracturing the user base, leading to “archival DDI”, “questionnaire DDI” and who knows what else.

Thus I deemed the decision to start again, with a strict narrow use case, uncomfortable but necessary.

What about DDI?

I did some soul searching on this (as much soul searching as one can do around picking sides in a ‘standards war’), and realised that there really is no point in “picking sides”. SQBL isn’t perfect and isn’t yet complete, and more to the point it supports a very narrow use case. If I personally view DDI as a flexible archival format, there is a lot of work necessary to support conversion into and out of it to support discovery and reuse. Likewise, if I view SQBL as a rigid living format for creating questionnaires, the question becomes how to link this relatively limited content with other vital survey information. By definition SQBL has a limited useful timeframe, and once data has been collected (if not earlier) it is no longer necessary, so conversion or linkages to other formats become required.

Somewhere between these overlaps is where DDI and SQBL will handshake, and perhaps in future standards this handshake will be formalised. This means there is a lot of work on both sides of the fence, in which I look forward to playing an active part. But in the interim, and for questionnaire design, I believe SQBL will prove to be a necessary new addition to the wide world of survey research standards.

A developer’s New Year’s resolution: Stop talking about the “user”

[The term "user"] splits people into two discrete groups – “programmers” and “users”, … the unhelpfully divisive “us” and “them”

User interface, user experience, user story, user error. Application programming is devoted to helping users. There have been objections to the term user, and suggestions to humanise it by encouraging developers to consider people, as opposed to ‘users’. But even then we are just changing the term, not fixing a problem. There is no generic term for people that is adequate, as all people interact with the world differently, and as such people interact with software in different ways and expect different interactions.

To see this, let’s look at the first example of a user story on the Wikipedia page: “As a user, I want to search for my customers by their first and last names.”

As a developer, if I were to pick this up I would have no clue as to what kind of interaction I need to provide. Is this “user” a business clerk looking at sales customers, an MMORPG administrator looking for game players, or a veterinarian looking through a list of pets? In each of these cases there are subtle differences, where customers can be people, avatars or pets, and as such each set of information may need to be presented differently.

The term user may be appropriate when we discuss the broad fields of user experience and user interaction, acting as a catch-all for all possible users of software. But when we begin looking at specific classes of people we want to assist with software, it ceases to be a helpful term and it splits people into two discrete groups – “programmers” and “users”, “IT-side” and “business-side”, or the unhelpfully divisive “us” and “them”. Who is “us” and who is “them” generally depends on which side of the table you are on, but it remains that the term “user” draws a clear line in the sand between our group and the dreaded “others”. Worse still, the term for the others is so unhelpfully vague that we can only think in metaphors or stereotypes. By the way, have you ever noticed that programmers type (Ctrl+i)like this(Ctrl+i), but users type (clicks Bold)like this(clicks Bold again)?

The solution: Stop using the word “user”

As soon as you think about a “user”, immediately stop and mentally replace the term with an appropriate noun.

As a New Year’s resolution, I want all programmers, developers, business analysts, and so on, to try something. As soon as you think about a “user”, immediately stop and mentally replace the term with an appropriate noun that mentions whose needs are being met by the tool you are designing. Regardless of how contrived or long winded that noun needs to be, replace the term user EVERY TIME it enters your thoughts, even before you write it down. Even if you weren’t going to write something, the second you think “the user wants…” or “a user will see…” immediately go back and replace that term.

To see why this is helpful, I propose another user story: Clicking the button will allow the user to save their work.

Picture how this interaction may take place and answer some of these questions: What does the button look like? How will the user know to click that button? How will the interface react when the user clicks the button? Given the vagueness of the story, they may be really tough questions to answer.

Now, contrast this by answering the same questions about the following three user stories:

  • Clicking the button will allow the author of a novel to save their work.
  • Clicking the button will allow the programmer to save their work.
  • Clicking the button will allow the child who is drawing to save their work.

When you pictured these interactions, did you visualise them differently? Did you picture different images on the button shown to the author and the child? Is the size of the button different between the child’s button and the others? More than that, did the user story sound unnecessary? For example, does the idea of a programmer clicking a button make sense, or would they be more likely to just use a menu or even a keyboard shortcut?

Forcing ourselves to go back and reexamine the “user” of any given piece of work, or line of code, makes us think about what we are trying to achieve. While it might be conceivable that a “user” might want to do some given action, it is a helpful mental exercise that forces us to step back and really contemplate if the user really will want or need to do something.

 

Why are there so few survey design tools that use DDI?

Having been a close part of the DDI community for some time, and having attended a number of DDI-focused conferences, I have noticed a disturbing trend. There are relatively few content editors that use DDI. I have chosen this term very carefully, as there are a number of DDI Editors, but these are tools whose primary function is to produce DDI XML. When I say a DDI-powered content editor I mean a tool with a limited use case that happens to use DDI as the storage format.

As an example, we can look at Colectica – a leading DDI Editor. In this tool, to create a survey with some pathing between questions, first I create a QuestionScheme with some Questions, then I create an Instrument, which creates for me a ControlConstructScheme, and then I can start pulling questions into this. If a new question needs to be made, I switch back to my QuestionScheme view and make a new question, then switch back to the instrument and drag it in. While it is able to make perfectly valid DDI, this is not entirely how people think during this process. This is analogous to opening a word processor to write a letter, and having to write an alphabetical list of words that I can then drag into the appropriate place in the document, rather than just typing away. This isn’t in any part the fault of Colectica itself; it is more that this is the only way that an editor that uses DDI could feasibly be written.

To look at why this is, I want to examine two simple use cases that should be able to be done using a simple tool and have the corresponding data managed in DDI. Firstly, how does a survey designer go about reusing an existing question in their survey, and secondly, how does a survey designer create a new question inside of an existing survey instrument? Now to answer these questions I want to look at it from a user interaction point of view, and pull out what a survey designer would have to do to ensure that they have the bare minimum content needed to be ‘good’ DDI.

Use case 1: Reusing a question

One of the commonly stated advantages of DDI is the reusability of its managed content, so it should be the case that reusing a question is a relatively simple affair. For this use case, we picture a hypothetical user interface, where a survey designer wants to insert a new question into an existing sequence of questions. In DDI terms, they wish to insert a QuestionConstruct into a Sequence, not make a new QuestionItem in a QuestionScheme. So ideally the designer should need to:

  1. Search for a question using some search parameters
  2. If a suitable question is found, drag this question into the sequence.

However, this isn’t the case. First of all, the user interface needs to differentiate between the QuestionItem and the QuestionConstruct, as the QuestionConstruct is used to insert a question into a sequence by reference. So already we need the survey designer to understand DDI well enough to differentiate these objects. Secondly, if the needed QuestionConstruct doesn’t exist, it needs to be created by the user, which in turn means the user must be prompted for the ControlConstructScheme that the new QuestionConstruct lives in. So what actually has to happen is this:

  1. Search for a question using some search parameters
  2. If a suitable question is found, look at the list of QuestionConstructs (each with their own different contexts), and drag the appropriate one into the sequence. Nothing further needs to be done.
  3. If an appropriate QuestionConstruct doesn’t exist, create it with its own label and description.
  4. Prompt the user for where the QuestionConstruct should be maintained
  5. Search for a ControlConstructScheme using some search parameters, selecting the appropriate one.
  6. If none is found, create one with its own label, description, version, etc…

Here the simple act of reuse has tripled in size, now requiring the survey designer to understand more of the DDI model than necessary, as well as in many cases having to then become administratively responsible for further content than just their original survey content.
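The extra indirection can be sketched in code. The following is a minimal Python sketch, not Colectica’s or any DDI library’s API: the class names mirror the DDI objects discussed above, but their fields, the `reuse_question` helper and the `QC_` ID prefix are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionItem:
    id: str
    text: str


@dataclass
class QuestionConstruct:
    id: str
    question_ref: str  # ID of the QuestionItem this construct points at
    label: str = ""


@dataclass
class ControlConstructScheme:
    id: str
    constructs: List[QuestionConstruct] = field(default_factory=list)


@dataclass
class Sequence:
    id: str
    construct_refs: List[str] = field(default_factory=list)


def reuse_question(item, scheme, sequence):
    """Place an existing QuestionItem into a Sequence by reference."""
    # Look for a QuestionConstruct that already wraps this question.
    construct = next(
        (c for c in scheme.constructs if c.question_ref == item.id), None
    )
    if construct is None:
        # No wrapper exists, so one must be created and maintained somewhere.
        construct = QuestionConstruct(
            id=f"QC_{item.id}", question_ref=item.id, label=item.text
        )
        scheme.constructs.append(construct)
    # The sequence only ever refers to the construct, never to the question.
    sequence.construct_refs.append(construct.id)
    return construct


item = QuestionItem("Q1", "What is your age?")
scheme = ControlConstructScheme("CCS1")
sequence = Sequence("SEQ1")
first = reuse_question(item, scheme, sequence)
second = reuse_question(item, scheme, sequence)
```

Even in this simplified form, the caller has to supply a ControlConstructScheme before a question can be placed in a sequence, which is exactly the knowledge the ideal two-step flow would hide from the designer.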

Use case 2: Creating a question

However, this user interaction becomes much more complex when a user wants to add a new question. Again this should be a relatively simple affair, where a survey designer has made the decision that a new question needs to be created. In DDI terms, they wish to insert a QuestionConstruct into a Sequence, and create a new QuestionItem in a QuestionScheme. So ideally the designer should need to:

  1. Click to create a new question in the location needed.
  2. Add the corresponding information, such as question text, a label and description and intent.

Again however, this is far from how it would work using a DDI compatible tool.

  1. Click to create a new question in the location needed.
  2. Add the corresponding information, such as question text, a label and description and intent.
  3. Prompt the user for the QuestionScheme where the QuestionItem should be maintained.
  4. Search for a QuestionScheme using some search parameters, selecting the appropriate one.
  5. If none is found, create a QuestionScheme with its own label, description, version, etc…
  6. Create the necessary QuestionConstruct with the corresponding information, such as a label and description.
  7. Prompt the user for where the QuestionConstruct should be maintained
  8. Search for a ControlConstructScheme using some search parameters, selecting the appropriate one.
  9. If none is found, create one with its own label, description, version, etc…

Here the act of simply adding in a new question is a 9 step process. It can be argued that not all of the steps are necessary, or that content for ‘unimportant metadata’ could be filled in at a later stage, but this means that objects remain empty for an indeterminate amount of time, or that the tool relies on conventions to hide information from users, e.g. a QuestionItem can only link to one QuestionConstruct, so they can be treated as ‘the same’. However, while this is valid DDI, it violates the ‘spirit of the standard’.

Why is this important?

Ultimately, users and their tools make or break a standard. If no one can write DDI, or write tools that write DDI, or write tools that people want to use, then the very purpose of the standard is called into question. The wider implication is this: the reuse of content stored as DDI is contingent on that content being created in the first place, and it must initially come from somewhere. Perhaps in its current state DDI can be made to work for post-hoc research archivists. However, it is still lacking as a living standard that can be used throughout the survey lifecycle, simply because of its over-engineered state.

How can this be resolved?

Firstly, by drastically simplifying the content requirements and referential structure in DDI, which will be achieved by talking with users and determining their needs. Archivists, survey researchers and central bankers will all have very different needs from each other, as they all do wildly different things. While it’s not infeasible that one standard could meet their needs, that comes from identifying their needs first. As a first step I offer this as an opening question: does anyone actually want to reuse just a single question? I ask this because, in my limited experience, I’ve seen that people really just want to be able to reuse large modules of questions: a limited number of questions with their own internal logic that can be reused across a number of areas. It will probably come to mind that the question of ‘Sex’ is reused across almost any population research, but the rebuttal is: does anyone ever ask Sex, but not Age?

The DDI Identity Crisis and how to solve it – Part 1 : Versions and Identifiers

This is a 2 part post that examines the different classes of identifiable object in DDI, and offers critiques of their current design and possible improvements to the standard, with the aim of simplifying the model and (hopefully) improving the uptake of the standard. But first we need to have a quick look at what the 3 different classes of identifiable object in DDI are and where they are used, alongside the non-identifiable base case, in increasing order of complexity:

  1. Non-identifiable – We’ll include this as the ‘base’ case of any DDI object that isn’t captured by those below. These objects are mostly used to capture basic metadata concepts, such as labels or descriptions for more complex objects.
  2. Identifiable – Objects that only require an ID attribute. These are mostly basic metadata, and below I’ll show how blurry the distinction between identifiables and non-identifiables is, and why these objects probably don’t need identifiers at all.
  3. Versionable – A level above identifiables, these require a version and an ID. This is probably the most commonly encountered type of core attribute, as they comprise the bulk of the survey objects people are used to dealing with – such as questions, variables and codelists. Further down I talk about how these objects don’t need a version, along with the administrative burden it adds – without a clear benefit.
  4. Maintainable – The most complex identifier – with an ID, a version and a reference to a maintenance agency. Maintainable objects are mostly used as either container objects, such as schemes, resource packages or groups; or high-level and survey-wide objects such as Study Units or Archival objects. In the following post I’ll show how they are currently managed, and how they can be better managed as XML objects to simplify RESTful interfaces for DDI.

Identifiable objects don’t need identifiers

Identifiable objects are the subset of all objects within DDI that have only an ID, but no version or agency. In DDI, since ID attributes are only required to be unique within the parent maintainable, the ID of an identifiable isn’t enough to reference it; you also need the ID of the parent object! So while an identifiable can be referenced, to access it, it is necessary to first identify and gather the parent resource.
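To make the scoping problem concrete, here is a small Python sketch. The `holdings` dictionary, the `(agency, ID, version)` key, the `example.org` agency and the study names are all hypothetical stand-ins, not DDI structures; the only thing taken from the text above is that an identifiable’s ID is local to its parent maintainable.

```python
from typing import Dict, List, Tuple

# Hypothetical key for a maintainable: (agency, ID, version).
MaintainableKey = Tuple[str, str, str]

# Each maintainable holds its identifiable children, keyed by their local ID.
holdings: Dict[MaintainableKey, Dict[str, str]] = {
    ("example.org", "StudyA", "1"): {"abs1": "Abstract of study A"},
    ("example.org", "StudyB", "1"): {"abs1": "Abstract of study B"},
}


def find_by_local_id(local_id: str) -> List[Tuple[MaintainableKey, str]]:
    """Return every (parent, content) pair that uses this local ID."""
    return [
        (parent, children[local_id])
        for parent, children in holdings.items()
        if local_id in children
    ]


def resolve(parent: MaintainableKey, local_id: str) -> str:
    """Resolve an identifiable unambiguously via its parent maintainable."""
    return holdings[parent][local_id]


# The local ID alone matches two different objects...
matches = find_by_local_id("abs1")
# ...so a usable reference always has to carry the parent as well.
study_b_abstract = resolve(("example.org", "StudyB", "1"), "abs1")
```

The local ID on its own is ambiguous, so in practice the parent has to be located first and the child read out of it.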

This becomes interesting when we examine the list of objects which are only identifiable (not versionable or maintainable), shown below:

Abstract
Abstract
Access
ActionToMinimizeLosses
Attribute
Coding
CollectionEvent
CollectionSituation
CoordinateGroup
CreationSoftware
DataCollectionMethodology
DataFileIdentification
DefaultAccess
DeviationFromSampleDesign
Embargo
ExternalAid
ExternalInformation
ExternalInterviewerInstructionReference
Geography
GrossFileStructure
GrossRecordStructure
LifecycleEvent
Location
LogicalRecord
Measure
ModeOfCollection
Note
OtherMaterial
PhysicalRecordSegment
ProcessingEvent
Purpose
Purpose
RecordRelationship
Role
SamplingProcedure
Software
SpatialCoverage
TemporalCoverage
TimeMethod
TopicalCoverage
VariableSet
Weighting

All of these objects constitute (at least to my mind) very basic, textual and context-dependent metadata. Concepts like an ‘abstract’ or ‘purpose’ only really make sense given the context of what you are summarising. This is reinforced by the fact that this information can only be gathered by first finding the object you are summarising.

Which leads us to ask – what makes identifiables different to non-identifiables? In my opinion, nothing – it’s a distinction made for convenience. Again, in my opinion, identifiables exist because Notes exist. Because the methods for extending and improving DDI were not made more obvious to early adopters, DDI Notes have become the most common way to annotate objects, and given the referential nature of Notes, this requires objects to have identities.

The solution: Remove IDs from identifiables – If Notes are deprecated as a solution, IDs on identifiables are no longer needed. There is no other reason to identify these objects, and they can be scaled back to the ‘non-identifiable’ class of object.

Versionable objects shouldn’t have versions

Versionable objects are the set of objects that have both an ID and a version, and (as the DDI User Guide states) “are elements for which changes in content are important to note.” However, both versionables and maintainables have a version that supports the tracking of changes to an object. This causes a very interesting problem when dealing with objects in practice – the identifiers of objects can change, without the objects themselves having changed at all!

Let’s look at an example, with a maintainable QuestionScheme called QS1 with version 1, and two versionable Questions, Q1 and Q2, both on version 1 as well. Since the full identifier for a versionable also includes its parent, the full ID for the most recent version of Q1 takes a form similar to QS1:V1|Q1:V1, simple enough. A problem arises when Q2 is changed to be version 2. Technically, since Q2 is a child of the QuestionScheme QS1, QS1 has also changed.

Now, the complexity is that QS1 has changed, so the full ID for the most recent version of Q1 has now changed to QS1:V2|Q1:V1. This leads to the academic question – if Question Q1’s parent has changed, has Q1 itself also changed, meaning that to be a part of the updated parent it also needs a new version?
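The example above can be traced in a few lines of Python. This is only a sketch of the bookkeeping described in the text: the dictionary layout and the `full_id` and `revise_question` helpers are assumptions, and the `QS1:V1|Q1:V1` notation is the illustrative form used in this post rather than an official DDI URN.

```python
def full_id(scheme: dict, question_id: str) -> str:
    """Build the illustrative 'parent|child' identifier used in the text."""
    question_version = scheme["questions"][question_id]
    return f"{scheme['id']}:V{scheme['version']}|{question_id}:V{question_version}"


def revise_question(scheme: dict, question_id: str) -> None:
    """Version a question, which also forces a new version of its parent."""
    scheme["questions"][question_id] += 1
    scheme["version"] += 1  # the parent changed because one of its children did


# QS1 at version 1, holding Q1 and Q2, both at version 1.
scheme = {"id": "QS1", "version": 1, "questions": {"Q1": 1, "Q2": 1}}

q1_before = full_id(scheme, "Q1")
revise_question(scheme, "Q2")  # only Q2 is edited
q1_after = full_id(scheme, "Q1")
q2_after = full_id(scheme, "Q2")
```

Here `q1_before` is `QS1:V1|Q1:V1` and `q1_after` is `QS1:V2|Q1:V1`, even though nothing about Q1 itself was touched.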

The discussion to resolve this problem with DDI versionables has actually been kicking around for quite a while, but again the solution for this is pretty clear as the section header states. The first thing to recognise is that all versionable objects are already versioned by their parent object, so strictly speaking, given only the full ID for the parent, and the ID of a current versionable, it is possible to identify a single object for the simple fact that all IDs on objects must be unique within their parent maintainable.

So by removing the version from versionables, and relegating them to instead be identifiables, the model for abstract types in DDI is reduced to two classes with very clear intentions. In the new model, identifiables are objects which are reused through references within other objects to construct rich, linked metadata constructs, while maintainables are the versioning objects that are used by agencies to administer cross-survey and cross-cycle metadata holdings.
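A rough Python sketch of how the proposed two-class model might behave is given below. It is an interpretation of the proposal, not part of DDI: the `MaintainableKey` class, the `snapshots` store, the `example.org` agency and the question texts are all hypothetical. Only the maintainable carries a version, and a child is addressed by its parent’s full ID plus its own local ID.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MaintainableKey:
    """Full identity of a maintainable: agency, ID and version."""
    agency: str
    id: str
    version: int


# Each version of a maintainable is a snapshot of its children, keyed by
# local ID. The children themselves carry no version of their own.
snapshots: Dict[MaintainableKey, Dict[str, str]] = {
    MaintainableKey("example.org", "QS1", 1): {
        "Q1": "What is your sex?",
        "Q2": "How old are you?",
    },
    MaintainableKey("example.org", "QS1", 2): {
        "Q1": "What is your sex?",
        "Q2": "What is your age in years?",
    },
}


def resolve(parent: MaintainableKey, local_id: str) -> str:
    """Identify a single child from the parent's full ID and a local ID."""
    return snapshots[parent][local_id]


v1 = MaintainableKey("example.org", "QS1", 1)
v2 = MaintainableKey("example.org", "QS1", 2)

# Q1 reads the same in both versions of the scheme; only Q2 differs.
q1_unchanged = resolve(v1, "Q1") == resolve(v2, "Q1")
q2_changed = resolve(v1, "Q2") != resolve(v2, "Q2")
```

Under these assumptions there is no longer any question of whether Q1 needs a new version when Q2 is edited: the version belongs to the scheme alone, and a reference is simply the pair of parent key and local ID.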

However, as we’ll see in the next post, this change actually helps us take advantage of a number of useful XML technologies to simplify the learning process for DDI, for implementers and developers alike.

Next up: How Maintainables aren’t properly maintained

In the next post, I’ll cover how to simplify the DDI XSD Schemas to take advantage of XML identities by removing inline schemes and restricting base elements to simplify identification and URI design, so DDI can utilise URLs and XML fragments to precisely define objects for RESTful interfaces.