
Questionnaire design with DDI – Part 5: Can it be done?

This is the fifth in a five-part series on working with questionnaires and surveys managed using the Data Documentation Initiative (DDI) XML standard. With DDI being an emerging technology, it is important that users are provided with best practices to ensure that they use the standard in a way that is logical, coherent and, most importantly, usable and reusable. This series of tutorials and discussions is aimed at users who have some knowledge of DDI and would like to know how to effectively design and mark up existing and future questionnaires in DDI.

Part 5: Can it be done?

On the eve of IASSIST 2011, we are going to look back at the last few tutorials and ask the big question: Can DDI 3.1 be used as a viable format for the creation of online survey instruments? The simple answer is yes, but there are a few caveats.

Caveat 1: Only use the transformation of a finalised survey

This is a very basic idea, but one worth reiterating. When creating a web survey, there will more than likely be an iterative process: design the DDI, transform it, examine the form, refine the DDI, and repeat until done. This is a necessary part of development, as no one ever gets it right the first time. However, once a final form has been transformed, there should be organisational sign-off to ensure that the form is never changed again. Although the DDI will be versioned, so the original DDI that the form was based on will still exist, altering or overriding an in-production form can only lead to data issues.

Caveat 2: Transforming DDI into an e-form is a destructive transformation

InstrumentML, as presented in Part 2, is not, and will never be, part of the DDI 3.1 specification. It is a way to transform the logical structure of DDI into a form that is easily usable in certain situations: it caches the implied instrument flow to make machine processing easier. Likewise, the transformation from DDI to an e-form is done to present the information in the DDI in a way that is easier for users to understand. Both transformations are one-way. Neither output can be turned back into the original DDI.

Caveat 3: Any changes to an HTML instance of a form will not be pushed back to DDI

It is a fact that no transformation from a logical structure into a visual page will be perfect, so it is reasonable to assume that web pages created from DDI may need to be altered. Perhaps a list of response options works better as radio buttons than as a drop-down list. Perhaps a line break is needed to alter word flow, or the dimensions of a text box need to be precise. These types of issues will arise, and there are three cascading ways of resolving them:

  1. Alter the DDI to match the expected output,
  2. Alter the CSS to alter the presentation,
  3. If neither of the above produces the needed results, then, and only then, edit the generated HTML.

As DDI allows XHTML tags in most labels for semantic markup, altering the DDI is the first and most appropriate way to alter pages. If the change needed is to the wording, the DDI is also the only place to make it.

Here is an example of where editing the HTML is needed. Take a question on age, used in measuring labour figures. This is a numeric question, but in this case we have decided the populations below 18 and over 65 will be too small. So we are going to code everything outside this range to the codes ‘<18’ and ‘>65’. We therefore have a minimum and maximum, and their codes, but on advice from a survey methodologist we won’t be restricting users from entering figures outside this range.
However, our pre-generated HTML (based on the suggestions in Part 4) will carry these limits across as input restrictions. In this case we can edit the question to move the minimum from 18 to 0, as age still needs to be above 0, and remove the upper restriction.

Caveat 4: Final transformations should be packageable, so they can be stored with the DDI

If a DDI instrument were transformed into a PDF for printing, no good archivist would think twice about storing a copy of it as a manifestation of the instrument. Likewise, when transforming from DDI to other formats to ease machine use, keeping a copy of the result is paramount. Software will change, and how one version of a tool transforms the DDI may not always be the same as how another does.

In the case of Ramona, a package consists of three parts: the HTML used to semantically mark up the questions, the InstrumentML containing the logical flow, and the CSS of the presentation layer. Storing this information makes it easier to examine the survey at a later date. This helps researchers who may no longer have access to the software that did the original transformation.
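Here is a rough illustration of how such a package could be bundled alongside the DDI as a single archive. This minimal sketch uses Python's standard `zipfile` module. The member names (`ddi/instrument.xml`, `package/form.html` and so on) and the `build_package` helper are hypothetical and are not Ramona's actual layout.

```python
import io
import zipfile


def build_package(ddi_xml: str, instrument_ml: str, html: str, css: str) -> bytes:
    """Bundle the source DDI and its transformed artefacts into one zip archive.

    The member names below are hypothetical; any stable convention would do.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("ddi/instrument.xml", ddi_xml)               # source of truth
        archive.writestr("package/instrument-ml.xml", instrument_ml)  # cached logical flow
        archive.writestr("package/form.html", html)   # question markup, incl. approved hand edits
        archive.writestr("package/style.css", css)    # presentation layer
    return buffer.getvalue()


package = build_package("<Instrument/>", "<InstrumentML/>", "<form></form>", "form {}")
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    print(sorted(archive.namelist()))
    # ['ddi/instrument.xml', 'package/form.html', 'package/instrument-ml.xml', 'package/style.css']
```

Keeping the hand-edited HTML in the same archive as the DDI means any approved alterations travel with the instrument rather than living only on a production server.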

What is also important is making sure any valid alterations to a form, like those described in the previous caveat, are stored as a part of this package.

What we have is proof that it can be done, but it’s not done yet

As a closing point, people may be wondering if they can see the code that a lot of this information is based on. At this stage I’m not planning on releasing the code for Ramona just yet, mostly because it’s not very good. These tutorials are based on things I learned while trying to create a DDI web-form viewer. As such, the code is less production code and more a bloody narrative of my battles and trials against what can be a very tough and complex standard.

Over the next few weeks I’ll be stripping the code down, separating the wheat from the chaff and rewriting large sections of it. With luck, by the time EDDI2012 rolls around, there should be a fully functional DDI e-forms tool ready to roll out into production.

This brings to a close this series of tutorials dealing with online survey development in DDI. In the coming weeks I’ll also start putting together a short introduction to DDI for new users, and spending the bulk of my free time actually writing code, rather than writing about writing code.

And lastly, enjoy IASSIST 2011 everyone!

Questionnaire design with DDI – Part 4: What can I say?

This is the fourth in a five-part series on working with questionnaires and surveys managed using the Data Documentation Initiative (DDI) XML standard. With DDI being an emerging technology, it is important that users are provided with best practices to ensure that they use the standard in a way that is logical, coherent and, most importantly, usable and reusable. This series of tutorials and discussions is aimed at users who have some knowledge of DDI and would like to know how to effectively design and mark up existing and future questionnaires in DDI.

Part 4: What can I say?

One of the more useful aspects of electronic forms is that they let us give the user instant feedback on their answers. HTML5 web forms, in particular, can flag invalid data to users in a number of ways. More importantly, electronic forms also allow us to perform critical validation on data, so that we gather the most accurate data possible.

First off, let’s look at the simple issue of what metadata is available in DDI 3.1 to provide validation. In DDI, QuestionItems contain the abstract idea of ResponseDomains. These in turn can provide hints that systems use to encourage certain responses from users, including Codes or Categories, various types of Numbers, Strings, Dates and Times, or Geographic data. Each of these explicitly imposes a specific type of expected answer for a question, as well as additional restrictions:
For example, when asking for…

  • age, we can explicitly request integers over 0
  • percentages, we can define responses to be decimal numbers between 0 and 100
  • location data, we can define the geographic coordinate system
  • a unit code for a university course, we can specify a code scheme it must come from
  • a post code as a string, we can specify a pattern the string must match

This is just a short list of possible uses for ResponseDomains. If responses need to be restricted, this is the first place to apply those restrictions. ResponseDomains can even specify top and bottom codes, for example if one wanted to top-code ages, censoring ages above 65. With such a wealth of information available, the only issue is using it effectively to assist in response validation.
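To make top and bottom coding concrete, here is a minimal sketch using the thresholds from the labour-force example in Part 5: ages below 18 and above 65 collapse to a single code. The function name, its defaults and the code labels are illustrative assumptions, not values taken from the DDI schema.

```python
from typing import Union


def top_bottom_code(age: int, bottom: int = 18, top: int = 65) -> Union[int, str]:
    """Recode an age so values outside [bottom, top] collapse to a single code.

    With the default thresholds, ages below 18 become '<18' and ages above 65
    become '>65'; in-range ages are returned unchanged.
    """
    if age < bottom:
        return f"<{bottom}"
    if age > top:
        return f">{top}"
    return age


print([top_bottom_code(a) for a in (16, 18, 40, 65, 70)])
# ['<18', 18, 40, 65, '>65']
```

Recoding like this happens after collection, which is why, as Part 5 notes, respondents need not be prevented from entering values outside the range.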

When it comes to validation, web pages can behave quite differently from desktop applications. In a desktop application, the programmer has much more control over what the user does and what data gets entered. On the web, however, the client can effectively rewrite the webpage before submission, making data validation much more important. What both paradigms have in common is that, unfortunately, both must interact with a user: the least secure, buggiest and most important part of any system.

Client-side / UI validation

When talking about client-side validation we are discussing ways to ensure that the user is effectively engaging with the system. By quickly and politely notifying the user of their mistakes, or gently preventing them from making mistakes we can try and prevent frustration, hopefully increasing response rates.

The main theme of my talk at the European DDI User Group in Utrecht last year was how DDI can be transformed for use on the web. HTML5 offers many new features for data collection on the web, and DDI can be used to take advantage of them. Let’s look at two examples from the slides for that talk.

A question with a numeric domain and specific range


A question with a text domain and specific pattern


 

In these examples the colours indicate how the metadata in DDI can be used to populate a webform. Although there may be debate about the use of some DDI data for certain parts of the HTML Form, this demonstrates how this data can immediately be used. To see this in action there is an example at sandbox.kidstrythisathome.com.
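As a rough sketch of the kind of mapping described above, the function below turns a response-domain description into an HTML5 `<input>` element. The dictionary keys (`type`, `min`, `max`, `pattern`) are a hypothetical simplification of DDI's numeric and text domains, not the DDI 3.1 schema itself, and `render_input` is an illustrative name.

```python
from html import escape


def render_input(name: str, domain: dict) -> str:
    """Render an HTML5 <input> from a simplified (hypothetical) response domain."""
    attrs = {"name": name}
    if domain["type"] == "numeric":
        attrs["type"] = "number"
        # Range bounds become the browser-enforced min/max attributes.
        for key in ("min", "max"):
            if key in domain:
                attrs[key] = str(domain[key])
    elif domain["type"] == "text":
        attrs["type"] = "text"
        # A regular expression becomes the HTML5 pattern attribute.
        if "pattern" in domain:
            attrs["pattern"] = domain["pattern"]
    else:
        raise ValueError(f"unsupported domain type: {domain['type']!r}")
    rendered = " ".join(f'{key}="{escape(value)}"' for key, value in attrs.items())
    return f"<input {rendered}>"


print(render_input("age", {"type": "numeric", "min": 0, "max": 120}))
# <input name="age" type="number" min="0" max="120">
print(render_input("postcode", {"type": "text", "pattern": "[0-9]{4}"}))
# <input name="postcode" type="text" pattern="[0-9]{4}">
```

The browser enforces these attributes before submission, but a client can strip them, so they act only as hints to the user.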

For legacy browsers that don’t support HTML5, this still isn’t a huge issue, as they can ‘fake’ a lot of this functionality with JavaScript. There are two ways to do this: use libraries that add HTML5 functionality to legacy browsers, or design transforms and proprietary JavaScript to handle the user interface. The idea, however, is to give users gentle hints that encourage them to provide the ‘right’ answers (ages aren’t words, and must be over 0). It is not to provide security about the data…

Server-side / data validation

That is the role of server-side, or data, validation: to ensure that the data we have received is correct. As mentioned, on a webpage, hints for the user are provided using JavaScript or through browser APIs. However, it is trivial for a user to bypass these on the client. Client-side validation should not be relied upon, as the client can transmit back almost anything. The issue is that server-side validation is much less well defined: while HTML is a well-known standard, there is no ubiquitous language for server-side systems. The trade-off, however, is that the programmer has complete control over the actions of the server.

So what should a programmer do to support good validation of data? The approach I took in Ramona was to make each question an object in its own right. Furthermore, using object inheritance in Python, the question class was sub-classed so that each ResponseDomain in DDI had its own class. This allowed each subclass to define its own exceptions and errors, as well as inheriting those of the question superclass.

For example, the question class defined a ‘validate’ method that each subclass would further implement in different ways. Numeric questions would check their upper and lower bounds and throw exceptions if the response was outside this range. String questions would check the response against the pattern, and so on. The server would pass each returned value, in order, to the corresponding question object to validate.
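The class structure described above might look something like the following minimal sketch. Ramona's code has not been released, so this is not its actual implementation. The class names (`Question`, `NumericQuestion`, `TextQuestion`), the `ValidationError` exception and the `validate_submission` helper are illustrative assumptions.

```python
import re
from typing import Dict, List, Optional


class ValidationError(Exception):
    """Raised when a response does not satisfy its question's response domain."""


class Question:
    """Base question: every response must be present and non-blank."""

    def __init__(self, qid: str, text: str):
        self.qid = qid
        self.text = text

    def validate(self, value: Optional[str]):
        if value is None or not value.strip():
            raise ValidationError(f"{self.qid}: a response is required")
        return value.strip()


class NumericQuestion(Question):
    """Numeric response domain with optional lower and upper bounds."""

    def __init__(self, qid: str, text: str, low: Optional[float] = None,
                 high: Optional[float] = None):
        super().__init__(qid, text)
        self.low, self.high = low, high

    def validate(self, value: Optional[str]):
        raw = super().validate(value)  # inherit the base-class checks first
        try:
            number = float(raw)
        except ValueError:
            raise ValidationError(f"{self.qid}: {raw!r} is not a number") from None
        if self.low is not None and number < self.low:
            raise ValidationError(f"{self.qid}: {number} is below {self.low}")
        if self.high is not None and number > self.high:
            raise ValidationError(f"{self.qid}: {number} is above {self.high}")
        return number


class TextQuestion(Question):
    """Text response domain that must fully match a regular expression."""

    def __init__(self, qid: str, text: str, pattern: str):
        super().__init__(qid, text)
        self.pattern = re.compile(pattern)

    def validate(self, value: Optional[str]):
        raw = super().validate(value)
        if not self.pattern.fullmatch(raw):
            raise ValidationError(f"{self.qid}: {raw!r} does not match the pattern")
        return raw


def validate_submission(questions: List[Question], submission: Dict[str, str]) -> dict:
    """Pass each submitted value to its question's validate method, in order."""
    return {q.qid: q.validate(submission.get(q.qid)) for q in questions}


questions = [
    NumericQuestion("Q1", "How old are you?", low=0),
    TextQuestion("Q2", "What is your post code?", r"[0-9]{4}"),
]
print(validate_submission(questions, {"Q1": "42", "Q2": "2600"}))
# {'Q1': 42.0, 'Q2': '2600'}
```

Each subclass calls `super().validate` first, so rules shared by all questions live in one place. Adding a new ResponseDomain then means adding one subclass.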

In summary, DDI provides excellent support for questionnaire designers looking to control responses from users. Because this metadata is provided in an accessible way, building application support for it is an easy way for developers to promote DDI by demonstrating its real-world applications.

Next up… Questionnaire Design with DDI – Part 5: Can it be done? – A look back at the previous tutorials and an in-depth explanation of the realisation of Ramona – a functional DDI questionnaire tool.

Questionnaire design with DDI – Part 3: What am I doing here?

This is the third in a five-part series on working with questionnaires and surveys managed using the Data Documentation Initiative (DDI) XML standard. With DDI being an emerging technology, it is important that users are provided with best practices to ensure that they use the standard in a way that is logical, coherent and, most importantly, usable and reusable. This series of tutorials and discussions is aimed at users who have some knowledge of DDI and would like to know how to effectively design and mark up existing and future questionnaires in DDI.

Part 3: What am I doing here?

So far in the last two parts of this series we have been looking at how to logically structure questionnaires in DDI. In this post we now switch to looking at how to effectively present DDI sequences in an application and how to structure DDI sequences so they are unambiguous and will display in predictable ways in all viewers.

One of the first difficulties in displaying a DDI Instrument in a webpage is deciding how many questions to show on a page at any one time. The easiest and most appropriate method is to display each sequence as a page and perform logic around those sequences on the server when the user submits data. However, Sequences can contain more than questions, and when we deal with mixed types it can be difficult to predict how an agent may display the form.

For background, a DDI Sequence is able to reference any ControlConstruct as a child. We can separate ControlConstructs into two different types: branches and leaves. In tree data structures, branches can have either other branches or leaves as children, whereas leaves are the ‘end’ of the structure, as they cannot have children. In DDI, ‘tree-leaf’ ControlConstructs are those which cannot reference other ControlConstructs. These are only StatementItems and QuestionConstructs, and every other ControlConstruct is therefore a ‘tree-branch’.

A StatementItem is a way of including headers or textual information in an instrument. A QuestionConstruct is a way of including a QuestionItem by reference in an Instrument. These leaves are therefore the most vital elements in questionnaire design. Loops and branches add ‘syntactic sugar’ that improves the form but doesn’t gather data directly. Presenting leaves on a web page is trivially easy: simply copying their textual nodes into a webpage will suffice. Loops and branches, in fact, are almost completely hidden from users in Ramona. The server uses them to determine the next sequence to present to a user, but they forever remain in the background.

The question is though, how should an agent handle a sequence that contains a mix of leaves and branches? Let us work through this example, where we have a mix of leaves and branches, whose types are unimportant:

Sequence id="Seq 1"
    leaf   reference: L1
    branch reference: B2
    leaf   reference: L3

Here we have two objects we know need to be displayed to a user, separated by a branch that may link to any number of other constructs. In a stateless web context, like Ramona, it’s going to be hard to come back to this page and make sure the right information is still displayed for the user. So we may decide to show all the leaves we can, then go to the branch and continue on like that. We can then be safe in the knowledge that even if the connection to the user fails, we’ll have got all the information in this sequence. However, this effectively reorders the questions in the instrument.

Alternatively, if we are using a desktop application where we have more control over state, it may be possible to present the two leaves and the objects in the branch in the same view. Here we come across an immediate problem: depending on the viewer, the user is shown two vastly different forms. The solution is simple: leaves and branches should never be children of the same parent, and leaves should always be children of a DDI Sequence.
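Under this rule, a pre-compilation step only needs to classify each child of a Sequence and reject mixed sequences. Here is a minimal sketch, assuming a simplified in-memory representation where each child is a `(construct type, construct id)` pair. The function name `classify_sequence` and the `MixedSequenceError` exception are hypothetical.

```python
from typing import List, Tuple

# Per the classification above, only these ControlConstructs are tree leaves.
LEAF_TYPES = {"StatementItem", "QuestionConstruct"}


class MixedSequenceError(ValueError):
    """Raised when a Sequence mixes leaf and branch ControlConstructs."""


def classify_sequence(seq_id: str, children: List[Tuple[str, str]]) -> str:
    """Return 'leaves' or 'branches' for a homogeneous Sequence, otherwise raise.

    `children` is a list of (construct type, construct id) pairs, a hypothetical
    simplification of the references held by a DDI Sequence.
    An empty Sequence is reported as 'leaves' since there is no logic to follow.
    """
    leaves = [cid for kind, cid in children if kind in LEAF_TYPES]
    branches = [cid for kind, cid in children if kind not in LEAF_TYPES]
    if leaves and branches:
        raise MixedSequenceError(
            f"Sequence {seq_id!r} mixes leaves {leaves} with branches {branches}"
        )
    return "branches" if branches else "leaves"


print(classify_sequence("Seq 4", [("StatementItem", "S1"), ("QuestionConstruct", "Q1")]))
# leaves
try:
    classify_sequence("Seq 1", [("QuestionConstruct", "L1"), ("IfThenElse", "B2"),
                                ("QuestionConstruct", "L3")])
except MixedSequenceError as err:
    print(err)
    # Sequence 'Seq 1' mixes leaves ['L1', 'L3'] with branches ['B2']
```

A Sequence that passes as `'leaves'` can be rendered as a single page. One that passes as `'branches'` is handled purely by the flow logic.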

This immediately resolves the issue by making the display of objects predictable. When displaying a sequence of leaves, the agent only has to display the appropriate text of the statements and questions. When dealing with the logic of loops and branches, the agent can then cache this using a format similar to that suggested in Part 2 of this tutorial series, “Where am I and where do I go next?”. This allows the logic and presentation of a survey to be separated completely, making the display of a survey a much more predictable action.

There is an issue with this solution: it deviates from the structure imposed by DDI, making survey design stricter about how elements can be combined. Furthermore, this solution requires that survey designers understand these restrictions and voluntarily comply with them.

If the standard can’t enforce this restriction, and users can choose not to follow it, is it a good solution? Good, probably not – but it may be the best.

Admittedly, I hold a biased view in this respect. When designing Ramona, I decided not to support mixed sequences, for two reasons. Firstly, as above, it makes the display of sequences more predictable. Secondly, it is just plain easier to work with when you can apply these restrictions. When pre-compiling an instrument, Ramona will error if it finds a sequence containing mixed types. For the foreseeable future, when Ramona gets released, even as a research idea, it will keep these restrictions because they prevent users from making ambiguous surveys.

Next up… Questionnaire Design with DDI – Part 4: What can I say? – A look at how the metadata in DDI ResponseDomains can be used to validate responses, both in the browser and on the server.

Questionnaire design with DDI – Part 2: Where am I and where do I go next?

This is the second in a five-part series on working with questionnaires and surveys managed using the Data Documentation Initiative (DDI) XML standard. With DDI being an emerging technology, it is important that users are provided with best practices to ensure that they use the standard in a way that is logical, coherent and, most importantly, usable and reusable. This series of tutorials and discussions is aimed at users who have some knowledge of DDI and would like to know how to effectively design and mark up existing and future questionnaires in DDI.

Part 2: Where am I and where do I go next?

In the previous post on questionnaires using DDI, we looked at how it is possible in DDI to repeat sequences of questions, potentially leading to endless loops of questions within a survey. In a paper survey this would be quickly picked up, but in an electronic survey this might not always be the case. To help identify these issues, we introduced a tool called Sheri. It consumes a DDI instrument and checks for potential issues of repeated questions and endless loops. Sheri is also able to generate a non-standard XML format representing an entire survey marked up in DDI. But this leads to the question: “if one were to design a survey in DDI, why introduce another XML format?”
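Detecting endless loops of this kind amounts to finding a cycle in the graph of references between ControlConstructs. The sketch below is a generic depth-first search over a hypothetical `references` dictionary, which maps each construct id to the ids it references. It illustrates the idea only and is not Sheri's code.

```python
from typing import Dict, List, Optional


def find_cycle(references: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Depth-first search for a reference cycle reachable from `start`.

    Returns the ids forming the cycle (first id repeated at the end), or None.
    """
    on_path: List[str] = []  # constructs on the current DFS path, in order
    finished = set()         # constructs whose descendants are fully explored

    def visit(node: str) -> Optional[List[str]]:
        if node in on_path:
            # We have come back to a construct we are still inside: an endless loop.
            return on_path[on_path.index(node):] + [node]
        if node in finished:
            return None
        on_path.append(node)
        for child in references.get(node, []):
            cycle = visit(child)
            if cycle is not None:
                return cycle
        on_path.pop()
        finished.add(node)
        return None

    return visit(start)


# Seq 2 refers back to Seq 1, so the survey could never finish.
looping = {"Seq 1": ["Seq 2"], "Seq 2": ["Q1", "Seq 1"]}
print(find_cycle(looping, "Seq 1"))  # ['Seq 1', 'Seq 2', 'Seq 1']

# Reusing Seq 3 twice is repetition, not a loop, so no cycle is reported.
shared = {"Seq 1": ["Seq 2", "Seq 3"], "Seq 2": ["Seq 3"], "Seq 3": ["Q1"]}
print(find_cycle(shared, "Seq 1"))  # None
```

The `finished` set is what separates harmless reuse (the shared example) from a genuine loop. Without it, a construct reached twice by different routes would be explored twice.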

During research into DDI as a data capture standard, it quickly became apparent that, for all of its benefits in designing surveys, DDI was very difficult for programs to consume when creating instances of these instruments. This is due to how DDI describes an Instrument. In DDI, a whole survey instrument is represented by a single tag, Instrument, which in turn references a single Sequence within a ControlConstructScheme. This is simple enough to understand, but once the first Sequence is found the structure becomes quite interesting. Every Sequence, or other ControlConstruct, maintains references to its child constructs. For example, a Sequence can reference multiple other Sequences, which reference more Sequences and so on. Likewise, a Loop maintains a list of linked ControlConstructs to loop over, and a conditional IfThenElse contains references to other ControlConstructs in both its Then and Else clauses. What this means is that although there is an implied hierarchy, it is stored as a flat list of data structures with references between them.

This makes it difficult to determine exactly where in the hierarchy one is when given only the id of the current structure, especially given the stateless nature of the web. For example, Ramona, the test software I was writing, would take a sequence id as an argument. It would then render the corresponding DDI sequence as a form on a webpage, with web controls for each question. What quickly became an issue was determining which page to display next when a user was done with a sequence.

In a DDI ControlConstruct, the ID of a child element is stored as the text of an ID element under a ControlConstructReference. What this means is that to determine the next possible sequence given a sequence id, you need to look for references to the object, not the object itself. Then, from the reference, you search back for an appropriate ancestor element and return the next sibling to show the correct form. This quickly becomes complicated. With large DDI test files, doing plain text searches throughout an unmanaged* hierarchy proved too slow and complex for real-time responses.

A further complication arises if a survey reuses an object through multiple references (as opposed to using Loops). In such cases, the above method makes it impossible to easily determine which is the correct parent. This is because a reverse lookup on an object referenced multiple times only returns a list of referring objects. There is no context about which one we need to follow.

For example, in the following sequences:

Sequence id="Seq 1"
    sequence reference: Seq 3
Sequence id="Seq 2"
    sequence reference: Seq 3
Sequence id="Seq 3"
    question reference: Q1: Where did I come from?

Resolves to have a structure like:

Sequence id="Seq 1"
    Sequence id="Seq 3"
        question reference: Q1: Where did I come from?
Sequence id="Seq 2"
    Sequence id="Seq 3"
        question reference: Q1: Where did I come from?

But, given just the id of Sequence 3, we can’t determine whether the survey is over after answering the question (if we arrived at Sequence 3 from Sequence 2). We could equally still have to go on to Sequence 2 (if we arrived from Sequence 1).
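The ambiguity is easy to see with a reverse index. In the minimal sketch below, the flat structure from the example above is modelled as a hypothetical dictionary from each construct id to the ids it references. Inverting that dictionary shows that Sequence 3 has two candidate parents. `build_parent_index` is an illustrative name, not part of any DDI tool.

```python
from collections import defaultdict
from typing import Dict, List

# The flat, reference-based structure from the example above:
# each construct lists the ids it references.
references: Dict[str, List[str]] = {
    "Seq 1": ["Seq 3"],
    "Seq 2": ["Seq 3"],
    "Seq 3": ["Q1"],
}


def build_parent_index(refs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the references so each id maps to every construct that refers to it."""
    parents: Dict[str, List[str]] = defaultdict(list)
    for parent, children in refs.items():
        for child in children:
            parents[child].append(parent)
    return dict(parents)


parents = build_parent_index(references)
print(parents["Q1"])     # ['Seq 3']: a single parent, so the next step is clear
print(parents["Seq 3"])  # ['Seq 1', 'Seq 2']: two candidates and nothing to choose between them
```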

To solve these issues of speed, development effort and determining location, applications using DDI must therefore “pre-compile” Instruments into a traditional hierarchical form. In both Ramona and Sheri, the solution is to resolve the references and copy the referenced elements into the parent structure. This allows much of the DDI metadata to be retained and used.
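Here is a minimal sketch of this pre-compilation step using Python's standard `xml.etree.ElementTree`. It works from a hypothetical dictionary of references rather than real DDI XML, and replaces each reference with a nested copy of the referenced construct. The element name `Construct`, the `id` attribute and the top-level `Instrument` key are illustrative assumptions, not InstrumentML and not the Ramona or Sheri code. The sketch also assumes the references contain no cycles, which is exactly what Sheri checks for.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def dereference(node_id: str, refs: Dict[str, List[str]],
                parent: Optional[ET.Element] = None) -> ET.Element:
    """Copy each referenced construct into its parent, building a nested tree.

    Assumes the references contain no cycles; a cycle would recurse until
    Python's recursion limit is reached.
    """
    if parent is None:
        element = ET.Element("Construct", id=node_id)
    else:
        element = ET.SubElement(parent, "Construct", id=node_id)
    for child_id in refs.get(node_id, []):
        dereference(child_id, refs, element)  # a shared child is copied again each time
    return element


refs = {
    "Instrument": ["Seq 1", "Seq 2"],
    "Seq 1": ["Seq 3"],
    "Seq 2": ["Seq 3"],
    "Seq 3": ["Q1"],
}
tree = dereference("Instrument", refs)
print(ET.tostring(tree, encoding="unicode"))
```

Once the tree is nested, ordinary XML traversal applies again. Note, though, that Seq 3 now appears twice, which is the issue the following paragraphs address.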

This is not valid DDI, and it is unlikely that it ever will be.

This is also not a problem. DDI is useful as an archival and transportation language for statistical metadata, and the flexibility that the current structure provides is quite useful. However, for data collection, it can be safely assumed that the DDI Instrument will be relatively stable. If best practices are followed, once a DDI Instance is published it will never change. Thus it is perfectly valid to transform an Instrument in this way, as long as two conditions are met: that this is seen as a one-way destructive transformation, and that the resulting pseudo-DDI instrument is never changed. As another example of why this is a normal situation, tools are being developed to transform DDI into PDF questionnaires. This is very much the same process: the PDF is seen as a projection of the original DDI that makes it easier for people to use, but not the actual ‘source of truth’. Transforming DDI Instruments into a dereferenced pseudo-DDI Instrument is exactly the same: a transform of complex metadata into a form that machines can easily work with.

There is one issue that these pseudo-DDI Instruments will have: when a single data structure is referenced multiple times, it will occur multiple times in the resultant tree. In cases like this, it is still difficult to determine which element is the correct one when given just an ID. There are two possible solutions to this issue. The first is to manage state as the full XPath of the element instead of a single ID. This may speed up traversal, but it also means the structure of the form could be shared with users, which may or may not be a security issue depending on the form. Alternatively, as discussed in part one of these tutorials, Instruments can be restructured so that no structure is referenced more than once. This makes it easier to traverse the form and also limits user frustration.
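The first option, managing state as a full path rather than a single id, can be sketched on a dereferenced tree. In this minimal sketch, paths are tuples of child indices, standing in for an XPath such as `/Construct[1]/Construct[2]`. The helper computes the next leaf to display in document order. The nested-tuple tree format and the names `leaf_paths` and `next_leaf` are hypothetical. The sketch ignores Loop and IfThenElse logic and treats any construct without children as a leaf.

```python
from typing import List, Optional, Tuple

# A dereferenced tree as nested (construct id, children) pairs; hypothetical format.
Tree = Tuple[str, list]
Path = Tuple[int, ...]


def leaf_paths(tree: Tree, path: Path = ()) -> List[Tuple[Path, str]]:
    """List every leaf as (path of child indices, id) in document order."""
    node_id, children = tree
    if not children:
        return [(path, node_id)]
    found: List[Tuple[Path, str]] = []
    for index, child in enumerate(children):
        found.extend(leaf_paths(child, path + (index,)))
    return found


def next_leaf(tree: Tree, current: Path) -> Optional[Tuple[Path, str]]:
    """Return the leaf after the one at `current`, or None when the survey is over."""
    leaves = leaf_paths(tree)
    position = [p for p, _ in leaves].index(current)
    return leaves[position + 1] if position + 1 < len(leaves) else None


survey: Tree = ("Instrument", [
    ("Seq 1", [("Seq 3", [("Q1", [])])]),
    ("Seq 2", [("Seq 3", [("Q1", [])])]),
])
print(next_leaf(survey, (0, 0, 0)))  # ((1, 0, 0), 'Q1'): still have to visit Seq 2
print(next_leaf(survey, (1, 0, 0)))  # None: the survey is over
```

The two copies of Q1 share an id but not a path, which resolves the Sequence 3 ambiguity. The trade-off is the one noted above: if the path travels in a URL, the structure of the form becomes visible to the user.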

It should be noted that restructuring is not necessary to make DDI Instruments processable by software. It simply makes them easier to process with traditional methods and existing XML libraries, and it can improve execution times if that is an issue. However, with desktop software, many of the issues of web development cease to matter, such as stateless vs. stateful systems or the scalability of systems with concurrent users. In such situations, it is quite possible to work with the DDI Instrument directly, without manipulating the data structure, and in some cases this may be preferable.

In conclusion, how strictly we manage the DDI metadata structure depends very strongly on the role the metadata plays in a system. Some cases call for an extremely stable instrument, such as computer-aided interviewing. There, transforming DDI into a format that systems, or even users, can process more easily can be preferable to using plain DDI. What is important is to focus on DDI as a tool for increasing transparency and reusability in statistical processing. As long as the methods used to transform DDI into intermediate forms are well documented, there is no reason why this cannot be done. Well-documented transformations of this kind would not violate existing best practice.

Next up… Questionnaire Design with DDI – Part 3: What am I doing here? – A look at best practices for what control structures to include in sequences and how to deal with logical structures and questions.

* Unmanaged in the sense that the hierarchy is stored as references between XML elements, and not as a traditional XML hierarchy, and as such traditional tree traversal methods for XML cannot be used.