Posts Tagged ‘DDI’

Why are there so few survey design tools that use DDI?

Having been a close part of the DDI community for some time, and having attended a number of DDI-focused conferences, I have noticed a disturbing trend: there are relatively few content editors that use DDI. I have chosen this term very carefully, as there are a number of DDI Editors, but these are tools whose primary function is to produce DDI XML. When I say a DDI-powered content editor, I mean a tool with a limited use case that happens to use DDI as the storage format. As an example, we can look at Colectica – a leading DDI Editor. In this tool, to create a survey with some pathing between questions, I first create a QuestionScheme with some Questions, then I create an Instrument, which creates a ControlConstructScheme for me, and only then can I start pulling questions into it. If a new question needs to be made, I switch back to my QuestionScheme view, make a new question, then switch back to the instrument and drag it in. While this produces perfectly valid DDI, it is not how people think during the design process. It is analogous to opening a word processor to write a letter and having to write an alphabetical list of words that I can then drag into the appropriate place in the document, rather than just typing away. This isn’t in any way the fault of Colectica itself; it is more that this is the only way an editor that uses DDI could feasibly be written.

To look at why this is, I want to examine two simple use cases that a simple tool should be able to handle while managing the corresponding data in DDI. Firstly, how does a survey designer go about reusing an existing question in their survey? Secondly, how does a survey designer create a new question inside an existing survey instrument? To answer these questions I want to look at them from a user interaction point of view, and pull out what a survey designer would have to do to ensure that they have the bare minimum content needed to be ‘good’ DDI.

Use case 1: Reusing a question

One of the commonly stated advantages of DDI is the reusability of its managed content, so it should be the case that reusing a question is a relatively simple affair. For this use case, we picture a hypothetical user interface, where a survey designer wants to insert a new question into an existing sequence of questions. In DDI terms, they wish to insert a QuestionConstruct into a Sequence, not make a new QuestionItem in a QuestionScheme. So ideally the designer should need to:

  1. Search for a question using some search parameters
  2. If a suitable question is found, drag this question into the sequence.

However, this isn’t the case. First of all, the user interface needs to differentiate between the QuestionItem and the QuestionConstruct, as the QuestionConstruct is what inserts a question into a sequence by reference. So already we need the survey designer to understand DDI well enough to differentiate these objects. Secondly, if the needed QuestionConstruct doesn’t exist, it has to be created by the user, which in turn means the user must be prompted for the ControlConstructScheme that the new QuestionConstruct lives in. So what actually has to happen is this:

  1. Search for a question using some search parameters
  2. If a suitable question is found, look at the list of QuestionConstructs (each with their own different contexts), and drag the appropriate one into the sequence. Nothing further needs to be done.
  3. If an appropriate QuestionConstruct doesn’t exist, create it with its own label and description.
  4. Prompt the user for where the QuestionConstruct should be maintained.
  5. Search for a ControlConstructScheme using some search parameters, selecting the appropriate one.
  6. If none is found, create one with its own label, description, version, etc…

Here the simple act of reuse has tripled in size. The survey designer now has to understand more of the DDI model than should be necessary and, in many cases, becomes administratively responsible for content beyond their original survey.
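The difference between the two step lists can be sketched in a few lines of Python. This is a hypothetical in-memory model, not DDI itself or any real tool's API: the class names mirror the DDI objects discussed above, but the fields, the `reuse_question` helper and the `QC_` ID prefix are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QuestionItem:
    id: str
    text: str


@dataclass
class QuestionConstruct:
    id: str
    question_id: str  # a reference to a QuestionItem, not a copy of it
    label: str = ""


@dataclass
class ControlConstructScheme:
    id: str
    constructs: dict = field(default_factory=dict)  # construct id -> QuestionConstruct


@dataclass
class Sequence:
    id: str
    construct_ids: list = field(default_factory=list)  # ordered construct references


def reuse_question(question, sequence, scheme, construct_id: Optional[str] = None):
    """Place an existing QuestionItem into a Sequence by reference.

    Returns the list of actions taken, to make visible the extra work
    needed when no suitable QuestionConstruct exists yet.
    """
    steps = []
    existing = [c for c in scheme.constructs.values() if c.question_id == question.id]
    if existing:
        construct = existing[0]
        steps.append("reused construct " + construct.id)
    else:
        # Hypothetical naming convention for a freshly created construct.
        construct = QuestionConstruct(construct_id or "QC_" + question.id, question.id)
        scheme.constructs[construct.id] = construct
        steps.append("created construct " + construct.id)
    sequence.construct_ids.append(construct.id)
    steps.append("added to sequence " + sequence.id)
    return steps
```

Even in this reduced form, the caller has to supply a scheme for the construct to live in, which is exactly the extra decision the survey designer is being asked to make.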

Use case 2: Creating a question

However, this user interaction becomes much more complex when a user wants to add a new question. Again this should be a relatively simple affair, where a survey designer has made the decision that a new question needs to be created. In DDI terms, they wish to insert a QuestionConstruct into a Sequence, and create a new QuestionItem in a QuestionScheme. So ideally the designer should need to:

  1. Click to create a new question in the location needed.
  2. Add the corresponding information, such as question text, a label and description and intent.

Again however, this is far from how it would work using a DDI compatible tool.

  1. Click to create a new question in the location needed.
  2. Add the corresponding information, such as question text, a label and description and intent.
  3. Prompt the user for the QuestionScheme where the QuestionItem should be maintained.
  4. Search for a QuestionScheme using some search parameters, selecting the appropriate one.
  5. If none is found, create a QuestionScheme with its own label, description, version, etc…
  6. Create the necessary QuestionConstruct with the corresponding information, such as a label and description.
  7. Prompt the user for where the QuestionConstruct should be maintained.
  8. Search for a ControlConstructScheme using some search parameters, selecting the appropriate one.
  9. If none is found, create one with its own label, description, version, etc…

Here the act of simply adding a new question is a 9-step process. It can be argued that not all of the steps are necessary, or that content for ‘unimportant metadata’ could be filled in at a later stage. But this either means that objects remain empty for an indeterminate amount of time, or relies on conventions to hide information from users, e.g. a QuestionItem can only link to one QuestionConstruct, so they can be treated as ‘the same’. While this produces valid DDI, it violates the ‘spirit of the standard’.
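To make the convention-based shortcut concrete, here is a minimal Python sketch of a tool that hides steps 3 to 9 behind defaults. It is an illustration of the trade-off rather than a recommendation: `Survey`, `add_question` and the `QI`/`QC` ID patterns are hypothetical, and the one-construct-per-question pairing is precisely the convention questioned above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Survey:
    """Hypothetical in-memory stand-in for a DDI instance."""
    name: str
    question_scheme: dict = field(default_factory=dict)   # QuestionItem id -> question text
    construct_scheme: dict = field(default_factory=dict)  # QuestionConstruct id -> QuestionItem id
    sequence: list = field(default_factory=list)          # ordered QuestionConstruct ids


def add_question(survey: Survey, text: str, position: Optional[int] = None) -> str:
    """One user action; the scheme bookkeeping is filled in by default."""
    number = len(survey.question_scheme) + 1
    item_id = f"QI{number}"
    construct_id = f"QC{number}"
    survey.question_scheme[item_id] = text           # steps 3 to 5, defaulted
    survey.construct_scheme[construct_id] = item_id  # steps 6 to 9, defaulted
    if position is None:
        position = len(survey.sequence)
    survey.sequence.insert(position, construct_id)
    return construct_id
```

The designer sees two steps, but the labels, descriptions and versions of the default schemes are left empty, and every QuestionItem is silently tied to exactly one QuestionConstruct.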

Why is this important?

Ultimately, users and their tools make or break a standard. If no one can write DDI, or write tools that write DDI, or write tools that people want to use, then the very purpose of the standard is called into question. But the wider implication is this: the value of content stored as DDI is contingent on its reuse, but that content must initially come from somewhere. Perhaps in its current state DDI can be made to work for post-hoc research archivists. However, it is still lacking as a living standard that can be used throughout the survey lifecycle, simply due to its over-engineered state.

How can this be resolved?

Firstly, by drastically simplifying the content requirements and referential structure in DDI, which can only be achieved by talking with users and determining their needs. Archivists, survey researchers and central bankers will all have very different needs from each other, as they all do wildly different things. While it’s not infeasible that one standard could meet their needs, that comes from identifying their needs first. As a first step I offer this as an opening question: does anyone actually want to reuse just a single question? I ask this because, in my limited experience, people really just want to be able to reuse large modules of questions: a limited number of questions with their own internal logic that can be reused across a number of areas. It will probably come to mind that the question of ‘Sex’ is reused across almost any population research, but the rebuttal is: does anyone ever ask Sex, but not Age?

The DDI Identity Crisis and how to solve it – Part 1 : Versions and Identifiers

This is a 2-part post that examines the different classes of identifiable object in DDI, and offers critiques of their current design and possible improvements to the standard, with the aim of simplifying the model and (hopefully) improving the uptake of the standard. But first we need to have a quick look at what the 3 different classes of identifiable object in DDI are and where they are used, in increasing order of complexity, starting from the non-identifiable base case:

  1. Non-identifiable – We’ll include this as the ‘base’ case of any DDI object that isn’t captured by those below. These objects are mostly used to capture basic metadata concepts, such as labels or descriptions for more complex objects.
  2. Identifiable – Objects that only require an ID attribute. These are mostly basic metadata, and below I’ll show how blurry the distinction between identifiables and non-identifiables is, and why these objects probably don’t need identifiers at all.
  3. Versionable – A level above identifiables, these require a version and an ID. This is probably the most commonly encountered type of object, as these comprise the bulk of the survey objects people are used to dealing with – such as questions, variables and codelists. Further down I talk about why these objects don’t need a version, and about the administrative burden a version adds without a clear benefit.
  4. Maintainable – The most complex identifier – with an ID, a version and a reference to a maintenance agency. Maintainable objects are mostly used as either container objects, such as schemes, resource packages or groups; or high-level and survey-wide objects such as Study Units or Archival objects. In the following post I’ll show how they are currently managed, and how they can be better managed as XML objects to simplify RESTful interfaces for DDI.
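As a rough illustration of how each class adds one identification attribute to the one before it, the three identifiable classes can be modelled as a small inheritance chain in Python. This is only a sketch of the idea described in the list; the attribute names and the `required_attributes` helper are assumptions, not taken from the DDI schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifiable:
    id: str


@dataclass(frozen=True)
class Versionable(Identifiable):
    version: str


@dataclass(frozen=True)
class Maintainable(Versionable):
    agency: str


def required_attributes(obj) -> list:
    """List which identification attributes an object carries."""
    return [name for name in ("id", "version", "agency") if hasattr(obj, name)]
```

A non-identifiable object is simply anything outside this chain, for which `required_attributes` returns an empty list.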

Identifiable objects don’t need identifiers

Identifiable objects are the subset of all objects within DDI that have only an ID, but no version or agency. In DDI, ID attributes are only required to be unique within the parent maintainable, which means that to reference an identifiable, its ID isn’t enough; you also need the ID of the parent object. So while an identifiable can be referenced, to access it, it is necessary to first identify and gather the parent resource.
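A tiny Python sketch shows why a local ID alone is ambiguous. The repository layout, the IDs and the `resolve` function are all hypothetical; the point is only that the lookup is a two-step process keyed on the parent.

```python
def resolve(repository: dict, parent_id: str, local_id: str) -> str:
    """Fetch an identifiable: the parent must be located first."""
    parent = repository[parent_id]  # step 1: gather the parent maintainable
    return parent[local_id]         # step 2: look up the ID that is local to it


# Two study units that both contain an identifiable with the local ID "abs1".
repository = {
    "StudyUnitA": {"abs1": "Abstract of study A"},
    "StudyUnitB": {"abs1": "Abstract of study B"},
}
```

Here `"abs1"` on its own could mean either abstract; only the pair of parent ID and local ID picks out one of them.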

This becomes interesting when we examine the list of objects which are only identifiable (not versionable or maintainable), shown below:

Abstract
Abstract
Access
ActionToMinimizeLosses
Attribute
Coding
CollectionEvent
CollectionSituation
CoordinateGroup
CreationSoftware
DataCollectionMethodology
DataFileIdentification
DefaultAccess
DeviationFromSampleDesign
Embargo
ExternalAid
ExternalInformation
ExternalInterviewerInstructionReference
Geography
GrossFileStructure
GrossRecordStructure
LifecycleEvent
Location
LogicalRecord
Measure
ModeOfCollection
Note
OtherMaterial
PhysicalRecordSegment
ProcessingEvent
Purpose
Purpose
RecordRelationship
Role
SamplingProcedure
Software
SpatialCoverage
TemporalCoverage
TimeMethod
TopicalCoverage
VariableSet
Weighting

All of these objects constitute (at least to my mind) very basic, textual and context-dependent metadata. Concepts like an ‘abstract’ or ‘purpose’ only really make sense given the context of what you are summarising. This is reinforced by the fact that this information can only be gathered by first finding the object being summarised.

Which leads us to ask – what makes identifiables different to non-identifiables? In my opinion, nothing – it’s a distinction made for convenience. Again, in my opinion, identifiables exist because Notes exist. Because the methods for extending and improving DDI were not made more obvious to early adopters, DDI Notes have become the most common way to annotate objects, and given the referential nature of Notes, this requires objects to have identities.

The solution: Remove IDs from identifiables – If Notes are deprecated as a solution, IDs on identifiables are no longer needed. There is no other reason to identify them, and they can be scaled back to the ‘non-identifiable’ class of object.

Versionable objects shouldn’t have versions

Versionable objects are the set of objects that have both an ID and a version, and (as the DDI User Guide states) “are elements for which changes in content are important to note.” However, both versionables and maintainables have a version that supports the tracking of changes to an object. This causes a very interesting problem when dealing with objects in practice – the identifiers of objects can change without the objects themselves having changed at all!

Let’s look at an example, with a maintainable QuestionScheme called QS1 with version 1, and two versionable Questions, Q1 and Q2, both on version 1 as well. Since the full identifier for a versionable also includes its parent, the full ID for the most recent version of Q1 takes a form similar to QS1:V1|Q1:V1, simple enough. A problem arises when Q2 is changed to be version 2. Technically, since Q2 is a child of the QuestionScheme QS1, QS1 has also changed.

Now, the complexity is that QS1 has changed, so the full ID for the most recent version of Q1 has now changed to QS1:V2|Q1:V1. Which leads to the academic question – if Question Q1’s parent has changed, has Q1 itself also changed, meaning that to be a part of the updated parent it also needs a new version?

The discussion to resolve this problem with DDI versionables has actually been kicking around for quite a while, but again the solution for this is pretty clear as the section header states. The first thing to recognise is that all versionable objects are already versioned by their parent object, so strictly speaking, given only the full ID for the parent, and the ID of a current versionable, it is possible to identify a single object for the simple fact that all IDs on objects must be unique within their parent maintainable.

So by removing the version from versionables, and relegating them to instead be identifiables, the model for abstract types in DDI is reduced to two classes with very clear intentions. In the new model identifiables are objects which are reused through references within other objects to construct rich, linked metadata constructs, while Maintainables are the versioning objects that are used by agencies to administer cross-survey and cross-cycle metadata holdings.
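The QS1/Q1 example above can be replayed in a few lines of Python. The `QS1:V1|Q1:V1` notation is the one used in this post; the shorter form produced by `full_id_proposed` is my own assumed notation for the proposal, not something defined by DDI.

```python
def full_id_current(parent_id, parent_version, child_id, child_version):
    """Identifier in the style used above, e.g. QS1:V1|Q1:V1."""
    return f"{parent_id}:V{parent_version}|{child_id}:V{child_version}"


def full_id_proposed(parent_id, parent_version, child_id):
    """Assumed form for the proposal: only the parent maintainable carries a version."""
    return f"{parent_id}:V{parent_version}|{child_id}"


# Before: QS1 version 1 holds Q1 version 1 and Q2 version 1.
before = full_id_current("QS1", 1, "Q1", 1)

# Q2 is edited, so Q2 and its parent QS1 both move to version 2.
# Q1 itself is untouched, yet its full identifier is now different.
after = full_id_current("QS1", 2, "Q1", 1)
```

With the child version removed there is only one version number left to interpret: Q1 inside QS1 version 2 is simply `QS1:V2|Q1`, and the question of whether Q1 "also changed" no longer has two competing answers.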

However, as we’ll see in the next post, this change actually helps us take advantage of a number of useful XML technologies to simplify the learning process for DDI, for implementers and developers alike.

Next up: How Maintainables aren’t properly maintained

In the next post, I’ll cover how to simplify the DDI XSD Schemas to take advantage of XML identities by removing inline schemes and restricting base elements to simplify identification and URI design, so DDI can utilise URLs and XML fragments to precisely define objects for RESTful interfaces.

When DDI isn’t enough Part 3 – Picking the right approach to improving the standard

Warning: This post is wordy and contains some pretty heavy-handed criticisms about DDI and its implementations – so I’ll reiterate that this is my personal opinion as an open-source developer working with DDI-Lifecycle.

So in two recent posts, I presented 2 alternative approaches to extending the DDI information model – one method using XML Substitution Groups, the other using XSI Type Extensions – which leads us to ask which is better? The unhelpful, fence-sitting answer is that both have their advantages – while both being vast improvements on the DDI Note Element.

Ultimately, Notes are the last way one would want to extend the DDI specification because they are wholly unstructured (despite the ironic fact that the Content of the Note is of a StructuredStringType). The content of this note can take any form – plain text, html, csv, even XML embedded within CDATA – and a receiving party, be it a person or software, will in most cases be none the wiser about the structure of the content. Given how far removed Notes can be from the extended object, the receiving party may not even be aware of any extension at all.

This lack of structure ultimately removes any benefit of using a standard – to the point that, in my own personal opinion, Notes should be deprecated as soon as practical. This is because people are able to add Notes whenever a better approach within the standard is either difficult to find or hard to match to existing legacy software. Furthermore, they limit the ability for implementers to coexist with the future vision of the standard. The two approaches I’ve illustrated conform to good XML practice and encourage good documentation of changes, but they also provide direct ways to extend the standard in sharable ways, while also providing starting points for future changes. Since both methods require creating XML Schema to properly document the extensions, it is not unfeasible that schema created in this way could almost be directly imported into new versions of the standard. The advantage is that users could create XML extensions, generate their documentation, push it upstream, and limit the amount of change in their own systems – it’s a clear case where self-interest helps everyone.

But, having gotten down from my high horse…

Substitution Groups give the ability to create entirely new custom objects, but they are not easily recognized by existing tools, and they are limited in where they can go by the original and unchangeable schema. XSI Extensions, on the other hand, can be used anywhere and on any element without causing trouble for existing tools, but they can only add information to existing elements to help fill data gaps. Ultimately, it is up to the implementer to weigh these benefits to determine the correct approach, before sharing these solutions with the community to support the ongoing maintenance of the standard.

Note: I should point out that in the second post on XSI:types I specifically singled out a Note use in the Colectica implementation of DDI that could have been rewritten as using XSI extensions. Since the above diatribe could be read as an overly harsh criticism of their use of Notes, I feel that it should be stated that notes within DDI exported from Colectica are well-documented and Algenta provided support that helped push the time taken element into DDI 3.2.

When DDI isn’t enough Part 2 – XSI Type and DDI

So a colleague left a comment on the last post on extending DDI that brought my attention to the use of XSI:Type extensions to XML elements, which for lack of a better term make my last post look like child’s play! After a quick look, it seems this technique can be used to make additions to practically every part of an XML-based data model – such as DDI. The important question is: how does it work?

When we add an element, its definition is implicitly determined by its namespace and element name. This definition tells us exactly what attributes and elements are required or optional. What we can do is add an explicit type to the element, which allows us to give it an extended definition.

For example, in the last post, there is a demonstration of an Extended Conditional Text object that includes default and static text options. The downside of this is that a tool that handles the basic (non-extended) DDI 3.1 schema would not be able to use this content as it is, for all intents and purposes, hidden. An alternative approach is to use the ExtendedConditionalTextType we defined in the previous blog post and, instead of creating a new element, declare our standard DDI ConditionalText to be of this extended type within the XML, like so:

<d:ConditionalText xsi:type="xd:ExtendedConditionalTextType" xmlns:xd="ddi:ExtendedDataCollection:3_1">
    <d:Expression>
        <r:Code programmingLanguage="Pseudocode">if sex == 'Male' {return 'he'} else if sex == 'Female' {return 'she'} else {return 'they'}</r:Code>
    </d:Expression>
    <xd:Default>...</xd:Default>
    <xd:Static>he/she</xd:Static>
</d:ConditionalText>

What this achieves is the ability to add(1) additional elements to the ConditionalText without having to create a new element. Any software that can process the standard element can continue to work without having to accommodate any changes, and any additional elements will be (or should be) ignored.
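To see what "ignored" means in practice, here is a small Python sketch of a consumer that knows only plain DDI 3.1. It uses the standard library's `xml.etree.ElementTree`, which does no schema validation, so this only illustrates a tool that looks for the elements it understands and steps over the rest. The `d` and `r` namespace URIs are, to my knowledge, the DDI 3.1 DataCollection and Reusable namespaces; the shortened pseudocode and the `read_as_plain_ddi` function are my own.

```python
import xml.etree.ElementTree as ET

NS = {
    "d": "ddi:datacollection:3_1",  # DDI 3.1 DataCollection module
    "r": "ddi:reusable:3_1",        # DDI 3.1 Reusable module
}

FRAGMENT = """
<d:ConditionalText
    xmlns:d="ddi:datacollection:3_1"
    xmlns:r="ddi:reusable:3_1"
    xmlns:xd="ddi:ExtendedDataCollection:3_1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="xd:ExtendedConditionalTextType">
  <d:Expression>
    <r:Code programmingLanguage="Pseudocode">return 'they'</r:Code>
  </d:Expression>
  <xd:Default>...</xd:Default>
  <xd:Static>he/she</xd:Static>
</d:ConditionalText>
"""


def read_as_plain_ddi(xml_text: str) -> dict:
    """Read only what an extension-unaware DDI 3.1 consumer knows about."""
    root = ET.fromstring(xml_text.strip())
    code = root.find("d:Expression/r:Code", NS)
    known = {"{%s}Expression" % NS["d"]}
    # Anything outside the base definition is collected, not processed.
    ignored = [child.tag for child in root if child.tag not in known]
    return {"code": code.text, "ignored": ignored}
```

The base content is still read from the element name the tool already expects, while the two `xd` children are carried along untouched for any tool that does understand the extension.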

As a second example of an extension that’s already being used, we will look at Algenta’s Colectica tool, which is probably the leading DDI Editor available. This software introduced the ability to document the approximate time taken to complete a question. While this “time taken” content is being added to the DDI 3.2 specification, in DDI 3.1 this information is currently stored as a Note, making management and distribution of this information difficult (we will cover why Notes are difficult to manage in the next section of this now 3-part tutorial).

An alternative approach is through the creation of a new XML Schema complex type combined with the use of a similar XSI:Type extension. Below is an example of the XML Schema required to describe the additional element required.

Here we see the declaration of the element type, as well as its extension, and lastly the new element <ApproximateTimeToComplete>. It’s important to note that rather than having a basic numeric string for seconds or minutes, we are reusing the XML data type xs:duration – an implementation of the duration portion of the ISO 8601 date and time standard.
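One practical benefit of xs:duration is that a value such as PT2M30S has a defined meaning that software can convert without guessing at units. The following Python sketch handles only the hours, minutes and seconds subset of the format (no years, months, days or negative durations), which is an assumption about what a "time to complete" value would need; `duration_to_seconds` is a hypothetical helper, not part of any DDI tooling.

```python
import re

# Matches only the time-of-day subset of xs:duration, e.g. PT2M30S or PT1H.
# The lookahead rejects a bare "PT" with no components after it.
_DURATION = re.compile(
    r"^PT(?=\d)(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def duration_to_seconds(value: str) -> float:
    """Convert a simple xs:duration such as 'PT2M30S' to seconds."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m") or 0)
    seconds = float(match.group("s") or 0)
    return hours * 3600 + minutes * 60 + seconds
```

For example, `duration_to_seconds("PT2M30S")` gives 150 seconds, where a bare string like "2.5" would leave the unit open to interpretation.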

When we combine these we get a QuestionItem that looks similar to that below:

<d:QuestionItem id="exampleQuestion" xsi:type="xd:QuestionItemWithTimeTaken">
    <d:QuestionText>
        <d:LiteralText>
            <d:Text>You told me your dog likes to play fetch, what does </d:Text>
        </d:LiteralText>
        <d:ConditionalText xsi:type="xd:ExtendedConditionalTextType">
            <d:Expression>
                <r:Code programmingLanguage="Pseudocode">if sex == 'Male' {return 'he'} else if sex == 'Female' {return 'she'} else {return 'they'}</r:Code>
            </d:Expression>
            <xd:Default>...</xd:Default>
            <xd:Static>he/she</xd:Static>
        </d:ConditionalText>
    </d:QuestionText>
    <d:TextDomain/>
    <xd:ApproximateTimeToComplete>PT2M30S</xd:ApproximateTimeToComplete>
</d:QuestionItem>

When this is all put together, we get an XML fragment, that can be widely understood by DDI compliant software, but also contains additional metadata necessary for specific agencies or applications.

Just like last time, the full code for the above examples is available on pastebin – with the Extensions schema, and the example DDI Instance both available for review. In the next post I’ll go over each of these two approaches and cover their advantages, pitfalls, and when to use each – as well as covering why, with both of these approaches available, Notes are unnecessary and what implications this has for the standard in general.

Footnote:

  1. As of yet I haven’t figured out how to remove elements (or if it is even possible) … I wouldn’t hold your breath for this one.

When DDI isn’t enough Part 1 – XML Schema Extensions and DDI

No standard is perfect – in fact the DDI specification made this quite clear through the inclusion of the ‘Note‘ object to support extensions and to hold additional information. However, DDI Notes are usually seen as a mechanism of last resort for describing structured content as they are by their very nature unstructured. There is however an intermediate solution between the implementation of Notes and leaving out vital information or using less optimal modeling to document everything. The way that I’ll demonstrate here is through the use of XML Schema substitution groups.

From the XML Schema Documentation on substitution groups:

XML Schema provides a mechanism, called substitution groups, that allows elements to be substituted for other elements. More specifically, elements can be assigned to a special group of elements that are said to be substitutable for a particular named element called the head element.

In essence, this allows a schema designer to specify what classes of element can validly exist within an XML tree, before designing more complex child elements. Similarly, it allows for extensibility by third-party designers.

Within DDI Lifecycle there are a number of Substitution groups that can support these kinds of extensions.

For example, the ControlConstruct and ControlConstructScheme are used in this manner to support the inclusion of complex questionnaire logic.

<xs:complexType name="ControlConstructSchemeType">
    <xs:annotation>
        <xs:documentation>A set of control constructs maintained by an agency, and used in the instrument. </xs:documentation>
    </xs:annotation>
    <xs:complexContent>
        <xs:extension base="r:MaintainableType">
            <xs:sequence>
                <!-- Elements removed -->
                <xs:element ref="ControlConstruct" maxOccurs="unbounded">
                    <!-- Elements removed -->
                </xs:element>
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>
<xs:element name="ControlConstruct" type="ControlConstructType" abstract="true">
    <!-- Elements removed -->
</xs:element>
<xs:element name="IfThenElse" type="IfThenElseType" substitutionGroup="ControlConstruct"/>

Here the ControlConstructScheme declares the existence of a ControlConstruct child element, while the ControlConstruct acts as the head element for the substitution group by being declared abstract, which supports the declaration of the IfThenElse element as a part of this substitution group.

To extend this we can create a new element, and declare it as an extension of the ControlConstruct, to support a new metadata object. A trivial example is below:

<xs:element name="Foo" type="FooType" substitutionGroup="d:ControlConstruct"/>
<xs:complexType name="FooType">
    <xs:complexContent>
        <xs:extension base="d:ControlConstructType"/>
    </xs:complexContent>
</xs:complexType>

Here the Foo Element is defined as a part of the ControlConstruct group, of the complex FooType, which has ComplexContent based on the ControlConstructType as defined in the head element. Provided that the XSD that defined this new element was included correctly within the final DDI Instance, this would be valid DDI 3.1 XML. This means that the following fragment with the correct imported schemas would validate as DDI:

<d:ControlConstructScheme id="FooBar">
    <sqdx:Foo id="Bar"/>
</d:ControlConstructScheme>
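The rule a validator applies here can be approximated in Python: wherever the schema references the head element, any element declared with a matching substitutionGroup may appear instead. The sketch below reads a condensed copy of the declarations above (type attributes dropped for brevity) and lists the members of a group. It compares local names only, which is a simplification; a real validator resolves the prefixed names to full namespaces. The `substitutable_for` function is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Condensed from the declarations shown above, plus one unrelated element.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ControlConstruct" abstract="true"/>
  <xs:element name="IfThenElse" substitutionGroup="ControlConstruct"/>
  <xs:element name="Foo" substitutionGroup="d:ControlConstruct"/>
  <xs:element name="QuestionItem"/>
</xs:schema>
"""


def local_name(qname: str) -> str:
    """Drop a namespace prefix: 'd:ControlConstruct' -> 'ControlConstruct'."""
    return qname.split(":")[-1]


def substitutable_for(schema_text: str, head: str) -> set:
    """Names of elements declared as substitutable for the head element."""
    root = ET.fromstring(schema_text.strip())
    return {
        el.get("name")
        for el in root.iter(XS + "element")
        if local_name(el.get("substitutionGroup", "")) == head
    }
```

Asking for the members of the ControlConstruct group returns both IfThenElse and the new Foo element, which is why the fragment above can validate without any change to the ControlConstructScheme definition itself.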

Now, let’s look at this in practice. The ConditionalText element in DDI is used to document the existence of dynamic text in a survey instrument – be it as part of a question, statement or instruction. A conditional text exists as a part of the substitution group Text in the DataCollection Module. One issue with this element as it exists is that, although it defines how it should display the dynamic text, there is no declaration of what the default text may be, or what to display in a static environment. We can, however, improve this through the creation of an extension of this using the above techniques, as shown below.

<xs:element name="ExtendedConditionalText" type="ExtendedConditionalTextType" substitutionGroup="d:Text"/>
<xs:complexType name="ExtendedConditionalTextType">
    <xs:annotation>
        <xs:documentation>Text which has a changeable value, based on a condition expressed in Code. This is an extension of the standard DDI ConditionalText in the DataCollection Module, that provides support for default values for conditional text and text for static environments.</xs:documentation>
    </xs:annotation>
    <xs:complexContent>
        <xs:extension base="d:ConditionalTextType">
            <xs:sequence>
                <xs:element name="Default" type="r:StructuredStringType">
                    <xs:annotation>
                        <xs:documentation>The text to display prior to a dynamic change of text in an electronic environment.</xs:documentation>
                    </xs:annotation>
                </xs:element>
                <xs:element name="Static" type="r:StructuredStringType">
                    <xs:annotation>
                        <xs:documentation>The text to display when dynamic changes of text are not available. For example, on paper forms or non-dynamic electronic forms - such as javascript less environments.</xs:documentation>
                    </xs:annotation>
                </xs:element>
            </xs:sequence>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>

In the above XML fragment, the ExtendedConditionalText is defined as an extension of the standard DDI ConditionalTextType, with additional elements defined as necessary.

<d:QuestionText>
    <d:LiteralText>
        <d:Text>You told me your dog likes to play fetch, what does </d:Text>
    </d:LiteralText>
    <sqdx:ExtendedConditionalText>
        <d:Expression>
            <r:Code programmingLanguage="Pseudocode">if sex == 'Male' {return 'he'} else if sex == 'Female' {return 'she'} else {return 'they'}</r:Code>
        </d:Expression>
        <sqdx:Default>...</sqdx:Default>
        <sqdx:Static>he/she</sqdx:Static>
    </sqdx:ExtendedConditionalText>
</d:QuestionText>

This use of XML Schema extensions then means that not only is the data structure properly defined and sharable using standard XML technologies, it also provides an easy way of defining possible advancements for future versions of the standard.
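To show what the two extra elements buy a rendering tool, here is a minimal Python sketch of how a questionnaire renderer might choose between the Expression, Default and Static content. The three fields mirror the elements of the ExtendedConditionalText above; the `render` function, its `dynamic` flag and the Python version of the pronoun logic are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtendedConditionalText:
    expression: Callable[[dict], str]  # stands in for the <r:Code> logic
    default: str                       # shown before the expression has run
    static: str                        # shown where nothing can be evaluated


def render(text: ExtendedConditionalText, answers: Optional[dict], dynamic: bool) -> str:
    """Pick the text to display for one conditional fragment."""
    if not dynamic:
        return text.static   # e.g. a paper form
    if answers is None:
        return text.default  # electronic form, no answers collected yet
    return text.expression(answers)


def pronoun(answers: dict) -> str:
    """Python rendering of the pseudocode used in the examples above."""
    sex = answers.get("sex")
    if sex == "Male":
        return "he"
    if sex == "Female":
        return "she"
    return "they"


fragment = ExtendedConditionalText(expression=pronoun, default="...", static="he/she")
```

With only the standard ConditionalText, the first two branches of `render` would have nothing defined to return, and each tool would have to invent its own fallback.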

So, where can these extensions be used in DDI – here is a list of some of the substitution groups that exist in DDI 3.1:

So where any of these substitution groups exist, a newly defined object could take their place. However, there are a few places where substitution groups would be advantageous for future versions, the two main ones being a substitution group for Questions for inclusion in QuestionSchemes, and a replacement for the reusable Code element to allow for more defined, system-independent and reusable logic within DDI.

Lastly, the example Schema for the above ExtendedConditionalText is available on Pastebin, with a more in-depth example showing how a Case/Switch control construct could be created to define higher-order questionnaire logic.
There is also an example DDI instance on Pastebin that has concrete examples of all of the extensions listed.

DDI Tip of the Day – Google is now your friend!

A relatively recent, but unadvertised, change to the DDI Alliance website was the inclusion of a link to the Field Level Documentation on the front page. This small change has allowed search engine web crawlers to index the Field Level Documentation, which has made the documentation much easier to search through.

So if you are looking for information on a DDI element, attribute, scheme, or something else in the XML schema itself, a simple search for “DDI” and your search term should bring up exactly what you want, like so:

Beware that sometimes Google might try and be helpful and split an element into separate words and give you incorrect search results, for example giving results for “DDI Control Construct” instead of “DDI ControlConstruct“. However, all you need to do is wrap the element name in quotes and it should give you better answers.
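If you want to script this, the quoting tip can be wrapped in a small helper. This is only a convenience sketch: `ddi_search_url` is a hypothetical function that builds an ordinary Google search URL with the element name in quotes, using Python's standard `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode


def ddi_search_url(element_name: str) -> str:
    """Build a Google query for 'DDI' plus an exactly-quoted element name."""
    query = f'DDI "{element_name}"'
    return "https://www.google.com/search?" + urlencode({"q": query})
```

Calling `ddi_search_url("ControlConstruct")` keeps the element name as a single quoted term, so it is not split into "Control Construct".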

DDI, marketing and how to sell a standard

While DDI-Lifecycle is an excellent standard for research, statistical and social science metadata, it is still relatively unknown outside of a small community of agencies – and even within those agencies it is still relatively obscure. The problem isn’t a lack of experience working with the standard; it’s a lack of communication of this experience, especially to new users.

Communicating to new users, especially non-technical ones, requires being able to think like a novice. This means being able to present information in a way that is accessible and engaging. Accessible so a user isn’t overwhelmed with information and engaging so they have an incentive to learn.

Which brings us back to the problem in the DDI community: being inundated with experts, it is very difficult to get into the mindset of new users. While it’s true that “DDI can be used to describe the entirety of a social science survey”, “the entirety of a social science survey” is quite a lot of metadata with no easy entry point. To solve this we need to make the standard more accessible, by providing easy to follow starting points for the standard, and more engaging, by displaying this information in a clear and understandable way.

So, to this end, I have produced what will hopefully be the first in a number of posters and handouts to help promote DDI. It is available in the posters section, or you can click on the image below to get a full-size copy of the DDI visualisations poster from IASSIST 2012. If you have any ideas for other ways to present DDI in an accessible and easy to illustrate way, feel free to add a comment below.

DDI Poster Thumbnail

Managing Questions in DDI3.1 – “Other, please specify”

One problem that remains difficult when managing complex questions in DDI is the kind of question that asks a respondent to pick from a list of options and, if none is suitable, to write in their own. Below are examples of this kind of question from the US, UK and Australian Censuses (censii/census/censes?):

UK Census Extract
USA Census Extract

ABS Census Extract

In all three questions, respondents are asked about their origins, and are given the option to select from a list of common responses or provide a write in response. The easiest way to manage this is through the use of a DDI <MultipleQuestionItem>. A <MultipleQuestionItem> is a way to capture a complex question that asks two or more separate questions that are highly linked.

In the above examples we can split the questions into two, as illustrated in the generic answer below:

<MultipleQuestionItem>
    <SubQuestions>
        <QuestionItem>
            <QuestionText>
                <LiteralText>
                    What is your ancestral origin?
                </LiteralText>
            </QuestionText>
            <CodeDomain>
                <!-- This CodeDomain would include a reference to the list of countries or races -->
            </CodeDomain>
        </QuestionItem>
        <QuestionItem>
            <QuestionText>
                <LiteralText>
                    Please Specify:
                </LiteralText>
            </QuestionText>
            <TextDomain/>
        </QuestionItem>
    </SubQuestions>
</MultipleQuestionItem>

Here we have been able to split the question, while still managing it in a single item. This is needed as without each other, each subquestion is incomplete. This is not a new concept, and is quite an obvious solution to many people who have tried to solve this issue.

However, there is still the problem that this metadata doesn’t contain the restriction that a respondent should only be able to enter a free text option if the “other” option is selected. While there have been a number of published and attempted solutions, none have been satisfactory. Splitting the question outside of a MultipleQuestionItem and using IfThenElse clauses complicates the structure, and leaving the restriction out makes it difficult to drive self-interviewed computer systems directly from the metadata.

A possible solution that resolves both of these issues is to use the <SubQuestionSequence>. This is illustrated in the DDI fragment below:

<MultipleQuestionItem>
    <SubQuestions>
        <QuestionItem>
            <!-- Ancestral origin QuestionItem -->
        </QuestionItem>
        <QuestionItem>
            <!-- Please Specify QuestionItem -->
        </QuestionItem>
    </SubQuestions>
    <SubQuestionSequence>
        <ItemSequenceType>Other</ItemSequenceType>
        <AlternateSequenceType formalLanguage="Name Of Language Here" >
            <!-- Proprietary command to control logic -->
        </AlternateSequenceType>
    </SubQuestionSequence>
</MultipleQuestionItem>

In this we have used the SubQuestionSequence to hold the logic that indicates when the “Other” field should be available. This field is intended to control the specific sequence in which the SubQuestions are shown, and in that sense we are still controlling the ordering, just to specify when a member is not shown, which is an excusable use of the field. This choice can be further rationalised: an unfamiliar agent, for example a new piece of software, can still interpret the bulk of the metadata, but when presenting the above question it would allow a respondent to fill in both sections. This is no different to how a respondent to a paper-based survey may answer, so it is no great loss of granularity.

How any given agency chooses to populate the commands contained in the AlternateSequenceType will be an individual choice, and a standard way of expressing this may be needed, but this should help other groups more easily solve this problem by indicating where the solution can go and reducing the size of the problem.
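To make the idea a little more concrete, here is a rough sketch of how a consuming tool might read such a fragment. Everything specific in it is an assumption for illustration only: the `x-show-last-if` language name, the rule format (the answer value that unlocks the last sub-question), the `id` values and the function name are all made up, and the fragment is simplified and leaves out namespaces. It is not a proposal for how the AlternateSequenceType should actually be populated.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment. The formalLanguage value and the
# rule text "OTHER" are hypothetical, invented for this sketch.
FRAGMENT = """
<MultipleQuestionItem>
    <SubQuestions>
        <QuestionItem id="origin"/>
        <QuestionItem id="origin_other"/>
    </SubQuestions>
    <SubQuestionSequence>
        <ItemSequenceType>Other</ItemSequenceType>
        <AlternateSequenceType formalLanguage="x-show-last-if">OTHER</AlternateSequenceType>
    </SubQuestionSequence>
</MultipleQuestionItem>
"""

KNOWN_LANGUAGE = "x-show-last-if"  # hypothetical rule language


def visible_subquestions(xml_text, first_answer, understood=(KNOWN_LANGUAGE,)):
    """Return the ids of the sub-questions a tool would present."""
    root = ET.fromstring(xml_text)
    ids = [q.get("id") for q in root.findall("./SubQuestions/QuestionItem")]
    rule = root.find("./SubQuestionSequence/AlternateSequenceType")
    if rule is None or rule.get("formalLanguage") not in understood:
        # An unfamiliar agent cannot interpret the command, so it falls
        # back to presenting every sub-question, as on a paper form.
        return ids
    trigger = (rule.text or "").strip()
    # Show the write-in (last) sub-question only for the trigger answer.
    return ids if first_answer == trigger else ids[:-1]


print(visible_subquestions(FRAGMENT, "OTHER"))
print(visible_subquestions(FRAGMENT, "AU"))
print(visible_subquestions(FRAGMENT, "AU", understood=()))
```

The third call plays the part of the unfamiliar agent: it does not recognise the language, so it presents both sub-questions rather than failing.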

In the next day or two I will be putting a more solid example up into the DDI Examples Repository for people to work with. As always critiques of these ideas and examples are welcome.

Always double check the standard before writing code

A few weeks ago, I had the privilege of presenting at a collection of DDI Developers in Gothenburg at EDDI. There I presented one of my larger pieces of work, the Virgil-UI DDI Codelist Editor, for critique. While there I received advice, praise and most importantly constructive criticism for which I am grateful. However, this has brought to light a rather large problem.

It was pointed out that I made a small error when dealing with <Code> elements in DDI and accidentally gave them @id attributes, and it was noted that this should be an easy fix. Unfortunately, because I missed this very early on in the development of Virgil, the underlying model relies on Codes having ids to be able to easily make connections between the hierarchical user interface, the <Code>s and the <Category>s that give them meaning.

What this means is that the DDI coming out of Virgil is invalid, and any valid DDI cannot actually be read by Virgil. Essentially, the Virgil model for handling DDI is broken and needs to be almost entirely rewritten, and this might take quite a while.
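For anyone wanting to check their own output for the same mistake, a minimal sketch follows. It is not how Virgil works and it is no substitute for validating against the schema; it simply lists any Code element that carries an id attribute. The sample fragment is simplified (no namespaces, no category references) and the function names are my own.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so the check ignores namespaces."""
    return tag.rsplit("}", 1)[-1]


def codes_with_ids(xml_text: str) -> list[str]:
    """Return the id values found on Code elements, which should be none."""
    root = ET.fromstring(xml_text)
    return [
        el.get("id")
        for el in root.iter()
        if local_name(el.tag) == "Code" and el.get("id") is not None
    ]


# Simplified sample: the first Code wrongly carries an id, the second does not.
SAMPLE = """
<CodeScheme id="cs1">
    <Code id="code_1"><Value>1</Value></Code>
    <Code><Value>2</Value></Code>
</CodeScheme>
"""

print(codes_with_ids(SAMPLE))
```

An empty list means no offending Codes were found in that fragment.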

Unfortunately, at this stage rewriting also means re-examining a lot of the initial ideas about what Virgil should be and has highlighted some interesting questions about the DDI model and DDI software, such as:

  1. Is abstracting the DDI model away from a user a good approach to software design? Yes.
    This was the crux of my talk at EDDI, and I still feel that abstracting the DDI model away from day-to-day users is necessary. The DDI model is complex and covers a wide range of tasks. I believe that designing software that helps users relate the model to specific tasks they are trying to do is a key to getting people to use DDI and think about how they can make their metadata support themselves and those around them.
  2. Is DDI a standard that is suitable to use for day to day management of information? Probably.
    In practice, the DDI standard needs to be able to be passed between software if it is to move from an archival standard to a practical statistical metadata standard. One of the things I wanted to achieve with Virgil, was a tool that not only produced DDI, but could also consume it from other sources. In the simplest case this to me meant being able to take a DDI file, and edit the contents of part of it, leaving the rest untouched, and in a lot of cases this is possible with DDI. However, since having to rethink how to manage classifications using DDI, I have realised that there are some objects that are not captured well within DDI and unfortunately classifications are one such example.
  3. Is the DDI model for managing codelists and classifications good enough? Sadly not.
    One of the reasons I relied so heavily on the invalid <Code> @ids was that I needed a hook to tie codes and categories together and without this it becomes very difficult to manage what a ‘classification’ is in DDI. Furthermore, classifications don’t exist in DDI per se, but are a rather loose agreement that if you combine <CodeScheme>s and <CategoryScheme>s you get a good approximation. However, this falls apart when we try to document the classification itself.
    For example, where do you store the name of a whole classification? There are three viable places (excuse the XPath) – as a //CodeScheme/Label (being the label of the hierarchy), as a //CategoryScheme/Label (being the label of the collection of classifying categories) or as a //LogicalProduct/Label (the label of the immediate parent that contains both the hierarchies and the categories).
    However, each of these approaches has inherent issues, as none of them is the documented way to manage this information, and if three different agencies approached the problem in different ways, their metadata would become incomparable. This needs to be discussed further, as it will become a bigger issue as more tools start to try and manage a piece of metadata that is so important and that sits so early in the lifecycle.
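The ambiguity over where a classification's name lives can be shown with a short sketch. The sample instance is simplified and invented for illustration: it has no namespaces, the wrapper element and the labels are made up, and the third path is written with `.//` because of how ElementTree handles paths relative to the root.

```python
import xml.etree.ElementTree as ET

# The three candidate locations discussed above, as ElementTree paths.
CANDIDATE_PATHS = [
    ".//CodeScheme/Label",
    ".//CategoryScheme/Label",
    ".//LogicalProduct/Label",
]

# Invented sample: each candidate location holds a different label.
SAMPLE = """
<DDIInstance>
    <LogicalProduct>
        <Label>Country classification</Label>
        <CategoryScheme><Label>Countries</Label></CategoryScheme>
        <CodeScheme><Label>Country hierarchy</Label></CodeScheme>
    </LogicalProduct>
</DDIInstance>
"""


def candidate_names(xml_text: str) -> dict[str, list[str]]:
    """Collect the labels found at each candidate location."""
    root = ET.fromstring(xml_text)
    return {
        path: [el.text for el in root.findall(path)]
        for path in CANDIDATE_PATHS
    }


for path, labels in candidate_names(SAMPLE).items():
    print(path, labels)
```

A tool that reads only one of these paths will report a different "name" from a tool that reads another, which is exactly the comparability problem.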

It should be noted that these issues don’t excuse overlooking the actual standard, which is what led to this predicament. However, having to re-examine how to correct the problem in Virgil also gives me a chance to examine some of the issues I came across while trying to maintain classifications within DDI. Over the coming month or so I am going to continue writing up some of the issues I identified with classifications within DDI3.1 and how to work around these in the short term, and look at ways to correct the problem in future versions of the standard.

Lastly, in the short-term there will be an update to correct the Code/id problem in the CSV to DDI conversion, so the original use case of being able to mine legacy systems to produce valid DDI will still be filled.

Thanks again to everyone at EDDI for their input and company.

Farewell to Europe (and EDDI) for another year

Here I sit in Helsinki Airport, awaiting a bitter sweet flight home. While it is always good to go home and be with my family and friends, I know I am leaving quite a few behind here in Europe and beyond.

By all accounts, the European DDI Users Group meetings were a great success. Along with seeing all the work people have done over the last year, we were able to sit and discuss and debate for several days, and we now have a solid plan for future work.

While I was only at the Developers meetings, we covered improvements to the website, new ways of managing large DDI instances in relational and non-relational databases, examined new (and forgotten) ways to design software, debated the best ways to handle automated ID creation, listened to the results of the semantic DDI workshops, learned about the DDI Agency Registry, debated reducing or removing namespaces from DDI, raised the possibility of a shared DDI Blog/News aggregator and started work on not one, but two major additions to the DDI community: a new transport element nicknamed “The DDI Bucket” and the groundwork for a DDI RESTful web interface standard.

And that was in just 3 days! And I am still eagerly waiting to see how the “Data Without Borders” and “Longitudinal DDI” workshops went.

The week was made even more productive by the use of Google Docs to create a single, living record of the event. Watching everyone type up their notes in real time was great. Over the next few weeks I (and hopefully the rest of the DDI Developers community) will continue to clean up our collaborative notes, and I look forward to presenting information and recommendations to the whole DDI community in the new year.

We also discussed upcoming meetings for the DDI Developers group and 3 possibilities were raised: at IASSIST in June, RC33 in July and EDDI next December. While meetings will most likely go on at all of these events, I strongly encourage those who can come to RC33, to be held in Sydney next July, to speak up or at the least contact me in private. There is a wealth of talent in Australia and New Zealand that is well worth getting in contact with, and with a large enough group of DDI members in Australia I think a “DDI Developers Down-under” would be well attended and well worth the trip.

So with that in mind, thank you to everyone in the DDI Community for a great week, and especially to Olof Olsson of SND for kindly offering me a place to stay during the week. It was a fantastic week, and served to remind me how, if you work hard, you can contribute to a community. Being called upon to answer questions during the meetings (and once during the question time of someone else’s talk!) was especially flattering. This has truly re-invigorated my love of metadata (I spent the better part of my evenings in Rome madly writing ideas for tutorials and examples I foolishly volunteered for during the meetings).

So, with that I wish the entire DDI Community a Merry Christmas, Happy Holidays and Happy New Year and look forward to seeing everyone again in the new year, be it in Washington for IASSIST 2012, Sydney for RC33 or wonderful Bergen for EDDI 2012!!!

Arrivederci

https://lh6.googleusercontent.com/-Bt7M3EmQOVo/TuY2PWCS8hI/AAAAAAAAEOc/nmKN-M6tK-M/s512/IMG_20111211_184959.jpg