There was an oversight with the CSV converter not converting coded values to the proper place in the created DDI XML. This has been fixed and the changes have been pushed into SVN and a new version (0.0.2b) of the executable has been released on Google Code.
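As a rough illustration of what the fix involves, here is a minimal sketch of writing a code together with its value using ElementTree. The namespaces and the `Code` / `CategoryReference` / `Value` layout follow my reading of DDI 3.1, and the `make_code` helper and the sample IDs are hypothetical; none of this is taken from the converter's source.

```python
import xml.etree.ElementTree as ET

# Assumed DDI 3.1 namespaces; check against the schema before relying on them.
L_NS = "ddi:logicalproduct:3_1"
R_NS = "ddi:reusable:3_1"


def make_code(code_scheme, category_id, value):
    """Append a Code that references a category and carries its coded value."""
    code = ET.SubElement(code_scheme, "{%s}Code" % L_NS)
    ref = ET.SubElement(code, "{%s}CategoryReference" % L_NS)
    ET.SubElement(ref, "{%s}ID" % R_NS).text = category_id
    # The step that was missing: store the coded value on the Code itself.
    ET.SubElement(code, "{%s}Value" % L_NS).text = value
    return code


# Hypothetical scheme and category IDs, for illustration only.
scheme = ET.Element("{%s}CodeScheme" % L_NS, {"id": "Test_codeSchemeID"})
make_code(scheme, "Test_categorySchemeID_1", "1000")
print(ET.tostring(scheme).decode("utf-8"))
```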
With the recent release of the new Australian Standard Classification of Drugs of Concern from the ABS, there was an opportunity to field test the Virgil CSV to DDI converter with real data to see how it held up. Fortunately, the classification was released as an Excel data cube that conformed almost entirely to the structures that Virgil supports. After a little cleaning of the CSV, it ran through the converter with very few issues. Incidentally, the most serious error highlighted a massive oversight: the converter failed to add values for the codes! However, this has been corrected, the changes have been pushed to SVN, and a new version of the Windows tool will be pushed out this weekend.
A screen shot of Virgil with the converted classification
Opening the newly created DDI file in the Virgil DDI CodeList Editor was another story and pointed out a few flaws in how it handles empty data. Because the structure from the Excel file contained no descriptions for any category and no labels for the CodeScheme, a few small corrections were made to accommodate freshly created DDI. Many of the remaining problems will be ironed out by the time the CodeList Editor is available for download.
While the converter hasn’t been fully integrated into the CodeList Editor, it will shortly be possible to create a single DDI file and import numerous CSV files to create a series of classificatory codelists in a single package. A practical and soon to be realised example would be the Australian Standard Classification of Drugs of Concern with the lists of drugs of concern, forms of drug and methods of consumption codelists all contained in a single machine processable DDI package.
Virgil UI is now getting ready for release to the public. The first step is that it now has a Google Code project site, which is starting to include source code.
The first code to go up in the public repository is the utility code for the CSV to DDI conversion described in the last post. This includes the specialised CSV parser, a library to create DDI 3.1 Codes and Categories, and a sample command-line interface to pull it all together.
For those who can’t wait for either a GUI tool or an executable command-line app, or just want to play with the code, feel free to grab the source. Keep in mind that this is very much pre-beta and under active development, but if (or more likely when?) you come across a bug, be sure to report it through the issue tracker on the site. To help folks along, a few sample CSVs are included to give developers an idea of the format the converter expects to consume, but they aren’t an exhaustive list of all the possible combinations of code and category list types.
In lieu of actual usage documentation (which will be added during the week) below is a sample execution:
python2.7 ./converter_cli.py -i ./test_files/anzsic.ss-pd.csv -c SemiStructured -C PreDefinedColumn -o outfile.xml -d DDIInstance_ID --codeSchemeID=Test_codeSchemeID --categorySchemeID=Test_categorySchemeID
python2.7 ./converter_cli.py - Needed to execute the script
-i ./test_files/anzsic.ss-pd.csv - The CSV file to transform
-c SemiStructured - The CSV CodeList type (see previous blogpost for more info)
-C PreDefinedColumn - The CSV Category type (see previous blogpost for more info)
-o outfile.xml - File to save the DDI to, if blank output to console
-d DDIInstance_ID - The ID for the new parent DDIInstance of the resultant file
--codeSchemeID=Test_codeSchemeID - ID for the CodeScheme that will hold the codes - is also part of the prefix for all DDI code IDs
--categorySchemeID=Test_categorySchemeID - ID for the CategoryScheme that will hold the categories - is also part of the prefix for all DDI category IDs
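For anyone who wants to see how such an option set fits together, the flags above could be parsed along these lines. This is only a sketch: it assumes `argparse`, whereas the post doesn't say which option parser the tool uses, and the `dest` names and help strings are made up for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options in the sample execution."""
    parser = argparse.ArgumentParser(description="CSV to DDI converter (sketch)")
    parser.add_argument("-i", dest="infile", required=True,
                        help="the CSV file to transform")
    parser.add_argument("-c", dest="codelist_type",
                        help="the CSV CodeList type, e.g. SemiStructured")
    parser.add_argument("-C", dest="category_type",
                        help="the CSV Category type, e.g. PreDefinedColumn")
    parser.add_argument("-o", dest="outfile", default=None,
                        help="file to save the DDI to; console if omitted")
    parser.add_argument("-d", dest="instance_id",
                        help="ID for the parent DDIInstance")
    parser.add_argument("--codeSchemeID", dest="code_scheme_id")
    parser.add_argument("--categorySchemeID", dest="category_scheme_id")
    return parser


# Parse the same arguments as the sample execution above.
args = build_parser().parse_args([
    "-i", "./test_files/anzsic.ss-pd.csv",
    "-c", "SemiStructured", "-C", "PreDefinedColumn",
    "-o", "outfile.xml", "-d", "DDIInstance_ID",
    "--codeSchemeID=Test_codeSchemeID",
    "--categorySchemeID=Test_categorySchemeID",
])
print(args.infile, args.outfile, args.code_scheme_id)
```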
Note: the converter requires Python 2.7, as the DDI module uses some of the more advanced XPath options in ElementTree.
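To give a feel for the kind of XPath involved, the ElementTree bundled with Python 2.7 understands predicates such as `[@attrib='value']` and `[tag]`, which earlier versions did not. The snippet below is a sketch with simplified, namespace-free element names; which expressions the DDI module actually uses is not covered here.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a code scheme; real DDI elements are namespaced.
doc = ET.fromstring(
    "<CodeScheme>"
    "<Code id='c1'><Value>1</Value></Code>"
    "<Code id='c2'><Value>2</Value></Code>"
    "<Code id='c3'/>"
    "</CodeScheme>"
)

# Attribute predicate: find the Code whose id attribute is 'c2'.
match = doc.find(".//Code[@id='c2']")
print(match.find("Value").text)

# Child predicate: only the Code elements that have a Value child.
with_value = doc.findall("Code[Value]")
print([code.get("id") for code in with_value])
```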
When I’m not writing about writing code, I occasionally get to hop into a terminal and tear out a few lines of code. While Ramona was a bit of a bust that needs to revisit the drawing board before it’s ready to leave the nest, Virgil has taken off. Virgil is something I’ve been doing in between other tasks with the sole purpose of allowing users to edit and manage CodeLists stored in DDI. This is based on work I did mid-last year to turn DDI Code and Category Schemes into interactive webpages. To support this I’ve been working on a tool to allow users to properly edit CodeLists in DDI.
A CodeList is a combination of two DDI objects, a CodeScheme and a CategoryScheme, and enables users to manage complex hierarchies of coded information, from something as small as codifying “Yes/No” responses to managing large industrial classifications.
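One way to picture that pairing is the sketch below: codes hold a value and point at a category, categories hold the human-readable label, and codes can nest to form a hierarchy. The class and attribute names are hypothetical and are not Virgil's internal model.

```python
class Category(object):
    """Human-readable side of a codelist entry (lives in the CategoryScheme)."""
    def __init__(self, cat_id, label, description=""):
        self.cat_id = cat_id
        self.label = label
        self.description = description


class Code(object):
    """Coded value that references a category (lives in the CodeScheme)."""
    def __init__(self, value, category_id, children=None):
        self.value = value
        self.category_id = category_id
        self.children = children or []


class CodeList(object):
    """Pairs a hierarchy of codes with the categories they reference."""
    def __init__(self, codes, categories):
        self.codes = codes
        self.categories = dict((c.cat_id, c) for c in categories)

    def walk(self, codes=None, depth=0):
        """Yield (depth, value, label) for each code, parents before children."""
        for code in (self.codes if codes is None else codes):
            yield depth, code.value, self.categories[code.category_id].label
            for item in self.walk(code.children, depth + 1):
                yield item


yes_no = CodeList(
    codes=[Code("1", "cat_yes"), Code("2", "cat_no")],
    categories=[Category("cat_yes", "Yes"), Category("cat_no", "No")],
)
for depth, value, label in yes_no.walk():
    print("  " * depth + value + " " + label)
```

A large classification would use the same shape, with `children` carrying the lower levels of the hierarchy.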
The video got downscaled when it was uploaded (pressing the expand button helps), but for those having trouble making out what’s in the video, the features demoed are:
The ANZLIC DDI file is opened in the Vim text editor and searched for the term “LOOK HERE”. This search term isn’t in the file… yet
Virgil-UI is run and the same file is loaded
Data from the DDI File for a Category is loaded and is displayed in English and German
The term “LOOK HERE” is added to the description of a category and the file is saved
The file is then reloaded in the Vim text editor and the term “LOOK HERE” searched for
The search term “LOOK HERE” is found
When ready (hopefully mid-August for the open beta), Virgil-UI will be released under a free open-source licence and will support the following features (** indicates a feature that is already fully or partially implemented):
** Complete multilingual support, for both the UI and multilingual DDI files.
** DDI3.x file support
** Full rich-text editing for DDI Descriptions and Labels
** Support for Windows, Mac and Linux
* Export support for Virgil-Web, an existing tool for generating web pages from DDI CodeLists
* Import from CSV
* Drag-and-drop re-ordering of CodeLists
Planned features after the initial release include:
* DDI2.x file support
* DDI3.x support from a custom-built repository
* DDI3.x support from a Colectica repository