Posts Tagged ‘code’

Autocompletion using PyQt4 and QScintilla

I’m currently toying with the idea of creating a specialised code editor for questionnaires in Python using PyQt4. One of the best plugins for Qt for this job is QScintilla, a wrapper around Scintilla, an open source editing component. Scintilla, to its credit, has a huge number of great features, including code-folding, syntax highlighting and auto-completion of words. But the documentation can be a little lacking at times, and there aren’t a whole lot of examples of how to use some of the features.

The one feature I was trying to implement was auto-completion, but I had no luck finding Python examples that demonstrated everything you need to get it working. So, after some searching, some hacking, some crying and then finally some reading of the documentation, I came up with the following minimal example of auto-completion in a basic editor:

#!/usr/bin/env python
# -*- coding: latin1 -*-
 
"""
Basic use of the QScintilla2 widget
 
Note : name this file "qt4_sci_ac_test.py"
Base code originally from: http://kib2.free.fr/tutos/PyQt4/QScintilla2.html
"""
 
import sys
from PyQt4.QtGui import QApplication
from PyQt4 import Qsci
from PyQt4.Qsci import QsciScintilla, QsciLexerPython
 
if __name__ == "__main__":
    app = QApplication(sys.argv)
    editor = QsciScintilla()
 
    ## Choose a lexer
    ## This can be any Scintilla lexer, but the original example used Python
    lexer = QsciLexerPython()
 
    ## Create an API for us to populate with our autocomplete terms
    api = Qsci.QsciAPIs(lexer)
    ## Add autocompletion strings
    api.add("aLongString")
    api.add("aLongerString")
    api.add("aDifferentString")
    api.add("sOmethingElse")
    ## Compile the api for use in the lexer
    api.prepare()
 
    editor.setLexer(lexer)
 
    ## Set the length of the string before the editor tries to autocomplete
    ## In practice this would be higher than 1
    ## But it's set lower here to make the autocompletion more obvious
    editor.setAutoCompletionThreshold(1)
    ## Tell the editor we are using a QsciAPI for the autocompletion
    editor.setAutoCompletionSource(QsciScintilla.AcsAPIs)
 
    ## Render on screen
    editor.show()
 
    ## Show this file in the editor
    with open("qt4_sci_ac_test.py") as source_file:
        editor.setText(source_file.read())
    sys.exit(app.exec_())

Thanks to Kib2, whose example code for QScintilla2 this example is based on.

Virgil UI – CSV to DDI converter now available for Windows

The day is finally here – Virgil c2d is available for Windows. You can download the zip archive from Google Code. In future, this is where new versions of the tool will be made available. I am hoping that there will be activity as people start using it and bugs get noticed, so be sure to check back often to see if changes are available.

For the time being though, download a copy of the beta, check out some of the example CSVs and learn how the different CSV types look.

If you have issues getting the application to run, check the converter_ui.exe.log log file for any errors and be sure to raise a bug through the issue tracker. If there are issues getting a file to convert, check that the structure settings are correct, and check the line that the error dialog indicates may be causing the issue. If you are still unable to get the CSV to convert, raise an issue and attach the offending CSV file and I’ll see if the problem can be resolved.

When checking out the example CSVs the filenames give some hints to the structure of the data in them:

  • ss: semi-structured
  • mono: monolingual
  • pd: pre-defined language
  • pe: prefix embedded language

The other files have the following types:

  • anzsic 2006 – codes and titles.csv – Semi-structured, Monolingual
  • anzsic.csv – Semi-structured, Monolingual

 

Virgil UI – CSV Converter UI Files now up

Over the week I’ve been coding away wrapping the CSV to DDI converter module with a nice user interface. Well, after a weekend of work it has a user interface; whether it is nice is in the eye of the beholder. As with the rest of the Virgil project, the Python code for this tool is available on Google Code. Unfortunately I haven’t had time to compile this into a Windows executable suitable for novice use, but interested parties are again welcome to download and test the tool from source.

For the curious, I’ve again recorded a demonstration and put it up on YouTube, which is embedded below:

Again there is no audio, but I’ve included a brief transcription below so people can get a better idea of what the demonstration is trying to illustrate:

  • Open the anzsic.csv file to briefly view the contents of the CSV holding the labels and some descriptions of categories in the 2006 Australia and New Zealand Standard Industrial Classification.
  • Execute the conversion tool and load the anzsic.csv file
  • Select the correct structure options for the CSV, as per the allowed structures described in a previous post.
  • Add a default language code and ID prefix for the DDI Instance and all codes and categories.
  • Demonstrate the preview table, showing how the header row can be ignored.
  • Convert the file; in the background you can see debug text for each code encountered.
  • Open a folder to save, and confirm the folder is empty.
  • Open the newly created file.
  • Add some line breaks to the automatically created XML and search for a term from the original CSV.

Hopefully by this time next week there will be a fully downloadable Windows executable available for people to try.

Virgil UI – Converting from legacy to CSV to DDI

While my main machine has been out of action, I’ve devoted more time to one of the first use cases that prompted the development of Virgil – transforming legacy CSVs into DDI 3.1.

One of the main features of Virgil is the ability to help users transition from legacy systems using non-standard formats to using DDI as the main data language for managing codes, categories and classifications. Unfortunately, there is no way for any one system to support every format for classifications; however, by targeting a lowest common denominator we can process the bulk of the work. In this case the lowest common denominator is CSV.

If a user or developer of a legacy system is able to transform their legacy format into one of several different CSV formats supported by Virgil, then they will be able to import at least the basic structure and metadata of their codes and classifications into DDI. With most of the code for the conversion tools done, I’ve begun putting together the wizard interface for Virgil UI, which will also form part of a standalone conversion tool. Within the next few weeks the standalone conversion tool will be ready for release, and made available as open-source with the supporting code.

Below is a list of questions that users and developers may have about how to prepare CSVs for conversion to DDI, covering the convertible metadata, preferred CSV structure and developer support. Although the CSV structuring options for conversion are restricted, if there is a need to expand the formats or metadata available for conversion, make your needs known and this can be incorporated into future development.


What metadata will be supported?

A user will be able to import the code values and the hierarchy of a classification, as well as labels and descriptions of categories. Labels and descriptions can be multilingual, and multiple languages per item are able to be imported.

Will I have to use Virgil-UI to use this converter?

No. This converter will be available as a wizard within Virgil, but the UI for the wizard will also be available as a standalone program for users who need to convert from a legacy system to DDI. As the code will be entirely open-sourced, the Python module that performs the transformations will be able to be imported into any other piece of Python software. Lastly, since the converter module is written entirely using modules from the Python standard library, it will be usable from other languages that have compatible Python implementations – such as Java using Jython[http://www.jython.org/] or .NET using IronPython[http://ironpython.net/].

In summary there will be at least four ways developers and users will be able to implement the Virgil CSV-DDI converter tools.

What ‘formats’ of CSV will be supported?

CSVs are generally without structure and are just a basic way of storing tabular data, but a CSV built from a simple combination of the following code and category forms can be converted. When picking a structure, it is important that the ‘code’ columns come before any ‘category’ columns. Beyond that, any combination of a code format and a category column format, if created correctly, should convert from CSV to DDI without trouble.

Column options for importing codes and their hierarchy

Referential CSV Codelist
Order: Code,Parent
Notes: This can be reversed to go Parent,Code. If a parent is blank, it is assumed that this node is a top-level code in a CodeScheme.

Example:
A, ,
1,A,
2,A,
B, ,
3,B,
4,B,
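
As a rough illustration of how a referential codelist like the one above could be read, here is a minimal Python 3 sketch built on the standard csv module. It is not the Virgil converter's own code: the function name `read_referential` and the parent-to-children dictionary it returns are assumptions made for this example.

```python
import csv
import io

def read_referential(text, delimiter=","):
    """Return {parent_code: [child_codes]}; top-level codes sit under None."""
    children = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank lines
        code = cells[0]
        # A blank parent cell marks a top-level code.
        parent = cells[1] if len(cells) > 1 and cells[1] else None
        children.setdefault(parent, []).append(code)
    return children

sample = "A, ,\n1,A,\n2,A,\nB, ,\n3,B,\n4,B,\n"
tree = read_referential(sample)
print(tree[None])  # top-level codes
print(tree["A"])   # children of A
```

Keeping the hierarchy as a plain dictionary makes it easy to check a file before worrying about any XML output.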

Semi-structured CSV Codelist

Order: (Empty,)*Code,
Notes: If the code is the first entry in a row then it is considered a top code in the CodeScheme. Any children of a code should be indented by only one column. The columns for labels and descriptions start in different columns depending on the level of the hierarchy.

Example:
A,
 ,1,
 ,2,
B,
 ,3,
 ,4,

Aligned Semi-structured CSV Codelist

Order: (Empty,)*Code,(Empty,)*
Notes: If the code is the first entry in a row then it is considered a top code in the CodeScheme. Any children of a code should be indented by only one column. All nodes should be padded so that the columns for labels and descriptions start in the same columns.

Example:
A, ,
 ,1,
 ,2,
B, ,
 ,3,
 ,4,
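
The two semi-structured layouts above differ only in where the label columns begin, so one small reader can cover both. The following Python 3 sketch shows the idea under some assumptions: the function name `read_semi_structured`, the `label_column` parameter and the sample labels are invented for illustration and are not taken from Virgil.

```python
import csv
import io

def read_semi_structured(text, label_column=None, delimiter=","):
    """Yield (depth, code, remaining_cells) for each non-empty row.

    label_column=None -> plain layout: labels follow straight after the code.
    label_column=N    -> aligned layout: labels always start at column N.
    """
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [cell.strip() for cell in row]
        # The depth of a code is the number of empty cells in front of it.
        depth = next((i for i, cell in enumerate(cells) if cell), None)
        if depth is None:
            continue  # blank row
        start = depth + 1 if label_column is None else label_column
        yield depth, cells[depth], cells[start:]

# Made-up sample data in both layouts.
plain = "A,Agriculture\n ,1,Crops\n ,2,Livestock\n"
aligned = "A, ,Agriculture\n ,1,Crops\n ,2,Livestock\n"
plain_rows = list(read_semi_structured(plain))
aligned_rows = list(read_semi_structured(aligned, label_column=2))
print(plain_rows)
print(aligned_rows)
```

Both calls produce the same rows, which is the point of the aligned variant: the hierarchy is unchanged, only the padding differs.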

Column options for importing multilingual categories

Prefix-embedded Language

Order: (Label,Description)+
Notes: As many languages as needed can be repeated within the column as long as they have unique language codes.

Example: en-au;Chocolate,en-au;Confectionery based on the seed of the cacao plant,fr;Chocolat,fr; Confiseries à base de la graine de la plante de cacao
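
To make the prefix-embedded layout concrete, here is a small Python 3 sketch that splits such a row into per-language labels and descriptions. It assumes, based only on the example above, that a semicolon separates the language code from the text; the function name `parse_prefix_embedded` is hypothetical.

```python
import csv

def parse_prefix_embedded(cells, separator=";"):
    """Turn ['en-au;Label', 'en-au;Description', ...] into
    {language: (label, description)}."""
    categories = {}
    # Cells come in (label, description) pairs.
    for label_cell, description_cell in zip(cells[0::2], cells[1::2]):
        language, label = label_cell.split(separator, 1)
        _, description = description_cell.split(separator, 1)
        categories[language.strip()] = (label.strip(), description.strip())
    return categories

line = ("en-au;Chocolate,en-au;Confectionery based on the seed of the cacao plant,"
        "fr;Chocolat,fr; Confiseries à base de la graine de la plante de cacao")
cells = next(csv.reader([line]))
chocolate = parse_prefix_embedded(cells)
print(chocolate["fr"])
```

A cell without the separator raises a ValueError in this sketch, which is a reasonable signal that the row does not follow the layout.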

Pre-defined Column

Order: (language,Label,Description)+
Notes: As many languages as needed can be repeated within the column as long as they have unique language codes.

Example: en-au,Strawberries,Tasty fruit that isn't a true berry,fr,Fraise,Fruits savoureux qui n'est pas une baie vrai
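
For comparison, the pre-defined column layout can be read by walking the cells three at a time. This is again only a Python 3 sketch with a hypothetical function name, `parse_predefined`, and not the converter's actual implementation.

```python
import csv

def parse_predefined(cells):
    """Turn [language, label, description, ...] into
    {language: (label, description)}."""
    if len(cells) % 3:
        raise ValueError("expected groups of language, label, description")
    return {
        cells[i].strip(): (cells[i + 1].strip(), cells[i + 2].strip())
        for i in range(0, len(cells), 3)
    }

line = ("en-au,Strawberries,Tasty fruit that isn't a true berry,"
        "fr,Fraise,Fruits savoureux qui n'est pas une baie vrai")
strawberries = parse_predefined(next(csv.reader([line])))
print(strawberries["en-au"])
```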

Monolingual

Order: (Label,Description)
Notes: When only importing a single language that isn’t expressed in the CSV a default language will need to be given when invoking the converter.

Example: Vegemite,A yeast extract spread only edible by people from Australia. No other translations exist because no one else can stand it.

Can this tool support tab-separated files?

Yes. In the wizard users will be given the opportunity to select from a range of delimiter options or enter their own delimiting character. When using this module in other code, it will also support any delimiter as long as it is specified when calling the module.
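
The delimiter handling comes almost for free from Python's csv module, which accepts any single-character delimiter. The sketch below (Python 3) shows this with a hypothetical helper, `read_rows`; how Virgil itself exposes the option is not shown here.

```python
import csv
import io

def read_rows(text, delimiter=","):
    """Parse delimited text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# The same made-up data, once tab-separated and once pipe-separated.
tab_rows = read_rows("A\tAgriculture\n1\tCrops\n", delimiter="\t")
pipe_rows = read_rows("A|Agriculture\n1|Crops\n", delimiter="|")
print(tab_rows)
```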

How should a developer write CSV for the converter?

With no agreed-upon standard for CSVs, it’s hard for developers to try and write ‘standard’ CSVs. To simplify development and be as lenient as possible, the Virgil CSV-DDI converter uses the Python csv module[http://docs.python.org/library/csv.html]. If you are writing your own CSV writer, I’d suggest testing it against this module to make sure it works.

In a nutshell though – leading and trailing whitespace is trimmed, and any entry that contains a comma (or the specified delimiter) should be quoted with double (“) or single (‘) quote marks.
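
One way to test a writer against the csv module is a simple round trip: write the rows, read them back, and compare. The Python 3 sketch below does this with the module's own writer and made-up sample rows; a custom writer would replace the `csv.writer` step.

```python
import csv
import io

# Illustrative rows: one entry contains the delimiter, one contains quotes.
rows = [
    ["A", "Agriculture, Forestry and Fishing"],
    ["1", 'Uses "quotes" inside'],
]

buffer = io.StringIO()
# QUOTE_MINIMAL only quotes entries that need it (delimiter, quote, newline).
csv.writer(buffer, quoting=csv.QUOTE_MINIMAL).writerows(rows)
written = buffer.getvalue()
print(written)

# Round-trip check: reading the text back should give the original rows.
round_trip = list(csv.reader(io.StringIO(written)))
print(round_trip == rows)
```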

What will the wizard and standalone converter look like?

Something like this:

PyQt mockup of the CSV/DDI Converter

Virgil UI – Announcement and Pre-alpha demonstration

When I’m not writing about writing code, I occasionally get to hop into a terminal and tear out a few lines of code. While Ramona was a bit of a bust that needs to revisit the drawing board before it’s ready to leave the nest, Virgil has taken off. Virgil is something I’ve been doing in between other tasks with the sole purpose of allowing users to edit and manage CodeLists stored in DDI. This is based on work I did mid-last year to turn DDI Code and Category Schemes into interactive webpages. To support this I’ve been working on a tool to allow users to properly edit CodeLists in DDI.

A CodeList is a combination of two DDI objects, a CodeScheme and a CategoryScheme, and enables users to manage complex hierarchies of coded information, from tasks as small as codifying “Yes/No” responses to managing large industrial classifications.
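
As a way of picturing that pairing, here is a deliberately simplified Python 3 model. It is not the DDI XML schema and not Virgil's internal representation; the class names, fields and the Yes/No sample data are assumptions chosen only to show codes pointing at multilingual categories.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    """A category: what a code means, with text keyed by language code."""
    id: str
    labels: dict = field(default_factory=dict)
    descriptions: dict = field(default_factory=dict)

@dataclass
class Code:
    """A code: a value pointing at a category, with optional child codes."""
    value: str
    category_id: str
    children: list = field(default_factory=list)

@dataclass
class CodeList:
    """Pairs a hierarchy of codes with the categories they refer to."""
    codes: list = field(default_factory=list)       # stands in for the CodeScheme
    categories: dict = field(default_factory=dict)  # stands in for the CategoryScheme

    def label_for(self, code, language):
        return self.categories[code.category_id].labels[language]

yes = Category("cat_yes", labels={"en": "Yes", "de": "Ja"})
no = Category("cat_no", labels={"en": "No", "de": "Nein"})
answers = CodeList(
    codes=[Code("1", "cat_yes"), Code("2", "cat_no")],
    categories={c.id: c for c in (yes, no)},
)
print(answers.label_for(answers.codes[0], "de"))  # Ja
```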

To demonstrate how this may be done, I’ve uploaded a screencast of Virgil-UI in action: opening a DDI version of the coded hierarchy from the Australian and New Zealand Standard Industrial Classification (ANZSIC), editing it and saving the file.


The video demonstration is available on YouTube – here.

The video got downscaled when it was uploaded (pressing the expand button helps) but for those having trouble understanding what’s in the video, the features demo’d in the video are:

  • Open the ANZSIC DDI file in the Vim text editor and search for the term “LOOK HERE”. This search term isn’t in the file… yet
  • Virgil-UI is run and the same file is loaded
  • Data from the DDI File for a Category is loaded and is displayed in English and German
  • The term “LOOK HERE” is added to the description of a category and the file is saved
  • The file is then reloaded in the Vim text editor and the term “LOOK HERE” searched for
  • The search term “LOOK HERE” is found

When ready (hopefully mid-August for open-beta) Virgil-UI will be released under a free open-source licence and will support the following features (** indicates a feature that is already fully or partially implemented):
** Complete multilingual support, for both the UI and multilingual DDI files.
** DDI3.x file support
** Full rich-text editing for DDI Descriptions and Labels
** Support for Windows, Mac and Linux
* Export support for Virgil-Web, an existing tool for generating Web-pages from DDI CodeLists
* Import from CSV
* Drag-and-drop re-ordering of CodeLists

Planned features after the initial release include:
* DDI2.x file support
* DDI3.x support from a custom-built repository
* DDI3.x support from a Colectica repository

Twitter Sparkline Generator using Unicode

NB: This post uses examples of Unicode that may not show up in some browsers.

One of my main gripes with Twitter is that you can only add text. People often want to share small snippets of data, but to no avail. The ideal way to share data in such tiny chunks is Edward Tufte’s idea of sparklines.

For those of you disinclined to read the Wikipedia page, sparklines are “data-intense, design-simple, word-sized graphics”, designed to be entered inline with text, at a similar height, to help illustrate an idea.

Now I am not the first person to suggest entering sparklines into Twitter; in fact the second entry in a Google search for sparkline turns up Alex Kerin’s article. However, there are two slight problems with Kerin’s implementation. Firstly, the Unicode block characters he is using are not designed to be lined up, and the examples shown on his page demonstrate this. To be fair, this isn’t his fault at all, as Unicode compliance isn’t 100%. The second is that a bar and a line can give two very different impressions: bar charts are generally used to display discrete data (or continuous data being shown as discrete) and line charts are used to show continuous data – for the record there is no good time to use a pie chart.

To this end I have created a tool for producing two different types of sparkline from an input data source – a crude line graph and a 5-figure box-plot.

Here is an example showing both, using the June 30th 2010 Perth weather data from the Bureau of Meteorology, with bars delimiting 3 hour blocks:

The weather yesterday in Perth was quite cool (4.1┣▇▇|▇━━┫17.7) with a maximum of 17.7 degrees occurring around 2pm, before quickly cooling down until 3pm. (⣤⣤⣀⎸⣀⣀⣀⎸⣀⣀⡤⎸⠴⠚⠛⎸⠛⠛⠙⎸⠒⠒⠒⎸⠒⠲⠶⎸⠶⠶⠶).

Restricting this example further to the 140 characters of Twitter:

Perth 30/06/10: Cool (4.1┣▇▇|▇━━┫17.7), max at 2pm, cooling to around 13°C after 3pm, steady afterwards. (⣤⣤⣀⎸⣀⣀⣀⎸⣀⣀⡤⎸⠴⠚⠛⎸⠛⠛⠙⎸⠒⠒⠒⎸⠒⠲⠶⎸⠶⠶⠶)

This is a 115 character weather report, leaving 25 characters for a URL to the full data. This may be for temperature only, but it shows the potential and can place two datasets in a Twitter post with commentary.

I think the boxplots look quite good. However, the tool does take a few liberties with the braille layout, relying on people to see a pair of vertical dots as a value in between the two, but it helps convey the message quite well in a limited, text-based format.
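
For anyone wanting to experiment, here is a simplified Python 3 sketch of both sparkline styles. It is not the tool's actual code: it uses only one dot per braille column (four height levels, without the paired-dot in-between values described above), and the box-plot function expects the five summary figures to be supplied by the caller. All function names are made up for the example.

```python
BRAILLE_BASE = 0x2800
# Dot bits from the bottom row (level 0) to the top row (level 3).
LEFT_DOTS = (0x40, 0x04, 0x02, 0x01)
RIGHT_DOTS = (0x80, 0x20, 0x10, 0x08)

def _scale(value, low, high, steps):
    """Map value from [low, high] onto an integer in 0..steps."""
    if high == low:
        return 0
    return int((value - low) / (high - low) * steps + 0.5)

def braille_line(values):
    """Draw values as a dotted line, two data points per braille character."""
    low, high = min(values), max(values)
    levels = [_scale(v, low, high, 3) for v in values]
    chars = []
    for i in range(0, len(levels), 2):
        bits = LEFT_DOTS[levels[i]]
        if i + 1 < len(levels):
            bits |= RIGHT_DOTS[levels[i + 1]]
        chars.append(chr(BRAILLE_BASE + bits))
    return "".join(chars)

def text_boxplot(low, q1, median, q3, high, width=6):
    """Render a five-number summary as low┣...┫high using block characters."""
    box_start = _scale(q1, low, high, width - 1)
    box_end = _scale(q3, low, high, width - 1)
    middle = _scale(median, low, high, width - 1)
    cells = []
    for i in range(width):
        if i == middle:
            cells.append("|")       # median marker
        elif box_start <= i <= box_end:
            cells.append("▇")       # inside the box (q1 to q3)
        else:
            cells.append("━")       # whisker
    return "{}┣{}┫{}".format(low, "".join(cells), high)

line = braille_line([0, 0, 1, 2, 3, 3])
box = text_boxplot(0, 2, 4, 6, 8, width=9)
print(line)
print(box)
```

Packing two data points into each character is what keeps the line graph short enough to leave room for commentary in a tweet.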

Release Day – Perl/Email/Twitter Gateway

Today is the alpha release day of PET-Gateway, a clunky interface to translate between a pair of email addresses and Twitter (written in Perl, wouldn’t you guess). At the moment it has far too little documentation, but this will be corrected after I get back from the slopes.

However, the short version is:

Firstly, download the script from Google Code, and its numerous dependencies from CPAN.

Then, set up two email accounts, one as the twitter-server proxy and a second as the twitter-client proxy (you may already have this, like your work email that’s hidden behind an oppressive  or an email you can access in an even more oppressive dictator regime).

In the poorly documented config file, add the SMTP and POP3 details (IMAP coming later), set the client proxy as the “to email”, and add the server proxy username and password under email.

At this stage the POP3 and SMTP servers both need to support SSL and use the same details; again, this will be updated at a later stage.

Now, copy the .rc file with the same name to your home directory and run the script. It will ask to be oAuthed against Twitter using a URL and the returned PIN code. Make sure you are logged in to Twitter when you browse to the given URL.

After that you are all set up, just add a cron job with a command like:

*/2 * * * * ~/path_to_pet-gateway/pet-gateway.pl >> ~/.pet-gateway.log

to make the process run as often as you need and you are all set to tweet to your heart’s content, even when you can’t access Twitter!

When you send to the server email address, it only posts the message body up to the first line break, splitting it across tweets if it has to, and also looks for attached images to upload to imgur.com. Also, if you want to reply to another tweet, reply to the email the server-proxy sends you, and it prepends the sending user’s username automatically and sends the update properly, including the originating tweet’s ID, so the tweets thread correctly.
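
The body handling described above can be sketched in a few lines of Python 3. This is an illustration only, not a port of the Perl script: the function name `body_to_tweets` is hypothetical, and repeating the reply prefix on every chunk is an assumption about how split replies might be handled.

```python
import textwrap

TWEET_LIMIT = 140  # Twitter's limit at the time of writing

def body_to_tweets(body, reply_to=None, limit=TWEET_LIMIT):
    """Split the first line of an email body into tweet-sized chunks."""
    lines = body.splitlines()
    first_line = lines[0] if lines else ""
    prefix = "@{} ".format(reply_to) if reply_to else ""
    # Wrap on word boundaries, leaving room for the reply prefix.
    chunks = textwrap.wrap(first_line, width=limit - len(prefix))
    return [prefix + chunk for chunk in chunks]

short = body_to_tweets("Hello world\n-- \nSent from my phone")
reply = body_to_tweets("Thanks!", reply_to="someone")
long_tweets = body_to_tweets("word " * 60)
print(short)
print(reply)
print(len(long_tweets))
```

Cutting at the first line break is what keeps email signatures and quoted replies out of the tweet.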

This was kind of rushed out the door so I could still Tweet on my upcoming holiday, but there should be a fair bit of development when I get back, either on this or a Python rewrite that should have fewer dependencies and be easier to distribute and run.

If you find this useful, add a comment here, or on PET-Gateway’s main page and let me know, and if you find issues with my code feel free to submit a bug, patch, diff or complain on the Google Code page.