GPS is Killing the Street Directory Industry

So I’ve recently noticed the most anachronistic advertising, which sums up so many of the attitudes of ‘old media’:

I nearly got run over by a taxi getting this photo.

The message here is that your new-fangled, feature-packed gizmo isn’t as good as your reliable, dead-tree street directory. The problem is that GPS, with turn-by-turn directions, live route-finding and voice recognition, is in every mobile phone and consequently every car, and most phones let you search Google Maps for a route, free and online, once you’ve stopped. All of which says that street directories are at risk of extinction.

UBD is an Australian maker of street directories, and has been for as long as I can remember. However, despite providing digital versions of its maps on CDs and DVDs, UBD has very few ties to “in-car navigation devices”, and in its ads goes so far as to try to sever those ties.

So why, with all the efficiencies that GPS brings and the brand identity that UBD has, did they never enter the GPS market, even if only to attach their brand to devices for sale in Australia?

This question is moot, but it serves to highlight the issues that a lot of companies go through as the online space grows. As technology changes it will continue to enter more and more spaces, replacing books with fully browsable texts and stores with online distribution. The key is not to fight this, but to ensure companies take whatever brand equity they had in their outgoing products and use it to position themselves in the new markets that technology will relentlessly bring. Once people are used to the conveniences new technology brings, they will rarely give them up, and fighting those tendencies will only serve to turn the public away; take the business models of the RIAA and MPAA, for example.

Otherwise, you end up like the company complaining that its business, which did so well selling hay to stagecoaches, needs a bailout to survive.

Examining the factors of Indigenous participation in Crime

That’s the gist of my statistics thesis, which is taking up so much of my time right now.

I’m currently working with Anna Ferrante from the UWA Crime Research Centre on a project to examine some Australian Bureau of Statistics data from 2008. Currently reading my way through reams of papers on the subject and so far the answer is a resounding “we don’t quite know.” 

Or rather, there is no easy answer. Social biases, substance abuse and socioeconomic inequities all contribute, going by the base reading I have done.

The bad news from this perspective is that at the moment there appears to be no silver bullet for this hot button issue.

However, with the untouched data of over 13,000 anonymised persons at my fingertips I can only hope that this analysis proves to be helpful to this already wide body of research.

And with any luck I will get a chance to carry some of this experience over when I start my Masters in Statistics.

Twitter Sparkline Generator using Unicode

NB: This post uses examples of Unicode that may not show up in some browsers.

One of my main gripes with Twitter is that you can only add text. People often want to share small snippets of data, but to no avail. The ideal way to share data in such tiny chunks is Edward Tufte’s idea of sparklines.

For those of you disinclined to read the Wikipedia page, sparklines are “data-intense, design-simple, word-sized graphics”, designed to sit inline with text, at a similar height, to help illustrate an idea.

Now I am not the first person to suggest entering sparklines into Twitter; in fact the second entry in a Google search for sparkline turns up Alex Kerin’s article. However, there are two slight problems with Kerin’s implementation. Firstly, the Unicode block characters he is using are not designed to be lined up, and the examples shown on his page demonstrate this. To be fair, this isn’t his fault at all, as Unicode compliance isn’t 100%. The second is that a bar and a line can give two very different perceptions: bar charts are generally used to display discrete data (or continuous data shown as discrete) and line charts are used to show continuous data – for the record, there is no good time to use a pie chart.

To this end I have created a tool for producing two different types of sparkline from an input data source: a crude line graph and a 5-number box-plot.

Here is an example of both, using the June 30th 2010 Perth weather data from the Bureau of Meteorology, with bars delimiting 3 hour blocks:

The weather yesterday in Perth was quite cool (4.1┣▇▇|▇━━┫17.7) with a maximum of 17.7 degrees occurring around 2pm, before quickly cooling down until 3pm. (⣤⣤⣀⎸⣀⣀⣀⎸⣀⣀⡤⎸⠴⠚⠛⎸⠛⠛⠙⎸⠒⠒⠒⎸⠒⠲⠶⎸⠶⠶⠶).

Trimming this example further, to fit within Twitter’s 140 characters:

Perth 30/06/10: Cool (4.1┣▇▇|▇━━┫17.7), max at 2pm, cooling to around 13°C after 3pm, steady afterwards. (⣤⣤⣀⎸⣀⣀⣀⎸⣀⣀⡤⎸⠴⠚⠛⎸⠛⠛⠙⎸⠒⠒⠒⎸⠒⠲⠶⎸⠶⠶⠶)

This is a 115 character weather report, leaving 25 characters for a URL to the full data. This may be for temperature only, but it shows the potential: two datasets can fit in a Twitter post along with commentary.

I think the boxplots look quite good. The tool does, however, take a few liberties with the braille layout, relying on people to see a pair of vertical dots as a value in between the two, but it conveys the message quite well in a limited, text-based format.
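To give a feel for how both kinds of sparkline can be built, here is a minimal Python sketch. It is not the tool itself: the function names, the `width` parameter and the choice of quartile method are all assumptions made for illustration, and the line graph is simplified to one dot per value at four heights rather than the paired dots described above.

```python
import statistics

# Unicode braille patterns start at U+2800. Each cell is 2 columns by 4 rows
# of dots; this table gives the bit for each dot as [column][row from top].
BRAILLE_BASE = 0x2800
DOT_BITS = ((0x01, 0x02, 0x04, 0x40), (0x08, 0x10, 0x20, 0x80))


def braille_line(values):
    """Plot values as a crude line: two values per character, four heights."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero on flat data
    chars = []
    for i in range(0, len(values), 2):
        cell = 0
        for col, value in enumerate(values[i:i + 2]):
            level = round((value - lo) / span * 3)  # 0 = bottom, 3 = top
            cell |= DOT_BITS[col][3 - level]
        chars.append(chr(BRAILLE_BASE + cell))
    return "".join(chars)


def box_plot(values, width=6):
    """Render min┣...┫max with a box over the interquartile range."""
    lo, hi = min(values), max(values)
    # "inclusive" keeps the quartiles inside the observed range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    span = (hi - lo) or 1

    def cell(x):
        # Map a data value to a cell index in 0..width-1.
        return min(width - 1, int((x - lo) / span * width))

    body = []
    for i in range(width):
        if i == cell(median):
            body.append("|")   # median marker
        elif cell(q1) <= i <= cell(q3):
            body.append("▇")   # inside the box
        else:
            body.append("━")   # whisker
    return f"{lo}┣{''.join(body)}┫{hi}"


if __name__ == "__main__":
    readings = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # made-up data, not the Perth figures
    print(box_plot(readings, width=8), braille_line(readings))
```

Both functions return plain strings, so the result can be dropped straight into a sentence the way the weather report above does.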

Thredbo Trip Report

For those of you not following my Twitter feed, we got back from the snow yesterday. The weather was great, with snow on the first few days we were there, and then warming up over the week. It was kind of icy on the last two days, which contributed to the majority of falls, almost exclusively by me. On Friday I had a bad fall and thought I had done some serious damage to my shoulder, but after a few days in a sling it was ship-shape again.

While we were over in Canberra, my wife and I had a chance to look at some real estate in the Belconnen area for when we move over as part of the 2011 Graduate program with the Australian Bureau of Statistics.

For now, it’s back to work, and study, and code, etc… This semester should be a little lighter on my work load, especially with the wedding out of the way. So aside from uni/work code I should be able to continue active development on some of my personal projects, namely the DDI Classification Viewer and the Perl-Email-Twitter Gateway script. The latter got a serious test during the holiday, and may be rewritten in Python in the coming weeks.

Me on the snow

A victorious pose after shredding down the mountain!

Release Day – Perl/Email/Twitter Gateway

Today is the alpha release day of PET-Gateway, a clunky interface to translate between a pair of email addresses and Twitter (written in Perl, wouldn’t you guess). At the moment it has far too little documentation, but this will be corrected after I get back from the slopes.

However, the short version is:

Firstly, download the script from Google Code, and its numerous dependencies from CPAN.

Then, set up two email addresses, one as the twitter-server proxy and a second as the twitter-client proxy (you may already have the latter, like your work email that’s hidden behind an oppressive firewall, or an email you can access in an even more oppressive dictator regime).

In the poorly documented config file, add the SMTP and POP3 details (IMAP coming later), set the client proxy as the “to email”, and add the server proxy username and password under email.

At this stage the POP3 and SMTP servers both need to support SSL and use the same details; again, this will be updated at a later stage.

Now, copy the .rc file with the same name to your home directory and run the script. It will ask to be authorised against Twitter via OAuth, using a URL and the returned PIN code. Make sure you are logged in to Twitter when you browse to the given URL.

After that you are all set up; just add a cron job with a command like:

*/2 * * * * ~/path_to_pet-gateway/pet-gateway.pl >> ~/.pet-gateway.log

to make the process run as often as you need, and you are all set to tweet to your heart’s content, even when you can’t access Twitter!

When you send to the server email address, it only posts the message body up to the first line break, splitting it across tweets if it has to, and also looks for attached images to upload to imgur.com. Also, if you want to reply to another tweet, reply to the email the server-proxy sends you, and it prepends the sending user’s username automatically and sends the update with the originating tweet’s ID so the tweets thread correctly.
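The splitting step described above can be sketched roughly as follows. This is a minimal Python illustration rather than the Perl script’s actual logic: the function name `body_to_tweets`, the word-boundary splitting and the hard cut for over-long words are all assumptions.

```python
TWEET_LIMIT = 140  # Twitter's character limit at the time of writing


def body_to_tweets(email_body, limit=TWEET_LIMIT):
    """Take the text before the first line break and split it into tweets."""
    lines = email_body.splitlines()
    first_line = lines[0] if lines else ""

    tweets, current = [], ""
    for word in first_line.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
            continue
        # The word does not fit, so close off the tweet built so far.
        if current:
            tweets.append(current)
        # Assumption: a single word longer than the limit is cut hard.
        while len(word) > limit:
            tweets.append(word[:limit])
            word = word[limit:]
        current = word
    if current:
        tweets.append(current)
    return tweets


if __name__ == "__main__":
    for tweet in body_to_tweets("Greetings from the slopes!\nThis line is ignored."):
        print(tweet)
```

Splitting on word boundaries keeps each tweet readable on its own, at the cost of occasionally leaving a few characters unused.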

This was kind of rushed out the door so I could still tweet on my upcoming holiday, but there should be a fair bit of development when I get back, either on this or on a Python rewrite that should have fewer dependencies and be easier to distribute and run.

If you find this useful, add a comment here or on PET-Gateway’s main page and let me know, and if you find issues with my code feel free to submit a bug, patch or diff, or complain on the Google Code page.

Hating <table>s considered harmful

Apologies to Dijkstra for butchering that quote again, but the rage against <table>s is getting out of hand. Back in the dark ages, when it was impossible to get consistent HTML rendering across browsers and platforms, someone decided to use the <table> element, originally designed to mark up tabular data, to lay out webpages.

Ever since that moment <table>s have been an unfairly shunned part of the HTML spec. When CSS became a stable and supported spec many people started screaming from the rooftops “Stop using <table>, it ruins webpages”, and too many people have taken this to heart, so much so that any time tabular data needs to be presented, bizarre alternatives are used.

This was an example of a bad table of data that caught my eye a while ago:

See the "pre" section below

A screenshot for posterity

It’s supposed to give an idea of the cumulative revenue of iPhone apps over time, but it’s hard to tell, as there is no title. Good data tells a story; good metadata explains that story. There is no metadata here to explain any of these numbers. Here is what a search engine sees when it looks at the HTML on that page:

Period ending.....Period downloads.....Cumulative downloads....Period revenues
Jun 2008............no apps...................no apps........................no revenues
Dec 2008.............600 M......................600 M..........................$ 172 M
Jun 2009..............800 M....................1.4 B.............................$  228 M
Dec 2009..........1.6 B.........................3.0 B............................$  458 M
Jun 2010...........2.0 B.........................5.0 B............................$  542 M
Total.................5.0 B.........................5.0 B............................$1.4 B

Accessibility and search robots aside, there is no context to any of this: the units change between rows, and the number of dots changes each time. It’s hard to say what this data is without purposefully reading the text around it. I could drone on about the lack of metadata describing the data in this table, but instead I’ll counter this with a better example.

Here is the same table, rewritten as an actual <table>, using only what’s defined in the HTML4 and above W3C specifications:

A few things instantly stand out: the columns all match up nicely, both in display and in the table-model; each of the years is clearly marked; the table is relatively self-explanatory; and lastly it appears as a table to anyone who reads it. Furthermore, all of the formatting is done using CSS; that’s right, the presentation is left to the CSS and the descriptive markup is done using a <table>, just how it is intended to be done. But even this is only a fraction of the possible metadata to place into an HTML table.
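For anyone wanting to produce markup along these lines from the figures above, here is a minimal Python sketch that emits a <table> with a caption and scoped header cells. It is not the markup from the rewritten table itself: the caption text and the function names are assumptions for illustration, and the styling is left to whatever CSS you attach.

```python
from html import escape

# Caption wording is an assumption; the source data had no title at all.
CAPTION = "iPhone app downloads and revenue by half-year period"
HEADERS = ["Period ending", "Period downloads",
           "Cumulative downloads", "Period revenues"]
ROWS = [
    ["Jun 2008", "no apps", "no apps", "no revenues"],
    ["Dec 2008", "600 M", "600 M", "$172 M"],
    ["Jun 2009", "800 M", "1.4 B", "$228 M"],
    ["Dec 2009", "1.6 B", "3.0 B", "$458 M"],
    ["Jun 2010", "2.0 B", "5.0 B", "$542 M"],
    ["Total", "5.0 B", "5.0 B", "$1.4 B"],
]


def row_html(label, values):
    """One table row: the first cell is a header for the rest of the row."""
    cells = [f'<th scope="row">{escape(label)}</th>']
    cells += [f"<td>{escape(value)}</td>" for value in values]
    return "<tr>" + "".join(cells) + "</tr>"


def render_table(caption, headers, rows):
    """Build a plain <table>; presentation is left entirely to CSS."""
    head = "".join(f'<th scope="col">{escape(h)}</th>' for h in headers)
    body = "\n".join(row_html(row[0], row[1:]) for row in rows)
    return (
        "<table>\n"
        f"<caption>{escape(caption)}</caption>\n"
        f"<thead><tr>{head}</tr></thead>\n"
        f"<tbody>\n{body}\n</tbody>\n"
        "</table>"
    )


if __name__ == "__main__":
    print(render_table(CAPTION, HEADERS, ROWS))
```

The caption and the `scope` attributes are the kind of metadata the dotted version lacks: they tell a screen reader or crawler what each number belongs to.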

However, this is not the worst example of a ‘table’ I have seen. The one that sent me over the edge was this large image that highlights the relative pros and cons of different smartphones:

A very large table comparing different smartphones

This 'table' is too big to show inline, it needed to be reduced to a third of its size to fit

The main issue is that this image of a table is mostly text, quite large and static. You cannot easily copy text out of it, you can’t easily reorder the table, and unless you have a large screen it would be quite difficult to compare two products that weren’t very close to each other. Unlike the table of iPhone revenues, I am not going to go to the effort of transcribing this into a proper HTML table.

Compare this with a similar table from Wikipedia, also providing a comparison of smartphones, and the differences are obvious. First of all, the table is now web-crawlable: whatever data is in this table is now indexed in Google, an instant bonus, and the user can easily search through the page to find what they need to know. There is also a whole lot of JavaScript on the page. For example, clicking the boxes next to each of the column titles reorders the whole table around that column. In the first table this isn’t too noticeable, with most entries being either ‘Yes’ or ‘No’, but the Hardware and OS table further down the same page is full of figures, and now I can easily find the lightest phone (it’s the HP iPAQ Voice Messenger, something that would be much harder to find out from the image ‘table’).

Furthermore, there are plenty of ways to easily add functionality to tables, letting readers reorder, hide, expand or change them, all of which makes your tables far more useful.
The web is supposed to be full of life and allow us to use our data in exciting new ways, so don’t stick with static images or unstructured text for your tables when there is a much more useful alternative out there.

How little can one blog before they are no longer a ‘blogger’?

I really should keep on top of this blog better. I have 3 half-finished drafts of posts that I still haven’t gotten around to finishing, mostly for silly reasons like lack of time or sleep or appropriate photos to go with them.

Updates in my life since the last post:

  • Exams are over,
  • Am now married,
  • Off to Thredbo in a few days for a week of snowboarding,
  • Have a major piece of work (for work) hosted and ready to demo when I get to Canberra,
  • Rewriting a Perl/email/Twitter gateway in Python so it’s easier to distribute.

Fortunately, I will never have to plan a wedding again (which can be a massive time-sink (but worth it)), so I should be able to keep on top of this when I get back from the snow.

Cornell Trip Report Days 3-5

So, I’m sitting in the Sydney Qantas Business Lounge, with plenty of time until my connecting flight home, so I’ll recap the last few days.

From what people had to say my presentation went quite well. The script I worked from is now available online, and the slides will soon follow. As soon as I have updated the slides with the transcript I will upload them and update this post.

There were however plenty of excellent talks at the IASSIST Conference this year. Due to the theme there was an abundance of social networking talks, including the keynote speaker on Friday, and they all presented excellent angles on the same question: “How can we get more people interacting with our data better?”

I think this was a great take on where data and metadata agencies are heading because, as I have said before, the more people interact with data agencies and use the data and metadata they provide, the more relevant those agencies become. The fact that so many agencies are asking this question, working on their own solutions and willing to share those ideas is a great step forward for open data.

Cornell Trip Report – Day 1 & 2

So in the midst of preparing for Cornell, Uni work and wedding planning I haven’t had time to update, but taking a breather in a beautiful hotel room is a perfect time to get back to blogging.

I arrived in Ithaca yesterday, and the trip up was gorgeous. I would say pun intended, but I didn’t know Ithaca was famous for its gorges until I got here. I spent the afternoon with folks from the DDI Alliance Expert Committee. That was great; I didn’t say much, but listening to some extraordinarily smart people and having them do the same to me was an honour. Sadly, after dinner I had to retire with a killer fog of jetlag and nearly immediately fell asleep.

This morning was much more eventful. The lovely staff at the Statler Hotel were able to get me a cord to charge my camera with, and with that I set off to explore the campus and surrounds. I walked round the campus before setting off for Collegetown. As my camera only had a short charge, all my shots of Collegetown are on my phone, whose cord is absent, so until I find it or figure out how to get the shots off, there are no photos from there yet.

But that’s OK, as the best shots were down by Cascadilla Gorge. The shots were great, and by the bend down the end there is (according to locals) a little wading area. The track along the creek from there was spectacular and green after last night’s rain.

The only downside is that I twisted my ankle some time during the walk, but I’m all strapped up and about to head off for lunch and see Jeremy Iverson from Algenta talk about Colectica.

Why you are still safer in church than in the strip club.

A recent article from news.com.au has pointed out how much safer Australians are at the local strip club compared to the church. What it fails to point out is much of the story behind how these figures were calculated and what, if any, implications the figures have.

The first major issue in the article (aside from the lack of attribution to the data besides the vague “latest data”) is the liberal use of the word “in”:

‘…in the state’s “places of worship” in 2008.’
‘…just as likely to be assaulted or robbed in the sanctity of a church…’
‘using marijuana in places of worship’.

Without knowing the exact publication to verify the methodology, we will instead examine the latest publication on crime from the NSW Bureau of Crime Statistics and Research, “The nature of assaults recorded on licensed premises”. This report discusses crimes in or around licensed venues, and breaks the data down by the crime’s location relative to a licensed venue: whether it was indoors or outdoors on the premises, on the footpath outside, or near or not near the premises.

“Could you be mugged in the confessional? The answer may surprise you”

If we assume that the Bureau uses similar methodologies in the unreferenced report, we can also assume that crimes ‘in’ churches may refer to crimes merely near or on the grounds of churches as well as inside. Once this emotive language is neutralised (“Could you be mugged in the confessional? The answer may surprise you”) the rest of this article begins to slowly fall apart.

The next big error is the comparison:

A breakdown of the figures showed that 85 people were assaulted in places of worship, compared to 66 at an adult entertainment premises.

Yes, there were more assaults at places of worship compared to adult premises, however this says nothing of the populations these are drawn from. According to the Australian Bureau of Statistics, NSW has a population of about 6.5 million people, and about 53% of them recorded one of the most popular religions. If we assume a modest 25% of religious people regularly go to church, that gives about 750,000 regular churchgoers, meaning a little over 1 in 10,000 of them were assaulted in or around a church.

Restricting the comparison to males only, about 23% of males would need to attend strip clubs as often as churchgoers attend church to give a similar population of strip club attendees and make the article’s comparison valid.
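For anyone who wants to check the rough arithmetic, here is a minimal Python sketch using the figures quoted above. The 50% male share and the variable names are assumptions for illustration. Note that multiplying the stated percentages directly gives a somewhat larger churchgoing population (about 860,000) than the rounded 750,000 used in the text; the sketch takes the 750,000 as given, and the conclusion is not sensitive to the difference.

```python
NSW_POPULATION = 6_500_000   # from the text above
RELIGIOUS_SHARE = 0.53       # from the text above
ATTENDANCE_SHARE = 0.25      # the post's "modest" assumption
CHURCHGOERS = 750_000        # the post's rounded estimate
MALE_SHARE = 0.5             # assumption: roughly half the population is male
CHURCH_ASSAULTS = 85         # quoted from the article
CLUB_ASSAULTS = 66           # quoted from the article


def per_10k(events, population):
    """Events per 10,000 people in the given population."""
    return events / population * 10_000


# Direct product of the stated percentages, for comparison with 750,000.
product_estimate = NSW_POPULATION * RELIGIOUS_SHARE * ATTENDANCE_SHARE

# Assault rate among regular churchgoers.
church_rate = per_10k(CHURCH_ASSAULTS, CHURCHGOERS)

# Share of males needed to match the churchgoing population in size.
males_needed = CHURCHGOERS / (NSW_POPULATION * MALE_SHARE)

# Strip club assault rate if that many males really did attend.
club_rate = per_10k(CLUB_ASSAULTS, CHURCHGOERS)

if __name__ == "__main__":
    print(f"Churchgoers from percentages: {product_estimate:,.0f}")
    print(f"Church assaults per 10,000:   {church_rate:.2f}")
    print(f"Share of males needed:        {males_needed:.1%}")
    print(f"Club assaults per 10,000:     {club_rate:.2f}")
```

The point of the exercise is that raw counts only become comparable once each is divided by the number of people exposed, and the article never supplies those denominators.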

Before people point out the two issues with my own analysis: yes, I am aware females also frequent strip clubs or the other adult venues listed in the article, but for the purposes of rough estimation we are leaving them out. The other issue is that my own numbers assume only religious people attend church, which will be dealt with shortly.

So aside from the fact that 23% of males wouldn’t attend adult venues often enough to make the comparison valid, there are still other concerns. Now, let’s start looking at why the crimes are being committed. It is not hard to assume that the majority of crimes at adult venues would be influenced by alcohol and drugs or the organised crime associated with these venues. Furthermore, the bouncer on the door of the strip club may steady people’s emotions a little better than the priest, although neither has quite the same ability to keep people in check as the man (or woman) upstairs.

But to look at church as an area of crime is a harder matter: why would someone commit a crime there? Well, ask any teenager and they will have at least had a passing thought of drinking or having sex in the most unusual places – churches, graveyards, school grounds, etc… – just to get at the man.

So rather than the father toking up behind that curtain while taking confession, it’s most likely someone who is there well outside of mass hours.

Along with this, every Sunday hundreds of people show up at church in the morning and leave their cars unattended in quiet suburban car parks while the more secular citizens are fast asleep. This alone could account for the number of thefts from cars, whereas adult venues are usually in nightclub areas and, to their patrons’ relief, don’t have car parks directly associated with them.

Lastly, many churches offer services for the underprivileged, and while it is painting a large population with a broad generalisation, the desperation of homelessness can drive people to crime. So while there may be incidents at churches, there could be other factors in play to explain them.

In summary, this article does little but regurgitate the statistics it has been given and make broad generalisations based on them, without trying to understand why they have come about. So, at least for the meantime, it’s a safer bet to head to the pulpit and drop some coin in the collection plate than to head out on the town with a fistful of singles.