Posts Tagged ‘Statistics’

3 quick questions to identify children in financially risky families

  1. Do you live with mommy and daddy? 
  2. Does mommy or daddy smoke?
  3. Do you eat a lot of fruit or nuts at home?

The Australian Bureau of Statistics recently released data cubes on Household Expenditure, giving an insight into how Australians use their money. Of particular interest is that they provide a breakdown of average weekly expenditure on a variety of products, tabulated against the number of financial stressors. The ABS defines financial stressors as events such as a household being unable to pay bills or going without meals.

The reason this is interesting is that very few factors correlate as strikingly with the number of financial stressors in a household as those below:

Risk factors by number of indicators of financial stress experienced:

Risk factor | 0 | 1 | 2 | 3 | 4+
Tobacco products ($/week) | 8.89 | 13.01 | 14.63 | 18.02 | 21.45
Newspapers ($/week) | 3.17 | 2.80 | 2.64 | 1.47 | 1.48
Fruit and nuts ($/week) | 14.06 | 12.81 | 11.41 | 10.62 | 7.98
One parent family with dependent children (%) | 1.9 | 4.7 | 8.3 | 9.8 | 19.3
All renters (%) | 18.6 | 28.3 | 34.0 | 39.3 | 54.4
Main source of income – Government pensions and allowances (%) | 17.8 | 22.1 | 25.2 | 28.3 | 52.1

Note that weekly expenditure on newspapers and on fruit and nuts is negatively correlated with the number of financial stressors.

The problem with some of these metrics, though, is that while they are all strongly correlated with financial stress, they may not all be obvious to children. For example, a child may not know about their parents’ income or whether they live in a rental property, and certain actions may be hidden from the child, such as a parent buying a newspaper on the way to work.

Other metrics, however, are more obvious for children to notice and report, such as who they live with, visible activities of their parents (like smoking) and their own diets. This leads us to the three questions listed above:

  1. Do you live with mommy and daddy?
  2. Does mommy or daddy smoke?
  3. Do you eat a lot of fruit or nuts at home?

Now, while these may not account for same-sex couples, a child in a two-parent same-sex household would, given sufficient prompting, probably indicate they had two parents. Furthermore, this is based on aggregate information, although there is a good chance that unit records may back these correlations up. Lastly, this looks at correlations for risk factors and cannot be used to suggest causation. Together, however, these three questions can quickly give a strong indication of the risk of financial stress within a child’s household.

Using metadata within statistical software

Today I am releasing a beta of a package I am writing for the R statistical environment. It aims to make it easier for researchers to utilise metadata within R, and to make it more worthwhile for statisticians to provide metadata.

Most of R’s methods for importing data handle only undocumented data; in fact, one of the most common ways to import data is through raw CSVs. However, with the release of DSPL.R it is now possible to browse the metadata of a dataset from within a statistical package.

For example, the following is output from the US Retail Sales dataset provided by Google:

> print (prep.dspl("~/example/census-retail-sales.zip"))
DSPL Dataset - For more info see: [www.kidstrythisathome.com/dspl.r]
------------                  or: [code.google.com/apis/publicdata/]

Name : Retail Sales in the U.S.
Description : Monthly Retail Trade and Food Services report
            for the United States. This dataset was prepared by Google based
            on data downloaded from the U.S. Census Bureau.
Concepts : 3  -  Type of business, Seasonality, Retail Sales Volume
Slices   : 1  -  retail_sales_business
Tables   : 3  -  businesses, seasonalities, retail_sales_business_tbl
Topics   : 3  -  Industry, Business, Gender

As this example shows, a user is able to load in a new dataset and get an immediate sense of what it contains. By letting users understand the meaning behind a dataset without having to leave the statistical environment, DSPL.R allows them to work seamlessly with their data and metadata within the same interface.
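DSPL.R itself is an R package. To illustrate the same idea outside R, here is a minimal Python sketch that opens a DSPL bundle (a zip archive holding an XML dataset description plus CSV data). It pulls out the fields summarised in the output above.

This is not DSPL.R's code or API, and it makes several assumptions:
- The function name `summarise_dspl` is hypothetical.
- The bundle contains a single XML file.
- The element names (`info`, `name`, `description`, `concepts`, `slices`, `tables`, `topics`) reflect my reading of Google's DSPL schema and should be checked against the specification.
- It returns element ids rather than the human-readable concept names shown above.

```python
import zipfile
import xml.etree.ElementTree as ET


def _local(tag):
    # ElementTree reports namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def _child(elem, name):
    # First direct child with the given local name, or None if absent.
    if elem is None:
        return None
    return next((c for c in elem if _local(c.tag) == name), None)


def _text(elem):
    # All text beneath an element (e.g. <name><value>...</value></name>), whitespace collapsed.
    return "" if elem is None else " ".join("".join(elem.itertext()).split())


def summarise_dspl(source):
    """Summarise a DSPL bundle given as a zip path or a file-like object (hypothetical helper)."""
    with zipfile.ZipFile(source) as bundle:
        # Assumption: the bundle holds exactly one XML file, the dataset description.
        xml_name = next((n for n in bundle.namelist() if n.endswith(".xml")), None)
        if xml_name is None:
            raise ValueError("no XML dataset description found in bundle")
        root = ET.fromstring(bundle.read(xml_name))

    def ids(section, item, nested=False):
        # ids of <item> elements under <section>; topics can nest, so optionally recurse.
        parent = _child(root, section)
        if parent is None:
            return []
        elems = parent.iter() if nested else list(parent)
        return [e.get("id") for e in elems if _local(e.tag) == item]

    info = _child(root, "info")
    return {
        "name": _text(_child(info, "name")),
        "description": _text(_child(info, "description")),
        "concepts": ids("concepts", "concept"),
        "slices": ids("slices", "slice"),
        "tables": ids("tables", "table"),
        "topics": ids("topics", "topic", nested=True),
    }
```

To use it on a local copy of the retail sales bundle, call `summarise_dspl("census-retail-sales.zip")` (the path is hypothetical). The length of each id list then gives counts comparable to the summary above.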

While DSPL is seen as a newcomer to the statistical world, and R is perceived (albeit wrongly) to be inferior to more established commercial statistical tools, the agility of R and the brevity of the DSPL standard are a strong indicator of how, given time, statistical metadata could become an integral part of all statistical processes.

Those who survived count themselves lucky, but they will not count themselves.

The following is a letter I wrote with the intention of forwarding it to the heads of Statistics New Zealand and those behind the decision to cancel the 2011 New Zealand Census. However, I think it rings true for everyone concerned about the future of the census in any nation.

————————–

The earthquakes that recently devastated Christchurch were grave, terrible events, and while New Zealand and its close neighbour Australia mourn the loss of life, I implore you: halting the census will not ease this grief. In fact, it will have a much graver impact on the statistical world beyond our small Pacific community.

I do not need to educate any of the intended recipients of this letter about the role of the census, but for completeness I feel it is necessary to remind all how vital the census is to the community.

The data from a census provide nations with the ability to better understand their past and present and dictate their future. Each census is a fully comprehensive snapshot of the entire population; by comparing them, we reflect on how nations change over time. Studying how demographics evolve, we as a society are able to create public policy that ensures all citizens are represented and provided the resources they need to thrive. Census information helps to create fair electoral boundaries, encourages the creation of schools, hospitals and roads where they are needed by the community, and even assists medical research. There is a reason censuses have been going on for thousands of years, and why the establishment of a census bureau is one of the first actions of a just and educated nation.

While the role of these agencies may have been clear at first, over time citizens are forgetting how the census benefits them more than it benefits the state. Both national censuses and the agencies that run them are already under threat from many parties. There are those who seek to cripple the census, either to superficially save tax dollars or to alter the demographic information to better suit their positions. The government of the Right Honourable Stephen Harper, Prime Minister of Canada, recently made changes to the 2011 Canadian Census to make portions of it voluntary. This move was surrounded by controversy and outcry from the statistics community. Much of this controversy focused on the methodological problems a voluntary ‘census’ would create for the data and how this would affect the community. It also focused on the change as a direct challenge by the Conservative government to the agency’s independence. Ultimately, this led to the resignation of Mr. Munir Sheikh, then Chief Statistician of Canada, in protest over the autonomy of Statistics Canada.

The actions of those who wish to harm our profession are not confined to Canada. In the United States of America, the 2010 Census came under fire from political conservatives over its relevance, including calls from influential political figures to boycott it. These critiques ultimately ended in the removal of the “long form”, which may lead to immeasurable problems from the reduced data quality of the census. Likewise, in the past decade, the Office for National Statistics in the United Kingdom has faced similar issues over its autonomy and independence from government, leading community leaders to question its impartiality as a statistics provider.

Cancelling the 2011 New Zealand Census will make it the first census New Zealand has cancelled since World War Two. It will also join a very short list of censuses cancelled by Commonwealth nations over the same period. While cancelling the 2011 census is an act of compassion, we run the risk that those who call for an end to the census will use this tragedy as an example of why the census is no longer a required instrument of a just and democratic society. I assure you that the census is no less vital today than it was in 1911, when the Australian public was first counted, and it will still be relevant in 2051, when New Zealand marks 200 years of its census.

The statistics bureaus of Australia and New Zealand hold a special position amongst Commonwealth and anglophone countries as two of the most autonomous official statistical agencies, but this comes at a price. We should look upon ourselves as role models in the statistical community and go as far as it takes to ensure our independence and relevance, and above all the unquestionable quality of our data. While there are deep emotional reasons for cancelling the census, based purely on respect for those most hurt by the tragic events of the Christchurch earthquakes, sometimes we must look at the rational reasons to push ahead with painful policies.

Right now Christchurch is in mourning and recovery, but soon it will be time to rebuild your great city. The reconstruction works will employ many of those put out of work by the disasters through the loss of lives, homes or businesses, but they will not employ them all. By running the census you are able to employ thousands more to prepare for its great counting, thousands who may not otherwise have been employed. These will be temporary positions that allow people to immediately begin rebuilding their lives and city. Rather than simply receiving government benefits while the city rebuilds, people will be employed in stable short-term work that lets them begin the road back to normal lives. That work will benefit both them and their country.

News reports are already indicating that Statistics New Zealand will honour its contracts and agreements with those employed for the census. If the decision has been made to spend this money, then spend it helping your fellow countrymen. Not only will you be employing those who may have been jobless, but you will also effectively be pushing money into a damaged economy. Each census worker will need shelter, food and clothes, all things that can be bought within their town, pushing more funds into the area and speeding the recovery.

With the Australian people also due to run a census in 2011, the Australian Bureau of Statistics is already preparing for what is said to be “Australia’s largest peacetime operation”. In peace, as in war, ANZACs should stand shoulder to shoulder, ready to help each other regardless of our mission or the circumstances. Australia as a nation has already given much to help our Tasman neighbours, and we should stand ready to give more. With our infrastructure in place for several weeks before the census, and with so many standards designed in collaboration between our nations, Australia is more than ready to help you perform your civil duties.

Lastly, just as the census in Australia comes so soon after the flooding in Brisbane, running a census in the shadow of a grave disaster such as the one that befell you will help us measure the damage that has occurred and will improve future estimates of damage. Performing this census will allow communities to better understand the damage that can befall them, and will better prepare them for the worst.

Honourable readers, Prime Ministers, Commonwealth Statisticians and fellow citizens, the census is not just an exercise in counting. It is the snapshot of a nation and the prime way that we as a nation can better understand ourselves and our place in the world, and it can’t be replaced by a mere sample. It is one of the most noble acts of a society, as we make every citizen count. In a census, all are equal, the rich or the poor, the young or the old, every minority enumerated for all to see. It is the candle in the darkness that shines a light on society and exposes our weaknesses and triumphs. But there are those who wish to extinguish this flame of impartiality for their own ends; they are those who wish to hide the inequalities of life, shun the impoverished and disenfranchised, and maintain the bigotry that hurts us all.

When we are besieged by such evil forces, we cannot falter, and by cancelling the New Zealand census, you give these malcontents the fuel they need to further jeopardise the ability of all statistical agencies to fulfil their vital role. If you continue with this action you harm us all, and I plead with you: continue the census. Give the census a firm date, give the people of your nation a glimmer of light, and show them that not even shattered earth will stop a statistician’s resolve. Those who survived may count themselves lucky, but without your enumeration their efforts will be for nought.

Examining the factors of Indigenous participation in Crime

That’s the gist of the statistics thesis that’s taking up so much of my time right now.

I’m currently working with Anna Ferrante from the UWA Crime Research Centre on a project to examine some Australian Bureau of Statistics data from 2008. Currently reading my way through reams of papers on the subject and so far the answer is a resounding “we don’t quite know.” 

Or rather, there is no real easy answer. From the background reading I have done so far, social biases, substance abuse and socioeconomic inequities all contribute.

The bad news from this perspective is that at the moment there appears to be no silver bullet for this hot button issue.

However, with the untouched data of over 13,000 anonymised persons at my fingertips, I can only hope that this analysis proves to be a helpful addition to this already wide body of research.

And with any luck, I will get a chance to carry some of this experience into my Masters in Statistics.

Twitter Sparkline Generator using Unicode

NB: This post uses examples of Unicode that may not show up in some browsers.

One of my main gripes with Twitter is that it only lets you add text. People often want to share small snippets of data, but to no avail. The ideal way to share data in such tiny chunks is Edward Tufte’s idea of sparklines.

For those of you disinclined to read the Wikipedia page, sparklines are “data-intense, design-simple, word-sized graphics”, designed to sit inline with text, at a similar height, to help illustrate an idea.

Now, I am not the first person to suggest putting sparklines into Twitter; in fact, the second result of a Google search for sparkline turns up Alex Kerin’s article. However, there are two slight problems with Kerin’s implementation. Firstly, the Unicode block characters he uses are not designed to line up, and the examples shown on his page demonstrate this. To be fair, this isn’t his fault at all, as Unicode compliance isn’t 100%. Secondly, a bar and a line can give two very different impressions. Bar charts are generally used to display discrete data (or continuous data shown as discrete), while line charts are used to show continuous data. For the record, there is no good time to use a pie chart.

To this end, I have created a tool for producing two different types of sparkline from an input data source: a crude line graph and a five-number box plot.

Here is an example using the June 30th 2010 Perth weather data from the Bureau of Meteorology, with bars delimiting 3-hour blocks:

The weather yesterday in Perth was quite cool (4.1┣▇▇|▇━━┫17.7), with a maximum of 17.7 degrees occurring around 2pm, before quickly cooling down until 3pm (⣤⣤⣀⎸⣀⣀⣀⎸⣀⣀⡤⎸⠴⠚⠛⎸⠛⠛⠙⎸⠒⠒⠒⎸⠒⠲⠶⎸⠶⠶⠶).
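The box plot in this example reads as: minimum ┣, whisker, box up to the median |, box, whisker, ┫ maximum. Below is a rough Python sketch of that layout.

It is a hypothetical reconstruction, not the tool's code, so it will not necessarily reproduce the post's exact output. It assumes:
- quartiles come from `statistics.quantiles(..., method="inclusive")`, which needs at least two data points;
- the body has a fixed width of eight characters.

```python
import statistics


def boxplot_sparkline(data, width=8):
    """Render a five-number summary as a one-line text box plot (hypothetical layout)."""
    lo, hi = min(data), max(data)
    # Quartiles by linear interpolation over the sorted data ("inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

    def slot(x):
        # Map a value onto 0..width character positions between the minimum and maximum.
        return 0 if hi == lo else round((x - lo) / (hi - lo) * width)

    p1, pm, p3 = slot(q1), slot(median), slot(q3)
    # Whisker (━) up to Q1, box (▇) up to the median (|), box up to Q3, whisker to the maximum.
    body = "━" * p1 + "▇" * (pm - p1) + "|" + "▇" * (p3 - pm) + "━" * (width - p3)
    return f"{lo:g}┣{body}┫{hi:g}"


print(boxplot_sparkline([0, 1, 2, 3, 4, 5, 6, 7, 8]))  # 0┣━━▇▇|▇▇━━┫8
```

Printing the minimum and maximum as numbers at either end, as the post does, keeps the plot readable even though the body is only a rough scale.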

Restricting this example further to Twitter’s 140 characters:

Perth 30/06/10: Cool (4.1┣▇▇|▇━━┫17.7), max at 2pm, cooling to around 13°C after 3pm, steady afterwards. (⣤⣤⣀⎸⣀⣀⣀⎸⣀⣀⡤⎸⠴⠚⠛⎸⠛⠛⠙⎸⠒⠒⠒⎸⠒⠲⠶⎸⠶⠶⠶)

This is a 115-character weather report, leaving 25 characters for a URL to the full data. It covers temperature only, but it shows the potential: two datasets can fit in a single Twitter post along with commentary.

I think the box plots look quite good. The tool does take a few liberties with the braille layout, relying on people to read a pair of vertically adjacent dots as a value between the two, but this conveys the message quite well in a limited, text-based format.
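The line sparkline packs two readings into each braille character. The following Python sketch is my own reconstruction, not the tool's source. It works as follows:
- It uses the Unicode braille block, where each of a cell's eight dots adds one bit to U+2800.
- It scales each value to seven levels.
- Following the trick described above, it draws odd levels as two vertically adjacent dots.

The function names and the seven-level linear scaling are assumptions. Under this encoding, two values one step above the bottom produce ⣤, a glyph that appears in the example above.

```python
# Unicode braille cells are 2 dots wide and 4 dots tall; each dot adds one bit to U+2800.
# DOT_BITS[row] = (bit for left column, bit for right column), with row 0 at the top.
DOT_BITS = [(0x01, 0x08), (0x02, 0x10), (0x04, 0x20), (0x40, 0x80)]


def dots_for_level(level):
    """Return the rows to light for a level in 0..6, where 0 is the bottom.

    Even levels light a single dot; odd levels light two vertically adjacent
    dots, which the reader is meant to see as a value halfway between them.
    """
    row = 3 - level // 2
    return [row] if level % 2 == 0 else [row, row - 1]


def braille_sparkline(values):
    """Render a series as a braille line sparkline, two values per character."""
    lo, hi = min(values), max(values)
    # Hypothetical scaling: map each value linearly onto the seven levels 0..6.
    levels = [0 if hi == lo else round((v - lo) / (hi - lo) * 6) for v in values]
    cells = []
    for i in range(0, len(levels), 2):
        bits = 0
        # The left column holds values[i], the right column values[i + 1] (if present).
        for col, level in enumerate(levels[i:i + 2]):
            for row in dots_for_level(level):
                bits |= DOT_BITS[row][col]
        cells.append(chr(0x2800 + bits))
    return "".join(cells)


print(braille_sparkline([0, 0, 1, 1, 2, 3]))  # ⣀⠤⠊
```

Inserting a ⎸ separator every few cells, as the post does for 3-hour blocks, is left out here for brevity.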

Why you are still safer in church than in the strip club.

A recent article from news.com.au has pointed out how much safer Australians are at the local strip club than at church. What it fails to point out is much of the story behind how these figures were calculated, and what implications, if any, the figures have.

The first major issue in the article (aside from attributing the data to nothing more specific than the vague “latest data”) is the liberal use of the word “in”:

‘…in the state’s “places of worship” in 2008.’
‘…just as likely to be assaulted or robbed in the sanctity of a church…’
‘using marijuana in places of worship’.

Without knowing the exact publication to verify the methodology, we will examine the latest publication on crime from the NSW Bureau of Crime Statistics and Research, “The nature of assaults recorded on licensed premises”. This report discusses crimes in or around licensed venues. It breaks the data down by the crime’s location relative to a licensed venue: indoors or outdoors on the premises, on the footpath outside, near the premises, or not near them.

If we assume that the Bureau uses similar methodologies in the unreferenced report, we can also assume that crimes ‘in’ churches may refer to crimes merely near or on the grounds of churches as well as inside. Once this emotive language is neutralised (“Could you be mugged in the confessional? The answer may surprise you”) the rest of this article begins to slowly fall apart.

The next big error is the comparison:

A breakdown of the figures showed that 85 people were assaulted in places of worship, compared to 66 at an adult entertainment premises.

Yes, there were more assaults at places of worship than at adult premises; however, this says nothing of the populations these are drawn from. According to the Australian Bureau of Statistics, about 53% of NSW residents recorded one of the most popular religions, and the state has a population of about 6.5 million people. If we assume a modest 25% of religious people regularly go to church, that gives about 750,000 regular churchgoers, meaning a little over 1 in 10,000 of them were assaulted in or around a church.

Considering males only, a strip club population of similar size would require about 23% of males to attend strip clubs as often as churchgoers attend church. Only then would the article’s comparison be valid.
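The arithmetic in the last two paragraphs can be checked with a few lines of Python. This sketch takes the post's own figures as given, including the rough estimate of 750,000 regular churchgoers. Like the post, it assumes that about half the population is male.

```python
# Figures quoted in the post; the 750,000 regular churchgoers is the post's own rough estimate.
nsw_population = 6_500_000
regular_churchgoers = 750_000
assaults_at_places_of_worship = 85

# Assumption (implicit in the post): roughly half the population is male.
nsw_males = nsw_population / 2

# Assault rate among regular churchgoers, per 10,000 people.
rate_per_10k = assaults_at_places_of_worship / regular_churchgoers * 10_000

# Share of males who would need to be strip club regulars to match the churchgoing population.
male_share_needed = regular_churchgoers / nsw_males

print(f"Assaults per 10,000 regular churchgoers: {rate_per_10k:.2f}")  # 1.13
print(f"Share of males needed for an equal-sized population: {male_share_needed:.0%}")  # 23%
```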

Before people point out the two issues with my own analysis: yes, I am aware that females also frequent strip clubs and the other adult venues listed in the article, but for the purposes of rough estimation we are leaving them out. The other issue is that my numbers assume only religious people attend church, which will be dealt with shortly.

So, aside from the fact that 23% of males are unlikely to attend adult venues often enough to make the comparison valid, there are still other concerns. Now, let’s start looking at why the crimes are being committed. It is not hard to assume that the majority of crimes at adult venues would be influenced by alcohol and drugs, or by organised crime associated with these venues. Furthermore, the bouncer on the door of the strip club may steady people’s emotions a little better than the priest, although neither has quite the ability to keep people in check that the man (or woman) upstairs does.

But looking at church as a site of crime is a harder matter: why would someone commit a crime there? Well, ask any teenager and they will have at least had a passing thought about drinking or having sex in the most unusual places (churches, graveyards, school grounds, etc.) just to stick it to the man.

So rather than the father toking up behind that curtain while taking confession, it’s most likely someone who is there well outside of mass hours.

Along with this, every Sunday hundreds of people show up at church in the morning and leave their cars unattended in quiet suburban car parks while the more secular citizens are fast asleep. This alone could account for the number of thefts from cars, whereas adult venues are usually in nightclub areas and, to their patrons’ relief, don’t have car parks directly associated with them.

Lastly, many churches offer services for the underprivileged, and while this paints a large population with a broad brush, the desperation of homelessness can drive people to crime. So while there may be incidents at churches, there could be other factors in play to explain them.

In summary, this article does little but regurgitate the statistics it has been given and make broad generalisations based on them without trying to understand why they have come about. So, at least for the meantime, it’s a safer bet to head to the pulpit and drop some coin in the collection plate than to head out on the town with a fistful of singles.