2015-07-23

Visualising monthly visits data -- case: Espoo swimming pool visits

I was browsing Helsinki Region Infoshare pages when a data of 'Swimming pool visits in Espoo' caught my eye. One spreadsheet of data, less optimally organized -- should I bother? Knowing a bit of local context of course helps. One of the swimming pools of Espoo was closed for renovation some time ago, and I was wondering did our closest swimming pool capture all those visitors too -- it seems to be so busy!

So, I massaged the data into a more usable format, a tall table, where there are only 3 columns (later renamed 'swimmers' to 'visits' ) and a date is a date rather than combination of month name and year in different columns:

Then read that into Tableau Public, and started exploring the data (dashboards)

First tab: Line charts

Line charts are quite good for example for comparing how similar (or not!) past years have been (Total visits). Also, I think the power of small multiple charts is very visible in the Visits by place chart: when the scales of each chart are the same, it is easy to check those only once. Then the effort can be put on studying the actual matter ie how swimming pools and/or years compare.



It could be argued that it is not necessary to have a different colour for each place ie swimming pool as the columns already have the names. Frankly, I just though it looks nicer that way :-)

Anyway, you can spot many things from the above charts. For example, visits drop a lot during summer months. I happen to know that the swimming pools are closed for some period of time each, not necessarily for a full calendar month. But would there be visitors also during summer if the pools were kept open? At least in Tapiola the visits look quite steady on 2011, from April to August.

Second tab: bar charts

Leppävaara swimming pool was closed for renovation on summer 2013. Could the data tell what happen to the visitors that used to go there?


Left side shows the actual visits as stacked bar chart and the right side shows the share of visits per pool, summing up to 100% for each month. Leppävaara's share of visits has been about 15% of total month visits (excluding summer months), but after May 2013 visits went to zero. Looking at second half of 2013, from perhaps September onwards, it looks the shares are roughly equal between the remaining pools (I excluded Olari as it is so small). However, looking at total visits, perhaps the visits declined?

Third tab: closer look

As the seasonality is so strong, I decided to compare the 12 months before close of Leppävaara swimming pool to the next 12 months.



Top row of  charts shows how in total there were 120 000 visits less after the closing. So not all of Leppävaara's visits were lost (160 000), but a huge number still. Keski-Espoo and Tapiola number of visits were increased (+30 000, +17 000).

Bottom row shows with more detail how the monthly visits were spread between pools before and after Leppävaara closing. The included averages of monthly shares show how Espoonlahti has the highest share of visits, even if the total number of visits decreased.

Leppävaara pool is planned to be opened early 2016, with larger capacity and an outdoor pool for summer. Hopefully the 'lost' swimmers and many more do then return!






2015-07-21

How to explore Azure Marketplace data with Tableau (Public)?

Microsoft Azure Marketplace has many free data sets. One way to get an idea of a data set is to explore it with Tableau, of which there is a free app: Tableau Public. There is a couple of steps to get from a data set to exploration and then possibly to a dashboard, and I'll list them below.

However, first, here is an example of Azure data exploration as a publicly shared dashboard. Data is UNESCO Statistics data and reason for doing this viz was to understand a bit more what the data is. The main data set, 'Values', has no descriptions of what has been measured but only a reasonably cryptic 'data series id'. By selecting one data series id using radio buttons on the right, you can see at a glance
  • name or description of the measure
  • details of data in the data table
  • quick overview on how global the measure in the map
  • range and distribution of values in the histogram

So, the steps required:
  1. Use you Microsoft (former live ID) account to log in to Azure
  2. You also need a marketplace account, which is easy to create if you don't already have one
  3. Find an interesting data set browsing in here:
    (note you can filter only free data sets from the selections at left)
     Microsoft Azure Marketplace
  4. and when you find one, subscribe to the data using 'SIGN UP'
  5. Only after subscribing you'll see the 'Explore this dataset' link. Click it to see data fields and values in a separate 'data viewer'
  6. You'll see the main URL at the top of the data viewer, copy it:
    https://api.datamarket.azure.com/UnitedNations/UNESCO/v1/DataSeries
  7. Assuming you have already downloaded, installed and launched the Tableau Public app,
    make a data connection to the data. Select 'OData' and paste the URL you copied as the 'server'.
  8. Your Azure account key is needed to access the data. Copy it directly from the top of the Azure data viewer by clicking 'Primary Account Key' and paste it to the 'Account Key' box. The key can also be found from the 'My account' page.
  9. Now you'll see the fields of the data set. To see the content of the data, click 'New worksheet' icon at the bottom. Tableau will extract your data, so the more there is, the longer it will take. In this example I did not filter the data, but as there are only 174 rows, it does not take long.
And now we have Azure data nicely in Tableau. If there was only one table needed, this is where exploring with Tableau could start. However, in this data source, the data is actually organized into two tables, and this 'dataseries' one has only a bit more descriptive names for the actual data series. To get the actual measures, a new data connection in Tableau is needed.
  1. In Azure data viewer, click 'Values' and notice that there is now more than 300k of data. It may be a good idea to filter it to limit the time taking to extract the data, unless you'll need it all.
  2. In my example I filtered the data by setting 'TimePeriod' 'greater than or equal to' 2005. You'll see the updated number of rows in the viewer.
  3. Once happy, copy the query including the filter:
    https://api.datamarket.azure.com/UnitedNations/UNESCO/v1/Values?$filter=TimePeriod%20ge%202005
  4. In Tableau, select 'Data' -> 'New Data Source' -> 'OData' and paste both the new query as well as your account key.
  5. Clicking the 'New worksheet' icon at the bottom starts the data extractions process, which will take somewhat longer this time.
  6. Once done, you'll see two data sources in the Tableau sheet. Next you'll need to connect them:
    Select 'Data' -> 'Edit Relationships' . Make sure the 2nd data source, 'Values', is the primary and then the 1st data source, 'DataSeries' is the secondary.
  7. The common key between the data sets is the 'data series id'. However, the name of the field is different in the data sources. Hence, select 'custom' and 'add'. From the primary, select 'Data Series Id' and from the secondary select 'Id'. Close by clicking 'ok' and 'ok'.


Now you can start playing around with the data in Tableau. There are really good videos from which you can learn how to use Tableau (Public), so here are just some brief instructions.
Use the primary data source 'Values', here marked with red arrow in below picture, and pull fields from the left to the shelves in the right.

After studying the data a bit, it becomes clear that some filters are needed. For example, pull the 'Unit' dimension to the 'filter' shelf and set it to 'percent'. Now all observation values should be percentages rather than counts. Also, limit 'age group' to 'non applicable', 'sex' to 'all genders' and select only one 'data series id' to study one measure at a time.

To use the 'Name' field from the secondary data set (DataSeries), always include the key, 'Data Series Id' from the primary data set to the chart: Pull 'Data Series Id' at least to the 'Details' box (highlighted with yellow), then you are able to include the 'Name' field from the secondary data source for example to the top line of the chart:


To make a map, first create a new worksheet from the bottom. Then right-click the 'Reference Area' field -> 'Geographic Role' -> 'Country / Region'. Then, pull 'Reference Area' to the 'Detail' box and you'll get a dot per country. From 'Show me' menu at top right, select map with colored areas. Add a measure such as sum of  'Observation value' to the 'color' box to see a quickly the data on a map!

If you have used the free Tableau Public app, you cannot save your workbook but in the Tableau Public server. It will be visible to all, but I think it can be saved as 'hidden'. However, if your main point was just to understand more of the data to use it within Azure etc, then you may not even wish to save the workbook but you have learned enough of the data to continue with other tools.

So, maybe there were more than just a couple of steps! :-) But honestly, I think Tableau is one of the best tools to explore and understand a new data set because it is easy to use. Also, after having done the data connections a couple of times, it is very quick too.