Business Intelligence Blog

Latest Stories

Featured Stories

Filter By Categories
Ashleigh Trinh
By
May 10, 2019

Geocoding with R

Last week we had a look at (blah blah blah), this blog will be looking at gathering longitudes and latitudes using R.


Tableau already has a brilliant mapping engine where you can map all the roles in the drop-down (left), by creating the longitude and latitude values for you. Though, at least for the UK, postcode is limited to the outward code which is the first part of the postcode. This is designated to the postal town and district in the town, but just the outward code will not give enough detail for higher granularity such as street or a house.

geog-roles

 

There are a many packages in R that utilise mapping APIs to geocode data, such as: ggmap, photon, and nominatim. There are three main APIs that are used in these packages (that I am aware of): Google, Bing and OSM (Open Street Map).

 

Part of the process was choosing which API to use. Bing and Google APIs are no longer free and will require an API key for it to work. However, I will be using OSM as it is free and accessible to everyone.

So first thing you will need is to have Rserve set up to connect to R in Tableau, if you have not done so then there are instructions in the Using R in Tableau blog. Once that is done, you will need to install the relevant package(s) in R for geocoding. I will be using the photon package, which defaults to using photon public API.

 

A little bit more information about the photon package, photon has two functions: geocode, which we will look at using in this blog; and reverse, which we will explore in next week's blog to find the location using longitude and latitudes.

 

The geocode function requires the following input values:

geocode(location, limit = NULL, key = NULL, value = NULL, lang = NULL, locbias = NULL, server = NULL, quiet = NULL)

 

This means that the only field we must have is the location though the other fields might help you get more precise figures. More information regarding the other fields can be found in their documentation.

With that set up the next step is to have some data with fields that you wish to geocode. I will be using data on a very small chain of cafes that I like (they have cardamom buns which smell divine and their cinnamon buns are super tasty). I have gathered the addresses of the cafes (shown below), I kept them split into different parts of the address rather than concatenated into one line.

 

Fabrique data

 

It may seem slightly odd to do this if we want to geocode a precise location but there is method to my madness! To start with I tested the geocoding process in R and I stumbled upon issues in the accuracy (probably because I am using OSM rather than a better maintained API). One notable quirk was that when finding the longitude and latitude for a pub was that if I put in the full address it would not find the location specified; however, when I put in the name of the pub and then the city or general location the output would be correct. I researched into the location and OSM a bit more and found that the data in OSM is mostly entered by the general public and with some locations, like that pub, which was on a corner and hence on OSM they used the adjacent street rather than the one specified on the pub's website.

 

So... unfortunately, this method requires a little bit of trial and error and some knowledge of the type of establishment you're geocoding. Open Street Map has values known as keys and values that are assigned to locations, this is very useful for getting accurate longitude and latitude values. For the data set I have, the key is 'shop' and the value is 'bakery', a more extensive list can be found on OSM Wiki.

 

After many iterations, I have found that the best way to geocode an establishment is by using it's name and either it's postcode or street name (though the latter potentially has the issue mentioned previously). For the data set above I did the following:

  • Created the field to input as the location in the function in the required format
Input
  • Then I created the calculated field which will give me the latitude via the geocode() function in R - the calculation is shown below. I used SCRIPT_REAL since the output is a real number. One important thing to remember is that in the R code that is passed to R requires loading the necessary packages in too, hence the 'library(photon)' in the script. I assigned the function to loc (the name isn't important) so that I can extract just the latitude values.
Latitude
  • The same calculation is used for the longitude, but using loc$lon to extract the longitude values instead.
  • Tableau did not recognise the calculated fields as geographic automatically in my workbook, so I manually assigned them to their respective geographic role.
geog field
  • Then to plot the points on a map in Tableau (as shown below) bring in the calculated longitude and latitude - sometimes a connection error will appear due to the address/input not being in view, so they can be ignored and when you bring in your location field, a map should appear. TADA!
map
 

Above, we have a 'finished' map - there are some errors, there are two nulls and two points appear on the map even though there are five rows of data. There are actually three marks on the map (Fitzrovia and Notting Hill chains are overlapping). Two branches (null on map) could not be geocoded - one is due to an obscure location and the other is not registered as Fabrique (Holborn branch) on OSM yet - I checked the OSM map and it just hasn't updated from the previous cafe in that location. This is one example of OSM just not being fully up-to-date yet.

 

Despite the issues with the breadth of (or lack of) data provided by OSM, the locations that are coded are very accurate compared to just using the outward postcode that Tableau can generate. Below are screenshots of the maps showing one of the cafe's location on each type of map (as labelled). As you can see, the figures calculated by R (photon) are actuall very accurate when put next to Google Maps, however just using the postcode means it's quite far from the actual location.

 

different locations-1
 

This was just a little bit of insight into geocoding in Tableau using R. It was not the most accurate and perfect iteration but if you have a larger data set, R will get a decent percentage of locations correctly and then it is just more manual for a handful of locations.

 

I also used photon to geocode all the railway stations in the UK - this worked much better since they are very well established on OSM. Of over 2500 stations (note this is better to be done in R then import the data into Tableau or it will be quite slow), only 52 could not be geocoded correctly (name mix-ups with other countries). Not a perfect solution but manually entering 2% of locations is still an improvement.

 

There it is, geocoding done! It would be interesting to be able to compare the other APIs to check for accuracy but this should be a great solution for most. Next week, we shall look at reverse geocoding.

 

If you would like to find out more or want bespoke training on using R in Tableau please contact us.

Contact Us

 

Subscribe Email