What does the well-used saying, ‘Life is a journey, not a destination’ – most often attributed to essayist and poet Ralph Waldo Emerson – have to do with data-driven decisions and the science of where?
I think it fits pretty well as a way of describing the kind of activity you engage in when you're hunting for potential in your data, or you're exploring whether your data can answer a specific question.
Geography can play a unique role in this kind of exploratory data analysis, and this is where Emerson's words resonate for me.
Today, there are many tools available to the data scientist or analyst that offer a map as a destination – typically a visualisation element that illustrates the outcome of the analysis.
A precursor to this kind of visualisation is the exploratory stage where typically tabular data sources are interrogated, joined, merged, pivoted, sorted and filtered interactively to arrive at a result that assists in faster, more confident decision-making.
Armed with that result, the analyst will most likely then look for the best way to present it graphically and get their message across, and a map is one option.
It’s during this exploratory stage (the journey) that geography and spatial analysis can elevate the process to a new level.
Being able to join, filter and query the data using geography can be a little daunting at first if you’re used to seeing maps only as the final outcome of this process. However, when you see the possibilities and realise how simple it can be, it tends to leave you wondering, ‘Why didn’t I know about that?’
There’s more to this than just the tools and the user experience though. To be truly useful, you need to be able to make this work in your organisation with your data – both spatial and non-spatial – and use those sources of data from their point of origin in their native form: no copying, no exporting, no transformation, no loss of fidelity.
To put this into real-world context, let’s say I’m a large enterprise and I have a substantial investment in data repositories stored in Microsoft SQL Server databases: lots of tables in lots of databases, with some of those tables having a spatial component.
I’m also in the early stages of exploring the use of SAP's HANA database, and am already accumulating lots of customer data in that repository.
I have analysts who are working with Microsoft Excel spreadsheets that have implicit geography in them. By implicit I mean data that isn’t geography right now – but it could be.
A classic example is an address, but it could be X,Y coordinates from a GPS, a road name, a postcode, a store ID – anything that could be converted into geography. Let's say this data represents competitor store information.
My analytical journey might begin with a question like, ‘Looking at customers whose spend in the last six months has trended down, are there any geographic hotspots? And if there are, are they related to the proximity of competitor stores?’
Or it might start with a simple desire to understand what stories the data can tell with regard to an observed trend, such as customers spending less over time.
Either way, I need to integrate data from disparate sources, filter and query it from an attribute and spatial perspective, explore different ways of visualising the results – whether that’s via a map, chart, table, or all of the above – then finally share the results, and perhaps even my workflow, with my intended audience.
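To make the attribute-plus-spatial filtering idea a little more concrete, here is a minimal Python sketch of the question posed above. It is an illustration under stated assumptions, not a description of any particular product's workflow. It assumes customers and competitor stores already have latitude and longitude (addresses in a spreadsheet would need geocoding first), it measures a downward trend as a negative least-squares slope over the monthly spend figures, and it treats a 2 km radius as "near". The class names, function names and the radius are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a rough proximity check


@dataclass
class Customer:  # hypothetical record, e.g. sourced from a customer database
    customer_id: str
    lat: float
    lon: float
    monthly_spend: list  # oldest month first; needs at least two values


@dataclass
class Store:  # hypothetical record, e.g. geocoded from a competitor spreadsheet
    store_id: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def spend_slope(spend):
    """Least-squares slope of spend against month index (negative = trending down)."""
    n = len(spend)
    mean_x = (n - 1) / 2
    mean_y = sum(spend) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(spend))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def declining_near_competitor(customers, stores, radius_km=2.0):
    """Attribute filter (declining spend) followed by a spatial test (nearest store).

    Returns (customer_id, distance_to_nearest_store_km, is_within_radius) tuples
    for declining customers only. Assumes `stores` is not empty.
    """
    results = []
    for c in customers:
        if spend_slope(c.monthly_spend) >= 0:
            continue  # spend is flat or rising, so not part of the question
        nearest = min(haversine_km(c.lat, c.lon, s.lat, s.lon) for s in stores)
        results.append((c.customer_id, nearest, nearest <= radius_km))
    return results
```

Calling `declining_near_competitor(customers, stores)` on lists of these records gives one row per declining customer, which could then be charted, tabulated or mapped. A brute-force nearest-store search like this is fine for small lists; at enterprise scale the same join would more likely be pushed down to a spatially indexed database.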