Thursday, 18 September 2014

Big Data and Location – A real or imagined new frontier?

In the run-up to this month's Geo Big 5 Big Data event (30th Sep, IBM, London), Andy Coote reports on insights gained from speaking with and listening to some of the foremost experts in the field, and ponders the place of location in Big Data. A glossary of Big Data terminology is also provided.

Big Data, why should I care?

In their recent report on Big Data, McKinsey[1] suggest it is becoming a key battlefield of competitive advantage, underpinning new waves of productivity growth, innovation, and consumer behaviour. One of the key application areas they highlight is geo-centric - personal navigation data. They assess the application of such data as being worth $800bn worldwide during the current decade. Even if McKinsey are an order of magnitude too high in this forecast, it is still a staggeringly large potential market for the location industry. 
Mobile devices, earth observation satellites and the Internet of Things are just a few sources contributing to creating the world of Big Data. But it is about more than just Volume. Big Data also describes data sets with a high Velocity of change (such as real time data streams), and with a wide Variety of data types - collectively known as the three V's[2].
This combination makes processing and analysis difficult using conventional tools. In particular, the volume and mix of structured and unstructured data is a challenge for the object-relational database management systems (such as Oracle and SQL Server) that most organisations currently use to underpin their data management. Here the major disruptive technology has been Hadoop, employed by search engines to produce the almost instant query response we have all come to expect from Google et al.
The huge additional business value to be derived from Big Data comes from what Accenture describe[3] as finding new insights. These might include identifying financial fraud, increasing retail sales or finding sources of inefficiency in Government. None of these aims is new, but the science of what is often termed predictive analytics in Big Data circles is introducing new tools and techniques which rely heavily on what we might previously have called spatial analysis and 4D visualisation.


According to John Morton, until recently with SAS but now an independent Big Data consultant, location figures in a wide range of applications because of its ability to reveal new information patterns and present information to senior executives visually.
Some real examples were showcased at the recent Strata 14 conference on Big Data in San Francisco, including:
  • Transport – Ian Huston, Data Scientist at Pivotal, sees Big Data analytics as a way to bring techniques from other disciplines, such as change point detection used in the wind turbine industry and cell population analysis from biology, to bear on complex problems of traffic management[4].
  • Retail – Susan Etlinger, Altimeter Group, used the example of location data identifying problems in the supply chain of steak restaurants to illustrate how actionable intelligence can be derived from existing social and enterprise information sources[5].
  • Security – Ari Gesher, Palantir, presented “Adaptive Adversaries: Systems to stop fraud and cyber intruders”, in which he described geocoding servers through their IP addresses, along with various other “location assets”, to provide intelligence to banks.
  • Health – Genomics, the science of gene sequencing, which involves very complex calculations on very large datasets, takes centre stage in this sector. However, medical insurers such as Kaiser Permanente in the United States are also making heavy use of tools such as ArcGIS as part of their Big Data strategy.

Location in Big Data Platforms

Different suppliers appear to have different views on the potential for location analytics in Big Data solutions.
SAP have taken the decision to embed Esri technology into the core of their product, which they believe will enable their users to more simply leverage geospatial tools as part of the HANA in-memory computing platform.

In contrast, Steve Jones of Capgemini (partners with Pivotal in the Big Data space) believes the dominant approach will see designers building location analytics for their platforms as they find it useful. According to Jones, Big Data analytics will borrow the algorithms of GIS via good developers but will not try to “shoehorn” existing products into their architectures.
Another aspect of the Big Data debate was outlined by Steve Hagen of Oracle. Speaking recently at a UN GGIM meeting, he suggested that real time feeds of location data are simply so huge that they are unmanageable in raw form, and that filtering at source before loading into databases is the only viable solution. It seems to me, however, that although deciding what to keep requires skills which geospatial practitioners uniquely possess, it does pre-suppose you know in advance what insights you might find.
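To make the idea of filtering at source more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how Oracle or anyone else actually does it: the function names, the 50 metre threshold and the sample coordinates are all assumptions. It keeps a location fix only when the device has moved a minimum distance since the last fix that was kept.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def filter_at_source(fixes, min_move_m=50.0):
    """Yield only fixes at least min_move_m from the last fix kept.

    The threshold is a hypothetical choice made before any analysis.
    """
    last = None
    for lat, lon in fixes:
        if last is None or haversine_m(last[0], last[1], lat, lon) >= min_move_m:
            last = (lat, lon)
            yield lat, lon

# Hypothetical stream of (latitude, longitude) fixes from one device
SAMPLE_FIXES = [(51.5, -0.1), (51.50001, -0.1), (51.501, -0.1), (51.5011, -0.1)]

if __name__ == "__main__":
    print(list(filter_at_source(SAMPLE_FIXES)))
```

The threshold is where the difficulty lies. Anything below 50 metres of movement is gone before it reaches the database, so an analysis that later turns out to need fine-grained movement cannot be run.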

Big Data & Location - Geo Big 5

So much energy is being pumped into the Big Data story that it won’t go away, even if it is simply a rebranding of concepts that have existed for a long time, such as business intelligence. Why is it important to the location market? Because it is potentially a huge opportunity: well over 50% of the presentations at the Strata conference used geo-centric use cases to demonstrate their solutions or ideas. Furthermore, there seemed to be a general under-estimation of the richness of insight that location analytics (what we used to call spatial analysis) could bring to the party.

If you’d like to understand more about what Big Data means for the location industry, the AGI is organising an event on Tuesday 30th September in London titled simply “Big Data and Location”. Hosted at the prestigious IBM Centre on the South Bank, it will bring together the main players from the Big Data and Geospatial worlds to explain technical concepts and showcase real applications. For more information go to the AGI website.

Andy Coote is Chief Executive at location consulting specialists ConsultingWhere
Twitter: @acoote

Glossary of Technical Terms:

  • Hadoop - is a distributed file system and framework for the storage and large-scale processing of data sets on clusters of commodity hardware. The concept relies upon storing data items multiple times across different processors/disks for resilience and fast retrieval. Originally developed in 2005 by two of Yahoo’s engineers, it underpins most of the search engines, Facebook, and many of th

  • MapReduce – is the programming framework that enables fast processing of data held in Hadoop clusters. Originally developed by Google, it splits a job into map tasks that run in parallel close to the data and reduce tasks that combine their results, while the framework handles the scheduling and communication needed to make this fast and reliable. Put another way, it supports massively parallel processing across many machines.

  • NoSQL – is a term used to refer to the storage and retrieval of data which does not rely on SQL and the relational model of storage, of which Hadoop is typical. Although Hadoop is very efficient at dealing with certain types of tasks, such as retrieval from unstructured sources, relational systems such as Oracle and SQL Server are better at operations on structured data, leading to the term being redefined recently as “Not only SQL”.

  • Data Mining – is about discovering patterns in large datasets involving various methods drawn from machine learning (what used to be referred to as artificial intelligence), statistics, database querying and visualisation.

  • Graphs - the mathematical structures used to model pairwise relations between objects. A "graph" in this context is made up of vertices or "nodes" and lines called edges that connect them. The classic graph in the geospatial world is the link and node structure used to represent a transport or utility network.
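The MapReduce and Graphs entries can be illustrated together with a small sketch in Python. This is not Hadoop code: it runs in a single process and only mimics the map, shuffle and reduce phases. The journeys, node names and function names are invented for the illustration. Each journey is a sequence of nodes on a link and node network, and the job counts how often each link is used.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical journeys, each a sequence of node IDs on a road network
JOURNEYS = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["A", "B", "D"],
]

def map_journey(journey):
    """Map phase: emit (link, 1) for every link the journey traverses."""
    return [((start, end), 1) for start, end in zip(journey, journey[1:])]

def shuffle(pairs):
    """Shuffle phase: group the emitted values by key (the link)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(link, values):
    """Reduce phase: add up the counts for one link."""
    return link, sum(values)

def link_usage(journeys):
    """Run the three phases in sequence and return {link: count}."""
    mapped = chain.from_iterable(map_journey(j) for j in journeys)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

if __name__ == "__main__":
    for link, count in sorted(link_usage(JOURNEYS).items()):
        print(link, count)
```

On a real cluster the map calls would run on the machines holding each block of journeys and the reduce calls would be spread across the cluster as well. Because no map call depends on any other, the work can be divided among many machines.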

[1] McKinsey Global Institute: Next Frontier for Innovation, Competition and Productivity

[2] 9 levers for Converting Big Data and Analytics into Results. Christy Maver, IBM.

[5] Social Data Intelligence: Integrating Social and Enterprise Data for Competitive Advantage

Extracts from an article written by Andrew Coote, ConsultingWhere and published by GIS Professional magazine in their edition in June 2014
