In the run-up to this month's Geo Big 5 Big Data event (30th Sep, IBM, London), Andy Coote reports on some of the insights gained from speaking and listening to some of the foremost experts in the field and ponders the place of location in Big Data. A glossary of Big Data terminology is also provided.
Big Data, why should I care?
In their recent report on Big Data, McKinsey suggest it is becoming a key battlefield of competitive advantage, underpinning new waves of productivity growth, innovation, and consumer behaviour. One of the key application areas they highlight is geo-centric - personal navigation data. They assess the application of such data as being worth $800bn worldwide during the current decade. Even if McKinsey are an order of magnitude too high in this forecast, it is still a staggeringly large potential market for the location industry.
Mobile devices, earth observation satellites and the Internet of Things are just a few of the sources creating the world of Big Data. But it is about more than just Volume. Big Data also describes data sets with a high Velocity of change (such as real time data streams), and with a wide Variety of data types - collectively known as the three V's.
This combination makes processing and analysis difficult using conventional tools. In particular, the volume and mix of structured and unstructured data is a challenge for the object-relational database management systems (such as Oracle and SQL Server) that most organisations currently use to underpin their data management. Here the major disruptive technology has been Hadoop, employed by search engines to produce the almost instant query response we have all come to expect from Google et al.
The huge additional business value to be derived from Big Data comes from what Accenture describe as finding new insights. These might include identification of financial fraud, increasing retail sales or sources of inefficiency in Government. None of these are new, but the science of what is often termed predictive analytics in Big Data circles is introducing new tools and techniques which rely heavily on what we might have previously called spatial analysis and 4D visualisation.
Applications
According to John Morton, until recently with SAS
but now an independent Big Data consultant, location figures in a wide range of
applications because of its ability to reveal new information patterns and
present information to senior executives visually.
Some real examples were showcased at the recent
Strata 14 conference on Big Data in San Francisco including:
Transport – Ian Huston, Data Scientist at Pivotal, sees Big
Data analytics as a way to bring techniques from other disciplines, such as
change point detection used in the wind turbine industry and cell population
analysis from biology, to complex problems of traffic management.
Retail – Susan Ethlinger, Altimeter Group, described the use of location to identify problems in the supply chain of steak restaurants, illustrating how actionable intelligence can be derived from existing social and enterprise information sources.
Security – Ari Gescher, Palantir, presented “Adaptive
Adversaries: Systems to stop fraud and cyber intruders”, where he described the
use of geocoding of servers through IP addresses and various other “location
assets” to provide intelligence to banks.
Health – genomics, the science of gene sequencing, which involves very complex calculations on very large datasets, takes centre stage in this sector. However, medical insurers, such as Kaiser Permanente in the United States, are also making heavy use of tools such as ArcGIS as part of their Big Data strategy.
Location in Big Data Platforms
Different suppliers appear to have different views on the potential for
location analytics in Big Data solutions.
SAP have
taken the decision to embed Esri technology into the core of their product,
which they believe will enable their users to leverage geospatial tools more easily as part of the HANA in-memory computing platform.
In contrast, Steve Jones of Cap Gemini (partners with Pivotal in the Big Data space) believes the dominant approach will see designers building location analytics into their platforms as they find it useful.
According to Jones, Big Data analytics will borrow the algorithms of GIS via
good developers but will not try to “shoehorn” existing products into their
architectures.
Another aspect of the Big Data debate was outlined by Steve Hagen of Oracle. Speaking recently at a UN GGIM meeting, he suggested that real-time feeds of location data are simply so huge that they are unmanageable in raw form and that filtering at source before loading into databases is the only viable solution. It seems to me, however, that although deciding what to keep requires skills which geospatial practitioners uniquely possess, it does presuppose you know in advance what insights you might find.
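To make that concrete, the sketch below (an invented example in Python, not Oracle's implementation) filters a raw stream of GPS fixes down to an area of interest before anything is written to a database:

    # Minimal sketch of "filtering at source": keep only the GPS fixes that
    # fall inside an area of interest. The feed format and bounding box are
    # illustrative assumptions, not a description of any real system.
    AREA_OF_INTEREST = (-0.51, 51.28, 0.33, 51.69)  # rough lon/lat box around London

    def filter_fixes(stream, bbox):
        """Yield only the (device_id, lon, lat, timestamp) fixes inside bbox."""
        min_lon, min_lat, max_lon, max_lat = bbox
        for device_id, lon, lat, ts in stream:
            if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                yield device_id, lon, lat, ts

    raw_feed = [
        ("bus-17", -0.1278, 51.5074, "2014-09-01T08:00:00Z"),  # central London - kept
        ("bus-17",  2.3522, 48.8566, "2014-09-01T09:00:00Z"),  # Paris - discarded
        ("bus-42", -0.0754, 51.5055, "2014-09-01T08:05:00Z"),  # central London - kept
    ]
    for fix in filter_fixes(raw_feed, AREA_OF_INTEREST):
        print(fix)

The catch, as noted above, is that the bounding box has to be chosen before you know which fixes will turn out to matter.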
Big Data & Location - Geo Big 5
So much energy is being pumped into the Big Data story that it won't go away, even if it is simply a rebranding of concepts, such as business intelligence, that have existed for a long time. Why is it important to the location market?
Because it is potentially a huge opportunity - well over 50% of the
presentations at the Strata conference used geo-centric use cases to
demonstrate their solutions or ideas. Furthermore, there seemed to be a general
under-estimation of the richness of insight that location analytics (what we
used to call spatial analysis) could bring to the party.
If you’d like to understand more about what Big
Data means for the location industry, the AGI is organising an event on Tuesday 30th September in London titled
simply “Big Data and Location”. Hosted
at the prestigious IBM Centre on the South Bank, it will bring together the
main players from the Big Data and Geospatial worlds to explain technical concepts
and showcase real applications. For more
information, go to the AGI website, www.agi.org.uk.
Andy Coote is Chief Executive at location consulting specialists ConsultingWhere.
Glossary of Technical Terms:
Hadoop – is a distributed file system and framework for the storage and large-scale processing of data sets on clusters of commodity processors. The concept relies upon storing data items multiple times across different processors/disks for resilience and fast retrieval. Originally developed in 2005 by two of Yahoo’s engineers, it underpins most of the search engines, Facebook, and many of the other major web platforms.
MapReduce – is the programming framework that enables fast retrieval of data from Hadoop clusters. Originally developed by Google, it is based on algorithms that schedule and handle the parallel communication necessary to make that retrieval fast and reliable. Put another way, it supports massively parallel processing: a job is split into a “map” phase run on the nodes holding the data and a “reduce” phase that combines the partial results.
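The canonical illustration is a word count. The fragment below is a plain-Python sketch of the shape of the two phases rather than Hadoop's own API: map emits key/value pairs and reduce aggregates every value that shares a key.

    # Word-count sketch of the map/reduce pattern in plain Python. Real
    # MapReduce distributes these phases across a cluster; this only
    # illustrates the shape of the computation.
    from collections import defaultdict

    def map_phase(document):
        """Emit a (word, 1) pair for every word in the document."""
        for word in document.lower().split():
            yield word, 1

    def reduce_phase(pairs):
        """Sum the counts for each word across all emitted pairs."""
        totals = defaultdict(int)
        for word, count in pairs:
            totals[word] += count
        return dict(totals)

    docs = ["big data big insight", "location data"]
    pairs = (pair for doc in docs for pair in map_phase(doc))
    print(reduce_phase(pairs))  # {'big': 2, 'data': 2, 'insight': 1, 'location': 1}

On a real cluster the framework also shuffles the emitted pairs between nodes so that each reducer sees all the values for its keys.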
NoSQL – is a term used to refer to the storage and retrieval of data which does not rely on SQL and the relational model of storage, of which Hadoop is typical. Although Hadoop is very efficient at certain types of task, such as retrieval from unstructured sources, relational systems, such as Oracle and SQL Server, are better at operations on structured data, which has led to the term recently being reinterpreted as “Not only SQL”.
Data Mining – is about discovering patterns in large datasets involving various methods drawn from machine learning (what used to be referred to as artificial intelligence), statistics, database querying and visualisation.
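As a toy illustration of pattern discovery with a location flavour (the point events and the 0.01 degree cell size are invented for the example), incidents can be binned into a coarse grid and the busiest cells flagged as hotspots:

    # Minimal data-mining style sketch: bin point events into a coarse
    # lon/lat grid and flag the cells with unusually many events.
    from collections import Counter

    events = [
        (-0.128, 51.507), (-0.127, 51.508), (-0.129, 51.506),  # three events close together
        (-0.090, 51.513), (0.120, 51.480),                     # two isolated events
    ]
    CELL = 0.01  # grid cell size in degrees

    def cell_of(lon, lat):
        """Snap a coordinate to the lower-left corner of its grid cell."""
        return round(lon // CELL * CELL, 3), round(lat // CELL * CELL, 3)

    counts = Counter(cell_of(lon, lat) for lon, lat in events)
    hotspots = [cell for cell, n in counts.items() if n >= 3]
    print(hotspots)  # the single cell containing the three clustered events

Real data mining works at vastly larger scale and with more sophisticated methods, but the principle of letting the data reveal where the clusters are is the same.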
Graphs – are the mathematical structures used to model pairwise relations between objects. A "graph" in this context is made up of vertices or "nodes" and lines called edges that connect them. The classic graph in the geospatial world is the link and node structure used to represent a transport or utility network.
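A minimal sketch of that link-and-node idea (the four-junction road network below is invented) stores the network as an adjacency list and searches it breadth-first for a route:

    # An invented road network as an adjacency list: node -> connected nodes.
    from collections import deque

    road_network = {
        "A": ["B"],
        "B": ["A", "C", "D"],
        "C": ["B", "D"],
        "D": ["B", "C"],
    }

    def route(graph, start, goal):
        """Return one route with the fewest links from start to goal."""
        queue = deque([[start]])
        visited = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for neighbour in graph[path[-1]]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(path + [neighbour])
        return None

    print(route(road_network, "A", "D"))  # ['A', 'B', 'D']

Network tracing, routing and utility connectivity analysis in GIS are all built on exactly this kind of traversal, usually with distances or travel times attached to the edges.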