By David Belanger, Senior Research Fellow in the Business Intelligence and Analysis Program at Stevens Institute of Technology and co-leader of the IEEE Big Data Initiative
Over the last week or so, I’ve had the opportunity to take in a collection of talks and articles about new network technologies related to how data is gathered, moved, and distributed. A common element in all of these discussions is the presence of big data.
New network technology is often the driver of quantum increases in the amount of data available for analysis. Think, in order, of the Internet, the web, 3G/4G mobility with smartphones and 24×7 access, and the Internet of Things. Each of these technologies has made dramatic changes in the way networks gather, carry and store data, and taken together they will generate and facilitate far more data for us to analyse and use. The challenge will be getting more than proportional value from that increase in data.
We have already been through network technologies that dramatically increased the number of hours a day that people could generate and consume data, and that fundamentally changed our relationship with information. The half-life of useful information has dropped, as we can now get many types of information in seconds, at nearly any time, using an army of “apps”; and the “inconvenience index”, the amount of trouble we must go through to obtain information, is now measured in inches instead of feet or yards. The emergence of a vast array of devices connected to the Internet, measuring everything from human movement, to health, to the physical and commercial worlds, is starting to create an even larger flow of data.
This increase in the volume, velocity, and variety of data will be larger than any of its predecessors. The challenge is: will it create a proportionally large increase in the amount of information? Will it create a proportionally large increase in the value of that information? Or will it create a deluge in which we can drown?
Fight or flight
Leaving aside the fact that much of the increase in “data” in flight over networks today is in the form of entertainment (the latest figure I have heard is about two-thirds), there is no question that the current flood of data has generated information of significant value. This is certainly true in the financial industry, not least in algorithmic trading, which not only uses big data to make better decisions, but also automates the execution of those decisions.
In consumer marketing, the use of big data has fundamentally changed the approach to targeting: from segmentation and aggregates to targeting individuals, or even personas of individuals, by their behaviours. This has enabled much more customisation in applications ranging from recommendations to churn prediction. Much of the management of current communications networks is completely dependent on big data for functions such as reliability, recovery, and security. The same is clearly true in many branches of science, and is becoming true in the delivery of health care. Leaving aside potential issues of privacy, surveillance cameras have changed the nature of policing. As video data mining matures, cameras will challenge entertainment for volume of network traffic, and provide another opportunity for value generation.
We typically think of the analytics associated with these data as leading to more accurate decision making, followed by more effective actions.
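To make the shift from segment-level to behaviour-level targeting concrete, here is a minimal, hypothetical sketch in Python. The Subscriber fields, offers, and weights are invented for illustration; in practice the weights would be learned from historical data rather than set by hand.

```python
# Illustrative sketch only: contrasts segment-level targeting with
# behaviour-based scoring. Field names and weights are hypothetical.

from dataclasses import dataclass

@dataclass
class Subscriber:
    subscriber_id: str
    segment: str             # coarse demographic segment
    support_calls_30d: int   # recent behaviour signals
    data_usage_gb_30d: float
    days_since_last_login: int

def segment_offer(sub: Subscriber) -> str:
    """Aggregate approach: one offer per segment."""
    return {"youth": "social-media bundle", "family": "shared-data plan"}.get(
        sub.segment, "standard plan"
    )

def churn_risk_score(sub: Subscriber) -> float:
    """Behaviour-based approach: score each individual from recent activity.
    Weights are made up for illustration; a real model would be learned."""
    score = 0.0
    score += 0.15 * sub.support_calls_30d
    score += 0.02 * sub.days_since_last_login
    score -= 0.01 * sub.data_usage_gb_30d  # heavy users tend to stay
    return max(0.0, min(1.0, score))

if __name__ == "__main__":
    sub = Subscriber("A123", "family", support_calls_30d=3,
                     data_usage_gb_30d=12.0, days_since_last_login=20)
    print(segment_offer(sub))                # same offer for everyone in the segment
    print(round(churn_risk_score(sub), 2))   # 3*0.15 + 20*0.02 - 12*0.01 = 0.73
```

The individual score is what can then drive an automated action, such as a retention offer, closing the loop from data to decision to action described above.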
Size is not always important
The answer to the questions above depends, in part, on how broadly based the skill set for effectively using this data becomes. Big data is not only a function of the volume (size), velocity (speed), and variety (text, speech, image, video) of the data available. At least as important are the sets of tools that allow a broad variety of people to take advantage of that data, the availability of people with the necessary skills, and the new types of applications that evolve.
Over much of the last two decades, big data was the province of organisations that had both access to lots of data and the scientific and engineering skills to build tools to manage and analyse it; and, in some cases, the imagination to create business models to take advantage of it. That has changed dramatically over the last several years. A set of powerful, usable tools has emerged, both commercially and as open source.
Understanding of how companies can obtain access to data beyond their operationally generated data is evolving quickly, and leaders in many industries are inventing new business models to generate revenue from it. Finally, and perhaps most importantly, there is a large and growing body of applications in areas such as customer experience, targeted marketing, recommendation systems, and operational transparency that are important to nearly every business and will be a basis for competition over the next several years. The skills needed to take advantage of this new data are within the reach of more companies than a few years ago. These include not only data scientists, but also a variety of engineers and technicians to produce hardened systems.
Conclusion
So, how do we think about this? First, the newly generated data will be much more open than traditional operational data. It will be worthwhile for those who think about an organisation’s data to look very seriously at augmenting their operational data with exogenously created data.
Second, you need to think creatively about integrating various forms of data together to create previously unavailable information. For example, in telecommunications, it is now fairly standard to integrate network, service, customer, and social network data to understand both customers and networks.
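As a minimal sketch of what such an integration can look like, assume each source has already been reduced to per-customer records keyed by a shared customer_id; the source names and fields below are invented for illustration, not a description of any real telecommunications system.

```python
# Hypothetical example: merging network, service, customer, and social data
# into one analysable view per customer. All fields are illustrative.

network = {"C42": {"dropped_calls_7d": 5, "avg_throughput_mbps": 2.1}}
service = {"C42": {"open_tickets": 2, "last_ticket_severity": "high"}}
customer = {"C42": {"tenure_months": 18, "plan": "unlimited"}}
social = {"C42": {"contacts_who_churned_90d": 3}}

def unified_view(customer_id: str) -> dict:
    """Merge the four per-customer records into a single view."""
    view = {"customer_id": customer_id}
    for source in (network, service, customer, social):
        view.update(source.get(customer_id, {}))
    return view

if __name__ == "__main__":
    print(unified_view("C42"))
    # The combined record lets an analyst relate network trouble (dropped calls),
    # service friction (tickets), and social exposure to churn in one place,
    # information that none of the sources provides on its own.
```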
Third, skill sets must be updated now. You will need data scientists and data miners, but also data technicians to run a production-level, information-based decision and automation capability. You will need people skilled in data governance – policy, process, and practices – to manage the risks associated with big data use.
It is time to start building this capability.
About the author:
Dr. David Belanger is currently a Senior Research Fellow in the Business Intelligence and Analysis Program at Stevens Institute of Technology. He is also co-leader of the IEEE Big Data Initiative. He retired in 2012 after many years as Chief Scientist and V.P. of Information, Software, and Systems Research at AT&T Labs.