
Even bigger data: Getting the most out of your implementation


By David Belanger, Senior Research Fellow in the Business Intelligence and Analysis Program at Stevens Institute of Technology and co-leader of the IEEE Big Data Initiative

Over the last week or so, I’ve had the opportunity to view a collection of talks and articles about new network technologies related to how data is gathered, moved, and distributed. A common element in all of these discussions is the presence of big data. 

New network technology is often the driver of quantum increases in the amount of data available for analysis. In order of appearance, think of the Internet, the web, 3G/4G mobility with smartphones and 24×7 access, and the Internet of Things. Each of these technologies will make dramatic changes in the way networks gather, carry and store data, while taken together they will generate and facilitate far more data for us to analyse and use. The challenge will be getting more than proportional value from that increase in data.

We have already been through network technologies that dramatically increased the number of hours a day that people could generate and consume data, and fundamentally changed our relationship with information. The half-life of the usefulness of information has dropped as we can get many types of information in seconds, at nearly any time, using an army of “apps”; and the “inconvenience index”, the amount of trouble we need to go through to obtain information, is now measured in inches instead of feet or yards. The emergence of a vast array of devices connected to the Internet, measuring everything from human movement, to health, to the physical and commercial worlds, is starting to create an even larger flow of data.

This increase in the volume, volatility, and variety of data will be larger than any of its predecessors. The challenge is: will it create a proportionally large increase in the amount of information? Will it create a proportionally large increase in the value of that information? Or, will it create a deluge in which we can drown?

Fight or flight

Leaving aside the fact that much of the increase in “data” in flight over networks today is in the form of entertainment (the latest figure I have heard is about two-thirds), there is no question that the current flood of data has generated information of significant value. This is certainly true in the financial industry, not least in algorithmic trading, which not only uses big data to make better decisions, but automates the execution of those decisions.

In consumer marketing, the use of big data has fundamentally changed the approach to targeting: from segmentation and aggregates to targeting individuals, or even personas of individuals, by their behaviours. This has created much more customisation for applications ranging from recommendations to churn management. Much of the management of current communications networks is completely dependent on big data for functions such as reliability, recovery, and security. The same is clearly true in many branches of science, and is becoming true in the delivery of health care. Leaving aside potential issues of privacy, surveillance cameras have changed the nature of policing. As video data mining matures, cameras will challenge entertainment for volume of network traffic, and provide another opportunity for value generation.

We typically think of the analytics associated with these data as leading to more accurate decision making, followed by more effective actions.

Size is not always important 

The answer to the questions above depends, in part, on how broadly based the skill set for effectively using this data becomes. Big data is not only a function of the volume (size), velocity (speed), and variety (text, speech, image, video) of the data available. At least as important are the tools that allow a broad variety of people to take advantage of that data, the availability of people with the necessary skills, and the new types of applications that evolve.

Over much of the last two decades, big data was the province of organisations that had access to lots of data, had the scientific and engineering skills to build tools to manage and analyse it, and in some cases had the imagination to create business models to take advantage of it. That has changed dramatically over the last several years. A set of powerful, usable tools has emerged, both commercially and as open source.

Understanding of how companies can obtain access to data beyond their operationally generated data is evolving quickly, and leaders in many industries are inventing new business models to generate revenue. Finally, and perhaps most importantly, there is a large and growing body of applications in areas such as customer experience, targeted marketing, recommendation systems, and operational transparency that are important to nearly every business and will be the basis for competition over the next several years. The skills needed to take advantage of this new data are within the reach of more companies than a few years ago. These include not only data scientists, but also a variety of engineers and technicians to produce hardened systems.

Conclusion

So, how do we think about this? First, the newly generated data will be much more open than traditional operational data. It will be worthwhile for those who think about an organisation’s data to look very seriously at augmenting their operational data with exogenously created data.

Second, you need to think creatively about integrating various forms of data together to create previously unavailable information. For example, in telecommunications, it is now fairly standard to integrate network, service, customer, and social network data to understand both customers and networks.

Third, skill sets must be updated now. You will need data scientists and data miners, but also data technicians to run a production-level, information-based decision and automation capability. You will need people skilled in data governance – policy, process, and practices – to manage the risks associated with big data use.

It is time to start building this capability.

About the author:

Dr. David Belanger is currently a Senior Research Fellow in the Business Intelligence and Analysis Program at Stevens Institute of Technology.  He is also co-leader of the IEEE Big Data Initiative. He retired in 2012 after many years as Chief Scientist and V.P. of Information, Software, and Systems Research at AT&T Labs.

Faster still: Analysing big data analytics and the agile enterprise


By Mark Davis, Distinguished Big Data Engineer, Dell Software Group, Santa Clara, California

Big data technologies are increasingly considered an alternative to the data warehouse. Surveys of large corporations and organisations bear out the strong desire to incorporate big data management approaches as part of their competitive strategy.

But what is the value that these companies see? Faster decision making, more complete information, and greater agility in the face of competitive challenges. Traditional data warehousing involved complex steps to curate and schematise data, combined with expensive storage and access technologies. A complete plan had to work through archiving, governance, visualization, master data management, OLAP cubes, and a range of different user expectations and project stakeholders. Managing these projects through to success also required coping with rapidly changing technology options. The end result was often failure.

With the big data stack, some of these issues are pushed back or simplified. For example, the issue of schematizing and merging data sources need not be considered up front in many cases, but can instead be handled on demand. The concept of schema-on-read is based on a widely seen usage pattern for data that emerged from agile web startups. Log files from web servers needed to be merged with relational stores to provide predictive value about user “journeys” through the website. The log files could be left at rest in cheap storage on commodity servers beefed up with software replication capabilities. Only when parts of the logs needed to be merged, or certain timeframes of access analyzed, did the data get touched.
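
As a rough illustration of schema-on-read, the sketch below uses PySpark; the file paths, the regular expressions, the JDBC connection details and the "users" table are invented placeholders. The raw logs stay at rest as plain text, and structure is imposed, and the join to a relational extract performed, only at query time.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

    # Raw web server logs stay at rest as plain text; no schema is imposed on write.
    raw_logs = spark.read.text("hdfs:///logs/web/*.log")  # hypothetical path

    # A schema is applied only when the data is read, by parsing out the fields we need.
    parsed = raw_logs.select(
        F.regexp_extract("value", r"^(\S+)", 1).alias("client_ip"),
        F.regexp_extract("value", r"\[([^\]]+)\]", 1).alias("timestamp"),
        F.regexp_extract("value", r'"(?:GET|POST) (\S+)', 1).alias("url"),
    )

    # Merge with a relational extract (a hypothetical "users" table) only on demand.
    users = spark.read.jdbc(url="jdbc:postgresql://dbhost/crm", table="users",
                            properties={"user": "analyst", "password": "secret"})
    journeys = parsed.join(users, parsed.client_ip == users.last_ip, "left")
    journeys.filter(F.col("url").startswith("/checkout")).show()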

Distributing data processing on commodity hardware led to the obvious next step of moving parts of the data into memory, or processing it as it streams through the system. This most recent evolution of the big data stack shares characteristics with high performance computing techniques, which have increasingly ganged together processors across interconnect fabrics rather than used custom processors tied to large collections of RAM. The BDAS (Berkeley Data Analytics Stack) exemplifies this new world of analytical processing. BDAS is a combination of in-memory, distributed database technologies like Spark, streaming systems like Spark Streaming, a graph database called GraphX that layers on top of Spark, and machine learning components called MLBase. Together these tools sit on top of Hadoop, which provides a resilient, replicated storage layer combined with resource management.
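
To give a feel for how the batch and streaming layers combine, here is a minimal sketch assuming a working Spark installation with the classic Spark Streaming (DStream) API of that era; the HDFS path and the socket source are placeholders.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="bdas-sketch")

    # Batch layer: keep a working set in cluster memory rather than rereading from disk.
    events = sc.textFile("hdfs:///data/events/*.log").cache()   # hypothetical path
    print(events.filter(lambda line: "ERROR" in line).count())

    # Streaming layer: process records as they arrive, in small micro-batches.
    ssc = StreamingContext(sc, batchDuration=5)
    stream = ssc.socketTextStream("localhost", 9999)            # hypothetical source
    stream.filter(lambda line: "ERROR" in line).count().pprint()

    ssc.start()
    ssc.awaitTermination()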

What can we expect in the future? Data warehousing purists have watched these developments with a combination of interest and some degree of skepticism. The latter is because the problems and solutions that they have perfected through the years are not fully baked in the big data community. It seemed a bit like amateur hour.

But that is changing rapidly. Security and governance, for instance, have been weak parts of the big data story, but there is now a range of security approaches, from Kerberos protocols permeating the stack to integrated REST APIs with authentication at the edges of the clustered resources. Governance is likewise improving, with projects growing out of the interplay between open source contributors and enterprises that want to explore the tooling. We will continue to see a rich evolution of the big data world until it looks more and more like traditional data warehousing, but perhaps with a lower cost of entry and increased accessibility for developers and business decision makers.

About the author:

Mark Davis founded one of the first big data analytics startups, Kitenga, which was acquired by Dell Software Group in 2012, where he now serves as a Distinguished Engineer. Mark led Big Data efforts as part of the IEEE Cloud Computing Initiative, serves on the Intercloud Testbed Executive Committee, and contributes to the IEEE Big Data Initiative.

How software defined networking and cloud computing pave the way towards a digital society


By Antonio Manzalini

Ultra-broadband network proliferation, advances in information technology and the evolution of endpoint devices have created the conditions for re-inventing telecommunications networks and services architectures.

Software defined networking (SDN) and network function virtualization (NFV) are just two facets of the so-called “IT-zation”, or softwarization, of telecom infrastructures. SDN decouples the software control plane from the forwarding hardware of nodes such as routers and switches, and executes the control software in the cloud or on any standard or hybrid processing resources made available, such as blades or servers. SDN doesn’t just affect the evolution of Layer 2 and Layer 3 services such as switching and routing; it also impacts Layer 4 to Layer 7 network functions.
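
To make that separation concrete, the deliberately simplified Python sketch below invents Switch and Controller classes (no real SDN protocol such as OpenFlow is involved): the controller computes forwarding rules centrally in software, while the switches only match packets against the rules pushed down to them.

    class Switch:
        """Forwarding (data) plane abstraction: a dumb match/action table."""
        def __init__(self, name):
            self.name = name
            self.flow_table = {}          # destination prefix -> output port

        def install_rule(self, dst_prefix, out_port):
            self.flow_table[dst_prefix] = out_port

        def forward(self, packet):
            for prefix, port in self.flow_table.items():
                if packet["dst"].startswith(prefix):
                    return f"{self.name}: {packet['dst']} out port {port}"
            return f"{self.name}: no rule, punting to controller"

    class Controller:
        """Centralised control plane: holds the policy and pushes rules to switches."""
        def __init__(self, switches):
            self.switches = switches

        def apply_policy(self, routes):
            # routes: {destination prefix: {switch name: output port}}
            for prefix, hops in routes.items():
                for sw in self.switches:
                    if sw.name in hops:
                        sw.install_rule(prefix, hops[sw.name])

    s1, s2 = Switch("s1"), Switch("s2")
    Controller([s1, s2]).apply_policy({"10.0.1.": {"s1": 2, "s2": 1}})
    print(s1.forward({"dst": "10.0.1.7"}))
    print(s2.forward({"dst": "192.168.0.5"}))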

In reality, there are a lot of middle-boxes deployed in current networks, including wide area network (WAN) optimizers, network address translation (NAT), performance-enhancing proxies, intrusion detection and prevention systems, and firewalls. Virtualizing these middle-box network functions allows for considerable cost savings, and this is where NFV plays a role: virtualized network functions can be dynamically deployed and moved to various locations of an infrastructure where processing power is available, not only distributed into the network but even into the cloud.

However, SDN and NFV are not dependent on each other, but they are certainly mutually beneficial. For example, it will be possible in the medium term to develop network functions and services spanning L2-L7 as applications, and execute them on virtual resources (e.g. virtual machines) hosted either in centralized cloud computing data centers or in distributed clusters of mini data centers.
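
As a sketch of what a virtualized service chain might look like once network functions are just software, the toy Python below composes invented firewall, NAT and WAN-optimizer functions; a real NFV deployment would place each function on whichever VM or data center has spare capacity and steer traffic through them.

    # Each virtual network function (VNF) is ordinary software, so a service chain
    # becomes function composition that can run wherever capacity is available.

    def firewall(packet):
        blocked_ports = {23, 445}                       # illustrative policy
        return None if packet["dst_port"] in blocked_ports else packet

    def nat(packet, public_ip="203.0.113.10"):          # documentation-range address
        return dict(packet, src_ip=public_ip)

    def wan_optimizer(packet):
        return dict(packet, payload=packet["payload"][:64])   # crude "compression"

    SERVICE_CHAIN = [firewall, nat, wan_optimizer]

    def run_chain(packet, chain=SERVICE_CHAIN):
        for vnf in chain:
            packet = vnf(packet)
            if packet is None:                          # a VNF dropped the packet
                return None
        return packet

    print(run_chain({"src_ip": "10.0.0.5", "dst_port": 443, "payload": "x" * 200}))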

SDN and NFV are not new principles; they were proposed and demonstrated as far back as the ‘90s, but the conditions for a breakthrough now exist: the telecom industry considers them potentially impactful due to the emergence of ultra-broadband, low-latency connections and high performance processing power. Technology today appears to be ready for a sustainable deployment of SDN and NFV.

This is a potentially big transformation of the telecom infrastructure.

In the long term, the distinction between the network and the cloud is likely to disappear, as more and more functions will be performed either in the network or in the cloud depending on performance requirements and cost optimization. Control and orchestration capabilities will be the key factor for success in taming the complexity of an infrastructure executing millions of software transactions or service chains. The most important requirement will be ensuring ultra-low application latency.

Looking at the evolution of telecom from this perspective, the role of the so-called “software-defined operator” is gaining traction. A software-defined operator could be described as an operator that essentially owns software: network and service platforms whose L2-L7 functions are decoupled from the hardware and executed and operated either on distributed computing capabilities or in the cloud. This will be possible in less than five years, but the full adoption of this innovation will depend on a number of factors, including business models, sustainability and appropriate regulation.

In general, software-defined operators will see dramatic cost reductions, including 40-50% savings in energy, CAPEX reductions, improved efficiency in overall operations (as much as 35% OPEX savings just by automating processes), and reduced time-to-market when deploying services. Other strategic scenarios are even possible. These operators could “upload and execute” their networks and services platforms anywhere in the world if there are infrastructure providers willing to rent hardware resources such as processing, storage and transmission. This might represent a disruptive evolution of the business towards OPEX-centric models, which are typical of the cloud.

It should be mentioned that IT advances in computing and storage hardware have an investment profile, in terms of CAPEX and OPEX, that is quite different from that of traditional networks. Telecom network investments span longer periods, anywhere from five to ten years, and require geographically distributed maintenance efforts that increase operational costs. Software, on the other hand, implies large upfront costs for development and testing and requires a longer initial phase to guarantee carrier-grade applications.

Even so, these trends are making network and cloud computing innovation accessible to all enterprises in almost any part of the world on an equal basis. This evolution will lower the threshold for new players to enter the telecom and information communication technology (ICT) markets: competition is moving to the realm of software, and lower CAPEX and/or OPEX will be required to start providing ICT services.

In general, we can argue that these trends are accelerating the transition towards the Digital Society and the Digital Economy, in which network infrastructures, increasingly pervasive and embedding processing and storage capabilities, will become the “nervous system” of our society. This will enable new service scenarios.

In the book The Second Machine Age, the authors Brynjolfsson and McAfee argue that the exponential growth in the computing power of machines, the amount of digital information and the number of relatively cheap interconnected devices will soon bring “machines” to do things that we humans usually do today. This is another facet of the same IT-zation trend: the creation of a new and pervasive “machine intelligence”, supported by a highly flexible network, capable of fundamentally changing the economy.

In fact, even today data and information instantaneously reach almost every corner of the world through ultra-broadband, low-latency networks, where a huge amount of computing via the cloud is available to transform them into knowledge. This is essentially the definition of “intelligence”: the capability of processing and exchanging information to understand what is happening in the environment, to adapt to changes and to learn. The availability of huge amounts of cloud processing and storage, interconnected by flexible and fast networks, will create a pervasive “machine intelligence” able to morph the space-time physical dimensions of life, as the direct physical presence of humans will be less and less required to perform certain jobs or tasks.

When these intelligent machines “flood the society landscape”, there will be a number of socio-economic impacts: a reduction of human effort in jobs subject to computerization and robotization; an increase in local production; a reduction in long-distance transportation; the optimization of socio-economic processes; and industries will no longer need to relocate to where human labor costs are lower.

Eventually, because of this evolution, several economists, as well as technologists, have started to wonder whether the usual representation of relationships among the myriad players in the telecom and cloud domains can still be modeled on the basis of value chains. There is a growing consensus that value-chain modeling will have to be complemented by a broader view that considers the business ecosystems of the true Digital Economy.

Antonio Manzalini received the M. Sc. Degree in Electronic Engineering from the Politecnico of Turin and is currently senior manager at the Innovation Dept. (Future Centre) of Telecom Italia. His current development interests are in the area of Software Defined Networks (SDN) and Network Functions Virtualization (NFV), as it relates to the evolution to 5G. Manzalini is also chair of the IEEE SDN Initiative, which is now seeking authors to present at the IEEE International Conference on Network Softwarization (NetSoft 2015), its flagship event, 13-17 April 2015, at the Cruciform Building at University College London.

Opinion: What’s the real reason to do cloud, again?

By Joe Weinman, author, Cloudonomics

Cloud computing is having a dramatic impact on all aspects of our lives: as consumers, we spend our time on cloud-based social networks and using apps downloaded from the cloud; as employees, we use a variety of cloud-based software applications; as citizens, we file our taxes and even encourage social transformation via the cloud.  For such a richly applicable general purpose technology, how can one begin to characterise its benefits?

The conventional answer is that cloud computing reduces costs and increases “business agility,” but the cloud is much more powerful than that.  The cloud can enhance customer experience, by leveraging a dispersed footprint to increase availability, reduce latency for interactive services, and locally customize user interfaces.  It can reduce cycle times and thus accelerate time to market and time to volume by offering “near-infinite” resources to speed tasks such as drug discovery.  It can reduce risk by better aligning infrastructure expenses with variable and unpredictable revenue flows.

The cost and performance benefits can be quantified using the laws of Cloudonomics that I identified in my book of the same name.  For example, all other things being equal, one can identify an optimal hybrid architecture of private and public on-demand, pay-per-use resources based on the variability of resource requirements and the relative unit costs of a do-it-yourself strategy vs. public cloud provider pricing.  One can associate distributed service node build-outs or on-demand resourcing with latency reduction.
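
To give a feel for the arithmetic, the figures below are invented for illustration and are not from the book: they simply compare owning capacity sized for peak demand against paying a per-unit premium for only what is consumed.

    # Illustrative numbers only: a spiky monthly demand curve (servers needed),
    # owned capacity priced per server-month, and an assumed pay-per-use premium.
    demand = [20, 25, 30, 90, 35, 28, 22, 110, 40, 30, 26, 24]
    peak = max(demand)
    used_server_months = sum(demand)

    DIY_COST = 100      # per server-month, but capacity must be sized for peak
    CLOUD_COST = 180    # per server-month consumed (an assumed 1.8x premium)

    diy_total = peak * DIY_COST * len(demand)        # 110 * 100 * 12 = 132,000
    cloud_total = used_server_months * CLOUD_COST    # 480 * 180    =  86,400

    print("Own at peak :", diy_total)
    print("Pay per use :", cloud_total)

With demand this variable, pay-per-use wins despite the unit premium because the peak-to-average ratio (here 110 to 40) exceeds the premium (1.8); in a hybrid architecture one would typically serve the steady baseline with owned resources and burst to on-demand capacity for the peaks.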

All those benefits are valid, and broadly applicable.  However, where the cloud—and related technologies such as big data and analytics, mobile, social, and the Internet of Things—become particularly powerful is when they are employed by companies to achieve strategic competitive advantage.

I’ve found that companies can use digital technologies in four major ways, which I call “digital disciplines.”

The first is “information excellence,” where companies can leverage information to dynamically optimize manufacturing, service, and other business processes.  Traditionally, analysis was conducted offline and led to long term process improvement; now, data is accessible in real time and heuristics or algorithms can be used to reduce process costs and intervals, increase asset utilization, and reduce unintended variation.

“Solution leadership” is a strategic discipline oriented towards creating differentiated products and services, but ones that, rather than being standalone, link across networks to cloud-enabled services.  Connected cars and connected activity monitors are examples of this new generation of solutions.

“Collective intimacy” offers the ability to use collaborative filtering and detailed data from all users or customers to provide targeted, individual recommendations to each user.  Movie recommendations and book upselling are good consumer entertainment examples, but other examples include generating specific therapies for patients based on their individual genetic characteristics as well as software-enhanced medical data, such as pathology results.
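
As a toy illustration of the collaborative filtering behind such recommendations, the sketch below uses an invented ratings matrix and simple user-based cosine similarity; production systems operate at far larger scale and blend in many more signals.

    from math import sqrt

    ratings = {
        "alice": {"film_a": 5, "film_b": 3, "film_c": 4},
        "bob":   {"film_a": 5, "film_b": 2, "film_d": 5},
        "carol": {"film_b": 4, "film_c": 5, "film_d": 1},
    }

    def cosine(u, v):
        """Cosine similarity between two sparse rating vectors."""
        common = set(u) & set(v)
        if not common:
            return 0.0
        num = sum(u[i] * v[i] for i in common)
        den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
        return num / den

    def recommend(user, k=2):
        """Score items the user has not rated, weighted by neighbour similarity."""
        scores = {}
        for other, their_ratings in ratings.items():
            if other == user:
                continue
            sim = cosine(ratings[user], their_ratings)
            for item, rating in their_ratings.items():
                if item not in ratings[user]:
                    scores[item] = scores.get(item, 0.0) + sim * rating
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(recommend("alice"))   # the only unseen item here is 'film_d'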

Finally, the cloud can be used to enable “accelerated innovation,” by enabling low cost experiments but also by creating the necessary infrastructure for new constructs such as idea markets, crowdfunding, crowdsourcing, and open innovation contests.

As individual clouds become increasingly interoperable, thanks to efforts such as the IEEE’s P2302 Standard for Intercloud Interoperability and Federation, and the related IEEE Intercloud Testbed initiative, additional benefits will arise.  On the cost side, interoperability standards will accelerate the development of competitive cloud markets. 

On the benefit side, interoperability will enable various business applications and also devices in the Internet of Things to work with each other, ushering in additional optimization and user experience benefits as well as opening up additional strategic opportunities.  For example, smart cities could dynamically optimize the routes and timing for garbage collection trucks, school buses, and ambulances, given connected vehicles and the right cloud-based software.

One way to look at this is in terms of the benefits of on-demand, pay-per-use, elastic infrastructure.  But a more powerful way to think about it is as globally optimal decision-making based on powerful information integration from heterogeneous systems.

Regardless of whether one takes a chief financial officer’s view or that of a chief strategy officer, the cloud and related technologies are bound to continue to have a dramatic and transformational impact on the spheres of businesses and consumers.

About the author 

Joe Weinman is the author of Cloudonomics and the forthcoming Digital Disciplines, and the chair of the IEEE Intercloud Testbed Executive Committee, supported by the IEEE Cloud Computing Initiative.

Calling for a common cloud architecture

By Kathy L. Grise

The overarching theme is that cloud computing is truly ubiquitous: it reaches not just the computing industry and its professionals, but academia, government, industry at large, and the average consumer.

A successful cloud implementation happens when everything functions effectively and is transparent to the user. The user should not have to worry about the where, how, or what behind the cloud. Issues like privacy, security, reliability, and accessibility should be handled transparently. Naturally, that success rests on the sound architecture(s) behind cloud computing.

There are numerous pieces and parts that host, drive, and support cloud computing, ranging from service models such as SaaS and PaaS down to the basic, fundamental physical components.

To use the “drive” analogy, let’s think about what drives an automobile. For the purpose of this analogy, if you are a collector of cars for display only, stop reading; but if it’s important that your car runs, then read on. A functional automobile has to be usable, i.e. drivable, and normally requires an engine. An engine, in turn, needs cylinders and pistons, as well as a crankshaft and connecting rods. In terms of driving the cloud, the cloud engine needs software, services, networking, storage, and platform(s) to operate seamlessly.

The cloud provider should ensure that all these pieces and parts fit nicely together, and that all issues are covered. A single cloud provider does not necessarily have to carry the full burden of providing and servicing all the pieces and parts, but it does have to ensure that each piece or part can communicate and function with the others. This drives demand for a common platform that is interchangeable, interoperable, and interconnected.


For example, a cloud provider could develop and offer a competitive cloud security solution that differentiates it from its competitors. As part of that solution, the provider would pull together an overall package from other specialised providers. By sharing the burden, the provider can minimize its overall costs and advance in its field of security.

This common platform has enabled the rapid startup of literally hundreds of new companies advancing cloud security for a multi-billion dollar industry, resulting in the creation of new jobs, opportunities, and advancements in technology.

The IEEE has a global initiative to develop cloud-to-cloud interoperability and federation. Its P2302 draft standard defines topology, functions, and governance for cloud-to-cloud interoperability and federation. Topological elements include clouds, roots, exchanges (which mediate governance between clouds), and gateways (which mediate data exchange between clouds). Functional elements include name spaces, presence, messaging, resource ontologies (including standardized units of measurement), and trust infrastructure. Governance elements include registration, geo-independence, trust anchor, and potentially compliance and audit. IEEE’s Intercloud Testbed provides for practical application and verification of P2302.
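
As a purely illustrative data model, loosely following the topological roles named above and not the standard's actual schema or protocols, one might picture the federation like this:

    # Toy model only: roots provide naming/trust/ontology services, exchanges
    # mediate governance between member clouds, gateways handle data exchange.
    intercloud = {
        "roots": [
            {"name": "root-1", "services": ["naming", "trust", "ontology"]},
        ],
        "exchanges": [
            {"name": "exchange-1", "member_clouds": ["cloud-a", "cloud-b"]},
        ],
        "clouds": {
            "cloud-a": {"gateway": "gw-a.example.net", "exchange": "exchange-1"},
            "cloud-b": {"gateway": "gw-b.example.net", "exchange": "exchange-1"},
        },
    }

    def federation_peers(cloud_name):
        """Clouds reachable through the same exchange, each via its own gateway."""
        exchange = intercloud["clouds"][cloud_name]["exchange"]
        return [name for name, info in intercloud["clouds"].items()
                if info["exchange"] == exchange and name != cloud_name]

    print(federation_peers("cloud-a"))   # ['cloud-b']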

Overall, it is critical to have common architectures that are interchangeable, interoperable, and interconnected for successful cloud services and applications. Common architecture translates into real business, real jobs, real dollars, real advancements in technology, and ultimately benefits the end consumer. So let us all move towards a more interchangeable, interoperable, and interconnected environment for cloud computing.

About the author

Kathy Grise, IEEE Future Directions Program Director, works directly with IEEE volunteers, IEEE staff, and consultants in support of new initiatives, and is the IEEE staff program director for the IEEE Cloud Computing Initiative, Big Data Initiative, Green ICT Initiative, and the IEEE Technology Navigator. Prior to joining the IEEE staff, Ms. Grise held numerous positions at IBM, and most recently was a Senior Engineering Manager for Process Design Kit Enablement in the IBM Semiconductor Research and Development Center.

Cloud computing: An important component of the physical infrastructure for smart cities

By Victor M. Larios

Today half of the world’s population lives in urban areas, and cities are growing their infrastructures and services to keep up. Traditionally, city governments have different departments to oversee metropolitan services for citizens; however, departments often do not fully communicate their plans and actions, running their services as independent entities. As a city grows, duplicated efforts and wasted resources emerge. In developing a smart city infrastructure, it is necessary to think of cities as complex systems, with departments as subsystems sharing all resources and assets.

For example, a typical department of transportation models traffic patterns in order to plan new roads or arrange streets for efficient mobility. In a systemic approach streets in the city are a shared resource – the education department adds traffic at peak times according to school schedules; the sanitation department influences traffic with low speed vehicles collecting garbage; and the environmental department estimates degrees of pollution via the density of traffic identified by the transportation department. Also, the health department could use such information, as well as weather conditions, to increase its pharmaceutical stock in relation to pollution numbers or anticipated storms or natural disasters.

In this context, it is fundamental for cities undergoing a smartification process to consolidate their infrastructure according to basic principles of service design such as modularity, exportability, interoperability, extensibility, and scalability.

Cloud computing technologies offer a good solution for cities to consolidate their physical infrastructure. Cloud technologies provide different levels of services such as IaaS (infrastructure as a service), PaaS (platform as a service) and SaaS (software as a service) for efficiency, quality of service on demand and green infrastructure.

In 2013 the IEEE launched a new and ambitious educational program for the development of Smart Cities, with the goal of identifying and sharing best practices to support cities in their smartification process. Guadalajara, Mexico was the first city selected for this IEEE initiative, due in part to the city government’s decision to renew the downtown and build a Digital Creative City (Guadalajara Ciudad Creativa Digital or GCCD). The master plan, designed by Prof. Carlo Ratti and Prof. Dennis Frenchman from MIT together with several consultancy groups, proposes to transform the city without losing its traditions, identity, and architecture.

During the kickoff workshop in Guadalajara in October 2013 local IEEE volunteers defined a strategy for six working groups to tackle different layers of the Smart City: 1) Physical Infrastructure, 2) Internet of Things, 3) Open Data Framework, 4) Analytics and Visualization, 5) Metrics for Smart Cities, and 6) Education for Smart Cities.

According to the GCCD original master plan, the city environment will have a device sensor network and a set of cloud services. Infrastructure requirements include an optical fiber backbone network and data center facilities for urban informatics. To innovate in how citizens interact with information and services, the urban informatics are based on private cloud services supported by the concept of an Urban Operating System (UOS). The UOS is a complex event system manager that uses data analytics from the sensor network to optimize and forecast the use of city resources and services by citizens.
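
As a highly simplified illustration of the complex-event idea behind a UOS, the sketch below invents sensor names, thresholds and an alerting rule: readings stream in, and a composite event is raised when traffic density and particulate pollution are both elevated at the same time.

    from collections import deque

    WINDOW = 10    # keep the last N readings per sensor type
    readings = {"traffic_density": deque(maxlen=WINDOW),
                "pm10": deque(maxlen=WINDOW)}

    def ingest(sensor_type, value):
        readings[sensor_type].append(value)
        return correlate()

    def correlate():
        """Raise a composite event when traffic and pollution are both elevated."""
        if not readings["traffic_density"] or not readings["pm10"]:
            return None
        avg_traffic = sum(readings["traffic_density"]) / len(readings["traffic_density"])
        avg_pm10 = sum(readings["pm10"]) / len(readings["pm10"])
        if avg_traffic > 0.8 and avg_pm10 > 50:
            return {"event": "reroute_and_alert_health_dept",
                    "traffic": round(avg_traffic, 2), "pm10": round(avg_pm10, 1)}
        return None

    for traffic, pm10 in [(0.7, 40), (0.9, 55), (0.95, 60)]:
        ingest("traffic_density", traffic)
        event = ingest("pm10", pm10)
        print(event)   # None, None, then the composite event on the third reading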


One of the first proposals is a private cloud architecture for the city of Guadalajara. A two-year roadmap for cloud technology development includes a local talent program to build skills in cloud computing and encourage innovative solutions to the coming challenges.

One of the projects in progress at GCCD is the construction of the digital creative accelerator complex, a smart building to host small and medium businesses in the creative industries, and an innovation center with living labs for IoT, smart cities, and other areas of research interest for the municipality. Besides this new smart building complex, a smart building renewal project called the “Ingenium Campus” will be the vector of knowledge within the city.

It comprises an incubator to support local start-ups in the digital creative and smart cities fields, a media arts magnet middle school, and the Ingenium Institute, a joint effort between the universities of Guadalajara and the GCCD companies for talent engagement, education and entrepreneurship development.

Hence, the first services in the private cloud will be related to environmental, social and economic aspects of the city.

The challenges foreseen for the IEEE Guadalajara pilot and cloud initiative include finding a cost effective strategy to support the private cloud, and ensuring security for cloud users, which is paramount given the mixed environment of government, citizens and companies who will be sharing the private cloud. Additionally, the city must adapt public policies to enhance the benefits of a consolidated infrastructure with the proposed private cloud and concepts such as the UOS. With Guadalajara as its first Smart City, the IEEE Smart Cities Initiative is working to build a consortium of cities to share their experience and best practices.

About the author

Victor M. Larios received his PhD and a DEA in Computer Science from the Technological University of Compiègne, France, and a BA in Electronics Engineering from the ITESO University in Guadalajara, Mexico. He works at the University of Guadalajara (UDG), where he holds a Full Professor-Researcher position in the Department of Information Systems and is the director of the Smart Cities Innovation Center at the CUCEA UDG Campus. Dr. Larios founded the UDG PhD in Information Technologies in 2007, and has been leading projects in Guadalajara between academia, government and high-tech industry, including IBM, Intel and HP, focusing his research on distributed systems, parallel computing, data analytics and visualization, serious games and smart cities.