Category Archives: Cloudera

What is the promise of big data? Computers will be better than humans

Big data as a concept has in fact been around longer than computer technology, which would surprise a number of people.

Back in 1944, Wesleyan University Librarian Fremont Rider wrote a paper which estimated that American university libraries were doubling in size every sixteen years, meaning the Yale Library in 2040 would occupy over 6,000 miles of shelves. This is not big data as most people would know it, but the vast and rapid increase in the quantity and variety of information in the Yale library follows the same principle.
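Rider's doubling estimate is plain compound growth, and the arithmetic is easy to check. The short Python sketch below is illustrative only: the function name and its parameters are assumptions made for this example and do not come from Rider's paper. It computes how many times larger a collection becomes if it doubles at a fixed interval.

```python
def growth_factor(start_year, end_year, doubling_period_years):
    """Return the multiple by which a collection grows between two years,
    assuming it doubles once every `doubling_period_years`."""
    doublings = (end_year - start_year) / doubling_period_years
    return 2 ** doublings


# 1944 to 2040 is 96 years, or six doublings at sixteen years each.
print(growth_factor(1944, 2040, 16))  # 64.0
```

Under that assumption, a library in 2040 would hold 64 times what it held in 1944.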

The concept was not known as big data back then, but technologists today face a similar challenge: how to handle such a vast amount of information. The difficulty is not necessarily how to store it, but how to make use of it. The promise of big data, and data analytics more generally, is to provide intelligence, insight and predictability, but only now is technology advanced enough to capitalise on the vast amount of information available to us.

Back in 2003 and 2004, Google published papers on its Google File System and MapReduce, which are generally credited with inspiring the Apache Hadoop platform. At that point, few people could have anticipated the explosion of technology we have since witnessed. Cloudera Chairman and CSO Mike Olson is one of those few, and he now leads a company regularly cited as one of the go-to organizations for the Apache Hadoop platform.

“We’re seeing innovation in CPUs, in optical networking all the way to the chip, in solid state, highly affordable, high performance memory systems, we’re seeing dramatic changes in storage capabilities generally. Those changes are going to force us to adapt the software and change the way it operates,” said Olson, speaking at the Strata + Hadoop event in London. “Apache Hadoop has come a long way in 10 years; the road in front of it is exciting but is going to require an awful lot of work.”

Analytics was previously seen as an opportunity for companies to look back at their performance over a defined period, and develop lessons for employees on how future performance could be improved. Today advanced analytics is applied to improve performance in real time. A company can react in real time to shift the focus of a marketing campaign, or alter a production line to improve the outcome. The promise of big data and IoT is predictability and data-defined decision making, which can shift a business from a reactive position to a predictive one. Understanding trends can create proactive business models which advise decision makers on how to steer a company. But what comes next?

Cloudera Chairman and CSO Mike Olson

For Olson, machine learning and artificial intelligence are where the industry is heading. We’re at a stage where big data and analytics can be used to automate processes and replace humans for simple tasks. In a short period of time, we’ve seen some significant advances in the applications of the technology, most notably Google’s AlphaGo beating world Go champion Lee Se-dol and Facebook’s use of AI in picture recognition.

Computers taking on humans in games of strategy is not a new PR stunt (IBM’s Deep Blue defeated chess world champion Garry Kasparov in 1997), but this is a very different proposition. While chess is a game which relies on strategy, Go is another beast. Due to the vast number of permutations available, strategies within the game rely on intuition and feel, which made it a complex task for the Google team. The fact AlphaGo won the match demonstrates how far researchers have progressed in making machine learning and artificial intelligence a reality.

“In narrow but very interesting domains, computers have become better than humans at vision and we’re going to see that piece of innovation absolutely continue,” said Olson. “Big Data is going to drive innovation here.”

This may be difficult for a number of people to comprehend, but big data has entered the business world; true AI and automated, data-driven decision making may not be too far behind. Data is driving the direction of businesses by providing a better understanding of the customer, increasing the security of an organization and giving a clearer view of the risk associated with any business decision. Big data is no longer a theory, but an accomplished business strategy.

Olson is not saying computers will replace humans, but the number and variety of processes which can be replaced by machines is certainly growing, and growing faster every day.

Deloitte and Cloudera create compliance service in the cloud

Professional services company Deloitte and cloud operator Cloudera have launched a jointly created cloud service that helps financial services firms meet their compliance obligations more easily. It aims specifically to ease the workload created by the supervisory rules of the Comprehensive Capital Analysis and Review (CCAR) process.

The Deloitte CCAR service aims to help companies cope with the masses of data needed to stress test financial products as regulations constantly change. Annual CCAR supervisory rules regularly specify new scenarios and datasets to be used in credit risk, liquidity risk, market risk, pre-provision net revenue (PPNR) and capital management models.

The cost and time involved in constantly processing these complicated variables, in order to generate the forecasted stress estimates, is escalating as the number of quarterly and yearly models multiplies, according to Deloitte.

The Deloitte-designed solution includes accelerators to streamline data selection, data quality, variables conversion, data ingestion and management, and to convert or migrate models to SAS DS2, Apache Spark or Python.

Cloudera was approached to use its expertise in Apache Hadoop open source software frameworks in order to create the visualization and dashboard tools promised in the system. The tools are designed to interact with the results of stress tests so they can quickly identify trends and potential sources of risk.

Deloitte built accelerators in Spark that cater for a wide variety of contingencies, which cuts the cost and risk of migrating existing CCAR models, first into an open source environment and then into SAS DS2 once it is released.

“The current regulatory environment that our clients face is more complex than at any time in history,” said Ashish Verma, director at Deloitte Consulting LLP. “This complexity in regulation has led to complexity in data management, making compliance very costly with little benefit to the business.”

Cloudera has created a ‘cost effective solution’ to the problems faced by clients, said Verma. “Storing this data within Cloudera Enterprise means companies can perform additional non-compliance analysis and potentially develop a deeper understanding of their businesses.”

Cloudera announces tighter security measures for Hadoop

Cloudera has announced a new open source project that aims to enable real-time analytical applications in Hadoop, along with an open source security layer for unified access control enforcement.

Kudu, a columnar storage engine for Hadoop, aims to give developers more choice rather than limiting their options. Currently developers must choose between fast analytics with HDFS or updating data with HBase. Combining the two, according to Cloudera, can be perilous for developers who try, since both systems are highly complex.

Cloudera says Kudu eliminates the complexities involved in processes like time series analysis, machine data analytics and online reporting. It does this by supporting high-performance sequential and random reads and writes, enabling fast analytics on changing data.

Cloudera co-authored Kudu with Intel, which helped it make better use of in-memory hardware and Intel’s 3D XPoint technology. Other contributors included Xiaomi, AtScale, Splice Machine and Zoomdata.

“Our infrastructure team has been working with Cloudera to develop Kudu, taking advantage of its unique ability to support columnar scans and fast inserts and updates to continue to expand our Hadoop ecosystem footprint,” Baoqiu Cui, chief architect at smartphone developer Xiaomi, told CIO magazine. “Using Kudu, alongside interactive SQL tools like Impala, has allowed us to build a next-generation data analytics platform for real-time analytics and online reporting.”

Meanwhile a new core security layer for Hadoop has been launched. RecordService aims to provide unified access control enforcement for Hadoop. It acts as a new layer that sits between Hadoop’s storage and computing engines and aims to consistently enforce the role-based access controls defined by Sentry. RecordService also provides dynamic data masking across Hadoop, protecting sensitive data as it is accessed.
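To illustrate the general idea of role-based access control combined with dynamic masking, here is a minimal Python sketch. It is not the RecordService or Sentry API: the class `RolePolicy`, the `POLICIES` table, the role names, the column names and the masking rule are all hypothetical, chosen only to show a single enforcement point that filters and masks columns before any reader sees them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePolicy:
    """Hypothetical policy: which columns a role may read, and which are masked."""
    readable: frozenset
    masked: frozenset = frozenset()


# Illustrative roles and columns only; a real deployment would load these
# from its policy store rather than hard-coding them.
POLICIES = {
    "analyst": RolePolicy(
        readable=frozenset({"name", "city", "card_number"}),
        masked=frozenset({"card_number"}),
    ),
    "auditor": RolePolicy(readable=frozenset({"name", "city", "card_number"})),
}


def mask(value):
    """Hide all but the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def read_record(role, record):
    """Return only the columns the role may see, masking sensitive ones."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"role {role!r} has no access policy")
    visible = {}
    for column, value in record.items():
        if column not in policy.readable:
            continue  # column is not granted to this role, so drop it
        visible[column] = mask(value) if column in policy.masked else value
    return visible


row = {"name": "Ada", "city": "London",
       "card_number": "4111111111111111", "salary": 100}
print(read_record("analyst", row))
print(read_record("auditor", row))
```

Because every read passes through `read_record`, the same rules apply whichever engine or tool asks for the data, which is the property a unified enforcement layer is meant to provide.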

“Security is a critical part of Hadoop, but for it to evolve the security needs to become universal across the platform. With RecordService, the Hadoop community fulfils the vision of unified fine-grained access controls for every Hadoop access path,” said Mike Olson, co-founder and chief strategy officer at Cloudera.