Category Archives: Big Data

Big Data Future Spurs Acquisitions

In October 2011, Oracle announced its acquisition of Endeca Technologies, an enterprise search and data management company providing unstructured data management, e-commerce and business intelligence technology.

In November 2011, IBM announced its acquisition of Platform Computing, an HPC software company with strong capabilities in cloud computing and big data.

In February 2012, Groupon acquired Adku, a startup that uses big data to personalize the online shopping experience for people visiting e-commerce sites like eBay and Amazon.

In March 2012, EMC announced its acquisition of Pivotal Labs, a private agile software developer and tool provider headquartered in San Francisco.

In the past two years, international IT giants, including IBM, Oracle, EMC and SAP, have been engaged in a wave of acquisitions in the big data market, spending more than $1.5 billion on related data management and analysis companies. Big data has become the hottest term in the IT and financial sectors since “cloud computing”.

The rise of big data results from the integrated development of new-generation information technologies, and the processing and analysis of big data in turn has become a key enabler of that integrated development.

The Internet of Things (IoT), mobile Internet, digital home and social network services are applications of new-generation information technology. Big data grows continuously alongside these applications, while cloud computing provides the storage and computing platform for this massive, diverse data. It is estimated that global data storage volume reached 1.8 ZB in 2011, will hit 2.7 ZB in 2012 and will exceed 8 ZB by 2015. Structured data is growing at around 32% per year, and unstructured data at around 63%.
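Those projections imply a compound annual growth rate of roughly 45% from 2011 to 2015, which sits between the quoted growth rates for structured and unstructured data. A minimal Python sketch of the arithmetic, using only the figures quoted above:

    # Back-of-the-envelope check on the storage figures quoted above.
    start_zb, end_zb = 1.8, 8.0      # global data volume in zettabytes, 2011 and 2015
    years = 2015 - 2011

    # Implied compound annual growth rate (CAGR)
    cagr = (end_zb / start_zb) ** (1 / years) - 1
    print(f"Implied CAGR: {cagr:.1%}")            # roughly 45% per year

    # Year-by-year projection at that smooth rate (2012 lands near the quoted 2.7 ZB)
    volume = start_zb
    for year in range(2011, 2016):
        print(f"{year}: {volume:.1f} ZB")
        volume *= 1 + cagr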

In the retail sector, big data analysis enables retailers to track market trends in real time and respond promptly. Walmart has started analyzing the massive sales data from all its chain stores in combination with weather, economic and demographic data, so as to select the right products for each store and determine the timing of discounts.
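As a toy illustration of the approach described above (combining sales data with weather data to guide stocking and discounts), the following pandas sketch uses entirely made-up columns and numbers; it is not Walmart's pipeline:

    # Toy sketch: joining store sales with weather data to inform stocking and
    # discount timing. All data and column names are made up for illustration.
    import pandas as pd

    sales = pd.DataFrame({
        "store_id":   [1, 1, 2, 2],
        "date":       pd.to_datetime(["2012-06-01", "2012-06-02"] * 2),
        "product":    ["umbrella"] * 4,
        "units_sold": [12, 40, 8, 35],
    })
    weather = pd.DataFrame({
        "store_id": [1, 1, 2, 2],
        "date":     pd.to_datetime(["2012-06-01", "2012-06-02"] * 2),
        "rain_mm":  [0.0, 18.5, 0.0, 22.0],
    })

    # Combine the two sources, then compare sales on rainy vs. dry days
    combined = sales.merge(weather, on=["store_id", "date"])
    print(combined.groupby(combined["rain_mm"] > 0)["units_sold"].mean())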

In the Internet sector, big data analysis helps companies develop more precise and effective marketing strategies. Facebook and eBay are analyzing massive social network and online transaction data with the aim of providing personalized advertising services.

In the public sector, big data has begun to play a significant role. Many European cities improve traffic conditions by analyzing real-time traffic flow data and guiding drivers to the best routes. The United Nations has also launched “Global Pulse”, a program that aims to use big data to accelerate global development.

The enormous commercial value of and market demand for big data are driving transformation of the information industry. New big data-oriented products, technologies, services and models are constantly emerging.

On one hand, challenges such as effective storage, fast read/write and real-time analysis will have significant impacts on the chip and storage industries, and will incubate markets for integrated data storage-and-processing servers and in-memory computing.

On the other hand, the enormous value of big data will create urgent demand for fast data processing and analysis, and will give rise to unprecedented growth in the data exploration and business intelligence markets.


The Self-Driving Car Company

Alexis Madrigal offers an inside look at what Google is doing with Maps and all those Streetview photos they’ve amassed. It’s jaw-dropping in its scope and audacity — and in its implications for the future:

“…as my friend and sci-fi novelist Robin Sloan put it to me, ‘I maintain that this is Google’s core asset. In 50 years, Google will be the self-driving car company (powered by this deep map of the world) and, oh, P.S. they still have a search engine somewhere.’”

Read the article.


Google’s Dremel is the Holy Grail of Big Data: Really Big, Really Fast, Really Simple

First Google created, and published papers on, MapReduce and the Google File System, which were reverse-engineered into Hadoop, the current state of the art for Big Data.

But Google has moved on to Dremel, and the rest of the world is slow to catch up.

With BigQuery, Google offers a simple-to-use service that doesn’t sacrifice Big Data scale or speed.
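As an illustration of that simplicity, here is a minimal sketch of an ad hoc aggregation submitted through the google-cloud-bigquery Python client, a library that postdates this post; the project, dataset, table and column names are hypothetical:

    # Minimal sketch: an ad hoc aggregation over a large table in BigQuery.
    # Project, dataset, table and column names are hypothetical; running this
    # requires Google Cloud credentials and the google-cloud-bigquery package.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    sql = """
        SELECT country, COUNT(*) AS events
        FROM `my-project.analytics.page_views`
        WHERE event_date >= '2012-01-01'
        GROUP BY country
        ORDER BY events DESC
        LIMIT 10
    """

    # The scan and aggregation happen server-side over the full table;
    # only the small result set is returned to the client.
    for row in client.query(sql).result():
        print(row.country, row.events)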

As Armando Fox, a professor of computer science at the University of California, Berkeley, who specializes in these sorts of data-center-sized software platforms, put it in a Wired article:

“This is unprecedented,” Fox says. Hadoop is the centerpiece of the “Big Data” movement, a widespread effort to build tools that can analyze extremely large amounts of information. But with today’s Big Data tools, there’s often a drawback: you can’t quite analyze the data with the speed and precision you expect from traditional data analysis or “business intelligence” tools. With Dremel, Fox says, you can.

“They managed to combine large-scale analytics with the ability to really drill down into the data, and they’ve done it in a way that I wouldn’t have thought was possible,” he says. “The size of the data and the speed with which you can comfortably explore the data is really impressive. People have done Big Data systems before, but before Dremel, no one had really done a system that was that big and that fast.

“Usually, you have to do one or the other. The more you do one, the more you have to give up on the other. But with Dremel, they did both.”


DataIQ Aims for Big Data Visualization Simplicity at Low Cost

StoredIQ today announced DataIQ, designed to give organizations a simple, low-cost visualization solution to understand their unstructured data — without first moving it to a repository — and to help answer the data intelligence questions that challenge many of today’s IT organizations.

“Every company has more data than it can manage. But few can precisely locate it, assess the value of it, or make much sense out of it when they need to,” said Ted Friedman, vice president and distinguished analyst with Gartner. “Information management projects can be very intimidating. To set a course for success, organizations need to build insight regarding the whereabouts, meaning, and usage of their data.”

Designed to run on the StoredIQ Platform, DataIQ scales from terabytes to petabytes, from single corporate offices to global enterprises, and provides a single, holistic view across a multitude of enterprise data sources and hundreds of file types — without moving any data from its native location. With DataIQ, organizations have the power to make informed decisions before starting any information management initiative, including data migration, storage optimization, records management, eDiscovery, data cleanup, and information governance. DataIQ gives companies the ability to:

  • Identify – interesting subsets of information without moving
    any data across the corporate network
  • Analyze – data using advanced visualizations to spot compliance
    violations, get out in front of the eDiscovery process, make
    infrastructure planning decisions, jump start records initiatives, etc.
  • Act – copy, collect, and move data that requires further
    processing or retention; defensibly delete data that provides negative
    value to the company
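As a rough sketch of the “analyze in place” idea, the following Python walks a file share and builds a small metadata profile without copying any content. It is illustrative only and not StoredIQ’s implementation; the mount path is hypothetical:

    # Toy illustration of in-place data profiling: walk a directory tree and
    # collect file metadata (type, size, age) without moving or copying content.
    # Not StoredIQ's implementation; the mount path is hypothetical.
    import os
    import time
    from collections import Counter

    def profile_share(root):
        size_by_extension = Counter()
        total_bytes = 0
        stale_files = 0                            # untouched for over a year
        cutoff = time.time() - 365 * 24 * 3600
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    info = os.stat(path)
                except OSError:
                    continue                       # unreadable file; skip it
                ext = os.path.splitext(name)[1].lower() or "<none>"
                size_by_extension[ext] += info.st_size
                total_bytes += info.st_size
                if info.st_mtime < cutoff:
                    stale_files += 1
        return size_by_extension, total_bytes, stale_files

    exts, total, stale = profile_share("/mnt/fileshare")
    print(f"{total / 1e9:.1f} GB total, {stale} stale files")
    for ext, size in exts.most_common(5):
        print(f"{ext}: {size / 1e9:.1f} GB")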

“As we worked with customers on information management initiatives from eDiscovery, information governance, storage, and records retention, one common theme bubbled up over and over…customers needed a simple tool to give them a comprehensive understanding of their unstructured data,” said Phil Myers, CEO of StoredIQ. “DataIQ was designed as a quick start data intelligence application that empowers customers with knowledge about their data to better plan and prepare for any information management project.”


NextBio, Intel Collaborate to Optimize Hadoop for Genomics Big Data

NextBio and Intel announced today a collaboration aimed at optimizing and stabilizing the Hadoop stack and advancing the use of Big Data technologies in genomics. As a part of this collaboration, the NextBio and Intel engineering teams will apply experience they have gained from NextBio’s use of Big Data technologies to the improvement of HDFS, Hadoop, and HBase. Any enhancements that NextBio engineers make to the Hadoop stack will be contributed to the open-source community. Intel will also showcase NextBio’s use of Big Data.

“NextBio is positioned at the intersection of Genomics and Big Data. Every day we deal with the three V’s (volume, variety, and velocity) associated with Big Data – We, our collaborators, and our users are adding large volumes of a variety of molecular data to NextBio at an increasing velocity,” said Dr. Satnam Alag, chief technology officer and vice president of engineering at NextBio. “Without the implementation of our algorithms in the MapReduce framework, operational expertise in HDFS, Hadoop, and HBase, and investments in building our secure cloud-based infrastructure, it would have been impossible for us to scale cost-effectively to handle this large-scale data.”

“Intel is firmly committed to the wide adoption and use of Big Data technologies such as HDFS, Hadoop, and HBase across all industries that need to analyze large amounts of data,” said Girish Juneja, CTO and General Manager, Big Data Software and Services, Intel. “Complex data requiring compute-intensive analysis needs not only Big Data open source, but a combination of hardware and software management optimizations to help deliver needed scale with a high return on investment. Intel is working closely with NextBio to deliver this showcase reference to the Big Data community and life science industry.”

“The use of Big Data technologies at NextBio enables researchers and clinicians to mine billions of data points in real-time to discover new biomarkers, clinically assess targets and drug profiles, optimally design clinical trials, and interpret patient molecular data,” Dr. Alag continued. “NextBio has invested significantly in the use of Big Data technologies to handle the tsunami of genomic data being generated and its expected exponential growth. As we further scale our infrastructure to handle this growing data resource, we are excited to work with Intel to make the Hadoop stack better and give back to the open-source community.”
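For context on the MapReduce framework Dr. Alag mentions, here is a toy MapReduce-style sketch in the spirit of Hadoop Streaming that counts occurrences of each gene variant across samples. It is illustrative only, not NextBio’s code, and the input format is assumed:

    # Toy MapReduce-style count of gene variants, in the spirit of Hadoop
    # Streaming. Not NextBio's code; input lines are assumed to look like:
    #   sample_id <TAB> gene <TAB> variant
    import sys
    from itertools import groupby

    def mapper(lines):
        """Emit ("gene:variant", 1) for every observed variant."""
        for line in lines:
            sample_id, gene, variant = line.rstrip("\n").split("\t")
            yield f"{gene}:{variant}", 1

    def reducer(pairs):
        """Sum counts per key; on a cluster, Hadoop delivers keys pre-sorted."""
        for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
            yield key, sum(count for _, count in group)

    if __name__ == "__main__":
        # Single-process simulation; the cluster version scales because the
        # shuffle/sort between mapper and reducer is distributed by Hadoop.
        for key, total in reducer(mapper(sys.stdin)):
            print(f"{key}\t{total}")

Run locally as "cat variants.tsv | python count_variants.py"; on a real cluster the same mapper and reducer run on many nodes in parallel.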


Actuate and Pervasive Software Team to Provide Interactive Visualization of Big Data Analytics

Actuate Corporation today announced an alliance with Pervasive Software Inc. that will enable business data analysts to rapidly review, prepare and analyze big data, and to display intuitive data visualizations that support efficient business decision-making.

By speeding Big Data-based decision making, powering predictive analytics and decreasing capital and operating costs, ActuateOne and Pervasive RushAnalyzer “will make Big Data analytics and powerful visualizations available to business users in any industry and to the BIRT developer community.”

“Pervasive RushAnalyzer, the first predictive analytics product to run natively on Hadoop, enables users to rapidly transform and analyze terabytes of data on commodity hardware, and ActuateOne provides the advanced visualization capabilities to support insights and more productive conclusions,” said Mike Hoskins, CTO and general manager of Pervasive, Big Data Products and Solutions. “Pervasive’s seamless integration with Actuate, via BIRT, puts advanced Big Data analytic insights and actionable intelligence into the hands of multiple roles within an organization.”

“Big Data analytics has traditionally been the realm of data scientists,” said Nobby Akiha, Senior Vice President of Marketing for Actuate. “By teaming with Pervasive, we are changing the game to ensure business users are in the driver’s seat to analyze Big Data sources so that they can operationalize and deliver insights to everyday users.”

ActuateOne – an integrated suite of standard and cloud software built around BIRT – enables easy visualization of data trends through customizable BIRT-based dashboards and Google-standard plug-and-play gadgets. Pervasive RushAnalyzer lets data analysts build and deploy predictive analytics solutions on multiple platforms, including Hadoop clusters and high-performance servers, to rapidly discover data patterns, build operational analytics and deliver predictive analytics. The drag-and-drop graphical interface speeds data preparation with direct access to multiple databases and file formats, as well as a prebuilt library of data mining and analytic operators, leading to simpler data manipulation, mining and visualization.


Qubole Exits Stealth Mode, Introduces Auto-Scaling Big Data Platform

Qubole exited stealth mode today to introduce its auto-scaling Big Data platform, “combining the power of Apache Hadoop and Hive with the simplicity of a Cloud platform in order to accelerate time-to-value from Big Data.” Qubole, a Silver Sponsor of next week’s Hadoop Summit 2012 conference, also invites business analysts, data scientists, and data engineers to participate in the Qubole early access program.

While best known as the creators of Apache Hive and long-time contributors to Apache Hadoop, Qubole’s founders Ashish Thusoo and Joydeep Sen Sarma also managed the Facebook data infrastructure team that was responsible for nearly 25PB of compressed data. The data services built by this team are used across business and engineering teams that submit tens of thousands of jobs, queries and ad hoc analysis requests every day. Thusoo and Sen Sarma applied that experience to create the industry’s next-generation big data platform for the cloud. With Qubole, organizations can begin uncovering new insights from their structured and unstructured data sources within minutes.

“We believe a new approach is needed – one that hides the complexity associated with storing and managing data and instead provides a fast, easy path to analysis and insights for business analysts, data scientists and data engineers,” said Joydeep Sen Sarma, Co-Founder of Qubole. “We gained significant experience helping a web-scale company build and manage a complex Big Data platform. We don’t want our customers to worry about choosing a flavor of Hadoop, or spinning up clusters, or trying to optimize performance. Qubole will manage all of that so that users can focus on their data and their algorithms.”

Benefits of the Qubole Auto-Scaling Big Data Platform for the Cloud include:

  • Fastest Path to Big Data Analytics –
    Qubole handles all infrastructure complexities behind the scenes so
    users can begin doing ad hoc analysis and creating data pipelines
    using SQL and MapReduce within minutes (a minimal query sketch
    follows this list).
  • Scalability “On the Fly” – Qubole
    features the industry’s first auto-scaling Hadoop clusters so users
    can get the right amount of computing power for each and every project.
  • Fast Query Authoring Tools – Qubole
    provides fast access to sample data so that queries can be authored
    and validated quickly.
  • Fastest Hadoop and Hive Service in the Cloud
    – Using advanced caching and query acceleration techniques, Qubole has
    demonstrated query speeds up to five times faster than other
    Cloud-based Hadoop solutions.
  • Quick Connection to Data – Qubole
    provides mechanisms to work with data sets stored in any format in
    Amazon S3. It also allows users to easily export data to S3 or to
    databases like MySQL.
  • Integrated Data Workflow Engine – Qubole
    provides mechanisms to easily create data pipelines so users can run
    their queries periodically with a high degree of reliability.
  • Enhanced Debugging Abilities – Qubole
    provides features that help users get to errors in Hadoop/Hive jobs
    fast, thus saving time in debugging queries.
  • Easy Collaboration with Peers – Qubole’s
    Cloud-based architecture makes it ideal for analysts working in a
    geographically distributed environment to share information and
    analysis.
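Here is the minimal query sketch referenced in the first bullet above: a generic ad hoc Hive query issued through the PyHive library. The host, credentials and table/column names are hypothetical, and this is not Qubole’s own API:

    # Minimal sketch of an ad hoc Hive query, e.g. over a table backed by files
    # in S3, issued through the generic PyHive client. Host, credentials and
    # table/column names are hypothetical; this is not Qubole's own API.
    from pyhive import hive

    conn = hive.connect(host="hive.example.com", port=10000, username="analyst")
    cursor = conn.cursor()

    cursor.execute("""
        SELECT event_date, COUNT(DISTINCT user_id) AS daily_users
        FROM web_events
        WHERE event_date BETWEEN '2012-05-01' AND '2012-05-31'
        GROUP BY event_date
        ORDER BY event_date
    """)

    for event_date, daily_users in cursor.fetchall():
        print(event_date, daily_users)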

“Companies are increasingly moving to the Cloud and for good reason. Applications hosted in the Cloud are much easier to use and manage, especially for companies without very large IT organizations. While Software as a Service (SaaS) is now the standard for many different types of applications, it has not yet been made easy for companies to use the Cloud to convert their ever-increasing volume and variety of data into useful business and product insights. Qubole makes it much easier and faster for companies to analyze and process more of their Big Data, and they will benefit tremendously,” said Ashish Thusoo, Co-Founder of Qubole.

To join the early access program, please visit www.qubole.com. Qubole is looking to add a select number of companies for early access to its service, with the intention of making the service more generally available in Q4 2012. People interested in seeing a demo of the platform can visit Qubole at the Hadoop Summit June 13 – 14 at the San Jose Convention Center, kiosk #B11.