Category Archives: BigData

New Updates For HP’s Big Data Platform Haven

HP has updated its big data platform Haven to include new analytics and predictive capabilities. The platform is geared toward enterprises with large volumes of data of many types, and the update expands the kinds of data that can be analyzed through a new connector framework. The update also adds a Knowledge Graphing feature along with improved speech recognition and language identification.


The Haven big data platform is made up of analytics, hardware and services, with some of this available on demand. HP launched the platform in 2013, with Haven serving as the umbrella for various technologies. The update brings together analytics for structured and unstructured data by combining the context-aware unstructured data analytics of HP IDOL with the SQL-based capabilities of HP Vertica.




Examples of data sources reachable through the new connector framework include Microsoft Exchange, SharePoint, Oracle and SAP enterprise applications, and cloud services such as Box, Salesforce and Google Drive.


The Knowledge Graphing feature mentioned above can analyze connections in data, enabling advanced, contextually aware research across assorted data sources. The enhanced speech and language capabilities of the update work with 20 languages. This part of Haven is powered by deep neural network technology trained on thousands of hours of audio.
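To make the idea of "analyzing connections in data" concrete, here is a minimal sketch: entities become nodes, relationships become edges, and a connection is a path between two entities. The entities and edges below are invented examples for illustration, not HP IDOL's data model.

```python
from collections import deque

# Toy knowledge graph: made-up entities linked by co-occurrence.
graph = {
    "Acme Corp": ["invoice-2041", "Jane Doe"],
    "invoice-2041": ["Acme Corp", "support-ticket-77"],
    "Jane Doe": ["Acme Corp", "support-ticket-77"],
    "support-ticket-77": ["invoice-2041", "Jane Doe"],
}

def find_connection(start, goal):
    """Breadth-first search for the shortest chain linking two entities."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection found

print(find_connection("Acme Corp", "support-ticket-77"))
```

A real knowledge-graphing service would build the graph automatically from document contents, but the traversal idea is the same.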


Other enhancements include a targeted query response feature and an IDOL search optimizer. Targeted query response helps customize and improve search results based on specific criteria, while the IDOL search optimizer is used to understand the types of searches users are running and to gauge the quality of the results.
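One common way to "customize results based on specific criteria" is to re-rank a result list by boosting documents that match a preference. The scoring scheme, field names and boost value below are assumptions for illustration, not IDOL's actual mechanism.

```python
# Hypothetical re-ranking sketch: boost results from a preferred source.
results = [
    {"title": "Quarterly report", "score": 1.0, "source": "sharepoint"},
    {"title": "Sales deck",       "score": 0.9, "source": "box"},
    {"title": "Old memo",         "score": 0.8, "source": "exchange"},
]

def rerank(results, preferred_source, boost=0.5):
    """Add a fixed boost to results from a preferred source, then re-sort."""
    return sorted(
        (dict(r, score=r["score"] + (boost if r["source"] == preferred_source else 0.0))
         for r in results),
        key=lambda r: r["score"],
        reverse=True,
    )

for r in rerank(results, "box"):
    print(r["title"], round(r["score"], 2))
```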


The goal of HP’s Haven platform is to let large companies benefit from big data computing across almost any data type without relying on specialized data scientists or costly, complex integration projects.

The post New Updates For HP’s Big Data Platform Haven appeared first on Cloud News Daily.

FOSE 2013: Cloud, Virtualization, Cybersecurity, Mobile Government, Big Data Featured

Cloud and Virtualization; Cybersecurity; Mobile Government; Big Data and Business Intelligence; and Project Management will be the featured tracks at FOSE 2013, each providing cutting-edge technology insights, policy updates, case studies and expert guidance to optimize the efficiency and effectiveness of government programs. FOSE 2013, the largest and most comprehensive event serving the government technology community, will take place May 14-16 at the Walter E. Washington Convention Center in Washington, D.C.

“Late last year we surveyed our government and industry attendees to gauge the topics that are of most interest,” said Mike Eason, Vice President, Public Sector Events, 1105 Media, Inc. “Not surprisingly, cloud, mobile, big data/analytics and cyber came in at the top. It’s our job to ensure we are offering the education that supports the government’s needs around these issues. We are once again structuring our program to highlight these key trends, and will be drawing on the expertise of agency executives that have real past performance in the five areas to serve as speakers.”

Each track provides an in-depth look into the given topic, including:

  • Cloud and Virtualization will feature best practices and insights on technology trends, case studies and leading practices on planning, implementation and benefits realization.
  • Cybersecurity will examine the business of cyber, including detecting complicated malware and adversaries – insider and outsider, determining what data left the organization, developing defensive and preemptive measures to keep attacks from happening and managing risk-based compliance.
  • Mobile Government will offer tools, strategies and insights into hot issues such as BYOD, security, APIs and mobilizing enterprise systems, as well as achieving the goals of the Digital Government Strategy.
  • Big Data and Business Intelligence will focus on how to extract meaning from bits and bytes to reach business objectives, featuring case studies from federal agencies that have found useful intelligence from data, examine toolkits being used and highlight the management and policy challenges that come up in the process.
  • Project Management, developed in conjunction with the Project Management Institute, will provide best practices and trade secrets of agile project management to help government professionals advance their careers.

For more information, including confirmed session topics, and to keep up to date on the full program agenda, visit the FOSE website. To see how FOSE addresses the technology road ahead, view the FOSE 2013 infographic.

What’s the BIG Deal? DATA On the Origin of the Term

NY Times BITS (Steve Lohr) today: An interesting “detective story” seeking the coiner of the phrase “Big Data”.

The unruly digital data of the Web is a big ingredient in what is now being called “Big Data.” And as it turns out, the term Big Data seems to be most accurately traced not to references in news or journal archives, but to digital artifacts now posted on technical Web sites, appropriately enough.

Big Data Future Spurs Acquisitions

In October 2011, Oracle announced its acquisition of Endeca Technologies, an enterprise search and data management company providing enterprises with unstructured data management, e-commerce and business intelligence technology.

In November 2011, IBM announced its acquisition of Platform Computing, an HPC software company with excellent performance in cloud computing and big data.

In February 2012, Groupon acquired Adku, a startup that uses big data to personalize online shopping experience for people visiting e-commerce sites like eBay and Amazon.

In March 2012, EMC announced its acquisition of Pivotal Labs, a private agile software developer and tool provider headquartered in San Francisco.

Over the past two years, international IT giants including IBM, Oracle, EMC and SAP have engaged in an acquisition spree in the big data market, spending more than $1.5 billion on related data management and analysis companies. Big data has become the hottest term in the IT and financial sectors since “cloud computing”.

The upsurge of big data results from the integrated development of the new generation of information technology, and the processing and analysis of big data have in turn become a key support for that development.

The Internet of Things (IoT), mobile Internet, digital home and social network services are applications of this new generation of information technology. Big data grows continuously along with these applications, while cloud computing provides the storage and computing platform for massive, diversified data. It is estimated that global data storage volume was 1.8ZB in 2011, will hit 2.7ZB in 2012 and will exceed 8ZB in 2015. The annual growth rate of structured data is around 32%, and that of unstructured data around 63%.
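A quick sanity check shows these figures are mutually consistent: a single blended growth rate of roughly 50% per year (an assumption for illustration, sitting between the quoted 32% structured and 63% unstructured rates) carries 1.8ZB in 2011 to about 2.7ZB in 2012 and past 8ZB by 2015.

```python
# Project storage volume forward from the 2011 estimate at an assumed
# blended annual growth rate, and compare with the quoted milestones.
volume_zb = 1.8  # 2011 estimate from the article
rate = 0.50      # assumed blended annual growth rate

for year in range(2012, 2016):
    volume_zb *= 1 + rate
    print(year, round(volume_zb, 2))
# 2012 lands at ~2.7ZB and 2015 exceeds 8ZB, matching the article.
```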

In the retail sector, analysis on big data enables retailers to master the real-time market trends and promptly take corresponding measures. Walmart has started analyzing the massive sales data of all its chain stores in combination with weather data, economics and demography, so as to select proper products for each chain store and determine the timing of discounts.

In the Internet sector, analysis on big data helps manufacturers develop more precise and effective marketing strategies. Facebook and eBay are analyzing and exploring massive data from social networks and online transaction data, with an aim of providing personalized advertising services.

In the utility sector, big data has begun to play a significant role. Many European cities guide drivers to the best routes by analyzing real-time traffic flow data, thereby improving traffic conditions. The United Nations has also launched “Global Pulse”, a program that aims to use big data to accelerate global economic development.

The enormous commercial value of and market demand for big data are driving transformation of the information industry. New big data-oriented products, technologies, services and models are constantly emerging.

On one hand, the challenges such as effective storage, fast read-write and real-time analysis will have significant impacts on the chip and storage industry as well as incubate the integrated data storage & processing server and memory computing markets.

On the other hand, the enormous value of big data will lead to urgent needs for fast data processing and analysis as well as give rise to the unprecedented prosperity of data exploration and business intelligence markets.

Google’s Dremel is the Holy Grail of Big Data: Really Big, Really Fast, Really Simple

First Google created, and published papers on, MapReduce and its distributed file system, which got reverse-engineered into Hadoop, the current state of the art for Big Data.
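The programming model those papers describe can be sketched in a few lines: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase folds each group. This single-process word-count sketch illustrates the model; real Hadoop distributes each phase across a cluster.

```python
from collections import defaultdict

def map_phase(doc):
    # Emit (word, 1) for every word in a document.
    for word in doc.split():
        yield word, 1

def shuffle(pairs):
    # Group emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Fold each group of values into a final count per key.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data is big", "data moves fast"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
print(reduce_phase(shuffle(pairs)))
```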

But Google has moved on to Dremel, and the rest of the world is slow in catching up.

With BigQuery, Google offers a simple-to-use service that doesn’t sacrifice Big Data scale OR speed.

As Armando Fox, a professor of computer science at the University of California, Berkeley, who specializes in these sorts of data-center-sized software platforms, put it in a Wired article:

“This is unprecedented,” Fox says. Hadoop is the centerpiece of the “Big Data” movement, a widespread effort to build tools that can analyze extremely large amounts of information. But with today’s Big Data tools, there’s often a drawback: you can’t quite analyze the data with the speed and precision you expect from traditional data analysis or “business intelligence” tools. With Dremel, Fox says, you can.

“They managed to combine large-scale analytics with the ability to really drill down into the data, and they’ve done it in a way that I wouldn’t have thought was possible,” he says. “The size of the data and the speed with which you can comfortably explore the data is really impressive. People have done Big Data systems before, but before Dremel, no one had really done a system that was that big and that fast.

“Usually, you have to do one or the other. The more you do one, the more you have to give up on the other. But with Dremel, they did both.”

Qubole Exits Stealth Mode, Introduces Auto-Scaling Big Data Platform


Qubole exited stealth mode today to introduce its auto-scaling Big Data platform, “combining the power of Apache Hadoop and Hive with the simplicity of a Cloud platform in order to accelerate time-to-value from Big Data.” Qubole, a Silver Sponsor of next week’s Hadoop Summit 2012 conference, also invites business analysts, data scientists, and data engineers to participate in the Qubole early access program.

While best known as the creators of Apache Hive and long-time contributors to Apache Hadoop, Qubole’s founders Ashish Thusoo and Joydeep Sen Sarma also managed the Facebook data infrastructure team responsible for nearly 25PB of compressed data. The data services built by this team are used across business and engineering teams that submit tens of thousands of jobs, queries and ad hoc analysis requests every day. Thusoo and Sen Sarma applied that experience to create the industry’s next-generation big data platform for the cloud. With Qubole, organizations can begin uncovering new insights from their structured and unstructured data sources within minutes.

“We believe a new approach is needed – one that hides the complexity associated with storing and managing data and instead provides a fast, easy path to analysis and insights for business analysts, data scientists and data engineers,” said Joydeep Sen Sarma, Co-Founder of Qubole. “We gained significant experience helping a web-scale company build and manage a complex Big Data platform. We don’t want our customers to worry about choosing a flavor of Hadoop, or spinning up clusters, or trying to optimize performance. Qubole will manage all of that so that users can focus on their data and their algorithms.”

Qubole Auto-Scaling Big Data Platform for the Cloud Benefits Include:

  • Fastest Path to Big Data Analytics – Qubole handles all infrastructure complexities behind the scenes so users can begin doing ad hoc analysis and creating data pipelines using SQL and MapReduce within minutes.
  • Scalability “On the Fly” – Qubole features the industry’s first auto-scaling Hadoop clusters so users can get the right amount of computing power for each and every project.
  • Fast Query Authoring Tools – Qubole provides fast access to sample data so that queries can be authored and validated quickly.
  • Fastest Hadoop and Hive Service in the Cloud – Using advanced caching and query acceleration techniques, Qubole has demonstrated query speeds up to five times faster than other Cloud-based Hadoop solutions.
  • Quick Connection to Data – Qubole provides mechanisms to work with data sets stored in any format in Amazon S3. It also allows users to easily export data to S3 or to databases like MySQL.
  • Integrated Data Workflow Engine – Qubole provides mechanisms to easily create data pipelines so users can run their queries periodically with a high degree of reliability.
  • Enhanced Debugging Abilities – Qubole provides features that help users get to errors in Hadoop/Hive jobs fast, saving time in debugging queries.
  • Easy Collaboration with Peers – Qubole’s Cloud-based architecture makes it ideal for analysts working in a geographically distributed environment to share information and collaborate.

“Companies are increasingly moving to the Cloud and for good reason. Applications hosted in the Cloud are much easier to use and manage, especially for companies without very large IT organizations. While Software as a Service (SaaS) is now the standard for many different types of applications, it has not yet been made easy for companies to use the Cloud to convert their ever-increasing volume and variety of data into useful business and product insights. Qubole makes it much easier and faster for companies to analyze and process more of their Big Data, and they will benefit tremendously,” said Ashish Thusoo, Co-Founder of Qubole.

To join the early access program, please visit the Qubole website. Qubole is looking to add a select number of companies for early access to its service, with the intention of making the service generally available in Q4 2012. People interested in seeing a demo of the platform can visit Qubole at the Hadoop Summit, June 13-14 at the San Jose Convention Center, kiosk #B11.

Lucid Imagination Combines Search, Analytics and Big Data to Tackle the Problem of Dark Data


Organizations today have little to no idea how much lost opportunity is hidden in the vast amounts of data they’ve collected and stored.  They have entered the age of total data overload driven by the sheer amount of unstructured information, also called “dark” data, which is contained in their stored audio files, text messages, e-mail repositories, log files, transaction applications, and various other content stores.  And this dark data is continuing to grow, far outpacing the ability of the organization to track, manage and make sense of it.

Lucid Imagination, a developer of search, discovery and analytics software based on Apache Lucene and Apache Solr technology, today unveiled LucidWorks Big Data. LucidWorks Big Data is the industry’s first fully integrated development stack that combines the power of multiple open source projects including Hadoop, Mahout, R and Lucene/Solr to provide search, machine learning, recommendation engines and analytics for structured and unstructured content in one complete solution available in the cloud.

With LucidWorks Big Data, Lucid Imagination equips technologists and business users with the ability to initially pilot Big Data projects utilizing technologies such as Apache Lucene/Solr, Mahout and Hadoop in a cloud sandbox. Once satisfied, the project can remain in the cloud, be moved on premise or run in a hybrid configuration. This means organizations can avoid the staggering overhead costs and long lead times associated with infrastructure and application development lifecycles prior to placing their Big Data solution into production.

The product is now available in beta. To sign up for inclusion in the beta program, visit the Lucid Imagination website.

How big is the problem of dark data? The total amount of digital data in the world will reach 2.7 zettabytes in 2012, a 48 percent increase from 2011.* Ninety percent of this data will be unstructured or “dark” data. Worldwide, 7.5 quintillion bytes of data, enough to fill over 100,000 Libraries of Congress, are generated every day. Yet that deep volume of data can serve to help predict the weather, uncover consumer buying patterns or even ease traffic problems – if discovered and analyzed proactively.

“We see a strong opportunity for search to play a key role in the future of data management and analytics,” said Matthew Aslett, research manager, data management and analytics, 451 Research. “Lucid’s Big Data offering, and its combination of large-scale data storage in Hadoop with Lucene/Solr-based indexing and machine-learning capabilities, provides a platform for developing new applications to tackle emerging data management challenges.”

Data analytics has traditionally been the domain of business intelligence technologies. Most of these tools, however, were designed to handle structured data in SQL databases and cannot easily tap into the broad range of data types used in a Big Data application. With the announcement of LucidWorks Big Data, organizations will be able to utilize a single platform for their Big Data search, discovery and analytics needs. LucidWorks Big Data is the only complete platform that:

  • Combines the real time, ad hoc data accessibility of LucidWorks (Lucene/Solr) with compute and storage capabilities of Hadoop
  • Delivers commonly used analytic capabilities along with Mahout’s proven, scalable machine learning algorithms for deeper insight into both content and users
  • Tackles data, both big and small with ease, seamlessly scaling while minimizing the impact of provisioning Hadoop, LucidWorks and other components
  • Supplies a single, coherent, secure and well documented REST API for both application integration and administration
  • Offers fault tolerance with data safety baked in
  • Provides choice and flexibility, via on premise, cloud hosted or hybrid deployment solutions
  • Is tested, integrated and fully supported by the world’s leading experts in open source search
  • Includes powerful tools for configuration, deployment, content acquisition, security, and search experience that is packaged in a convenient, well-organized application
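The "single, coherent REST API" in the list above implies that search, analytics and administration all go through uniform HTTP requests with JSON bodies. Here is a hedged sketch of what building such a request might look like; the endpoint URL and field names are invented for illustration, since the actual API is only described above as REST-based.

```python
import json

def build_search_request(query, fields, rows=10):
    """Assemble the pieces of a hypothetical REST search call:
    target URL, HTTP headers, and a JSON-encoded body."""
    payload = {"query": query, "fields": fields, "rows": rows}
    headers = {"Content-Type": "application/json"}
    url = "https://example.com/api/search"  # placeholder endpoint, not real
    return url, headers, json.dumps(payload)

url, headers, body = build_search_request("dark data", ["title", "body"])
print(body)
```

In practice the same request shape, sent with any HTTP client, would serve both application integration and administration, which is the design point the bullet is making.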

Lucid Imagination’s Open Search Platform uncovers real-time insights from any enterprise data, whether structured in databases, unstructured in formats such as emails or social channels, or semi-structured from sources such as websites.  The company’s rich portfolio of enterprise-grade solutions is based on the same proven open source Apache Lucene/Solr technology that powers many of the world’s largest e-commerce sites. Lucid Imagination’s on-premise and cloud platforms are quicker to deploy, cost less than competing products and are more easily tailored to specific needs than business intelligence solutions because they leverage innovation from the open source community.

“We’re allowing a broad set of enterprises to test and implement data discovery and analysis projects that have historically been the province of large multinationals with large data centers. Cloud computing and LucidWorks Big Data finally level the field,” said Paul Doscher, CEO of Lucid Imagination. “Large companies, meanwhile, can use our Big Data stack to reduce the time and cost associated with evaluating and ultimately implementing big data search, discovery and analysis. It’s their data – now they can actually benefit from it.”