Google reveals Bigtable, a NoSQL service based on what it uses internally

Google has punted another big data service, a variant of what it uses internally, into the wild

Search giant Google announced Bigtable, a fully managed NoSQL database service the company said combines its own internal database technology with open source Apache HBase APIs.

The company that helped give birth to MapReduce and its sister Hadoop is now making available the same non-relational database tech driving a number of its services including Google Search, Gmail, and Google Analytics.

Google said Cloud Bigtable is built on the same Bigtable technology that underpins its internal services, and is accessible through the open source Apache HBase API (which provides real-time read/write access).
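To make the "real-time read/write access" point concrete, here is a minimal in-memory sketch of the kind of data model that Bigtable and HBase expose: rows sorted by key, cells addressed by a "family:qualifier" column, and multiple timestamped versions per cell. It is a toy written for illustration, not the HBase client or the Cloud Bigtable API. The class name `TinyTable` and its method signatures are hypothetical, chosen only to echo the put, get, and scan operations that HBase-style stores offer.

```python
from bisect import bisect_left, insort
import time


class TinyTable:
    """In-memory toy of a Bigtable/HBase-style table (illustrative only)."""

    def __init__(self):
        self._rows = {}         # row key -> {column -> [(timestamp, value), ...]}
        self._sorted_keys = []  # row keys kept sorted so range scans are cheap

    def put(self, row_key, column, value, timestamp=None):
        """Write one cell version; column is written as 'family:qualifier'."""
        if timestamp is None:
            timestamp = time.time_ns()
        if row_key not in self._rows:
            self._rows[row_key] = {}
            insort(self._sorted_keys, row_key)
        versions = self._rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first

    def get(self, row_key, column):
        """Return the newest value stored in a cell, or None if it is absent."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_key, stop_key):
        """Yield (row_key, {column: newest value}) for start_key <= key < stop_key."""
        i = bisect_left(self._sorted_keys, start_key)
        while i < len(self._sorted_keys) and self._sorted_keys[i] < stop_key:
            key = self._sorted_keys[i]
            yield key, {col: vers[0][1] for col, vers in self._rows[key].items()}
            i += 1


if __name__ == "__main__":
    table = TinyTable()
    # Hypothetical row keys and column names, used purely as sample data.
    table.put("user#1001", "profile:email", "old@example.com", timestamp=1)
    table.put("user#1001", "profile:email", "new@example.com", timestamp=2)
    table.put("user#1002", "profile:email", "b@example.com", timestamp=1)
    print(table.get("user#1001", "profile:email"))
    print([key for key, _ in table.scan("user#1001", "user#1002")])
```

Keeping row keys sorted is the design choice worth noticing: it is what lets a store of this kind serve single-row lookups and contiguous range scans from the same structure. A real service adds persistence, distribution across machines, and access control, none of which this sketch attempts.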

“Google Cloud Bigtable excels at large ingestion, analytics, and data-heavy serving workloads. It’s ideal for enterprises and data-driven organizations that need to handle huge volumes of data, including businesses in the financial services, AdTech, energy, biomedical, and telecommunications industries,” explained Cory O’Connor, product manager at Google.

O’Connor said the service, which is now in beta, can deliver more than twice the performance of its direct competitors (a figure that will likely depend on the use case) at less than half their total cost of ownership (TCO).

“As businesses become increasingly data-centric, and with the coming age of the Internet of Things, enterprises and data-driven organizations must become adept at efficiently deriving insights from their data. In this environment, any time spent building and managing infrastructure rather than working on applications is a lost opportunity.”

Bigtable is Google’s latest move to bolster its data services, a central pillar of its strategy to attract new customers to its growing cloud platform. Last month the company announced the beta launch of Google Cloud Dataflow, a Java-based service that lets users build, deploy and run data processing pipelines for workloads such as ETL, analytics, real-time computation, and process orchestration, while abstracting away infrastructure concerns such as cluster management.

NextBio, Intel Collaborate to Optimize Hadoop for Genomics Big Data

NextBio and Intel announced today a collaboration aimed at optimizing and stabilizing the Hadoop stack and advancing the use of Big Data technologies in genomics. As a part of this collaboration, the NextBio and Intel engineering teams will apply experience they have gained from NextBio’s use of Big Data technologies to the improvement of HDFS, Hadoop, and HBase. Any enhancements that NextBio engineers make to the Hadoop stack will be contributed to the open-source community. Intel will also showcase NextBio’s use of Big Data.

“NextBio is positioned at the intersection of Genomics and Big Data. Every day we deal with the three V’s (volume, variety, and velocity) associated with Big Data: we, our collaborators, and our users are adding large volumes of a variety of molecular data to NextBio at an increasing velocity,” said Dr. Satnam Alag, chief technology officer and vice president of engineering at NextBio. “Without the implementation of our algorithms in the MapReduce framework, operational expertise in HDFS, Hadoop, and HBase, and investments in building our secure cloud-based infrastructure, it would have been impossible for us to scale cost-effectively to handle this large-scale data.”

“Intel is firmly committed to the wide adoption and use of Big Data technologies such as HDFS, Hadoop, and HBase across all industries that need to analyze large amounts of data,” said Girish Juneja, CTO and General Manager, Big Data Software and Services, Intel. “Complex data requiring compute-intensive analysis needs not only Big Data open source, but a combination of hardware and software management optimizations to help deliver needed scale with a high return on investment. Intel is working closely with NextBio to deliver this showcase reference to the Big Data community and life science industry.”

“The use of Big Data technologies at NextBio enables researchers and clinicians to mine billions of data points in real-time to discover new biomarkers, clinically assess targets and drug profiles, optimally design clinical trials, and interpret patient molecular data,” Dr. Alag continued. “NextBio has invested significantly in the use of Big Data technologies to handle the tsunami of genomic data being generated and its expected exponential growth. As we further scale our infrastructure to handle this growing data resource, we are excited to work with Intel to make the Hadoop stack better and give back to the open-source community.”