Real-Time Processing Solutions for Big Data Application Stacks – Integration of GigaSpaces XAP, Cassandra DB

Guest post by Yaron Parasol, Director of Product Management, GigaSpaces

GigaSpaces Technologies has developed infrastructure solutions for more than a decade and in recent years has been enabling Big Data solutions as well. The company’s latest platform release – XAP 9.5 – helps organizations that need to process Big Data fast. XAP harnesses the power of in-memory computing to enable enterprise applications to function better, whether in terms of speed, reliability, scalability or other business-critical requirements. With the new version of XAP, increased focus has been placed on real-time processing of big data streams, through improved data grid performance, better manageability and end-user visibility, and integration with other parts of your Big Data stack – in this version, integration with Cassandra.

XAP-Cassandra Integration

To build a real-time Big Data application, you need to consider several factors.

First – Can you process your Big Data in real time, in order to get instant, relevant business insights? Batch processing can take too long for transactional data. This doesn’t mean that you stop relying on batch processing; it remains useful in many ways.

Second – Can you preprocess and transform your data as it flows into the system, so that the relevant data is made digestible and routed to your batch processor, making batch more efficient as well? Finally, you also want to make sure the huge amounts of data you send to long-term storage are available for both batch processing and ad hoc querying, as needed.

XAP and Cassandra DB together cover all of the above. With built-in event processing capabilities, full data consistency, and high-speed in-memory data access and local caching, XAP handles the real-time aspect. Cassandra, in turn, is well suited to storing massive volumes of data, querying them ad hoc, and processing them offline.

Several hurdles had to be overcome to make the integration seamless and easy for end users. The first was the data model: XAP uses a document-oriented model while Cassandra uses a columnar data model, and data must be able to move between the two smoothly. The second was consistency: XAP offers immediate consistency together with performance, while Cassandra trades off between performance and consistency. With Cassandra as the Big Data store behind XAP processing, both consistency and performance are maintained.
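
To make the data model hurdle concrete, here is a minimal sketch of one way a nested document can be moved to a flat set of columns and back. It is not the XAP-Cassandra integration code: the `flatten` and `unflatten` helpers and the dotted column naming are assumptions made purely for illustration, and the sketch ignores edge cases such as keys that themselves contain dots or empty nested documents.

```python
def flatten(document, prefix=""):
    """Turn a nested dict into a single-level dict of column name -> value."""
    columns = {}
    for key, value in document.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested document: recurse and keep the parent path in the column name
            columns.update(flatten(value, name))
        else:
            columns[name] = value
    return columns


def unflatten(columns):
    """Rebuild the nested document from dotted column names."""
    document = {}
    for name, value in columns.items():
        node = document
        *parents, leaf = name.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return document


# Hypothetical document, as it might live in a document-oriented store
order = {"id": 42, "customer": {"name": "Ada", "address": {"city": "Haifa"}}}
row = flatten(order)        # flat, column-style representation
restored = unflatten(row)   # round trip back to the document form
```

The round trip matters here: whatever mapping is used, a document written out to the columnar store has to come back unchanged when it is loaded into the grid again.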

Together with the Cassandra integration, XAP offers further enhancements. These include:

Data Grid Enhancements

To further optimize your queries over the data grid, XAP now includes compound indices, which enable you to index multiple attributes together. This way the grid scans one index instead of multiple indices to get query result candidates faster.
On the query side, new projection support enables you to query only for the attributes you’re interested in instead of whole objects/documents. Both optimizations reduce latency and increase the throughput of the data grid in common scenarios.
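
The idea behind both features can be sketched with a toy in-memory store. This is a conceptual illustration only, not XAP’s API: the `TinyGrid` class, its methods, and the sample attributes are all hypothetical. The compound index is a single map keyed by a tuple of attribute values, and the projection simply copies the requested attributes out of each match.

```python
from collections import defaultdict


class TinyGrid:
    """Toy store with one compound index over a fixed tuple of attributes."""

    def __init__(self, indexed_attrs):
        self.indexed_attrs = tuple(indexed_attrs)
        self.objects = {}
        self.index = defaultdict(set)  # (attr values...) -> set of object ids

    def write(self, obj_id, obj):
        # Assumes obj_id is new; a real index would also drop the stale entry on update
        self.objects[obj_id] = obj
        key = tuple(obj[a] for a in self.indexed_attrs)
        self.index[key].add(obj_id)

    def read(self, projection=None, **criteria):
        # One lookup in the compound index instead of intersecting one index per attribute
        key = tuple(criteria[a] for a in self.indexed_attrs)
        results = []
        for obj_id in sorted(self.index.get(key, ())):
            obj = self.objects[obj_id]
            if projection is not None:
                # Projection: return only the requested attributes
                obj = {a: obj[a] for a in projection}
            results.append(obj)
        return results


grid = TinyGrid(["symbol", "venue"])
grid.write(1, {"symbol": "ACME", "venue": "NYSE", "price": 10.5, "notes": "opening"})
grid.write(2, {"symbol": "ACME", "venue": "LSE", "price": 10.7, "notes": "midday"})
grid.write(3, {"symbol": "ACME", "venue": "NYSE", "price": 10.6, "notes": "closing"})

prices = grid.read(projection=["price"], symbol="ACME", venue="NYSE")
```

In this sketch the query touches one index entry and returns two small dictionaries holding only `price`, rather than scanning a `symbol` index and a `venue` index separately and shipping back whole objects.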

The enhanced change API includes the ability to change multiple objects using a SQL query or POJO template. Replication of change operations over the WAN has also been streamlined, and it now replicates only the change commands instead of whole objects. Finally, a hook in the Space Data Persister interface enables you to optimize your DB SQL statements or ORM configuration for partial updates.
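
As a rough illustration of why replicating change commands is cheaper than replicating whole objects, consider the sketch below. It does not use the XAP change API: the command format, `apply_change`, and `change_matching` are assumptions invented for this example, with a plain dictionary standing in for a POJO template.

```python
import json


def apply_change(obj, change):
    """Apply a small change command to an object in place."""
    for field, delta in change.get("increment", {}).items():
        obj[field] = obj.get(field, 0) + delta
    for field, value in change.get("set", {}).items():
        obj[field] = value
    return obj


def change_matching(objects, template, change):
    """Apply one change to every object whose fields match the template."""
    changed = 0
    for obj in objects:
        if all(obj.get(k) == v for k, v in template.items()):
            apply_change(obj, change)
            changed += 1
    return changed


# A large object held on the primary site and mirrored on a remote replica
primary = {"id": 7, "views": 100, "status": "new", "body": "x" * 10_000}
replica = dict(primary)

change = {"increment": {"views": 1}, "set": {"status": "seen"}}
apply_change(primary, change)

# Only the command crosses the wire; the replica applies it locally
wire_message = json.dumps(change)
apply_change(replica, json.loads(wire_message))

full_object_size = len(json.dumps(primary))
command_size = len(wire_message)

# One change applied to every object matching a template
items = [{"status": "new", "views": 0}, {"status": "old", "views": 5}]
updated = change_matching(items, {"status": "new"}, {"increment": {"views": 1}})
```

Here the serialized command is a few dozen bytes while the full object is over ten kilobytes, yet both sites end up with the same state.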

Visibility and Manageability Enhancements

A new web UI gives XAP users deep visibility into important aspects of the data grid, including event containers, client-side caches, and multi-site replication gateways.

Managing a low-latency, high-throughput, distributed application is always a challenge due to the number of moving parts. The enhanced UI helps users stay agile when managing their application.

The result is a powerful platform that offers the best of all worlds, while maintaining ease of use and simplicity.

Yaron Parasol is Director of Product Management for GigaSpaces, a provider of end-to-end scaling solutions for distributed, mission-critical application environments, and cloud enabling technologies.