Category Archives: Big Data

Amazon enhances AWS with new analytics tools

On the eve of its AWS re:Invent 2015 event, internet giant Amazon is positioning itself for a run at the business intelligence market.

Already announced is the Amazon Elasticsearch Service, a managed service designed to make it easier to deploy and operate Elasticsearch in the AWS cloud – more on which later.

In addition the WSJ is reporting the likely launch of a new analytics service, codenamed SpaceNeedle, which is set to augment AWS with business intelligence tools. The reported strategic aim of this new service is to both strengthen Amazon’s relationship with AWS customers and allow it to broaden its total available market.

Back to the Elasticsearch service: BCN spoke to Ian Massingham, UK Technical Evangelist at AWS, to find out the thinking behind it. “This service is intended for developers running applications that use Elasticsearch today, or developers that are considering incorporating Elasticsearch into future applications,” he said. “Elasticsearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and click stream analytics.”

Apparently Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, as well as search-as-you-type and did-you-mean suggestions, while The Guardian uses Elasticsearch to combine visitor logs with social network data to provide real-time feedback to its editors about the public’s response to new articles.
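To get a feel for what developers actually do with it, here is a minimal, illustrative sketch of the kind of full-text search with highlighted snippets described above, sent to an Elasticsearch cluster’s REST API from Python; the endpoint URL, index and field names are placeholders rather than anything from the AWS announcement.

```python
# Minimal sketch: full-text search with highlighted snippets via the
# Elasticsearch REST API. The endpoint, index ("articles") and field
# ("body") are illustrative placeholders.
import requests

query = {
    "query": {"match": {"body": "bank governor"}},
    "highlight": {"fields": {"body": {}}},   # ask Elasticsearch for highlighted snippets
}

resp = requests.post(
    "https://search-example-domain.eu-west-1.es.amazonaws.com/articles/_search",
    json=query,
    timeout=10,
)

for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"].get("title"))
    print(hit.get("highlight", {}).get("body", []))   # snippet fragments with <em> markers
```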

Expect more AWS news as the re:Invent event gets underway. Already Avere Systems has unveiled Avere CloudFusion, a file storage application for AWS that aims to provide a cloud file system leveraging Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS) with the cost efficiencies of Amazon Simple Storage Service (S3), all with the simplicity of network-attached storage.

Semantic technology: is it the next big thing or just another buzzword?

Most buzzwords circulating right now describe very attention-grabbing products: virtual reality headsets, smart watches, internet-connected toasters. Big Data is the prime example of this: many firms are marketing themselves to be associated with this term and its technologies while it’s ‘of the moment’, but are they really innovating or simply adding some marketing hype to their existing technology? Just how ‘big’ is their Big Data?

On the surface of it, one would expect semantic technology to face similar problems; however, the underlying technology requires a much more subtle approach. The technology is at its best when it’s transparent, built into a set of tools to analyse, categorise and retrieve content and data before it’s even displayed to the end user. While this means it may not experience as much short-term media buzz, it is profoundly changing the way we use the internet and interact with content and data.

This is much bigger than Big Data. But what is semantic technology? Broadly speaking, semantic technologies encode meaning into content and data to enable a computer system to possess human-like understanding and reasoning. There are a number of different approaches to semantic technology, but for the purposes of this article we’ll focus on ‘Linked Data’. In general terms this means creating links between data points within documents and other forms of data containers, rather than between the documents themselves. It is in many ways similar to what Tim Berners-Lee did in creating the standards by which we link documents, just on a more granular scale.

Existing text analysis techniques can identify entities within documents. For example, in the sentence “Haruhiko Kuroda, governor of Bank of Japan, announced 0.1 percent growth,” ‘Haruhiko Kuroda’ and ‘Bank of Japan’ are both entities, and they are ‘tagged’ as such using specialised markup language. These tags are simply a way of highlighting that the text has some significance; it remains with the human user to understand what the tags mean.

 

[Figure 1: tagging]

Once tagged, entities can then be recognised and have information from various sources associated with them. Groundbreaking? Not really. It’s easy to tag content such that the system knows that “Haruhiko Kuroda” is a type of ‘person’; however, this still requires human input.

[Figure 2: named entity recognition]
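As an illustration of that tagging step (using the open-source spaCy library rather than Ontotext’s own tooling), a minimal named-entity recognition sketch might look like the following; it assumes the small English model en_core_web_sm has been downloaded.

```python
# Minimal named-entity recognition sketch with spaCy (illustrative only).
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Haruhiko Kuroda, governor of the Bank of Japan, announced 0.1 percent growth.")

for ent in doc.ents:
    # e.g. "Haruhiko Kuroda" -> PERSON, "the Bank of Japan" -> ORG
    print(ent.text, ent.label_)
```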

Where semantics gets more interesting is in the representation and analysis of the relationships between these entities. Using the same example, the system is able to create a formal, machine-readable relationship between Haruhiko Kuroda, his role as the governor, and the Bank of Japan.

[Figure 3: relation extraction]

For this to happen, the pre-existing environment must be defined. For the system to understand that ‘governor’ is a ‘job’ which exists within the entity of ‘Bank of Japan’, a rule must exist which states this as an abstraction. This set of rules is called an ontology.

Think of an ontology as the rule-book: it describes the world in which the source material exists. If semantic technology was used in the context of pharmaceuticals, the ontology would be full of information about classifications of diseases, disorders, body systems and their relationships to each other. If the same technology was used in the context of the football World Cup, the ontology would contain information about footballers, managers, teams and the relationships between those entities.
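To make the idea concrete, here is a small, illustrative sketch of the Kuroda example expressed as Linked Data triples with the Python rdflib library. The namespace, class and property names are invented for the example; they are not Ontotext’s actual data model.

```python
# Illustrative Linked Data sketch with rdflib: a tiny ontology (the "rule-book")
# plus instance data extracted from the example sentence.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

# Ontology: governorOf is a property linking a Person to a Bank
g.add((EX.governorOf, RDF.type, RDF.Property))
g.add((EX.governorOf, RDFS.domain, EX.Person))
g.add((EX.governorOf, RDFS.range, EX.Bank))

# Instance data: the entities and relationships from the sentence
g.add((EX.HaruhikoKuroda, RDF.type, EX.Person))
g.add((EX.BankOfJapan, RDF.type, EX.Bank))
g.add((EX.HaruhikoKuroda, EX.governorOf, EX.BankOfJapan))
g.add((EX.BankOfJapan, EX.locatedIn, EX.Tokyo))
g.add((EX.Tokyo, EX.locatedIn, EX.Japan))

print(g.serialize(format="turtle"))   # human-readable Turtle representation
```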

What happens when we put this all together? We can begin to infer relationships between entities in a system that have not been directly linked by human action.

[Figure 4: inference]

An example: a visitor arrives on the website of a newspaper and would like information about bank governors in Asia. Semantic technology allows the website to return a much more sophisticated set of results from the initial search query. Because the system has an understanding of the relationships defining bank governors generally (via the ontology), it is able to leverage the entire database of published text content in a more sophisticated way, capturing relationships that would otherwise have been overlooked. The result is that the user is provided with content more closely aligned to what they are already reading.

Read the sentence again and answer the question: “What is a ‘Haruhiko Kuroda’?” To a human the answer is obvious. He is several things: human, male, and governor of the Bank of Japan. It is this type of analytical thought process, this ability to assign traits to entities and then use those traits to infer relationships between new entities, that has so far eluded computer systems. The technology allows the inference of relationships that are not specifically stated within the source material: because the system knows that Haruhiko Kuroda is governor of the Bank of Japan, it is able to infer that he works with other employees of the Bank of Japan, that he lives in Tokyo, which is in Japan, which is a set of islands in the Pacific.
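A self-contained sketch of that inference step, reusing the invented example triples from above: an rdflib SPARQL query with a property path surfaces the connection to Japan even though no single triple states it.

```python
# Illustrative inference sketch: SPARQL property paths over a tiny rdflib graph.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.HaruhikoKuroda, EX.governorOf, EX.BankOfJapan))
g.add((EX.BankOfJapan, EX.locatedIn, EX.Tokyo))
g.add((EX.Tokyo, EX.locatedIn, EX.Japan))

results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?place WHERE {
        ex:HaruhikoKuroda ex:governorOf ?org .
        ?org ex:locatedIn+ ?place .   # '+' follows the locatedIn chain transitively
    }
""")

for row in results:
    print(row.place)   # http://example.org/Tokyo, then http://example.org/Japan
```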

Companies such as the BBC, which Ontotext has worked with, are sitting on more text data than they have ever had before. This is hardly unique to the publishing industry, either. According to Eric Schmidt, former Google CEO and executive chairman of Alphabet, every two days we create as much information as was generated from the dawn of civilisation up until 2003 – and he said that in 2010. Five years later, businesses of all sizes are waking up to this fact: they must invest in the infrastructure to fully take advantage of their own data.

You may not be aware of it, but you are already using semantic technology every day. Take Google search as an example: when you input a search term such as ‘Bulgaria’, two columns appear. On the left are the actual search results, and on the right are semantic search results: the country’s flag, capital, currency and other details pulled from various sources based on semantic inference.

Written by Jarred McGinnis, UK managing consultant at Ontotext

Salesforce boosts its Analytics Cloud intelligence tool

Salesforce has added new options for users of its Analytics Cloud intelligence tool. The new ‘Wave Actions’ flash up crucial information on dashboards so that salespeople can act on it more quickly and incisively.

The new features allow companies to create customised Wave Actions, such as creating cases, updating accounts or assigning tasks. Since Wave is natively integrated with App Cloud, the Wave Actions are automatically pushed from Wave into the corresponding Salesforce record. The system instantly identifies the type of problem that sales managers need to know about as soon as possible, according to Salesforce. When an account suffers particularly bad customer attrition, for example, a sales manager will be alerted to the pattern sooner. This is achieved by customising the Wave Analytics App to flag patterns in sales figures (such as defecting customers), enabling managers to take action more rapidly.

A new Wave Visualizations feature aims to create a consistent user experience and create a more intuitive process. Salesforce has also revamped the Analytics Cloud’s user interface in a bid to encourage users to become more adventurous in their creation of reports and dashboards. This, according to the cloud software vendor, will bring Analytics Cloud in line with the Lightning Experience design that was rolled out first for Salesforce’s Sales Cloud.

New information has also been unveiled about the use of the Analytics Cloud within other vendors’ software portfolios. According to Salesforce there are 81 companies in the Analytics Cloud’s partner ecosystem, with 13 software companies scheduled to unveil new apps based on Analytics Cloud, including Apttus, FinancialForce, SteelBrick and Vlocity.

In its most recent earnings statement, Salesforce revealed that subscription and support revenues from Analytics Cloud were ‘not significant’ for the three and six months ending on July 31, 2015.

The addition to Analytics Cloud comes exactly one year after it was first launched. According to Salesforce, the upgrade gives Analytics Cloud a wider, more active remit than its existing role as a standalone business intelligence application.

SAP announces improvements to cloud platform and Vora analytics software

SAP has released new software that it claims will make analytics easier for users of open source Hadoop software.

SAP HANA Vora is a new in-memory query engine that improves the performance of the Apache Spark execution framework. As a result, anyone running analysis on data held in Hadoop should get better interaction with that data, and companies will benefit from more useful intelligence.
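SAP’s announcement doesn’t include Vora code samples, but as a rough, generic illustration of the layer Vora plugs into, here is a PySpark sketch of interactive SQL analysis over data held in Hadoop; the HDFS path, table and column names are placeholders, and this uses plain Spark SQL rather than Vora’s own connector.

```python
# Generic PySpark sketch (not SAP HANA Vora itself): SQL-style analysis of
# Hadoop-resident data through Spark. Paths and column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hadoop-query-sketch").getOrCreate()

# Register a table backed by Parquet files sitting in HDFS
transactions = spark.read.parquet("hdfs:///data/transactions")
transactions.createOrReplaceTempView("transactions")

# An interactive aggregate query over the Hadoop data
summary = spark.sql("""
    SELECT account_id, COUNT(*) AS large_txn_count
    FROM transactions
    WHERE amount > 10000
    GROUP BY account_id
    ORDER BY large_txn_count DESC
""")
summary.show()
```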

SAP claims this new software will overcome the general ‘lack of business process awareness’ that exists in companies across enterprise apps, analytics, big data and Internet of Things (IoT) sources. The software will make it easier for data scientists and developers to get access to the right information by simplifying the access to corporate and Hadoop data.

SAP HANA Vora will bring most benefit in industries where Big Data analytics in business process context is paramount. SAP identified financial services, telecommunications, healthcare and manufacturing as target markets. The savings created by the new software will come from a number of areas, it said. In the financial sector, the return on investment in the systems will come from mitigating risk and fraud by detecting new anomalies in financial transactions and customer history data.

Telecoms companies will benefit from optimising their bandwidth, SAP claims, as telcos use the software to analyse traffic patterns to avoid network bottlenecks and improve quality of service. Manufacturers will benefit from preventive maintenance and improved product recall processes as a result of SAP HANA Vora’s newly delivered powers of analysis of bills of material, service records and sensor data.

The use of Hadoop and SAP HANA to manage large unstructured data sets left room for improvement, according to user Aziz Safa, Intel IT Enterprise Applications and Application Strategy VP. “One of the key requirements is better analyses of big data,” said Safa, “but mining these large data sets for contextual information in Hadoop is a challenge.”

SAP HANA Vora will be released by the end of September, when a cloud-based developer edition will also be available. Here’s an SAP video on the matter.

 

Alibaba launches what it claims to be China’s first cloud AI platform

Aliyun has launched what it claims to be China’s first AI platform

Alibaba’s cloud computing division Aliyun has launched what it claims to be China’s first artificial intelligence cloud service.

The DT PAI platform has been built with a series of purpose-built algorithms and machine learning technologies designed to help users generate predictive intelligence insights. Aliyun said the service features “drag and drop” capabilities that let users easily connect different services and set parameters.

The company claims the platform is China’s first commercially available artificial intelligence platform.

“Our goal is to create a one-stop AI development, publishing and sharing platform through data calculations and data connections, all with the aim of using AI to drive innovation in all aspects of life,” said Xiao Wei, senior product expert, Aliyun.

“In the past, the field of artificial intelligence was only open to a very small number of qualified developers and required the use of specialised tools. Such an approach was prone to error and redundancy. However, DT PAI allows developers with little or no experience in the field to construct a data application from scratch in a much shorter period of time. What used to take days can be completed within minutes,” Wei added.

The platform is based on Aliyun’s recently updated big data cloud infrastructure and its Open Data Processing Service (ODPS).

Alibaba seems to be following IBM’s lead when it comes to AI. Big Blue has been using Bluemix as a drag-and-drop platform for Watson, IBM’s cognitive computing (AI) as a service, pitching it as a more accessible development and delivery platform for its big data services.

EY, Hortonworks ink big data deal

Ernst & Young and Hortonworks are partnering on Hadoop

Global consulting giant Ernst & Young and Hortonworks have inked a deal that will see the two companies partner on helping joint clients overcome their big data challenges.

The deal will see Hortonworks pair its big data platform with EY’s data and information management professional services. Specifically, the two companies plan to guide clients on how to leverage big data technologies like Hadoop to overcome key data storage and batch analysis challenges.

“Many leading organizations are drowning in data, yet they lack the ability to analyze and drive value from the vast amount of information at their disposal,” said Scott H. Schlesinger, principal, Ernst & Young LLP and EY Americas IT Advisory. “The alliance will enable EY and Hortonworks to assist organizations in driving value from their existing technology.”

Mitch Ferguson, vice president of business development at Hortonworks said: “This alliance will strengthen EY’s ability to implement enterprise-level big data software, including HDP, to turn data into an asset, further addressing the business and technology needs of organizations.”

Intel partners with OHSU in using cloud, big data to cure cancer

Intel is working with OHSU to develop a secure, federated cloud service for healthcare practitioners treating cancer

Intel is testing a cloud-based platform as a service, in conjunction with the Oregon Health & Science University (OHSU), that can help diagnose and treat individuals for cancer based on their genetic predispositions.

The organisations want to develop a cloud service that healthcare practitioners can use to soak up a range of data, including genetic information and data about a patient’s environment and lifestyle, and deliver tailored cancer treatment plans quickly to those in need.

“The Collaborative Cancer Cloud is a precision medicine analytics platform that allows institutions to securely share patient genomic, imaging and clinical data for potentially lifesaving discoveries. It will enable large amounts of data from sites all around the world to be analyzed in a distributed way, while preserving the privacy and security of that patient data at each site,” explained Eric Dishman, director of proactive health research at Intel.

“The end goal is to empower researchers and doctors to help patients receive a diagnosis based on their genome and potentially arm clinicians with the data needed for a targeted treatment plan. By 2020, we envision this happening in 24 hours — All in One Day. The focus is to help cancer centres worldwide—and eventually centers for other diseases—securely share their private clinical and research data with one another to generate larger datasets to benefit research and inform the specific treatment of their individual patients.”

Initially, Intel and the Knight Cancer Institute at Oregon Health & Science University (OHSU) will launch the Collaborative Cancer Cloud, but the organisations expect two more institutions will be on board by 2016.

From there, Intel said, the organisations hope to federate the cloud service with other healthcare service providers, and open it up for use to treat other diseases like Alzheimer’s.

“In the same timeframe, we also intend to deliver open source code contributions to ensure the broadest developer base possible is working on delivering interoperable solutions. Open sourcing this code will drive both interoperability across different clouds, and allow analytics across a broader set of data – resulting in better insights for personalized care,” Dishman said.

Basho, Cisco integrate Riak KV and Apache Mesos to strengthen IoT automation

Basho and Cisco have integrated Riak and Mesos

Cisco and Basho have successfully demoed the Riak key value store running on Apache Mesos, an open source technology that makes running diverse, complex distributed applications and workloads easier.

Basho helped create and commercialise the Riak NoSQL database and worked with Cisco to pair Mesos with Riak’s own automation and orchestration technology, which the companies said would help support next gen big data and internet of things (IoT) workloads.
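For context on what Riak KV itself provides, here is a minimal sketch of its key/value model using Basho’s official Python client (the riak package); the host, port, bucket and key names are placeholders.

```python
# Minimal Riak KV sketch using Basho's Python client. Connection details,
# bucket and key names are illustrative placeholders.
from riak import RiakClient

client = RiakClient(protocol="pbc", host="127.0.0.1", pb_port=8087)
bucket = client.bucket("sensor-readings")

# Store a JSON value under a key
obj = bucket.new("device-42", data={"temperature": 21.5, "ts": "2015-09-01T12:00:00Z"})
obj.store()

# Fetch it back by key
fetched = bucket.get("device-42")
print(fetched.data)
```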

“Enabling Riak KV with Mesos on Intercloud, we can seamlessly and efficiently manage the cloud resources required by a globally scalable NoSQL database, allowing us to provide the back-end for large-scale data processing, web, mobile and Internet-of-Things applications,” said Ken Owens, chief technology officer for Cisco Intercloud Services.

“We’re making it easier for customers to develop and deploy highly complex, distributed applications for big data and IoT. This integration will accelerate developers’ ability to create innovative new cloud services for the Intercloud.”

Apache Mesos provides resource scheduling for workloads spread across distributed – and critically, heterogeneous – environments, which is why it’s emerging as a fairly important tool for IoT developers.

So far Cisco and Basho have only integrated Basho’s commercial Riak offering, Riak KV, with Mesos, but Basho is developing an open source integration with Mesos that will also be commercialized around a supported enterprise offering.

“By adding the distributed scheduler from Mesos, we’re effectively taking the infrastructure component away from the equation,” Adam Wray, Basho’s chief executive officer told BCN. “Now you don’t have to worry about the availability of servers – you literally have an on-demand model with Mesos, so people can scale up and down based on the workloads for any number of datacentres.”

“This is what true integration of a distributed data tier with a distributed infrastructure tier looks like, being applied at an enterprise scale.”

Wray added that while the current deal with Cisco isn’t a reselling agreement, we can expect Basho to be talking about large OEM deals in the future, especially as IoT picks up.

Google, Microsoft punt big data integration services into GA

Big cloud incumbents are doubling down on data integration

Google and Microsoft have both announced the general release of Cloud Dataflow and Azure Data Factory, their respective cloud-based data integration services.

Google’s Cloud Dataflow is designed to integrate separate databases and data systems – both streaming and batch – in one programming model while giving apps full access to, and the ability to customise, that data; it is essentially a way to reduce operational overhead when doing big data analysis in the cloud.
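As a rough illustration of that model, here is a minimal batch pipeline sketch written with the Apache Beam Python SDK, which implements the same Dataflow programming model; the bucket paths are placeholders and the runner options needed to submit it to Cloud Dataflow are omitted.

```python
# Minimal Beam/Dataflow-style pipeline sketch: count requests per client IP
# from text logs. Paths are placeholders; add DataflowRunner pipeline options
# to actually run this on Google Cloud Dataflow.
import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | "Read"     >> beam.io.ReadFromText("gs://example-bucket/access-logs/*.log")
        | "NonEmpty" >> beam.Filter(lambda line: line.strip())       # drop blank lines
        | "Parse"    >> beam.Map(lambda line: line.split()[0])       # first field = client IP
        | "Count"    >> beam.combiners.Count.PerElement()            # -> (ip, hit_count)
        | "Write"    >> beam.io.WriteToText("gs://example-bucket/output/ip-counts")
    )
```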

Microsoft’s Azure Data Factory is a slightly different offering. It’s a data integration and automation service that regulates the data pipelines connecting a range of databases and data systems with applications. The pipelines can be scheduled to ingest, prep, transform, analyse and publish that data, with ADF automating and orchestrating the more complex transactions.

ADF is actually one of the core components of Microsoft’s Cortana analytics offering, and is deployed to automate the movement and transformation of data from disparate sources.

The maturation and commoditisation of data integration and automation is a positive sign for an industry that has for a very long while leaned heavily on expensive bespoke data integration. As more cloud incumbents bring their own integration offerings to the table it will be interesting to see how some of the bigger players in data integration and automation, like Informatica or Teradata, respond.

Edge Up Sports taps IBM Watson to give fantasy football a cognitive computing boost

Edge Up is using Watson to improve fantasy football decision-making

Fantasy football analytics provider Edge Up Sports is partnering with IBM to deploy a Watson-based service that helps users manage the performance of their teams.

Edge Up, which will launch alongside the upcoming NFL season, bills itself as a one-stop shop of insights for users to supplement their current fantasy platforms, providing analysis of additional information like NFL players’ Twitter activity and coach statistics.

The company has enlisted IBM’s Watson-as-a-Service to bring the platform’s cognitive capabilities to bear on some of the more nuanced elements of how a team performs, such as its emotional preparedness or how well players sustain hits on the field.

“Edge Up grabs vast amounts of available NFL data, and with the help of Watson, team general managers are able to make informed decisions and adjustments to their fantasy football roster picks,” said Edge Up Sports chief executive Illya Tabakh.

“By leveraging Watson technologies, we’re excited to be able to transform the way fantasy football is played, and provide a platform that is assisting team owners with the necessary analysis and insights that could increase their chances in winning their league.”

The companies said combining more data points and automating the analysis of how teams perform will help reduce the amount of time users need to spend on fantasy football decision-making.

“The purpose of opening up IBM Watson capabilities to our Ecosystem Partners via an open developer platform is to accelerate creativity and entrepreneurial spirit, and Edge Up Sports is a perfect example,” said Lauri Saft, vice president, IBM Watson.