Category Archives: Apache

Ververica expands advanced stream processing tech with ‘Powered By Ververica’ program

Ververica, the creator of Apache Flink, has launched its ‘Powered By Ververica’ program, aimed at expanding the reach of advanced stream processing technology by providing access to the cutting-edge VERA engine to leading partners. StreamNative, the cloud-native data streaming company founded by the original creators of Apache Pulsar, joins as the first partner in this…


mod_nss: Installation and configuration

On some distributions we may find that mod_ssl does not support TLS 1.2, but we can instead install mod_nss, which does. Let's see how to use mod_nss.

First we need to load the module and define some global settings, for example in /etc/httpd/conf.d/nss.conf:

LoadModule nss_module modules/libmodnss.so

AddType application/x-x509-ca-cert .crt
AddType application/x-pkcs7-crl    .crl

NSSPassPhraseDialog  builtin

NSSPassPhraseHelper /usr/libexec/nss_pcache

NSSSessionCacheSize 10000
NSSSessionCacheTimeout 100
NSSSession3CacheTimeout 86400


NSSRandomSeed startup builtin

NSSRenegotiation off

NSSRequireSafeNegotiation off

NSSCipherSuite +rsa_rc4_128_md5,+rsa_rc4_128_sha,+rsa_3des_sha,-rsa_des_sha,-rsa_rc4_40_md5,-rsa_rc2_40_md5,-rsa_null_md5,-rsa_null_sha,+fips_3des_sha,-fips_des_sha,-fortezza,-fortezza_rc4_128_sha,-fortezza_null,-rsa_des_56_sha,-rsa_rc4_56_sha,+rsa_aes_128_sha,+rsa_aes_256_sha

NSSProtocol TLSv1.0,TLSv1.1,TLSv1.2

Next, using certutil, we create the database that will hold the certificates:

echo "ejemplopassword" > /etc/httpd/alias/pwdfile.txt
echo "internal:ejemplopassword" > /etc/httpd/alias/pin.txt
certutil -N -d /etc/httpd/alias -f /etc/httpd/alias/pwdfile.txt

On CentOS, installing mod_nss generates a database with sample self-signed certificates in /etc/httpd/alias.
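
If we want to see what that pre-generated database contains before reusing or replacing it, we can list its certificates with certutil -L (the exact nicknames will vary by distribution):

# certutil -L -d /etc/httpd/alias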

Using certutil, we generate the private key and the CSR that we need in order to get the certificate signed, for example:

# certutil -R -s 'CN=systemadmin.es, O=systemadmin, OU=modnss, L=Barcelona, ST=Barcelona, C=RC' -o /etc/httpd/ssl/systemadmin.csr -a -g 2048 -d /etc/httpd/alias -f /etc/httpd/alias/pwdfile.txt

We can see the generated private key with certutil -K:

# certutil -K -d /etc/httpd/alias/
certutil: Checking token "NSS Certificate DB" in slot "NSS User Private Key and Certificate Services"
< 0> rsa      37d35426e3a54d45c360be5727cc0f93be4dbeb4   NSS Certificate DB:alpha
< 1> rsa      c2fb4ee7ebeedc5a8f0c0cb8d6d2d51581b9ef57   NSS Certificate DB:cacert
< 2> rsa      6c18f8803eb18ad6ad1930c3b4650eb3e8dc5b72   NSS Certificate DB:Server-Cert
< 3> rsa      67c0de3a88a738ffaaf3508d370b528b7976ab0e   NSS Certificate DB:sudosueu
< 4> rsa      7b7276980ef037e4b6b37652a95e16376ea95e29   SelfSignedSP
< 5> rsa      da7524dee9662362db91ff0b95e77c078e2c4ed5   (orphan)

Once the certificate authority returns the signed certificate, we must first import the intermediate certificate, if there is one. For example, to import the certificate at /etc/httpd/ssl/systemadmin_intermediate.crt under the GeoTrustGlobalCA nickname we would run:

# certutil -A -n 'geotrust' -t 'CT,,' -d /etc/httpd/alias -f /etc/httpd/alias/pwdfile.txt -a -i /etc/httpd/ssl/systemadmin_intermediate.crt

Finally, we import the signed certificate with the following command. Here we assume the certificate is at /etc/httpd/ssl/systemadmin_cert.crt and we want to import it under the systemadmin nickname:

# certutil -A -n 'systemadmin' -t 'P,,' -d /etc/httpd/alias -f /etc/httpd/alias/pwdfile.txt -a -i /etc/httpd/ssl/systemadmin_cert.crt

We can verify the chain with certutil -O (here run from inside /etc/httpd/alias, hence -d .):

# certutil -O -n systemadmin -d .
"GeoTrustGlobalCA" [CN=GeoTrust DV SSL CA - G3,OU=Domain Validated SSL,O=GeoTrust Inc.,C=US]

  "systemadmin" [CN=www.systemadmin.es]

If we list the private keys again, we will see that the key is no longer orphaned:

# certutil -K -d .
certutil: Checking token "NSS Certificate DB" in slot "NSS User Private Key and Certificate Services"
< 0> rsa      37d35426e3a54d45c360be5727cc0f93be4dbeb4   NSS Certificate DB:alpha
< 1> rsa      c2fb4ee7ebeedc5a8f0c0cb8d6d2d51581b9ef57   NSS Certificate DB:cacert
< 2> rsa      6c18f8803eb18ad6ad1930c3b4650eb3e8dc5b72   NSS Certificate DB:Server-Cert
< 3> rsa      67c0de3a88a738ffaaf3508d370b528b7976ab0e   NSS Certificate DB:sudosueu
< 4> rsa      7b7276980ef037e4b6b37652a95e16376ea95e29   SelfSignedSP
< 5> rsa      da7524dee9662362db91ff0b95e77c078e2c4ed5   systemadmin

To enable the SSL virtual host with mod_nss, we need to add the following directives:

<VirtualHost *:443>
(...)
  NSSEngine on

  NSSCipherSuite +rsa_rc4_128_md5,+rsa_rc4_128_sha,+rsa_3des_sha,-rsa_des_sha,-rsa_rc4_40_md5,-rsa_rc2_40_md5,-rsa_null_md5,-rsa_null_sha,+fips_3des_sha,-fips_des_sha,-fortezza,-fortezza_rc4_128_sha,-fortezza_null,-rsa_des_56_sha,-rsa_rc4_56_sha,+rsa_aes_128_sha,+rsa_aes_256_sha

  NSSProtocol TLSv1.0,TLSv1.1,TLSv1.2

  NSSNickname systemadmin

  NSSCertificateDatabase /etc/httpd/alias
(...)
</VirtualHost>

We simply need to point NSSNickname at the certificate nickname to use; in this example that is systemadmin.
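
After saving the configuration we still have to restart Apache and, optionally, check from a client which protocol was negotiated. A minimal sketch, assuming the standard httpd service name and an OpenSSL client (the hostname is the one from the example CSR):

# service httpd restart
# openssl s_client -connect www.systemadmin.es:443 -tls1_2 < /dev/null | grep -E 'Protocol|Cipher'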


Microsoft selects Ubuntu for first Linux-based Azure offering

Microsoft has announced plans to simplify Big Data and widen its use through Azure.

In a blog post, T K Rengarajan, Microsoft’s corporate VP for Data Platforms, described how the expanded Microsoft Azure Data Lake Store, available in preview later this year, will provide a single repository that captures data of any size, type and speed without forcing changes to applications as data scales. In the store, data can be securely shared for collaboration and is accessible for processing and analytics from HDFS applications and tools.

Another new addition is Azure Data Lake Analytics, a dynamically scaling service built on Apache YARN, which Microsoft says will stop people being sidetracked from their work by the need to understand distributed architecture. This service, available in preview later this year, will include U-SQL, a language that unifies the benefits of SQL with the expressive power of user code. U-SQL's scalable distributed querying is intended to help users analyse data in the store and across SQL Servers in Azure, Azure SQL Database and Azure SQL Data Warehouse.

Meanwhile, Microsoft has selected Ubuntu for its first Linux-based Azure offering. The Hadoop-based big data service offering, HDInsight, will run on Canonical's open source operating system Ubuntu.

Azure HDInsight uses a range of open source analytics engines including Hive, Spark, HBase and Storm. Microsoft says it is now on general release with a 99.9 per cent uptime service level agreement.

Meanwhile Azure Data Lake Tools for Visual Studio will provide an integrated development environment that aims to ‘dramatically’ simplify authoring, debugging and optimization for processing and analytics at any scale, according to Rengarajan. “Leading Hadoop applications that span security, governance, data preparation and analytics can be easily deployed from the Azure Marketplace on top of Azure Data Lake,” said Rengarajan.

Azure Data Lake removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics, said Rengarajan.

Apache Spark reportedly outgrowing Hadoop as users move to cloud

Apache Spark is breaking down the barriers between data scientists and engineers, making machine learning easier, and is outgrowing Hadoop as an open source framework for cloud computing developments, a new report claims.

The 2015 Spark User Survey was conducted by Databricks, the company founded by the creators of Apache Spark.

Spark adoption is growing quickly because users find it easy to use, reliably fast and aligned with future growth in analytics, the report claims, with 91 per cent of respondents citing performance as their reason for adoption. Other reasons given were ease of programming (77 per cent), easy deployment (71 per cent), advanced analytics (64 per cent) and the capacity for real-time streaming (52 per cent).

The report, based on the findings of a survey of 1,400 Spark stakeholders, claims that the number of Spark users with no Hadoop components doubled between 2014 and 2015. The study set out to identify how the data analytics and processing engine is being used by developers and organisations.

The Spark growth claim is based on the finding that 48 per cent of users are running Spark in standalone mode, while 40 per cent run it on Hadoop's YARN resource manager. At present 11 per cent of users are running Spark on Apache Mesos. The survey also found that 51 per cent of respondents run Spark on a public cloud.
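
Those deployment figures map directly onto the master URL passed to Spark's launch scripts. As a rough illustration (the host names, ports and application file are placeholders, not taken from the survey), the same job can target a standalone cluster, YARN or Mesos simply by changing the --master flag:

Standalone cluster manager:
spark-submit --master spark://master-host:7077 my_app.py

Hadoop YARN (the ResourceManager address is read from HADOOP_CONF_DIR):
spark-submit --master yarn my_app.py

Apache Mesos:
spark-submit --master mesos://mesos-host:5050 my_app.py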

The number of contributors to Spark rose from 315 to 600 in the last 12 months, which the report authors claim makes it the most active open source project in Big Data. Additionally, more than 200 organisations contribute code to Spark, which they claim makes it ‘one of’ the largest communities of engaged developers to date.

According to the report, Spark is being used for increasingly diverse applications, with data scientists particularly focused on machine learning, streaming and graph analysis projects. Spark was used to create streaming applications 56 per cent more frequently in 2015 than in 2014. The use of advanced analytics libraries, like MLlib for machine learning and GraphX for graph processing, is becoming increasingly common, the report says.

According to the study, 41 per cent of those surveyed identified themselves as data engineers, while 22 per cent of respondents said they are data scientists. The most common languages used for open source based big data projects in cloud computing are Scala (used by 71 per cent of respondents), Python (58 per cent), SQL (36 per cent), Java (31 per cent) and R (18 per cent).

Apple allegedly planning to unify web services on Mesos open source infrastructure

News of significant numbers of Apple device crashes has fuelled industry speculation that Apple is planning to unify its variety of online services into one open source system built on Mesos infrastructure software.

According to website The Information, Apple is recruiting open source engineers. The recruitment could support a strategy to pull all of its web services, including iCloud and iTunes, out of their separate technical silos and merge them into one cohesive whole. Apple is said to be concerned about the lack of interoperability between its online services.

The plan to run internet applications across an ‘orchestrated infrastructure’ could be disruptive in more ways than one, according to Quocirca analyst Clive Longbottom.

“It’s a good idea that would enable Apple to more closely integrate various capabilities and offer new services around search, buy and store function,” said Longbottom. “The two main problems are around migrating all existing services over, and in ensuring high availability for all services when they are all in the same basket.”

According to Reuters, significant numbers of Apple customers are reporting that their mobile devices crashed as they tried to install the new iOS 9 operating system. This is the latest in a number of technical challenges Apple is facing as its cloud software portfolio becomes more ambitious and difficult to manage, according to Sergio Galindo, General Manager at developer GFI Software. “The rollout of iOS 9 is an ambitious project, particularly as Apple has maintained support for devices that are elderly, in smartphone terms. Devices such as the iPhone 4s are significantly different and underpowered compared to more recent iterations,” said Galindo.

According to GFI’s own research, Apple’s OS X and iOS were the software platforms with the most exploitable vulnerabilities, closely followed by the Linux kernel. iOS was found to have significantly more flaws than conventional desktop and server Windows installations.

“Software glitches, vulnerabilities and compatibility issues in an embedded device such as a phone create a challenging user experience,” said Galindo. “This is why testing of new updates before allowing users to update their phones and tablets is essential. Applied to a business context, it is important for IT departments to ensure users do not put their devices or the corporate network at risk.”

IBM calls Apache Spark “most important new open source project in a decade”

IBM is throwing its weight behind Apache Spark in a bid to bolster its IoT strategy

IBM said it will throw its weight behind Apache Spark, an open source community developing a processing engine for large-scale datasets, putting thousands of internal developers to work on Spark-related projects and contributing its machine learning technology to the code ecosystem.

Spark, an Apache open source project born in 2009, is essentially an engine that can process vast amounts of data very quickly. It runs in Hadoop clusters through YARN or as a standalone deployment and can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat; it currently supports Scala, Java and Python.
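
By way of illustration (the HDFS path below is a placeholder, not from the article), counting the error lines of a file stored on HDFS from Spark's interactive Scala shell takes only a couple of lines:

$ spark-shell --master yarn
scala> val logs = sc.textFile("hdfs:///data/access.log")
scala> logs.filter(line => line.contains("ERROR")).count()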

It is designed to perform general data processing (like MapReduce) but one of the exciting things about Spark is it can also process new workloads like streaming data, interactive queries, and machine learning – making it a good match for Internet of Things applications, which is why IBM is so keen to go big on supporting the project.

The company said the technology brings huge advances when processing massive datasets generated by Internet of Things devices, improving the performance of data-dependent apps.

“IBM has been a decades long leader in open source innovation. We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, general manager, analytics platform, IBM Analytics.

“Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation,” Smith said.

In addition to joining Spark IBM said it would build the technology into the majority of its big data offerings, and offer Spark-as-a-Service on Bluemix. It also said it will open source its IBM SystemML machine learning technology, and collaborate with Databricks, a Spark-as-a-Service provider, to advance Spark’s machine learning capabilities.

Hortonworks buys SequenceIQ to speed up cloud deployment of Hadoop

SequenceIQ will help boost Hortonworks’ position in the Hadoop ecosystem

Hortonworks has acquired SequenceIQ, a Hungary-based startup delivering infrastructure agnostic tools to improve Hadoop deployments. The company said the move will bolster its ability to offer speedy cloud deployments of Hadoop.

SequenceIQ’s flagship offering, Cloudbreak, is a Hadoop as a Service API for multi-tenant clusters that applies some of the capabilities of Blueprint (which lets you create a Hadoop cluster without having to use the Ambari Cluster Install Wizard) and Periscope (autoscaling for Hadoop YARN) to help speed up deployment of Hadoop on different cloud infrastructures.

The two companies have partnered extensively in the Hadoop community, and Hortonworks said the move will enhance its position among a growing number of Hadoop incumbents.

“This acquisition enriches our leadership position by providing technology that automates the launching of elastic Hadoop clusters with policy-based auto-scaling on the major cloud infrastructure platforms including Microsoft Azure, Amazon Web Services, Google Cloud Platform, and OpenStack, as well as platforms that support Docker containers. Put simply, we now provide our customers and partners with both the broadest set of deployment choices for Hadoop and quickest and easiest automation steps,” Tim Hall, vice president of product management at Hortonworks, explained.

“As Hortonworks continues to expand globally, the SequenceIQ team further expands our European presence and firmly establishes an engineering beachhead in Budapest. We are thrilled to have them join the Hortonworks team.”

Hall said the company also plans to contribute the Cloudbreak code back into the Apache Foundation sometime this year, though whether it will do so as part of an existing project or standalone one seems yet to be decided.

Hortonworks’ bread and butter is in supporting enterprise adoption of Hadoop and bringing the services component to the table, but it’s interesting to see the company commit to feeding the Cloudbreak code – which could, at least temporarily, give it a competitive edge – back into the ecosystem.

“This move is in line with our belief that the fastest path to innovation is through open source developed within an open community,” Hall explained.

The big data M&A space has seen more consolidation over the past few months, with Hitachi Data Systems acquiring big data and analytics specialist Pentaho and Infosys’ $200m acquisition of Panaya.