
Molecule

Molecule is designed to aid in the development and testing of Ansible roles. Molecule provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios. Molecule encourages an approach that results in consistently developed roles that are well-written, easily understood and maintained.

Installation

```bash
$ conda create -n molecule python=3.7
$ source activate molecule
$ conda install -c conda-forge ansible docker-py docker-compose molecule
# docker-py seems to be called docker in PyPI
$ pip install ansible docker docker-compose molecule
```

Main features

- Cookiecutter to create roles from a standardized template.
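As a quick sketch of the workflow (the command syntax differs between Molecule releases, and `my-role` is just a placeholder name):

```bash
# Scaffold a new role from the standard template
# (older releases use: molecule init role --role-name my-role)
$ molecule init role my-role
$ cd my-role
# Run the full test sequence (create, converge, verify, destroy, among other steps)
$ molecule test
```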

Spark on Kubernetes Client Mode

This is the third article in the Spark on Kubernetes (K8S) series after: Spark on Kubernetes First Run and Spark on Kubernetes Python and R bindings. This one is dedicated to the client mode, a feature that has been introduced in Spark 2.4. In client mode the driver runs locally (or on an external pod), which makes interactive use possible; unlike cluster mode, it can be used to run REPLs like the Spark shell or Jupyter notebooks.
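A minimal sketch of what launching an interactive shell against a K8S cluster looks like in client mode (the API server address, image name and driver host below are placeholders for your own setup):

```bash
# spark-shell always runs in client mode: the driver lives in this local process,
# while the executors are started as pods on the cluster.
$ ./bin/spark-shell \
    --master k8s://https://<api-server-host>:6443 \
    --conf spark.kubernetes.container.image=<repo>/spark:2.4.0 \
    --conf spark.executor.instances=2 \
    --conf spark.driver.host=<address-reachable-from-the-executor-pods>
```

Setting spark.driver.host matters here because the executor pods must be able to connect back to the locally running driver.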

Spark on Kubernetes Python and R bindings

Version 2.4 of Spark for Kubernetes introduces Python and R bindings:

- spark-py: the Spark image with Python bindings (including Python 2 and 3 executables)
- spark-r: the Spark image with R bindings (including the R executable)

Databricks has published an article dedicated to the Spark 2.4 features for Kubernetes. The principle is exactly the same as the one explained in my previous article, but this time we are using a different image, spark-py, and another example, local:///opt/spark/examples/src/main/python/pi.
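Concretely, the submission only differs in the image and the application path. A sketch, with <api-server-host> and <repo> as placeholders, and assuming the bundled Pi example lives at the usual examples/src/main/python/pi.py location inside the image:

```bash
# Submit the bundled Python Pi example in cluster mode using the spark-py image
$ ./bin/spark-submit \
    --master k8s://https://<api-server-host>:6443 \
    --deploy-mode cluster \
    --name spark-pi-py \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.container.image=<repo>/spark-py:2.4.0 \
    local:///opt/spark/examples/src/main/python/pi.py
```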

Spark on Kubernetes First Run

Since version 2.3, Spark can run on a Kubernetes cluster. Let's see how to do it. In this example I will use version 2.4. The prerequisite is to download and install (unzip) the corresponding Spark distribution. For more information, there is a section on the Spark site dedicated to this use case.

Spark images

I will build and push Spark images to make them available to the K8S cluster.
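The Spark distribution ships a helper script for exactly this. A minimal sketch, with <repo> standing in for your own registry:

```bash
# From the root of the unpacked Spark distribution:
# build the Spark images and tag them for the registry
$ ./bin/docker-image-tool.sh -r <repo> -t 2.4.0 build
# push them so the K8S cluster can pull them
$ ./bin/docker-image-tool.sh -r <repo> -t 2.4.0 push
```

In the 2.4 release the build step also produces the spark-py and spark-r variants alongside the base JVM image.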

Dplyr & Sparklyr usage

In this example, I want to show that the same syntax can be used for local as well as distributed computing, thanks to the sparklyr package. To do that I will use the nycflights13 dataset (one of the datasets used in the sparklyr demo) in order to check whether the number of flights per day varies with the period of the year (the month). Spoiler: it varies, but not by much.