
Back 2 Code

The Three Team Phases

In his book Elastic Leadership, Roy Osherove talks about an interesting principle he calls the three team phases. Whether you are a team leader or just a team member, your team is in one of these three phases. Survival phase (no time to learn): the team is spending its time fighting fires or trying to meet deadlines. The team is struggling and has to use the quickest solution, certainly not the most efficient one but the most pragmatic, to get the work done as soon as possible.

Deprecation in R

After my recent article on marking deprecated code in Python, I had to do the same thing in R. It’s included in the language (in The R Base Package).

Deprecation in Python

It’s always good practice to deprecate functions, methods or classes before removing or changing them.
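As a minimal sketch (the decorator and function names below are made up for illustration), a deprecated function can emit a DeprecationWarning through the standard warnings module:

import functools
import warnings

def deprecated(message):
    # Decorator that marks a function as deprecated.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}",
                category=DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("use new_compute() instead")
def old_compute(x):
    return x * 2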

Radian

I’m a big fan of console tools, and I was wondering if there was a way to color the R console output.

I came across a tool called Radian with this attractive tagline:

A 21 century R console

So let’s try it!

Littler

R packages often have funny names. littler stands for little R, that is, a lowercase r.

Molecule

Molecule is designed to aid in the development and testing of Ansible roles. It provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios. Molecule encourages an approach that results in consistently developed roles that are well written, easily understood and maintained.

Molecule installation

$ conda create -n molecule python=3.7
$ source activate molecule
$ conda install -c conda-forge ansible docker-py docker-compose molecule
# docker-py seems to be called docker in PyPI
$ pip install ansible docker docker-compose molecule

Main features: Cookiecutter to create roles from a standardized template.

Spark on Kubernetes Client Mode

This is the third article in the Spark on Kubernetes (K8S) series, after Spark on Kubernetes First Run and Spark on Kubernetes Python and R bindings. This one is dedicated to client mode, a feature that has been introduced in Spark 2.4. In client mode the driver runs locally (or on an external pod), which makes interactive use possible: it can be used to run a REPL such as the Spark shell or a Jupyter notebook.
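As a rough sketch (not taken from the article), a PySpark session can be opened in client mode against a K8S cluster like this; the API server URL, image tag and namespace are placeholders:

from pyspark.sql import SparkSession

# Placeholder values: replace the API server URL, image and namespace
# with the ones of your own cluster.
spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.example.com:6443")  # K8S API server
    .appName("client-mode-example")
    .config("spark.submit.deployMode", "client")
    .config("spark.executor.instances", "2")
    .config("spark.kubernetes.container.image", "spark-py:2.4.0")
    .config("spark.kubernetes.namespace", "spark")
    .getOrCreate()
)

print(spark.range(1000).count())
spark.stop()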

Spark on Kubernetes Python and R bindings

Version 2.4 of Spark for Kubernetes introduces Python and R bindings:

spark-py: the Spark image with Python bindings (including Python 2 and 3 executables)
spark-r: the Spark image with R bindings (including the R executable)

Databricks has published an article dedicated to the Spark 2.4 features for Kubernetes. The principle is exactly the same as the one explained in my previous article, but this time we are using:

A different image: spark-py
Another example: local:///opt/spark/examples/src/main/python/pi.
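The excerpt above refers to one of the Python examples shipped with Spark. As a rough idea of what such an example does (this is a sketch, not the file from the distribution), here is a minimal Monte Carlo estimation of pi with PySpark:

from operator import add
from random import random

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pi-estimate").getOrCreate()

n = 100000  # number of random samples

def inside(_):
    # Draw a point in the unit square and test whether it falls inside the circle.
    x, y = random(), random()
    return 1 if x * x + y * y <= 1 else 0

count = spark.sparkContext.parallelize(range(n), 4).map(inside).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()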

Spark on Kubernetes First Run

Since version 2.3, Spark can run on a Kubernetes cluster. Let’s see how to do it. In this example I will use version 2.4. Prerequisites: download and install (unzip) the corresponding Spark distribution. For more information, there is a section on the Spark site dedicated to this use case.

Spark images

I will build and push Spark images to make them available to the K8S cluster.

Dplyr & Sparklyr usage

In this example, I want to show that, thanks to the Sparklyr package, local computing and distributed computing can be performed with the same syntax. To do that I will use the nycflights13 dataset (one of the datasets used in the Sparklyr demo) to check whether the number of flights per day varies with the period of the year (the month). Spoiler: it varies, but not that much.

Configure PySpark to connect to a Standalone Spark Cluster

In one of my previous articles I talked about running a standalone Spark cluster inside Docker containers using docker-spark. I was using it with the R Sparklyr framework. However, if you want to use it from a Python environment in interactive mode (like in Jupyter notebooks, where the driver runs on the local machine while the workers run in the cluster), there are several steps to follow. You need to run the same Python version on the driver and on the workers.
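A minimal sketch of such a setup, assuming a standalone master reachable at spark://localhost:7077 and a given interpreter path (both are placeholders for your environment):

import os

from pyspark.sql import SparkSession

# Make the workers use the same Python version as the driver
# (the interpreter path below is a placeholder).
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.7"

spark = (
    SparkSession.builder
    .master("spark://localhost:7077")  # the standalone master, e.g. the docker-spark one
    .appName("standalone-client")
    .getOrCreate()
)

print(spark.range(100).count())
spark.stop()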

Spark History Server available in docker-spark

Spark comes with a history server; it provides a great UI with a lot of information about Spark job execution (event timeline, details of stages, etc.). Details can be found on the Spark monitoring page. I’ve modified docker-spark to be able to run it with the docker-compose up command. With this setup, its UI will be running at http://${YOUR_DOCKER_HOST}:18080. To use Spark’s history server you have to tell your Spark driver:
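In essence, the driver must write its event logs where the history server can read them. A minimal PySpark sketch (the event log directory below is a placeholder and must match the one the history server watches):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("history-server-example")
    .config("spark.eventLog.enabled", "true")
    .config("spark.eventLog.dir", "/tmp/spark-events")  # placeholder directory
    .getOrCreate()
)

spark.range(10).show()
spark.stop()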

Find Spark

Find Spark is a handy tool to use each time you want to switch between Spark versions in Jupyter notebooks without having to change the SPARK_HOME environment variable.
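A minimal sketch of a typical use (the Spark installation path is a placeholder):

import findspark

# Point findspark at the Spark installation you want to use
# (the path below is a placeholder).
findspark.init("/opt/spark-2.4.0")

import pyspark

sc = pyspark.SparkContext(appName="findspark-example")
print(sc.parallelize(range(10)).sum())
sc.stop()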

WSJF

WSJF stands for Weighted Shortest Job First. It’s a technique used in the Scaled Agile Framework (SAFe) to prioritise jobs (epics, features and capabilities) according to their value relative to the cost of performing them. Basically, it’s a way of ranking a list of features in order to maximise the outcome (the value produced) with a constrained capacity to produce it. The job with the highest WSJF (value over cost) is selected first for implementation.
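As a toy illustration (the jobs, cost-of-delay scores and sizes below are made up), the ranking boils down to sorting by cost of delay divided by job size:

# Made-up backlog: WSJF = cost of delay / job size, highest first.
jobs = {
    "Feature A": {"cost_of_delay": 13, "job_size": 8},
    "Feature B": {"cost_of_delay": 21, "job_size": 5},
    "Feature C": {"cost_of_delay": 8, "job_size": 2},
}

ranked = sorted(
    jobs.items(),
    key=lambda item: item[1]["cost_of_delay"] / item[1]["job_size"],
    reverse=True,
)

for name, scores in ranked:
    wsjf = scores["cost_of_delay"] / scores["job_size"]
    print(f"{name}: WSJF = {wsjf:.1f}")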

Effective Monitoring and Alerting

A short note about this book, which I have used in my work. First of all, two good points. The first is that it deals with monitoring, alerting and reporting in general, that is to say independently of the tools used. This is both a strength and a weakness, since it could also have been useful to identify the families of tools suited to each use. Such a step back is not so common, and it makes it possible to introduce higher-level concepts, for example organizing monitoring into stacks, which is absolutely crucial, as well as notions and general definitions applicable in (almost) all circumstances.