Spark on Kubernetes Python and R bindings
2.4 of Spark for Kubernetes introduces Python and R bindings.
spark-py: The Spark image with Python bindings (including Python 2 and 3 executables)
spark-r: The Spark image with R bindings (including R executable)
Databricks has published an article dedicated to the Spark
2.4 features for Kubernetes.
It’s exactly the same principle as already explained in my previous article. But this time we are using:
- A different image:
- Another example:
local:///opt/spark/examples/src/main/python/pi.py, once again a Pi computation :-|
- A dedicated Spark namespace:
The namespace must be created first:
$ k create namespace spark namespace "spark" created
It permits to isolate the Spark pods from the rest of the cluster and could be used later to cap available resources.
$ cd $SPARK_HOME $ ./bin/spark-submit \ --master k8s://https://localhost:6443 \ --deploy-mode cluster \ --name spark-pi \ --conf spark.executor.instances=2 \ --conf spark.driver.memory=512m \ --conf spark.executor.memory=512m \ --conf spark.kubernetes.container.image=spark-py:v2.4.0 \ --conf spark.kubernetes.pyspark.pythonVersion=3 \ --conf spark.kubernetes.namespace=spark \ local:///opt/spark/examples/src/main/python/pi.py
spark.kubernetes.pyspark.pythonVersion is an additional (an optional) property that can be used to select the major Python version to use (it’s
2 by default).
This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3.
An interesting that has nothing to do with Python is that Spark defines labels that are applied on pods. They permit to easily identify the role of each pod.
$ k get po -L spark-app-selector,spark-role -n spark NAME READY STATUS RESTARTS AGE SPARK-APP-SELECTOR SPARK-ROLE spark-pi-1545987715677-driver 1/1 Running 0 12s spark-c4e28a2ef3d14cfda16c007383318c79 driver spark-pi-1545987715677-exec-1 0/1 ContainerCreating 0 1s spark-application-1545987726694 executor spark-pi-1545987715677-exec-2 0/1 ContainerCreating 0 1s spark-application-1545987726694 executor
You can for example use the label to delete all the terminated driver pods.
# Can also decide to switch from the default to the spark namespace # $ k config set-context $(kubectl config current- context) --namespace spark $ k delete po -l spark-role=driver -n spark # Or you can delete the whole namespace $ k delete ns spark