Wednesday Bonus Exercise 2.2: Use Singularity to Run TensorFlow
In this tutorial, we see how to submit a TensorFlow job to the OSG using Singularity containers. We currently offer CPU and GPU containers for TensorFlow (both based on Ubuntu). Here, we focus on the CPU container.
Setup
You should still be logged into training.osgconnect.net (the OSG Connect submit server for this workshop).
Get the example files and understand the job requirements
To run this example quickly, you can download all the files into a new folder using the tutorial command:
username@training $ tutorial tensorflow-matmul
This creates a directory named tutorial-tensorflow-matmul. Go inside the directory and see what is inside.
username@training $ cd tutorial-tensorflow-matmul
username@training $ ls -F
You will see the following files:
tf_matmul.py (Python program to multiply two matrices using the tensorflow package)
tf_matmul.submit (HTCondor job description file)
tf_matmul_wrapper.sh (Job wrapper shell script that executes the Python program)
tf_matmul_gpu.submit (HTCondor job description file targeting GPUs)
NOTE: The file tf_matmul_gpu.submit is for GPUs, but we will not focus on GPUs in this exercise. You are welcome to take a look.
The Python script `tf_matmul.py` uses TensorFlow to perform a `2x2` matrix multiplication.
The submit file has requirements and options similar to those of our previous job, including:
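To make the computation concrete, here is a minimal pure-Python sketch of a 2x2 matrix product. It is a stand-in for illustration only and does not reproduce `tf_matmul.py`, which uses TensorFlow. The input matrix, the helper names, and the choice to multiply a matrix by its own inverse are all assumptions, picked because that product yields an identity matrix.

```python
def matmul_2x2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def inverse_2x2(m):
    """Invert a 2x2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical input; the values used by the tutorial script are not shown here.
matrix = [[1.0, 2.0], [3.0, 4.0]]

# A matrix times its inverse should give the identity, up to rounding.
product = matmul_2x2(matrix, inverse_2x2(matrix))
print(product)
```

In the tutorial job the equivalent work is done by TensorFlow inside the container, so rounding behaviour can differ from this plain-Python version.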
Requirements = HAS_SINGULARITY == True
In addition, we also provide the full path of the image via the keyword +SingularityImage.
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest"
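As a rough illustration of how these two lines fit into a complete job description, the Python sketch below assembles the text of a submit file. Only the `Requirements` and `+SingularityImage` lines and the file names `tf_matmul_wrapper.sh`, `tf_matmul.py`, and `tf_matmul.output` come from this tutorial. The remaining lines (universe, error and log file names, `queue 1`) are assumptions, so compare against the real `tf_matmul.submit` before relying on it.

```python
IMAGE = "/cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest"


def build_submit_text(image=IMAGE):
    """Return the text of a hypothetical submit file for the CPU job."""
    lines = [
        "universe = vanilla",                      # assumed
        "executable = tf_matmul_wrapper.sh",       # wrapper named in the tutorial
        "transfer_input_files = tf_matmul.py",     # assumed transfer setting
        "output = tf_matmul.output",               # output file named in the tutorial
        "error = tf_matmul.error",                 # assumed file name
        "log = tf_matmul.log",                     # assumed file name
        "Requirements = HAS_SINGULARITY == True",  # from the tutorial
        f'+SingularityImage = "{image}"',          # from the tutorial
        "queue 1",                                 # assumed: a single job
    ]
    return "\n".join(lines) + "\n"


submit_text = build_submit_text()
print(submit_text)
```

The sketch only prints the text, so it cannot overwrite the submit file that ships with the tutorial.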
Submit the TensorFlow example job
Now submit the job to the OSG.
username@training $ condor_submit tf_matmul.submit
The job will look for a machine on the OSG that has Singularity installed. On a matched machine, the job creates the Singularity container from the image /cvmfs/singularity.opensciencegrid.org/opensciencegrid/tensorflow:latest. Inside this container, the program tf_matmul.py begins to execute.
After your job completes, you will see an output file named tf_matmul.output.
username@training $ cat tf_matmul.output
result of matrix multiplication
===============================
[[ 1.0000000e+00  0.0000000e+00]
 [-4.7683716e-07  1.0000002e+00]]
===============================
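The result printed in the output file should be a 2x2 identity matrix, up to floating-point rounding. Entries such as -4.7683716e-07 and 1.0000002e+00 differ from 0 and 1 only by small rounding errors, so your values may not match the ones above digit for digit.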
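Because the printed values are only approximately 0 and 1, a tolerance-based comparison is one way to confirm the result. The sketch below parses the sample output shown above and checks it against the identity matrix. The helper names and the tolerance of 1e-5 are assumptions chosen for illustration, not part of the tutorial files.

```python
import math
import re

# Sample text copied from the tutorial's tf_matmul.output listing.
SAMPLE_OUTPUT = """result of matrix multiplication
===============================
[[ 1.0000000e+00  0.0000000e+00]
 [-4.7683716e-07  1.0000002e+00]]
===============================
"""


def parse_matrix(text):
    """Extract the four floating-point entries and return them as a 2x2 list."""
    tokens = re.findall(r"[-+]?\d+\.\d+(?:e[-+]?\d+)?", text)
    numbers = [float(tok) for tok in tokens]
    if len(numbers) != 4:
        raise ValueError(f"expected 4 matrix entries, found {len(numbers)}")
    return [numbers[0:2], numbers[2:4]]


def is_identity(matrix, tol=1e-5):
    """Return True if every entry is within tol of the 2x2 identity matrix."""
    return all(
        math.isclose(matrix[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
        for i in range(2)
        for j in range(2)
    )


matrix = parse_matrix(SAMPLE_OUTPUT)
print("identity within tolerance:", is_identity(matrix))
```

To check your own job, you could read the contents of tf_matmul.output into a string and pass it to `parse_matrix` in place of `SAMPLE_OUTPUT`.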