# Monday Exercise 2.4: Use `queue N`, `$(Cluster)`, and `$(Process)`
The goal of this exercise is to learn to submit many jobs from a single queue
statement, and then to control filenames and arguments per job.
## Submitting Many Jobs With One Submit File
Suppose you have a program that you want to run many times. The program takes an argument, and you want to change the argument for each run of the program. With what you know so far, you have a couple of choices (assuming that you cannot change the job itself to work this way):
- Write one submit file; submit one job, change the argument in the submit file, submit another job, change the submit file, …
- Write many submit files that are nearly identical except for the program argument
Neither of these options seems very satisfying. Fortunately, we can do better with HTCondor.
## Running Many Jobs With One `queue` Statement
Here is a C program that uses a simple stochastic (random) method to estimate the value of π — feel free to try to figure out the method from the code, but it is not critical for this exercise. The single argument to the program is the number of samples to take. More samples should result in better estimates!
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
    struct timeval my_timeval;
    int iterations = 0;
    int inside_circle = 0;
    int i;
    double x, y, pi_estimate;

    gettimeofday(&my_timeval, NULL);
    srand48(my_timeval.tv_sec ^ my_timeval.tv_usec);

    if (argc == 2) {
        iterations = atoi(argv[1]);
    } else {
        printf("usage: circlepi ITERATIONS\n");
        exit(1);
    }

    for (i = 0; i < iterations; i++) {
        x = (drand48() - 0.5) * 2.0;
        y = (drand48() - 0.5) * 2.0;
        if (((x * x) + (y * y)) <= 1.0) {
            inside_circle++;
        }
    }

    pi_estimate = 4.0 * ((double) inside_circle / (double) iterations);
    printf("%d iterations, %d inside; pi = %f\n", iterations, inside_circle, pi_estimate);
    return 0;
}
```
- In a new directory for this exercise, save the code to a file named `circlepi.c`
- Compile the code (we will cover this in more detail on Wednesday):

  ```shell
  gcc -static -o circlepi circlepi.c
  ```

- If there are errors, check the file contents and the compile command carefully; otherwise, see the instructors
- Test the program with just a few samples:

  ```shell
  ./circlepi 10000
  ```
Now suppose that you want to run the program many times, to produce many estimates. This is exactly what a statement like `queue 3` is useful for. Let's see how it works.
- Write a normal submit file for this program
  - Pass 1 billion (`1000000000`) as the command-line argument to `circlepi`
  - Remember to use `queue 3` instead of just `queue`
- Submit the file

  Note the slightly different message from `condor_submit`:

  ```
  3 job(s) submitted to cluster NNNN.
  ```
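For reference, the submit file might look something like this sketch (the resource requests and the log, output, and error filenames here are assumptions; adjust them to match your pool's conventions):

```
executable = circlepi
arguments  = 1000000000

log    = circlepi.log
output = circlepi.out
error  = circlepi.err

request_cpus   = 1
request_memory = 100MB
request_disk   = 100MB

queue 3
```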
- Before the jobs execute, look at the job queue to see the multiple jobs
Here is some sample `condor_q -nobatch` output:

```
 ID       OWNER  SUBMITTED    RUN_TIME   ST PRI SIZE CMD
10228.0   cat    7/25 11:57  0+00:00:00  I  0   0.7  circlepi 1000000000
10228.1   cat    7/25 11:57  0+00:00:00  I  0   0.7  circlepi 1000000000
10228.2   cat    7/25 11:57  0+00:00:00  I  0   0.7  circlepi 1000000000
```
In this sample, all three jobs are part of cluster `10228`, but the first job was assigned process `0`, the second job was assigned process `1`, and the third one was assigned process `2`. (Historical note: programmers like to start counting from 0, hence the odd numbering scheme.)
At this time, it is worth reviewing the definition of a job ID. It is a job's cluster number, a dot (`.`), and the job's process number. So in the example above, the job ID of the second job is `10228.1`.
Pop Quiz: Do you remember how to ask HTCondor to list all of the jobs from one cluster? How about one specific job ID?
## Using `queue N` With Output
When all three jobs in your single cluster are finished, examine the resulting files.
- What is in the output file?
- What is in the error file (hopefully nothing)?
- What is in the log file? Look carefully at the job IDs in each event.
- Is this what you expected? Is it what you wanted?
## Using `$(Process)` to Distinguish Jobs
As you saw in the experiment above, we need a way to separate the output (and error) files of each queued job, not just of the whole cluster of jobs. Fortunately, HTCondor has a way to separate the files easily.
When processing a submit file, HTCondor defines and uses a special variable for the process number of each job. If you write `$(Process)` in a submit file, HTCondor will replace it with the process number of the job, independently for each job that is queued. For example, you can use the `$(Process)` variable to define a separate output file name for each job. Suppose the following two lines are in a submit file:

```
output = my-output-file-$(Process).out
queue 10
```
Even though the `output` filename is defined only once, HTCondor will create a separate output filename for each job:

| Job | Output filename |
| --- | --- |
| First job | `my-output-file-0.out` |
| Second job | `my-output-file-1.out` |
| Third job | `my-output-file-2.out` |
| … | … |
| Last (tenth) job | `my-output-file-9.out` |
Let’s see how this works for our program that estimates π.
- In your submit file, change the definitions of `output` and `error` to use `$(Process)`, in a way that is similar to the example above
- Remove any output, error, and log files from previous runs
- Submit the updated file
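For example, the changed lines might look like this sketch (the exact filenames are up to you):

```
output = circlepi-$(Process).out
error  = circlepi-$(Process).err
```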
When all three jobs are finished, examine the resulting files again.
- How many files are there of each type? What are their names?
- Is this what you expected? Is it what you wanted from the π estimation process?
## Using `$(Cluster)` to Separate Files Across Runs
With `$(Process)`, you can get separate output (and error) filenames for each job within a run. However, the next time you submit the same file, all of the output and error files are overwritten by new ones created by the new jobs. Maybe this is the behavior that you want. But sometimes, you may want to separate files by run, as well.
In addition to `$(Process)`, there is also a `$(Cluster)` variable that you can use in your submit files. It works just like `$(Process)`, except that it is replaced with the cluster number of the entire submission. Because the cluster number is the same for all jobs within a single submission, it does not separate files by job within a submission. But when used with `$(Process)`, it can be used to separate files by run. For example, consider this `output` statement:

```
output = my-output-file-$(Cluster)-$(Process).out
```
For one particular run, it might result in output filenames like this:
| Job | Output filename |
| --- | --- |
| First job | `my-output-file-2444-0.out` |
| Second job | `my-output-file-2444-1.out` |
| Third job | `my-output-file-2444-2.out` |
| … | … |
If you like, change your submit file from the previous exercise to use both `$(Cluster)` and `$(Process)`. Submit your file twice to see the separate files for each run. Be careful about how many jobs you run in total, as the number of output files grows quickly!
## Using `$(Process)` and `$(Cluster)` in Other Statements
The `$(Cluster)` and `$(Process)` variables can be used in any submit file statement, although they are more useful in some kinds of statements than in others. For instance, it is hard to imagine a truly good reason to use the `$(Process)` variable in a `rank` statement (i.e., for preferring some execute slots over others), and there the `$(Cluster)` variable makes even less sense, since it is the same for every job in a submission.

But in some situations, the `$(Process)` variable can be very helpful. Common uses are in the following kinds of statements — can you think of a scenario in which each use might be helpful?
- `log`
- `transfer_input_files`
- `transfer_output_files`
- `arguments`
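As one hypothetical illustration (the input filenames here are invented), a fragment that uses these variables in several such statements could look like:

```
log                  = job-$(Cluster).log
transfer_input_files = input-$(Process).txt
arguments            = input-$(Process).txt
queue 10
```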
Unfortunately, HTCondor does not let you perform math on the `$(Process)` number when using it. So, for example, if you use `$(Process)` as a numeric argument to a command, the jobs will always get the arguments 0, 1, 2, and so on. If you have control over your program and the way in which it uses command-line arguments, then you are fine. Otherwise, you might need to transform the `$(Process)` numbers into something more appropriate using a wrapper script, which will be discussed on Wednesday.
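To preview the wrapper-script idea, here is a minimal sketch in POSIX shell. The script name, the scaling rule, and the use of `echo` instead of actually launching the program are all assumptions for illustration; in a real submit file you would name the script as the `executable` and pass `$(Process)` via `arguments`.

```shell
#!/bin/sh
# Hypothetical wrapper: HTCondor would pass $(Process) as the first argument.
PROC="${1:-0}"
# Map process 0 -> 1000000 samples, process 1 -> 2000000, and so on.
SAMPLES=$(( (PROC + 1) * 1000000 ))
echo "would run: ./circlepi $SAMPLES"
# exec ./circlepi "$SAMPLES"   # the real call, once circlepi is present
```
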
## (Optional) Defining JobBatchName for Tracking
During the lecture, it was mentioned that you can define arbitrary attributes in your submit file, and that one purpose of such attributes is to track or report on different jobs separately. In this optional exercise, you will see how this technique can be used.
Once again, we will use `sleep` jobs, so that your jobs remain in the queue long enough to experiment on them.
- Create a basic submit file that runs `sleep 120` (or some reasonable duration).
- Instead of a single `queue` statement, write this:

  ```
  jobbatchname = 1
  queue 5
  ```

  These statements give the extra attribute `jobbatchname` to your jobs; the first 5 jobs get one value, and the second 5 (from the next submission) get another.
- Submit the file.
- Now, quickly edit the submit file to instead say:

  ```
  jobbatchname = 2
  ```

- Submit the file again.
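Putting the pieces together, the first submission's file might look something like this sketch (the executable choice and resource requests are assumptions; adjust them for your pool):

```
executable = /bin/sleep
arguments  = 120

log = sleep-$(Cluster)-$(Process).log

request_cpus   = 1
request_memory = 100MB
request_disk   = 100MB

jobbatchname = 1
queue 5
```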
Check on the submissions using a normal `condor_q` and `condor_q -nobatch`. Of course, your special attribute does not appear in the `condor_q -nobatch` output, but it is present in the `condor_q` output and in each job's ClassAd. You can see the effect of the attribute by limiting your `condor_q` output to one type of job or the other. First, run this command:
```shell
condor_q -constraint 'JobBatchName == "1"'
```
Do you get the output that you expected?
Using the example command above, how would you list your other five jobs?