Monday Exercise 2.4: Use queue N, $(Cluster), and $(Process)

The goal of this exercise is to learn to submit many jobs from a single queue statement, and then to control filenames and arguments per job.

Submitting Many Jobs With One Submit File

Suppose you have a program that you want to run many times. The program takes an argument, and you want to change the argument for each run of the program. With what you know so far, you have a couple of choices (assuming that you cannot change the job itself to work this way):

Neither of these options seems very satisfying. Fortunately, we can do better with HTCondor.

Running Many Jobs With One queue Statement

Here is a C program that uses a simple stochastic (random) method to estimate the value of π — feel free to try to figure out the method from the code, but it is not critical for this exercise. The single argument to the program is the number of samples to take. More samples should result in better estimates!

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(int argc, char *argv[])
{
  struct timeval my_timeval;
  int iterations = 0;
  int inside_circle = 0;
  int i;
  double x, y, pi_estimate;

  gettimeofday(&my_timeval, NULL);
  srand48(my_timeval.tv_sec ^ my_timeval.tv_usec);

  if (argc == 2) {
    iterations = atoi(argv[1]);
  } else {
    printf("usage: circlepi ITERATIONS\n");
    exit(1);
  }

  for (i = 0; i < iterations; i++) {
    x = (drand48() - 0.5) * 2.0;
    y = (drand48() - 0.5) * 2.0;
    if (((x * x) + (y * y)) <= 1.0) {
      inside_circle++;
    }
  }
  pi_estimate = 4.0 * ((double) inside_circle / (double) iterations);
  printf("%d iterations, %d inside; pi = %f\n", iterations, inside_circle, pi_estimate);
  return 0;
}
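
For readers who want to check their reading of the code, the method can be summarized as follows. Each sample (x, y) is drawn roughly uniformly from the square with corners at (±1, ±1), and the program counts how many samples land inside the unit circle. The symbols N (for iterations) and N_in (for inside_circle) are shorthand introduced here and are not names used by the program.

```latex
\frac{\text{area of unit circle}}{\text{area of square}}
  = \frac{\pi \cdot 1^2}{2^2} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\frac{N_{\text{in}}}{N} \approx \frac{\pi}{4}
\quad\Longrightarrow\quad
\hat{\pi} = 4 \cdot \frac{N_{\text{in}}}{N}
```

The last expression is what the program computes as pi_estimate. The statistical error of an estimate like this shrinks roughly in proportion to 1/√N, which is why more samples tend to give better results.
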
  1. In a new directory for this exercise, save the code to a file named circlepi.c
  2. Compile the code (we will cover this in more detail Wednesday):
     gcc -static -o circlepi circlepi.c
  3. If there are errors, check the file contents and compile command carefully; if the problem persists, ask the instructors
  4. Test the program with just a few samples:
     ./circlepi 10000

Now suppose that you want to run the program many times, to produce many estimates. This is exactly what a statement like queue 3 is useful for. Let’s see how it works.

  1. Write a normal submit file for this program
    • Pass 1 billion (1000000000) as the command line argument to circlepi
    • Remember to use queue 3 instead of just queue
  2. Submit the file

     Note the slightly different message from condor_submit:

     3 job(s) submitted to cluster NNNN.
  3. Before the jobs execute, look at the job queue to see the multiple jobs
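
As a minimal sketch for step 1 above, a submit file might look something like the following. The output, error, and log file names and the resource requests are assumptions chosen for illustration, not values prescribed by this exercise; adjust them to match what you used in earlier exercises.

```
# Sketch of a submit file for circlepi. The file names and resource
# requests below are illustrative assumptions.
executable = circlepi
arguments  = 1000000000

output = circlepi.out
error  = circlepi.err
log    = circlepi.log

request_cpus   = 1
request_memory = 100MB
request_disk   = 100MB

# Queue three identical jobs in a single cluster.
queue 3
```

With fixed file names like these, all three jobs share the same output and error files, which is worth keeping in mind when you examine the results later.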

Here is some sample condor_q -nobatch output:

 ID       OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
10228.0   cat             7/25 11:57   0+00:00:00 I  0    0.7 circlepi 1000000000
10228.1   cat             7/25 11:57   0+00:00:00 I  0    0.7 circlepi 1000000000
10228.2   cat             7/25 11:57   0+00:00:00 I  0    0.7 circlepi 1000000000

In this sample, all three jobs are part of cluster 10228, but the first job was assigned process 0, the second job was assigned process 1, and the third one was assigned process 2. (Historical note: Programmers like to start counting from 0, hence the odd numbering scheme.)

At this time, it is worth reviewing the definition of a job ID. It is a job’s cluster number, a dot (.), and the job’s process number. So in the example above, the job ID of the second job is 10228.1.

Pop Quiz: Do you remember how to ask HTCondor to list all of the jobs from one cluster? How about one specific job ID?

Using queue N With Output

When all three jobs in your single cluster are finished, examine the resulting files.

Using $(Process) to Distinguish Jobs

As you saw with the experiment above, we need a way to separate output (and error) files per job that is queued, not just for the whole cluster of jobs. Fortunately, HTCondor has a way to separate the files easily.

When processing a submit file, HTCondor defines and uses a special variable for the process number of each job. If you write $(Process) in a submit file, HTCondor will replace it with the process number of the job, independently for each job that is queued. For example, you can use the $(Process) variable to define a separate output file name for each job. Suppose the following two lines are in a submit file:

output = my-output-file-$(Process).out
queue 10

Even though the output filename is defined only once, HTCondor will create separate output filenames for each job:

First job my-output-file-0.out
Second job my-output-file-1.out
Third job my-output-file-2.out
...
Last (tenth) job my-output-file-9.out

Let’s see how this works for our program that estimates π.

  1. In your submit file, change the definitions of output and error to use $(Process), in a way that is similar to the example above
  2. Remove any output, error, and log files from previous runs
  3. Submit the updated file
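
For step 1, the changed lines might look roughly like this. The circlepi file name prefix is an assumption for illustration; any naming scheme that includes $(Process) would serve.

```
# Hypothetical file names: one output and one error file per job.
output = circlepi-$(Process).out
error  = circlepi-$(Process).err
log    = circlepi.log
queue 3
```

With queue 3, this sketch would give the jobs the output files circlepi-0.out, circlepi-1.out, and circlepi-2.out, plus matching .err files.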

When all three jobs are finished, examine the resulting files again.

Using $(Cluster) to Separate Files Across Runs

With $(Process), you can get separate output (and error) filenames for each job within a run. However, the next time you submit the same file, all of the output and error files are overwritten by new ones created by the new jobs. Maybe this is the behavior that you want. But sometimes, you may want to separate files by run, as well.

In addition to $(Process), there is also a $(Cluster) variable that you can use in your submit files. It works just like $(Process), except it is replaced with the cluster number of the entire submission. Because the cluster number is the same for all jobs within a single submission, it does not separate files by job within a submission. But when used with $(Process), it can be used to separate files by run. For example, consider this output statement:

output = my-output-file-$(Cluster)-$(Process).out

For one particular run, it might result in output filenames like this:

First job my-output-file-2444-0.out
Second job my-output-file-2444-1.out
Third job my-output-file-2444-2.out
...

If you like, change your submit file from the previous exercise to use both $(Cluster) and $(Process). Submit your file twice to see the separate files for each run. Be careful how many jobs you run total, as the number of output files grows quickly!
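
One possible arrangement is sketched here, assuming you want output and error files separated per job and per run while keeping a single log file per submission. The circlepi file name prefix is again illustrative.

```
# Hypothetical file names: separate output/error per job and per run,
# and one log file for each submission (cluster).
output = circlepi-$(Cluster)-$(Process).out
error  = circlepi-$(Cluster)-$(Process).err
log    = circlepi-$(Cluster).log
queue 3
```

Using $(Cluster) alone in the log file name is enough here because every job in one submission shares the same cluster number.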

Using $(Process) and $(Cluster) in Other Statements

The $(Cluster) and $(Process) variables can be used in any submit file statement, although they are useful in some kinds of statements more than others. For instance, it is hard to imagine a truly good reason to use the $(Process) variable in a rank statement (i.e., for preferring some execute slots over others), and in general the $(Cluster) variable often makes little sense to use.

But in some situations, the $(Process) variable can be very helpful. Common uses are in the following kinds of statements — can you think of a scenario in which each use might be helpful?

Unfortunately, HTCondor does not let you perform math on the $(Process) number when using it. So, for example, if you use $(Process) as a numeric argument to a command, it will always result in jobs getting the arguments 0, 1, 2, and so on. If you have control over your program and the way in which it uses command-line arguments, then you are fine. Otherwise, you might need to transform the $(Process) numbers into something more appropriate using a wrapper script, which will be discussed on Wednesday.
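
To make the limitation concrete, here is a small sketch that passes the process number directly as the argument to the circlepi program from earlier in this exercise. It is meant only as an illustration of the behavior described above, not as a recommended configuration.

```
# Illustration only: each job receives its own process number as its argument.
executable = circlepi
arguments  = $(Process)
queue 3
```

This would run circlepi 0, circlepi 1, and circlepi 2. Those sample counts are far too small to produce useful estimates, which is why the raw process number often needs to be transformed before a program can use it.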

(Optional) Defining JobBatchName for Tracking

During the lecture, it was mentioned that you can define arbitrary attributes in your submit file, and that one purpose of such attributes is to track or report on different jobs separately. In this optional exercise, you will see how this technique can be used.

Once again, we will use sleep jobs, so that your jobs remain in the queue long enough to experiment on.

  1. Create a basic submit file that runs sleep 120 (or some reasonable duration).
  2. Instead of a single queue statement, write this:

     jobbatchname = 1
     queue 5

     The first statement gives the extra attribute jobbatchname to your jobs. The five jobs from this submission get one value, and the five jobs from the next submission will get another.
  3. Submit the file.
  4. Now, quickly edit the submit file to instead say:

     jobbatchname = 2

  5. Submit the file again.

Check on the submissions using a normal condor_q and condor_q -nobatch. Of course, your special attribute does not appear in the condor_q -nobatch output, but it is present in the condor_q output and in each job’s ClassAd. You can see the effect of the attribute by limiting your condor_q output to one type of job or another. First, run this command:

$ condor_q -constraint 'JobBatchName == "1"'

Do you get the output that you expected?

Using the example command above, how would you list your other five jobs?