
A few tips and tricks

Objective

This exercise will teach you a few nifty commands to help you use Condor more easily.

Tips for condor_q

Curious where your jobs are running? The -run option lists the host for each running job. (Idle jobs are not shown.)

$ condor_q -run -nobatch

-- Submitter: frontal.cci.ucad.sn : <10.0.0.252:9645> : frontal.cci.ucad.sn
 ID      OWNER           SUBMITTED     RUN_TIME HOST(S)
  23.44  kagross         8/18 14:51   0+00:00:42 [email protected]
  23.45  kagross         8/18 14:51   0+00:00:37 [email protected]
  23.46  kagross         8/18 14:51   0+00:00:32 [email protected]
  23.47  kagross         8/18 14:51   0+00:00:27 [email protected]
  23.48  kagross         8/18 14:51   0+00:00:20 [email protected]
  23.49  kagross         8/18 14:51   0+00:00:14 [email protected]
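
When many jobs are running, a per-host count can be easier to read than the raw listing. The following is a minimal sketch, not part of Condor: the helper name count_jobs_per_host is made up, the host names in the demo are hypothetical, and it assumes each job row starts with a CLUSTER.PROC ID and ends with the host, as in the output above.

```shell
# Count running jobs per execute host from `condor_q -run -nobatch` output on stdin.
# Assumption: job rows start with CLUSTER.PROC and the host is the last field.
count_jobs_per_host() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ {
             host = $NF
             sub(/^.*@/, "", host)   # drop a leading "slotN@" if present
             count[host]++
         }
         END { for (h in count) print count[h], h }' | sort -k1,1nr -k2,2
}

# Demo on made-up rows; the host names are hypothetical.
count_jobs_per_host <<'EOF'
 ID      OWNER           SUBMITTED     RUN_TIME HOST(S)
  23.44  kagross         8/18 14:51   0+00:00:42 slot1@node1.example.org
  23.45  kagross         8/18 14:51   0+00:00:37 slot2@node1.example.org
  23.46  kagross         8/18 14:51   0+00:00:32 slot1@node2.example.org
EOF
```

On a real submit host you would pipe the command into the helper instead: condor_q -run -nobatch | count_jobs_per_host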

condor_q can also show you a job's ClassAd. Recall the discussion of ClassAds from the lecture. For instance, you can look at the ClassAd for a single job:

$ condor_q -l 23.0

MaxHosts = 1
User = "[email protected]"
OnExitHold = false
CoreSize = 0
MachineAttrCpus0 = 1
WantRemoteSyscalls = false
MyType = "Job"
Rank = 0.0
CumulativeSuspensionTime = 0
MinHosts = 1
PeriodicHold = false
PeriodicRemove = false
Err = "simple.49.error"
ProcId = 49
EnteredCurrentStatus = 1408374244
UserLog = "/home/kagross/condor-test/s
... output trimmed ... 

There are some interesting parts you can check out.

How many times has this job run? (It might be more than one if there were recoverable errors.)

$ condor_q -l 23.0 | grep JobRunCount
JobRunCount = 1

Where is the user log for this job? This is helpful when you assist someone else in debugging and they're not sure.

$ condor_q -l 23.0 | grep UserLog
UserLog = "/home/kagross/condor-test/simple.47.log"
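
Note that grep matches any line containing the text, so a short attribute name can pick up unrelated attributes. The sketch below is one possible stricter alternative: a hypothetical helper, classad_attr, that prints the value only when the attribute name matches exactly. It assumes the "Name = Value" layout shown above.

```shell
# Print the value of one attribute from `condor_q -l` output read on stdin.
# Assumption: each attribute is on one line in the form "Name = Value".
classad_attr() {
    awk -v name="$1" '
        {
            eq = index($0, " = ")
            if (eq > 0 && substr($0, 1, eq - 1) == name) {
                print substr($0, eq + 3)
            }
        }'
}

# Demo on a few lines taken from the ClassAd output above.
classad_attr JobRunCount <<'EOF'
MaxHosts = 1
JobRunCount = 1
UserLog = "/home/kagross/condor-test/simple.47.log"
EOF
```

On a real submit host: condor_q -l 23.0 | classad_attr UserLog. Recent versions of condor_q also offer an -af (autoformat) option that prints selected attributes directly, for example condor_q 23.0 -af JobRunCount UserLog; check the manual page for your version.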

What are the job's requirements? Condor automatically fills some in for you to make sure your job runs on a reasonable computer in our cluster, but you can override any of these. I've broken the output into multiple lines and annotated each clause.

$ condor_q -l 23.0 | grep Requirements
Requirements = ( TARGET.Arch == "X86_64" )  # Run on a 64-bit computer
    && ( TARGET.OpSys == "LINUX" )  # Make sure you run on Linux
    && ( TARGET.Disk >= RequestDisk )  # Make sure the disk Condor uses has enough free space
    && ( TARGET.Memory >= RequestMemory )  # Make sure the computer has enough memory
    && ( TARGET.HasFileTransfer )  # Only run on a computer that can accept your files
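
Condor prints the Requirements expression on a single line, so the line breaks above were added by hand. A rough way to get a similar layout automatically is sketched here; split_requirements is a hypothetical helper, and it splits on every "&&", including any nested inside parentheses, so treat the result as a reading aid only.

```shell
# Put each "&&" clause of a one-line Requirements expression on its own line.
split_requirements() {
    # In the gsub replacement, "&" stands for the matched text (" && "),
    # so each conjunction moves to the start of a new, indented line.
    awk '{ gsub(/ && /, "\n   &"); print }'
}

# Demo on a shortened version of the expression above.
split_requirements <<'EOF'
Requirements = ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.HasFileTransfer )
EOF
```

On a real submit host: condor_q -l 23.0 | grep Requirements | split_requirements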

What else can you find that's interesting in the ClassAd?

Removing jobs

If you submit a job that you realize has a problem, you can remove it with condor_rm. For example:

$ condor_q -nobatch

-- Submitter: osg-ss-submit.chtc.wisc.edu : <128.104.100.55:9618?sock=28867_10e4_2> : osg-ss-submit.chtc.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
  29.0   roy             6/21 15:23   0+00:00:00 I  0   0.7  simple 60 10      

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

$ condor_rm 29.0
Job 29.0 marked for removal

$ condor_q -nobatch

-- Submitter: osg-ss-submit.chtc.wisc.edu : <128.104.100.55:9618?sock=28867_10e4_2> : osg-ss-submit.chtc.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
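
Before removing several jobs it helps to list exactly which IDs you are about to target. The sketch below pulls the IDs of idle jobs out of condor_q -nobatch output. The helper name idle_job_ids is made up, and it assumes the column layout shown above, where ST is the sixth whitespace-separated field.

```shell
# Print the IDs of idle jobs from `condor_q -nobatch` output read on stdin.
idle_job_ids() {
    # A job row starts with CLUSTER.PROC; field 6 is the ST column in this layout.
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ && $6 == "I" { print $1 }'
}

# Demo: the first row is from the example above, the second is made up.
idle_job_ids <<'EOF'
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  29.0   roy             6/21 15:23   0+00:00:00 I  0   0.7  simple 60 10
  29.1   roy             6/21 15:23   0+00:00:05 R  0   0.7  simple 60 10
EOF
```

Review the printed IDs first, then pass the ones you really want to remove to condor_rm.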

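A few tips: giving condor_rm only a cluster number (for example, condor_rm 29) removes every job in that cluster, and condor_rm -all removes all of your jobs.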

Historical information

You can see information about jobs that have completed and are no longer in the queue with the condor_history command. You rarely want to see every job, so try looking at just your own:

$ condor_history YOUR_USER_ID

For example:

$ condor_history kagross
   9.9   kagross         7/31 12:44   0+00:00:03 C   7/31 12:44 /home/kagross/simple 9 9
   9.8   kagross         7/31 12:44   0+00:00:03 C   7/31 12:44 /home/kagross/simple 8 9
   9.11  kagross         7/31 12:44   0+00:00:03 C   7/31 12:44 /home/kagross/simple 11 9
   9.7   kagross         7/31 12:44   0+00:00:03 C   7/31 12:44 /home/kagross/simple 7 9
   9.5   kagross         7/31 12:44   0+00:00:02 C   7/31 12:44 /home/kagross/simple 5 9
   9.6   kagross         7/31 12:44   0+00:00:02 C   7/31 12:44 /home/kagross/simple 6 9
   9.3   kagross         7/31 12:44   0+00:00:02 C   7/31 12:44 /home/kagross/simple 3 9
   9.2   kagross         7/31 12:44   0+00:00:02 C   7/31 12:44 /home/kagross/simple 2 9
   9.1   kagross         7/31 12:44   0+00:00:03 C   7/31 12:44 /home/kagross/simple 1 9
   9.0   kagross         7/31 12:44   0+00:00:03 C   7/31 12:44 /home/kagross/simple 
   9.4   kagross         7/31 12:44   0+00:00:01 C   7/31 12:44 /home/kagross/simple 4 9
   8.0   kagross         7/31 12:42   0+00:00:07 C   7/31 12:42 /home/kagross/simple 4 10
...
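
The history listing can also be summarized. As a minimal sketch, the hypothetical helper below adds up the RUN_TIME column, assuming the layout shown above: the job ID is the first field and the run time, in DAYS+HH:MM:SS form, is the fifth.

```shell
# Sum the RUN_TIME column of condor_history output (read on stdin), in seconds.
total_run_seconds() {
    awk '$1 ~ /^[0-9]+\.[0-9]+$/ {
             split($5, t, /[+:]/)   # t[1]=days, t[2]=hours, t[3]=minutes, t[4]=seconds
             total += t[1] * 86400 + t[2] * 3600 + t[3] * 60 + t[4]
         }
         END { print total + 0 }'
}

# Demo on three rows from the listing above (3 + 2 + 7 seconds).
total_run_seconds <<'EOF'
   9.9   kagross         7/31 12:44   0+00:00:03 C   7/31 12:44 /home/kagross/simple 9 9
   9.5   kagross         7/31 12:44   0+00:00:02 C   7/31 12:44 /home/kagross/simple 5 9
   8.0   kagross         7/31 12:42   0+00:00:07 C   7/31 12:42 /home/kagross/simple 4 10
EOF
```

On a real submit host: condor_history YOUR_USER_ID | total_run_seconds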