OSPool Site Admin Documentation
Monitoring and Information
Viewing researcher jobs running within a glidein job
-
Log in to a worker node as the OSPool user (e.g., “osg01” but may be different at your site)
-
Pick a glidein (and its directory):
Note: SCRATCH is the path of the scratch directory you gave us to put glideins in.
-
Option 1: List HTCondor processes, pick one, and note its glidein directory:
$ ps -u osg01 -f | grep master osg01 4122304 [...] SCRATCH/glide_qJDh7z/main/condor/sbin/condor_master [...]
-
Option 2: List glidein directories and pick an active one:
$ ls -l SCRATCH/glide_*/_GLIDE_LEASE_FILE
Pick a “glide_xxxxxx” directory that has a lease file with a timestamp less than about 5 minutes old.
Note: Let GLIDEDIR be the path to the chosen glidein directory, e.g.,
SCRATCH/glide_xxxxxx
-
-
Make sure HTCondor is recent enough:
$ GLIDEDIR/main/condor/bin/condor_version $CondorVersion: 24.6.1 2025-03-20 BuildID: 794846 PackageID: 24.6.1-1 $ $CondorPlatform: x86_64_AlmaLinux9 $
The Condor Version should be 2.7.0 or later; if not, go back to step 2 and pick a different glidein directory.
-
Pick the PID of an HTCondor “startd” process to query:
-
If you are just exploring, run the following command and pick any “condor_startd” process — it does not matter if it is associated with the HTCondor instance identified in steps 2–3 above:
$ ps -u osg01 -f | grep condor_startd
-
If you are looking for the “startd” associated with some other process, start with a process tree:
$ ps -u osg01 -f --forest
Find the process of interest, then work upward in the process tree to the nearest ancestor “condor_startd” process.
-
In either case, note the PID (leftmost number) on the “condor_startd” line you picked.
-
-
Run
condor_who
on the PID you picked:$ GLIDEDIR/main/condor/bin/condor_who -pid PID -ospool Batch System : SLURM Batch Job : 346675 Birthdate : 2025-04-07 14:47:02 Temp Dir : /var/lib/condor/execute/osg01/glide_qJDh7z/tmp Startd : [email protected] has 1 job(s) running: PROJECT USER AP_HOSTNAME JOBID RUNTIME MEMORY DISK CPUs EFCY PID STARTER Inst-Proj jsmith ap20.uc.osg-htc.org 27781234.0 0+00:17:43 512.0 MB 129.0 MB 1 0.00 4124321 4123123 …
Definitions of fields in the header (before the PROJECT USER … row):
Batch System | HTCondor’s name for the type of batch system you are running. |
---|---|
Batch Job | The identifier for this glidein job in your batch system. |
Birthdate | When HTCondor began running within this glidein; typically, this is a few minutes after the glidein job itself began running. |
Temp Dir | The path to the glidein job directory (remove the trailing “/tmp”) |
Startd | HTCondor’s identifier for its “startd” process within the glidein job. |
Definitions of fields in each row of the researcher job table:
PROJECT | The OSPool project identifier for this researcher job |
---|---|
USER | The OSPool AP’s user identifier for this researcher job |
AP_HOSTNAME | The OSPool AP’s hostname |
JOBID | HTCondor’s identifier for this researcher job on its AP |
RUNTIME | HTCondor’s value for the runtime of the researcher job |
MEMORY | The amount of memory (RAM), in MB, that HTCondor allocated to this researcher job |
DISK | The amount of disk, in KB, that HTCondor allocated to this researcher job |
CPUs | The number of CPU cores that HTCondor allocated to this researcher job |
EFCY | An HTCondor measure of the efficiency of the job, roughly calculated as CPU time / wallclock time; a value noticeably greater than the CPUs value may mean the researcher job is using more cores than requested |
PID | The local PID of the researcher job, or more often, of the root of the process tree for the researcher job (e.g., this could be a wrapper script or even Singularity or Apptainer for a researcher job in a container) |
STARTER | The local PID of the HTCondor “starter” process that owns this research job |
Viewing researcher jobs running within all OSPool glidein jobs
Note: Steps 1–3 are the same as above.
-
Log in to a worker node as the OSPool user (e.g., “osg01” but may be different at your site)
-
Pick a glidein (and its directory):
Note: SCRATCH is the path of the scratch directory you gave us to put glideins in.
-
Option 1: List HTCondor processes, pick one, and note its glidein directory:
$ ps -u osg01 -f | grep master osg01 4122304 [...] SCRATCH/glide_qJDh7z/main/condor/sbin/condor_master [...]
-
Option 2: List glidein directories and pick an active one:
$ ls -l SCRATCH/glide_*/_GLIDE_LEASE_FILE
Pick a “glide_xxxxxx” directory that has a lease file with a timestamp less than about 5 minutes old.
Note: Let GLIDEDIR be the path to the chosen glidein directory, e.g.,
SCRATCH/glide_xxxxxx
-
-
Make sure HTCondor is recent enough:
$ GLIDEDIR/main/condor/bin/condor_version $CondorVersion: 24.6.1 2025-03-20 BuildID: 794846 PackageID: 24.6.1-1 $ $CondorPlatform: x86_64_AlmaLinux9 $
The Condor Version should be 2.7.0 or later; if not, go back to step 2 and pick a different glidein directory.
-
Run
condor_who
on all discoverable glideins on this host running as the current user:$ GLIDEDIR/main/condor/bin/condor_who -allpids -ospool Batch System : SLURM Batch Job : 346675 Birthdate : 2025-04-07 14:47:02 Temp Dir : /var/lib/condor/execute/osg01/glide_qJDh7z/tmp Startd : [email protected] has 1 job(s) running: PROJECT USER AP_HOSTNAME JOBID RUNTIME MEMORY DISK CPUs EFCY PID STARTER Inst-Proj jsmith ap20.uc.osg-htc.org 27781234.0 0+00:17:43 512.0 MB 129.0 MB 1 0.00 4124321 4123123 …
For each glidein job, there will be one set of heading lines (“Batch System”, etc.) and a table of researcher jobs, one per line; format and definitions are as above.