Worker Node Overview¶
The Worker Node Client is a collection of useful software components that is expected to be on every OSG worker node. In addition, a job running on a worker node can access a handful of environment variables that can be used to locate resources.
This page describes how to initialize the environment of your job to correctly access the execution and data areas from the worker node.
The OSG provides no scientific software dependencies or software build tools on the worker node; you are expected to bring along all application-level dependencies yourself (preferred; most portable) or utilize CVMFS. Sites are not required to provide any specific tools (`gcc`, `lapack`, `blas`, etc.) beyond the ones in the OSG worker node client and the base OS.
If you would like to test the minimal OS environment that jobs can expect, you can test out your scientific software in the OSG Docker image.
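A hedged example of such a test (the image name and tag below are assumptions for illustration; consult the OSG documentation or Docker Hub for the current image and tags):

```bash
# Start an interactive shell in the OSG worker node image to test
# that your scientific software runs in the minimal OS environment.
docker run --rm -it opensciencegrid/osg-wn:3.6-el8 /bin/bash
```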
**Filling local scratch disk**

The directory specified by the `OSG_WN_TMP` environment variable is used by pilot jobs as a temporary staging area for user job data during the lifetime of the pilot. If many pilot jobs do not exit cleanly (e.g., due to preemption), the local scratch directory may fill up, which could negatively affect other jobs running on the impacted node. See the section for site administrators below for mitigation suggestions.
Hardware Recommendations¶
Hardware | Minimum | Recommended | Notes |
---|---|---|---|
Cores per pilot | 1 | 8 | Depends on the supported VOs. The total core count on every node in the cluster must be divisible by the cores per pilot. |
Memory per core | 1024 MB | 2048 MB | Memory per core times cores per pilot must not exceed the total memory on any node. Do not overcommit. |
Scratch disk per core (`OSG_WN_TMP`) | 2 GB | 10 GB | This can be overcommitted if a mix of different VO jobs is expected. |
CVMFS cache per node (optional) | 10 GB | 20 GB | |
Common Software Available on Worker Nodes¶
The OSG worker node environment contains the following software:
- Data and related tooling:
    - The supported set of CA certificates (located in `$X509_CERT_DIR` after the environment is set up)
    - VO authentication: `vo-client`
    - Update Certificate Revocation Lists: `fetch-crl`
- Proxy management tools:
    - Create proxies: `voms-proxy-init`
    - Show proxy info: `voms-proxy-info`
    - Destroy the current proxy: `voms-proxy-destroy`
- Data transfer tools:
    - HTTP/plain FTP protocol tools (via system dependencies): `wget` and `curl`, standard tools for downloading files with HTTP and FTP
    - Transfer clients:
        - `GFAL`-based clients (`gfal-copy` and others). GFAL supports the SRM, XRootD, and HTTP protocols.
        - The `stashcp` data federation client
        - The XRootD command-line client, `xrdcp`
- Troubleshooting tool: `osg-system-profiler`
At some sites, these tools may not be available at pilot launch. To set up the environment, run:

```
user@host $ source $OSG_GRID/setup.sh
```
This should be done by a pilot job, not by the end-user payload.
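Since `setup.sh` may not exist at every site, a pilot wrapper can guard the sourcing step. Below is a minimal sketch (the surrounding wrapper logic is an assumption for illustration, not part of any official pilot):

```bash
# Pilot-side sketch: source the worker node environment only if present.
# $OSG_GRID is advertised by the site; setup.sh adds the WN binaries to $PATH.
if [ -n "$OSG_GRID" ] && [ -f "$OSG_GRID/setup.sh" ]; then
    source "$OSG_GRID/setup.sh"
fi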
The Worker Node Environment¶
The following table outlines the various important directories and information in the worker node environment.
A job running on an OSG worker node can refer to each directory using the corresponding environment variable.
Several of them are defined as options in your OSG-Configure `.ini` files in `/etc/osg/config.d`.
Custom variables and those that aren't listed may be defined in the Local Settings section.
Environment Variable | OSG-Configure section/option | Purpose | Notes |
---|---|---|---|
`$OSG_GRID` | `Storage`/`grid_dir` | Location of additional environment variables | Pilots should source `$OSG_GRID/setup.sh` to guarantee that the environment contains the worker node binaries in `$PATH`. |
`$OSG_SQUID_LOCATION` | `Squid`/`location` | Location of an HTTP caching proxy server | Utilize this service for downloading files via HTTP in cache-friendly workflows. |
`$OSG_WN_TMP` | `Storage`/`worker_node_temp` | Temporary scratch workspace for pilot jobs | Local to each worker node. See the `OSG_WN_TMP` section below for details. |
`$X509_CERT_DIR` | | Location of the CA certificates | If not defined, defaults to `/etc/grid-security/certificates`. |
`$_CONDOR_SCRATCH_DIR` | | Suggested temporary storage for glideinWMS-based payloads | Users should prefer this environment variable over `$OSG_WN_TMP` when running inside glideinWMS. |
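As an illustration of how a payload might consume these variables, here is a hedged sketch (the fallback order and download URL are assumptions; end-users should follow their VO's conventions):

```bash
# Payload-side sketch: pick a scratch directory, preferring the glideinWMS
# scratch area, then $TMPDIR, then the shared $OSG_WN_TMP.
SCRATCH="${_CONDOR_SCRATCH_DIR:-${TMPDIR:-$OSG_WN_TMP}}"
cd "$SCRATCH"

# Route HTTP downloads through the site Squid proxy when one is advertised.
if [ -n "$OSG_SQUID_LOCATION" ] && [ "$OSG_SQUID_LOCATION" != "UNAVAILABLE" ]; then
    export http_proxy="$OSG_SQUID_LOCATION"
fi
wget -q http://example.com/input.dat  # placeholder URL
```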
OSG_WN_TMP¶
As described above, `OSG_WN_TMP` is a temporary storage area on each worker node for pilot jobs to use as scratch space. Its value is set through the configuration of your CE.
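For example, a site might set it via the `Storage` section of an OSG-Configure file (a sketch; the file name and paths are placeholder assumptions for illustration):

```
; /etc/osg/config.d/10-storage.ini (illustrative file name and values)
[Storage]
grid_dir = /etc/osg/wn-client
worker_node_temp = /scratch
```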
For site administrators¶
**Filling local scratch disk**

The directory specified by the `OSG_WN_TMP` environment variable is used by pilot jobs as a temporary staging area for user job data during the lifetime of the pilot. If many pilot jobs do not exit cleanly (e.g., due to preemption), the local scratch directory may fill up, which could negatively affect other jobs running on the impacted node.

Site administrators are responsible for cleaning up the contents of `$OSG_WN_TMP` (see the table above for size recommendations).
We recommend one of the following solutions:
- (Recommended) Use batch-system capabilities to create directories in the job scratch directory and bind-mount them for the job so that the batch system performs the cleanup.
    - For HTCondor batch systems, HTCondor has this ability through `MOUNT_UNDER_SCRATCH`:

            MOUNT_UNDER_SCRATCH = $(MOUNT_UNDER_SCRATCH), <PATH TO OSG_WN_TMP>

        If using this method, space set aside for `OSG_WN_TMP` should be reallocated to the partition containing the job scratch directories. If using HTCondor, this will be the partition containing the path defined by the HTCondor `EXECUTE` configuration variable.
    - For Slurm batch systems, we recommend using the Lua plugin Slurm-tmpdir alongside prolog/epilog scripts (https://slurm.schedmd.com/prolog_epilog.html). This method creates per-job `/scratch` and `/tmp` directories that are cleaned up after the job completes.
- Periodically purge the directory, e.g. with `tmpwatch` (see the example cron entry below).
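As a hedged illustration of the `tmpwatch` approach (the path, schedule, and retention period are placeholder assumptions; adjust them to your site's `worker_node_temp` setting):

```bash
# Example cron entry: remove files under OSG_WN_TMP whose modification
# time is older than 240 hours (10 days). The path is illustrative.
0 5 * * * root /usr/sbin/tmpwatch --mtime 240 /scratch/osg-wn-tmp
```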
**Job removal grace periods**

Additionally, increasing the batch system grace period for job removal will give pilot jobs a better chance of cleaning up after themselves. For example, the time between `scancel` triggering a `SIGTERM` and a `SIGKILL` is controlled by the value of the Slurm `KillWait` configuration. Consider scaling this grace period with the number of cores given to a pilot job, as there could be more data to clean up with an increasing core count.
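For instance, a site might raise the grace period in `slurm.conf` (the value below is an illustrative assumption, not a recommendation):

```
# slurm.conf: seconds between SIGTERM and SIGKILL on job termination
KillWait=120
```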
For VO managers¶
**Note**

The following advice applies to VO managers or maintainers of pilot software; end-users should contact their VO for the proper locations to stage temporary work (often, this will be either `$TMPDIR` or `$_CONDOR_SCRATCH_DIR`).

Be careful with using `$OSG_WN_TMP`; at some sites, this directory might be shared with other VOs. We recommend creating a new sub-directory as a precaution:
```bash
# Create a VO-specific area under the shared scratch directory
mkdir -p "$OSG_WN_TMP/MYVO"
# Make a private working directory inside it (mktemp requires the X's)
mydir=$(mktemp -d "$OSG_WN_TMP/MYVO/job.XXXXXXXX")
export mydir
cd "$mydir"
# Run the rest of your application
cd /
rm -rf "$mydir"
```
The pilot should utilize `$TMPDIR` to communicate the location of temporary storage to payloads.
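A minimal pilot-side sketch of this convention (assuming `$mydir` was created as in the example above):

```bash
# Advertise the per-job working directory to the payload
export TMPDIR="$mydir"
```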
A significant number of sites use the batch system to make an independent directory for each user job and change `$OSG_WN_TMP` on the fly to point to this directory. There is no way to know in advance how much scratch disk space any given worker node has available; recall that the available disk space may be shared among a number of job slots.