Installing and Maintaining HTCondor-CE¶
The HTCondor-CE software is a job gateway for an OSG Compute Entrypoint (CE). As such, the OSG will submit resource allocation requests (RARs) to your HTCondor-CE, and it will handle authorization and delegation of RARs to your local batch system. In OSG today, RARs are sent to CEs as pilot jobs from a factory; these pilots are in turn able to accept and run end-user jobs. See the upstream documentation for a more detailed introduction.
Use this page to learn how to install, configure, run, test, and troubleshoot an OSG HTCondor-CE.
OSG Hosted CE
Unless you plan to run more than 10k concurrent RARs or make frequent configuration changes, we suggest requesting an OSG Hosted CE.
If you are installing an HTCondor-CE for use outside of the OSG, consult the upstream documentation instead.
Before starting the installation process, consider the following points, consulting the upstream references as needed (HTCondor-CE 5):
- User IDs: If they do not exist already, the installation will create the Linux users condor (UID 4716) and gratia. You will also need to create Unix accounts for each collaboration that you wish to support. See details in the 'Configuring authentication' section below.
- SSL certificate: The HTCondor-CE service uses a host certificate and an accompanying key.
  - If using a Let's Encrypt cert, install these as
  - If using an IGTF cert, install these as
  See details in the Host Certificates overview.
- DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host.
- Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP).
- Access point/login node: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster.
- File Systems: Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes.
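Some of the points above can be checked from a shell before installing. The following is a minimal sketch, not an official OSG tool: the function names user_exists, dns_round_trip, and check_ce_prereqs are hypothetical, and it assumes the id, getent, and awk utilities are available on the host. Note that getent consults every configured name service (including /etc/hosts), so a passing result is only an approximation of a true DNS check.

```shell
#!/bin/sh
# Hypothetical pre-installation helpers (illustrative names, not part of OSG software).

# Succeeds if a local account with the given name exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Succeeds if the name resolves to an address (forward lookup) and that
# address resolves back to some name (reverse lookup).
dns_round_trip() {
    name="$1"
    ip="$(getent hosts "$name" 2>/dev/null | awk 'NR == 1 { print $1 }')"
    [ -n "$ip" ] || return 1
    getent hosts "$ip" >/dev/null 2>&1
}

# Prints one line per check; returns non-zero if the DNS check failed.
# Missing condor/gratia users are only reported, since the installation creates them.
check_ce_prereqs() {
    fqdn="$1"
    status=0
    for user in condor gratia; do
        if user_exists "$user"; then
            echo "ok: user $user exists"
        else
            echo "note: user $user is missing (the installation will create it)"
        fi
    done
    if dns_round_trip "$fqdn"; then
        echo "ok: forward and reverse lookups resolve for $fqdn"
    else
        echo "FAIL: forward or reverse lookup does not resolve for $fqdn"
        status=1
    fi
    return "$status"
}
```

On the intended CE host this could be run as check_ce_prereqs "$(hostname -f)". It does not test whether port 9619 is reachable by the pilot factories, which has to be verified from outside your network.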
As with all OSG software installations, there are some one-time (per host) steps to prepare in advance:
- Ensure the host has a supported operating system
- Install the appropriate EPEL and OSG Yum repositories for your operating system.
- Obtain root access to the host
- Install CA certificates
An HTCondor-CE installation consists of the job gateway (i.e., the HTCondor-CE job router) and other support software (e.g., osg-configure, a Gratia probe for OSG accounting).
To simplify installation, OSG provides convenience RPMs that install all required software.
Clean yum cache:
root@host # yum clean all --enablerepo=*
Update software:
root@host # yum update
This command will update all packages.
(Optional) If your batch system is already installed via non-RPM means and is in the following list, install the appropriate 'empty' RPM. Otherwise, skip to the next step.
| If your batch system is… | Then run the following command… |
|---|---|
| HTCondor | yum install empty-condor --enablerepo=osg-empty |
| Slurm | yum install empty-slurm --enablerepo=osg-empty |
(Optional) If your HTCondor batch system is already installed via non-OSG RPM means, add the line below to
/etc/yum.repos.d/osg.repo. Otherwise, skip to the next step.
Select the appropriate convenience RPM:
If your batch system is... Then use the following package... HTCondor
Install the CE software, where <PACKAGE> is the package you selected in the above step:
root@host # yum install <PACKAGE>
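The installation steps above can be summarized as a dry-run helper that only prints the commands it would run. This is a sketch under stated assumptions: install_plan is a hypothetical name, the package argument stands in for whichever convenience RPM you selected, and only the two 'empty' RPMs named on this page are handled.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the yum commands for a CE installation
# without executing anything.
#   $1 = convenience RPM selected for your batch system (placeholder, supplied by you)
#   $2 = optional: "htcondor" or "slurm" if that batch system was installed
#        by non-RPM means and therefore needs the matching 'empty' RPM
install_plan() {
    package="$1"
    preinstalled="${2:-}"
    [ -n "$package" ] || { echo "usage: install_plan PACKAGE [htcondor|slurm]" >&2; return 2; }

    echo "yum clean all --enablerepo=*"
    echo "yum update"
    case "$preinstalled" in
        htcondor) echo "yum install empty-condor --enablerepo=osg-empty" ;;
        slurm)    echo "yum install empty-slurm --enablerepo=osg-empty" ;;
        "")       ;;  # batch system installed from RPMs: no 'empty' RPM needed
        *)        echo "no 'empty' RPM known for: $preinstalled" >&2; return 1 ;;
    esac
    echo "yum install $package"
}
```

Printing the plan first makes it easy to review the optional step before running anything as root.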
There are a few required configuration steps to connect HTCondor-CE with your batch system and authentication method. For more advanced configuration, see the section on optional configurations.
Configuring the local batch system¶
To configure HTCondor-CE to integrate with your local batch system, please refer to the upstream documentation.
Configuring authentication¶
HTCondor-CE clients will submit RARs accompanied by bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client. The osg-scitokens-mapfile, pulled in by the osg-ce package, provides default token to local user mappings.
To accept RARs from a particular collaboration:
Create the Unix account(s) corresponding to the last field in the default mapfile: /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf. For example, to add support for the OSPool, create the osg user account on the CE and across your cluster.
(Optional) If you wish to change the user mapping, copy the relevant mapping from the default mapfile to a file in /etc/condor-ce/mapfiles.d/ and change the last field to the desired username. For example, if you wish to add support for the OSPool but prefer to map OSPool pilot jobs to the osgpilot account that you created on your CE and across your cluster, you could add the following:
# OSG
SCITOKENS /^https\:\/\/scitokens\.org\/osg\-connect,/ osgpilot
For more details of the mapfile format, consult the "SciTokens" section of the upstream documentation.
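To make the mapping rule concrete, the sketch below imitates how a single SCITOKENS line relates a token identity to a local user. It is an illustration only, not how HTCondor-CE evaluates mapfiles internally: map_identity is a hypothetical helper, the identity is assumed to be the string "<issuer>,<subject>", and the pattern is written as a grep extended regular expression without the redundant escapes used in the mapfile.

```shell
#!/bin/sh
# Hypothetical illustration of one mapfile rule:
#   SCITOKENS /^https\:\/\/scitokens\.org\/osg\-connect,/ osgpilot
# The identity argument is assumed to have the form "<issuer>,<subject>".
map_identity() {
    identity="$1"
    # Anchored at the start and ending in a comma, so the issuer must match
    # exactly while any subject is accepted.
    if printf '%s\n' "$identity" | grep -Eq '^https://scitokens\.org/osg-connect,'; then
        echo "osgpilot"
    else
        return 1  # no rule matched: the identity is not mapped
    fi
}
```

For example, map_identity 'https://scitokens.org/osg-connect,pilot' prints osgpilot, while an identity from any other issuer returns a non-zero status. The escaped dot matters: without it, the pattern would also accept issuers where some other character appears in place of the dot.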
Banning a collaboration¶
Note that if you have not created the mapped user per the above section, it is not strictly necessary to add a ban mapping. HTCondor-CE will only authenticate remote RAR submission for the relevant credential if the Unix user exists.
To explicitly ban a remote submitter from your HTCondor-CE, add a line like the following to a file in
SCITOKENS /<TOKEN ISSUER>,<TOKEN SUBJECT>/ <USER>@banned.htcondor.org
Replace the credential pattern (<TOKEN ISSUER>,<TOKEN SUBJECT>) with a regular expression and <USER> with an arbitrary user name.
For example, to ban OSPool pilots from your site, you could add the following to
SCITOKENS /^https\:\/\/scitokens\.org\/osg\-connect,/ osgpilot@banned.htcondor.org
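Writing the escaped pattern by hand is error-prone, so a ban line can also be generated from a plain issuer URL. The helper below is a sketch: ban_line is a hypothetical name, and it escapes only dots and slashes, which is assumed to be sufficient for typical issuer URLs but not for issuers containing other regular expression metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: prints a ban mapping for every subject of one token issuer.
#   $1 = issuer URL, written literally (not as a regular expression)
#   $2 = arbitrary user name to place in front of @banned.htcondor.org
ban_line() {
    issuer="$1"
    user="$2"
    # Escape "." (regex wildcard) and "/" (the pattern delimiter in the mapfile).
    escaped="$(printf '%s' "$issuer" | sed 's#[./]#\\&#g')"
    printf 'SCITOKENS /^%s,/ %s@banned.htcondor.org\n' "$escaped" "$user"
}
```

For example, ban_line 'https://scitokens.org/osg-connect' osgpilot prints a line equivalent to the OSPool example above, without the optional escapes on the colon and hyphen.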
The OSG CE metapackage brings along a configuration tool,
osg-configure, that is designed to automatically configure
the different pieces of software required for an OSG HTCondor-CE:
Enable your batch system in the HTCondor-CE configuration by editing the enabled field in /etc/osg/config.d/20-<YOUR BATCH SYSTEM>.ini:
enabled = True
Read through the other .ini files in the /etc/osg/config.d directory and make any necessary changes. See the osg-configure documentation for details.
Validate the configuration settings:
root@host # osg-configure -v
Fix any errors (at least) that osg-configure reports.
Once the validation command succeeds without errors, apply the configuration settings:
root@host # osg-configure -c
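The enable, validate, and apply steps above can be scripted. The sketch below makes two assumptions that are not stated on this page: that osg-configure -v exits with a non-zero status when validation fails, and that the .ini file contains a single enabled setting. The function names and the OSG_CONFIGURE override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the osg-configure steps.

# Rewrites every uncommented "enabled = ..." line in the given .ini file to
# "enabled = True", leaving all other lines untouched.
enable_batch_system() {
    ini="$1"
    tmp="$(mktemp)" || return 1
    if sed 's/^[[:space:]]*enabled[[:space:]]*=.*/enabled = True/' "$ini" > "$tmp"; then
        # Write back through the existing file so its ownership and mode are kept.
        cat "$tmp" > "$ini"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Runs validation and applies the settings only if validation succeeded.
# OSG_CONFIGURE may name a stand-in command for testing (assumption of this sketch).
validate_then_apply() {
    cmd="${OSG_CONFIGURE:-osg-configure}"
    "$cmd" -v && "$cmd" -c
}
```

Chaining the two commands with && means a failed validation never reaches the apply step. Reviewing the validation output by hand remains worthwhile, since warnings may not affect the exit status.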
In addition to the configurations above, you may need to further configure how pilot jobs are filtered and transformed before they are submitted to your local batch system or otherwise change the behavior of your CE. For detailed instructions, please refer to the upstream documentation.
Accounting with multiple CEs or local user jobs¶
For non-HTCondor batch systems only
If your site has multiple CEs or you have local users submitting to the same local batch system, the OSG accounting software needs to be configured so that it doesn't over-report the number of jobs.
Modify the value of
/etc/gratia/htcondor-ce/ProbeConfig on each of your CE's so that it
Starting and Validating HTCondor-CE¶
For information on how to start and validate the core HTCondor-CE services, please refer to the upstream documentation.
For information on how to troubleshoot your HTCondor-CE, please refer to the upstream documentation.
Registering the CE¶
To contribute capacity, your CE must be registered with the OSG Consortium. To register your resource:
Identify the facility, site, and resource group where your HTCondor-CE is hosted. For example, the Center for High Throughput Computing at the University of Wisconsin-Madison uses the following information:
Facility: University of Wisconsin
Site: CHTC
Resource Group: CHTC
To get assistance, please use this page.