
Installing and Maintaining HTCondor-CE

The HTCondor-CE software is a job gateway for an OSG Compute Entrypoint (CE). The OSG submits resource allocation requests (RARs) to your HTCondor-CE, which handles authorization and delegates the RARs to your local batch system. In OSG today, RARs are sent to CEs as pilot jobs from a factory; once running, these pilots accept and run end-user jobs. See the upstream documentation for a more detailed introduction.

Use this page to learn how to install, configure, run, test, and troubleshoot an OSG HTCondor-CE.

OSG Hosted CE

Unless you plan to run more than 10k concurrent RARs or to make frequent configuration changes, we suggest requesting an OSG Hosted CE.

Note

If you are installing an HTCondor-CE for use outside of the OSG, consult the upstream documentation instead.

Before Starting

Before starting the installation process, consider the following points, consulting the upstream references as needed (HTCondor-CE 5):

  • User IDs: If they do not exist already, the installation will create the Linux users condor (UID 4716) and gratia. You will also need to create Unix accounts for each collaboration that you wish to support. See details in the 'Configuring authentication' section below.
  • SSL certificate: The HTCondor-CE service uses a host certificate at /etc/grid-security/hostcert.pem and an accompanying key at /etc/grid-security/hostkey.pem
  • DNS entries: Forward and reverse DNS must resolve for the HTCondor-CE host
  • Network ports: The pilot factories must be able to contact your HTCondor-CE service on port 9619 (TCP)
  • Access point/login node: HTCondor-CE should be installed on a host that already has the ability to submit jobs into your local cluster
  • File Systems: Non-HTCondor batch systems require a shared file system between the HTCondor-CE host and the batch system worker nodes.
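
Some of the points above can be checked from the command line before installing. The following is a minimal sketch, not an official OSG tool: the function names are hypothetical, the certificate directory defaults to the path named above, and the DNS check assumes a Linux host where getent is available.

```shell
#!/bin/sh
# Illustrative pre-flight checks for an HTCondor-CE host.
# All function names here are hypothetical helpers, not part of HTCondor-CE.

# Succeeds if the host certificate and key are readable in the given
# directory (defaults to /etc/grid-security).
check_host_cert() {
    dir="${1:-/etc/grid-security}"
    [ -r "$dir/hostcert.pem" ] && [ -r "$dir/hostkey.pem" ]
}

# Succeeds if the named Unix account exists (e.g. condor, gratia, or an
# account created for a collaboration).
check_user() {
    id "$1" >/dev/null 2>&1
}

# Succeeds if the host name resolves to an address and that address
# resolves back to some name (assumes getent is available).
check_dns() {
    ip=$(getent hosts "$1" | awk '{print $1; exit}')
    [ -n "$ip" ] || return 1
    getent hosts "$ip" >/dev/null
}
```

For example, `check_host_cert && check_dns "$(hostname -f)"` reports through its exit status whether the certificate and DNS prerequisites appear to be met. Reachability of port 9619 has to be tested from outside the host, so it is not covered here.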

As with all OSG software installations, there are some one-time (per host) steps to prepare in advance:

Installing HTCondor-CE

An HTCondor-CE installation consists of the job gateway (i.e., the HTCondor-CE job router) and other support software (e.g., osg-configure, a Gratia probe for OSG accounting). To simplify installation, OSG provides convenience RPMs that install all required software.

  1. Clean yum cache:

    root@host # yum clean all --enablerepo=*
    
  2. Update software:

    root@host # yum update
    

    This command will update all packages.

  3. (Optional) If your batch system is already installed via non-RPM means and is in the following list, install the appropriate 'empty' RPM. Otherwise, skip to the next step.

    If your batch system is…    Then run the following command…
    HTCondor                    yum install empty-condor --enablerepo=osg-empty
    SLURM                       yum install empty-slurm --enablerepo=osg-empty
  4. (Optional) If your HTCondor batch system is already installed via non-OSG RPM means, add the line below to /etc/yum.repos.d/osg.repo. Otherwise, skip to the next step.

    exclude=condor
    
  5. Select the appropriate convenience RPM:

    If your batch system is...    Then use the following package...
    HTCondor                      osg-ce-condor
    LSF                           osg-ce-lsf
    PBS                           osg-ce-pbs
    SGE                           osg-ce-sge
    SLURM                         osg-ce-slurm
  6. Install the CE software, where <PACKAGE> is the package you selected in the previous step:

    root@host # yum install <PACKAGE>
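
If the installation is scripted, the package selection table above can be expressed as a small lookup. This is only a sketch: ce_package_for is a hypothetical helper name, and the mapping is copied from the table rather than from any OSG tool.

```shell
#!/bin/sh
# Map a batch system name (case-insensitive) to its OSG convenience RPM.
# ce_package_for is a hypothetical helper; the pairs mirror the table above.
ce_package_for() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        htcondor) echo osg-ce-condor ;;
        lsf)      echo osg-ce-lsf ;;
        pbs)      echo osg-ce-pbs ;;
        sge)      echo osg-ce-sge ;;
        slurm)    echo osg-ce-slurm ;;
        *)
            echo "unsupported batch system: $1" >&2
            return 1
            ;;
    esac
}
```

As root, the result could then be passed to yum, for example `yum install "$(ce_package_for SLURM)"`.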
    

Configuring HTCondor-CE

There are a few required configuration steps to connect HTCondor-CE with your batch system and authentication method. For more advanced configuration, see the section on optional configurations.

Configuring the local batch system

To configure HTCondor-CE to integrate with your local batch system, please refer to the upstream documentation.

Configuring authentication

HTCondor-CE clients will submit RARs accompanied by bearer tokens declaring their association with a given collaboration and what permissions the collaboration has given the client. The osg-scitokens-mapfile, pulled in by the osg-ce package, provides default token-to-local-user mappings. To accept RARs from a particular collaboration:

  1. Create the Unix account(s) corresponding to the last field in the default mapfile: /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf. For example, to add support for the OSPool, create the osg user account on the CE and across your cluster.

  2. (Optional) If you wish to change the user mapping, copy the relevant mapping from /usr/share/condor-ce/mapfiles.d/osg-scitokens-mapfile.conf to a .conf file in /etc/condor-ce/mapfiles.d/ and change the last field to the desired username. For example, if you wish to add support for the OSPool but prefer to map OSPool pilot jobs to the osgpilot account that you created on your CE and across your cluster, you could add the following to /etc/condor-ce/mapfiles.d/50-ospool.conf:

    # OSG
    SCITOKENS /^https\:\/\/scitokens\.org\/osg\-connect,/ osgpilot
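
Copying from the default mapfile is the safest way to get the pattern right. For illustration only, the sketch below builds a line of the same shape from an issuer URL and a username. The helper name scitokens_map_line is hypothetical, and the set of escaped characters (/, :, . and -) is an assumption taken from the example above; an issuer containing other regular-expression metacharacters would need more care.

```shell
#!/bin/sh
# Print a SCITOKENS mapfile line for the given issuer URL and local user.
# scitokens_map_line is a hypothetical helper; the escaping and the
# leading ^ and trailing comma simply mirror the example entry above.
scitokens_map_line() {
    issuer="$1"
    user="$2"
    # Backslash-escape the characters that the example entry escapes.
    escaped=$(printf '%s' "$issuer" | sed 's|[/:.-]|\\&|g')
    printf 'SCITOKENS /^%s,/ %s\n' "$escaped" "$user"
}
```

Calling `scitokens_map_line https://scitokens.org/osg-connect osgpilot` reproduces the SCITOKENS line shown above, which can then be reviewed before it is written to a file in /etc/condor-ce/mapfiles.d/.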
    

For more details of the mapfile format, consult the "SciTokens" section of the upstream documentation.

Automatic configuration

The OSG CE metapackage brings along a configuration tool, osg-configure, that is designed to automatically configure the different pieces of software required for an OSG HTCondor-CE:

  1. Enable your batch system in the HTCondor-CE configuration by editing the enabled field in the /etc/osg/config.d/20-<YOUR BATCH SYSTEM>.ini:

    enabled = True
    
  2. Read through the other .ini files in the /etc/osg/config.d directory and make any necessary changes. See the osg-configure documentation for details.

  3. Validate the configuration settings:

    root@host # osg-configure -v
    
  4. Fix any errors (at least) that osg-configure reports.

  5. Once the validation command succeeds without errors, apply the configuration settings:

    root@host # osg-configure -c
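
The validate-then-apply sequence above can be wrapped so that the configuration is never applied after a failed validation. This is a sketch under one assumption that should be verified on your system: that osg-configure -v exits with a non-zero status when validation fails. The function name and the OSG_CONFIGURE override variable are hypothetical.

```shell
#!/bin/sh
# Run validation first and apply the configuration only if it succeeds.
# apply_osg_config and OSG_CONFIGURE are hypothetical names; OSG_CONFIGURE
# lets the command be replaced, e.g. for a dry run or for testing.
apply_osg_config() {
    tool="${OSG_CONFIGURE:-osg-configure}"
    if "$tool" -v; then
        "$tool" -c
    else
        echo "validation failed; configuration not applied" >&2
        return 1
    fi
}
```

Run as root, `apply_osg_config` would then perform steps 3 and 5 in order, stopping at step 4 whenever there are errors to fix.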
    

Optional configuration

In addition to the configurations above, you may need to further configure how pilot jobs are filtered and transformed before they are submitted to your local batch system or otherwise change the behavior of your CE. For detailed instructions, please refer to the upstream documentation:

Accounting with multiple CEs or local user jobs

Note

For non-HTCondor batch systems only

If your site has multiple CEs or you have non-grid users submitting to the same local batch system, the OSG accounting software needs to be configured so that it doesn't over-report the number of jobs. Modify the value of SuppressNoDNRecords in /etc/gratia/htcondor-ce/ProbeConfig on each of your CEs so that it reads:

    SuppressNoDNRecords="1"
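
With several CEs it may be convenient to script this change. The sketch below assumes the attribute already appears in the file in the form SuppressNoDNRecords="<value>"; the function name is hypothetical, and a backup copy with a .bak suffix is left next to the file.

```shell
#!/bin/sh
# Set SuppressNoDNRecords to "1" in a Gratia ProbeConfig file.
# set_suppress_no_dn is a hypothetical helper name.
set_suppress_no_dn() {
    probe_config="${1:-/etc/gratia/htcondor-ce/ProbeConfig}"
    # Rewrite the attribute value in place, keeping a .bak backup.
    sed -i.bak 's/SuppressNoDNRecords="[^"]*"/SuppressNoDNRecords="1"/' "$probe_config" || return 1
    # Fail if the attribute was not present and so was not changed.
    grep -q 'SuppressNoDNRecords="1"' "$probe_config"
}
```

Running `set_suppress_no_dn` as root on each CE edits the default path; a non-zero exit status means the attribute was not found and the file should be edited by hand.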

Starting and Validating HTCondor-CE

For information on how to start and validate the core HTCondor-CE services, please refer to the upstream documentation.

Troubleshooting HTCondor-CE

For information on how to troubleshoot your HTCondor-CE, please refer to the upstream documentation:

Registering the CE

To contribute to the OSG Production Grid, your CE must be registered with the OSG. To register your resource:

  1. Identify the facility, site, and resource group where your HTCondor-CE is hosted. For example, the Center for High Throughput Computing at the University of Wisconsin-Madison uses the following information:

    Facility: University of Wisconsin
    Site: CHTC
    Resource Group: CHTC
    
  2. Using the above information, create or update the appropriate YAML file, using this template as a guide.

Getting Help

To get assistance, please use this page.
