Skip to content

OSG Technology Area Meeting, 31 May 2022

Announcements

Ziyang, our new IRIS-HEP fellow, has joined. He will be working on network monitoring for XCache container deployments

Triage Duty

  • This week: Carl
  • Next week: Mat
  • 15 (+2) open FreshDesk tickets
  • 1 (-1) open GGUS ticket

Jira (as of Monday)

# of tickets Δ State
190 +0 Open
31 -1 Selected for Dev
24 +0 In Progress
17 +1 Dev Complete
25 +3 Ready for Testing
0 +0 Ready for Release

OSG Software Team

  • Kubernetes Hackathon
    • Mat: Autoupdates for Flux v2 on PATh Facility; can be tested on the backfill pilots since they are old.
    • Mat: Fix backfill pilots and other broken pods on Tiger.
    • Mat: Assist Cannon in debugging Topology on Tiger.
    • BrianL: Test HTCondor-CE 5.1.4 by upgrading the Canary CE(s).
    • BrianL: Deploy Dex.
  • Some UW CSL-hosted services have had their certs expire. vdt.cs.wisc.edu (used for Koji) was migrated today.
  • AI (TimT): Release 9.9.0 release candidate of HTCondor in the Open Pool. Planning to deprecate old interfaces before the start of the next LTS series.
  • BrianB diagnosed shared port issues with 9.8.0 on the APs that he worked around by disabling IPv6. 9.9.0 should fix it. Tim will let Brian know once the upgrade is done, so Brian can re-enable IPv6.
  • AI (Mat) Create new Topology endpoints for improved OSDF support.
  • AI (Mat) Streamlining OSDF service deployment: add Topology endpoint returning a list of OSDF caches for a given path
  • AI (Carl) Topology COManage transition: pull contact db into COManage contact results where missing

Discussion

None this week

Support Update

  • JLab (BrianL): JLab and their ScotsGrid site are seeing low CPU utilization from the Slurm side
  • Carl: Requests assistance on unregistered pilot container Gratia probes.
  • UTC-Epyc (Mat): Requests assistance for diagnosing random pilot deaths. Because rsyslog is broken on the site, we do not have pilot logs for the failed pilots.

OSG DevOps

  • A few feature requests / bug fixes for StashCP. One of them ready for pull request: https://github.com/opensciencegrid/stashcp/pull/41
  • Shoveler auto-update is working great! Increased token lifetime on lightweight issuer to 1 hour.
  • Did a large pull request for GP-ARGO. Wondering if we should include the Site Name in the OSG map? Currently, we show Facility Name, Resource Group, and Resource names. But we don't show the site level in the map.

Discussion

None this week

OSG Release Team

  • Ready for Testing
    • gratia-probe 2.6.1
      • Replace AuthToken* references with routed job attributes
      • Fix mismatched type concatenation in Gratia record send
      • Gratia: Set SCHEDD_CRON_LOG_NON_ZERO_EXIT = True in the htcondor-ce configuration
      • Remove certinfo file log messages
    • HTCondor-CE 5.1.4
      • Fix whole node job glidein CPUs and GPUs exprs that caused held jobs
      • Fix bug where default CERequirements were being ignored
      • Pass whole node request from GlideinWMS to the batch system
      • Fix rrdtool dependencies to ease OSG 3.5 to 3.6 upgrade
    • GlideinWMS 3.9.4-3: Fix rrdtool dependencies
    • rrdtool 1.8.0-1.2-el7: make Python RRDtools available to GlideinWMS
    • xrootd-multiuser 2.0.4
    • osg-token-renewer 0.8.2
    • htvault-config-1.13
    • htgettoken 1.12
    • XCache 3.1.0
    • stashcp 6.7.5
    • HTCondor 9.0.13
      • Schedd and startd cron jobs can now log output upon non-zero exit
      • condor_config_val now produces correct syntax for multi-line values
      • The condor_run tool now reports submit errors and warnings to the terminal
      • Fix issue where Kerberos authentication would fail within DAGMan
      • Fix HTCondor startup failure with certain complex network configurations
    • Upcoming
      • HTCondor 9.9.0
        • A new authentication method for remote HTCondor administration
        • Several changes to improve the security of connections
        • Fix issue where DAGMan direct submission failed when using Kerberos
        • The submission method is now recorded in the job ClassAd
        • Singularity jobs can now pull from Docker style repositories
        • The OWNER authorization level has been folded into the ADMINISTRATOR level

Discussion

None this week