OSG Technology Area Meeting, 31 May 2022¶
- Coordinates: Conference: +1-415-655-0002, PIN: 146 266 9392, https://morgridge-org.zoom.us/j/91987518094 (password sent separately)
- Attending: BrianL, Carl, Mat, TimT
Announcements¶
Ziyang, our new IRIS-HEP fellow, has joined. He will be working on network monitoring for XCache container deployments
Triage Duty¶
- This week: Carl
- Next week: Mat
- 15 (+2) open FreshDesk tickets
- 1 (-1) open GGUS ticket
Jira (as of Monday)¶
# of tickets | Δ | State |
---|---|---|
190 | +0 | Open |
31 | -1 | Selected for Dev |
24 | +0 | In Progress |
17 | +1 | Dev Complete |
25 | +3 | Ready for Testing |
0 | +0 | Ready for Release |
OSG Software Team¶
- Kubernetes Hackathon
- Mat: Autoupdates for Flux v2 on PATh Facility; can be tested on the backfill pilots since they are old.
- Mat: Fix backfill pilots and other broken pods on Tiger.
- Mat: Assist Cannon in debugging Topology on Tiger.
- BrianL: Test HTCondor-CE 5.1.4 by upgrading the Canary CE(s).
- BrianL: Deploy Dex.
- Some UW CSL-hosted services have had their certs expire. vdt.cs.wisc.edu (used for Koji) was migrated today.
- AI (TimT): Release 9.9.0 release candidate of HTCondor in the Open Pool. Planning to deprecate old interfaces before the start of the next LTS series.
- BrianB diagnosed shared port issues with 9.8.0 on the APs that he worked around by disabling IPv6. 9.9.0 should fix it. Tim will let Brian know once the upgrade is done, so Brian can re-enable IPv6.
- AI (Mat) Create new Topology endpoints for improved OSDF support.
- AI (Mat) Streamlining OSDF service deployment: add Topology endpoint returning a list of OSDF caches for a given path
- AI (Carl) Topology COManage transition: pull contact db into COManage contact results where missing
Discussion¶
None this week
Support Update¶
- JLab (BrianL): JLab and their ScotsGrid site are seeing low CPU utilization from the Slurm side
- Carl: Requests assistance on unregistered pilot container Gratia probes.
- UTC-Epyc (Mat): Requests assistance for diagnosing random pilot deaths. Because rsyslog is broken on the site, we do not have pilot logs for the failed pilots.
OSG DevOps¶
- A few feature requests / bug fixes for StashCP. One of them ready for pull request: https://github.com/opensciencegrid/stashcp/pull/41
- Shoveler auto-update is working great! Increased token lifetime on lightweight issuer to 1 hour.
- Did a large pull request for GP-ARGO. Wondering if we should include the Site Name in the OSG map? Currently, we show Facility Name, Resource Group, and Resource names. But we don't show the site level in the map.
Discussion¶
None this week
OSG Release Team¶
- Ready for Testing
- gratia-probe 2.6.1
- Replace AuthToken* references with routed job attributes
- Fix mismatched type concatenation in Gratia record send
- Gratia: Set SCHEDD_CRON_LOG_NON_ZERO_EXIT = True in the htcondor-ce configuration
- Remove certinfo file log messages
- HTCondor-CE 5.1.4
- Fix whole node job glidein CPUs and GPUs exprs that caused held jobs
- Fix bug where default CERequirements were being ignored
- Pass whole node request from GlideinWMS to the batch system
- Fix rrdtool dependencies to ease OSG 3.5 to 3.6 upgrade
- GlideinWMS 3.9.4-3: Fix rrdtool dependencies
- rrdtool 1.8.0-1.2-el7: make Python RRDtools available to GlideinWMS
- xrootd-multiuser 2.0.4
- osg-token-renewer 0.8.2
- htvault-config-1.13
- htgettoken 1.12
- XCache 3.1.0
- stashcp 6.7.5
- HTCondor 9.0.13
- Schedd and startd cron jobs can now log output upon non-zero exit
- condor_config_val now produces correct syntax for multi-line values
- The condor_run tool now reports submit errors and warnings to the terminal
- Fix issue where Kerberos authentication would fail within DAGMan
- Fix HTCondor startup failure with certain complex network configurations
- Upcoming
- HTCondor 9.9.0
- A new authentication method for remote HTCondor administration
- Several changes to improve the security of connections
- Fix issue where DAGMan direct submission failed when using Kerberos
- The submission method is now recorded in the job ClassAd
- Singularity jobs can now pull from Docker style repositories
- The OWNER authorization level has been folded into the ADMINISTRATOR level
- HTCondor 9.9.0
- gratia-probe 2.6.1
Discussion¶
None this week