OSG Technology Area Meeting, 13 August 2024

Announcements

  • Doc focus this Friday
  • PATh PI meeting for the next three days; Brian will be on call
  • Tim T out part of this week and all of next week

Triage Duty

Triage duty shifts Tue-Mon

  • This week: Matt (replacing TimT)
  • Next week: ?
  • 7 (+4) open FreshDesk tickets
  • 0 (+0) open GGUS tickets

Jira (as of Monday morning)

# of tickets   Δ    State
220            +0   Open
14             -3   Selected for Dev
35             +3   In Progress
16             +0   Dev Complete
4              +3   Ready for Testing
2              +1   Ready for Release

OSG Software Team

Doc focus this Friday

  • AI (Matt): Kuantifier status:

    If Kubernetes Jobs don't have a resource request, the processor count shows up as 0. Failing loudly on such a misconfiguration is not really possible, but a warning can be added, along with notes in the Helm chart and an install guide.

  • AI (Mat): ARM:

    Koji builders are ready; no mass rebuild because that would require bumping the package release numbers. Mat has been creating tickets to rebuild individual software as needed.

    Next step for ARM is to add integration testing -- we will need to make VM images for the VM Universe jobs. Nebraska has an ARM machine in a Kubernetes cluster; we may be able to make use of that.

  • AI (Matt): EL9 repo

  • AI (Mat): Contribute VOMS patches upstream
  • AI (BrianL): Prepare tickets for this Friday's doc focus
  • BrianL working on purchasing USB hubs for Yubikeys
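
For the Kuantifier item above, the "warn instead of fail" behavior could look roughly like the following. This is a minimal sketch, not Kuantifier's actual code: the function names, the logger name, and the dict-shaped pod spec are all assumptions made for illustration, and only plain ("2", "0.5") and milli-core ("500m") CPU quantities are handled.

```python
import logging
from decimal import Decimal

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("cpu_request_check")


def parse_cpu_quantity(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "2", "0.5" or "500m" to cores."""
    if quantity.endswith("m"):
        # Milli-core notation: "500m" means half a core.
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def requested_cpus(pod_spec: dict) -> Decimal:
    """Sum the CPU requests of all containers in a pod spec (as a dict).

    A container without a CPU request contributes 0 and triggers a
    warning rather than an error, since the job itself is still valid.
    """
    total = Decimal(0)
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        cpu = requests.get("cpu")
        if cpu is None:
            logger.warning(
                "container %r has no CPU request; it will be counted as 0 processors",
                container.get("name"),
            )
            continue
        total += parse_cpu_quantity(str(cpu))
    return total
```

A pod with one container requesting "500m" and a second container with no request would be counted as 0.5 cores, with one warning logged for the second container.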

Discussion

  • Marco continuing work on GlideinWMS development release; release candidates are available. Fermilab is shutting down for the last week of August and first week of September.

  • Derek reporting low GPU utilization for NRAO Glideins on NRP; Brian will show him how to log in to the PATh Facility AP so he can see the status of NRAO's jobs.

  • Matt fixed the repo-rsync server, which was down due to a mismatch of VLANs between the Service and Pods.

Support Update

  • JLab (Mat): Support setting up their Pelican Origin
  • JLab (Matt): Troubleshoot crashing OAuth credmon
  • PATh Facility (Mat): missing GPUs
    • PATH-UNL had nodes shut down due to overheating;
    • PATH-Expanse had GPU pods (and CPU pods) failing to start -- at first we thought it was the return of a volume mounting issue we've seen before (see https://opensciencegrid.atlassian.net/issues/INF-1672); later we discovered that it was due to an outage at SDSC.

DevOps

None this week

OSG Release Team

  • Ready for Testing
    • htcondor-ce
    • osdf-server-7.9.3
    • xrootd-5.7.0
    • igtf-ca-certs
  • Ready for Release
    • GlideinWMS 3.10.7

Discussion

  • IGTF CA certs will be released next week due to the holiday
  • Full support periods for HTC23 and OSG 23 are not aligned
    • Support for OSG 23 lasts several months longer
  • Marco has been using OSG noarch packages successfully in ARM