OSG Technology Area Meeting, 13 August 2024¶
- Coordinates: Conference: +1-415-655-0002, PIN: 146 266 9392, https://morgridge-org.zoom.us/j/91987518094 (password sent separately)
- Attending: BrianL, Derek, Marco, Mat, Matt
Announcements¶
- Doc focus this Friday
- PATh PI meeting for the next three days; Brian will be on call
- Tim T out part of this week and all of next week
Triage Duty¶
Triage duty shifts Tue-Mon
- This week: Matt (replacing TimT)
- Next week: ?
- 7 (+4) open FreshDesk tickets
- 0 (+0) open GGUS ticket
Jira (as of Monday morning)¶
# of tickets | Δ | State |
---|---|---|
220 | +0 | Open |
14 | -3 | Selected for Dev |
35 | +3 | In Progress |
16 | +0 | Dev Complete |
4 | +3 | Ready for Testing |
2 | +1 | Ready for Release |
OSG Software Team¶
Doc focus this Friday
-
AI (Matt): Kuantifier status:
If Kubernetes Jobs don't have a resource request, then the processour count would show up as 0; failing loudly on such as misconfiguration is not really possible, but this can be added as a warning, along with notes in the Helm chart and an install guide.
-
AI (Mat): ARM:
Koji builders are ready; no mass rebuid because that would require bumping the package release numbers. Mat has been creating tickets to rebuild individual software as needed.
Next step for ARM is to add integration testing -- we will need to make VM images for the VM Universe jobs. Nebraska has an ARM machine in a Kubernetes cluster; we may be able to make use of that.
-
AI (Matt): EL9 repo
- AI (Mat): Contribute VOMS patches upstream
- AI (BrianL): Prepare tickets for this Friday's doc focus
- BrianL working on purchasing USB hubs for Yubikeys
Discussion¶
-
Marco continuing work on GlideinWMS development release; release canddiates are available. Fermilab is shutting down for the last week of August and first week of September.
-
Derek reporting low GPU utilization for NRAO Glideins on NRP; Brian will show him how to log in to the PATh Facility AP so he can see the status of NRAO's jobs.
-
Matt fixed the repo-rsync server, which was down due to a mismatch of VLANs between the Service and Pods.
Support Update¶
- JLab (Mat): Support setting up their Pelican Origin
- JLab (Matt): Troubleshoot crashing OAuth credmon
- PATh Facility (Mat): missing GPUs
- PATH-UNL had nodes shut down due to overheating;
- PATH-Expanse had GPU pods (and CPU pods) failing to start -- at first we thought it was the return of a volume mounting issue we've seen before (see https://opensciencegrid.atlassian.net/issues/INF-1672); later we discovered that it was due to an outage at SDSC.
DevOps¶
None this week
OSG Release Team¶
- Ready for Testing
- htcondor-ce
- osdf-server-7.9.3
- xrootd-5.7.0
- igtf-ca-certs
- Ready for Release
- GlideinWMS 3.10.7
Discussion¶
- IGTF CA certs will be released next week due to the holiday
- Full support periods for HTC23 and OSG23 are not fully aligned
- Support for OSG 23 lasts several months longer
- Marco has been using OSG noarch packages successfully in ARM