Troubleshooting of GRACC services
Helpful dashboards
- GRACC Service Status
- GRACC Collector Stats
- RabbitMQ queues
- Probe Record Rate (example for a given CE)
- in addition, check the ProbeName records in Kibana
- OSG Connect Summary - UChicago
- Site Transfer Summary
- Institutions contributing to the OSG by name
Issues
A selection of issues that have been investigated and the actions taken to resolve them.
Logstash
Symptom:
- high memory usage by the GRACC archiver (e.g. ~12GB)
- Logstash appears to be backed up and is not responding
- RabbitMQ has a high volume of queued messages (e.g. ~100k)
Action:
systemctl restart elasticsearch.service
- elasticsearch and elasticsearch-ro have been added to check_mksystemd monitoring
- for a continuously high rate of disconnections in RabbitMQ, contact Marina Krenz
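To confirm the RabbitMQ symptom from the command line rather than the dashboard, a minimal sketch follows. It assumes `rabbitmqctl` is available on the RabbitMQ host and that its `list_queues name messages` output is tab-separated; the helper name `backed_up_queues` is hypothetical, and the 100000 default only mirrors the ~100k figure above.

```shell
#!/bin/sh
# Hypothetical helper: print queues whose backlog exceeds a threshold.
# Reads "name<TAB>messages" lines on stdin; header or banner lines whose
# second field is not a plain number are ignored.
backed_up_queues() {
    threshold="${1:-100000}"
    awk -F '\t' -v limit="$threshold" \
        '$2 ~ /^[0-9]+$/ && $2 + 0 > limit + 0 { print $1 ": " $2 }'
}

# Usage on the RabbitMQ host (left commented so the sketch is safe to source):
# rabbitmqctl -q list_queues name messages | backed_up_queues 100000
```

An empty result means no queue is above the threshold; any line printed names a queue worth checking before restarting services.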
GRACC-APEL
Update missing records
If a site had problems sending accounting data to GRACC in a particular month, it may ask us to correct the accounting in the APEL report once the problem is fixed. In that case:
1) From hcc-grace-itb.unl.edu, run the report manually
$ cd /root/gracc-apel/; ./apel_report YYYY MM
2) Move the file
$ mv /root/gracc-apel/MM_YYYY.apel /var/spool/apel/outgoing/12345678/1234567890abcd
3) Send it off
$ ssmsend
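The three steps above can be sketched as a dry-run helper that only prints the commands for a given month, so they can be reviewed before being run by hand. The function name `apel_resend_plan` is hypothetical, the two-digit month format is an assumption based on the `MM` placeholder, and the outgoing path is the same placeholder used above, not a real spool location.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the commands for
# republishing one month of APEL data.
# The outgoing path is a placeholder copied from the procedure; replace it
# with the real spool directory and file name before running anything.
apel_resend_plan() {
    year="$1"
    month="$2"
    # Require a four-digit year and a two-digit month (01-12).
    case "$year" in
        [0-9][0-9][0-9][0-9]) ;;
        *) echo "bad year: $year" >&2; return 1 ;;
    esac
    case "$month" in
        0[1-9]|1[0-2]) ;;
        *) echo "bad month: $month" >&2; return 1 ;;
    esac
    echo "cd /root/gracc-apel/ && ./apel_report $year $month"
    echo "mv /root/gracc-apel/${month}_${year}.apel /var/spool/apel/outgoing/12345678/1234567890abcd"
    echo "ssmsend"
}

# Example: apel_resend_plan 2023 07
```

Printing instead of executing keeps the sketch safe to try on any host; the validation step mainly guards against swapping the year and month arguments.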