Troubleshooting GRACC services
Helpful dashboards
- GRACC Service Status
- GRACC Collector Stats
- RabbitMQ queues
- Probe Record Rate (example for a given CE)
- in addition, check the ProbeName records in Kibana
- OSG Connect Summary - UChicago
- Site Transfer Summary
- Institutions contributing to the OSG by name
A selection of issues that have been investigated, and the actions taken to resolve them.
Symptoms:
- high memory usage by the gracc archiver (e.g. ~12GB)
- logstash appears to be backed up and is not responding
- RabbitMQ has a high volume of queued messages (e.g. ~100k)
Action taken: restart Elasticsearch:
$ systemctl restart elasticsearch.service
- elasticsearch and elasticsearch-ro were added to the check_mksystemd monitoring
- for a continuously high rate of disconnections in RabbitMQ, contact Marina Krenz
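To check for the queue backlog symptom before restarting anything, a small filter can list the RabbitMQ queues whose message count is above a threshold. This is a minimal sketch, assuming the default tab-separated output of "rabbitmqctl list_queues name messages"; the function name flag_backed_up_queues and the default threshold of 100000 (taken from the ~100k example above) are illustrative choices, not part of GRACC.

```shell
#!/usr/bin/env bash
# Sketch: print RabbitMQ queues whose backlog exceeds a threshold.
# Assumption: the default threshold of 100000 mirrors the "~100k" symptom
# above; it is not an official GRACC alert level.
#
# Reads "name<TAB>messages" rows on stdin, e.g. from:
#   rabbitmqctl list_queues name messages
# Informational lines and the column header are skipped because they do not
# have exactly two tab-separated fields with a numeric second field.
flag_backed_up_queues() {
  local threshold="${1:-100000}"
  awk -F'\t' -v t="$threshold" \
    'NF == 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > t + 0 { print $1 ": " $2 " messages" }'
}
```

Usage: rabbitmqctl list_queues name messages | flag_backed_up_queues 100000
An empty result means no queue is over the threshold. Keeping the parsing in a stdin filter makes it easy to try against saved output before pointing it at a live broker.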
Update missing records
A site may have problems sending accounting data to GRACC for a particular month. Once the problem is fixed, the site may ask us to correct the accounting in the APEL report. In that case:
1) From
run the report manually:
$ cd /root/gracc-apel/; ./apel_report YYYY MM
2) Move the generated file into the APEL outgoing spool:
$ mv /root/gracc-apel/MM_YYYY.apel /var/spool/apel/outgoing/12345678/1234567890abcd
3) Send it off:
$ ssmsend
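The three manual steps above can be wrapped in one helper so the month is typed only once. This is a minimal sketch, not part of gracc-apel: the function name resend_apel_month and the environment variables GRACC_APEL_DIR, APEL_OUTGOING_FILE and SSMSEND are hypothetical. It assumes apel_report writes MM_YYYY.apel into its own directory with a zero-padded month, as the file name in step 2 suggests. The default destination is the placeholder path from step 2 and must be replaced with the real one.

```shell
#!/usr/bin/env bash
# Sketch: regenerate and resend one month of APEL accounting.
# Hypothetical helper; override the assumed defaults via environment:
#   GRACC_APEL_DIR      directory containing apel_report (default /root/gracc-apel)
#   APEL_OUTGOING_FILE  destination path used in step 2 (placeholder default!)
#   SSMSEND             command that sends the records (default ssmsend)
resend_apel_month() {
  local yyyy="$1" mm="$2"
  local apel_dir="${GRACC_APEL_DIR:-/root/gracc-apel}"
  local dest="${APEL_OUTGOING_FILE:-/var/spool/apel/outgoing/12345678/1234567890abcd}"
  local ssmsend_cmd="${SSMSEND:-ssmsend}"

  # Expect a four-digit year and a zero-padded month (assumption).
  local year_re='^[0-9]{4}$' month_re='^(0[1-9]|1[0-2])$'
  if ! [[ $yyyy =~ $year_re && $mm =~ $month_re ]]; then
    echo "usage: resend_apel_month YYYY MM" >&2
    return 2
  fi

  # Step 1: generate the report; the subshell keeps the caller's cwd.
  ( cd "$apel_dir" && ./apel_report "$yyyy" "$mm" ) || return 1

  # Step 2: move the generated file to the outgoing destination.
  local report="$apel_dir/${mm}_${yyyy}.apel"
  if [[ ! -f $report ]]; then
    echo "expected report not found: $report" >&2
    return 1
  fi
  mv "$report" "$dest" || return 1

  # Step 3: send it off.
  "$ssmsend_cmd"
}
```

Usage, after replacing the placeholder destination: APEL_OUTGOING_FILE=/path/from/step2 resend_apel_month 2019 03
The helper stops at the first failed step, so a missing report never leads to an ssmsend run with nothing new in the spool.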