for network troubleshooting in OSG/WLCG."
persona: troubleshoot
owners: ["networking-team@osg-htc.org"]
status: active
tags: [troubleshoot, playbook, diagnostics]
🔧 Troubleshooter: Diagnose & Fix Network Issues
Systematic approach to identifying and resolving network and perfSONAR problems.
Quick Start (5 Minutes)
Is it a Network Problem?
- Gather facts: Run the Quick Triage Checklist, which collects system info, connectivity, services, and logs
- Basic diagnostics: Follow the **[Network Troubleshooting Guide](../../network-troubleshooting.md)** for contact procedures and support escalation
- Learn more: See the **[ESnet Troubleshooting Guide](https://fasterdata.es.net/performance-testing/troubleshooting/)** for detailed network investigation
Is it a perfSONAR Problem?
- perfSONAR FAQ: quick answers to common issues
- OSG Debugging Guide: investigation steps
- perfSONAR Official Docs: comprehensive reference
Diagnostic Tools & Guides
On the perfSONAR Host
Check system status:
- Systemd services: `systemctl status perfsonar-*`
- Container status: `podman ps -a` or `docker ps -a`
- Container logs: `podman logs perfsonar-testpoint` or `docker logs`
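The three status checks above can be combined into one pass. The following is a minimal sketch, not a supported tool: the function names `detect_runtime` and `host_status` are hypothetical, and the container name `perfsonar-testpoint` is assumed from the commands on this page.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and not part of perfSONAR.

# CONTAINER is an assumption: the container name used elsewhere on this page.
CONTAINER="${CONTAINER:-perfsonar-testpoint}"

# Print the first container runtime found on PATH, preferring podman.
# Returns non-zero if neither podman nor docker is installed.
detect_runtime() {
    local candidate
    for candidate in podman docker; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Run the systemd, container status, and container log checks in order.
host_status() {
    local runtime
    echo "== systemd units =="
    if command -v systemctl >/dev/null 2>&1; then
        systemctl status 'perfsonar-*' --no-pager 2>&1 | head -n 40
    else
        echo "systemctl not found"
    fi
    echo "== containers =="
    if runtime="$(detect_runtime)"; then
        "$runtime" ps -a
        echo "== last 50 log lines from ${CONTAINER} =="
        "$runtime" logs --tail 50 "$CONTAINER" 2>&1
    else
        echo "neither podman nor docker found"
    fi
    return 0
}
```

After sourcing the snippet, run `host_status` on the perfSONAR host. Quoting `'perfsonar-*'` passes the pattern to systemctl instead of letting the shell expand it against files in the current directory.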
Verify network configuration:
- Triage Checklist: step-by-step verification
- Multiple NIC Setup: for multi-interface issues
- Host Tuning: audit kernel and NIC settings
Check firewall & security:
- Security & Firewall Guide: required ports and rules
- nftables rules: `nft list ruleset`
- Port status: `ss -ltnp`
Network Path Analysis
ESnet tools: ESnet Troubleshooting Guide
perfSONAR tools:
- pScheduler: pScheduler documentation
- Test API: Query test meshes and historical results
- Measurement archive: Access stored results via the web interface
Common Scenarios & Playbooks
Container Won't Start
Playbook: Container Startup Issues (in progress)
Quick checks:
- Image available: `podman images | grep perfsonar`
- Volumes mounted: `podman volume ls`
- Ports available: `ss -ltnp | grep -E '(443|5001|9000|8080)'`
- Logs: `podman logs perfsonar-testpoint`
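The `grep -E` port check above matches substrings, so it can also hit unrelated ports such as 18080 or a process ID that contains 443. A stricter variant, sketched here under the assumption that the four ports listed above are the ones your container needs, compares the exact port number in the local-address column. The function name `ports_in_use` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: ports_in_use is a hypothetical helper, not a perfSONAR command.

# Ports taken from the quick check above; adjust to match your deployment.
PORTS="${PORTS:-443 5001 9000 8080}"

# Read `ss -ltn` style output on stdin and print each port from PORTS that
# already has a listener. Matching is on the exact port number, so 8080
# does not match 18080. Each port is printed once, in order of appearance.
ports_in_use() {
    awk -v wanted="$PORTS" '
        BEGIN {
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++) want[w[i]] = 1
        }
        $1 == "LISTEN" {
            # Column 4 is the local address, e.g. 0.0.0.0:443 or [::]:443;
            # the port is whatever follows the last colon.
            m = split($4, parts, ":")
            port = parts[m]
            if ((port in want) && !(port in seen)) {
                seen[port] = 1
                print port
            }
        }'
}
```

Usage: `ss -ltn | ports_in_use`. Empty output means none of the listed ports has a listener, so a port conflict is unlikely to be the reason the container fails to start.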
Tests Not Running
Playbook: Tests Not Running (in progress)
Quick checks:
- pSConfig enrolled: `psconfig remote list`
- Mesh connectivity: Can the host reach `psconfig.opensciencegrid.org`?
- pScheduler agent: `systemctl status perfsonar-pscheduler-agent`
- Log errors: `podman logs perfsonar-testpoint | grep -i error`
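The mesh connectivity and log checks above can be scripted. This is a minimal sketch with hypothetical function names (`mesh_reachable`, `error_summary`); it assumes the pSConfig host answers HTTPS on the default port, which this page does not state.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and not part of perfSONAR.

# Hostname taken from the quick check above.
# Assumption: the host serves HTTPS on the default port.
PSCONFIG_HOST="${PSCONFIG_HOST:-psconfig.opensciencegrid.org}"

# Succeeds if an HTTPS request to the pSConfig host completes within 10 seconds.
# Any HTTP status counts as reachable; DNS, TCP, TLS, and timeout failures
# make curl return non-zero.
mesh_reachable() {
    curl --silent --show-error --output /dev/null --max-time 10 \
        "https://${PSCONFIG_HOST}/"
}

# Summarise error lines from log text on stdin:
# a count on the first line, then up to five of the most recent matches.
error_summary() {
    local matches
    matches="$(grep -i 'error' || true)"
    if [ -z "$matches" ]; then
        echo "0 error lines"
        return 0
    fi
    printf '%s error lines\n' "$(printf '%s\n' "$matches" | wc -l | tr -d ' ')"
    printf '%s\n' "$matches" | tail -n 5
}
```

Usage: `mesh_reachable && echo reachable`, then `podman logs perfsonar-testpoint 2>&1 | error_summary`. The `2>&1` matters because container runtimes replay the container's stderr on stderr, which a plain pipe would skip.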
High Latency / Slow Tests
Playbook: Performance Issues (in progress)
Quick checks:
- Host tuning: Run `fasterdata-tuning.sh` in audit mode
- NIC settings: Check MTU, GRO, GSO, ring buffers
- Network load: Does the test window coincide with peak bandwidth usage?
- Competing tests: Are multiple tests running simultaneously?
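For the NIC settings check, the values can be read with standard Linux tools. The sketch below assumes `ethtool` is installed and uses hypothetical function names (`offload_state`, `nic_report`); `eth0` in the usage note is a placeholder interface name. It only reports values and does not judge whether they are appropriate, which is what the Host Tuning guide covers.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; pass your own interface name.

# Print the GRO and GSO lines from `ethtool -k` output on stdin.
offload_state() {
    grep -E '^(generic-receive-offload|generic-segmentation-offload):'
}

# Report MTU, GRO/GSO state, and ring buffer sizes for one interface.
nic_report() {
    local iface="$1"
    if [ ! -r "/sys/class/net/${iface}/mtu" ]; then
        echo "interface ${iface} not found" >&2
        return 1
    fi
    echo "MTU: $(cat "/sys/class/net/${iface}/mtu")"
    if command -v ethtool >/dev/null 2>&1; then
        ethtool -k "$iface" | offload_state   # -k lists offload features
        ethtool -g "$iface"                   # -g lists ring buffer sizes
    else
        echo "ethtool not installed; skipping offload and ring checks"
    fi
}
```

Usage: `nic_report eth0`. In the ring buffer output, compare the "Current hardware settings" values against the "Pre-set maximums" to see whether the rings have headroom.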
Firewall Blocking Tests
Playbook: Firewall & Network Access (in progress)
Quick checks:
- Required ports: Security & Firewall Guide
- Test connectivity: Can the host reach remote perfSONAR instances?
- Firewall logs: Check local and campus firewall rules
- DNS resolution: Do perfSONAR hostnames resolve?
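The DNS and connectivity checks above can be run in sequence so that a DNS failure is not mistaken for a firewall block. This is a sketch with hypothetical function names; it assumes bash (for its `/dev/tcp` redirection), the coreutils `timeout` command, and glibc's `getent`. The hostname `ps.example.org` in the usage note is a placeholder, and port 443 is only a default; take the real port list from the Security & Firewall Guide.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical and not part of perfSONAR.

# Succeeds if the name resolves through the system resolver.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
can_connect() {
    local host="$1" port="$2"
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Check DNS first, then TCP, and say which step failed.
check_remote() {
    local host="$1" port="${2:-443}"
    if ! resolves "$host"; then
        echo "DNS: ${host} does not resolve"
        return 1
    fi
    echo "DNS: ${host} resolves"
    if can_connect "$host" "$port"; then
        echo "TCP: ${host}:${port} reachable"
    else
        echo "TCP: ${host}:${port} blocked or closed"
        return 1
    fi
}
```

Usage: `check_remote ps.example.org 443`. A "blocked or closed" result does not say where the block is; it only narrows the problem to the path or the remote service, which is the point at which firewall logs become useful.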
Escalation & Support
When to contact support:
Level 1: Self-Service Diagnostics
- Run Triage Checklist
- Consult perfSONAR FAQ
- Review OSG Debugging Document
Level 2: Site-Specific Support
- Contact your site's network administrator
- Check local firewall, VLAN, NIC configuration
- Verify DNS, IP routing, upstream connectivity
Level 3: OSG/WLCG Support
- OSG sites: GOC Support Ticket
- Include: hostname, triage checklist results, error messages, logs
- WLCG sites: GGUS Ticket → "WLCG Network Throughput" or "WLCG perfSONAR support"
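Gathering the items a ticket should include can be scripted so that nothing is forgotten. The sketch below is one possible approach, not an official collection tool: `collect_bundle` is a hypothetical name, the container name is assumed from the commands on this page, and the file names inside the bundle are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch only: collect_bundle is a hypothetical helper, not an OSG tool.

# CONTAINER is an assumption: the container name used elsewhere on this page.
CONTAINER="${CONTAINER:-perfsonar-testpoint}"

# Collect hostname, kernel info, listening ports, and recent container logs
# into the directory given as $1, then pack it as <dir>.tar.gz.
collect_bundle() {
    local out="$1" runtime
    mkdir -p "$out" || return 1
    uname -n > "$out/hostname.txt"
    uname -a > "$out/kernel.txt"
    date -u '+%Y-%m-%dT%H:%M:%SZ' > "$out/collected-at.txt"
    if command -v ss >/dev/null 2>&1; then
        ss -ltn > "$out/listening-ports.txt" 2>&1
    fi
    # Use whichever runtime is installed; a missing container just leaves
    # the runtime's error message in container.log.
    for runtime in podman docker; do
        if command -v "$runtime" >/dev/null 2>&1; then
            "$runtime" logs --tail 500 "$CONTAINER" > "$out/container.log" 2>&1
            break
        fi
    done
    tar -czf "${out}.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
}
```

Usage: `collect_bundle /tmp/perfsonar-support`, then attach `/tmp/perfsonar-support.tar.gz` to the ticket along with the triage checklist results. Review the files first, since logs can contain addresses or hostnames your site prefers not to share.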
Level 4: perfSONAR Community
- perfSONAR Community: active support
- perfSONAR Documentation: comprehensive reference
- GitHub Issues: report bugs
Related Resources
Setup & Installation
- Quick Deploy Guide: initial installation help
- Installation Guide: detailed setup steps
- Deployment Models: choosing the right setup
Configuration & Optimization
- Host Tuning: performance optimization
- Multiple NIC Setup: multi-interface configuration
Understanding the System
- perfSONAR in OSG/WLCG: why perfSONAR matters
- Architecture Overview: system design and data flow