
Installing a perfSONAR Toolkit for WLCG/OSG

This guide walks WLCG/OSG site administrators through end-to-end installation, configuration, and validation of a perfSONAR Toolkit on Enterprise Linux 9 (EL9) using RPM packages. The Toolkit provides a full-featured perfSONAR installation with a local web interface for configuration and monitoring, plus a local measurement archive for data storage.

For upstream RPM installation documentation, see: https://docs.perfsonar.net/install_el.html


Choosing Between Toolkit and Testpoint

Use perfSONAR Toolkit (this guide) if you need

  • Local web interface for configuration, monitoring, and viewing measurement results
  • Local measurement archive to store test data on-site with your own retention policies
  • Full-featured installation with all perfSONAR capabilities
  • Site-specific data retention requirements or regulatory compliance needs
  • On-site troubleshooting access to historical measurement data without external dependencies

Use perfSONAR Testpoint instead if you prefer

  • Lightweight container-based deployment with minimal local resources
  • Central archiving where measurements are stored at a remote archive (WLCG/OSG central infrastructure)
  • Simplified updates via container image pulls rather than RPM package management
  • Reduced local storage requirements (no local measurement archive)

See Installing a perfSONAR Testpoint for the container-based deployment guide.


Prerequisites and Planning

Before you begin, it may be helpful to gather the following information:

  • Hardware details: hostname, BMC/iLO/iDRAC credentials (if used), interface names, available storage locations.

  • Network data: IPv4/IPv6 assignments for each NIC, default gateway, internal/external VLAN information.

  • Operational contacts: site admin email, OSG facility/site name, latitude/longitude.

Existing perfSONAR configuration

If replacing an existing instance, back up the files under /etc/perfsonar/ (especially lsregistrationdaemon.conf) and any container volumes. The perfSONAR-update-lsregistration.sh script can extract, save, and restore the registration configuration.

Quick capture of existing lsregistration config (if you have an existing source host)

Download a temp copy:

curl -fsSL \
  https://raw.githubusercontent.com/osg-htc/networking/master/docs/perfsonar/tools_scripts/perfSONAR-update-lsregistration.sh \
  -o /tmp/update-lsreg.sh
chmod 0755 /tmp/update-lsreg.sh

Use the downloaded tool to extract a restore script:

/tmp/update-lsreg.sh extract --output /root/restore-lsreg.sh --local

Note: Repository clone instructions are in Step 2.

Note: All shell commands assume an interactive root shell.
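
The backup of /etc/perfsonar/ mentioned above could be scripted roughly as follows. This is a minimal sketch: the function name backup_perfsonar_config and the default destination /root are illustrative assumptions and are not part of the OSG/WLCG helper scripts.

```shell
# Sketch: archive an existing perfSONAR config directory before reinstalling.
# backup_perfsonar_config is a hypothetical helper name, not an OSG/WLCG tool.
backup_perfsonar_config() {
    src="${1:-/etc/perfsonar}"     # directory to save
    dest_dir="${2:-/root}"         # where the archive is written (assumed location)
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    archive="$dest_dir/perfsonar-config-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths inside the archive relative to the parent of $src
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Usage (as root): backup_perfsonar_config /etc/perfsonar /root
```

The function prints the path of the archive it created, so the result can be recorded in a change log or copied off-host.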


Step 1 – Install and Harden EL9

  1. Provision EL9: Install AlmaLinux, Rocky Linux, or RHEL 9 with the Minimal profile.

  2. Set the hostname and time sync: Use the hostname that belongs to the NIC that will own the default route.

    hostnamectl set-hostname <toolkit-hostname>
    systemctl enable --now chronyd
    timedatectl set-timezone <Region/City>
    
  3. Disable unused services:

    systemctl disable --now firewalld NetworkManager-wait-online
    dnf remove -y rsyslog
    
    Why disable unused services?

    We recommend disabling unused services during initial provisioning to reduce complexity and avoid unexpected interference with network and container setup. Services such as firewalld, NetworkManager-wait-online, and rsyslog can alter networking state, hold boot or network events, or conflict with the automated nftables/NetworkManager changes performed by the helper scripts. Disabling non-essential services makes the install deterministic, reduces the host attack surface, and avoids delays or race conditions while configuring policy-based routing, nftables rules, and container networking.

  4. Update the system:

    dnf -y update

  5. Record NIC names: Document interface mappings for later PBR configuration.

    nmcli device status
    ip -br addr
    

Step 2 – Install perfSONAR Toolkit via RPM

After completing Step 1 (minimal OS hardening), install the perfSONAR Toolkit bundle using RPM packages.

Step 2.1 – Configure DNF Repositories

Configure DNF to access EPEL, CRB (CodeReady Builder), and perfSONAR repositories:

# Install EPEL repository
dnf install -y epel-release

# AlmaLinux/Rocky Linux: enable the CRB (CodeReady Builder) repository
dnf config-manager --set-enabled crb
# --OR--
# RHEL: enable the CodeReady Builder repository through subscription-manager.
# NOTE: the perfSONAR auto-install script does not do this (it tries "crb" as above, which fails on RHEL)
subscription-manager repos --enable codeready-builder-for-rhel-9-x86_64-rpms

# Install perfSONAR repository for EL9
dnf install -y http://software.internet2.edu/rpms/el9/x86_64/latest/packages/perfsonar-repo-0.11-1.noarch.rpm

# Refresh DNF cache
dnf clean all

What these repositories provide
  • EPEL (Extra Packages for Enterprise Linux): Community packages not in base EL9
  • CRB (CodeReady Builder): Additional development and build tools
  • perfSONAR repo: Official perfSONAR packages maintained by Internet2
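
Before installing the bundle, it can help to confirm that all three repositories are actually enabled. The sketch below reads `dnf repolist --enabled` output on standard input and reports anything missing. The function name missing_repos and the repo ID patterns (in particular the assumption that the perfSONAR repo ID starts with "perfsonar", matched case-insensitively) are illustrative guesses; adjust them to what your system shows.

```shell
# Sketch: report which expected repositories are absent from `dnf repolist` output.
# Reads the repolist text on stdin. The ID patterns below are assumptions.
missing_repos() {
    list=$(cat)
    for entry in 'EPEL=^epel' 'CRB=^(crb|codeready-builder)' 'perfSONAR=^perfsonar'; do
        label=${entry%%=*}
        pattern=${entry#*=}
        # -i: repo IDs may differ in case between releases
        printf '%s\n' "$list" | grep -Eiq "$pattern" || echo "missing: $label"
    done
}

# Usage: dnf repolist --enabled | missing_repos
```

No output means every expected repository was found.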

Step 2.2 – Install perfSONAR Toolkit Bundle

Install the complete toolkit bundle:

dnf install -y perfsonar-toolkit

This bundle automatically includes:

  • Core perfSONAR measurement tools (pScheduler, OWAMP, traceroute, throughput tests)
  • perfsonar-toolkit-security: Firewall rules (nftables) and fail2ban configuration
  • perfsonar-toolkit-sysctl: Network tuning parameters optimized for measurements
  • perfsonar-toolkit-systemenv: Automatic updates and logging configuration
  • Web interface: Local UI at https://<hostname>/toolkit
  • Measurement archive: Local OpenSearch and Logstash for storing test results

Installation takes approximately 5-10 minutes depending on network speed.

Alternative automated installation

perfSONAR provides a one-line automated installer script:

curl -s https://downloads.perfsonar.net/install | sh -s - toolkit

This script performs the same steps as above (configure repos + install bundle).

Step 2.3 – Run Post-Install Configuration Scripts

The toolkit bundle includes configuration scripts that must be run after installation:

# Configure system tuning parameters (sysctl)
/usr/lib/perfsonar/scripts/configure_sysctl

# Configure firewall rules
/usr/lib/perfsonar/scripts/configure_firewall install

What these scripts configure

configure_sysctl:

  • TCP congestion control algorithm (htcp instead of reno)
  • Maximum TCP buffer sizes for high-bandwidth paths
  • Network stack tuning for measurement workloads
  • Creates /etc/sysctl.d/perfsonar-sysctl.conf

configure_firewall:

  • Opens required ports for perfSONAR services (pScheduler, OWAMP, HTTP/HTTPS)
  • Configures nftables rules (compatible with existing rules)
  • Enables fail2ban with perfSONAR jails
  • Creates /etc/nftables.d/perfsonar.nft
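
To confirm that the tuning took effect, the live kernel values can be compared with what you expect. The following is a minimal sketch: check_sysctl is a hypothetical helper, and the only expected value taken from this guide is htcp for the congestion control algorithm; any other key=value pairs you pass are your own site assumptions.

```shell
# Sketch: compare live kernel settings against expected values.
# Each argument is key=value; expected values are supplied by the caller.
check_sysctl() {
    rc=0
    for pair in "$@"; do
        key=${pair%%=*}
        want=${pair#*=}
        have=$(sysctl -n "$key" 2>/dev/null) || true
        if [ "$have" = "$want" ]; then
            echo "ok    $key = $have"
        else
            echo "DIFF  $key = ${have:-unset} (expected $want)"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_sysctl net.ipv4.tcp_congestion_control=htcp
```

A non-zero exit status means at least one key differs, which makes the function usable in provisioning checks.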

Step 2.4 – Install Helper Scripts for PBR and Management

Install OSG/WLCG helper scripts for policy-based routing and advanced configuration:

# Install base packages for helper scripts
dnf -y install jq curl tar gzip rsync bind-utils \
    python3 iproute iputils procps-ng sed grep gawk

# Bootstrap helper scripts
curl -fsSL \
    https://raw.githubusercontent.com/osg-htc/networking/master/docs/perfsonar/tools_scripts/install_tools_scripts.sh \
    -o /tmp/install_tools_scripts.sh

chmod 0755 /tmp/install_tools_scripts.sh

/tmp/install_tools_scripts.sh /opt/perfsonar-toolkit

Verify bootstrap completed successfully:

# Check that all helper scripts were downloaded
ls -1 /opt/perfsonar-toolkit/tools_scripts/*.sh | wc -l
# Should show 17 shell scripts

# Verify key scripts are present and executable
ls -l /opt/perfsonar-toolkit/tools_scripts/{perfSONAR-pbr-nm.sh,perfSONAR-install-nftables.sh,check-perfsonar-dns.sh,fasterdata-tuning.sh}

Why install helper scripts?

The OSG/WLCG helper scripts provide automation for:

  • Multi-NIC policy-based routing configuration
  • DNS forward/reverse validation
  • Registration information management
  • Custom nftables rules integrated with PBR

These scripts are optional but highly recommended for sites with multiple network interfaces or complex routing requirements.
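
If you want the bootstrap verification to fail loudly in automation rather than rely on reading `ls` output, a check along these lines may help. It is a sketch only: check_helper_scripts is a hypothetical name, and the list of scripts to check is whatever you pass in.

```shell
# Sketch: confirm that named helper scripts exist and are executable.
check_helper_scripts() {
    dir="$1"; shift
    rc=0
    for name in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "ok       $name"
        else
            echo "MISSING  $name"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
# check_helper_scripts /opt/perfsonar-toolkit/tools_scripts \
#     perfSONAR-pbr-nm.sh perfSONAR-install-nftables.sh check-perfsonar-dns.sh
```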


Step 3 – Configure Policy-Based Routing (PBR)

The script /opt/perfsonar-toolkit/tools_scripts/perfSONAR-pbr-nm.sh automates NetworkManager profiles and routing rule setup. It fills out and consumes the network configuration in /etc/perfSONAR-multi-nic-config.conf.

Modes of operation

By default the script now performs an in-place apply that adjusts routes, rules, and NetworkManager connection properties without deleting existing connections or flushing all system routes. This minimizes disruption and usually avoids the need for a reboot.

An optional destructive mode --rebuild-all performs the original full workflow: backup existing profiles, flush all routes and rules, remove every NetworkManager connection, then recreate connections from scratch. Use this only for initial deployments or when you must completely reset inconsistent legacy state.

Mode               | Flag                      | Disruption                                          | When to use
In-place (default) | (none) or --apply-inplace | Low (interfaces stay up; rules adjusted)            | Routine updates, gateway changes, add routes
Full rebuild       | --rebuild-all             | High (connections removed; brief connectivity drop) | First-time setup, severe misconfiguration

Safety Enhancements

  • Detects active SSH session interface and avoids extra disruption to that NIC in in-place mode.
  • Prompts are still skipped with --yes.
  • Dry-run preview supported via --dry-run (combine with --debug for verbose output).
  • Reboot is no longer generally required; only consider one if NetworkManager fails to apply the new rules cleanly.

Generate config file automatically (or preview)

Gateways required for addresses

Any NIC with an IPv4 address must also have an IPv4 gateway, and any NIC with an IPv6 address must have an IPv6 gateway. If the generator cannot detect a gateway, it adds a WARNING block to the generated file listing affected NICs. Edit NIC_IPV4_GWS/NIC_IPV6_GWS accordingly before applying changes.

Gateway prompts

During generation, the script attempts to detect gateways per-NIC. If a NIC has an IP address but no gateway could be determined, it will prompt you interactively to enter an IPv4 and/or IPv6 gateway (or - to skip). Prompts are skipped in non-interactive sessions or when you use --yes. Note, NICs without gateways are assumed to NOT be used for perfSONAR.
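
When the generator cannot determine a gateway, it can be useful to see which default gateway the kernel currently has for a given NIC before you answer the prompt. The sketch below parses `ip route` output from standard input. The function name default_gw_for and the interface name eth0 in the usage line are placeholders, and this is not how perfSONAR-pbr-nm.sh itself detects gateways.

```shell
# Sketch: print the default gateway recorded for one NIC.
# Reads `ip -4 route show default` (or `ip -6 route show default`) output on stdin.
default_gw_for() {
    nic="$1"
    awk -v nic="$nic" '
        $1 == "default" {
            gw = ""; dev = ""
            for (i = 2; i < NF; i++) {
                if ($i == "via") gw = $(i + 1)
                if ($i == "dev") dev = $(i + 1)
            }
            if (dev == nic && gw != "") { print gw; exit }
        }
    '
}

# Usage: ip -4 route show default | default_gw_for eth0
```

An empty result means no default route via that NIC is currently installed, so the gateway has to come from your network documentation.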

Preview generation (no changes):

/opt/perfsonar-toolkit/tools_scripts/perfSONAR-pbr-nm.sh --generate-config-debug    

Generate and write the config file:

/opt/perfsonar-toolkit/tools_scripts/perfSONAR-pbr-nm.sh --generate-config-auto 

The script writes the config file to /etc/perfSONAR-multi-nic-config.conf. Edit it to adjust site-specific values (for example, confirm DEFAULT_ROUTE_NIC and add NIC_IPV4_ADDROUTE entries), verify the entries, and then apply the network changes as described next.

Apply changes (in-place default)

Connect via console for network changes

Applying network changes over an SSH connection may interrupt your session. Run the perfSONAR-pbr-nm.sh script from the console where possible, or prefix the invocation with nohup so that it keeps running if the session drops.

/opt/perfsonar-toolkit/tools_scripts/perfSONAR-pbr-nm.sh --yes

If the SSH connection drops during network reconfiguration:
  1. Access via BMC/iLO/iDRAC console or physical console
  2. Review /var/log/perfSONAR-multi-nic-config.log for errors
  3. Check network state with nmcli connection show and ip addr
  4. Restore from backup if needed: backups are in /var/backups/nm-connections-<timestamp>/
  5. Reapply config after corrections: /opt/perfsonar-toolkit/tools_scripts/perfSONAR-pbr-nm.sh --yes

Full rebuild (destructive – removes all NM connections first)

/opt/perfsonar-toolkit/tools_scripts/perfSONAR-pbr-nm.sh --rebuild-all --yes

The policy based routing script logs to /var/log/perfSONAR-multi-nic-config.log. After an in-place apply, a reboot is typically unnecessary. If connectivity or rules appear inconsistent (ip rule show / ip route mismatch), consider a manual NetworkManager restart:

systemctl restart NetworkManager

DNS: forward and reverse entries (required)

All IP addresses that will be used for perfSONAR testing MUST have DNS entries: a forward (A/AAAA) record and a matching reverse (PTR) record. This is required so remote test tools and site operators can reliably reach and identify your host, and because some measurement infrastructure and registration systems perform forward/reverse consistency checks.

  • For single-stack IPv4-only hosts: ensure A and PTR are present and consistent.
  • For single-stack IPv6-only hosts: ensure AAAA and PTR are present and consistent.
  • For dual-stack hosts: both IPv4 and IPv6 addresses used for testing must have matching forward and reverse records (A+PTR and AAAA+PTR).

Run the DNS checker

To validate forward/reverse DNS for the addresses in /etc/perfSONAR-multi-nic-config.conf, run the checker script:

/opt/perfsonar-toolkit/tools_scripts/check-perfsonar-dns.sh

Notes and automation tips:

  • The script above uses dig (bind-utils package) which is commonly available; you can adapt it to use host if preferred.
  • Run the check as part of your provisioning CI or as a pre-flight check before enabling measurement registration.
  • For large sites or many addresses, parallelize the checks (xargs -P) or use a small Python script that leverages dns.resolver for async checks.
  • If your PTR returns a hostname with a trailing dot, the script strips it before the forward check.

If any addresses fail these checks, correct the DNS zone (forward and/or reverse) and allow DNS propagation before proceeding with registration and testing.
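
For a single address, the forward/reverse consistency requirement described above can be checked by hand with dig. The following is a minimal sketch of that idea, not the logic of check-perfsonar-dns.sh: fcrdns_ok is a hypothetical name, only the first PTR answer is considered, and the IPv6 comparison is purely textual, so pass the address in the same compressed lowercase form that dig prints.

```shell
# Sketch: forward-confirmed reverse DNS check for one IP address using dig.
# fcrdns_ok is a hypothetical helper, not part of the OSG/WLCG tools.
fcrdns_ok() {
    ip="$1"
    case "$ip" in
        *:*) rrtype=AAAA ;;   # IPv6 literal
        *)   rrtype=A ;;
    esac
    # Reverse lookup; strip the trailing dot from the first PTR answer
    name=$(dig +short -x "$ip" | head -n 1 | sed 's/\.$//')
    [ -n "$name" ] || { echo "FAIL $ip: no PTR record"; return 1; }
    # The forward lookup must contain the original address
    if dig +short "$rrtype" "$name" | grep -Fxq "$ip"; then
        echo "OK   $ip <-> $name"
    else
        echo "FAIL $ip: PTR $name does not resolve back"
        return 1
    fi
}

# Usage: fcrdns_ok 192.0.2.10
```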

Verify the routing policy:

nmcli connection show
ip rule show
ip route show table <table-id>

Confirm that non-default interfaces have their own routing tables and that the default interface owns the system default route.
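
To avoid looking up each `<table-id>` by hand, the table names can be pulled out of the rule list. This is a sketch under the assumption that `ip rule show` prints rules in its usual "from ... lookup TABLE" form; custom_rule_tables is a hypothetical helper name.

```shell
# Sketch: list the custom routing tables referenced by policy rules.
# Reads `ip rule show` (or `ip -6 rule show`) output on stdin.
custom_rule_tables() {
    awk '
        {
            t = ""
            for (i = 1; i < NF; i++)
                if ($i == "lookup") t = $(i + 1)
            # Skip the three built-in tables
            if (t != "" && t != "local" && t != "main" && t != "default") seen[t] = 1
        }
        END { for (t in seen) print t }
    ' | sort
}

# Usage: show the routes held in each custom table
# for t in $(ip rule show | custom_rule_tables); do
#     echo "== table $t =="; ip route show table "$t"
# done
```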


Step 4 – Configure nftables, SELinux, and Fail2Ban

Toolkit automatic security hardening

The perfsonar-toolkit bundle automatically configured security during installation (Step 2):

  • nftables rules via /usr/lib/perfsonar/scripts/configure_firewall
  • fail2ban with perfSONAR jails for SSH and service protection
  • SELinux policies (if enforcing mode is enabled)

This step is optional and only needed if you want to:

  • Customize firewall rules beyond the toolkit defaults
  • Integrate with OSG helper scripts for PBR-derived SSH access control
  • Add site-specific security policies

Optional: Customize Security with Helper Scripts

Use /opt/perfsonar-toolkit/tools_scripts/perfSONAR-install-nftables.sh to configure additional hardened nftables rules integrated with your PBR configuration. This script can derive SSH allow-lists from your multi-NIC configuration.

Prerequisites:

  • nftables, fail2ban, and SELinux tools are already installed by the perfsonar-toolkit bundle
  • Multi-NIC configuration file at /etc/perfSONAR-multi-nic-config.conf (from Step 3)

Install/configure additional custom options

You can use the install script to install the options you want (selinux, fail2ban).

/opt/perfsonar-toolkit/tools_scripts/perfSONAR-install-nftables.sh --selinux --fail2ban --yes

  • Use --yes to skip the interactive confirmation prompt (omit it if you prefer to review the summary and answer manually).
  • Add --dry-run for a rehearsal that only prints the planned actions.

The script writes nftables rules for perfSONAR services, derives SSH allow-lists from /etc/perfSONAR-multi-nic-config.conf, optionally adjusts SELinux, and enables Fail2ban jails, but only if those components are already installed.

SSH allow-lists and validation
  • Derives SSH allow-lists from /etc/perfSONAR-multi-nic-config.conf (CIDR prefixes and addresses).
  • Validates nftables rules before writing.
  • Outputs: rules to /etc/nftables.d/perfsonar.nft, log to /var/log/perfSONAR-install-nftables.log, backups to /var/backups/.

Preview nftables rules before applying

You can preview the fully rendered nftables rules (no changes are made):

/opt/perfsonar-toolkit/tools_scripts/perfSONAR-install-nftables.sh --print-rules

Manually add extra management hosts/subnets

If you need to allow additional SSH sources not represented by your NIC-derived prefixes, edit /etc/nftables.d/perfsonar.nft and add entries to the appropriate sets. Example:

set ssh_access_ip4_subnets {
    type ipv4_addr
    flags interval
    elements = { 192.0.2.0/24, 198.51.100.0/25 }
}

set ssh_access_ip4_hosts {
    type ipv4_addr
    elements = { 203.0.113.10, 203.0.113.11 }
}

set ssh_access_ip6_subnets {
    type ipv6_addr
    flags interval
    elements = { 2001:db8:1::/64 }
}

set ssh_access_ip6_hosts {
    type ipv6_addr
    elements = { 2001:db8::10 }
}

Then validate and reload (root shell):

nft -c -f /etc/nftables.d/perfsonar.nft
systemctl reload nftables || systemctl restart nftables
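
Because a syntax error in a hand-edited rules file can lock you out, you may prefer to tie the reload to the result of the syntax check. A minimal sketch, with reload_nft_if_valid as a hypothetical wrapper name around the two commands above:

```shell
# Sketch: only reload nftables when the edited file passes a syntax check.
reload_nft_if_valid() {
    rules="${1:-/etc/nftables.d/perfsonar.nft}"
    if nft -c -f "$rules"; then
        systemctl reload nftables || systemctl restart nftables
    else
        echo "syntax check failed; nftables left unchanged: $rules" >&2
        return 1
    fi
}

# Usage (root shell): reload_nft_if_valid /etc/nftables.d/perfsonar.nft
```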

Confirm nftables state and security services

Verification commands
nft list ruleset
sestatus
systemctl status fail2ban

You may want to document any site-specific exceptions (e.g., additional allowed management hosts) in your change log.


Step 5 – Start and Configure perfSONAR Services

The perfSONAR Toolkit installation automatically enables and starts all required services. This step verifies service health and completes first-time web interface configuration.

Step 5.1 – Verify perfSONAR Services

Check that all perfSONAR services are running:

systemctl status pscheduler-scheduler
systemctl status pscheduler-runner
systemctl status pscheduler-archiver
systemctl status pscheduler-ticker
systemctl status psconfig-pscheduler-agent
systemctl status owamp-server
systemctl status perfsonar-lsregistrationdaemon

All services should show active (running) status. If any service is not running, start it:

systemctl start <service-name>

What each service does
  • pscheduler-scheduler: Schedules measurement tests
  • pscheduler-runner: Executes scheduled tests
  • pscheduler-archiver: Archives measurement results to local and remote stores
  • pscheduler-ticker: Manages periodic tasks and cleanup
  • psconfig-pscheduler-agent: Processes pSConfig templates and creates scheduled tests
  • owamp-server: One-Way Active Measurement Protocol (latency/loss measurements)
  • perfsonar-lsregistrationdaemon: Registers this host with the global Lookup Service

Additional services (measurement archive)

The toolkit also runs OpenSearch and Logstash for local measurement archive:

systemctl status opensearch
systemctl status logstash

These services store measurement results locally for web UI display and historical analysis.

All services are configured to start automatically on boot via systemd.
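
Rather than reading nine separate status pages, the same check can be condensed into one pass that prints only the units needing attention. This is a sketch; inactive_units is a hypothetical helper built on `systemctl is-active`.

```shell
# Sketch: report any listed systemd unit that is not active.
inactive_units() {
    rc=0
    for unit in "$@"; do
        # is-active exits non-zero for inactive units, so ignore its status
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        if [ "$state" != "active" ]; then
            echo "$unit: ${state:-unknown}"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
# inactive_units pscheduler-scheduler pscheduler-runner pscheduler-archiver \
#     pscheduler-ticker psconfig-pscheduler-agent owamp-server \
#     perfsonar-lsregistrationdaemon opensearch logstash
```

No output and a zero exit status mean every listed unit is active.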

Step 5.2 – Access the Web Interface

The perfSONAR Toolkit provides a comprehensive web interface for configuration and monitoring.

Access the web UI:

  1. Open a browser and navigate to: https://<your-hostname>/toolkit

  2. First-time setup wizard: On first access, you'll be guided through initial configuration:

    • Create administrator account: Set username and password for web UI access
    • Administrative information: Site name, location, contact details
    • Host information: Verify hostname, addresses, and network interfaces
    • Test configuration: Review default test settings (typically defaults are appropriate)
    • Archive settings: Configure local and/or remote archiving

  3. Complete the wizard to enable full functionality

Web UI features

The web interface provides:

  • Dashboard: Real-time and historical measurement results with graphs
  • Test Configuration: Schedule on-demand or regular tests to remote endpoints
  • Administrative Info: Update site information, contacts, and registration details
  • Service Health: Monitor perfSONAR service status and system resources
  • Archive Configuration: Manage local archive retention and remote archive destinations
  • Host Details: View network interfaces, routes, and system information

Accessing web UI remotely

If you need to access the web UI from outside your local network:

  • Ensure firewall allows HTTPS (port 443) from your management networks
  • Consider using SSH port forwarding for secure remote access:
    ssh -L 8443:localhost:443 root@<perfsonar-host>
    
    Then access: https://localhost:8443/toolkit

Web UI URL: https://<your-hostname>/toolkit

For detailed web UI documentation, see: https://docs.perfsonar.net/manage_admin_info.html

Step 5.3 – Configure Automatic Updates

The perfSONAR Toolkit enables automatic updates by default using dnf-automatic.

Verify automatic updates are enabled:

systemctl status dnf-automatic.timer

The timer should show active and run daily to check for and install perfSONAR package updates.

Manual update check:

dnf check-update perfsonar\*

Apply updates manually (if needed):

dnf update perfsonar\*

# Restart affected services after updates
systemctl restart pscheduler-scheduler pscheduler-runner pscheduler-archiver pscheduler-ticker psconfig-pscheduler-agent

How automatic updates work
  • dnf-automatic runs daily (configured in /etc/dnf/automatic.conf)
  • Updates are downloaded and installed automatically
  • Security updates are prioritized
  • Services are restarted as needed by RPM post-install scripts
  • Update logs: /var/log/dnf.log and journalctl -u dnf-automatic

Update behavior

By default, the toolkit applies updates automatically. If you prefer manual control:

# Disable automatic updates
systemctl disable dnf-automatic.timer

# Re-enable later if desired
systemctl enable --now dnf-automatic.timer

Manual updates require regular monitoring to ensure security patches are applied promptly.
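
Whether dnf-automatic installs updates or only downloads them is controlled by the apply_updates key in the [commands] section of /etc/dnf/automatic.conf. The sketch below reads a single key from such an INI-style file so the setting can be checked in scripts; ini_value is a hypothetical helper name.

```shell
# Sketch: read one key from an INI-style file such as /etc/dnf/automatic.conf.
ini_value() {
    file="$1"; section="$2"; key="$3"
    awk -v s="[$section]" -v k="$key" '
        # Section header: remember whether we are inside the wanted section
        /^[[:space:]]*\[/ { gsub(/[[:space:]]/, ""); in_s = ($0 == s); next }
        in_s && /^[[:space:]]*[#;]/ { next }
        in_s {
            n = index($0, "=")
            if (n == 0) next
            name = substr($0, 1, n - 1); val = substr($0, n + 1)
            gsub(/[[:space:]]/, "", name)
            sub(/^[[:space:]]+/, "", val); sub(/[[:space:]]+$/, "", val)
            if (name == k) { print val; exit }
        }
    ' "$file"
}

# Usage: ini_value /etc/dnf/automatic.conf commands apply_updates
```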


Step 6 – Set Up Let's Encrypt Certificates (Optional)

The perfSONAR Toolkit web interface uses HTTPS with self-signed certificates by default. For production deployments, replacing these with Let's Encrypt certificates provides:

  • Browser trust: No certificate warnings when accessing the web UI
  • Security: Industry-standard encryption with automatic renewals
  • Compliance: Meets security requirements for production infrastructure

This step is optional but highly recommended for production sites.

Step 6.1 – Prerequisites for Let's Encrypt

Before obtaining Let's Encrypt certificates, ensure:

  1. DNS is configured correctly:

Your hostname must have valid forward (A/AAAA) and reverse (PTR) DNS records that are publicly resolvable. Verify this with the DNS checker:

/opt/perfsonar-toolkit/tools_scripts/check-perfsonar-dns.sh

  2. Port 80 is accessible:

The HTTP-01 challenge used by Let's Encrypt requires port 80 to be reachable from the internet. Update your firewall:

# Add HTTP port to nftables (in addition to existing HTTPS)
/opt/perfsonar-toolkit/tools_scripts/perfSONAR-install-nftables.sh --ports=80,443 --yes

  3. Apache is not listening on port 80:

The perfSONAR Toolkit Apache server should only listen on port 443 (HTTPS). Verify this:

# Check Apache is only listening on 443, not 80
# (matching the whitespace after the port keeps ":80" from also matching ports such as 8080)
ss -tlnp | grep -E ':80[[:space:]]'
ss -tlnp | grep -E ':443[[:space:]]'

# Should show port 443 with httpd, but port 80 should be free

If Apache is listening on port 80

The Toolkit's Apache configuration should not bind to port 80. If it is, check /etc/httpd/conf/httpd.conf and /etc/httpd/conf.d/*.conf for Listen 80 directives and comment them out:

# Find Listen directives
grep -r "^Listen 80" /etc/httpd/

# Edit the file(s) and comment out or change to Listen 443
vi /etc/httpd/conf/httpd.conf

# Restart Apache
systemctl restart httpd

Step 6.2 – Install Certbot

Install certbot using snapd (recommended method) or EPEL packages:

Option A: Install via Snap (Recommended)

# Install snapd
dnf install -y snapd
systemctl enable --now snapd.socket

# Wait for snapd to initialize
sleep 10

# Create symlink for classic snap support
ln -sf /var/lib/snapd/snap /snap

# Install certbot
snap install --classic certbot

# Create symlink for certbot command
ln -sf /snap/bin/certbot /usr/bin/certbot

Option B: Install via DNF (Alternative)

# Install certbot from EPEL
dnf install -y certbot

# Verify installation
certbot --version

Why use snap for certbot?

The Certbot developers recommend snap installation because:

  • Always provides the latest certbot version
  • Automatic updates via snap refresh
  • Consistent across distributions
  • Includes all necessary dependencies

EPEL packages work but may lag behind upstream releases.

Step 6.3 – Obtain Let's Encrypt Certificate

Use certbot in standalone mode to obtain your certificate. Replace <your-fqdn> with your host's fully-qualified domain name and <admin-email> with your email address (used for renewal notifications).

Obtain certificate (interactive):

# Stop Apache temporarily to free port 80
systemctl stop httpd

# Obtain certificate using standalone mode
certbot certonly --standalone \
    -d <your-fqdn> \
    -m <admin-email> \
    --agree-tos

# Restart Apache
systemctl start httpd

Example:

certbot certonly --standalone \
    -d ps-toolkit.example.org \
    -m <admin-email> \
    --agree-tos

Non-interactive (for automation):

systemctl stop httpd

certbot certonly --standalone \
    -d <your-fqdn> \
    -m <admin-email> \
    --agree-tos \
    --non-interactive

systemctl start httpd

Certificate file locations

After successful issuance, certificates are stored at:

  • Full chain: /etc/letsencrypt/live/<your-fqdn>/fullchain.pem
  • Private key: /etc/letsencrypt/live/<your-fqdn>/privkey.pem
  • Chain only: /etc/letsencrypt/live/<your-fqdn>/chain.pem
  • Certificate only: /etc/letsencrypt/live/<your-fqdn>/cert.pem

The actual certificate files are in /etc/letsencrypt/archive/<your-fqdn>/ and the live/ directory contains symlinks to the latest versions.

Step 6.4 – Configure Apache to Use Let's Encrypt Certificate

Use the helper script to update Apache SSL configuration:

/opt/perfsonar-toolkit/tools_scripts/configure-toolkit-letsencrypt.sh <your-fqdn>

Example:

/opt/perfsonar-toolkit/tools_scripts/configure-toolkit-letsencrypt.sh ps-toolkit.example.org

This script:

  • Backs up the original Apache SSL configuration
  • Updates SSLCertificateFile to point to Let's Encrypt fullchain
  • Updates SSLCertificateKeyFile to point to Let's Encrypt private key
  • Adds or updates SSLCertificateChainFile

Verify Apache configuration syntax:

apachectl configtest

Reload Apache to apply changes:

systemctl reload httpd

Verify the certificate is in use:

# Check certificate via OpenSSL
echo | openssl s_client -connect <your-fqdn>:443 -servername <your-fqdn> 2>/dev/null | openssl x509 -noout -issuer -dates

# Should show:
# issuer=C=US, O=Let's Encrypt, CN=...
# notBefore=...
# notAfter=...

Test in browser:

Navigate to https://<your-fqdn>/toolkit and verify:

  • No certificate warnings
  • Certificate is issued by "Let's Encrypt"
  • Certificate is valid (green padlock icon)

Step 6.5 – Configure Automatic Certificate Renewal

Let's Encrypt certificates expire after 90 days. Configure automatic renewal to avoid expiration.

Test renewal process (dry run):

# Perform a test renewal without actually renewing
certbot renew --dry-run --pre-hook "systemctl stop httpd" --post-hook "systemctl start httpd"

If the dry run succeeds, configure automatic renewal:

Option A: Using Certbot Timer (Recommended)

Certbot automatically installs a systemd timer for renewals when installed via snap:

# Check timer status
systemctl list-timers | grep certbot

# If not present, enable it
systemctl enable --now snap.certbot.renew.timer

# Or for EPEL installation:
systemctl enable --now certbot-renew.timer

Option B: Using Cron (Alternative)

Add a cron job for automatic renewal:

# Create renewal script
cat > /usr/local/bin/certbot-renew.sh << 'EOF'
#!/bin/bash
# Renew Let's Encrypt certificates and reload Apache

certbot renew \
    --pre-hook "systemctl stop httpd" \
    --post-hook "systemctl start httpd" \
    --quiet

exit 0
EOF

chmod 0755 /usr/local/bin/certbot-renew.sh

# Add cron job (runs twice daily at 3:30 AM and 3:30 PM)
cat > /etc/cron.d/certbot-renew << 'EOF'
30 3,15 * * * root /usr/local/bin/certbot-renew.sh
EOF

Renewal frequency and timing
  • Certbot automatically checks certificates and only renews those expiring within 30 days
  • Running renewal checks twice daily ensures timely renewal even if one attempt fails
  • The --quiet flag suppresses output unless there's an error
  • Pre/post hooks stop and start Apache to free port 80 for the standalone authenticator

Verify automatic renewal is configured:

# For snap installation
systemctl status snap.certbot.renew.timer

# For EPEL installation
systemctl status certbot-renew.timer

# For cron-based renewal
crontab -l | grep certbot
# or
cat /etc/cron.d/certbot-renew

Step 6.6 – Monitor Certificate Expiration

Even with automatic renewal, monitor certificate expiration to catch renewal failures:

Check certificate expiration date:

# Check all certificates
certbot certificates

# Check specific certificate via OpenSSL
echo | openssl s_client -connect <your-fqdn>:443 -servername <your-fqdn> 2>/dev/null | openssl x509 -noout -dates

Set up expiration monitoring (optional):

Email alerts for expiration

Let's Encrypt sends expiration warning emails to the address provided during certificate issuance. Ensure this email address is monitored:

# Check configured email
grep email /etc/letsencrypt/renewal/<your-fqdn>.conf

You can also set up local monitoring using nagios, icinga, or a simple script:

#!/bin/bash
# check-cert-expiry.sh - Alert if certificate expires within 14 days

DOMAIN="<your-fqdn>"
WARN_DAYS=14

EXPIRY=$(echo | openssl s_client -connect ${DOMAIN}:443 -servername ${DOMAIN} 2>/dev/null | \
         openssl x509 -noout -enddate | cut -d= -f2)

EXPIRY_EPOCH=$(date -d "${EXPIRY}" +%s)
NOW_EPOCH=$(date +%s)
DAYS_LEFT=$(( (EXPIRY_EPOCH - NOW_EPOCH) / 86400 ))

if [ $DAYS_LEFT -lt $WARN_DAYS ]; then
    echo "WARNING: Certificate for ${DOMAIN} expires in ${DAYS_LEFT} days!"
    # Send alert via email, Slack, etc.
else
    echo "OK: Certificate valid for ${DAYS_LEFT} more days"
fi

Troubleshooting Let's Encrypt

Certificate issuance fails with 'Connection refused'

Symptoms: Certbot fails with "Failed to authenticate" or "Connection refused" errors during HTTP-01 challenge.

Diagnostic steps:

# Verify port 80 is open in firewall
nft list ruleset | grep "dport 80"

# Test port 80 accessibility from external host
curl -v http://<your-fqdn>/

# Check nothing is listening on port 80
ss -tlnp | grep :80

Solutions:

  • Add port 80 to nftables: /opt/perfsonar-toolkit/tools_scripts/perfSONAR-install-nftables.sh --ports=80,443 --yes
  • Ensure Apache is not listening on port 80 (should only listen on 443)
  • Verify DNS resolves correctly from public internet
  • Check network firewall/router allows inbound port 80

Certificate renewal fails

Symptoms: Certificate expires or renewal fails with errors in logs.

Diagnostic steps:

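# Check renewal logs (snap installation: the service unit holds the output, not the timer)
journalctl -u snap.certbot.renew.service -n 50
# or read certbot's own log file
tail -n 50 /var/log/letsencrypt/letsencrypt.log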

# Test renewal manually
certbot renew --dry-run --pre-hook "systemctl stop httpd" --post-hook "systemctl start httpd" -vvv

# Check certificate status
certbot certificates

Common causes:

  • Port 80 blocked: Verify firewall allows HTTP during renewal
  • Apache failed to stop/start: Check Apache service status
  • DNS changes: Verify hostname still resolves correctly
  • Rate limiting: Let's Encrypt enforces rate limits (for example, 5 duplicate certificates per 7 days for the same set of hostnames)

Solutions:

  • Fix firewall or DNS issues
  • Manually renew: certbot renew --force-renewal
  • If rate limited, wait 7 days before retrying

Browser shows old certificate after renewal

Symptoms: Certificate renewed successfully but browser still shows old/expired certificate.

Diagnostic steps:

# Check certificate files are updated
ls -la /etc/letsencrypt/live/<your-fqdn>/

# Verify Apache configuration points to correct files
grep SSLCertificate /etc/httpd/conf.d/ssl.conf

# Check when Apache last started; it only loads a new certificate on reload or restart
systemctl status httpd

Solutions:

  • Reload or restart Apache: systemctl restart httpd
  • Clear browser cache and hard refresh (Ctrl+Shift+R)
  • Verify SSL configuration: apachectl -t -D DUMP_VHOSTS
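
To confirm that Apache is actually serving the renewed certificate, one option is to compare the fingerprint of the file on disk with the fingerprint presented on port 443. This is a sketch under assumptions: the function names are hypothetical, and the on-disk file is assumed to be `cert.pem` in the certbot `live/` directory.

```shell
#!/bin/bash
# SHA-256 fingerprint of a PEM certificate file.
cert_fp_file() {
    openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2
}

# SHA-256 fingerprint of the certificate served on <host>:443.
cert_fp_served() {
    openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
        openssl x509 -noout -fingerprint -sha256 | cut -d= -f2
}

# Succeeds only when both fingerprints are non-empty and identical.
same_cert() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}
```

For example, `same_cert "$(cert_fp_file /etc/letsencrypt/live/<your-fqdn>/cert.pem)" "$(cert_fp_served <your-fqdn>)" || echo "Apache is still serving a different certificate"`.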

Step 7 – Configure and Enroll in pSConfig

Enroll your toolkit host with the OSG/WLCG pSConfig service so tests are auto-configured. Use the "auto URL" for each FQDN you expose for perfSONAR (one or two depending on whether you split latency/throughput by hostname).

Option A: Web Interface (Recommended)

The easiest way to configure pSConfig is via the web interface:

  1. Navigate to: https://<your-hostname>/toolkit/admin?view=psconfig
  2. Click "Add Remote Configuration"
  3. Enter the auto URL: https://psconfig.opensciencegrid.org/pub/auto/<your-fqdn>
  4. Enable "Configure Archives" to automatically set up result archiving
  5. Save and restart the pSConfig agent

Option B: Command Line Configuration

Basic enrollment via command line:

# Add auto URLs (configures archives too) and show configured remotes
psconfig remote --configure-archives add \
    "https://psconfig.opensciencegrid.org/pub/auto/ps-lat-example.my.edu"

psconfig remote list

If there are any stale/old/incorrect entries, you can remove them:

psconfig remote delete "<old-url>"

Option C: Automated Enrollment Script

Automation tip: derive FQDNs from your configured IPs (PTR lookup) and enroll automatically. Review the list before applying.

For RPM Toolkit installs (non-container):

# Dry run only (show planned URLs):
/opt/perfsonar-toolkit/tools_scripts/perfSONAR-auto-enroll-psconfig.sh --local -n

# Apply enrollment:
/opt/perfsonar-toolkit/tools_scripts/perfSONAR-auto-enroll-psconfig.sh --local -v

# Verify configured remotes
psconfig remote list

For container-based installs:

# Dry run only (show planned URLs):
/opt/perfsonar-toolkit/tools_scripts/perfSONAR-auto-enroll-psconfig.sh -n

# Apply enrollment:
/opt/perfsonar-toolkit/tools_scripts/perfSONAR-auto-enroll-psconfig.sh -v

# Verify configured remotes
podman exec -it perfsonar-testpoint psconfig remote list

Auto-enroll pSConfig script details
  • Parses IP lists from /etc/perfSONAR-multi-nic-config.conf (NIC_IPV4_ADDRS / NIC_IPV6_ADDRS).
  • Performs reverse DNS lookups (getent/dig) to derive FQDNs.
  • Deduplicates while preserving discovery order.
  • Adds each https://psconfig.opensciencegrid.org/pub/auto/<FQDN> with --configure-archives.
  • Lists configured remotes and returns non-zero if any enrollment fails.

Integrate into provisioning CI by running with -n (dry-run) for approval and then -y once approved.
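
The deduplication and URL-building steps listed above can be illustrated with a short sketch. This is not the actual implementation of perfSONAR-auto-enroll-psconfig.sh: the function name `auto_urls` is hypothetical, and lowercasing names and stripping the trailing dot from PTR answers are assumptions added for this example.

```shell
#!/bin/bash
# Sketch only: turn FQDNs (one per line on stdin) into pSConfig auto URLs.
# Names are lowercased, a trailing "." (as returned by `dig +short -x`) is
# stripped, empty lines are dropped, and duplicates are removed while
# keeping first-seen order.
PSCONFIG_BASE="https://psconfig.opensciencegrid.org/pub/auto"

auto_urls() {
    tr 'A-Z' 'a-z' | sed -e 's/\.$//' -e '/^$/d' |
        awk -v base="$PSCONFIG_BASE" '!seen[$0]++ { print base "/" $0 }'
}
```

For example, `dig +short -x <your-ip> | auto_urls` prints the URL that would be passed to `psconfig remote --configure-archives add`.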


Step 8 – Register and Configure with WLCG/OSG

  1. OSG/WLCG registration workflow:

    Registration steps and portals
    • Register the host in OSG topology.
    • Create or update a GGUS ticket announcing the new measurement point.
    • In GOCDB, add the service endpoint org.opensciencegrid.crc.perfsonar-testpoint bound to this host.
  2. Document memberships:

    Update your site wiki or change log with assigned mesh names, feed URLs, and support contacts.

  3. Update Lookup Service registration:

    Option A: Web UI (Recommended)

    The easiest way to configure registration information is via the Toolkit web interface:

    1. Navigate to: https://<your-hostname>/toolkit/admin?view=host
    2. Fill in administrative information:
      • Site name, organization, location (city, state, country, zip code)
      • Latitude and longitude (for map display)
      • Administrator name and email
      • Projects (WLCG, OSG, etc.)
    3. Save changes - the lsregistrationdaemon restarts automatically

    Option B: Command Line

    Edit /etc/perfsonar/lsregistrationdaemon.conf directly and restart the service:

    vi /etc/perfsonar/lsregistrationdaemon.conf
    
    # After editing, restart the registration daemon
    systemctl restart perfsonar-lsregistrationdaemon
    

    Option C: Helper Script

    Use the helper script to update registration.

    For RPM Toolkit installs (non-container), pass the --local flag:

    # Preview changes only
    /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh update --local \
        --dry-run --site-name "Acme Co." --project WLCG \
        --admin-email "<admin-email>" --admin-name "pS Admin"
    
    # Apply new settings and restart the daemon
    /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh create --local \
        --site-name "Acme Co." --domain example.org --project WLCG --project OSG \
        --city Berkeley --region CA --country US --zip 94720 \
        --latitude 37.5 --longitude -121.7469 \
        --admin-name "pS Admin" --admin-email "<admin-email>"
    
    # Save current config (raw conf file)
    /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh save --output my-lsreg.conf --local
    
    # Or produce a self-contained executable restore script
    /opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh extract --output /root/restore-lsreg.sh --local
    

  4. Automatic updates

    The perfSONAR Toolkit uses dnf-automatic for automatic updates (already configured in Step 5).


Step 9 – SELinux Troubleshooting (If Enabled)

If you've enabled SELinux in enforcing mode, certain perfSONAR operations may generate audit log alerts. This section explains common issues and their fixes.

SELinux Basics for perfSONAR

SELinux enforces mandatory access controls based on file labels and process contexts. perfSONAR services run under specific contexts (e.g., lsregistrationdaemon_t, httpd_t), and accessed files must have compatible labels.

Check SELinux status:

sestatus
# Expected output: "SELinux status:  enabled" and "Current mode:  enforcing"

Common SELinux Issues and Fixes

Issue 1: /etc/perfsonar/lsregistrationdaemon.conf Has Wrong Label

Symptom: Audit log shows:

SELinux is preventing /usr/bin/perl from getattr access on the file /etc/perfsonar/lsregistrationdaemon.conf.

Root cause: The configuration file was created or modified (e.g., via restore or manual edit) and has an incorrect SELinux label. The file should be labeled lsregistrationdaemon_etc_t but may be labeled admin_home_t or have no label.

Fix: Apply restorecon to relabel the file:

# Restore the default SELinux context for the file
sudo /sbin/restorecon -v /etc/perfsonar/lsregistrationdaemon.conf

# Verify the label is now correct
ls -Z /etc/perfsonar/lsregistrationdaemon.conf
# Expected: system_u:object_r:lsregistrationdaemon_etc_t:s0

Automatic fix during restore:

Our perfSONAR-update-lsregistration.sh helper attempts to automatically apply restorecon after writing the configuration file. If restorecon is available on your system, it runs without user intervention:

# Use the helper to restore config (with automatic restorecon attempt)
/opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh restore --local \
    --input ./my-lsreg.conf

# Or extract and run a self-contained restore script
/opt/perfsonar-toolkit/tools_scripts/perfSONAR-update-lsregistration.sh extract --local \
    --output ./restore-lsreg.sh
./restore-lsreg.sh  # This script includes a restorecon attempt

Preventing the issue:

  • Always use the helper script (perfSONAR-update-lsregistration.sh) for configuration changes, as it handles restorecon automatically.
  • After any manual edits to /etc/perfsonar/lsregistrationdaemon.conf, explicitly run restorecon:
    sudo vi /etc/perfsonar/lsregistrationdaemon.conf
    sudo /sbin/restorecon -v /etc/perfsonar/lsregistrationdaemon.conf  # Fix labels immediately
    sudo systemctl restart perfsonar-lsregistrationdaemon
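
The label check above can also be scripted, for example in a configuration-management run. The sketch below uses hypothetical function names and assumes GNU `stat`, whose `%C` format prints the SELinux context on SELinux-enabled systems.

```shell
#!/bin/bash
# Extract the type field (third ":"-separated part) from an SELinux context string.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Succeeds when <file> currently carries the <expected-type> label.
# Returns 2 if the context cannot be read at all.
label_ok() {
    local file="$1" expected="$2" ctx
    ctx=$(stat -c %C "$file" 2>/dev/null) || return 2
    [ "$(selinux_type "$ctx")" = "$expected" ]
}
```

For example, `label_ok /etc/perfsonar/lsregistrationdaemon.conf lsregistrationdaemon_etc_t || sudo /sbin/restorecon -v /etc/perfsonar/lsregistrationdaemon.conf`.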
    

Issue 2: Other Services (ethtool, df, python3, postgresql, collect2) Generating Audit Alerts

Symptoms: Audit log shows alerts for various tools running in unexpected SELinux contexts:

SELinux is preventing /usr/sbin/ethtool from setopt access on netlink_generic_socket labeled httpd_t.
SELinux is preventing /usr/bin/df from getattr access on the directory /var/cache/openafs.
SELinux is preventing /usr/bin/python3.9 from execute access on the file ldconfig.
SELinux is preventing /usr/libexec/gcc/x86_64-redhat-linux/11/collect2 from search access on the directory snapd.

Root cause: These alerts typically stem from:

  • Tools invoked from web interfaces or services running in different SELinux contexts (e.g., httpd_t, postgresql_t)
  • Third-party or system utilities that lack complete SELinux policy coverage
  • Legitimate operations conflicting with default policy rules
  • Build/compilation tools invoked during package installation (usually transient)

Assessment and diagnosis:

  1. Check if the alert is related to perfSONAR functionality:
# View recent audit alerts
tail -100 /var/log/audit/audit.log

# Filter by command name to see context
grep "ethtool\|df\|python\|collect2\|ldconfig" /var/log/audit/audit.log | head -20

# Count alert types to identify patterns
ausearch -m AVC | awk -F'avc:' '{print $2}' | sort | uniq -c | sort -rn | head -10

  2. Determine the source process and context:

    • Alerts mentioning httpd_t usually indicate the web UI triggered the operation (typically safe to allow)
    • Alerts from postgresql_t indicate database tools being invoked (context boundary may not be required)
    • Alerts from lsregistrationdaemon_t indicate the registration daemon needs access (fix labels first, not policies)
    • Alerts from gcc/collect2 during package install are usually transient (monitor periodically)

  3. Create a local SELinux policy module (if the operation is verified as safe):

# Generate policy module for a specific alert (example: ethtool)
sudo ausearch -c 'ethtool' --raw | audit2allow -M my-ethtool

# Review the generated module to ensure it's safe
cat my-ethtool.te

# Install the module (if approved and safe)
sudo semodule -i my-ethtool.pp

# Verify installation
semodule -l | grep my-ethtool
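
Before generating any module, it can help to see which source contexts and commands account for most denials, following the triage by source context described above. The function below is an illustrative sketch (the name `avc_by_source` is hypothetical). It reads raw audit records on stdin and counts denials per source type and command.

```shell
#!/bin/bash
# Sketch: count AVC denials per (source SELinux type, command).
# Input: raw audit records on stdin, e.g. from `ausearch -m AVC --raw`.
# Output: "<count> <source_type> <command>", most frequent first.
avc_by_source() {
    awk '
        /avc:/ && /denied/ {
            comm = "?"; stype = "?"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^comm=/) {
                    comm = $i
                    sub(/^comm=/, "", comm)
                    gsub(/"/, "", comm)
                }
                if ($i ~ /^scontext=/) {
                    split($i, c, ":")   # user:role:type:level
                    stype = c[3]
                }
            }
            count[stype " " comm]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}
```

Run as `sudo ausearch -m AVC --raw | avc_by_source`. A source type of lsregistrationdaemon_t near the top usually points at a labeling problem rather than a missing policy rule.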

Specific service fixes:

ethtool netlink access (from httpd_t or lsregistrationdaemon_t):

  • Operation: Checking NIC link status, speed, duplex (safe)
  • Source: Web UI health checks or daemon monitoring
  • Fix:

sudo ausearch -c 'ethtool' --raw | audit2allow -M my-ethtool
sudo semodule -i my-ethtool.pp

df/stat on /var/cache/openafs (from lsregistrationdaemon_t):

  • Operation: Checking available disk space (safe)
  • Source: Registration daemon system health queries
  • Fix:

sudo ausearch -c 'df' --raw | audit2allow -M my-df
sudo semodule -i my-df.pp

python3/postgresql context issues (collect2, ldconfig):

  • Operation: Build tools, library checks during package installation (usually transient)
  • Assessment: These are typically safe but may be ephemeral
  • Fix (if persistent):

# For postgresql-related alerts
sudo ausearch -c 'validate-config' --raw | audit2allow -M my-postgresql
sudo semodule -i my-postgresql.pp

Audit log monitoring (prevents future surprises):

# Check for AVC denials in the last 10 minutes ("recent" in ausearch terms)
sudo ausearch -m AVC -ts recent | tail -50

# Create a daily monitoring script
sudo tee /usr/local/bin/check-selinux-alerts.sh > /dev/null << 'EOF'
#!/bin/bash
# Report SELinux AVC denials logged since the start of yesterday.
# "-ts recent" would only cover the last 10 minutes, too short for a daily job.

ALERT_COUNT=$(ausearch -m AVC -ts yesterday --raw 2>/dev/null | grep -c 'type=AVC')

if [ "${ALERT_COUNT}" -gt 0 ]; then
    echo "WARNING: Found ${ALERT_COUNT} SELinux AVC denials since yesterday:"
    ausearch -m AVC -ts yesterday 2>/dev/null | tail -20
else
    echo "OK: No SELinux AVC denials since yesterday"
fi
EOF

sudo chmod 0755 /usr/local/bin/check-selinux-alerts.sh

# Add to cron (runs daily at 9 AM)
echo "0 9 * * * root /usr/local/bin/check-selinux-alerts.sh" | sudo tee /etc/cron.d/selinux-alert-check

Best practice for handling alerts:

  1. Log all alerts for 1-2 weeks to establish a baseline
  2. Review and categorize (safe vs. unsafe operations)
  3. Create local policy modules only for verified, safe operations
  4. Document each module in your change log
  5. Monitor weekly for new or unexpected alerts

Issue 3: Audit Log Flooding

Symptom: Audit log grows very large due to repeated identical alerts.

Mitigation:

# View count of each AVC alert type
ausearch -m AVC | awk -F'avc:' '{print $2}' | sort | uniq -c | sort -rn | head -20

# Suppress specific alerts (if they are verified as safe):
# Add rules to /etc/audit/audit.rules or /etc/audit/rules.d/
# (requires audit service restart and SELinux expertise)

Best Practices for SELinux with perfSONAR

  1. Use automated tools: Always use the helper scripts (perfSONAR-update-lsregistration.sh, perfSONAR-install-nftables.sh) which handle SELinux contexts automatically.

  2. Run restorecon after manual edits: If you manually edit any perfSONAR configuration file, immediately restore the SELinux context:

    sudo /sbin/restorecon -v /path/to/file
    

  3. Monitor audit logs regularly: Check /var/log/audit/audit.log weekly to catch new issues early.

  4. Document exceptions: If you create local SELinux policy modules, document them in your change log so future admins understand why they exist.

  5. Keep policies minimal: Only add local policy modules for operations that are verified as safe and necessary. Overly permissive policies increase security risk.


Step 10 – Post-Install Validation

Perform these checks before handing the host over to operations:

  1. System services:

    Verify perfSONAR services
    # Check all perfSONAR services are running
    systemctl status pscheduler-scheduler pscheduler-runner pscheduler-archiver pscheduler-ticker
    systemctl status psconfig-pscheduler-agent owamp-server perfsonar-lsregistrationdaemon
    
    # Check web server (Apache)
    systemctl status httpd --no-pager
    
    # Check measurement archive services
    systemctl status opensearch logstash
    

    Ensure all services show active (running) status.

  2. Web interface access:

    Verify web UI is accessible
    # Test HTTPS connectivity to web UI (-L follows any redirect)
    curl -k -s -L -o /dev/null -w "%{http_code}\n" https://localhost/toolkit
    # Should return 200
    
    # Check Apache error logs if issues
    journalctl -u httpd -n 50
    

    Access the web UI in a browser: https://<your-hostname>/toolkit

    Verify the dashboard loads and shows measurement results (may take a few minutes after first tests run).

  3. Service logs:

    Check perfSONAR service logs for errors
    # Check pScheduler logs for errors
    journalctl -u pscheduler-scheduler -n 50 --no-pager
    journalctl -u pscheduler-runner -n 50 --no-pager
    
    # Check pSConfig agent logs
    journalctl -u psconfig-pscheduler-agent -n 50 --no-pager
    
    # Check registration daemon
    journalctl -u perfsonar-lsregistrationdaemon -n 20 --no-pager
    
  4. Network path validation:

    Test network connectivity and routing

    Test throughput to a remote endpoint:

    pscheduler task throughput --dest <remote-testpoint>
    

    Check routing from the host:

    tracepath -n <remote-testpoint>
    ip route get <remote-testpoint-ip>
    

    Confirm traffic uses the intended policy-based routes (check ip route get <dest>).

  5. Security posture:

    Check firewall, fail2ban, and SELinux
    # Check nftables firewall rules
    nft list ruleset | grep perfsonar
    
    # Check fail2ban status (automatically installed by toolkit)
    systemctl status fail2ban
    fail2ban-client status
    
    # Check for recent SELinux denials
    if command -v ausearch >/dev/null 2>&1; then
        ausearch --message AVC --just-one
    elif [ -f /var/log/audit/audit.log ]; then
        grep -i "avc.*denied" /var/log/audit/audit.log | tail -5
    else
        echo "SELinux audit tools not available"
    fi
    

    Investigate any SELinux denials or repeated Fail2Ban bans.

  6. Certificate check (if using HTTPS):

    Verify certificate validity
    # Check certificate via HTTPS connection
    echo | openssl s_client -connect <your-hostname>:443 -servername <your-hostname> 2>/dev/null | openssl x509 -noout -dates -issuer
    

    Ensure the certificate is valid and not expired.

  7. Measurement archive:

    Verify local archive is collecting data

    Check that OpenSearch is receiving measurement results:

    # Check OpenSearch cluster health
    curl -k "https://localhost:9200/_cluster/health?pretty"
    
    # Check for measurement data (after tests have run)
    curl -k "https://localhost:9200/_cat/indices?v" | grep pscheduler
    

    Via web UI: Navigate to https://<your-hostname>/toolkit/archive to view stored measurements.

  8. Reporting:

    Run perfSONAR diagnostic reports

    Run the perfSONAR troubleshoot command and send outputs to operations:

    pscheduler troubleshoot
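
The service check in item 1 above can be turned into a single pass/fail sweep for handover. This is a minimal sketch with hypothetical function names. `unit_state` is a thin wrapper around `systemctl is-active` so that it can be replaced in other environments.

```shell
#!/bin/bash
# Thin wrapper so the check can be pointed at something other than systemctl.
unit_state() {
    systemctl is-active "$1" 2>/dev/null
}

# Print one line per unit; return non-zero if any unit is not "active".
check_units() {
    local unit state rc=0
    for unit in "$@"; do
        state=$(unit_state "$unit") || true   # is-active exits non-zero when inactive
        [ -n "$state" ] || state="unknown"
        printf '%-40s %s\n' "$unit" "$state"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}
```

For example: `check_units pscheduler-scheduler pscheduler-runner pscheduler-archiver pscheduler-ticker psconfig-pscheduler-agent owamp-server perfsonar-lsregistrationdaemon httpd opensearch logstash` (the Apache unit is named httpd on EL9).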
    


Ongoing Maintenance

  • Quarterly or as-needed: Re-validate routing policy and nftables rules after network changes or security audits.

  • Monthly or during maintenance windows: Apply OS updates (dnf update) and reboot during scheduled downtime.

  • Monitor psconfig feeds for changes in mesh participation and test configuration.

  • Track certificate expiry with certbot renew --dry-run if you rely on Let's Encrypt (automatic renewal is configured but monitoring is recommended).

  • Review container logs periodically for errors: podman logs perfsonar-testpoint and podman logs certbot.

  • Verify auto-update timer is active: systemctl list-timers perfsonar-auto-update.timer.


Troubleshooting

Networking Issues

Policy-based routing not working correctly

Symptoms: Traffic not using expected interfaces, routing to wrong gateway.

Diagnostic steps:

# Check routing rules
ip rule show

# Check routing tables
ip route show table all

# Test specific route lookup
ip route get <destination-ip>

# Check NetworkManager connections
nmcli connection show

# Review PBR script log
tail -100 /var/log/perfSONAR-multi-nic-config.log

Solutions:

  • Verify /etc/perfSONAR-multi-nic-config.conf has correct IPs and gateways
  • Reapply configuration: /opt/perfsonar-toolkit/tools_scripts/perfSONAR-pbr-nm.sh --yes
  • Reboot if rules are not being applied correctly
  • Check for conflicting NetworkManager or systemd-networkd rules
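
To check that a given destination leaves through the intended NIC, the `ip route get` output can be parsed for its `dev` field. The helper names below are hypothetical; this is a sketch, not part of perfSONAR-pbr-nm.sh.

```shell
#!/bin/bash
# Extract the outgoing interface name from `ip route get` output read on stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Succeeds when traffic to <dest-ip> would leave via <expected-ifname>.
uses_interface() {
    local dest="$1" expected="$2"
    [ "$(ip route get "$dest" 2>/dev/null | route_dev)" = "$expected" ]
}
```

For example, `uses_interface <destination-ip> <interface-name> || echo "unexpected egress interface"`. Because policy rules typically match on the source address, running `ip route get <destination-ip> from <local-ip>` by hand may be the more telling lookup on a multi-NIC host.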

DNS resolution failing for test endpoints

Symptoms: perfSONAR tests fail with "unknown host" or DNS errors.

Diagnostic steps:

# Test DNS resolution from container
podman exec -it perfsonar-testpoint dig <remote-testpoint>

# Check container's resolv.conf
podman exec -it perfsonar-testpoint cat /etc/resolv.conf

# Verify forward and reverse DNS
/opt/perfsonar-toolkit/tools_scripts/check-perfsonar-dns.sh

Solutions:

  • Ensure DNS servers are correctly configured on host
  • Fix missing PTR records in DNS zones
  • Verify forward A/AAAA records match reverse PTR records
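
The forward/reverse consistency requirement can be expressed as a small check: look up the PTR name for an address, then confirm that the name resolves back to the same address. The sketch below is not the logic of check-perfsonar-dns.sh. The function names are hypothetical, and it assumes `getent` is available and that the address is written in the same textual form the resolver returns (this matters for IPv6).

```shell
#!/bin/bash
# Thin wrappers around getent so another resolver (for example dig) can be swapped in.
reverse_lookup() { getent hosts "$1"; }    # prints: <ip> <name> [aliases]
forward_lookup() { getent ahosts "$1"; }   # prints: <ip> <socktype> [name]

# Succeeds when the PTR name of <ip> resolves back to the same <ip>.
fcrdns_ok() {
    local ip="$1" name
    name=$(reverse_lookup "$ip" | awk 'NR == 1 { print $2 }')
    [ -n "$name" ] || return 1
    forward_lookup "$name" | awk '{ print $1 }' | grep -qxF -- "$ip"
}
```

For example, `fcrdns_ok <your-ip> || echo "forward and reverse DNS disagree"`.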

Certificate Issues

Let's Encrypt certificate issuance fails

Symptoms: Certbot fails with "Failed to authenticate" or "Connection refused" errors.

Diagnostic steps:

# Check if port 80 is open
nft list ruleset | grep "80"

# Verify Apache is NOT listening on port 80 in container
podman exec perfsonar-testpoint netstat -tlnp | grep :80

# Test port 80 accessibility from external host
curl -v http://<your-fqdn>/

# Run certbot in verbose mode
podman run --rm --net=host \
    -v /etc/letsencrypt:/etc/letsencrypt:Z \
    -v /var/www/html:/var/www/html:Z \
    docker.io/certbot/certbot:latest certonly \
    --standalone -d <SERVER_FQDN> -m <EMAIL> --dry-run -vvv

Common causes:

  • Port 80 blocked by firewall: Add with perfSONAR-install-nftables.sh --ports=80,443
  • Apache listening on port 80: Verify testpoint-entrypoint-wrapper.sh patched Apache correctly
  • DNS not propagated: Wait for DNS changes to propagate globally
  • Rate limiting: Let's Encrypt has rate limits; wait if you've hit them

Certificate not loaded after renewal

Symptoms: Old certificate still in use after automatic renewal.

Diagnostic steps:

# Check certificate files
ls -la /etc/letsencrypt/live/<fqdn>/

# Verify deploy hook is configured
podman logs certbot 2>&1 | grep "deploy hook"

# Check if container restarted
podman ps --format 'table {{.Names}}\t{{.Status}}'

# Manually restart testpoint
podman restart perfsonar-testpoint

Solutions:

  • Verify deploy hook script exists and is executable: /opt/perfsonar-toolkit/tools_scripts/certbot-deploy-hook.sh
  • Ensure deploy hook is mounted in container at: /etc/letsencrypt/renewal-hooks/deploy/certbot-deploy-hook.sh
  • Verify Podman socket is mounted in certbot container: /run/podman/podman.sock
  • Check deploy hook logs: journalctl -u perfsonar-certbot.service | grep deploy
  • Manually restart testpoint after renewals if deploy hook fails: podman restart perfsonar-testpoint

Note: Certbot automatically executes scripts in /etc/letsencrypt/renewal-hooks/deploy/ when certificates are renewed. Do not use --deploy-hook parameter with full paths ending in .sh as certbot will append -hook to the filename.
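
For orientation, a deploy hook placed in that directory can be as small as the sketch below. It is not the certbot-deploy-hook.sh shipped with these tools, and the function names are hypothetical. It relies on certbot exporting `RENEWED_LINEAGE` (the `live/<name>` directory) and `RENEWED_DOMAINS` (space-separated names) to deploy hooks.

```shell
#!/bin/bash
# Restart step, kept in its own function so it is easy to adapt.
restart_testpoint() {
    podman restart perfsonar-testpoint
}

# Hook body: refuse to run outside certbot, log what was renewed, then restart.
deploy_hook() {
    if [ -z "${RENEWED_LINEAGE:-}" ]; then
        echo "RENEWED_LINEAGE is not set; was this run by certbot?" >&2
        return 1
    fi
    echo "Renewed ${RENEWED_DOMAINS:-unknown} in ${RENEWED_LINEAGE}; restarting testpoint"
    restart_testpoint
}
```

A real hook script would end with a call to `deploy_hook` and must be executable.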

perfSONAR Service Issues

perfSONAR services not running

Symptoms: Web interface not accessible, tests not running.

Diagnostic steps:

# Check service status inside container
podman exec perfsonar-testpoint systemctl status apache2
podman exec perfsonar-testpoint systemctl status pscheduler-ticker
podman exec perfsonar-testpoint systemctl status owamp-server

# Check for errors in service logs
podman exec perfsonar-testpoint journalctl -u apache2 -n 50
podman exec perfsonar-testpoint journalctl -u pscheduler-ticker -n 50

Solutions:

  • Restart services inside container: podman exec perfsonar-testpoint systemctl restart apache2
  • Check Apache SSL configuration was patched correctly
  • Verify certificates are in place: ls -la /etc/letsencrypt/live/
  • Restart container: podman restart perfsonar-testpoint

Auto-Update Issues

Auto-update not working

Symptoms: Containers not updating despite new images available.

Diagnostic steps:

# Check timer status
systemctl status perfsonar-auto-update.timer
systemctl list-timers perfsonar-auto-update.timer

# Check service logs
journalctl -u perfsonar-auto-update.service -n 100

# Check update log
tail -50 /var/log/perfsonar-auto-update.log

# Manually test update
systemctl start perfsonar-auto-update.service

Solutions:

  • Enable timer if not active: systemctl enable --now perfsonar-auto-update.timer
  • Verify script exists and is executable: ls -la /usr/local/bin/perfsonar-auto-update.sh
  • Check podman-compose is installed and working
  • Review script for errors and update if needed

General Debugging Tips

Useful debugging commands

Container management:

# View all containers (running and stopped)
podman ps -a

# View container resource usage
podman stats

# Enter container for interactive debugging
podman exec -it perfsonar-testpoint /bin/bash

# View compose configuration
cd /opt/perfsonar-toolkit && podman-compose config

Networking:

# Check which process is listening on a port
ss -tlnp | grep <port>

# Test connectivity to remote testpoint
ping <remote-ip>
traceroute <remote-ip>

# Check nftables rules
nft list ruleset

Logs:

# System journal for container runtime
journalctl -u podman -n 100

# All logs from a container
podman logs perfsonar-testpoint --tail=100

# Follow logs in real-time
podman logs -f perfsonar-testpoint