Running OSDF Cache in a Container¶
The OSG operates the Open Science Data Federation (OSDF), which provides organizations with a method to distribute their data in a scalable manner to thousands of jobs without needing to pre-stage data across sites or operate their own scalable infrastructure.
OSDF caches transfer data to clients such as jobs or users. A set of caches is operated across the OSG for the benefit of nearby sites; in addition, each site may run its own cache to reduce the amount of data transferred over the WAN. This document outlines how to run a cache in a Docker container.
The OSDF cache was previously named "Stash Cache" and some documentation and software may use the old name.
Before starting the installation process, consider the following requirements:
- Docker: For the purpose of this guide, the host must have a running docker service
and you must have the ability to start containers (i.e., belong to the
- Network ports: The cache service requires the following open ports:
- Inbound TCP port 1094 for unauthenticated file access via the XRootD protocol (optional)
- Inbound TCP port 8000 for unauthenticated file access via HTTP(S) and/or
- Inbound TCP port 8443 for authenticated file access via HTTPS
- Outbound UDP port 9930 for reporting to
- File Systems: The cache needs host partitions to store user data. For improved performance and storage, we recommend multiple partitions for handling namespaces (HDD, SSD, or NVMe), data (HDDs), and metadata (SSDs or NVMe).
- Host certificate: Required for authentication. See our host certificate documentation for instructions on how to request host certificates.
- Hardware requirements: We recommend that a cache has at least 10 Gbps connectivity, 1 TB of disk space for the cache directory, and 12 GB of RAM.
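The hardware recommendations above can be checked before installation with a short script. The following is a minimal sketch for a Linux host, not an official tool: the function names are hypothetical, and the thresholds are the 12 GB and 1 TB figures from the list, expressed in decimal kilobytes because the kernel reports slightly less memory than is installed.

```shell
#!/bin/sh
# Pre-flight sketch: compare a host against the recommended minimums.
# All function names here are illustrative, not part of any OSG tooling.

MIN_RAM_KB=$((12 * 1000 * 1000))       # 12 GB of RAM, in kB
MIN_DISK_KB=$((1000 * 1000 * 1000))    # 1 TB of disk, in kB

# Succeeds when the first value is at least the second.
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Total RAM in kB, as reported by the Linux kernel.
total_ram_kb() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

# Size in kB of the file system that holds the given directory.
partition_size_kb() {
    df -Pk "$1" | awk 'NR == 2 {print $2}'
}

# Print a warning for each unmet recommendation; return non-zero if any.
preflight() {
    cache_dir=$1
    rc=0
    if ! meets_minimum "$(total_ram_kb)" "$MIN_RAM_KB"; then
        echo "WARN: less than 12 GB of RAM"
        rc=1
    fi
    if ! meets_minimum "$(partition_size_kb "$cache_dir")" "$MIN_DISK_KB"; then
        echo "WARN: $cache_dir is on a partition smaller than 1 TB"
        rc=1
    fi
    if ! docker info >/dev/null 2>&1; then
        echo "WARN: cannot reach the Docker daemon as this user"
        rc=1
    fi
    return "$rc"
}
```

After sourcing the script, `preflight /srv/cache` would check the host partition intended for the cache directory. Network connectivity and open ports are not covered by this sketch.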
Registering the Cache¶
To be part of the OSDF, your cache must be registered with the OSG. You will need basic information like the resource name, hostname, host certificate DN, and the administrative and security contacts.
To register your cache host, follow the general registration instructions here.
The service type is XRootD cache server.
This step must be completed before installation.
In your registration, you must specify which VOs your cache will serve by adding an AllowedVOs list, with each line specifying a VO whose data you are willing to cache.
There are special values you may use in AllowedVOs:
- ANY_PUBLIC indicates that the cache is willing to serve public data from any VO.
- ANY indicates that the cache is willing to serve data from any VO, both public and protected.
There are extra requirements for serving protected data:
- In addition to the cache allowing a VO in the AllowedVOs list, that VO must also allow the cache in its AllowedCaches list. See the page on getting your VO's data into OSDF.
- There must be an authenticated XRootD instance on the cache server.
- There must be a DN attribute in the resource registration with the subject DN of the host certificate.
This is an example registration for a cache server that serves all public data:
  MY_OSDF_CACHE:
    FQDN: my-cache.example.net
    Services:
      XRootD cache server:
        Description: OSDF cache server
    AllowedVOs:
      - ANY_PUBLIC
This is an example registration for a cache server that only serves protected data for the Open Science Pool:
  MY_AUTH_OSDF_CACHE:
    FQDN: my-auth-cache.example.net
    Services:
      XRootD cache server:
        Description: OSDF cache server
    AllowedVOs:
      - OSG
    DN: /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-auth-cache.example.net
This is an example registration for a cache server that serves all public data and protected data from the OSG VO:
  MY_COMBO_OSDF_CACHE:
    FQDN: my-combo-cache.example.net
    Services:
      XRootD cache server:
        Description: OSDF cache server
    AllowedVOs:
      - OSG
      - ANY_PUBLIC
    DN: /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Services/CN=my-combo-cache.example.net
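For caches that serve protected data, the DN attribute has to match the subject of the host certificate. One way to print that subject in the slash-separated form shown in the registrations is sketched below. It assumes the `openssl` command-line tool is installed; `cert_subject_dn` is a hypothetical helper name.

```shell
#!/bin/sh
# Print a certificate's subject DN in the legacy /DC=.../CN=... form.
# "-nameopt compat" asks openssl for the old slash-separated layout.
cert_subject_dn() {
    openssl x509 -in "$1" -noout -subject -nameopt compat |
        sed -e 's/^subject= *//'
}
```

For example, `cert_subject_dn /etc/ssl/host.crt` prints the value to paste into the DN attribute, assuming the certificate lives at that path.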
Configuring the OSDF Cache¶
In addition to the required configuration above (ports and file systems), you may also configure the behavior of your cache using an environment variable file.
The environment file on the Docker host, /opt/xcache/.env, must have (at least) the following contents, replacing <YOUR_RESOURCE_NAME> with the name of your resource as registered in Topology and <FQDN> with the public DNS name that should be used to contact your cache:
Providing a host certificate¶
The service will need a certificate for contacting central OSDF services and for authenticating to origins.
Follow our host certificate documentation to obtain a host certificate and key.
Then, volume-mount the host certificate to /etc/grid-security/hostcert.pem and the key to /etc/grid-security/hostkey.pem.
You must restart the container whenever you renew your certificate in order for the services to pick up the new certificate. If you automate certificate renewal, you should automate restarts as well. For example, if you are using Certbot for Let's Encrypt, you should write a "deploy hook" as documented on the Certbot site.
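As an illustration of automating the restart, the following is a minimal sketch of a Certbot deploy hook. It assumes the container is managed by a systemd unit named docker.stash-cache, as in the systemd example in this document, and that the script is installed as an executable file under /etc/letsencrypt/renewal-hooks/deploy/. `CACHE_RESTART_CMD` is a hypothetical override added so the hook can be exercised without restarting anything.

```shell
#!/bin/sh
# Sketch of a Certbot deploy hook that restarts the cache container.
# Certbot sets RENEWED_LINEAGE when it runs deploy hooks after a renewal.

# Restart the cache; CACHE_RESTART_CMD (hypothetical) replaces the default
# command, for example during a dry run.
restart_cache() {
    ${CACHE_RESTART_CMD:-systemctl restart docker.stash-cache}
}

# Only act when invoked by Certbot for a renewed certificate.
if [ -n "${RENEWED_LINEAGE:-}" ]; then
    echo "Certificate renewed in $RENEWED_LINEAGE; restarting the cache"
    restart_cache
fi
```

If the renewed files are not the ones volume-mounted into the container, the hook would also need to copy them into place before restarting.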
Further behavior of the cache can be configured by setting the following in the environment variable file:
XC_SPACE_HIGH_WM, XC_SPACE_LOW_WM: High-water and low-water marks for disk usage, as numbers between 0.00 (0%) and 1.00 (100%); when usage goes above the high-water mark, the cache will delete files until it hits the low-water mark.
XC_RAMSIZE: Amount of memory to use for storing blocks before writing them to disk (use a higher value for slower disks).
XC_BLOCKSIZE: Size of the blocks in the cache.
XC_PREFETCH: Number of blocks to prefetch from a file at once. This controls how aggressively the cache requests portions of a file.
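To make the water-mark behavior concrete, the sketch below checks a pair of marks and prints the disk usage at which purging would start and stop. The helper name `check_watermarks` and the example values (0.95, 0.90, and a 1000 GB cache) are illustrative assumptions, not recommended settings.

```shell
#!/bin/sh
# Validate high/low water marks and show the purge thresholds they imply.
# $1 = high-water mark, $2 = low-water mark, $3 = cache size in GB
check_watermarks() {
    awk -v high="$1" -v low="$2" -v size="$3" 'BEGIN {
        # Both marks must lie in [0, 1], with low strictly below high.
        if (low < 0 || high > 1 || low >= high) {
            print "invalid: need 0.00 <= low < high <= 1.00"
            exit 1
        }
        printf "purge starts above %.0f GB, stops at %.0f GB\n",
               high * size, low * size
    }'
}
```

Calling `check_watermarks 0.95 0.90 1000` reports that purging starts above 950 GB and stops at 900 GB, while a pair with the low mark above the high mark is rejected with a non-zero exit status.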
Running a Cache¶
Cache containers may be run with either multiple mounted host partitions (recommended) or a single host partition.
We recommend using a container orchestration service such as Docker Compose or Kubernetes, the details of which are beyond the scope of this document. The following sections provide examples for starting cache containers from the command line, as well as a more production-appropriate method using systemd.
Multiple host partitions (recommended)¶
For improved performance and storage, especially if your cache is serving over 10 TB of data, we recommend multiple partitions for handling namespaces (HDD, SSD, or NVMe), data (HDDs), and metadata (SSDs or NVMe).
Under this configuration, the <NAMESPACE PARTITION> is not used to store the files.
Instead, the partition stores symlinks to the files in the metadata and data partitions.
user@host $ docker run --rm \
    --publish <HTTP HOST PORT>:8000 \
    --publish <HTTPS HOST PORT>:8443 \
    --volume <HOST CERT>:/etc/grid-security/hostcert.pem \
    --volume <HOST KEY>:/etc/grid-security/hostkey.pem \
    --volume <NAMESPACE PARTITION>:/xcache/namespace \
    --volume <METADATA PARTITION 1>:/xcache/meta1 \
    ...
    --volume <METADATA PARTITION N>:/xcache/metaN \
    --volume <DATA PARTITION 1>:/xcache/data1 \
    ...
    --volume <DATA PARTITION N>:/xcache/dataN \
    --env-file=/opt/xcache/.env \
    opensciencegrid/stash-cache:3.6-release
For caches with over 10 TB of assigned space, we strongly encourage using this setup and mounting <NAMESPACE PARTITION> on solid-state disks or NVMe.
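With many metadata and data partitions, writing the numbered volume flags by hand is error-prone. The following sketch generates them from a list of host directories. `partition_volumes` is a hypothetical helper, and the /xcache/meta and /xcache/data mount points with a numeric suffix follow the docker run example above.

```shell
#!/bin/sh
# Emit one --volume flag per host directory, numbering the container
# mount points /xcache/<kind>1, /xcache/<kind>2, and so on.
# $1 = kind ("meta" or "data"); remaining arguments = host directories
partition_volumes() {
    kind=$1
    shift
    i=1
    for host_dir in "$@"; do
        printf '%s %s:/xcache/%s%d ' '--volume' "$host_dir" "$kind" "$i"
        i=$((i + 1))
    done
}
```

The output can be spliced into the command unquoted, for example `docker run --rm $(partition_volumes meta /mnt/ssd1 /mnt/ssd2) $(partition_volumes data /mnt/hdd1 /mnt/hdd2) ...`, where the /mnt paths are placeholders. Because this relies on word splitting, it does not handle host paths that contain spaces.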
Single host partition¶
For a simpler installation, you may use a single host partition mounted to /xcache:
user@host $ docker run --rm \
    --publish <HTTP HOST PORT>:8000 \
    --publish <HTTPS HOST PORT>:8443 \
    --volume <HOST PARTITION>:/xcache \
    --volume <HOST CERT>:/etc/grid-security/hostcert.pem \
    --volume <HOST KEY>:/etc/grid-security/hostkey.pem \
    --env-file=/opt/xcache/.env \
    opensciencegrid/stash-cache:3.6-release
Running a cache in a container with systemd¶
The following is an example systemd service file for the OSDF cache.
It requires creating the environment file in the directory /opt/xcache.
This example assumes <HTTP HOST PORT> is 8000, <HTTPS HOST PORT> is 8443, <HOST PARTITION> is /srv/cache, and the certificate and key to use are in /etc/ssl/host.crt and /etc/ssl/host.key, respectively.
Create the systemd service file /etc/systemd/system/docker.stash-cache.service as follows:
[Unit]
Description=Cache Container
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
Restart=always
ExecStartPre=-/usr/bin/docker stop %n
ExecStartPre=-/usr/bin/docker rm %n
ExecStartPre=/usr/bin/docker pull opensciencegrid/stash-cache:3.6-release
ExecStart=/usr/bin/docker run --rm --name %n \
    --publish 8000:8000 \
    --publish 8443:8443 \
    --volume /srv/cache:/xcache \
    --volume /etc/ssl/host.crt:/etc/grid-security/hostcert.pem \
    --volume /etc/ssl/host.key:/etc/grid-security/hostkey.pem \
    --env-file /opt/xcache/.env \
    opensciencegrid/stash-cache:3.6-release

[Install]
WantedBy=multi-user.target
Enable and start the service with:
root@host $ systemctl enable docker.stash-cache
root@host $ systemctl start docker.stash-cache
You must register the cache before starting it up.
For caches that are connected to NICs over 40 Gbps we recommend that you disable the virtualized network and "bind" the container to the host network:
user@host $ docker run --rm \
    --network="host" \
    --volume <HOST PARTITION>:/xcache \
    --volume <HOST CERT>:/etc/grid-security/hostcert.pem \
    --volume <HOST KEY>:/etc/grid-security/hostkey.pem \
    --env-file=/opt/xcache/.env \
    opensciencegrid/stash-cache:3.6-release
The cache uses the host's memory for two purposes:
- Caching files recently read from disk (via the kernel page cache).
- Buffering files recently received from the network before writing them to disk (to compensate for slow disks).
An easy way to increase the performance of the cache is to assign it more memory.
If you set a limit on the container's memory usage via the docker option --memory or Kubernetes resource limits, make sure it is at least twice the value of XC_RAMSIZE.
Validating the Cache¶
The cache server functions as a normal HTTP server and can interact with typical HTTP clients, such as curl.
Here, <HTTP HOST PORT> is the port chosen in the docker run command, 8000 by default.
user@host $ curl -O http://cache_host:<HTTP HOST PORT>/osgconnect/public/rynge/test.data
curl may not correctly report a failure, so verify that the contents of the file are: