# Overview: Data Staging and Transfer to Jobs

## Overview
Because the OSPool is a distributed system, jobs will run in many different physical locations, where the computers executing the jobs don't have direct access to the files placed on the Access Point (e.g. in a `/home` directory). To run on this kind of distributed system, jobs need to "bring along" the data, code, packages, and other files from the access point (where the job is submitted) to the execute points (where the job will run). HTCondor's file transfer tools and plugins make this possible: input and output files are specified as part of the job submission and then moved to and from the execution location.

This guide describes where to place files on the access points and how to use those files within jobs, with links to a more detailed guide for each use case.
## Always Submit From /home

Regardless of where data is placed, jobs should only be submitted with `condor_submit` from `/home`.
## Use HTCondor File Transfer for Smaller Job Files

You should use your `/home` directory to stage job files where:

- individual input files per job are less than 1 GB per file, and if there are multiple files, they total less than 1 GB
- output files per job are less than 1 GB per file

Files can be transferred to and from the `/home` directory using HTCondor's file transfer mechanism. Input files are specified in the submit file, and by default, files created by your job will automatically be returned to your `/home` directory.

See our Transfer Files To and From /home guide for complete details on managing your files this way.
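As a sketch, a submit file using HTCondor's built-in file transfer might look like the following; the executable and file names here are hypothetical placeholders:

```
# Hypothetical example submit file using HTCondor file transfer
executable = analyze.sh
arguments  = input_data.csv

# Comma-separated list of input files copied from /home to the job
transfer_input_files = input_data.csv, params.txt

# Transfer files and return outputs when the job exits (the defaults)
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

log    = job.log
output = job.out
error  = job.err

request_cpus   = 1
request_memory = 1GB
request_disk   = 2GB

queue
```

Any new files your job creates in its working directory are returned to the submission directory in `/home` automatically.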
## Use OSDF for Larger Files and Containers

You should use the OSDF (Open Science Data Federation) to stage job files where:

- individual input files per job are greater than 1 GB per file
- an input file (of any size) is used by many jobs
- output files per job are greater than 1 GB per file

You should also always use the OSDF to stage Singularity/Apptainer container files (ending in `.sif`) for jobs.

**Important Note:** Files in the OSDF are cached, so it is important to use a descriptive file name (possibly including a version or date in the file name), or a directory structure with unique names, to ensure you know which version of the file your job is using.
To use the OSDF, files are placed in (or returned to) a local path, and moved to and from the job using a URL notation in the submit file.

To see where to place your files in the OSDF and how to use OSDF URLs in `transfer_input_files`/`transfer_output_files`, please see the OSDF guide.
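As a sketch, referencing OSDF-staged files in a submit file might look like the following; the namespace and paths shown are hypothetical and depend on your access point, so consult the OSDF guide for the correct location for your files:

```
# Hypothetical example: large input staged in the OSDF, referenced by URL
transfer_input_files = osdf:///ospool/apXX/data/username/reference_data_v2.tar.gz

# Hypothetical example: send a large output file back to an OSDF path
transfer_output_files  = results.tar.gz
transfer_output_remaps = "results.tar.gz = osdf:///ospool/apXX/data/username/results_v2.tar.gz"
```

Note the versioned file names (`_v2`): because OSDF files are cached, reusing an old name may cause jobs to receive a stale copy.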
## Quotas

`/home` directories and OSDF origins all have quota limits. `/home` is usually limited to 50 GB, while OSDF limits vary. You can find your current usage by running `quota` or `quota -vs`. Note that jobs will go on hold if quotas are exceeded.
If you need a quota increase, please send a request with justification to the ticket system at [email protected].
## External Data Transfer to/from Access Point

In general, common Unix tools such as `rsync`, `scp`, PuTTY, WinSCP, `gFTP`, etc. can be used to upload data from your computer to the access point, or to download files from the access point.

See our Data Transfer Guide for more details.
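As a sketch, uploading and downloading with these tools might look like the following; the username, hostname, and paths are hypothetical placeholders for your own account details:

```
# Upload a directory from your computer to the access point (hypothetical host)
rsync -av ./my_data/ [email protected]:/home/username/my_data/

# Download a results file from the access point to your computer
scp [email protected]:/home/username/results.tar.gz .
```

`rsync -av` is often preferable for directories, since it preserves timestamps and only re-copies files that have changed.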
## FAQ

For additional data information, see also the "Data Storage and Transfer" section of our FAQ.
## Data Policies

Please see the OSPool Policies for important usage policies.