
Transfer Smaller Job Files To and From /home

As described in the Overview: Data Staging and Transfer to Jobs, any data, files, or even software smaller than 1 GB should be staged in your /home directory on your Access Point. Files in your /home directory can be transferred to jobs via your HTCondor submit file.
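Before staging data in /home, you may want to check for files that exceed the 1 GB guideline. This is a minimal sketch that assumes GNU find (as found on most Linux systems). The helper name list_large_files is hypothetical, and note that find's G suffix means GiB (1024^3 bytes).

```shell
# Hypothetical helper: print regular files under a directory that are
# larger than 1 GiB (GNU find: G = 1024^3 bytes, sizes rounded up),
# i.e. files that are likely too large to stage in /home.
list_large_files() {
    local dir="${1:-$HOME}"
    find "$dir" -type f -size +1G 2>/dev/null
}

# Usage (hypothetical directory name):
# list_large_files ~/my_project
```

An empty result means nothing in that directory is over the threshold.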

Transfer Files From /home Using HTCondor

Transfer Input Files from /home

To transfer input files from your /home directory to your jobs, list the files by name in the transfer_input_files option of your HTCondor submit file. You can use either absolute or relative paths to your input files. For example:

# submit file example

# transfer a small file from /home
transfer_input_files = my_data.csv

Multiple files can be specified using a comma-separated list, for example:

# transfer multiple files from /home
transfer_input_files = my_data.csv, my_software.tar.gz, my_script.py
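A typo in a long transfer_input_files list is easy to make, so a quick check before submitting can catch mistakes before any jobs run. This is a minimal sketch, not an HTCondor feature: check_inputs is a hypothetical helper that reports any listed file or directory that does not exist relative to the current directory.

```shell
# Hypothetical pre-submit check: confirm that every file or directory
# you plan to list in transfer_input_files exists.
# Returns 0 if all exist, 1 if any are missing.
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "Missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the directory where you run condor_submit
# (job.sub is a hypothetical submit file name):
# check_inputs my_data.csv my_software.tar.gz my_script.py && condor_submit job.sub
```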

When using transfer_input_files to transfer files located in /home, keep in mind that relative paths are interpreted relative to the directory where you run condor_submit (usually the directory that contains your submit file). If your files are located in a different subdirectory of /home, we recommend specifying the full (absolute) path to those files, which is good practice in general. For example:

transfer_input_files = /home/username/path/to/my_software.tar.gz

Above, username refers to your Access Point username.

Note that the path is not replicated on the execute side: the job will only see my_software.tar.gz in its top-level job directory.
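Because only the file name survives the transfer, your job's executable should refer to my_software.tar.gz without its /home path. A minimal sketch, assuming the file is a gzipped tar archive; unpack_input is a hypothetical helper that strips any leading directories before unpacking.

```shell
# Hypothetical helper for a job executable: unpack a tarball that was
# transferred via transfer_input_files. HTCondor places the file in the
# top-level job directory without its original /home path, so only the
# base name is used.
unpack_input() {
    local archive
    archive="$(basename "$1")"   # /home/username/path/to/x.tar.gz -> x.tar.gz
    if [ ! -f "$archive" ]; then
        echo "Expected $archive in $(pwd)" >&2
        return 1
    fi
    tar -xzf "$archive"
}

# In the job executable:
# unpack_input my_software.tar.gz
```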

Use HTCondor To Transfer Outputs

By default, HTCondor will transfer any new or modified files in the job's top-level directory back to the /home directory from which you ran condor_submit. This behavior only applies to files in the top-level directory where your job executes: HTCondor ignores any files created in subdirectories of the job's top-level directory. Several options exist for modifying this default output transfer behavior, including those described in this guide.

What is the top-level directory of a job?

Before executing a job, HTCondor creates a new directory on the execute node just for your job. This is the top-level directory of the job, and its path is stored in the environment variable _CONDOR_SCRATCH_DIR. All of the input files transferred via transfer_input_files are first written to this directory, and the job starts executing from this path. After a job has completed, the top-level directory and all of its contents are deleted.
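To see the top-level directory from inside a job, your executable can read _CONDOR_SCRATCH_DIR. This is a minimal sketch; the fallback to the current directory is our own assumption so the script also runs when tested outside HTCondor, where the variable is unset.

```shell
#!/bin/bash
# Print and inspect the job's top-level directory. Under HTCondor the
# job already starts here, so the cd is a no-op; outside HTCondor we
# fall back to the current directory.
top_dir="${_CONDOR_SCRATCH_DIR:-$PWD}"
cd "$top_dir" || exit 1
echo "Top-level job directory: $top_dir"

# Files listed in transfer_input_files should appear here
ls -l
```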

Select Specific Output Files To Transfer to /home Using HTCondor

As described above, HTCondor will, by default, transfer any files that are generated during the execution of your job(s) back to your /home directory. If your job(s) will produce multiple output files but you only need to retain a subset of them, list only the files you need in the transfer_output_files submit file option (use a comma-separated list for more than one file). For example, to transfer back only output.svg:

transfer_output_files = output.svg

Alternatively, you can delete the output files you don't need, or move them to a subdirectory, as a step in your job's bash executable script. Only the output files that remain in the top-level directory will be transferred back to your /home directory.
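Both approaches can be sketched as the final steps of a job executable that leaves only output.svg in the top-level directory. The names temp_results.dat, debug.log, and not_transferred/ are hypothetical, and the first two commands only create a throwaway demo directory with stand-in files.

```shell
#!/bin/bash
set -e

# Demo only: use a throwaway directory and create stand-in output files.
# In a real job, omit these two lines; the job already runs in its
# top-level directory and your commands create the outputs.
cd "$(mktemp -d)"
touch output.svg temp_results.dat debug.log

# Option 1: delete output files you do not need
rm -f temp_results.dat

# Option 2: move files into a subdirectory; by default HTCondor does not
# transfer files in subdirectories of the top-level directory
mkdir -p not_transferred
mv debug.log not_transferred/

# Fail loudly if the file you need is missing, so the problem is
# reported on stderr instead of going unnoticed
if [ ! -f output.svg ]; then
    echo "output.svg was not created" >&2
    exit 1
fi
```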

Organize Output Files in /home

By default, output files will be copied back to the directory in /home where you ran the condor_submit command. To save output files somewhere else or under a different name, use the transfer_output_remaps option in the HTCondor submit file. The syntax for transfer_output_remaps is a quoted, semicolon-separated list of original_name = new_path pairs, for example:

transfer_output_remaps = "Output1.txt = path/to/save/file/under/output.txt; Output2.txt = path/to/save/file/under/RenamedOutput.txt"
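When several jobs from one submission write an output file with the same name, remapping it to a per-job name keeps the results from overwriting each other. A minimal sketch, assuming hypothetical names job.sh, results/, and logs/: it creates the destination directories first as a simple precaution, then writes a submit file that uses HTCondor's $(Process) macro (the job's index within the submission). The quoted 'EOF' keeps the shell from expanding $(Process).

```shell
#!/bin/bash
# Create destination directories under /home before submitting
mkdir -p results logs

# Write a submit file; each job's output.txt comes back as
# results/output_0.txt, results/output_1.txt, ...
cat > remap_example.sub <<'EOF'
executable = job.sh
log        = logs/job_$(Cluster).log
output     = logs/job_$(Cluster)_$(Process).out
error      = logs/job_$(Cluster)_$(Process).err

transfer_output_files  = output.txt
transfer_output_remaps = "output.txt = results/output_$(Process).txt"

request_cpus   = 1
request_memory = 1GB
request_disk   = 1GB

queue 3
EOF

# condor_submit remap_example.sub
```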

What if my output file(s) are not written to the top-level directory?

If your output files are written to a subdirectory, use the steps described below to convert the output directory to a "tarball" that is written to the top-level directory.

Alternatively, you can include steps in your job's bash executable script to move (i.e. mv) output files from a subdirectory to the top-level directory. For example, if your job writes an output file named job_output.txt to the subdirectory job_output/ and you need that file transferred back to your /home directory on the Access Point:

#! /bin/bash

# various commands needed to run your job

# move the output file to the job's top-level (scratch) directory
mv job_output/job_output.txt "$_CONDOR_SCRATCH_DIR"
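If your job writes several files to job_output/, moving them one at a time gets tedious. A minimal sketch, assuming every regular file directly inside job_output/ should come back; the first lines only simulate a job's output in a throwaway directory, and the fallback to the current directory is for testing outside HTCondor.

```shell
#!/bin/bash

# Demo only: simulate a job that wrote its output into job_output/.
# In a real job, omit these lines; your job commands create job_output/.
cd "$(mktemp -d)"
mkdir -p job_output
echo "result" > job_output/job_output.txt

# Use the job's top-level directory, falling back to the current
# directory when the script is tested outside HTCondor
top_dir="${_CONDOR_SCRATCH_DIR:-$PWD}"

if [ -d job_output ]; then
    # Move every regular file directly inside job_output/ to the top level
    find job_output -maxdepth 1 -type f -exec mv {} "$top_dir"/ \;
else
    echo "job_output/ was not created" >&2
    exit 1
fi
```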

Group Multiple Output Files For Convenience

If your jobs will generate multiple output files, we recommend combining all output into a compressed tar archive for convenience, particularly when transferring your results from your Access Point to your local computer. To create a compressed tar archive, include commands in your bash executable script that create a new subdirectory, move all of the output into it, and create a tar archive of it. For example:

#! /bin/bash

# various commands needed to run your job

# create output tar archive
mkdir my_output
mv my_job_output.csv my_job_output.svg my_output/
tar -czf my_job.output.tar.gz my_output/

The example above will create a file called my_job.output.tar.gz that contains all of the output that was moved to my_output/. Be sure to create my_job.output.tar.gz in the top-level directory where your job executes; HTCondor will then automatically transfer this tar archive back to your /home directory.
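To confirm what went into the archive without unpacking it, the job can list the archive's contents at the end of its run, so the listing appears in the job's standard output. A minimal sketch that repeats the archiving steps from the example above; the first lines only create stand-in output files in a throwaway directory.

```shell
#!/bin/bash
set -e

# Demo only: stand-in output files in a throwaway directory.
# In a real job, omit these lines; your job commands create the outputs.
cd "$(mktemp -d)"
echo "x,y" > my_job_output.csv
echo "<svg/>" > my_job_output.svg

# Create the archive in the top-level directory
mkdir my_output
mv my_job_output.csv my_job_output.svg my_output/
tar -czf my_job.output.tar.gz my_output/

# List the archive contents (written to the job's stdout)
tar -tzf my_job.output.tar.gz
```

Once the archive is back in your /home directory, running tar -xzf my_job.output.tar.gz there unpacks it into my_output/.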