Bonus Data Exercise 3.1: Large Input Data

This exercise is a variation on the previous one and should take 10-15 minutes.


In the previous exercises, we used two "web-based" tools to stage and deliver our files to jobs: the web proxy and Stash. An alternative for handling large files (both input and output), especially if they are unique to each job, is a local shared filesystem. This is a filesystem that all (or most) of the execute servers can access, so data stored there can be copied into the job directly from that filesystem instead of being transferred or downloaded.

For this example, we'll be submitting the same jobs as the previous exercise, but we will stage our data in a shared filesystem local to CHTC. The name of our shared filesystem is Staging and user directories are found as sub-directories of the path /staging. This is just one example of what it can look like to use a shared filesystem. If you are running jobs at your own institution, the shared filesystem and how to access it may be different.

Accessing the Filesystem

Running on

For these next 2 exercises, we will be using

Because our shared filesystem is only available on the local CHTC HTCondor pool, you'll need to log into our local submit server,

Once you've logged in, navigate to your Staging directory. It should be at the location /staging/<USERNAME>, where <USERNAME> is your username on

Previous Files


Like the previous example, we'll start by downloading our source movie files into the Staging directory. Run this command in your Staging directory, /staging/<USERNAME>.

user@learn $ wget

While the files are copying, feel free to open a second connection to and follow the instructions below. Once the files have finished downloading, untar them.

Software, Executable, Submit File

Because these jobs will be similar to the previous exercise, we can copy the software (ffmpeg), our executable ( and submit file from to, or, feel free to replicate these by following the instructions in the previous exercise. These files should go into a sub-directory of your home directory, not your Staging directory.


What changes will we need to make to our previous job submission in order to submit it in CHTC, using the Staging location? Read on.


The major actions of our script will be the same:

  1. Copy the movie file to the job's current working directory.
  2. Run the appropriate ffmpeg command.
  3. Remove the copied .mov file from the working directory.

The main difference is that the .mov file will be copied from your Staging directory instead of being downloaded from Stash. As before, your script should remove that file before the job completes so that it doesn't get transferred back to the submit server.

  1. Remove the lines in your script that mention module load.

  2. Remove the stashcp line.

  3. Change the first command of your script to copy only one .mov file:

    cp /staging/<USERNAME>/ ./

You should use your username on in the path above. If you have a version of the script that uses arguments instead of the filenames, that's okay.
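The three steps can be sketched as a single shell function. This is only an illustration, not the exercise's reference script: the function name convert_movie, the FFMPEG variable (defaulting to ./ffmpeg), and the assumption that the output is named after the input with an .mp4 extension are all hypothetical. The ffmpeg call carries no conversion options, so substitute the ones from your previous script.

```shell
#!/bin/bash
# Sketch only: names and defaults below are illustrative assumptions.

# convert_movie STAGING_DIR MOVIE_FILE
#   STAGING_DIR: your Staging directory (for example /staging/<USERNAME>)
#   MOVIE_FILE:  name of one .mov file stored in that directory
convert_movie() {
    local staging_dir="$1"
    local movie="$2"
    local output="${movie%.mov}.mp4"   # assumed output naming

    # 1. Copy the movie file into the job's current working directory.
    cp "${staging_dir}/${movie}" ./ || return 1

    # 2. Run ffmpeg. Add the conversion options from your previous script here.
    "${FFMPEG:-./ffmpeg}" -i "${movie}" "${output}"
    local status=$?

    # 3. Remove the copied .mov so it is not transferred back to the submit server.
    rm -f "${movie}"

    return "${status}"
}
```

A script built this way would end with a single call such as convert_movie "/staging/<USERNAME>" "$1", with the movie filename passed as the job's argument. Removing the .mov even when ffmpeg fails keeps a failed job from returning the large input file.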

Submit File

  1. Remove any previous requirements and add a line to the file (before the final queue statement) that ensures your job will land on computers that have access to Staging:
    requirements = (Target.HasCHTCStaging == true)
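Editing the submit file by hand is the simplest route. If you would rather script the change, the sketch below shows one possible approach: it drops any existing requirements line and inserts the Staging requirement just before the last queue statement. The helper name add_staging_requirement is hypothetical, and the sketch assumes each requirements expression sits on a single line (no backslash continuations).

```shell
# add_staging_requirement SUBMIT_FILE
# Prints an edited copy of SUBMIT_FILE to standard output; the file itself
# is not modified.
add_staging_requirement() {
    awk '
        # Drop any existing "requirements = ..." line (case-insensitive).
        tolower($0) ~ /^[ \t]*requirements[ \t]*=/ { next }

        # Remember every remaining line and the position of the last queue line.
        { lines[++n] = $0 }
        tolower($0) ~ /^[ \t]*queue([ \t]|$)/ { last_queue = n }

        END {
            for (i = 1; i <= n; i++) {
                # Insert the Staging requirement right before the final queue.
                if (i == last_queue)
                    print "requirements = (Target.HasCHTCStaging == true)"
                print lines[i]
            }
        }
    ' "$1"
}
```

Because the function writes to standard output, you can inspect the result first and redirect it to a new file only once it looks right.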

Initial Job

As before, we should test our job submission with a single .mov file before submitting jobs for all three. Alter your submit file (if necessary) to run a job that converts the file.

Once the job finishes, check to make sure everything ran as expected:

  1. Check the directory where you submitted the job. Did the output .mp4 file return?
  2. Also in the directory where you submitted the job - did the original .mov file return here accidentally?
  3. Check file sizes. How big is the returned .mp4 file? How does that compare to the original .mov input?

If your job successfully returned the converted .mp4 file and not the .mov file to the submit server, and the .mp4 file was appropriately scaled down, then our script did what it should have.
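The three checks can also be run as one shell function. This is a sketch under assumptions: check_conversion is a hypothetical name, and it assumes the output file is named after the input with an .mp4 extension.

```shell
# check_conversion SUBMIT_DIR STAGING_DIR MOV_NAME
#   SUBMIT_DIR:  directory where you submitted the job
#   STAGING_DIR: your Staging directory, which holds the original .mov
#   MOV_NAME:    name of the .mov file the job converted
check_conversion() {
    local submit_dir="$1"
    local staging_dir="$2"
    local mov_name="$3"
    local mp4="${submit_dir}/${mov_name%.mov}.mp4"   # assumed output naming

    # 1. Did the output .mp4 file return?
    [ -f "${mp4}" ] || { echo "missing output: ${mp4}"; return 1; }

    # 2. Did the original .mov file return here accidentally?
    [ ! -e "${submit_dir}/${mov_name}" ] || { echo "unexpected file: ${mov_name}"; return 1; }

    # 3. Compare file sizes in bytes.
    local mp4_size mov_size
    mp4_size=$(($(wc -c < "${mp4}")))
    mov_size=$(($(wc -c < "${staging_dir}/${mov_name}")))
    echo "mp4: ${mp4_size} bytes, mov: ${mov_size} bytes"

    # Succeed only if the converted file is smaller than the original.
    [ "${mp4_size}" -lt "${mov_size}" ]
}
```

The function returns a non-zero status as soon as one check fails, so it can be followed by && echo "looks good" on the command line.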

Multiple Jobs

Change your submit file as in the previous exercise to submit three jobs, one for each of the three files.