Software Exercise 1.2: Writing a Wrapper Script
In this exercise, you will create a wrapper script to run the same program (blastx) as the previous exercise.
Background
Wrapper scripts are a useful tool for running software that can't be compiled into a single file or that needs to be installed with every job, or simply for running extra steps around the main program. A wrapper script can either install the software from source code or use an existing copy of the software (as in this exercise). This portability technique works with almost any software that can be installed locally, and it gives you a great deal of control and flexibility over what happens within your job. Once you can write a script to handle your software (and often your data as well), you can submit a large variety of workflows to a distributed computing system like the Open Science Grid.
For this exercise, we will write a wrapper script as an alternative way to run the same job as the previous exercise.
Wrapper Script, part 1
Our wrapper script will be a bash script that runs several commands.
- In the same directory as the last exercise (still logged into login05.osgconnect.net), make a file called run_blast.sh.
- The first line we'll place in the script is the basic command for running blast. Based on our previous submit file, what command needs to go into the script?
- Once you have an idea, check against the example below:

    #!/bin/bash
    ncbi-blast-2.13.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results.txt

Note
The "header" of #!/bin/bash tells the computer that this is a bash shell script and that it can be run in the same way that you would run individual commands on the command line.
Submit File Changes
We now need to make some changes to our submit file.
- Make a copy of your previous submit file and open it to edit.
- Since we are now using a wrapper script, that will be our job's executable. Replace the original blastx executable with the name of our wrapper script and comment out the arguments line:

    executable = run_blast.sh
    #arguments =

- Note that since the blastx program is no longer listed as the executable, it will need to be included in transfer_input_files. Instead of transferring just that program, we will transfer the original downloaded tar.gz file. For efficiency, we'll also transfer the pdbaa database as the original tar.gz file instead of as the unzipped folder:

    transfer_input_files = pdbaa.tar.gz, mouse.fa, ncbi-blast-2.13.0+-x64-linux.tar.gz

- If you really want to be on top of things, look at the log file from the last exercise and update your memory and disk requests to be just slightly above the actual "Usage" values in the log.
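The "Usage" values mentioned above can be pulled out of the job log with a short helper. This is a minimal sketch, assuming the log contains the "Partitionable Resources" table that HTCondor typically writes when a job terminates, followed by Cpus, Disk, and Memory rows. The function name show_job_usage and the log file name blast.log are hypothetical; substitute the log name from your own submit file.

```shell
#!/bin/bash
# Print the resource usage table(s) from an HTCondor job log.
# Assumption: the log has a "Partitionable Resources" header line
# followed by three rows (Cpus, Disk, Memory). If your log lists
# more rows, increase the number passed to -A.
show_job_usage() {
    local log_file="$1"
    grep -A 3 'Partitionable Resources' "$log_file"
}

# Example call (blast.log is a hypothetical file name):
#   show_job_usage blast.log
```

The first numeric column of the printed table is the "Usage" value; the second is what was requested, which makes it easy to see how far apart the two are.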
Before submitting, be sure to make the additional changes to the wrapper script described in the next section!
Wrapper Script, part 2
Now that our database and BLAST software are being transferred to the job as tar.gz files, our script needs to unpack them before running blastx.
- Open your run_blast.sh script and add two commands at the start to un-tar the BLAST and pdbaa tar.gz files. See the previous exercise if you're not sure what these commands look like.
- To distinguish this job from our previous job, change the output file name to something other than results.txt.
- The completed run_blast.sh script should look like this:

    #!/bin/bash
    # Unpack the BLAST software and the pdbaa database
    tar -xzf ncbi-blast-2.13.0+-x64-linux.tar.gz
    tar -xzf pdbaa.tar.gz
    # Run blastx from the unpacked directory and write to a new output file
    ncbi-blast-2.13.0+/bin/blastx -db pdbaa/pdbaa -query mouse.fa -out results2.txt

- While not strictly necessary, it's a good idea to enable executable permissions on the wrapper script, like so:

    username@login $ chmod u+x run_blast.sh

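Before submitting, it can be worth checking the wrapper script and its archives on the login node, since a typo in the script or a wrong path inside a tar.gz file would otherwise only show up after the job has run. The sketch below is one possible way to do that. The helper names check_syntax and archive_has_path are illustrative (they are not part of HTCondor or BLAST), and the second helper assumes the archive stores paths without a leading "./".

```shell
#!/bin/bash
# Pre-submit checks for a wrapper script and the archives it unpacks.
# Both helper names are hypothetical and exist only for illustration.

# Succeeds if the script ($1) parses as valid bash. "bash -n" reads
# the script and checks its syntax without executing any commands.
check_syntax() {
    bash -n "$1"
}

# Succeeds if the gzipped tar archive ($1) lists exactly the path ($2).
# grep -x matches whole lines only; -F treats the path as a fixed
# string, so the "+" in the BLAST directory name is not a regex operator.
archive_has_path() {
    tar -tzf "$1" | grep -xF -- "$2" > /dev/null
}

# Example usage with the file names from this exercise:
#   check_syntax run_blast.sh && echo "syntax OK"
#   archive_has_path ncbi-blast-2.13.0+-x64-linux.tar.gz \
#       ncbi-blast-2.13.0+/bin/blastx && echo "blastx found"
```

Neither check runs blastx itself; they only confirm that the script is well-formed and that the path the script calls exists inside the archive being transferred.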
Your job is now ready to submit. Submit it using condor_submit and monitor it using condor_q.