Roadmap to HTC Workload Submission¶
Overview¶
This guide lays out the steps needed to go from logging in to an OSG Access Point to running a full-scale high throughput computing (HTC) workload on OSG's Open Science Pool (OSPool). The steps apply to any new workload submission, whether you are a long-time OSG user or just getting started with your first workload, and each step links to the relevant documentation pages.
This guide assumes that you have applied for an OSG Access Point account and have been approved after meeting with an OSG Research Computing Facilitator. If you don't yet have an account, you can apply for one here or contact us with any questions you have.
Learning how to get started on the OSG does not need to end with this document or our guides! Learn about our training opportunities and personal facilitation support in the Getting Help section below.
1. Introduction to the OSPool and OSG Resources¶
The OSG's Open Science Pool is best suited for computing work that can be run as many independent tasks, an approach called "high throughput computing." For more information on what kind of work is a good fit for the OSG, see Is the Open Science Pool for You?.
Learn more about the services provided by the OSG in this video:
2. Log on to an OSG Access Point¶
If you have not done so, apply for an account here. A Research Computing Facilitator will contact you within one business day to arrange a meeting to discuss your computational goals and to activate your account.
Note that there are multiple classes of access points provided. When your account was activated, you should have been told which access point your account belongs to:
Log In to "uw.osg-htc.org" Access Points (e.g., ap40.uw.osg-htc.org)
If your account is on the uw.osg-htc.org Access Points (e.g., accounts on ap40.uw.osg-htc.org), follow instructions in this guide for logging in: Log In to uw.osg-htc.org Access Points
Log In to "OSG Connect" Access Points (e.g., ap20.uc.osg-htc.org)
If your account is on the OSG Connect Access Points (e.g., accounts on ap20.uc.osg-htc.org, ap21.uc.osg-htc.org), follow instructions in this guide for logging in: Log In to OSG Connect Access Points
3. Learn to Submit HTCondor Jobs¶
Computational work is run on the OSPool by submitting it as “jobs” to the HTCondor scheduler. Jobs submitted to HTCondor are then scheduled and run on different resources that are part of the Open Science Pool. Before submitting your own computational work, it is important to understand how HTCondor job submission works. The following guides show how to submit basic HTCondor jobs.
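As a rough illustration of what a job submission involves, the following Python sketch builds the text of a minimal HTCondor submit file. The helper name `build_submit_file`, the script name `hello.sh`, the log and output file names, and the resource values are all placeholders chosen for illustration, not values taken from the OSG guides; check the guides for the options recommended on your Access Point.

```python
def build_submit_file(executable, arguments="", cpus=1, memory="1GB",
                      disk="1GB", count=1):
    """Return the text of a minimal HTCondor submit file (illustrative only)."""
    lines = [f"executable = {executable}"]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        # $(Cluster) and $(Process) are filled in by HTCondor for each job
        "log = job.$(Cluster).log",
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        # Resource requests; the defaults here are placeholder values
        f"request_cpus = {cpus}",
        f"request_memory = {memory}",
        f"request_disk = {disk}",
        # Number of jobs to queue from this one submit file
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # hello.sh is a hypothetical script name
    print(build_submit_file("hello.sh", arguments="$(Process)"))
```

Saving the returned text to a file (for example `hello.sub`, a hypothetical name) and running `condor_submit hello.sub` on the Access Point would submit the job.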
4. Test a First Job¶
After learning the basics of HTCondor job submission, you will need to create your own HTCondor job, including the software the job needs and an appropriate mechanism for handling its data. We recommend starting with a single test job.
Prepare your software¶
Software is an integral part of your HTC workflow. Whether you’ve written it yourself, inherited it from your research group, or use common open-source packages, any required executables and libraries will need to be made available to your jobs if they are to run on the OSPool.
Read through this overview of Using Software to help you determine the best way to provide your software. We also have the following guides/tutorials for each major software portability approach:
- To install your own software, begin with the guide on Compiling Software and then complete the Example Software Compilation tutorial.
- To use precompiled binaries, try the example presented in the AutoDock Vina tutorial and/or the Julia tutorial.
- To use Apptainer/Singularity/Docker containers for your jobs, see the Create an Apptainer/Singularity Container Image guide.
Finally, here are some additional guides specific to some of the most common scripting languages and software tools used on OSG**:
**This is not a complete list. Feel free to search for your software in our Knowledge base.
Manage your data¶
The data for your jobs will need to be transferred to each job that runs in the OSPool, and HTCondor has built-in features for getting data to jobs. Our Data Management guide discusses the relevant approaches, when to use them, and where to stage data for each.
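One of HTCondor's built-in features is its file transfer mechanism, which is configured in the submit file. The sketch below generates the relevant submit-file lines for a list of input and output files. The helper name `transfer_lines` and the file names are hypothetical, and this shows only plain file transfer from the Access Point; the Data Management guide remains the reference for which approach suits your data.

```python
def transfer_lines(input_files, output_files=None):
    """Return submit-file lines that enable HTCondor file transfer (a sketch)."""
    lines = [
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        # HTCondor expects a comma-separated list of files
        lines.append("transfer_input_files = " + ", ".join(input_files))
    if output_files:
        lines.append("transfer_output_files = " + ", ".join(output_files))
    return lines


if __name__ == "__main__":
    # sample.dat, params.txt and result.csv are placeholder names
    for line in transfer_lines(["sample.dat", "params.txt"], ["result.csv"]):
        print(line)
```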
Assign the Appropriate Job Duration Category¶
Jobs running in the OSPool may be interrupted at any time and will be re-run by HTCondor unless a single execution of a job exceeds the allowed duration. Jobs expected to take longer than 10 hours must identify themselves as 'Long' according to our Job Duration policies. Jobs expected to take longer than 20 hours are not a good fit for the OSPool (see Is the Open Science Pool for You?) unless they implement self-checkpointing (see Special Use Cases below).
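The thresholds above can be expressed as a small decision helper. This is a sketch under stated assumptions: the 10-hour and 20-hour limits come from the text above, while the default category name `Medium` and the submit-file attribute `+JobDurationCategory` are assumptions that should be checked against the Job Duration policies. The function names are hypothetical.

```python
def duration_category(expected_hours):
    """Pick a duration category from the expected runtime of one execution."""
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    if expected_hours <= 10:
        return "Medium"  # assumed name of the default category
    if expected_hours <= 20:
        return "Long"
    # Longer than 20 hours: not a good fit without self-checkpointing
    return None


def duration_submit_line(expected_hours):
    """Return the submit-file line for the category (attribute name assumed)."""
    category = duration_category(expected_hours)
    if category is None:
        raise ValueError("job exceeds 20 hours; consider self-checkpointing")
    return f'+JobDurationCategory = "{category}"'


if __name__ == "__main__":
    print(duration_submit_line(12))
```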
5. Scale Up¶
After you have a sample job running successfully, scale up in one or two steps: first run several jobs, and only then run all of them. HTCondor has many useful features that make it easy to submit multiple jobs with the same submit file.
- Easily submit multiple jobs
- Scaling up after success with test jobs discusses how to test your jobs for duration, memory and disk usage, and the total amount of space you might need on the
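One way to stage a scale-up is to split the full list of inputs into a small test batch and the remainder, then queue one job per line of a list file. The sketch below assumes one input file per job; the helper names, the batch size of 10, and the file names are hypothetical. The `queue ... from` form is a standard HTCondor submit-file statement.

```python
def staged_batches(inputs, test_size=10):
    """Split inputs into a small test batch and the remaining inputs."""
    return inputs[:test_size], inputs[test_size:]


def queue_statement(list_file, variable="input"):
    """Return a submit-file statement that queues one job per line of list_file."""
    return f"queue {variable} from {list_file}"


if __name__ == "__main__":
    # sample_0.dat ... sample_24.dat are placeholder input names
    all_inputs = [f"sample_{i}.dat" for i in range(25)]
    test_batch, remainder = staged_batches(all_inputs, test_size=10)
    print(len(test_batch), "test jobs, then", len(remainder), "more")
    print(queue_statement("test_inputs.txt"))
```

With this statement in a submit file, each job can refer to its own input as `$(input)`, for example in the `arguments` or `transfer_input_files` lines.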
6. Special Use Cases¶
If you think any of the below applies to you, please get in touch and our facilitation team will be happy to discuss your individual case.
- Run sequential workflows of jobs: Workflows with HTCondor's DAGMan
- Implement self-checkpointing for long jobs: HTCondor Checkpointing Guide
- Build your own Apptainer container: Create an Apptainer/Singularity Container Image
- Submit more than 10,000 jobs at once: FAQ, search for 'max_idle'
- Larger or specialty resource requests:
    - GPUs: GPU Jobs
    - Multiple CPUs: Multicore Jobs
    - Large Memory: Large Memory Jobs
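For the sequential-workflow case in the list above, a DAGMan input file names each job and declares which jobs must finish before others start. The following sketch generates such a file for a simple linear chain; the helper name `linear_dag`, the node names, and the submit file names are hypothetical, and real workflows often need more than this (see the DAGMan guide).

```python
def linear_dag(submit_files):
    """Return DAGMan input text that runs the given submit files in order."""
    names = [f"step{i + 1}" for i in range(len(submit_files))]
    # One JOB line per node: JOB <node name> <submit file>
    lines = [f"JOB {name} {sub}" for name, sub in zip(names, submit_files)]
    # Each node becomes the parent of the node that follows it
    for parent, child in zip(names, names[1:]):
        lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # prepare.sub, analyze.sub and summarize.sub are placeholder names
    print(linear_dag(["prepare.sub", "analyze.sub", "summarize.sub"]))
```

Saved to a file, the generated text would be submitted with `condor_submit_dag` rather than `condor_submit`.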
Getting Help¶
The OSG Facilitation team is here to help with questions and issues that come up as you work through these roadmap steps. We are available via email, office hours, and appointments, and we offer regular training opportunities. See our Get Help page and OSG Training page for all the different ways you can reach us. Our purpose is to assist you with achieving your computational goals, so we want to hear from you!