The role of HTC in advancing population genetics research
By: Hannah Cheren
June 1, 2022
Postdoctoral researcher Parul Johri uses OSG services, the HTCondor Software Suite, and the population genetics simulation program SLiM to investigate historical patterns of genetic variation.
Running hundreds of thousands of simulations is no easy task for any researcher. When Parul Johri faced this problem, she knew she needed more computational power, which is where the OSG came into play.
Johri is a postdoctoral researcher with the Jensen Lab at Arizona State University who spoke about using high throughput computing (HTC) in her population genetics work at the recent OSG All-Hands Meeting 2022. Having run hundreds of thousands of jobs that harnessed more than nine million computing hours on OSG’s Open Science Pool (OSPool), she shared that OSG services and the HTCondor Software Suite (HTCSS) were essential capabilities: “Without these HTC services and technologies, it would not have been possible to complete any of this work.”
Population genetics research focuses on understanding how processes like selection and mutation affect genetic variation within natural populations. However, there are no mathematical expressions to describe patterns of genetic variation in populations with complex histories and selection. Instead, hundreds of thousands of simulations are required to model these complicated evolutionary scenarios, and HTCSS plays a critical role in running them.
Several HTCSS features and HTC services and technologies proved helpful for Johri’s work. First, high-throughput simulations are easy to describe and execute via an HTCSS Access Point operated as part of the OSG Connect service. Beginning with population parameters that describe the entire population, Johri can create a single HTCSS submit file to simulate hundreds of thousands of gene samples across the genomes for each of these parameters. She then creates hundreds of thousands of evolutionary replicates for each simulation to make inferences about the parameters from a natural population. Each simulation is managed as a single job by HTCSS.
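To illustrate how a handful of population parameters can fan out into many independent jobs, here is a minimal Python sketch that enumerates every parameter combination and replicate, one row per job. The parameter names (mutation and recombination rates), their values, and the output file name are hypothetical and are not taken from Johri's work.

```python
import itertools


def build_job_rows(mutation_rates, recombination_rates, n_replicates):
    """Return one (mu, rec, replicate) tuple per simulation job."""
    return [
        (mu, rec, rep)
        for mu, rec in itertools.product(mutation_rates, recombination_rates)
        for rep in range(1, n_replicates + 1)
    ]


def write_job_list(rows, path):
    """Write one comma-separated line per job to a plain-text list file."""
    with open(path, "w") as handle:
        for mu, rec, rep in rows:
            handle.write(f"{mu}, {rec}, {rep}\n")


if __name__ == "__main__":
    # Hypothetical values: 2 mutation rates x 1 recombination rate x 3 replicates.
    rows = build_job_rows([1e-9, 1e-8], [1e-8], n_replicates=3)
    write_job_list(rows, "jobs.txt")  # hypothetical file name
    print(f"{len(rows)} jobs listed")
```

A list file like this can be read by a single submit file, so the number of jobs grows with the parameter grid and replicate count rather than with the number of files the researcher has to write by hand.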
Additionally, because the OSPool supports the execution of user software within containers, Johri can easily run this work using SLiM, a population-genetic simulator. She and other population genetics researchers use these parameters to create simulations that imitate realistic data, making SLiM a beneficial and convenient program. Christina Koch, a Research Computing Facilitator at the CHTC, helped Johri create a SLiM container, making it easy to run on the OSPool.
The SLiM software doesn’t require input files, just the parameters Johri passes as commands to SLiM in the HTCSS submit file. HTCSS capabilities are available via the Access Points operated by OSG as part of the OSG Connect service for US-based research projects. After she submits the jobs through an HTCSS Access Point, SLiM performs simulations for each input parameter and sends back an output file, which can be anything from a simple summary statistic to entire genome samples of individuals from the simulated population.
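As a sketch of what passing parameters to SLiM through a submit file might look like, the Python snippet below renders an HTCSS submit description as text. It relies on SLiM's `-d NAME=VALUE` option for defining constants and `-s` for the random seed, and on the HTCondor `queue ... from` form that reads one job per line of a list file. The script name, list file, container image, executable path inside the container, and resource requests are all assumptions for illustration; the exact container attribute may also differ by site and HTCSS version.

```python
def render_submit(slim_script, job_list, container_image):
    """Render an HTCondor submit description that runs one SLiM job per list line."""
    lines = [
        # Container attribute as commonly used on the OSPool (assumed here).
        f'+SingularityImage = "{container_image}"',
        # Hypothetical path to the slim binary inside the container.
        "executable = /usr/local/bin/slim",
        "transfer_executable = false",
        # Each job receives its own parameter values and seed.
        f"arguments = -d mu=$(mu) -d rec=$(rec) -s $(rep) {slim_script}",
        f"transfer_input_files = {slim_script}",
        "output = slim_$(Cluster)_$(Process).out",
        "error = slim_$(Cluster)_$(Process).err",
        "log = slim_$(Cluster).log",
        # Illustrative resource requests, not measured values.
        "request_cpus = 1",
        "request_memory = 2GB",
        "request_disk = 1GB",
        # One job per line of the list file; columns map to mu, rec, rep.
        f"queue mu, rec, rep from {job_list}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_submit("model.slim", "jobs.txt", "slim.sif"))
```

The rendered text would be saved to a file and handed to `condor_submit` on the Access Point; each line of the list file then becomes one job that HTCSS manages independently.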
Through an HTCSS Access Point, Johri ran three million jobs to examine genetic variation in Drosophila (fruit flies widely used in genetics research), 50,000 jobs for influenza, and one and a half million jobs for humans. Using over nine and a half million wall hours in the last three years, Johri has published three manuscripts rich with genetic patterns and findings.
Looking towards the horizon, Johri views HTC services as a vital resource: “I’m hoping that HTC services and technologies will continue to play a central role in performing evolutionary inferences in the future.” This hope doesn’t only apply to Johri’s research; it’s reflective of the entire field of population genetics. With distributed HTC (dHTC) services and technologies like the OSPool and HTCSS at their fingertips, population genetics researchers everywhere can push the field’s boundaries.