Conda with Containers¶
The Anaconda/Miniconda distribution of Python is a common tool for installing and managing Python-based software and other tools.
There are two ways of using Conda on the OSPool: with a tarball, or via a custom Apptainer/Singularity container. Either works well, but the container solution might be better if your Conda environment contains non-Python tools.
Overview¶
When should you use Miniconda as an installation method in OSG?
- Your software has specific conda-centric installation instructions.
- The above is true and the software has a lot of dependencies.
- You mainly use Python to do your work.
Notes on terminology:
- conda is a Python package manager and package ecosystem that exists in parallel with
pip
and PyPI. - Miniconda is a slim Python distribution, containing the minimum amount of packages necessary for a Python installation that can use conda.
- Anaconda is a pre-built scientific Python distribution based on Miniconda that has many useful scientific packages pre-installed.
To create the smallest, most portable Python installation possible, we recommend starting with Miniconda and installing only the packages you actually require.
To use a Miniconda installation for your jobs, create an Apptainer/Singularity definition file and build it (general instructions here).
Apptainer/Singularity Definition File¶
The definition file tells Apptainer/Singularity how the container should be built,
and what the environment setup should take place when the container is instantiated.
In the following example, the container is based on Ubuntu 22.04. A few base
operating system tools are installed, then Miniconda, followed by a set of
conda
commands to define the Conda environment. The %environment
is used
to ensure jobs are getting the environment activated before the job runs. To build
your own custom image, start by modifing the conda install
line to include
the packages you need.
Bootstrap: docker
From: ubuntu:22.04
%environment
# set up environment for when using the container
. /opt/conda/etc/profile.d/conda.sh
conda activate
%post
# base os
apt-get update -y
apt-get install -y build-essential wget
# install miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -f -p /opt/conda
rm Miniconda3-latest-Linux-x86_64.sh
# install conda components - add the packages you need here
. /opt/conda/etc/profile.d/conda.sh
conda activate
conda install -y -c conda-forge numpy cowpy
conda update --all
The next step is to build the image. Run:
$ apptainer build my-container.sif image.def
You can explore the container locally to make sure it works as expected with the shell subcommand:
$ apptainer shell my-container.sif
This example will give you an interactive shell. You can explore the
container and test your code with your own inputs from your /home
directory, which is automatically mounted (but note - $HOME will not be
available to your jobs later). Once you are down exploring, exit the
container by running exit
or with CTRL+D
It is important to use the correct transfer mechanism to get the image to your job. Please make sure you use OSDF and version your container in the filename. For example:
$ cp my-container.sif /ospool/protected/<username>/my-container-v1.sif
Submit Jobs¶
An example submit file could look like:
# File Name: conda_submission.sub
# specify the newly built image
+SingularityImage = "osdf:///ospool/protected/<username>/my-container-v1.sif"
# Specify your executable (single binary or a script that runs several
# commands) and arguments to be passed to jobs.
# $(Process) will be a integer number for each job, starting with "0"
# and increasing for the relevant number of jobs.
executable = science.py
arguments = $(Process)
# Specify the name of the log, standard error, and standard output (or "screen output") files.
log = science_with_conda.log
error = science_with_conda.err
output = science_with_conda.out
# Transfer any file needed for our job to complete.
transfer_input_files =
# Specify Job duration category as "Medium" (expected runtime <10 hr) or "Long" (expected runtime <20 hr).
+JobDurationCategory = “Medium”
# Tell HTCondor requirements your job needs,
# what amount of compute resources each job will need on the computer where it runs.
requirements =
request_cpus = 1
request_memory = 1GB
request_disk = 5GB
# Tell HTCondor to run 1 instance of our job:
queue 1
Specifying Exact Dependency Versions¶
An important part of improving reproducibility and consistency between runs is to ensure that you use the correct/expected versions of your dependencies.
When you run a command like conda install numpy
conda
tries to install the most recent version of numpy
For example, numpy
version 1.22.3
was released on Mar 7, 2022. To install exactly this version of numpy, you would run conda install numpy=1.22.3
(the same works for pip
if you replace =
with ==
). We recommend installing with an explicit version to make sure you have exactly the version of a package that you want. This is often called “pinning” or “locking” the version of the package.
If you want a record of what is installed in your environment, or want to reproduce your environment on another computer, conda can create a file, usually called environment.yml
, that describes the exact versions of all of the packages you have installed in an environment. An example environment.yml
file:
channels:
- conda-forge
- defaults
dependencies:
- cowpy
- numpy=1.25.0
To use the environment.yml
in the build, modify the image definition to copy the file, and
then replace the conda install
with a conda env create
. Also note that it is good style
to name the environment. We call it science
in this example:
Bootstrap: docker
From: ubuntu:22.04
%files
environment.yml
%environment
# set up environment for when using the container
. /opt/conda/etc/profile.d/conda.sh
conda activate science
%post
# base os
apt-get update -y
apt-get install -y build-essential wget
# install miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -f -p /opt/conda
rm Miniconda3-latest-Linux-x86_64.sh
# install conda components - add the packages you need here
. /opt/conda/etc/profile.d/conda.sh
conda activate
conda env create -n science -f environment.yml
conda update --all
If you use a source control system like git
, we recommend checking your environment.yml
file into source control and making sure to recreate it when you make changes to your environment. Putting your environment under source control gives you a way to track how it changes along with your own code.
More information on conda environments can be found in their documentation.