Task Arrays
It is a common use case to execute the same command for multiple input parameters, for example:
- Performing a simulation for each input file in a directory
- Training many machine learning models using hyperparameter search
HyperQueue allows you to do this using a job containing many tasks. We call such jobs task arrays. You can create a task array with a single submit command and then manage all created tasks as a single group through the job that contains them.
Note
Task arrays are somewhat similar to "job arrays" used by PBS and Slurm. However, HQ does not use PBS/Slurm job arrays for implementing this feature. Therefore, the limits that are commonly enforced on job arrays on HPC clusters do not apply to HyperQueue task arrays.
Creating task arrays
To create a task array, you must provide some source that will determine how many tasks should be created and what environment variables should be passed to each task so that you can differentiate them.
Currently, you can create a task array from a range of integers, from each line of a text file or from each item of a JSON array. You cannot combine these sources, as they are mutually exclusive.
Handling many output files
By default, each task in a task array will create two output files (containing stdout and stderr output). Creating large task arrays will thus generate a lot of files, which can be problematic, especially on network-based shared filesystems such as Lustre. To avoid this, you can either disable the output or use Output streaming.
Integer range
The simplest way of creating a task array is to specify an integer range. A task will be started for each integer in the range. You can then differentiate between the individual tasks using the task ID, which is available through the HQ_TASK_ID environment variable.
You can enter the range as two unsigned numbers separated by a dash [1], where the first number should be smaller than the second one. The range is inclusive. An optional step can be appended after a colon, as in the second example below.
The range is entered using the --array option:
# Task array with 3 tasks, with ids 1, 2, 3
$ hq submit --array 1-3 ...
# Task array with 6 tasks, with ids 0, 2, 4, 6, 8, 10
$ hq submit --array 0-10:2 ...
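As a rough sketch of how a task might use its ID, the script below maps HQ_TASK_ID to an input file name. The script name (task.sh), the inputs/ directory layout, and the input_for_task helper are assumptions made for illustration; they are not part of HyperQueue.

```shell
#!/bin/sh
# Hypothetical task script (task.sh); the name and file layout are assumptions.
# It could be submitted with something like:
#   hq submit --array 1-3 ./task.sh

# Map a task ID to an input file name, e.g. 2 -> inputs/input-2.txt.
input_for_task() {
    printf 'inputs/input-%s.txt\n' "$1"
}

# HQ_TASK_ID is set by HyperQueue inside each task.
# Fall back to 1 so the script can also be tried by hand.
task_id="${HQ_TASK_ID:-1}"
echo "task ${task_id} would process $(input_for_task "${task_id}")"
```

Deriving the input name from the ID keeps the submit command short, but it only works when the inputs follow a predictable numbering scheme.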
Lines of a file
Another way of creating a task array is to provide a text file with multiple lines. Each line from the file will be passed to a separate task, which can access the value of the line using the environment variable HQ_ENTRY.
This is useful if you want to, for example, process each file in a directory. You can generate a text file that contains each file path on a separate line and then pass it to the submit command using the --each-line option:
$ hq submit --each-line entries.txt ...
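One possible way to produce such a file is sketched below. It creates a scratch directory with placeholder inputs and writes one path per line; the directory layout, the .dat extension, and the process.sh script mentioned in the comments are all hypothetical.

```shell
#!/bin/sh
# Build a scratch directory with a few placeholder input files.
workdir="$(mktemp -d)"
mkdir -p "$workdir/inputs"
touch "$workdir/inputs/a.dat" "$workdir/inputs/b.dat" "$workdir/inputs/c.dat"

# Write one file path per line; sort so that the order is stable.
find "$workdir/inputs" -type f -name '*.dat' | sort > "$workdir/entries.txt"

# Count the lines, which equals the number of tasks that would be created.
entry_count="$(wc -l < "$workdir/entries.txt" | tr -d ' ')"
echo "wrote ${entry_count} entries to $workdir/entries.txt"

# Submission would then look something like (process.sh is hypothetical):
#   hq submit --each-line "$workdir/entries.txt" ./process.sh
# Inside process.sh, the path for the current task is available as "$HQ_ENTRY".
```

Checking the line count before submitting is a cheap way to confirm that the task array will have the size you expect.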
JSON array
You can also specify the source using a JSON array stored inside a file. HyperQueue will then create a task for each item in the array and pass the item as a JSON string to the corresponding task using the environment variable HQ_ENTRY.
Note
The root JSON value stored inside the file must be an array.
You can create a task array in this way using the --from-json option:
$ hq submit --from-json items.json ...
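For a hyperparameter search, the JSON file could be generated rather than written by hand. The following is a minimal sketch under assumed names: the lr and batch_size keys, their values, and the train.sh script mentioned in the comments are illustrative only.

```shell
#!/bin/sh
# Generate a JSON array with one object per hyperparameter combination.
items_file="$(mktemp)"
{
    printf '[\n'
    first=1
    for lr in 0.1 0.01; do
        for batch_size in 32 64; do
            # Separate items with a comma, but do not emit one before the first item.
            if [ "$first" -eq 1 ]; then first=0; else printf ',\n'; fi
            printf '  {"lr": %s, "batch_size": %s}' "$lr" "$batch_size"
        done
    done
    printf '\n]\n'
} > "$items_file"

# Each object sits on its own line, so counting lines with "lr" counts the items.
item_count="$(grep -c '"lr"' "$items_file")"
echo "wrote ${item_count} items to $items_file"

# Submission would then look something like (train.sh is hypothetical):
#   hq submit --from-json "$items_file" ./train.sh
# Each task would receive its own item as a JSON string in "$HQ_ENTRY".
```

The root value of the generated file is an array, as required, and each task is left to parse its own item with whatever JSON tooling it has available.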
[1] The full syntax can be seen in the second selector of the ID selector shortcut.
Created: November 2, 2021