
Task Arrays

A common use case is to execute the same command for multiple input parameters, for example:

  • Perform a simulation for each input file in a directory or for each line in a CSV file.
  • Train many machine learning models using hyperparameter search for each model configuration.

HyperQueue allows you to do this using a job that contains many tasks. We call such jobs Task arrays. You can create a task array with a single submit command and then manage all created tasks as a single group using its containing job.

Note

Task arrays are somewhat similar to "job arrays" used by PBS and Slurm. However, HQ does not use PBS/Slurm job arrays for implementing this feature. Therefore, the limits that are commonly enforced on job arrays on HPC clusters do not apply to HyperQueue task arrays.

Creating task arrays#

To create a task array, you must provide some source that will determine how many tasks should be created and what inputs (environment variables) should be passed to each task so that you can differentiate them.

Currently, you can create a task array from a range of integers, from each line of a text file, or from each item of a JSON array. The file and JSON sources are mutually exclusive and cannot be combined with each other.

Handling many output files

By default, each task in a task array will create two output files (containing stdout and stderr output). Creating large task arrays will thus generate a lot of files, which can be problematic especially on network-based shared filesystems, such as Lustre. To avoid this, you can either disable the output or use Output streaming.

Integer range#

The simplest way of creating a task array is to specify an integer range. A task will be started for each integer in the range. You can then differentiate between the individual tasks using the task id, which can be accessed through the HQ_TASK_ID environment variable.
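For illustration, a task script might use HQ_TASK_ID to pick its input. This is only a sketch; the `input_<id>.txt` naming scheme is an assumption of the example, not something HyperQueue prescribes:

```python
import os

def input_path_for_task(task_id: int) -> str:
    # Hypothetical naming scheme: each task id maps to one input file.
    return f"input_{task_id}.txt"

# HQ_TASK_ID is set by HyperQueue for each task in the array
# (it defaults to 0 here so the sketch also runs outside HyperQueue).
task_id = int(os.environ.get("HQ_TASK_ID", "0"))
print(f"Task {task_id} processes {input_path_for_task(task_id)}")
```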

You can enter the range as two unsigned numbers separated by a dash[1], where the first number should be smaller than the second one. The range is inclusive. You can also append a step after a colon (for example 0-10:2).

The range is entered using the --array option:

# Task array with 3 tasks, with ids 1, 2, 3
$ hq submit --array 1-3 ...

# Task array with 6 tasks, with ids 0, 2, 4, 6, 8, 10
$ hq submit --array 0-10:2 ...
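As a sketch of the range semantics described above (not HyperQueue's actual parser, whose full selector syntax is richer), the generated task ids can be modeled like this:

```python
def expand_range(spec: str) -> list[int]:
    """Expand an inclusive id range such as "1-3" or "0-10:2".

    Mimics the semantics described above: an optional step follows
    a colon, and both endpoints of the dash range are included.
    """
    step = 1
    if ":" in spec:
        spec, step_str = spec.split(":")
        step = int(step_str)
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1, step))
```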

Lines of a file#

Another way of creating a task array is to provide a text file with multiple lines. Each line from the file will be passed to a separate task, which can access the value of the line using the environment variable HQ_ENTRY.

This is useful when you want to, for example, process each file inside a directory. You can generate a text file that contains each filepath on a separate line and then pass it to the submit command using the --each-line option:

$ hq submit --each-line entries.txt ...
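A file like entries.txt can be generated with a short script. This sketch (the directory and output names are placeholders) writes the path of every file in a directory on a separate line:

```python
from pathlib import Path

def write_entries(directory: str, output: str) -> int:
    """Write one filepath per line for every file in `directory`.

    Returns the number of entries written; each line will become
    one task's HQ_ENTRY value when submitted with --each-line.
    """
    paths = sorted(str(p) for p in Path(directory).iterdir() if p.is_file())
    Path(output).write_text("\n".join(paths) + "\n")
    return len(paths)
```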

Tip

To use an environment variable directly in the submitted command, you have to make sure that it is expanded when the command is executed, not when the command is submitted. If you specify the command directly (rather than via a script file), you should also run it through a shell such as bash.

For example, the following command is incorrect, as it will expand HQ_ENTRY during submission (probably to an empty string) and submit a command ls:

$ hq submit --each-line files.txt ls $HQ_ENTRY

To actually submit the command ls $HQ_ENTRY, you can e.g. wrap the command in single quotes and run it in a shell:

$ hq submit --each-line files.txt bash -c 'ls $HQ_ENTRY'

JSON array#

You can also specify the source using a JSON array stored inside a file. HyperQueue will then create a task for each item in the array and pass the item as a JSON string to the corresponding task using the environment variable HQ_ENTRY.

Note

The root JSON value stored inside the file must be an array.

You can create a task array in this way using the --from-json option:

$ hq submit --from-json items.json ...

If items.json contained this content:

[{
  "batch_size": 4,
  "learning_rate": 0.01
}, {
  "batch_size": 8,
  "learning_rate": 0.001
}]
then HyperQueue would create two tasks, one with HQ_ENTRY set to {"batch_size": 4, "learning_rate": 0.01} and the other with HQ_ENTRY set to {"batch_size": 8, "learning_rate": 0.001}.
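Inside each task, the HQ_ENTRY string can be parsed back into a structured value. A minimal Python sketch (the hyperparameter names match the example above):

```python
import json
import os

def load_entry() -> dict:
    """Parse the JSON item that HyperQueue passed to this task via HQ_ENTRY."""
    return json.loads(os.environ["HQ_ENTRY"])

# Example value, as HyperQueue would set it for the first task above
# (setdefault keeps the sketch runnable outside HyperQueue):
os.environ.setdefault("HQ_ENTRY", '{"batch_size": 4, "learning_rate": 0.01}')
config = load_entry()
print(config["batch_size"], config["learning_rate"])
```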

Combining --each-line/--from-json with --array#

The --each-line and --from-json options can be combined with the --array option. In that case, only the selected subset of lines/items will be submitted. If --array selects an ID that exceeds the number of lines in the file (or the number of items in the JSON array), that ID is silently ignored.

For example:

$ hq submit --each-line input.txt --array "2, 8-10"

If input.txt has sufficiently many lines, this will create a task array with four tasks: one for the 3rd line of the file and three for the 9th to 11th lines (note that the first line has ID 0). The same applies analogously to --from-json.
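The filtering described above can be modeled as follows. This is a sketch of the semantics only, not HyperQueue's implementation: each requested id selects the line with that zero-based index, and ids past the end of the file are silently dropped:

```python
def select_entries(lines: list[str], ids: list[int]) -> dict[int, str]:
    """Map each requested task id to the zero-indexed line it selects,
    silently dropping ids that exceed the number of lines."""
    return {i: lines[i] for i in ids if i < len(lines)}
```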


  1. The full syntax can be seen in the second selector of the ID selector shortcut


Last update: December 12, 2023
Created: November 2, 2021