Data arrays#
Executing the same program for each item of an array of data is a perfect use-case for HyperQueue. It contains built-in support for generating a task array from a file containing a JSON array or from a file where each task input is specified on a separate line.
Processing many input files with the same program#
Let's say that we have a directory with 100 data files that we want to process using some program.
First, we create an input file (called e.g. inputs.txt
) that will store the filepaths of all these data files:
/data/input-01.txt
/data/input-02.txt
...
Then we create a bash script called e.g. compute.sh
that will be executed by each HyperQueue task. Each such task will receive a single line from inputs.txt
in the HQ_ENTRY
environment variable. Our bash script will simply forward this line to a program of our choosing:
#!/bin/bash
/home/user/my-program --param a=b --input ${HQ_ENTRY}
And finally, we can submit a task graph where a single task will be spawned for each line in the file above using the following command:
$ hq submit --each-line=inputs.txt ./compute.sh
If the inputs.txt
file contained 100 lines, the command above would create a single job with 100 tasks.