Jobs and Tasks

The main unit of computation within HyperQueue is called a Task. It represents a single computation (currently, a single execution of some program) that is scheduled and executed on a worker.

To actually compute something, you have to create a Job, which is a collection of tasks (a task graph). Jobs are units of computation management: you can submit, query, or cancel jobs using the CLI.

Note

This section focuses on simple jobs, where each job contains exactly one task. See Task arrays to find out how to create jobs with multiple tasks.

Identification numbers#

Each job is identified by a positive integer that is assigned by the HyperQueue server when the job is submitted. We refer to it as Job id.

Each task within a job is identified by an unsigned 32-bit integer called Task id. A task id is either generated by the server or assigned by the user. Task ids are always relative to a specific job; two tasks in different jobs can thus have the same task id.

In simple jobs, task id is always set to 0.

Submitting jobs#

To submit a simple job that will execute some executable with the provided arguments, use the hq submit command:

$ hq submit <program> <arg1> <arg2> ...

When you submit a job, the server will assign it a unique job id and print it. You can use this id in subsequent commands to refer to the submitted job.

After the job is submitted, HyperQueue will distribute it to a connected worker that will then execute the provided command.
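
For example, the following sketch submits a job and then queries it, assuming that it is the first job submitted to this server (and thus receives job id 1):

$ hq submit -- /bin/hostname
$ hq job info 1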

Warning

The provided command will be executed on a worker that might be running on a different machine. You should thus make sure that the binary will be available there and that you provide an absolute path to it.

Note

When your command contains its own command-line flags, you must put the command and its flags after --:

$ hq submit -- /bin/bash -c 'echo $PPID'

There are many parameters that you can set for the executed program; they are listed below.

Name#

Each job has an assigned name, which is purely informative for the user. By default, the name is derived from the job's program name. You can also set the job name explicitly with the --name option:

$ hq submit --name=<NAME> ...

Working directory#

By default, the working directory of the job will be set to the directory from which the job was submitted. You can change this using the --cwd option:

$ hq submit --cwd=<path> ...

Warning

Make sure that the provided path exists on all worker nodes.

Hint

You can use placeholders in the working directory path.
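
For example, the following sketch uses placeholders to give each task its own working directory under the submission directory (the exact layout is just illustrative):

$ hq submit --cwd='%{SUBMIT_DIR}/job-%{JOB_ID}/%{TASK_ID}' ...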

Output#

By default, each job will produce two files containing the standard output and standard error output, respectively. The default paths of these files are

  • %{SUBMIT_DIR}/job-%{JOB_ID}/%{TASK_ID}.stdout for stdout
  • %{SUBMIT_DIR}/job-%{JOB_ID}/%{TASK_ID}.stderr for stderr

%{JOB_ID} and %{TASK_ID} are so-called placeholders; you can read about them below.

You can change these paths with the --stdout and --stderr options. You can also avoid creating stdout/stderr files completely by setting the value to none:

$ hq submit --stdout=out.txt --stderr=err.txt ...
$ hq submit --stdout=none ...

Warning

Make sure that the provided path(s) exist on all worker nodes. Also note that if you provide a relative path, it will be resolved relative to the directory from where you submit the job, not relative to the working directory of the job. If you want to change that, use the %{CWD} placeholder.
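
For example, the following sketch uses the %{CWD} placeholder to store the output files inside the task's working directory instead of the submission directory:

$ hq submit --cwd=<path> --stdout='%{CWD}/stdout.txt' --stderr='%{CWD}/stderr.txt' ...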

Environment variables#

Using the --env <KEY>=<VAL> option, you can set environment variables that will be passed to the provided command when the job is executed. To pass multiple environment variables, repeat the option:

$ hq submit --env KEY1=VAL1 --env KEY2=VAL2 ...

Each executed task will also automatically receive the following environment variables:

Variable name    Explanation
HQ_JOB_ID        Job id
HQ_TASK_ID       Task id
HQ_INSTANCE_ID   Instance id
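
For instance, a task can read these variables to find out which job and task it belongs to. A minimal sketch:

$ hq submit -- bash -c 'echo "job ${HQ_JOB_ID}, task ${HQ_TASK_ID}"'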

Time management#

You can specify two time-related parameters when submitting a job. They will be applied to each task of the submitted job.

  • Time Limit is the maximal running time of a task. If it is reached, the task will be terminated, and it will transition into the Failed state. This setting has no impact on scheduling.

    This can serve as a sanity check to make sure that some task will not run indefinitely. You can set it with the --time-limit option [1]:

    $ hq submit --time-limit=<duration> ...
    

    Note

    Time limit is counted separately for each task. If you set a time limit of 3 minutes and create two tasks, each of which runs for two minutes, the time limit will not be hit.

  • Time Request is the minimal remaining lifetime that a worker must have in order to start executing the task. Workers that do not have enough remaining lifetime will not be considered for running this task.

    Time requests are only used during scheduling, where the server decides which worker should execute which task. Once a task is scheduled and starts executing on a worker, the time request value will not have any effect.

    You can set the time request using the --time-request option [1]:

    $ hq submit --time-request=<duration> ...
    

    Note

    Workers with an unknown remaining lifetime are able to execute any task, regardless of its time request.

Here is an example situation where time limit and time request can be used:

Let's assume that we have a collection of tasks where the vast majority usually finish within 10 minutes, but some of them run for (at most) 30 minutes. We do not know in advance which tasks will be "slow". In this case we may want to set the time limit to 35 minutes to protect ourselves against errors (deadlocks, endless loops, etc.).

However, since we know that each task will usually take at least 10 minutes to execute, we don't want to start executing it on a worker that will definitely terminate in less than 10 minutes; that would only waste computational resources. Therefore, we can set the time request to 10 minutes.
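
Putting this together, the submission for this scenario could look as follows (a sketch; it assumes the duration shortcuts from the footnote accept values such as 35m and 10m):

$ hq submit --time-limit=35m --time-request=10m ...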

Priority#

You can modify the order in which tasks are executed using Priority. Priority can be any 32-bit signed integer; a lower number signifies lower priority. For example, when task A with priority 5 and task B with priority 3 are scheduled to the same worker and only one of them may be executed, A will be executed first.

You can set the priority using the --priority option:

$ hq submit --priority=<PRIORITY> ...

If no priority is specified, then each task will have priority 0.
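
For example, the following sketch submits two jobs with different priorities (the program names are just illustrative); if both have tasks queued on the same worker, the tasks of the first job will be executed first:

$ hq submit --priority=5 -- /bin/urgent-analysis
$ hq submit --priority=-1 -- /bin/background-cleanup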

Placeholders#

You can use special variables when setting certain job parameters (working directory, output paths, log path). These variables, called Placeholders, will be replaced by job or task-specific information before the job is executed.

Placeholders are enclosed in curly braces ({}) and prefixed with a percent (%) sign.

You can use the following placeholders:

Placeholder      Will be replaced by                          Available for
%{JOB_ID}        Job id                                       stdout, stderr, cwd, log
%{TASK_ID}       Task id                                      stdout, stderr, cwd
%{INSTANCE_ID}   Instance id                                  stdout, stderr, cwd
%{SUBMIT_DIR}    Directory from which the job was submitted   stdout, stderr, cwd, log
%{CWD}           Working directory of the task                stdout, stderr
%{SERVER_UID}    Server unique ID (a string of length 6)*     stdout, stderr, cwd, log

* The server generates a random SERVER_UID string every time a new server is started (hq server start).
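
As an example, the following sketch combines several placeholders to separate output files per server instance, job, and task:

$ hq submit --stdout='%{SUBMIT_DIR}/%{SERVER_UID}/%{JOB_ID}-%{TASK_ID}.stdout' ...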

State#

At any moment in time, each task and job has a specific state that represents what is currently happening to it. You can query the state of a job with the following command [2]:

$ hq job info <job-id>

Task state#

Each task starts in the Waiting state and can end up in one of the terminal states: Finished, Failed or Canceled.

Waiting-----------------\
   | ^                  |
   | |                  |
   v |                  |
Running-----------------|
   | |                  |
   | \--------\         |
   |          |         |
   v          v         v
Finished    Failed   Canceled

  • Waiting The task was submitted and is now waiting to be executed.
  • Running The task is running on a worker. It may become Waiting again when the worker where the task is running crashes.
  • Finished The task has successfully finished.
  • Failed The task has failed.
  • Canceled The task has been canceled.

If a task is in the Finished, Failed or Canceled state, it is completed.

Job state#

The state of a job is derived from the states of its individual tasks. The state is determined by the first rule that matches from the following list of rules:

  1. If at least one task is Running, then the job state is Running.
  2. If at least one task has not been completed yet, then the job state is Waiting.
  3. If at least one task is Failed, then the job state is Failed.
  4. If at least one task is Canceled, then the job state is Canceled.
  5. Otherwise, all tasks are Finished, therefore the job state is also Finished.

For example, a job with one Failed task and one still Waiting task is Waiting, not Failed; it only becomes Failed once all of its tasks are completed.

Cancelling jobs#

You can prematurely terminate a submitted job that hasn't been completed yet by cancelling it using the hq job cancel command [2]:

$ hq job cancel <job-selector>

Cancelling a job will cancel all of its tasks that are not yet completed.

Waiting for jobs#

There are three ways of waiting until a job completes:

  • Submit and wait You can use the --wait flag when submitting a job. This will cause the submission command to wait until the job completes:

    $ hq submit --wait ...
    

    Tip

    This method can be used for benchmarking the job duration; see the sketch after this list.

  • Wait command There is a separate hq job wait command that can be used to wait until an existing job completes [2]:

    $ hq job wait <job-selector>
    
  • Interactive wait If you want to interactively observe the status of a job (which is useful especially if it has multiple tasks), you can use the hq job progress command:

    $ hq submit --progress ...
    
    $ hq job progress <selector>
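
As a sketch of the benchmarking tip above, you can combine --wait with the standard time shell utility (the program name is just illustrative):

$ time hq submit --wait -- /bin/some-computation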
    

Attaching standard input#

When the --stdin flag is used, HQ captures standard input and attaches it to each task of the job. When a task is started, the attached data is written to its standard input.

This can be used to submit scripts without creating a file. The following command will capture stdin and execute it in Bash:

$ hq submit --stdin bash
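
For example, you can pipe an inline script into the submission (a minimal sketch):

$ echo 'echo "Running on $(hostname)"' | hq submit --stdin bash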

If you want to parse #HQ directives from standard input, you can use --directives=stdin.

Task directory#

When a job is submitted with --task-dir, a temporary directory is created for each task and its path is passed to the task via the environment variable HQ_TASK_DIR. This directory is automatically deleted when the task is completed (for any reason).
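
For example, a task can use this directory as scratch space. A minimal sketch (the file name is just illustrative):

$ hq submit --task-dir -- bash -c 'echo "intermediate data" > "${HQ_TASK_DIR}/scratch.txt"'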

Providing your own error message#

A task may pass its own error message to HyperQueue. HyperQueue provides a filename via the environment variable HQ_ERROR_FILENAME; if a task creates this file and terminates with a non-zero return code, then the content of the file is taken as the error message.

HQ_ERROR_FILENAME is provided only if the task directory is enabled (i.e., the job was submitted with --task-dir). The file is always placed inside the task directory.

If the message is longer than 2 KiB, it is truncated to 2 KiB.

If a task terminates with a zero return code, the error file is ignored.
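
For example, the following sketch reports a custom error message when the task fails (the message text is just illustrative):

$ hq submit --task-dir -- bash -c 'echo "input data is missing" > "${HQ_ERROR_FILENAME}"; exit 1'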

Useful job commands#

Here is a list of useful job commands:

Display job table#

$ hq job list
$ hq job list --all

You can display only jobs having the selected states by using the --filter flag:

$ hq job list --filter running,waiting

Valid filter values are:

  • waiting
  • running
  • finished
  • failed
  • canceled

Display information about a specific job#

$ hq job info <job-selector>

Display information about individual tasks (potentially across multiple jobs)#

$ hq task list <job-selector> [--task-status <status>] [--tasks <task-selector>]

Display job stdout/stderr#

$ hq job cat <job-id> [--tasks <task-selector>] <stdout/stderr>
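
For example, the following sketch prints the standard output of all tasks of job 1 (assuming such a job exists):

$ hq job cat 1 stdout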

[1] You can use various shortcuts for the duration value.

[2] You can use various shortcuts to select multiple jobs at once.

