Submitting jobs#

You can use the Python API to submit jobs (directed acyclic graphs of tasks) through a Client. Beyond the functionality offered by the HyperQueue CLI, the Python API lets you add dependencies between tasks, configure each task individually and create tasks out of Python functions.

Job#

To build a job, you first have to create an instance of the Job class.

from hyperqueue import Job

job = Job()

Tasks#

Once you have created a job, you can add tasks to it. Currently, each task can represent either the execution of an external program or the execution of a Python function.

To create complex workflows, you can also specify dependencies between tasks.
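To make the idea of dependencies concrete, here is a minimal sketch that does not use HyperQueue at all. It models a small workflow as a directed acyclic graph with the standard library's `graphlib` and prints one order in which the tasks could run. The task names are hypothetical, and how HyperQueue itself schedules tasks is not shown here.

```python
from graphlib import TopologicalSorter

# Hypothetical task names, each mapped to the set of tasks it depends on.
dependencies = {
    "download": set(),
    "preprocess": {"download"},
    "train": {"preprocess"},
    "report": {"preprocess", "train"},
}


def execution_order(deps):
    """Return one valid order in which the tasks could be executed."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A task only appears in the order after everything it depends on, which is the guarantee that task dependencies are meant to provide.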

External programs#

To create a task that will execute an external program, you can use the program method of a Job:

job.program(["/bin/my-program", "foo", "bar", "--arg", "42"])

Besides the program arguments, you can pass various other parameters to the task. The program method returns a Task object that represents the created task. You can use this object later, for example to define dependencies.

Python functions#

If you want to execute a Python function as a task, you can use the function method of a Job:

def preprocess_data(fast, path):
    with open(path) as f:
        data = f.read()
    if fast:
        preprocess_fast(data)
    else:
        preprocess(data)

job.function(preprocess_data, args=(True, "/data/a.txt"))
job.function(preprocess_data, args=(False, "/data/b.txt"))

You can pass both positional and keyword arguments to the function. The arguments will be serialized using cloudpickle.
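Because the arguments are serialized, every value you pass has to survive pickling. The following sketch checks this before a task is created. It uses cloudpickle when it is installed and falls back to the standard `pickle` module as a stand-in otherwise; the helper name `is_serializable` and the sample values are hypothetical.

```python
import pickle
import threading

try:
    # cloudpickle is the serializer named in the text above.
    import cloudpickle as serializer
except ImportError:
    # Stand-in so that the sketch still runs without cloudpickle installed.
    serializer = pickle


def is_serializable(value):
    """Return True if `value` can be serialized, False otherwise."""
    try:
        serializer.dumps(value)
        return True
    except Exception:
        return False


good_args = (True, "/data/a.txt")                      # plain values are fine
good_kwargs = {"fast": False, "path": "/data/b.txt"}   # so are dicts of them
bad_args = (threading.Lock(),)                         # locks cannot be pickled

print(is_serializable(good_args), is_serializable(good_kwargs), is_serializable(bad_args))
```

Objects that wrap operating system resources, such as locks, are typical examples of values that should be created inside the task function rather than passed in as arguments.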

Python tasks are useful for work such as data preprocessing and organization. You can keep the logic of Python tasks next to the code that defines the submitted workflow (job), without having to write a separate external script.

As with the program method, function returns a Task that can be used to define dependencies.

Notice

Currently, a new Python interpreter will be started for each Python task.
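One practical consequence is that in-memory state is not shared between the submitting script and a task. The sketch below illustrates this general behavior with the standard `subprocess` module only. It does not reproduce how HyperQueue launches the interpreter, and the names `CACHE` and `run_in_fresh_interpreter` are hypothetical.

```python
import os
import subprocess
import sys

# State that exists only in this (parent) interpreter.
CACHE = {"loaded": True}

# Code run by the child: print its process ID and whether it can see CACHE.
CHILD_CODE = "import os; print(os.getpid(), 'CACHE' in globals())"


def run_in_fresh_interpreter(code):
    """Run `code` in a new Python interpreter and return its output fields."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


child_pid, sees_cache = run_in_fresh_interpreter(CHILD_CODE)
print(int(child_pid) != os.getpid(), sees_cache)
```

The child runs in a different process and does not see `CACHE`, so anything a task needs should be passed as an argument or loaded inside the task function.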

Parametrizing tasks#

You can parametrize both external and Python tasks by setting their working directory, standard output paths, environment variables or HyperQueue-specific parameters like resources or time limits. In contrast to the CLI, where you can only use a single set of parameters for all tasks of a job, with the Python API you can specify these parameters individually for each task.

You can find more details in the documentation of the program or function methods.
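As a rough sketch of per-task parametrization, the helper below builds a separate set of parameters for each task. The keyword names `cwd`, `stdout`, `stderr` and `env` are assumptions about the program and function methods and should be checked against their documentation; the helper `task_params` and the directory layout are hypothetical.

```python
from pathlib import PurePosixPath


def task_params(index, base_dir="/tmp/hq-example"):
    """Build a hypothetical set of keyword arguments for a single task."""
    workdir = PurePosixPath(base_dir) / f"task-{index}"
    return {
        "cwd": str(workdir),                      # assumed name: working directory
        "stdout": str(workdir / "stdout.txt"),    # assumed name: standard output path
        "stderr": str(workdir / "stderr.txt"),    # assumed name: standard error path
        "env": {"TASK_INDEX": str(index)},        # assumed name: environment variables
    }


params = [task_params(i) for i in range(3)]
print(params[1]["cwd"])

# With HyperQueue installed, and if the keyword names above are correct,
# each dictionary could be unpacked into a task, for example:
#   job.program(["/bin/my-program"], **params[0])
```

Keeping the parameters in plain dictionaries makes it easy to inspect or log what each task will receive before anything is submitted.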

Submitting a job#

Once you have added some tasks to the job, you can submit it using the Client's submit method:

from hyperqueue import Client

# Create a client for the HyperQueue server and submit the job
client = Client()
submitted = client.submit(job)

To wait until the job has finished executing, use the wait_for_jobs method:

client.wait_for_jobs([submitted])
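The steps on this page can be combined into one helper, sketched below. It takes the job and client as parameters, so the function itself does not import HyperQueue, and it only uses the calls shown above (program, function, submit and wait_for_jobs). The names `build_and_submit` and `count_lines`, and the file path, are hypothetical.

```python
def count_lines(path):
    """Hypothetical Python task: count the lines of a text file."""
    with open(path) as f:
        return sum(1 for _ in f)


def build_and_submit(job, client):
    """Add two tasks to `job`, submit it through `client` and wait for it."""
    # Task that executes an external program.
    job.program(["/bin/my-program", "foo", "bar", "--arg", "42"])
    # Task that executes a Python function with one positional argument.
    job.function(count_lines, args=("/data/a.txt",))

    submitted = client.submit(job)
    client.wait_for_jobs([submitted])
    return submitted


# With HyperQueue installed and a server running, the helper could be used as:
#   from hyperqueue import Client, Job
#   build_and_submit(Job(), Client())
```

Passing the job and client in, rather than creating them inside the helper, also makes it possible to exercise the workflow definition with stand-in objects before a real server is available.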

Last update: March 27, 2023
Created: August 10, 2022