Submitting jobs#

You can use the Python API to submit jobs (directed acyclic graphs of tasks) through a Client. In addition to the functionality offered by the HyperQueue CLI, the Python API lets you add dependencies between tasks, configure each task individually, and create tasks out of Python functions.

Job#

To build a job, you first have to create an instance of the Job class.

from hyperqueue import Job

job = Job()

Tasks#

Once you have created a job, you can add tasks to it. Currently, each task can represent either the execution of an external program or the execution of a Python function.

To create complex workflows, you can also specify dependencies between tasks.

External programs#

To create a task that will execute an external program, you can use the program method of a Job:

job.program(["/bin/my-program", "foo", "bar", "--arg", "42"])

Apart from the program's arguments, you can also pass various other parameters to the task. The program method returns a Task object that represents the created task. This object can be used later, e.g. for defining dependencies.
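For example, if one program should only start after another has finished, you could express that through a dependency. This is a minimal sketch, assuming that the program method accepts a deps parameter listing the tasks that must complete first (see the documentation on task dependencies):

first = job.program(["/bin/generate-data", "--output", "/tmp/data.bin"])
# Assumed `deps` parameter: this task starts only after `first` has finished
job.program(["/bin/process-data", "--input", "/tmp/data.bin"], deps=[first])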

Python functions#

If you want to execute a Python function as a task, you can use the function method of a Job:

def preprocess_data(fast, path):
    with open(path) as f:
        data = f.read()
    if fast:
        preprocess_fast(data)
    else:
        preprocess(data)

job.function(preprocess_data, args=(True, "/data/a.txt"))
job.function(preprocess_data, args=(False, "/data/b.txt"))

You can pass both positional and keyword arguments to the function. The arguments will be serialized using cloudpickle.
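For instance, the calls above could also pass path as a keyword argument. This is a small sketch, assuming the function method accepts a kwargs dictionary alongside args:

# Positional argument `fast`, keyword argument `path` (kwargs is an assumption here)
job.function(preprocess_data, args=(True,), kwargs={"path": "/data/a.txt"})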

Python tasks are useful e.g. for data preprocessing and organization. You can co-locate the logic of Python tasks with the code that defines the submitted workflow (job), without having to write a separate external script.

Same as with the program method, function will return a Task that can be used to define dependencies.

Notice

Currently, a new Python interpreter will be started for each Python task.

Python environment#

When you use a Python function as a task, the task will attempt to import the hyperqueue package when it executes (to perform some bookkeeping in the background). The function is executed on a worker, which means the worker needs access to the correct Python version (and virtual environment) that contains the hyperqueue package!

To make sure that the function will be executed in the correct Python environment, you can use PythonEnv and its prologue argument. It lets you specify a (shell) command that will be executed before the Python interpreter that executes your function is spawned.

from hyperqueue.task.function import PythonEnv
from hyperqueue import Client

env = PythonEnv(
    prologue="ml Python/XYZ && source /<my-path-to-venv>/bin/activate"
)
client = Client(python_env=env)

If you use Python functions as tasks, using PythonEnv is pretty much required, unless your workers are already spawned in an environment that has the correct Python loaded (e.g. via .bashrc or a similar mechanism).

Parametrizing tasks#

You can parametrize both external and Python tasks by setting their working directory, standard output paths, environment variables, or HyperQueue-specific parameters such as resources or time limits. In contrast to the CLI, where a single set of parameters applies to all tasks of a job, the Python API lets you specify these parameters individually for each task.

You can find more details in the documentation of the program or function methods.
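As an illustration, a single task could be given its own environment variables, working directory and output path. This is a hedged sketch; the parameter names env, cwd and stdout are assumptions based on the program method's documentation:

# Per-task parameters (env/cwd/stdout names are assumptions here)
job.program(
    ["/bin/my-program", "--arg", "42"],
    env={"OMP_NUM_THREADS": "4"},
    cwd="/tmp/workdir",
    stdout="/tmp/outputs/my-program.out",
)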

Submitting a job#

Once you have added some tasks to the job, you can submit it using the Client's submit method:

client = Client()
submitted = client.submit(job)

To wait until the job has finished executing, use the wait_for_jobs method:

client.wait_for_jobs([submitted])
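Putting the pieces together, a complete submission script might look like this. It is a minimal end-to-end sketch built from the snippets above, and it assumes that a HyperQueue server and at least one worker are already running:

from hyperqueue import Client, Job

# Build a job with a single external-program task
job = Job()
job.program(["/bin/my-program", "foo", "bar", "--arg", "42"])

# Submit the job and block until all of its tasks have finished
client = Client()
submitted = client.submit(job)
client.wait_for_jobs([submitted])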
