Submitting jobs#
You can use the Python API to submit jobs (directed acyclic graphs of tasks) through a Client
. In addition to the functionality offered by the HyperQueue CLI, you can use the Python API to add dependencies between jobs, configure each task individually and create tasks out of Python functions.
Job#
To build a job, you first have to create an instance of the Job
class.
from hyperqueue import Job
job = Job()
Tasks#
Once you have created a job, you can add tasks to it. Currently, each task can represent either the execution of an external program or the execution of a Python function.
To create complex workflows, you can also specify dependencies between tasks.
External programs#
To create a task that will execute an external program, you can use the program
method of a Job
:
job.program(["/bin/my-program", "foo", "bar", "--arg", "42"])
You can pass the program arguments or various other parameters to the task. The program
method will return a Task
object that represents the created task. This object can be used further e.g. for defining dependencies.
Python functions#
If you want to execute a Python function as a task, you can use the function
method of a Job
:
def preprocess_data(fast, path):
with open(path) as f:
data = f.read()
if fast:
preprocess_fast(data)
else:
preprocess(data)
job.function(preprocess_data, args=(True, "/data/a.txt"))
job.function(preprocess_data, args=(False, "/data/b.txt"))
You can pass both positional and keyword arguments to the function. The arguments will be serialized using cloudpickle.
Python tasks can be useful to perform e.g. various data preprocessing and organization tasks. You can co-locate the logic of Python tasks together with the code that defines the submitted workflow (job), without the need to write an additional external script.
Same as with the program
method, function
will return a Task
that can used to define dependencies.
Notice
Currently, a new Python interpreter will be started for each Python task.
Parametrizing tasks#
You can parametrize both external or Python tasks by setting their working directory, standard output paths, environment variables or HyperQueue specific parameters like resources or time limits. In contrast to the CLI, where you can only use a single set of parameters for all tasks of a job, with the Python API you can specify these parameters individually for each task.
You can find more details in the documentation of the program
or function
methods.
Submitting a job#
Once you have added some tasks to the job, you can submit it using the Client
's submit
method:
client = Client()
submitted = client.submit(job)
To wait until the job has finished executing, use the wait_for_jobs
method:
client.wait_for_jobs([submitted])
Created: August 10, 2022