# Output Streaming
Jobs containing many tasks will generate a large number of `stdout` and `stderr` files, which can be problematic, especially on network-based shared filesystems such as Lustre. For example, when you submit the following task array:

```bash
$ hq submit --array=1-10000 my-computation.sh
```

20000 files (10000 for `stdout` and 10000 for `stderr`) will be created on the disk.
To avoid this situation, HyperQueue can optionally stream the `stdout` and `stderr` output of tasks into a compact format that does not create a file per task.
**Note:** In this section, we refer to `stdout` and `stderr` as *channels*.
## Redirecting output to the stream
You can redirect the output of `stdout` and `stderr` to a log directory, and thus enable output streaming, by passing the path where the log should be stored via the `--stream` option:

```bash
$ hq submit --stream=<output-log-path> --array=1-10_000 ...
```
The output log path has to be a directory, and it is the user's responsibility to ensure that the directory exists and is visible to every worker.
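For instance, a minimal workflow might look like this (the directory name `my-log` is only an illustration, reusing the task array from above):

```bash
# Create the stream directory up front; it must be reachable by all workers
$ mkdir -p my-log
# Stream the output of all tasks into it
$ hq submit --stream=my-log --array=1-10000 my-computation.sh
```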
This command would cause the `stdout` and `stderr` of all 10,000 tasks to be streamed into the server, which will write them to files in `<output-log-path>`. The streamed data is written in a compact way, independently of the number of tasks. The format also contains additional metadata, which allows the resulting file to be filtered/sorted by task or channel.
**Tip:** You can use selected placeholders inside the stream path.
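As a sketch, assuming the `%{JOB_ID}` placeholder is among those supported for the stream path, you could give each job its own stream directory:

```bash
# Hypothetical usage: one stream directory per job via the %{JOB_ID} placeholder
$ hq submit --stream=logs/%{JOB_ID} --array=1-10000 my-computation.sh
```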
## Partial redirection
By default, both `stdout` and `stderr` will be streamed if you specify `--stream` and do not specify an explicit path for `stdout` and `stderr`. To stream only one of the channels, you can use the `--stdout`/`--stderr` options to redirect one of them to a file or to disable it completely.
For example:

```bash
# Redirect stdout into a file, stream stderr into `my-log`
$ hq submit --stream=my-log --stdout="stdout-%{TASK_ID}" ...

# Stream stdout into `my-log`, disable stderr
$ hq submit --stream=my-log --stderr=none ...
```
## Guarantees
HyperQueue provides the following guarantees regarding output streaming:

When a task is `Finished` or `Failed`, it is guaranteed that all data produced by the task has been flushed into the streaming file, with the following two exceptions:

- If the streaming itself fails (e.g. because there was insufficient disk space for the stream file), then the task will fail with an error prefixed with `"Streamer:"` and no streaming guarantees will be upheld.
- When a task is `Canceled`, or fails because its time limit was reached, the part of its stream that was still buffered in the worker is dropped, to avoid spending additional resources on this task.
## Inspecting the stream files
HyperQueue lets you inspect the data stored inside the stream file using various subcommands. All these commands have the following structure:

```bash
$ hq output-log <output-log-path> <subcommand> <subcommand-args>
```
### Stream summary

You can display a summary of a log file using the `summary` subcommand:

```bash
$ hq output-log <output-log-path> summary
```
### Stream jobs

To print the IDs of all jobs that have streamed into the given stream path, you can run the following command:

```bash
$ hq output-log <output-log-path> jobs
```
### Printing stream content

If you want to simply print the (textual) content of the log file, without any associated metadata, you can use the `cat` subcommand:

```bash
$ hq output-log <output-log-path> cat <job-id> <stdout/stderr>
```

It will print the raw content of either `stdout` or `stderr`, ordered by task ID. All outputs will be concatenated one after another. You can use this to process the streamed data, e.g. by a postprocessing script.

By default, this command will fail if there is an unfinished stream (i.e. when some task is still running and streaming data into the log). If you want to use `cat` even when the log is not finished yet, use the `--allow-unfinished` option.

If you want to see the output of a specific task, you can use the `--task=<task-id>` option.
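For example, to extract the `stdout` of a single task from an illustrative stream directory `my-log` (the job ID 1 and task ID 3 are placeholder values):

```bash
# Print stdout of task 3 of job 1, even if the stream is still being written
$ hq output-log my-log cat 1 stdout --task=3 --allow-unfinished > task-3.txt
```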
### Stream metadata

If you want to inspect the content of the log together with its inner metadata, which shows which task and which channel produced each part of the data, you can use the `show` subcommand:

```bash
$ hq output-log <log-file-path> show
```

The output will have the form `J.T:C> DATA`, where `J` is a job ID, `T` is a task ID, and `C` is `0` for the `stdout` channel and `1` for the `stderr` channel.

You can filter a specific channel with the `--channel=stdout/stderr` flag.
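For instance, to inspect only the error output together with its metadata (again using the illustrative directory `my-log`):

```bash
# Show only the stderr records, prefixed with their job/task metadata
$ hq output-log my-log show --channel=stderr
```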
### Exporting log

The log can be exported into JSON with the following command:

```bash
$ hq output-log <log-file-path> export
```

This prints the log file in JSON format to the standard output.
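Since the JSON is written to standard output, it can be piped into other tools; for example (`jq` is an external tool, not part of HyperQueue):

```bash
# Save the exported JSON for later processing
$ hq output-log my-log export > stream.json
# Or pretty-print it directly
$ hq output-log my-log export | jq .
```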
## Superseded streams

When a worker crashes while executing a task, the task will be restarted. HyperQueue gives each run of a task a different `INSTANCE_ID`, which is part of the stream metadata; hence HyperQueue streaming is able to avoid mixing outputs from different executions of the same task when a task is restarted.

HyperQueue automatically marks all output from previous instances of a task (i.e. every instance except the last one) as superseded. You can see statistics about superseded data via the `hq output-log <output-log-path> summary` command. In the current version, superseded data is ignored by all other commands.
## More server instances

HyperQueue supports writing streams from different server instances into the same directory. If you run `hq output-log` commands over such a directory, HyperQueue will detect this situation and print all server UIDs that have written into the directory. You then have to select a specific server instance via `hq output-log --server-uid=<SERVER_UID> ...` when working with such an output log directory.
**Note:** When a server is restored from a journal file, it will maintain the same server UID. When a server is started "from scratch", a new server UID is generated.
## Working with non-shared file system

You do not need a shared file system to use streaming. It is simply your responsibility to collect all generated files into one directory before using `hq output-log` commands.
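As a sketch, assuming two workers wrote their stream files into a local directory `/tmp/my-log` on hosts `node1` and `node2` (all names here are illustrative):

```bash
# Collect the stream files from all workers into one local directory
$ mkdir -p my-log
$ scp 'node1:/tmp/my-log/*' 'node2:/tmp/my-log/*' my-log/
# Now the usual inspection commands work on the merged directory
$ hq output-log my-log summary
```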