Server

The server is the central component of HyperQueue: it manages workers and jobs. Before running any computations or deploying workers, you must first start the server.

Starting the server

The server can be started by running the hq server start command:

$ hq server start

You can change the hostname under which the server is visible to workers with the --host option:

$ hq server start --host=HOST

Server directory

When the server is started, it creates a server directory where it stores information needed for submitting jobs and connecting workers. This directory is then used to select a running HyperQueue instance.

By default, the server directory is stored in $HOME/.hq-server. This location can be changed with the --server-dir=<PATH> option, which is available for all HyperQueue CLI commands. You can run multiple instances of HyperQueue under the same Unix user by making them use different server directories.

If you use a non-default server directory, make sure to pass the same --server-dir to all HyperQueue commands that should use the selected HyperQueue server:

$ hq --server-dir=foo server start &
$ hq --server-dir=foo worker start
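When several instances run side by side, a small wrapper can keep the directory consistent across commands. The following is a minimal sketch: the function name hq_in and the directory names in the usage comments are hypothetical, and only the --server-dir option comes from HyperQueue itself.

```shell
# Hypothetical helper: run any hq command against a chosen server directory.
# Usage: hq_in <server-dir> <hq arguments...>
hq_in() {
    server_dir=$1
    shift
    # Forward the remaining arguments unchanged to hq.
    hq --server-dir="$server_dir" "$@"
}

# Example (not executed here; directory names are placeholders):
#   hq_in "$HOME/hq-a" server start &
#   hq_in "$HOME/hq-a" worker start &
```

Because the directory is the first argument of every call, it is harder to start a worker against the wrong instance by accident.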

Tip

To avoid passing the --server-dir parameter to every hq command separately, you can set it through the HQ_SERVER_DIR environment variable and export it, so that it applies to all commands in the same terminal session:

$ export HQ_SERVER_DIR=bar
$ hq server start &
$ hq worker start &

Server directory access

Encryption keys are stored in the server directory. Whoever has access to the server directory may submit jobs, connect workers to the server and decrypt communication between HyperQueue components. By default, the directory is only accessible by the user who started the server.
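If you want to confirm that a server directory has not been opened up to other users, you can inspect its permission bits. This is only a sketch: the function name is hypothetical, and it reads the mode string printed by ls -ld rather than using any HyperQueue command.

```shell
# Hypothetical check: succeed only if group and others have no access.
# Usage: server_dir_is_private [server-dir]   (defaults to $HOME/.hq-server)
server_dir_is_private() {
    dir=${1:-$HOME/.hq-server}
    # The mode string looks like "drwx------"; characters 5-10 are the
    # group and other permission bits.
    mode=$(ls -ld "$dir" | cut -c5-10)
    [ "$mode" = "------" ]
}

# Example (not executed here):
#   server_dir_is_private || echo "warning: server directory is not private" >&2
```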

Running multiple servers

When you start the server, it creates a new subdirectory in the server directory, which stores the data of the currently running instance. It also creates a symlink named hq-current that points to the currently active subdirectory. This way, you can start a server in the same server directory multiple times without overwriting data from previous runs.
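To see which subdirectory is currently active, you can resolve the hq-current symlink. The sketch below assumes the readlink utility is available (it is on common Linux and macOS systems). The function name is hypothetical, and nothing is assumed about how the subdirectories are named.

```shell
# Print the instance subdirectory that the hq-current symlink points to.
# Usage: hq_current_instance [server-dir]   (defaults to $HOME/.hq-server)
hq_current_instance() {
    server_dir=${1:-$HOME/.hq-server}
    link="$server_dir/hq-current"
    if [ -L "$link" ]; then
        readlink "$link"
    else
        echo "no hq-current symlink in $server_dir" >&2
        return 1
    fi
}
```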

Keeping the server alive

The server is meant to be a long-lived component. If you shut it down, all workers will disconnect and all computations will be stopped. Therefore, make sure that the server keeps running, for example even after you disconnect from the cluster where it is deployed.

For example, suppose you SSH into a login node of an HPC cluster and then run the server like this:

$ hq server start

The server will quit when your SSH session ends, because it will receive a SIGHUP signal. You can use established Unix approaches to avoid this behavior, for example prepending the command with nohup or using a terminal multiplexer like tmux.
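Both approaches can be sketched as follows. The function name, the log file name, and the tmux session name are arbitrary choices for illustration, and the tmux variant assumes that tmux is installed on the login node.

```shell
# Start the server so that it survives the end of the login session.
# Usage: start_server_detached [log-file]   (log path is a placeholder)
start_server_detached() {
    log_file=${1:-hq-server.log}
    # nohup makes the process ignore SIGHUP; all output goes to the log file.
    nohup hq server start > "$log_file" 2>&1 &
    echo "server started in background (PID $!), log: $log_file"
}

# Alternative with tmux (detached session, arbitrary session name):
#   tmux new-session -d -s hq-server 'hq server start'
```

With tmux you can later reattach to the session to see the server output, whereas with nohup you read the log file instead.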

Resuming stopped/crashed server

The server supports resilience, which allows it to restore its state after it is stopped or if it crashes. To enable resilience, you can tell the server to log events into a journal file, using the --journal flag:

$ hq server start --journal /path/to/journal

If the server is stopped or crashes and you start it again with the same command (using the same journal file path), it will continue from the last state recorded in the journal:

$ hq server start --journal /path/to/journal

This functionality restores the state of jobs and automatic allocation queues. However, it does not restore worker connections; in the current version, new workers have to be connected to the server after it restarts.
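Since restarting only requires the same command, the restart can be automated with a simple loop. This is a sketch under an assumption that is not stated above: that hq server start exits with a non-zero status when it crashes and with zero when it is stopped cleanly. The function name and the retry delay are hypothetical.

```shell
# Keep restarting the server with the same journal until it exits cleanly.
# Usage: run_server_with_restart <journal-path> [delay-seconds]
run_server_with_restart() {
    journal=$1
    delay=${2:-5}
    # Assumption: a non-zero exit status means the server crashed.
    until hq server start --journal "$journal"; do
        echo "server exited with an error, restarting in $delay seconds" >&2
        sleep "$delay"
    done
}
```

Workers still have to be reconnected after each restart, so a loop like this only covers the server side.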

Warning

If the server crashes, the last few seconds of progress may be lost. For example, when a task is finished and the server crashes before the journal is written, then after resuming the server, the task will be recomputed.

Exporting journal events

If you'd like to programmatically analyze events that are stored in the journal file, you can export them to JSON using the hq journal export command:

$ hq journal export <journal-path>

The events will be read from the provided journal and printed to stdout encoded in JSON, one event per line (this corresponds to line-delimited JSON, i.e. JSON Lines).
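Because every event occupies exactly one line, line-oriented tools can process the export without knowing the event schema. As a minimal sketch, the hypothetical helper below counts events read from standard input and makes no assumption about the fields inside each JSON object.

```shell
# Count the events in a JSON Lines stream read from standard input.
# Each non-empty line is one event, so counting those lines counts events.
count_journal_events() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Example (not executed here; the journal path is a placeholder):
#   hq journal export /path/to/journal | count_journal_events
```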

You can also directly stream events in real-time from the server using the hq journal stream command:

$ hq journal stream

Warning

The JSON format of the journal events and their definition is currently unstable and can change with a new HyperQueue version.

Pruning the journal

The hq journal prune command removes all completed jobs and disconnected workers from the journal file, in order to reduce its size on disk.

Flushing the journal

The hq journal flush command forces the server to flush the journal, so that the latest state is persisted to disk. It is mainly useful for testing, or before running hq journal export while a server is running (although in that case it is usually better to use hq journal stream).

Waiting for server availability

If you need to wait for a server to become available (for example when coordinating server startup in scripts), you can use the hq server wait command:

$ hq server wait

This command will repeatedly attempt to connect to the server (every 5 seconds) until it succeeds or until a timeout is reached. By default, it will wait for up to 5 minutes, but you can specify a custom timeout¹:

# Wait for up to 2 minutes
$ hq server wait --timeout 2m

This is particularly useful in deployment scripts where you start a server and then need to ensure it's ready before proceeding with other operations like connecting workers or submitting jobs.
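Such a startup sequence might look like the sketch below. The function name is hypothetical, and the error handling assumes that hq server wait exits with a non-zero status when the timeout is reached, which is the usual CLI convention but is not stated above.

```shell
# Start the server, wait until it is reachable, then connect a worker.
deploy_hq() {
    hq server start &
    # Block until the server accepts connections, or give up after 2 minutes.
    if ! hq server wait --timeout 2m; then
        echo "server did not become available in time" >&2
        return 1
    fi
    hq worker start &
}
```

In a real deployment the server would typically also be detached from the login session, for example with nohup or tmux.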

Stopping the server

You can stop a running server with the hq server stop command:

$ hq server stop

When a server is stopped, all running jobs and connected workers will be immediately stopped.

¹ You can use various shortcuts for the timeout duration.