Server
The server is a crucial component of HyperQueue which manages workers and jobs. Before running any computations or deploying workers, you must first start the server.
Starting the server
The server can be started by running the following command:
$ hq server start
You can change the hostname under which the server is visible to workers with the --host option:
$ hq server start --host=HOST
Server directory
When the server is started, it creates a server directory where it stores information needed for submitting jobs and connecting workers. This directory is then used to select a running HyperQueue instance.
By default, the server directory is stored in $HOME/.hq-server. This location may be changed with the option --server-dir=<PATH>, which is available for all HyperQueue CLI commands. You can run multiple instances of HyperQueue under the same Unix user by making them use different server directories.
If you use a non-default server directory, make sure to pass the same --server-dir to all HyperQueue commands that should use the selected HyperQueue server:
$ hq --server-dir=foo server start
$ hq --server-dir=foo worker start
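One way to avoid forgetting the option is to wrap hq in a small shell function. The sketch below is only an illustration: the function name hq_foo and the variable PROJECT_HQ_DIR are made up for this example and are not part of HyperQueue.

```shell
# Hypothetical server directory for this project (an assumption of this sketch).
PROJECT_HQ_DIR="${PROJECT_HQ_DIR:-foo}"

# Wrapper that forwards all arguments to hq with the same --server-dir.
hq_foo() {
    hq --server-dir="$PROJECT_HQ_DIR" "$@"
}
```

With this in place, hq_foo server start and hq_foo worker start both target the instance stored in foo.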
Important
When you start the server, it creates a new subdirectory in the server directory, which stores the data of the currently running instance. It also creates a symlink hq-current that points to the currently active subdirectory. Thanks to this approach, you can start a server in the same server directory multiple times without overwriting the data of previous runs.
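To see which instance subdirectory is currently active, you can read the target of the symlink. This is a minimal sketch that assumes the default server directory unless another one is given; the function name current_instance is hypothetical.

```shell
# Print the target of the hq-current symlink in the given server directory
# (defaults to $HOME/.hq-server when no argument is passed).
current_instance() {
    readlink "${1:-$HOME/.hq-server}/hq-current"
}
```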
Server directory access
Encryption keys are stored in the server directory. Whoever has access to the server directory may submit jobs, connect workers to the server and decrypt communication between HyperQueue components. By default, the directory is only accessible by the user who started the server.
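If the permissions of a server directory were loosened at some point, standard Unix tools can tighten them again. The following is a sketch under the assumption that only the owner should have access; the function name restrict_server_dir is hypothetical and not a HyperQueue command.

```shell
# Remove all permissions for group and others on the server directory
# and everything below it, leaving the owner's permissions unchanged.
restrict_server_dir() {
    chmod -R go-rwx "$1"
}
```

For example, restrict_server_dir "$HOME/.hq-server" would apply this to the default location.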
Keeping the server alive
The server is supposed to be a long-lived component. If you shut it down, all workers will disconnect and all computations will be stopped. Therefore, it is important to make sure that the server keeps running, for example even after you disconnect from the cluster where it is deployed.
For example, suppose you SSH into a login node of an HPC cluster and then run the server like this:
$ hq server start
The server will quit when your SSH session ends, because it will receive a SIGHUP signal. You can use established Unix approaches to avoid this behavior, for example prepending the command with nohup or using a terminal multiplexer like tmux.
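The two approaches mentioned above can be sketched as follows. The log file location, the tmux session name hq and the function names are assumptions chosen for this example, not HyperQueue defaults.

```shell
# Hypothetical log file location for the server output.
HQ_LOG="${HQ_LOG:-$HOME/hq-server.log}"

# Variant 1: nohup makes the server ignore SIGHUP, "&" puts it in the
# background, and the redirection keeps its output in a log file.
start_server_nohup() {
    nohup hq server start "$@" > "$HQ_LOG" 2>&1 &
    echo "server started in background with PID $!"
}

# Variant 2: run the server inside a detached tmux session named "hq".
# You can inspect it later with: tmux attach -t hq
start_server_tmux() {
    tmux new-session -d -s hq 'hq server start'
}
```

Either variant lets you log out of the SSH session while the server continues to run.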
Resuming stopped/crashed server
When a server is started with a journal, it can be resumed even after it has crashed. The journal is a file to which the server writes a series of events.
You can start the server as follows:
$ hq server start --journal /path/to/journal
If the server is stopped or crashes, use the same command to start it again, and it will continue from the last point written to the journal:
$ hq server start --journal /path/to/journal
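Because the start and resume commands are identical, they can be wrapped in a simple retry loop. The sketch below assumes that the server exits with a non-zero status when it crashes and with zero when it is stopped cleanly; this behavior is an assumption and should be verified for your version. The variable and function names are hypothetical.

```shell
# Journal path from the example above; the retry delay is in seconds.
JOURNAL_PATH="${JOURNAL_PATH:-/path/to/journal}"
RETRY_DELAY="${RETRY_DELAY:-5}"

# Re-run the same command until it exits successfully, so that the server
# resumes from the journal after an unexpected exit.
run_server_with_journal() {
    until hq server start --journal "$JOURNAL_PATH"; do
        echo "hq server exited with an error, restarting from the journal" >&2
        sleep "$RETRY_DELAY"
    done
}
```

Note that only the server is restarted here; workers still have to be connected again after each restart.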
Warning
This functionality resumes the state of jobs and auto allocation queues, not worker connections. In the current version, new workers have to be connected to the server when a new server is started.
Warning
If the server crashes, the last few seconds of progress may be lost. For example, when a task finishes and the server crashes before the journal is written, then after resuming the server, the task will appear as not computed.
Stopping server
You can stop a running server with the following command:
$ hq server stop
When a server is stopped, all running jobs and connected workers will be immediately stopped.
Created: November 2, 2021