# Remote execution of workflows
To run `atomate2` workflows on a remote cluster, repeat the steps of creating a conda environment and installing the Python libraries on that cluster. Note that this is only possible for clusters you have passwordless access to, which can be achieved by using SSH keys. The calculation request is initiated on your local computer and sent to the remote cluster through the `jobflow-remote` library; the calculation is then actually run on that cluster based on the input parameters you provided.
`jobflow-remote` consists of three parts: the User, the Runner, and the Worker. The first is where the user creates new Flows and adds them to the database; it also allows checking the state of the Jobs and analyzing/fixing failed ones. The second is where the daemon runs, taking care of advancing the state of the Jobs. The third is where the Jobs are actually executed. All of these need a Python environment with at least `jobflow-remote` installed and configured. This is explained in the following sections. The full documentation of `jobflow-remote` can be found here.
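On the User side, Flows are created and submitted from Python with `submit_flow`. The sketch below is a minimal illustration rather than a complete workflow: the `add` Job is a toy example, and the worker name `timewarp_worker` is taken from the configuration shown later in this section.

```python
from jobflow import Flow, job
from jobflow_remote import submit_flow


@job
def add(a, b):
    """A toy Job used only to illustrate submission."""
    return a + b


# Build a Flow and add it to the queue database. The Runner daemon
# will later pick it up and execute it on the configured worker.
flow = Flow([add(1, 2)])
submit_flow(flow, worker="timewarp_worker")
```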
## Installing jobflow_remote
`jobflow-remote` is a Python addition to `jobflow` that allows running `jobflow` Jobs and Flows on remote resources. As such, it should be installed into the same environment as `atomate2` and `jobflow`. To install it, simply activate the environment and run the `pip` command:
```bash
(atomate2-env) user@host:~$ pip install jobflow-remote
```
The `jf` command should then become available in the environment.
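As a quick sanity check, you can confirm that the CLI is on your path by printing its help:

```bash
(atomate2-env) user@host:~$ jf --help
```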
## Configuring jobflow_remote
Add the following `jobflow-remote` configuration file as `~/.jfremote/timewarp.yaml` on your local computer. Replace `timewarp.yaml` with the name of your remote cluster. `pre_run` contains the commands that are executed before the calculation runs; modify this according to your needs.
`timewarp.yaml`:

```yaml
name: timewarp
log_level: debug
workers:
  timewarp_worker:
    type: remote
    interactive_login: false
    scheduler_type: slurm
    work_dir: /home/user/atomate2
    pre_run: |
      source ~/.bashrc
      source atomate2-env/bin/activate
    timeout_execute: 60
    host: cluster.uni.edu
    user: user
queue:
  store:
    type: MongoStore
    host: localhost
    database: timewarp
    collection_name: queue
exec_config: {}
jobstore:
  docs_store:
    type: MongoStore
    database: timewarp
    host: localhost
    port: 27017
    collection_name: outputs
  additional_stores:
    data:
      type: GridFSStore
      database: timewarp
      host: localhost
      port: 27017
      collection_name: outputs_blobs
```
`work_dir` is the working directory (i.e., the directory where calculations are run), and the `pre_run` commands activate the Python environment with `atomate2` and `jobflow-remote` installed on the remote cluster.
One can get a full stub of the config file with dummy values by running `jf project generate`.
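For example (the project name argument and the `jf project check` subcommand are assumptions about your jobflow-remote version; consult `jf project --help` if they differ):

```bash
jf project generate timewarp   # write ~/.jfremote/timewarp.yaml with dummy values
jf project check               # verify access to the worker and the stores
```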
After that, we configure the commands to use on both the local and remote machines. On the remote server, create the file `~/.config/atomate2/atomate2.yaml` with the following content.
`atomate2.yaml`:

```yaml
AIMS_CMD: srun aims.x > aims.out
```
Then add the following line to the `~/.bashrc` on the remote computer:

```bash
export ATOMATE2_CONFIG_FILE="${HOME}/.config/atomate2/atomate2.yaml"
```
We then have to start the jobflow-remote runner on the local computer with the following command:

```bash
jf runner start
```
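The runner is the daemon that polls the queue and advances the state of the Jobs; you can confirm it is up at any time with:

```bash
jf runner status
```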
## Configuring the database
For the remote-server calculations, we store the data in the MongoDB database `timewarp` with the collections `queue`, `outputs`, and `outputs_blobs`. Create these through `mongosh` with:
```mongo
use timewarp;
db.createCollection("queue");
db.createCollection("outputs");
db.createCollection("outputs_blobs");
```
The collections are similar to the case of running locally, with one addition: the `queue` collection will be used by `jobflow-remote` to store the state of the queue.
After creating the database, we can reset it by issuing the command:
```bash
jf admin reset
```
Note that this command has to be run with the runner stopped. If the runner is running (you can check by looking at `jf runner status`), you have to stop it and restart it after the DB reset:
```bash
jf runner stop
jf admin reset
jf runner start
```
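Once the runner is back up, submitted Flows can be monitored from the local machine. Assuming a current jobflow-remote version, `jf job list` prints a table with the state of each Job:

```bash
jf job list
```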