
Remote execution of workflows

To run atomate2 on a remote cluster, repeat the steps of creating a conda environment and installing the Python libraries on that cluster. Note that this is only possible for clusters you have passwordless access to, which can be achieved by using SSH keys. The calculation request is initiated on your local computer and sent to the remote cluster through the jobflow-remote library; the calculation is then actually run on that cluster with the input parameters you provided.
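If passwordless access is not set up yet, it can typically be enabled by generating a key pair with ssh-keygen and copying the public key to the cluster with ssh-copy-id user@cluster.uni.edu (the user and host here match the worker configuration below; replace them with your own).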

Jobflow-remote consists of three parts: User, Runner, and Worker. The first is where the user creates new Flows and adds them to the database; it also allows checking the state of the Jobs and analyzing or fixing failed ones. The second is where the daemon runs, taking care of advancing the state of the Jobs. The third is where the Jobs are actually executed. All three need a Python environment with at least jobflow-remote installed and configured, as explained in the following sections. For more details, refer to the full jobflow-remote documentation.

Installing jobflow_remote

jobflow_remote is a Python extension to jobflow that allows running jobflow Jobs and Flows on remote resources. As such, it should be installed into the same environment as atomate2 and jobflow. To install it, simply activate the environment and run pip:

(atomate2-env) user@host:~$ pip install jobflow-remote 
After that, the jf command should be available in the environment.
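To check that the installation succeeded, run jf --help, which should print the list of available subcommands.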

Configuring jobflow_remote

Add the following jobflow_remote configuration file as ~/.jfremote/timewarp.yaml on your local computer. Replace timewarp with the name of your remote cluster. The pre_run key lists the commands executed before the calculation is run; modify it according to your needs. timewarp.yaml:

name: timewarp
log_level: debug
workers:
  timewarp_worker:
    type: remote
    interactive_login: false
    scheduler_type: slurm
    work_dir: /home/user/atomate2
    pre_run: |
      source ~/.bashrc
      source atomate2-env/bin/activate
    timeout_execute: 60
    host: cluster.uni.edu
    user: user
queue:
  store:
    type: MongoStore
    host: localhost
    database: timewarp
    collection_name: queue
exec_config: {}
jobstore:
  docs_store:
    type: MongoStore
    database: timewarp
    host: localhost
    port: 27017
    collection_name: outputs
  additional_stores:
    data:
      type: GridFSStore
      database: timewarp
      host: localhost
      port: 27017
      collection_name: outputs_blobs
Here, work_dir is the working directory (i.e., the directory where calculations are run), and the pre_run commands activate the Python environment with atomate2 and jobflow-remote installed on the remote cluster.

One can get the full stub of the config file with dummy values by running jf project generate.
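Depending on the jobflow-remote version, one can then verify the configuration with jf project check, which tests the connection to the worker, the queue store, and the jobstore.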

After that, we configure the commands that atomate2 will use to run calculations on the remote machine. On the remote server, create the file ~/.config/atomate2/atomate2.yaml with the following content.

atomate2.yaml:

AIMS_CMD: srun aims.x > aims.out

Then add the following line to the ~/.bashrc on the remote computer:

export ATOMATE2_CONFIG_FILE="${HOME}/.config/atomate2/atomate2.yaml"

We then have to start the jobflow-remote runner on the local computer with the following command:

jf runner start
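The runner is started as a background daemon; its state can be checked at any time with jf runner status.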

Configuring the database

For the remote server calculations, we store the data in the MongoDB database timewarp with the collections queue, outputs, and outputs_blobs. Create these through mongosh with:

```mongosh
use timewarp;
db.createCollection("queue");
db.createCollection("outputs");
db.createCollection("outputs_blobs");
```

The collections are similar to those used when running locally, with one addition: the queue collection will be used by jobflow-remote to store the state of the queue.

After creating the database, we can reset it by issuing the command:

jf admin reset

Note that this command has to be run with the runner stopped. If the runner is running (you can check with jf runner status), you have to stop it and restart it after the DB reset:

jf runner stop
jf admin reset
jf runner start
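
With the runner started and the database in place, Flows can be submitted from the local machine through the jobflow-remote Python API. The following is a minimal sketch using a toy job; an atomate2 Flow would be submitted the same way. The worker name refers to the timewarp_worker defined above, and the resources keys are placeholder assumptions that depend on your SLURM setup.

```python
from jobflow import Flow, job
from jobflow_remote import submit_flow

# A toy job used only for illustration; any jobflow/atomate2 Flow
# can be submitted in the same way.
@job
def add(a, b):
    return a + b

flow = Flow([add(1, 2)])

# Send the Flow to the queue database. The runner picks it up,
# transfers it to the worker, and submits it to the scheduler.
# "timewarp_worker" is the worker defined in timewarp.yaml; the
# resources dict is scheduler-specific (values here are placeholders).
submit_flow(flow, worker="timewarp_worker", resources={"nodes": 1, "time": "01:00:00"})
```

The state of the submitted Jobs can then be monitored with jf job list.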