Remote execution of workflows
To run atomate2 on a remote cluster, repeat the steps of creating a conda environment and
installing the Python libraries on that cluster. Note that this is only possible for clusters you have passwordless
access to, which can be achieved by using SSH keys. The calculation request is initiated on your local computer
and sent to the remote cluster through the jobflow_remote library; the calculation is then actually run on that
cluster based on the input parameters you provided.
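If keys are not set up yet, a typical way to enable passwordless access (using the host and user that appear in the example configuration below) is:
ssh-keygen -t ed25519
ssh-copy-id user@cluster.uni.edu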
Jobflow-remote consists of three parts: the User, the Runner, and the Worker. The
first is where the user creates new Flows and adds them to the DB; it also allows checking the state of the Jobs
and analyzing/fixing failed ones. The second is where the daemon runs, taking care of advancing the state of the
Jobs. The third is where the Jobs are actually executed. All three need a Python environment with at least
jobflow-remote installed and configured, as explained in the following sections. The full documentation of
jobflow-remote can be found here.
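To make the User part concrete, here is a minimal sketch of submitting a Flow from the local machine. It assumes a worker named timewarp_worker (the one configured below); the toy add job is just a placeholder for a real atomate2 workflow.
```python
from jobflow import Flow, job
from jobflow_remote import submit_flow

# Toy job standing in for a real atomate2 workflow
@job
def add(a, b):
    return a + b

flow = Flow([add(1, 2)])

# Adds the Flow to the queue database; the Runner daemon then
# transfers it to the Worker and advances its state.
submit_flow(flow, worker="timewarp_worker")
```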
Installing jobflow_remote
jobflow_remote is a Python add-on to jobflow that allows you to run jobflow Jobs and Flows on remote resources.
It is distributed as a separate package and should be installed into the same environment as atomate2 and jobflow.
To install it, simply activate the environment and run the pip command:
(atomate2-env) user@host:~$ pip install jobflow-remote
The jf command should then become available in the environment.
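You can verify the installation by listing the available subcommands:
jf --help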
Configuring jobflow_remote
Add the following jobflow_remote configuration file as ~/.jfremote/timewarp.yaml on your local computer.
Replace timewarp with the name of your remote cluster. pre_run contains the commands that are executed before
running the calculation; modify it according to your needs.
timewarp.yaml:
```yaml
name: timewarp
log_level: debug
workers:
  timewarp_worker:
    type: remote
    interactive_login: false
    scheduler_type: slurm
    work_dir: /home/user/atomate2
    pre_run: |
      source ~/.bashrc
      source atomate2-env/bin/activate
    timeout_execute: 60
    host: cluster.uni.edu
    user: user
queue:
  store:
    type: MongoStore
    host: localhost
    database: timewarp
    collection_name: queue
exec_config: {}
jobstore:
  docs_store:
    type: MongoStore
    database: timewarp
    host: localhost
    port: 27017
    collection_name: outputs
  additional_stores:
    data:
      type: GridFSStore
      database: timewarp
      host: localhost
      port: 27017
      collection_name: outputs_blobs
```
work_dir is the working directory (i.e., the directory where the calculations are run), and the
pre_run commands activate the Python environment with atomate2 and jobflow-remote installed on the remote cluster.
One can get a full stub of the config file with dummy values by running jf project generate.
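At this point, you can check that the worker and the stores defined in the project file are reachable:
jf project check
Any connection problems (e.g., a failing SSH login or an unreachable MongoDB) should be reported there.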
After that, we configure the commands used to run the calculations. On the remote server, create the
file ~/.config/atomate2/atomate2.yaml with the following content.
atomate2.yaml:
```yaml
AIMS_CMD: srun aims.x > aims.out
```
Then add the following line to ~/.bashrc on the remote computer:
export ATOMATE2_CONFIG_FILE="${HOME}/.config/atomate2/atomate2.yaml"
We then have to start the jobflow-remote runner daemon on the local computer with the following command:
jf runner start
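You can confirm that the daemon started with:
jf runner status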
Configuring the database
For the remote server calculations, we store the data in the MongoDB database timewarp, with the collections queue, outputs, and outputs_blobs. Create these through mongosh:
```mongo
use timewarp;
db.createCollection("queue");
db.createCollection("outputs");
db.createCollection("outputs_blobs");
```
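You can confirm that the collections were created:
```mongo
use timewarp;
db.getCollectionNames();
```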
The collections are similar to the case of running locally, with one addition: the queue collection will be used by
jobflow-remote to store the state of the queue.
After creating the database, we can reset it by issuing the command:
jf admin reset
Note that this command has to be run while the runner is stopped. If the runner is running (you can check with
jf runner status), you have to stop it and restart it after the DB reset:
jf runner stop
jf admin reset
jf runner start
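Once the runner is active again, Flows submitted from the local machine (for example with submit_flow, as sketched above) will be transferred to the cluster and executed there. Their state can be monitored with:
jf job list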