Using Rmpi on ARC
-----------------

Example Rmpi script for SLURM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following is a brief example of how to run an Rmpi script on SLURM. This example uses the latest R module, ``R/4.4.0-gfbf-2023a``.

Create the following script and save it as ``Rmpi-test.R``::

    #
    # Rmpi test
    #
    library("Rmpi")

    # Spawn (size-1) slaves, one being reserved for master.
    #
    size <- mpi.universe.size()
    mpi.spawn.Rslaves(nslaves = size - 1)

    # Identify each slave process and display a message
    #
    mpi.bcast.cmd(rank <- mpi.comm.rank())
    mpi.bcast.cmd(size <- mpi.comm.size())
    mpi.bcast.cmd(hostname <- mpi.get.processor.name())
    mpi.remote.exec(paste("This is rank", rank, "of", size, "running on node: ", hostname))

    # Close all slaves and finish
    #
    mpi.close.Rslaves(dellog = FALSE)
    mpi.quit()

Now create a submission script called ``submit.sh`` containing the following lines::

    #!/bin/bash
    #SBATCH --job-name=mpi-test
    #SBATCH --time=00:10:00
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=8
    #SBATCH --output=rmpi.out
    #SBATCH --partition=devel

    module load R/4.4.0-gfbf-2023a

    # Start a single R master process; it spawns the slaves itself
    mpirun -np 1 R --vanilla -f Rmpi-test.R

    # Remove the slave log files left behind (dellog = FALSE keeps them until here)
    rm *.log

The script above requests a total of 16 processes, 8 on each of 2 nodes. The job is sent to the ``devel`` partition (hence the maximum run time of 10 minutes) and the job output is written to a file named ``rmpi.out``.

To submit this job, use the command::

    sbatch submit.sh

After a few minutes the job should complete and a file named ``rmpi.out`` should be created in the submission directory. It should contain the following output::

    $ cat rmpi.out

    R version
    Copyright (C) 2021 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

      Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    > #
    > # Rmpi test
    > #
    > library("Rmpi")
    >
    > # Spawn (size-1) slaves, one being reserved for master.
    > #
    > size <- mpi.universe.size()
    > mpi.spawn.Rslaves(nslaves = size - 1)
            15 slaves are spawned successfully. 0 failed.
    master  (rank 0 , comm 1) of size 16 is running on: arc-c266
    slave1  (rank 1 , comm 1) of size 16 is running on: arc-c266
    slave2  (rank 2 , comm 1) of size 16 is running on: arc-c266
    slave3  (rank 3 , comm 1) of size 16 is running on: arc-c266
    ... ... ...
    slave14 (rank 14, comm 1) of size 16 is running on: arc-c267
    slave15 (rank 15, comm 1) of size 16 is running on: arc-c267
    >
    > # Identify each slave process and display a message
    > #
    > mpi.bcast.cmd(rank <- mpi.comm.rank())
    > mpi.bcast.cmd(size <- mpi.comm.size())
    > mpi.bcast.cmd(hostname <- mpi.get.processor.name())
    > mpi.remote.exec(paste("This is rank", rank, "of", size, "running on node: ", hostname))
    $slave1
    [1] "This is rank 1 of 16 running on node: arc-c266"

    $slave2
    [1] "This is rank 2 of 16 running on node: arc-c266"

    $slave3
    [1] "This is rank 3 of 16 running on node: arc-c266"

    $slave4
    [1] "This is rank 4 of 16 running on node: arc-c266"

    $slave5
    [1] "This is rank 5 of 16 running on node: arc-c266"

    $slave6
    [1] "This is rank 6 of 16 running on node: arc-c266"

    $slave7
    [1] "This is rank 7 of 16 running on node: arc-c266"

    $slave8
    [1] "This is rank 8 of 16 running on node: arc-c267"

    $slave9
    [1] "This is rank 9 of 16 running on node: arc-c267"

    $slave10
    [1] "This is rank 10 of 16 running on node: arc-c267"

    $slave11
    [1] "This is rank 11 of 16 running on node: arc-c267"

    $slave12
    [1] "This is rank 12 of 16 running on node: arc-c267"

    $slave13
    [1] "This is rank 13 of 16 running on node: arc-c267"

    $slave14
    [1] "This is rank 14 of 16 running on node: arc-c267"

    $slave15
    [1] "This is rank 15 of 16 running on node: arc-c267"

    >
    > # Close all slaves and finish
    > #
    > mpi.close.Rslaves(dellog = FALSE)
    [1] 1
    > mpi.quit()

The output above shows that the slave processes ran correctly, with a total of 16 processes (one master and 15 slaves) across two ARC compute nodes.
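
For larger jobs it is impractical to read through every reply by eye. The following is a minimal sketch of how the check could be automated in the shell. The function ``check_rmpi_output`` is a hypothetical helper written for this example; it is not part of Rmpi, SLURM or the ARC software stack. It assumes the replies have the form shown above ("This is rank N of M running on node: ...") and that exactly one process is the master.

```shell
#!/bin/bash
# Hypothetical helper (not part of Rmpi or SLURM): checks that every
# expected slave process reported back in the job output.
#   $1 = job output file, $2 = number of nodes, $3 = tasks per node
check_rmpi_output() {
    local outfile="$1" nodes="$2" tasks_per_node="$3"

    if [ ! -r "$outfile" ]; then
        echo "Cannot read $outfile" >&2
        return 2
    fi

    # One process is the master, so there is one fewer slave than processes.
    local expected=$(( nodes * tasks_per_node - 1 ))

    # Count reply lines such as: [1] "This is rank 3 of 16 running on node: ..."
    # The echoed R command is not counted: no digit follows "rank" there.
    local found
    found=$(grep -c 'This is rank [0-9][0-9]* of [0-9][0-9]* running on node' "$outfile")

    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found of $expected slaves reported"
    else
        echo "MISMATCH: $found of $expected slaves reported"
        return 1
    fi
}
```

For the job in this example, the function would be called as ``check_rmpi_output rmpi.out 2 8``, where the last two arguments mirror the ``--nodes`` and ``--ntasks-per-node`` values in ``submit.sh``. A non-zero return status indicates that the file could not be read or that fewer (or more) slaves replied than expected.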