MPI
We have several implementations of MPI available on the cluster: OpenMPI, MPICH, and MVAPICH2.
Both OpenMPI and MVAPICH2 use SSH as a backend for communication between nodes. When you first connect to a machine over SSH you have to accept its host-key fingerprint, and without SSH keys set up you also have to type your password on every connection.
We have created a script in /opt/exec/bin to make this job a lot easier. Just run 'sh /opt/exec/bin/login.sh': the script automatically generates new SSH keys for you, then connects to every node and accepts its fingerprint. After that you are all set to use MPI. If you are a power user who already has SSH keys, don't worry; the script will not clobber them.
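If you are curious what the script does, or want to do the setup by hand, the steps below are a rough sketch of the equivalent manual setup. The node names node01 through node04 are placeholders for the real node list, and this assumes your home directory is shared across the nodes; the actual script may differ in its details.
# generate an SSH key pair if you do not already have one (empty passphrase)
test -f ~/.ssh/id_rsa || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# authorize the key for password-less logins (works because home directories are shared)
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# record each node's host-key fingerprint so SSH never prompts you to accept it
for node in node01 node02 node03 node04; do
    ssh-keyscan $node >> ~/.ssh/known_hosts
done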
OpenMPI (preferred)
environment module: module load openmpi/gnu/1.7.3
MPICH
environment module: module load mpich2/gnu/1.5
MVAPICH2
environment module: module load mvapich2/gnu/2.0a
Using MPI
With the environment module loaded, compiling with MPI is fairly straightforward: use mpicc to compile your project just as you would use gcc. For example, 'mpicc cpi.c -o cpi' compiles the MPI code in cpi.c into a binary named cpi.
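Putting the two steps together, a typical compile looks something like this (the module name is the OpenMPI one listed above, and cpi.c stands in for your own source file):
# make the MPI compiler wrappers available
module load openmpi/gnu/1.7.3
# mpicc wraps gcc and adds the MPI include paths and libraries for you
mpicc cpi.c -o cpi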
To run an MPI program, you need to use mpirun. Running an MPI program under Torque is a little tricky, so here is an example. Torque runs your job script in a fresh 'login' shell, so environment variables from your interactive session are not carried over. This means you should load the MPI module inside the script itself.
#!/bin/sh -login
#PBS -l mem=1000mb
#PBS -l nodes=4:ppn=16
#PBS -l walltime=1:00:00
#PBS -N mpiexample
#PBS -j oe
## Automatically calculate the total number of processors
NP=$(cat ${PBS_NODEFILE} | wc -l)
#load the MPI module at run time; the login shell does not inherit it from your session
module load openmpi/qlogic/gnu/1.7.3
#cd into the working directory
cd ~/torque/mpi
#copy the file listing the nodes we can use to our working directory
cp ${PBS_NODEFILE} ./
#display the command we are going to run, then run it.
echo mpirun -machinefile ${PBS_NODEFILE} -n ${NP} cpi
mpirun -machinefile ${PBS_NODEFILE} -n ${NP} cpi
There's a lot going on in this script, so let's break it down one step at a time. The #PBS comments at the top are similar to what we've seen before, but there is some extra work needed to integrate MPI with Torque. For more detailed information on Torque scripts for MPI, please refer to the MPI section of the Torque script page.
Because this runs in a login shell, we need to re-load our environment modules; this is also a good way to make sure the environment is set up exactly how we want it. The new shell starts in your home directory, so we cd back into the working directory for our program. To make our lives easier, we also copy the PBS node file into the working directory so we can review it later.
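For reference, the node file is just a plain list with one line per processor core granted to the job, which is why counting its lines with wc -l gives the total process count. With hypothetical node names and a smaller request of nodes=2:ppn=2, it would look like this:
$ cat ${PBS_NODEFILE}
node01
node01
node02
node02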
The last step in the script is to run our program with mpirun. We pass the list of nodes as a machine file so MPI uses that list when picking which nodes to run on, tell it to start as many processes as we calculated earlier, and finally give it the program to run.
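Once the script is saved, it is submitted like any other Torque job. A quick sketch, assuming the script above was saved as mpi.qsub:
# submit the job script to Torque
qsub mpi.qsub
# check on your queued and running jobs
qstat -u $USER
# when the job finishes, its output (stdout and stderr joined by '#PBS -j oe')
# appears in mpiexample.o<jobid> in the directory where you ran qsub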