8.16. Using PBS and SLURM
Running parallel_stereo can be very computationally expensive, so it is
often launched on high-performance multi-machine systems. This section
shows how to run this program on a Portable Batch System
(PBS) setup, such as NASA Pleiades, and on a Simple Linux Utility
for Resource Management (SLURM) system.
In either case, it is assumed that all compute nodes share storage space and are able to communicate with each other via ssh without a password.
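The passwordless ssh assumption can be verified before a long job is launched. Below is a minimal sketch, not part of ASP: the function name check_ssh_nodes is hypothetical, and it assumes the list of nodes is a plain file with one hostname per line. The ssh option BatchMode=yes makes ssh fail rather than prompt for a password.

```shell
#!/bin/bash
# Hypothetical helper (not an ASP tool): check that every node in a list
# accepts ssh logins without a password.
# Usage: check_ssh_nodes NODES_FILE [SSH_COMMAND]
check_ssh_nodes() {
    local nodesFile="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password
    local sshCmd="${2:-ssh -o BatchMode=yes -o ConnectTimeout=5}"
    local failed=0
    local node
    # Visit each distinct hostname once
    for node in $(sort -u "$nodesFile"); do
        # Run the trivial command "true" on the remote node
        if ! $sshCmd "$node" true < /dev/null; then
            echo "Cannot log in to $node without a password"
            failed=1
        fi
    done
    return $failed
}

# Possible usage inside a PBS job (assumption, adjust as needed):
# check_ssh_nodes "$PBS_NODEFILE" || exit 1
```

The optional second argument replaces the ssh command, which makes it possible to try the function on a machine that has no cluster access.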
On a PBS system, one can have a script as follows:
#!/bin/bash
# Change to current directory
cd $PBS_O_WORKDIR
# Set the path to ASP tools
export PATH=/path/to/ASP/bin:$PATH
# Run parallel_stereo
parallel_stereo --stereo-algorithm asp_mgm \
--processes 4 --subpixel-mode 3 -t rpc \
--nodes-list $PBS_NODEFILE \
left.tif right.tif left.xml right.xml \
run/run
# Run point2dem
point2dem run/run-PC.tif
Note the two special environment variables PBS_O_WORKDIR and PBS_NODEFILE, which refer to the directory from which the job was submitted and to the list of nodes allocated for the job, respectively.
Ensure the option --nodes-list is set, otherwise only the head node will be used.
This script, named, for example, run.sh, can be launched as:
qsub -m n -r n -N jobName -l walltime=12:00:00 \
-W group_list=yourGroup -j oe -S /bin/bash \
-l select=8:ncpus=20:model=ivy -- $(pwd)/run.sh <args>
Additional arguments can be passed to run.sh on this line. They can be accessed from within that script as $1, $2, etc., per bash shell conventions.
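As an illustration of reading such arguments, here is a minimal sketch of how run.sh might interpret them. The meaning given to each argument (an output prefix and a number of processes), the variable names, and the default values are assumptions made for this example only.

```shell
#!/bin/bash
# Hypothetical argument handling for run.sh. The choice of arguments and
# their defaults are illustrative assumptions, not ASP conventions.
set_job_args() {
    # First argument: output prefix (defaults to run/run)
    outPrefix="${1:-run/run}"
    # Second argument: number of processes per node (defaults to 4)
    numProc="${2:-4}"
}

# Read the arguments passed to this script
set_job_args "$@"
echo "Output prefix: $outPrefix, processes per node: $numProc"

# These values could then be used in the call to parallel_stereo, e.g.:
# parallel_stereo --processes "$numProc" <other ASP options> "$outPrefix"
```

With this in place, appending, for example, out/run 8 after run.sh on the qsub line would set the output prefix to out/run and the number of processes to 8, while omitting the arguments keeps the defaults.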
It is strongly suggested to learn what each of the above options does and adjust them for your needs.
With SLURM, a script as follows can work:
#!/bin/bash
#SBATCH --job-name=asp
#SBATCH --output=asp.log
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=36
#SBATCH --time=50:00:00
#SBATCH --partition=queue1
# Change to the directory in which the job was submitted
cd $SLURM_SUBMIT_DIR
# Create a temporary list of nodes in current directory
nodesList=$(mktemp -p $(pwd))
# Set up the nodes list
scontrol show hostname $SLURM_NODELIST | tr ' ' '\n' > $nodesList
# Run parallel_stereo. (Ensure that this program is in the path.)
parallel_stereo --nodes-list $nodesList \
--processes 4 \
--parallel-options '--sshdelay 0.1' \
<other ASP options>
# Delete the temporary list of nodes
/bin/rm -fv $nodesList
As before, the options and values above should be adjusted for your needs.
Ensure the option --nodes-list is set, otherwise only the head node will be used.
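Since a job quietly falls back to the head node when the list of nodes is not what was intended, it may help to inspect that list before starting a long run. The following is a minimal sketch with hypothetical function names, assuming the list is a file with one hostname per line, as produced by the scontrol command above.

```shell
#!/bin/bash
# Hypothetical helpers (not ASP tools) for inspecting a nodes list file.

# Print the number of distinct, non-empty entries in the file
count_nodes() {
    sort -u "$1" | awk 'NF { n++ } END { print n + 0 }'
}

# Print a warning if fewer than two distinct nodes are listed
warn_if_single_node() {
    local n
    n=$(count_nodes "$1")
    if [ "$n" -lt 2 ]; then
        echo "Warning: only $n node(s) listed in $1"
    fi
}

# Possible usage right before calling parallel_stereo (assumption):
# warn_if_single_node "$nodesList"
```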
If your SLURM setup requires a custom ssh port, put in the list of nodes the full ssh command for each node, rather than just the node name. Example:
ssh -p port1 node1
ssh -p port2 node2
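A list in this form can be generated rather than typed by hand. The sketch below uses a hypothetical helper, make_ssh_nodes_list, and assumes that all nodes use the same ssh port. If the ports differ per node, the list has to be assembled another way.

```shell
#!/bin/bash
# Hypothetical helper (not an ASP tool): read one hostname per line from
# standard input and print one "ssh -p PORT host" line per hostname.
# Assumes every node uses the same port.
make_ssh_nodes_list() {
    local port="$1"
    local host
    while IFS= read -r host; do
        # Skip empty lines
        if [ -n "$host" ]; then
            echo "ssh -p $port $host"
        fi
    done
}

# Small demonstration with made-up node names and port
printf 'node1\nnode2\n' | make_ssh_nodes_list 2222
# prints:
# ssh -p 2222 node1
# ssh -p 2222 node2

# Possible usage in the SLURM script above (assumption):
# scontrol show hostname $SLURM_NODELIST | make_ssh_nodes_list 2222 > $nodesList
```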