To finalize the cluster, we need to provide users with a way to launch jobs from the login nodes, and with some compilers and libraries (gcc/g++/gfortran, OpenMPI).
Users are now able to log in on the login1 node.
Using the bob user we created before, we can now log in on login1 and launch a very basic job using srun. However, srun will lock the current shell for the whole duration of the job, or, if no resources are currently available, for the waiting time as well.
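For example, a quick interactive test on the computenodes partition used throughout this tutorial:

srun -N 1 -n 1 -p computenodes hostname

The shell hangs until a node is allocated and the command has run, then prints the hostname of the compute node that executed it.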
A better solution is to submit jobs to Slurm using sbatch. To do that, as bob, create a file bobjob.sh like this:
#!/bin/bash
#SBATCH -J bobjob
#SBATCH -o bobjob.out.%j
#SBATCH -e bobjob.err.%j
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --ntasks-per-node=1
#SBATCH -p computenodes
#SBATCH --exclusive
#SBATCH -t 00:10:00

echo "###"
date
echo "###"
hostname
sleep 30s
echo "###"
date
echo "###"
What does it contain? The #SBATCH lines are directives for Slurm: -J sets the job name, -o and -e set the stdout and stderr files (%j is replaced by the job id), -N requests one node, -n requests one task, --ntasks-per-node sets the number of tasks per node, -p selects the computenodes partition, --exclusive reserves the node for this job only, and -t sets a walltime limit of 10 minutes. The remaining lines are the script executed on the allocated node.
Note that -N, -n and --ntasks-per-node are redundant here. Normally, you only have to specify two of them, and Slurm will deduce the last one.
Now, we can submit the job using:
sbatch bobjob.sh
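sbatch acknowledges the submission by printing the job id, with output like:

Submitted batch job 25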
Slurm returned the job id (here it gave me 25), and you can see the status of the job using the squeue command. Here, just after submitting the job with sbatch, squeue gives:
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   25 computeno   bobjob      bob  R       0:06      1 compute1
So here, our job 25 has been running (R) on compute1 for 6 seconds. When the job is completed (or even while it is still running), you can check the log files, written in the same directory as the one from which you submitted the job. For example here, we can check the content of bobjob.out.25:
###
Wed Jul 20 17:34:56 CEST 2016
###
compute1.bull.local
###
Wed Jul 20 17:35:26 CEST 2016
###
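If a job hangs or was submitted by mistake, it can be cancelled with scancel, using the job id returned by sbatch:

scancel 25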
You can find more on Slurm usage in the Slurm documentation.
The next job submission steps (OpenMP and MPI) assume you have installed a GCC compiler and an OpenMPI stack. See the bottom of this page for how to do that.
Now bob wants to submit an OpenMP job. We will use a Hello World example; see the OpenMP page of this website for the sources.
Without using modules, and using a GCC provided in /soft/compilers/gcc/4.8.1 (for example), the batch script would be like this:
#!/bin/bash
#SBATCH -J bobjobopenmp
#SBATCH -o bobjobopenmp.out.%j
#SBATCH -e bobjobopenmp.err.%j
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH -p computenodes
#SBATCH --exclusive
#SBATCH -t 00:10:00

echo "###"
date
echo "###"
export LD_LIBRARY_PATH=/soft/compilers/gcc/4.8.1/lib64:$LD_LIBRARY_PATH
export PATH=/soft/compilers/gcc/4.8.1/bin:$PATH
export OMP_NUM_THREADS=8
/home/bob/myopenprogram.exe
echo "###"
date
echo "###"
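Before submitting, the program has to be compiled with OpenMP support. A minimal sketch, assuming the source from the OpenMP page is saved as hello_openmp.c (a hypothetical file name):

# Use the GCC installed in /soft, then compile with OpenMP enabled
export LD_LIBRARY_PATH=/soft/compilers/gcc/4.8.1/lib64:$LD_LIBRARY_PATH
export PATH=/soft/compilers/gcc/4.8.1/bin:$PATH
# hello_openmp.c is a hypothetical name for the source from the OpenMP page
gcc -fopenmp -o /home/bob/myopenprogram.exe hello_openmp.c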
Now bob wants to submit an MPI job, using 32 processes (so 4 nodes, because remember, in this tutorial our compute nodes have 8 cores each). We will use a Hello World example; see the MPI page of this website for the sources.
Without using modules, and using a GCC provided in /soft/compilers/gcc/4.8.1 and an OpenMPI provided in /soft/tools/openmpi/1.6.4 (for example), the batch script would be like this:
#!/bin/bash
#SBATCH -J bobjobmpi
#SBATCH -o bobjobmpi.out.%j
#SBATCH -e bobjobmpi.err.%j
#SBATCH -n 32
#SBATCH --ntasks-per-node=8
#SBATCH -p computenodes
#SBATCH --exclusive
#SBATCH -t 00:10:00

echo "###"
date
echo "###"
export LD_LIBRARY_PATH=/soft/compilers/gcc/4.8.1/lib64:$LD_LIBRARY_PATH
export PATH=/soft/compilers/gcc/4.8.1/bin:$PATH
export LD_LIBRARY_PATH=/soft/tools/openmpi/1.6.4/lib:$LD_LIBRARY_PATH
export PATH=/soft/tools/openmpi/1.6.4/bin:$PATH
srun /home/bob/myopenprogram.exe
echo "###"
date
echo "###"
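Again, the program must be compiled first, this time with the MPI compiler wrapper. A minimal sketch, assuming the source from the MPI page is saved as hello_mpi.c (a hypothetical file name):

# Make our GCC and OpenMPI visible, then compile with the MPI wrapper
export LD_LIBRARY_PATH=/soft/compilers/gcc/4.8.1/lib64:$LD_LIBRARY_PATH
export PATH=/soft/compilers/gcc/4.8.1/bin:$PATH
export LD_LIBRARY_PATH=/soft/tools/openmpi/1.6.4/lib:$LD_LIBRARY_PATH
export PATH=/soft/tools/openmpi/1.6.4/bin:$PATH
# hello_mpi.c is a hypothetical name for the source from the MPI page
mpicc -o /home/bob/myopenprogram.exe hello_mpi.c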
Note that here, Slurm will automatically calculate that we need 4 nodes for this job, because we said "I want 32 processes, with 8 per node". But we could also have used:
#SBATCH -N 4
#SBATCH -n 32
or
#SBATCH -N 4
#SBATCH --ntasks-per-node=8
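To check what Slurm deduced, you can print some of the environment variables Slurm exports inside the job script (a quick sketch):

# Number of nodes and total tasks allocated by Slurm for this job
echo $SLURM_JOB_NUM_NODES
echo $SLURM_NTASKS

For our job, this should print 4 and 32.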
To allow users to compile their code and to use MPI programs, we will now compile GCC (providing gcc, g++ and gfortran) and OpenMPI. All these tools/libs will be put in the /soft directory, because we need them to be available on all compute nodes and login nodes. However, for flexibility, we don't want them installed inside the system (/usr/lib, /usr/bin, /var/lib, etc.). Why? Because during its life, with many users, a cluster will need multiple versions of some libraries, multiple versions of some compilers, etc. Having one of them installed inside the system would break the ability to switch from one version to another. We will provide a way to manage versions of compilers and libraries (and dependencies) using a tool called modules.
To do all of the next steps, you need to install a few packages on the node where you will do the compilation (do that on the nfs node, so that no gcc or other compiler gets installed somewhere else accessible to users; users will have to use the ones you compiled):
yum install gcc gcc-c++ make m4 bzip2
We will keep the following organization on /soft:

/soft/compilers/gcc/4.9.3
/soft/tools/gmp/6.1.1
/soft/tools/mpfr/3.1.4
/soft/tools/mpc/1.0.3
/soft/tools/openmpi/2.0.0
etc.
See the compilers page of this website on how to compile a basic GCC. Just be sure to specify a prefix inside /soft/compilers/gcc/yourgccversion.
tar xjvf gmp-6.1.1.tar.bz2
cd gmp-6.1.1
./configure --prefix=/soft/tools/gmp/6.1.1 --enable-cxx
make
make install
cd ../

tar xjvf mpfr-3.1.4.tar.bz2
cd mpfr-3.1.4
./configure --prefix=/soft/tools/mpfr/3.1.4 --with-gmp=/soft/tools/gmp/6.1.1/
make
make install
cd ../

tar xzvf mpc-1.0.3.tar.gz
cd mpc-1.0.3
./configure --prefix=/soft/tools/mpc/1.0.3 --with-mpfr=/soft/tools/mpfr/3.1.4 --with-gmp=/soft/tools/gmp/6.1.1/
make
make install
cd ../

tar xjvf gcc-4.9.3.tar.bz2
export LD_LIBRARY_PATH=/soft/tools/gmp/6.1.1/lib/:/soft/tools/mpfr/3.1.4/lib/:/soft/tools/mpc/1.0.3/lib/:$LD_LIBRARY_PATH
mkdir gcc-build
cd gcc-build
../gcc-4.9.3/configure --prefix=/soft/compilers/gcc/4.9.3 --enable-languages=c,c++,objc,obj-c++,fortran --disable-multilib --with-gmp=/soft/tools/gmp/6.1.1/ --with-mpfr=/soft/tools/mpfr/3.1.4/ --with-mpc=/soft/tools/mpc/1.0.3/
make
make install
To use this GCC, you just need to export its lib and bin paths:
export LD_LIBRARY_PATH=/soft/tools/gmp/6.1.1/lib/:/soft/tools/mpfr/3.1.4/lib/:/soft/tools/mpc/1.0.3/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/soft/compilers/gcc/4.9.3/lib64:/soft/compilers/gcc/4.9.3/lib:$LD_LIBRARY_PATH
export PATH=/soft/compilers/gcc/4.9.3/bin/:$PATH
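You can then quickly check that this GCC is the one now picked up:

# Should print /soft/compilers/gcc/4.9.3/bin/gcc and report version 4.9.3
which gcc
gcc --version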
We will use here a very basic OpenMPI. Have a look at the libraries page on how to compile OpenMPI (use the first way provided; the OpenMPI configure will automatically detect what to do, and this will be enough for this tutorial). We also need the Slurm development files so that OpenMPI can be built with Slurm/PMI support:
yum install slurm-devel
./configure FC=gfortran CXX=g++ CC=gcc --with-slurm --with-pmi=/usr --prefix=/soft/tools/openmpi/2.0.0
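Then build and install, and export the paths to use this OpenMPI, following the same pattern as the other packages (a sketch, mirroring the exports used in the batch scripts above):

make
make install
# Make this OpenMPI the one found by the shell
export LD_LIBRARY_PATH=/soft/tools/openmpi/2.0.0/lib:$LD_LIBRARY_PATH
export PATH=/soft/tools/openmpi/2.0.0/bin:$PATH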
Have a look at the page dedicated to libraries on this website. Do not forget to specify the install location using --prefix.