This tutorial explains how to install a basic HPC cluster. The first part is dedicated to the core installation, i.e. the bare minimum; additional features are then proposed on top of it.
This tutorial focuses on simplicity, lightness and security. All the software used is very common, and when facing an error, a quick web search will solve the issue most of the time.
A few words on the names used:
What is a “classic” HPC cluster dedicated to? Its purpose is to provide users with compute resources. Users are able to log in on dedicated nodes (called login nodes), upload their code and data, compile their code, and launch jobs (calculations) on the cluster.
What is a “classic” HPC cluster made of? Most of the time, a cluster is composed of:
(Original black and white image from Roger Rössing, Fotothek_df_roe-neg_0006125_016_Schäfer_vor_seiner_Schafherde_auf_einer_Wiese_im_Harz.jpg)
An HPC cluster can be seen as a sheep flock: the sysadmin (shepherd), the management/io nodes (shepherd dog), and the compute/login nodes (sheep). This leads to two types of nodes, as in cloud computing: pets (shepherd dog) and cattle (sheep). While the safety of your pets must be absolute for good production, losing cattle is common and considered normal.
In HPC, most of the time, the management node, the file system (io) nodes, etc. are considered pets. On the other hand, compute nodes and login nodes are considered cattle.
The same philosophy applies to file systems: some must be safe, others can be faster but “losable”, and users have to understand this and take precautions. In this tutorial, /home will be considered safe, and /scratch fast but losable.
The cluster structure will be as follows:
In this configuration, there will be one master node (and optionally a second one for HA). One NFS node will be deployed for /home and /hpc-softwares (and another optional io node for a fast file system, /scratch, using a parallel file system). The Slurm job scheduler will be used, and an LDAP server with a web interface will also be installed for user management. Login nodes and compute nodes will then be deployed on the fly with PXE. Optionally, the cluster will be monitored using Nagios.
All services can be split into dedicated VMs, for easy backup, testing and updates.
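To make this layout concrete, here is a minimal sketch of what the /etc/hosts file of such a cluster could look like. Only batman and robin are node names actually used in this tutorial; the other host names and all IP addresses below are placeholders shown for illustration, to be adapted to your own addressing plan.

    # pets: management and io nodes
    10.0.0.1    batman      # management node (repository, DHCP, DNS, PXE, NTP, LDAP, Slurm)
    10.0.0.2    robin       # optional second management node (HA)
    10.0.0.3    nfs1        # NFS node (/home and /hpc-softwares)
    # cattle: deployed on the fly with PXE
    10.0.1.1    login1      # login node
    10.0.2.1    compute1    # compute node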
Network information (IPs will be the same for both types of cluster):
Netmask will be 255.255.0.0 (/16)
Domain name will be sphen.local
Note: if you plan to test this tutorial in VirtualBox, the 10.0.X.X range may already be taken by the VirtualBox NAT. In that case, use 10.1.X.X for this tutorial.
The administration (and optional HA) network will be 172.16.0.1 for batman (on enp0s8) and 172.16.0.21 for the optional robin (on enp0s8). The netmask will be /24 on this network. We will connect on this network to manage the cluster, and when enforcing security, connections to the management nodes will only be possible through this network.
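As an illustration, on a CentOS 7 minimal install, the administration interface of batman could be configured with a network-scripts file like the one below. This is only a sketch: the actual interface name, and whether you use network-scripts or NetworkManager, depend on your hardware and preferences.

    # /etc/sysconfig/network-scripts/ifcfg-enp0s8 on batman
    DEVICE=enp0s8
    BOOTPROTO=none       # static addressing on the administration network
    ONBOOT=yes
    IPADDR=172.16.0.1
    PREFIX=24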
The optional InfiniBand network will use the same IP pattern as the Ethernet network, but in the 10.100.x.x/16 range.
All nodes will be installed with a minimal CentOS 7.2 installation (this tutorial also works fine with RHEL 7.2, and should work with later versions of RHEL/CentOS 7). A few tips are provided on the Prepare install page. I provide some rpm packages for system administration (munge, slurm, nagios, etc.) and others for HPC software (gcc, fftw, etc.).
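For reference, checking the installed release and installing one of these locally provided rpm packages with yum can be done as follows (the rpm file name is only an example):

    cat /etc/redhat-release        # should report CentOS Linux release 7.2 (or RHEL 7.2)
    yum localinstall slurm-*.rpm   # install a local rpm and let yum resolve its dependencies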
Final notes before we start:
Some parts are optional; feel free to skip them. Also, it is strongly recommended to first install the bare minimum cluster (the management node with repository, DHCP, DNS, PXE, NTP, LDAP and Slurm, then the NFS node, then the login and compute nodes), check that it works, and only after that install the optional parts (Nagios, parallel file system, etc.).
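Once the bare minimum cluster is installed, a quick way to check that the core services are running is to query systemd on each node for the services it hosts. The service names below assume the stock CentOS 7 packages and the Slurm rpm provided with this tutorial; adjust them if yours differ.

    # on the management node (add nfs-server on the nfs node, slurmd on compute nodes)
    for service in dhcpd named ntpd slapd slurmctld; do
        systemctl is-active $service && echo "$service is running"
    done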