We will now deploy compute nodes and add them to the Slurm pool. Compute nodes should be installed using PXE (see the chapter on managing the cluster for how to do that).
All commands in this section are run on the compute node unless stated otherwise. We also assume that we are working on a freshly installed node (first reboot after the PXE OS installation) named mycompute1.
The first step is to set up the repository client so that packages can be installed.
The simplest way is to copy the files from batman. This is why we updated the repository file on batman after installing the HTTP server: the file is then the same for all nodes of the cluster, including batman. To copy the files from batman to the freshly installed mycompute1 node, run on batman:
scp /etc/yum.repos.d/os_base.local.repo mycompute1:/etc/yum.repos.d/os_base.local.repo
scp /etc/yum.repos.d/own.local.repo mycompute1:/etc/yum.repos.d/own.local.repo
ssh mycompute1 "rm -f /etc/yum.repos.d/CentOS-*"
ssh mycompute1 "yum clean all; yum update -y"
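When several compute nodes have to be prepared, the same four commands can be wrapped in a loop. The sketch below is an assumption-based helper, not part of the original procedure: `push_repos` and `run` are hypothetical names, node names are passed as arguments, and setting `DRY_RUN=1` only prints the commands so they can be reviewed before being executed from batman.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push batman's repository files to several compute nodes.
# Node names are passed as arguments; with DRY_RUN=1 nothing is executed,
# the commands are only printed.

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

push_repos() {
  local node
  for node in "$@"; do
    run scp /etc/yum.repos.d/os_base.local.repo "${node}:/etc/yum.repos.d/os_base.local.repo"
    run scp /etc/yum.repos.d/own.local.repo "${node}:/etc/yum.repos.d/own.local.repo"
    # The glob is quoted so that it is expanded on the remote node, not on batman.
    run ssh "$node" "rm -f /etc/yum.repos.d/CentOS-*"
    run ssh "$node" "yum clean all; yum update -y"
  done
}
```

After sourcing the file on batman, `DRY_RUN=1 push_repos mycompute1 mycompute2` previews the commands and `push_repos mycompute1 mycompute2` runs them.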
The node needs a static IP. Log in as root and edit the network file /etc/sysconfig/network-scripts/ifcfg-enp0s3 as follows (values here are for mycompute1):
DEVICE="enp0s3"
NAME="enp0s3"
TYPE="Ethernet"
NM_CONTROLLED=no
ONBOOT="yes"
BOOTPROTO="static"
IPADDR="10.0.3.1"
NETMASK=255.255.0.0
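Since only `IPADDR` changes from one node to the next, the file can also be generated. This is a minimal sketch that assumes the interface name and netmask from the example above. `write_ifcfg` is a hypothetical helper that takes the output directory as a parameter, so the file can be generated in a scratch directory and reviewed before it is placed in /etc/sysconfig/network-scripts/.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a static-IP ifcfg file for a compute node.
# Arguments: output directory, interface name, IP address, optional netmask
# (defaults to 255.255.0.0 as in the example above).

write_ifcfg() {
  local outdir="$1" iface="$2" ip="$3" netmask="${4:-255.255.0.0}"
  cat > "${outdir}/ifcfg-${iface}" <<EOF
DEVICE="${iface}"
NAME="${iface}"
TYPE="Ethernet"
NM_CONTROLLED=no
ONBOOT="yes"
BOOTPROTO="static"
IPADDR="${ip}"
NETMASK=${netmask}
EOF
}
```

For example, `write_ifcfg /etc/sysconfig/network-scripts enp0s3 10.0.3.1` reproduces the file above for mycompute1.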
Apply the change with the following command (or reboot):
systemctl restart network
Disable NetworkManager and firewalld:
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl disable firewalld.service
systemctl stop firewalld.service
For DNS, as with the repository files, you can copy the same /etc/resolv.conf file that batman uses. To do so, run the following command on batman:
scp /etc/resolv.conf mycompute1:/etc/resolv.conf
Alternatively, edit /etc/resolv.conf manually as follows to tell the host where to find the DNS service:
search sphen.local
nameserver 10.0.0.1
Next, add the NTP server IP to the local configuration. First remove the default pool servers. If using CentOS, use:
sed -i.bak '/centos.pool.ntp.org/ d' /etc/ntp.conf
Or, if using RHEL:
sed -i.bak '/rhel.pool.ntp.org/ d' /etc/ntp.conf
Then add the server IP:
echo "server 10.0.0.1 iburst" >> /etc/ntp.conf
Finally, start and enable ntpd, and check synchronization with the server:
systemctl start ntpd
systemctl enable ntpd
ntpq -p
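The `echo ... >> /etc/ntp.conf` command adds a duplicate `server` line each time it is run. A re-runnable variant could look like the sketch below. `configure_ntp` is a hypothetical helper that takes the config path and the server IP. It removes both the CentOS and the RHEL pool lines, so the same command works on either distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: idempotent version of the ntp.conf edits.
# Arguments: path to ntp.conf, NTP server IP.

configure_ntp() {
  local conf="$1" server="$2"
  # Drop the default public pool servers (CentOS or RHEL); sed keeps a .bak
  # copy of the file as it was before this run.
  sed -i.bak -e '/centos\.pool\.ntp\.org/d' -e '/rhel\.pool\.ntp\.org/d' "$conf"
  # Append the local server only if that exact line is not already present.
  if ! grep -qxF "server ${server} iburst" "$conf"; then
    echo "server ${server} iburst" >> "$conf"
  fi
}
```

On the node this would be `configure_ntp /etc/ntp.conf 10.0.0.1`, followed by `systemctl restart ntpd`.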
To install munge and Slurm, follow the same steps as described in the main server installation.
However, instead of generating a munge key, we will copy the one from the master (batman) to the compute node, so that both have the same key.
yum install munge munge-libs
Then, from batman:
scp /etc/munge/munge.key mycompute1:/etc/munge/munge.key
Back on the compute node, complete the configuration:
chmod 0400 /etc/munge/munge.key
chown munge:munge /etc/munge/munge.key
mkdir -p /var/run/munge
chown -R munge:munge /var/run/munge
chmod -R 0755 /var/run/munge
systemctl start munge
systemctl enable munge
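To catch permission mistakes before starting munge, the key can be checked first. This is a minimal sketch that assumes GNU coreutils `stat`, as shipped on CentOS/RHEL. `check_munge_key` is a hypothetical helper that verifies the 0400 mode and the owner of the key file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the munge key copied from batman.
# Arguments: key path (default /etc/munge/munge.key), expected owner (default munge).
# Uses GNU stat: %a prints the octal mode, %U the owner name.

check_munge_key() {
  local key="${1:-/etc/munge/munge.key}" owner_want="${2:-munge}"
  local mode owner
  if [ ! -f "$key" ]; then
    echo "missing: $key" >&2
    return 1
  fi
  mode=$(stat -c %a "$key")
  owner=$(stat -c %U "$key")
  if [ "$mode" != "400" ]; then
    echo "bad mode $mode on $key (expected 400)" >&2
    return 1
  fi
  if [ "$owner" != "$owner_want" ]; then
    echo "bad owner $owner on $key (expected $owner_want)" >&2
    return 1
  fi
  echo "ok: $key"
}
```

MUNGE's own tools can then confirm that both hosts share the same key. On batman, `munge -n | ssh mycompute1 unmunge` should decode the credential successfully.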
The slurm.conf file is also copied from batman. Also, instead of starting the slurmctld service at the end, start and enable slurmd.
Create the slurm user, install the RPMs, and create all required directories:
groupadd -g 777 slurm
useradd -m -c "Slurm workload manager" -d /etc/slurm -u 777 -g slurm -s /bin/bash slurm
yum install slurm slurm-munge
mkdir /var/spool/slurmd
chown -R slurm:slurm /var/spool/slurmd
mkdir /etc/slurm/SLURM
chown -R slurm:slurm /etc/slurm/SLURM
chmod 0755 -R /var/spool/slurmd
mkdir /var/log/slurm/
chown -R slurm:slurm /var/log/slurm/
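Because these directories may already exist (for example after a partial run), a re-runnable variant based on `mkdir -p` can be handy. This is a sketch, not the original procedure. `setup_slurm_dirs` is a hypothetical helper whose optional first argument is a path prefix (empty on a real node), so it can be tried in a scratch directory. Ownership is only changed when a `slurm` user exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: idempotent creation of the Slurm directories above.
# Optional argument: path prefix (leave empty on a real compute node).

setup_slurm_dirs() {
  local root="${1:-}" d
  for d in /var/spool/slurmd /etc/slurm/SLURM /var/log/slurm; do
    # -p: no error if the directory already exists.
    mkdir -p "${root}${d}"
    # Only change ownership when the slurm user has been created.
    if id slurm >/dev/null 2>&1; then
      chown -R slurm:slurm "${root}${d}"
    fi
  done
  chmod -R 0755 "${root}/var/spool/slurmd"
}
```

On the node, `setup_slurm_dirs` (without an argument) would be run after `groupadd`, `useradd` and `yum install`.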
From batman, copy the slurm.conf file to the compute node:
scp /etc/slurm/slurm.conf mycompute1:/etc/slurm/slurm.conf
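Slurm expects the same slurm.conf on the controller and on every compute node, so it can be worth checking that the copies have not drifted after later edits. This is a minimal sketch meant to run on batman. `check_slurm_conf` is hypothetical, and so is the `FETCH` variable, which only exists so that the way a node's copy is read can be replaced. By default the copy is read with `ssh <node> cat /etc/slurm/slurm.conf`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which nodes have a slurm.conf that differs
# from the reference file. Arguments: reference file, then node names.
# Returns non-zero if at least one node differs.

check_slurm_conf() {
  local ref="$1" node rc=0
  shift
  for node in "$@"; do
    # Stream the node's copy (ssh by default) into cmp; an unreachable node
    # produces empty output and is therefore also reported as DIFFERS.
    if ${FETCH:-ssh} "$node" cat /etc/slurm/slurm.conf | cmp -s - "$ref"; then
      echo "$node: in sync"
    else
      echo "$node: DIFFERS"
      rc=1
    fi
  done
  return "$rc"
}
```

For example: `check_slurm_conf /etc/slurm/slurm.conf mycompute1 mycompute2`.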
Then start the Slurm compute daemon (slurmd) and enable it at boot:
systemctl start slurmd
systemctl enable slurmd
If slurmd fails to start, run it in the foreground (-D) with increased verbosity (each -v adds a level) to see what goes wrong:
slurmd -D -vvvvvv
On each compute node, /hpc-softwares and /home are mounted over NFS. First, install the needed packages:
yum -y install nfs-utils
Then start and enable the required service:
systemctl start rpcbind
systemctl enable rpcbind
Create the software mount point (/home should already exist):
mkdir /hpc-softwares
Then edit /etc/fstab and add the following lines at the end:
10.0.1.1:/hpc-softwares /hpc-softwares nfs ro,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0
10.0.1.1:/home /home nfs rw,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0
Then mount the directories:
mount /hpc-softwares
mount /home
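Editing /etc/fstab by hand becomes error-prone when it is repeated on many nodes. The following is a minimal sketch of an idempotent alternative that assumes the fstab path is passed as a parameter. `add_fstab_entry` is a hypothetical helper that appends an NFS line only if the mount point is not already listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an NFS entry to an fstab file unless its
# mount point is already present. Arguments: fstab path, source, mount point, options.

add_fstab_entry() {
  local fstab="$1" src="$2" mnt="$3" opts="$4"
  # Field 2 of each non-comment fstab line is the mount point; awk exits 0
  # only if a matching line was found.
  if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
    return 0
  fi
  printf '%s %s nfs %s 0 0\n' "$src" "$mnt" "$opts" >> "$fstab"
}
```

On a node, `add_fstab_entry /etc/fstab 10.0.1.1:/home /home rw,rsize=32768,wsize=32768,intr,nfsvers=3,bg` adds the /home line only once, however many times it is run. /hpc-softwares uses the same call with its `ro` options.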
LDAP configuration on the client side is straightforward. First, install the needed packages:
yum -y install openldap-clients nss-pam-ldapd
Then, tell the client where the server is and which base DN to use:
authconfig --enableldap --enableldapauth --ldapserver=10.0.0.1 --ldapbasedn="dc=sphen,dc=local" --enablemkhomedir --update
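Finally, enable TLS for LDAP exchanges. The `allow` value lets the client proceed even if the server certificate cannot be verified, which suits a self-signed certificate: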
echo "TLS_REQCERT allow" >> /etc/openldap/ldap.conf
echo "tls_reqcert allow" >> /etc/nslcd.conf
authconfig --enableldaptls --update
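The two `echo ... >>` commands append a new line on every run. The following is a minimal sketch of a re-runnable alternative. It assumes GNU sed, keys made only of letters, digits and underscores, and values that contain no `|` or `&` (sed would interpret them). `set_conf_key` is a hypothetical helper that replaces an existing key line or appends it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "KEY value" in a whitespace-separated config file
# such as /etc/openldap/ldap.conf or /etc/nslcd.conf.
# Replaces the line if the key is present (case-sensitive), appends it otherwise.

set_conf_key() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^[[:space:]]*${key}[[:space:]]" "$file"; then
    # GNU sed: edit in place, rewrite the whole matching line.
    sed -i -E "s|^[[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}
```

On the node: `set_conf_key /etc/openldap/ldap.conf TLS_REQCERT allow`, `set_conf_key /etc/nslcd.conf tls_reqcert allow`, then `authconfig --enableldaptls --update`.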
That’s all for a compute node. Tools to automate this installation will be covered later.