====== Compute nodes ======
We will now deploy compute nodes and add them to the Slurm pool. Compute nodes should be installed using PXE (see the [[system:linux_cluster:managing_cluster|managing cluster]] chapter for how to do that).\\
All commands here must be run on the compute node unless stated otherwise. We also assume a freshly installed node (first reboot after the PXE OS installation) named mycompute1.
===== Repository =====
The first thing to do is to set up the repository client, so that packages can be installed.
The simplest way is to copy the repository files from batman. This is why we updated the repository file on batman after installing the http server: the file is then identical on every node of the cluster, batman included. To copy the files from batman to the freshly installed mycompute1 node, __on batman__ use:
scp /etc/yum.repos.d/os_base.local.repo mycompute1:/etc/yum.repos.d/os_base.local.repo
scp /etc/yum.repos.d/own.local.repo mycompute1:/etc/yum.repos.d/own.local.repo
ssh mycompute1 "rm -f /etc/yum.repos.d/CentOS-*"
ssh mycompute1 "yum clean all; yum update -y"
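When several nodes have to be prepared, the same commands can be generated per node instead of being typed by hand. The following is a minimal sketch, not part of the original procedure: the function name repo_commands and the second node name mycompute2 are hypothetical. It only prints the commands, so they can be reviewed before anything is executed.

```shell
# Print (do not run) the repository setup commands for each node given
# as an argument. repo_commands is a hypothetical helper name.
repo_commands() {
    for node in "$@"; do
        for repo in os_base.local.repo own.local.repo; do
            printf '%s\n' "scp /etc/yum.repos.d/$repo $node:/etc/yum.repos.d/$repo"
        done
        printf '%s\n' "ssh $node \"rm -f /etc/yum.repos.d/CentOS-*\""
        printf '%s\n' "ssh $node \"yum clean all; yum update -y\""
    done
}

# Example: mycompute2 is an assumed second node name.
repo_commands mycompute1 mycompute2
```

Once the printed commands look right, the output can be piped to sh on batman to run them.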
===== Network =====
The node needs a static IP address. Log in as root and edit the network file **/etc/sysconfig/network-scripts/ifcfg-enp0s3** as follows (here for mycompute1):
DEVICE="enp0s3"
NAME="enp0s3"
TYPE="Ethernet"
NM_CONTROLLED=no
ONBOOT="yes"
BOOTPROTO="static"
IPADDR="10.0.3.1"
NETMASK=255.255.0.0
Apply using (or reboot):
systemctl restart network
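Since only the device name and the address change from one node to the next, the file can also be produced from a small template. This is a sketch under assumptions: ifcfg_content is a hypothetical helper, and the values passed to it are the ones used for mycompute1 above.

```shell
# Print an ifcfg file for a static address.
# Arguments: device name, IP address, netmask.
# ifcfg_content is a hypothetical helper name.
ifcfg_content() {
    cat <<EOF
DEVICE="$1"
NAME="$1"
TYPE="Ethernet"
NM_CONTROLLED=no
ONBOOT="yes"
BOOTPROTO="static"
IPADDR="$2"
NETMASK=$3
EOF
}

# Example with the mycompute1 values; the output goes to the terminal.
ifcfg_content enp0s3 10.0.3.1 255.255.0.0
```

After checking the output, it can be redirected to /etc/sysconfig/network-scripts/ifcfg-enp0s3 on the node.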
===== Firewall =====
Disable and stop NetworkManager and firewalld:
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl disable firewalld.service
systemctl stop firewalld.service
===== Dns =====
As with the repositories, you can copy the same file as the one on batman. To do so, on batman, use the following command:
scp /etc/resolv.conf mycompute1:/etc/resolv.conf
Or do it manually: edit /etc/resolv.conf as follows, to tell the host where to find the DNS service:
search sphen.local
nameserver 10.0.0.1
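To confirm that the file really lists the expected server, a small check can be used. This is only a sketch: has_nameserver is a hypothetical helper, and it inspects the file content, not whether the DNS service actually answers.

```shell
# Succeed if the given resolv.conf-style file contains a
# "nameserver <ip>" line for exactly the given IP.
# has_nameserver is a hypothetical helper name.
has_nameserver() {
    awk -v ip="$2" '$1 == "nameserver" && $2 == ip { found = 1 } END { exit !found }' "$1"
}

# Possible use on the node:
#   has_nameserver /etc/resolv.conf 10.0.0.1 && echo "resolv.conf OK"
```

Name resolution itself can then be tried with getent hosts followed by a host name known to the DNS server.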
===== Ntp =====
Add the NTP server IP to the local configuration. First remove the default public pool servers. On CentOS, use:
sed -i.bak '/centos.pool.ntp.org/ d' /etc/ntp.conf
On RHEL, use instead:
sed -i.bak '/rhel.pool.ntp.org/ d' /etc/ntp.conf
Then add ip:
echo "server 10.0.0.1 iburst" >> /etc/ntp.conf
Then start ntpd, enable it at boot, and check that it syncs with the server:
systemctl start ntpd
systemctl enable ntpd
ntpq -p
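The CentOS and RHEL cases differ only in the pool host names, and running the echo twice would add the server line twice. The sketch below folds both cases into one pattern and appends the line only when it is missing. It is an illustration, not the procedure above: use_local_ntp is a hypothetical helper, and matching on pool.ntp.org assumes the default file contains no other pool entries you want to keep.

```shell
# Point an ntp.conf-style file at a single local server.
# Arguments: path to the file, IP of the local NTP server.
# use_local_ntp is a hypothetical helper name.
use_local_ntp() {
    # Drop the default public pool entries (CentOS and RHEL variants alike);
    # the previous content is kept in a .bak file.
    sed -i.bak '/pool\.ntp\.org/ d' "$1"
    # Append the local server only if that exact line is not already there.
    grep -qxF "server $2 iburst" "$1" ||
        echo "server $2 iburst" >> "$1"
}

# Possible use on the node:
#   use_local_ntp /etc/ntp.conf 10.0.0.1
```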
===== Munge and Slurm =====
To install munge and slurm, follow the same steps as described in the main server installation.
However, instead of generating a munge key, we will copy the one from the master to the compute node, so that both have the same file.
yum install munge munge-libs
Then from //batman// :
scp /etc/munge/munge.key mycompute1:/etc/munge/munge.key
Back on the compute node, do the remaining configuration:
chmod 0400 /etc/munge/munge.key
chown munge:munge /etc/munge/munge.key
mkdir /var/run/munge
chown munge:munge /var/run/munge -R
chmod -R 0755 /var/run/munge
systemctl start munge
systemctl enable munge
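A wrong mode on the key file is one thing worth ruling out when munge does not start. The following sketch checks the mode set above; key_mode_ok is a hypothetical helper and it relies on the GNU version of stat shipped with CentOS and RHEL.

```shell
# Succeed if the file has exactly mode 0400 (read-only for its owner).
# key_mode_ok is a hypothetical helper name; "stat -c" is GNU stat syntax.
key_mode_ok() {
    [ "$(stat -c '%a' "$1")" = "400" ]
}

# Possible use on the node:
#   key_mode_ok /etc/munge/munge.key || echo "unexpected mode on munge.key"
#
# Once munge runs on both hosts, a credential can be round-tripped:
#   munge -n | unmunge                  (local check, on the node)
#   munge -n | ssh mycompute1 unmunge   (cross-host check, from batman)
```

The cross-host check only succeeds when both hosts hold the same key.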
The slurm.conf file is handled the same way as the munge key: we will copy it from batman. Also, instead of launching the slurmctld service at the end, launch and enable slurmd.
Install the rpm packages, and create the slurm user and all required directories:
groupadd -g 777 slurm
useradd -m -c "Slurm workload manager" -d /etc/slurm -u 777 -g slurm -s /bin/bash slurm
yum install slurm slurm-munge
mkdir /var/spool/slurmd
chown -R slurm:slurm /var/spool/slurmd
mkdir /etc/slurm/SLURM
chown -R slurm:slurm /etc/slurm/SLURM
chmod 0755 -R /var/spool/slurmd
mkdir /var/log/slurm/
chown -R slurm:slurm /var/log/slurm/
Copy the slurm.conf file from batman to the compute node (run this on batman):
scp /etc/slurm/slurm.conf mycompute1:/etc/slurm/slurm.conf
Then start the slurmd service and enable it at boot:
systemctl start slurmd
systemctl enable slurmd
If slurmd fails to start, run it in the foreground with verbose output (-D -vvvvvv) to see why:
slurmd -D -vvvvvv
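Because slurm.conf is meant to be the same file on batman and on the node, comparing checksums is a quick way to detect a stale copy. This is a sketch only: conf_checksum and same_conf are hypothetical helper names.

```shell
# Print the MD5 checksum of a file (checksum only, no file name).
# conf_checksum and same_conf are hypothetical helper names.
conf_checksum() {
    md5sum < "$1" | cut -d' ' -f1
}

# Succeed if two configuration files have identical content.
same_conf() {
    [ "$(conf_checksum "$1")" = "$(conf_checksum "$2")" ]
}

# From batman, the copy on the node could be compared over ssh:
#   [ "$(conf_checksum /etc/slurm/slurm.conf)" = \
#     "$(ssh mycompute1 'md5sum < /etc/slurm/slurm.conf' | cut -d' ' -f1)" ] \
#     || echo "slurm.conf differs on mycompute1"
```

Once slurmd is running, the state of the node as seen by the controller can be checked from batman with sinfo or scontrol show node mycompute1.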
===== Nfs =====
On each compute node, mount /hpc-softwares and /home.
Install needed packages:
yum -y install nfs-utils
Then start the needed services and enable them at boot:
systemctl start rpcbind
systemctl enable rpcbind
Create the software directory (/home should already exist):
mkdir /hpc-softwares
Then edit /etc/fstab, and add at the end:
10.0.1.1:/hpc-softwares /hpc-softwares nfs ro,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0
10.0.1.1:/home /home nfs rw,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0
And mount the directories:
mount /hpc-softwares
mount /home
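If this step is scripted or repeated, the fstab lines should not be appended twice. The sketch below adds a line only when no active entry already uses the same mount point. add_fstab_entry is a hypothetical helper, shown here as one possible way to make the edit repeatable.

```shell
# Append an fstab line unless an uncommented entry already uses the
# same mount point.
# Arguments: fstab path, mount point, full fstab line.
# add_fstab_entry is a hypothetical helper name.
add_fstab_entry() {
    awk -v mp="$2" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$1" ||
        echo "$3" >> "$1"
}

# Possible use on the node:
#   add_fstab_entry /etc/fstab /home \
#     "10.0.1.1:/home /home nfs rw,rsize=32768,wsize=32768,intr,nfsvers=3,bg 0 0"
```

After mounting, df -h /home shows whether the directory is really served by 10.0.1.1.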
===== Ldap =====
LDAP configuration is simple on the client side. First, install the needed packages:
yum -y install openldap-clients nss-pam-ldapd
Then tell the client where the server is and which base DN to use:
authconfig --enableldap --enableldapauth --ldapserver=10.0.0.1 --ldapbasedn="dc=sphen,dc=local" --enablemkhomedir --update
Now, to enable TLS-encrypted exchanges:
echo "TLS_REQCERT allow" >> /etc/openldap/ldap.conf
echo "tls_reqcert allow" >> /etc/nslcd.conf
authconfig --enableldaptls --update
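The base DN used above follows the usual convention of turning each label of the domain name into a dc= component. The sketch below shows that mapping for sphen.local. domain_to_basedn is a hypothetical helper, and the mapping is only valid if the LDAP server was set up following this convention.

```shell
# Turn a DNS domain such as sphen.local into an LDAP base DN such as
# dc=sphen,dc=local. domain_to_basedn is a hypothetical helper name.
domain_to_basedn() {
    printf '%s\n' "$1" | sed -e 's/^/dc=/' -e 's/\./,dc=/g'
}

# Print (do not run) the matching authconfig command for review.
basedn=$(domain_to_basedn sphen.local)
printf '%s\n' "authconfig --enableldap --enableldapauth --ldapserver=10.0.0.1 --ldapbasedn=\"$basedn\" --enablemkhomedir --update"
```

Once the client is configured, getent passwd followed by the name of a user that exists only in LDAP is a simple way to check that lookups work.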
That is all for a compute node. Tools to automate this installation are covered later.