====== Cluster Deployment Management Tools ======
{{ :system:linux_cluster:ghot-in-the-shell-end.jpg?500 |}}
There are many deployment and management tools. Here is my recommended list:
  * Big HPC cluster: Salt Stack
  * Small HPC cluster: Ansible
  * Very heterogeneous managed stuff: Puppet
My heart goes to Salt and Ansible. Salt is really flexible and offers something important for HPC or cloud work: loops. This is a personal choice.
===== Salt Stack =====
==== Basic commands ====
Ensure all minions are up to date:
salt '*' state.highstate -v
To apply the state to a single minion only, replace '*' with the ID of that minion (its hostname or IP address, depending on how the minion was registered).
To see everything that is done on a specific minion, log in to it with ssh, then run:
salt-call -l debug state.highstate
This is a good way to debug and see what is going on.
==== Tip ====
Use Jinja in the top files, and share one file listing all server IPs between the pillar top file and the state top file:
Pillar top file (/srv/pillar/top.sls):
{% import_yaml 'servers.sls' as vars %}
base:
  '{{vars.servers.dhcp.ip}}':
    - pkgs
    - services
    - dhcp-server
State top file (/srv/salt/top.sls):
{% import_yaml 'servers.sls' as vars %}
base:
  '{{vars.servers.dhcp.ip}}':
    - repository.client
    - dhcp.server
Then edit your servers.sls file in /srv/pillar/servers.sls:
servers:
  repository:
    name: repo0
    ip: 172.16.0.12
  dhcp:
    name: dhcp0
    ip: 172.16.0.12
To finish, create a symbolic link so that the state top file can read this file too:
ln -s /srv/pillar/servers.sls /srv/salt/servers.sls
With this, the whole Salt setup is dynamic! :-D
==== Jinja2 ====
Basic jinja2 syntax :
Get a value/string from pillar (here pillar network, value of subnet) :
{{ salt['pillar.get']('network:subnet') }}
Loop over a pillar list at the last level (a plain list of values):
{% for rangeip in salt['pillar.get']('network:dhcp:range') %}
range {{rangeip}};
{% endfor %}
Loop over a pillar entry that is not at the last level (here a dictionary of nodes, each with its own keys):
{% for host, args in salt['pillar.get']('nodes', {}).items() %}
host {{ host }} {
hardware ethernet {{ args.hwaddr }};
fixed-address {{ args.ip }};
}
{% endfor %}
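To make it clearer what the loop above produces, here is a rough Python equivalent. It is only a sketch: the function name render_dhcp_hosts and the node names, MAC addresses and IPs are made up for the illustration, and it assumes the 'nodes' pillar is a dictionary mapping each host name to its hwaddr and ip, as the Jinja loop implies.

```python
def render_dhcp_hosts(nodes):
    """Build one dhcpd.conf host entry per node, like the Jinja loop does."""
    blocks = []
    for host, args in nodes.items():
        blocks.append(
            "host {name} {{\n"
            "  hardware ethernet {hwaddr};\n"
            "  fixed-address {ip};\n"
            "}}".format(name=host, hwaddr=args["hwaddr"], ip=args["ip"])
        )
    return "\n".join(blocks)


# Hypothetical pillar content, only here to exercise the function.
example_nodes = {
    "node001": {"hwaddr": "08:00:27:aa:bb:01", "ip": "172.16.0.101"},
    "node002": {"hwaddr": "08:00:27:aa:bb:02", "ip": "172.16.0.102"},
}

print(render_dhcp_hosts(example_nodes))
```

Each key of the dictionary becomes one host block, which is exactly why the pillar data has to be a dictionary and not a plain list for this kind of loop.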
Split a string into a list using a specific character as separator (useful for DNS configuration files !):
{% set list1 = salt['pillar.get']('network:subnet').split('.') %}
{% if salt['pillar.get']('network:netmask') == '255.255.255.0' %}
option broadcast-address {{ list1[0] }}.{{ list1[1] }}.{{ list1[2] }}.255;
{% elif salt['pillar.get']('network:netmask') == '255.255.0.0' %}
option broadcast-address {{ list1[0] }}.{{ list1[1] }}.255.255;
{% elif salt['pillar.get']('network:netmask') == '255.0.0.0' %}
option broadcast-address {{ list1[0] }}.255.255.255;
{% else %}
option broadcast-address CANNOT UNDERSTAND NETMASK !!! See dhcp/dhcpd.conf.jinja;
{% endif %}
More : http://jinja.pocoo.org/docs/dev/templates/
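The netmask branches in the template only cover /8, /16 and /24. A handy way to check what the broadcast address should be for any netmask is Python's standard ipaddress module. This is just a checking aid, not something Salt runs; the function name broadcast_address is mine and the subnet values are examples.

```python
import ipaddress


def broadcast_address(subnet, netmask):
    """Return the broadcast address of subnet/netmask as a string."""
    # strict=True raises ValueError if 'subnet' has host bits set,
    # which catches a subnet/netmask pair that does not fit together.
    network = ipaddress.IPv4Network("{}/{}".format(subnet, netmask), strict=True)
    return str(network.broadcast_address)


print(broadcast_address("172.16.0.0", "255.255.0.0"))
print(broadcast_address("172.16.4.0", "255.255.252.0"))
```

The second call uses a netmask that does not fall on an octet boundary, a case the three template branches would send to the "CANNOT UNDERSTAND NETMASK" line.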
===== Ansible =====
==== Install ====
git clone https://github.com/ansible/ansible.git --recursive
tar cvzf ansible.tar.gz ansible
mkdir /home/yourlogin/pip
pip install --ignore-installed --target=/home/yourlogin/pip --install-option="--install-purelib=/home/yourlogin/pip" paramiko PyYAML Jinja2 httplib2 six
tar cvzf pip.tar.gz -C /home/yourlogin pip
Then on master node :
cd /root
tar xvzf ansible.tar.gz
tar xvzf pip.tar.gz
Add the nodes to the Ansible inventory file (/root/ansible_hosts):
[repo]
10.0.0.2
[dhcp]
10.0.0.3
[pxe]
10.0.0.4
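The inventory is a simple INI-style file: a group name in brackets, then one host per line. As a rough illustration of that structure, here is a sketch that reads it with Python's configparser. This is not how Ansible parses inventories (the real parser also handles host variables, ranges and child groups); parse_inventory is a name I made up.

```python
import configparser

INVENTORY = """\
[repo]
10.0.0.2
[dhcp]
10.0.0.3
[pxe]
10.0.0.4
"""


def parse_inventory(text):
    """Map each group name to the list of hosts listed under it."""
    # allow_no_value lets a bare host line count as a key without a value;
    # restricting delimiters to "=" keeps ":" available inside host entries.
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.read_string(text)
    return {group: list(parser[group]) for group in parser.sections()}


for group, hosts in parse_inventory(INVENTORY).items():
    print(group, hosts)
```

The group names are what the playbooks refer to in their hosts: line, so a host listed under [repo] is the one targeted by "hosts: repo".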
Now load the Ansible environment (this must be done in each new shell before using Ansible):
source /root/ansible/hacking/env-setup
export PYTHONPATH=/root/pip/:$PYTHONPATH
Point Ansible at the inventory file:
export ANSIBLE_INVENTORY=/root/ansible_hosts
Then check that everything works. In this example only two of the three nodes are online; 10.0.0.4 is offline, to show what an error looks like:
ansible all -m ping
10.0.0.2 | SUCCESS => {
"changed": false,
"ping": "pong"
}
10.0.0.3 | SUCCESS => {
"changed": false,
"ping": "pong"
}
10.0.0.4 | FAILED! => {
"failed": true,
"msg": "ERROR! SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue"
}
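With more than a handful of nodes, the ping output gets long. Here is a small sketch that boils it down to one status per host. It assumes the header lines look exactly like the ones above ("host | STATUS => {"); the names HOST_LINE and summarize are mine, and the sample output is shortened.

```python
import re

# Matches header lines such as "10.0.0.2 | SUCCESS => {" or "10.0.0.4 | FAILED! => {".
HOST_LINE = re.compile(r"^(?P<host>\S+) \| (?P<status>[A-Z]+)!? => \{$")

SAMPLE = """\
10.0.0.2 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
10.0.0.3 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
10.0.0.4 | FAILED! => {
    "failed": true,
    "msg": "ERROR! SSH encountered an unknown error during the connection."
}
"""


def summarize(output):
    """Return a dict mapping each host to the status printed on its header line."""
    result = {}
    for line in output.splitlines():
        match = HOST_LINE.match(line.strip())
        if match:
            result[match.group("host")] = match.group("status")
    return result


for host, status in summarize(SAMPLE).items():
    print(host, status)
```

To use it on real output, capture the result of "ansible all -m ping" into a string and pass it to summarize.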
==== Playbooks example ====
To execute a playbook on nodes :
ansible-playbook /root/nodes/repo/playbooks/default.pb
If something goes wrong, add -v to get debug output (-vv, -vvv and -vvvv give increasing levels of detail). Note also that Ansible stops running tasks on a host as soon as one of them fails. To carry on regardless, set ignore_errors on the task:
- name: Say hello
  command: echo "hello"
  ignore_errors: true
=== Repository ===
---
- hosts: repo
  remote_user: root
  tasks:
    ###############################################################
    ########### vsftpd server installation
    ###
    - name: Installing vsftpd rpm
      command: chdir=/mnt/Packages/ rpm -ivh vsftpd-3.0.2-9.el7.x86_64.rpm
    - name: Enable vsftpd on start
      command: systemctl enable vsftpd
    - name: Start vsftpd
      command: systemctl start vsftpd
    - name: Installing libxml2-python rpm
      command: chdir=/mnt/Packages/ rpm -ivh libxml2-python-2.9.1-5.el7_0.1.x86_64.rpm
    - name: Installing deltarpm rpm
      command: chdir=/mnt/Packages/ rpm -ivh deltarpm-3.6-3.el7.x86_64.rpm
    - name: Installing python-deltarpm rpm
      command: chdir=/mnt/Packages/ rpm -ivh python-deltarpm-3.6-3.el7.x86_64.rpm
    - name: Installing createrepo rpm
      command: chdir=/mnt/Packages/ rpm -ivh createrepo-0.9.9-23.el7.noarch.rpm
    ###############################################################
    ########### copy rpm and create repository
    ###
    - file: path=/var/ftp/pub/localrepo state=directory mode=0755
    - name: Copy packages from DVD/iso to repository and repository configuration file
      shell: cp -ar /mnt/Packages/*.* /var/ftp/pub/localrepo/
    - copy: src=/root/nodes/repo/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - name: Create repository
      command: createrepo -v /var/ftp/pub/localrepo/
    - name: Restore SELinux contexts
      command: restorecon -R /var/ftp
    ###############################################################
    ########### remove online repositories and update yum
    ###
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update
    ###############################################################
    ########### disable firewall and set selinux to permissive
    ###
    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld and copy selinux configuration file
      command: systemctl disable firewalld
    - copy: src=/root/nodes/repo/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config
    ###############################################################
    ########### check and reboot
    ###
    - name: Make sure vsftpd is running
      service: name=vsftpd state=started
    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }}
                    state=started
                    port=22
                    delay=1
                    timeout=300
      become: false
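The last task of the playbook waits until the SSH port of the rebooted node answers again. To show the idea behind it, here is a sketch of the same logic in plain Python. It is my own approximation, not Ansible's wait_for implementation; the function name wait_for_port, the 5 second connection timeout and the 1 second retry pause are assumptions.

```python
import socket
import time


def wait_for_port(host, port=22, delay=1, timeout=300):
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    time.sleep(delay)  # initial pause before the first attempt
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=5):
                return True
        except OSError:
            time.sleep(1)  # port closed or host unreachable, try again shortly
    return False


# Example call (left as a comment so nothing touches the network on import):
# wait_for_port("10.0.0.2", port=22, delay=1, timeout=300)
```

Note that with delay=1 the first attempt may still reach the node before it has actually gone down, so a longer delay can be safer on slow machines.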
=== dhcp ===
---
- hosts: dhcp
  remote_user: root
  tasks:
    ###############################################################
    ########### add local repository, remove online repositories and update yum
    ###
    - copy: src=/root/nodes/dhcp/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update
    ###############################################################
    ########### install dhcpd server
    ###
    - name: Install dhcpd server
      yum: name=dhcp state=latest
    - copy: src=/root/nodes/dhcp/files/default.dhcpd.conf dest=/etc/dhcp/dhcpd.conf owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /etc/dhcp/dhcpd.conf
    - name: Enable dhcpd on start
      command: systemctl enable dhcpd.service
    - name: Start dhcpd
      command: systemctl start dhcpd.service
    ###############################################################
    ########### disable firewall and set selinux to permissive
    ###
    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld and copy selinux configuration file
      command: systemctl disable firewalld
    - copy: src=/root/nodes/dhcp/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config
    ###############################################################
    ########### check and reboot
    ###
    - name: Make sure dhcpd is running
      service: name=dhcpd state=started
    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }}
                    state=started
                    port=22
                    delay=1
                    timeout=300
      become: false
=== PXE ===
---
- hosts: pxe
  remote_user: root
  tasks:
    ###############################################################
    ########### add local repository, remove online repositories and update yum
    ###
    - copy: src=/root/nodes/pxe/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update
    ###############################################################
    ########### install tftp, xinetd and vsftpd
    ###
    - name: Install tftp
      yum: name=tftp state=latest
    - name: Install tftp-server
      yum: name=tftp-server state=latest
    - name: Install xinetd
      yum: name=xinetd state=latest
    - copy: src=/root/nodes/pxe/files/default.tftp dest=/etc/xinetd.d/tftp owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/xinetd.d/tftp
    - name: Start xinetd
      command: systemctl start xinetd
    - name: Enable xinetd
      command: systemctl enable xinetd
    - name: Install syslinux
      yum: name=syslinux state=latest
    - name: Install wget
      yum: name=wget state=latest
    - name: Install vsftpd
      yum: name=vsftpd state=latest
    ###############################################################
    ########### copy files for pxe boot
    ###
    - name: Copy pxelinux.0
      command: cp -v /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot
    - name: Copy menu.c32
      command: cp -v /usr/share/syslinux/menu.c32 /var/lib/tftpboot
    - name: Copy memdisk
      command: cp -v /usr/share/syslinux/memdisk /var/lib/tftpboot
    - name: Copy mboot.c32
      command: cp -v /usr/share/syslinux/mboot.c32 /var/lib/tftpboot
    - name: Copy chain.c32
      command: cp -v /usr/share/syslinux/chain.c32 /var/lib/tftpboot
    - file: path=/var/lib/tftpboot/pxelinux.cfg state=directory mode=0755
    - file: path=/var/lib/tftpboot/netboot/ state=directory mode=0755
    - file: path=/var/ftp/pub/iso state=directory mode=0755
    - name: Copy vmlinuz
      command: cp /mnt/images/pxeboot/vmlinuz /var/lib/tftpboot/netboot/
    - name: Copy initrd.img
      command: cp /mnt/images/pxeboot/initrd.img /var/lib/tftpboot/netboot/
    - name: Restore SELinux contexts
      command: restorecon -R /var/lib/tftpboot
    - copy: src=/root/nodes/pxe/files/default.ks.cfg dest=/var/ftp/pub/ks.cfg owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /var/ftp/pub/ks.cfg
    - copy: src=/root/nodes/pxe/files/default.pxelinux.cfg.default dest=/var/lib/tftpboot/pxelinux.cfg/default owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /var/lib/tftpboot/pxelinux.cfg/default
    ###############################################################
    ########### copy minimal iso content and start services
    ###
    - name: Copy minimal iso to /var/ftp/pub/iso/
      shell: cp -Rv /mnt/* /var/ftp/pub/iso/
    - name: Restore SELinux contexts
      command: restorecon -R /var/ftp/pub/
    - name: Start vsftpd
      command: systemctl start vsftpd
    - name: Enable vsftpd
      command: systemctl enable vsftpd
    - name: Restart vsftpd
      command: systemctl restart vsftpd
    - name: Restart xinetd
      command: systemctl restart xinetd
    - name: Set rights on /var/lib/tftpboot
      command: chmod 777 /var/lib/tftpboot
    ###############################################################
    ########### disable firewall and set selinux to permissive
    ###
    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld and copy selinux configuration file
      command: systemctl disable firewalld
    - copy: src=/root/nodes/pxe/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config
    ###############################################################
    ########### check and reboot
    ###
    - name: Make sure vsftpd is running
      service: name=vsftpd state=started
    - name: Make sure xinetd is running
      service: name=xinetd state=started
    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }}
                    state=started
                    port=22
                    delay=1
                    timeout=300
      become: false
===== Puppet =====
How to debug YAML files: brute force with http://www.yamllint.com/
==== Errors ====
From : http://makandracards.com/makandra/29365-vague-puppet-error-messages-with-broken-yaml-files
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: (): found character that cannot start any token while scanning for the next token at line 1297 column 3
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: undefined method `empty?' for nil:NilClass at /etc/puppet/environments/production/manifests/nodes.pp:1 on node example.makandra.de
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
Check that every value starting with %{ is wrapped in double quotes. An unquoted YAML value cannot start with %, and that is what triggers the errors above:
Bad:
foo: %{::fqdn}
Good:
foo: "%{::fqdn}"
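Since the error message only gives a line number in a possibly huge file, a quick scan for the pattern can save time. Here is a sketch that lists the lines where a value starts with an unquoted %{. It is a line-by-line heuristic, not a YAML parser, and the function name find_unquoted_interpolations and the sample keys are made up for the example.

```python
def find_unquoted_interpolations(text):
    """Return the 1-based numbers of lines whose value starts with an unquoted %{."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comment lines
        if stripped.startswith("- "):
            stripped = stripped[2:].lstrip()  # drop the list item marker
        # Split "key: value" on the first ": " so keys such as a::b::c stay whole.
        _key, separator, value = stripped.partition(": ")
        candidate = value.lstrip() if separator else stripped
        if candidate.startswith("%{"):
            hits.append(number)
    return hits


SAMPLE = """\
good: "%{::fqdn}"
bad: %{::fqdn}
profile::ntp::server: %{::domain}
# note: %{ignored}
list:
  - %{::hostname}
  - "%{::hostname}"
"""

print(find_unquoted_interpolations(SAMPLE))
```

Lines where %{ appears in the middle of a value are left alone on purpose, because only a value that begins with % breaks the YAML parser.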