Cluster Deployment Management Tools

There are many deployment and management tools. Here is my recommended list :

  • Big HPC cluster : Salt Stack
  • Small HPC cluster : Ansible
  • Very heterogeneous managed stuff : Puppet

My heart goes to Salt and Ansible. Salt is really flexible and offers something important for HPC or Cloud: loops. This is a personal choice.

Salt Stack

Basic commands

Ensure all minions are up to date:

salt '*' state.highstate -v

To update the state of a single minion, replace '*' with the ID of that minion (its hostname or IP, depending on how the minion is identified).

To see everything a highstate run does on a specific minion, log in to it with ssh, then run:

 salt-call -l debug state.highstate 

This is a good way to debug and see what is going on.

Tip

Use Jinja in the top files, and share one file containing all server IPs between the pillar top file and the state top file:

Pillar top file (/srv/pillar/top.sls):

{% import_yaml 'servers.sls' as vars %}
base:
  '{{vars.servers.dhcp.ip}}':
    - pkgs
    - services
    - dhcp-server

State top file (/srv/salt/top.sls):

{% import_yaml 'servers.sls' as vars %}
base:
  '{{vars.servers.dhcp.ip}}':
    - repository.client
    - dhcp.server

Then edit your servers.sls file in /srv/pillar/servers.sls:

servers:
    repository:
        name: repo0
        ip: 172.16.0.12
    dhcp:
        name: dhcp0
        ip: 172.16.0.12

To finish, add a symbolic link so that the state top file can also read this file:

ln -s /srv/pillar/servers.sls /srv/salt/servers.sls

Using this, all Salt is dynamic !! :-D
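The link step can be wrapped in a small helper so that it is safe to run again after a reinstall. This is only a sketch: link_servers_file is a hypothetical function name, and the -n option assumes GNU ln as found on a typical Linux master.

```shell
# Hypothetical helper: share servers.sls between the pillar tree and the state tree.
# Usage: link_servers_file PILLAR_DIR STATE_DIR
link_servers_file() {
    pillar_dir=$1
    state_dir=$2

    # Refuse to create a dangling link if the shared file is missing.
    if [ ! -f "$pillar_dir/servers.sls" ]; then
        echo "missing $pillar_dir/servers.sls" >&2
        return 1
    fi

    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link as a plain file instead of following it.
    ln -sfn "$pillar_dir/servers.sls" "$state_dir/servers.sls"
}
```

With the directories used on this page, the call would be: link_servers_file /srv/pillar /srv/salt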

Jinja2

Basic jinja2 syntax :

Get a value/string from pillar (here pillar network, value of subnet) :

 {{ salt['pillar.get']('network:subnet') }} 

Loop over a pillar list of plain values (a last-level list):

 {% for rangeip in salt['pillar.get']('network:dhcp:range') %}
 range {{rangeip}};
 {% endfor %}

Loop over a pillar dictionary whose entries have their own sub-keys (not a last-level list):

 {% for host, args in salt['pillar.get']('nodes', {}).items() %}
 host {{ host }} {
 hardware ethernet {{ args.hwaddr }};
 fixed-address {{ args.ip }};
 }

 {% endfor %}

Split a string into a list using a specific character as separator (useful for DNS configuration files !):

 {% set list1 = salt['pillar.get']('network:subnet').split('.') %}
 {% if salt['pillar.get']('network:netmask') == '255.255.255.0' %}
 option broadcast-address {{ list1[0] }}.{{ list1[1] }}.{{ list1[2] }}.255;
 {% elif salt['pillar.get']('network:netmask') == '255.255.0.0' %}
 option broadcast-address {{ list1[0] }}.{{ list1[1] }}.255.255;
 {% elif salt['pillar.get']('network:netmask') == '255.0.0.0' %}
 option broadcast-address {{ list1[0] }}.255.255.255;
 {% else %}
 option broadcast-address CANNOT UNDERSTAND NETMASK !!! See dhcp/dhcpd.conf.jinja;
 {% endif %}
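The template above only knows three netmasks. To cross-check the value it renders, or to work out the address for another netmask, the broadcast address is the subnet ORed with the inverted netmask, octet by octet. A minimal sketch in POSIX shell follows; broadcast_address is a hypothetical helper name, not part of Salt.

```shell
# Hypothetical helper: print the broadcast address for SUBNET and NETMASK.
# Usage: broadcast_address SUBNET NETMASK
broadcast_address() {
    # Split each dotted quad into its four octets.
    IFS=. read -r s1 s2 s3 s4 <<EOF
$1
EOF
    IFS=. read -r m1 m2 m3 m4 <<EOF
$2
EOF
    # For one octet, the inverted mask is (255 - mask); OR it into the subnet octet.
    printf '%d.%d.%d.%d\n' \
        "$(( s1 | (255 - m1) ))" "$(( s2 | (255 - m2) ))" \
        "$(( s3 | (255 - m3) ))" "$(( s4 | (255 - m4) ))"
}

broadcast_address 172.16.0.0 255.255.0.0    # prints 172.16.255.255
```

The same arithmetic could be moved into the pillar data so that the template does not need the if/elif chain at all.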

More : http://jinja.pocoo.org/docs/dev/templates/

Ansible

Install

git clone https://github.com/ansible/ansible.git --recursive
tar cvzf ansible.tar.gz ansible
mkdir /home/yourlogin/pip
pip install --ignore-installed --target=/home/yourlogin/pip --install-option="--install-purelib=/home/yourlogin/pip" paramiko PyYAML Jinja2 httplib2 six
tar cvzf pip.tar.gz -C /home/yourlogin pip

Then copy both archives to /root on the master node and extract them:

cd /root
tar xvzf ansible.tar.gz
tar xvzf pip.tar.gz

Add the nodes to the Ansible inventory file (/root/ansible_hosts):

[repo]
10.0.0.2

[dhcp]
10.0.0.3

[pxe]
10.0.0.4

Now load the Ansible environment (this must be done each time you use Ansible):

source /root/ansible/hacking/env-setup
export PYTHONPATH=/root/pip/:$PYTHONPATH

Set the inventory file:

export ANSIBLE_INVENTORY=/root/ansible_hosts
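Since these three lines have to be repeated in every new shell, they can be grouped in one function kept in a shell profile. This is a sketch under assumptions: load_ansible_env is a hypothetical name, and the defaults are simply the paths used on this page.

```shell
# Hypothetical helper: load the Ansible environment described above.
# Usage: load_ansible_env [ANSIBLE_DIR] [PIP_DIR] [INVENTORY_FILE]
load_ansible_env() {
    ansible_root=${1:-/root/ansible}
    pip_root=${2:-/root/pip}
    inventory=${3:-/root/ansible_hosts}

    # Stop early if the git checkout is not where we expect it.
    if [ ! -f "$ansible_root/hacking/env-setup" ]; then
        echo "env-setup not found under $ansible_root" >&2
        return 1
    fi

    # Source the script in the current shell so its exports stay visible.
    . "$ansible_root/hacking/env-setup"

    # Prepend the local pip directory; the colon is only added when
    # PYTHONPATH already has content, which avoids an empty entry.
    export PYTHONPATH="$pip_root${PYTHONPATH:+:$PYTHONPATH}"
    export ANSIBLE_INVENTORY="$inventory"
}
```

Calling load_ansible_env with no argument reproduces the commands above.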

Check that everything is OK (in this example only 2 of the 3 nodes are online; 10.0.0.4 is offline, to show what the error looks like):

ansible all -m ping 
10.0.0.2 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
10.0.0.3 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
10.0.0.4 | FAILED! => {
    "failed": true, 
    "msg": "ERROR! SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue"
}
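On a larger inventory it helps to keep only the hosts that did not answer. The filter below is a sketch that assumes the multi-line output format shown above (a "HOST | STATUS => {" line per host); failed_hosts is a hypothetical name, and the output layout may differ between Ansible versions.

```shell
# Hypothetical helper: read `ansible all -m ping` output on stdin and print
# the hosts whose status is anything other than SUCCESS.
failed_hosts() {
    # Status lines look like: HOST | STATUS => {
    awk '$2 == "|" && $4 == "=>" && $3 != "SUCCESS" { print $1 }'
}
```

Usage: ansible all -m ping | failed_hosts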

Playbooks example

To execute a playbook on nodes :

ansible-playbook /root/nodes/repo/playbooks/default.pb

If something goes wrong, you can get debug output with -v (-vv, -vvv, -vvvv for increasing verbosity). Note also that Ansible stops running tasks on a node as soon as one of them fails. You can ignore errors for a given task using ignore_errors:

    - name: Say hello
      command: echo "hello"
      ignore_errors: true
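Because a run stops at the first error, it can save time to let Ansible parse the playbook before touching any node. The wrapper below is a sketch: run_playbook is a hypothetical name, while --syntax-check is a standard ansible-playbook option that parses the playbook without executing tasks.

```shell
# Hypothetical wrapper: only run a playbook if it parses cleanly.
# Usage: run_playbook PLAYBOOK [extra ansible-playbook options]
run_playbook() {
    playbook=$1
    shift

    # Parse the playbook without executing any task.
    ansible-playbook --syntax-check "$playbook" || return 1

    # "$@" forwards options such as -v or -vvvv to the real run.
    ansible-playbook "$@" "$playbook"
}
```

Usage: run_playbook /root/nodes/repo/playbooks/default.pb -v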

Repository

---
- hosts: repo
  remote_user: root
  tasks:

    ###############################################################
    ########### vsftpd server installation
    ###

    - name: Installing vsftpd rpm
      command: chdir=/mnt/Packages/ rpm -ivh vsftpd-3.0.2-9.el7.x86_64.rpm 
    - name: Enable vsftpd on start
      command: systemctl enable vsftpd
    - name: Start vsftpd
      command: systemctl start vsftpd
    - name: Installing libxml2-python rpm
      command: chdir=/mnt/Packages/ rpm -ivh libxml2-python-2.9.1-5.el7_0.1.x86_64.rpm
    - name: Installing deltarpm rpm
      command: chdir=/mnt/Packages/ rpm -ivh deltarpm-3.6-3.el7.x86_64.rpm 
    - name: Installing python-deltarpm rpm
      command: chdir=/mnt/Packages/ rpm -ivh python-deltarpm-3.6-3.el7.x86_64.rpm
    - name: Installing createrepo rpm
      command: chdir=/mnt/Packages/ rpm -ivh createrepo-0.9.9-23.el7.noarch.rpm

    ###############################################################
    ########### copy rpm and create repository
    ###

    - file: path=/var/ftp/pub/localrepo state=directory mode=0755
    - name: Copy packages from DVD/iso to repository and repository configuration file
      shell: cp -ar /mnt/Packages/*.* /var/ftp/pub/localrepo/
    - copy: src=/root/nodes/repo/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - name: Create repository
      command: createrepo -v /var/ftp/pub/localrepo/
    - name: Restore SELinux contexts
      command: restorecon -R /var/ftp

    ###############################################################
    ########### remove online repositories and update yum
    ###

    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update

    ###############################################################
    ########### disable firewall and set selinux to permissive
    ###

    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld
      command: systemctl disable firewalld
    - name: Copy SELinux configuration file
      copy: src=/root/nodes/repo/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config

    ###############################################################
    ########### check and reboot
    ###

    - name: Make sure vsftpd is running
      service: name=vsftpd state=started

    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }}
                state=started
                port=22
                delay=1
                timeout=300
      become: false

dhcp

---
- hosts: dhcp
  remote_user: root
  tasks:

    ###############################################################
    ########### add local repository, remove online repositories and update yum
    ###

    - copy: src=/root/nodes/dhcp/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update

    ###############################################################
    ########### install dhcpd server
    ###

    - name: Install dhcpd server
      yum: name=dhcp state=latest
    - copy: src=/root/nodes/dhcp/files/default.dhcpd.conf dest=/etc/dhcp/dhcpd.conf owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /etc/dhcp/dhcpd.conf
    - name: Enable dhcpd on start
      command: systemctl enable dhcpd.service
    - name: Start dhcpd
      command: systemctl start dhcpd.service

    ###############################################################
    ########### disable firewall and set selinux to permissive
    ###

    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld
      command: systemctl disable firewalld
    - name: Copy SELinux configuration file
      copy: src=/root/nodes/dhcp/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config

    ###############################################################
    ########### check and reboot
    ###

    - name: Make sure dhcpd is running
      service: name=dhcpd state=started

    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }}
                state=started
                port=22
                delay=1
                timeout=300
      become: false

PXE

---
- hosts: pxe
  remote_user: root
  tasks:

    ###############################################################
    ########### add local repository, remove online repositories and update yum
    ###

    - copy: src=/root/nodes/pxe/files/default.localrepo.repo dest=/etc/yum.repos.d/localrepo.repo owner=root group=root mode=0640
    - file: path=/etc/yum.repos.d.old state=directory mode=0755
    - name: Move online repo
      shell: mv /etc/yum.repos.d/CentOS-* /etc/yum.repos.d.old
    - name: Restore SELinux contexts
      command: restorecon -R /etc/yum.repos.d
    - name: Yum list repo
      command: yum repolist
    - name: Yum clean
      command: yum clean all
    - name: Yum update
      command: yum -y update

    ###############################################################
    ########### install tftp, xinetd and vsftpd
    ###

    - name: Install tftp
      yum: name=tftp state=latest
    - name: Install tftp-server
      yum: name=tftp-server state=latest
    - name: Install xinetd
      yum: name=xinetd state=latest

    - copy: src=/root/nodes/pxe/files/default.tftp dest=/etc/xinetd.d/tftp owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/xinetd.d/tftp
    - name: Start xinetd
      command: systemctl start xinetd
    - name: Enable xinetd
      command: systemctl enable xinetd

    - name: Install syslinux
      yum: name=syslinux state=latest
    - name: Install wget
      yum: name=wget state=latest
    - name: Install vsftpd
      yum: name=vsftpd state=latest

    ###############################################################
    ########### copy files for pxe boot
    ###

    - name: Copy pxelinux.0
      command: cp -v /usr/share/syslinux/pxelinux.0 /var/lib/tftpboot
    - name: Copy menu.c32
      command: cp -v /usr/share/syslinux/menu.c32 /var/lib/tftpboot
    - name: Copy memdisk
      command: cp -v /usr/share/syslinux/memdisk /var/lib/tftpboot
    - name: Copy mboot.c32
      command: cp -v /usr/share/syslinux/mboot.c32 /var/lib/tftpboot
    - name: Copy chain.c32
      command: cp -v /usr/share/syslinux/chain.c32 /var/lib/tftpboot

    - file: path=/var/lib/tftpboot/pxelinux.cfg state=directory mode=0755

    - file: path=/var/lib/tftpboot/netboot/ state=directory mode=0755

    - file: path=/var/ftp/pub/iso state=directory mode=0755
    - name: Copy vmlinuz
      command: cp /mnt/images/pxeboot/vmlinuz /var/lib/tftpboot/netboot/
    - name: Copy initrd.img
      command: cp /mnt/images/pxeboot/initrd.img /var/lib/tftpboot/netboot/
    - name: Restore SELinux contexts
      command: restorecon -R /var/lib/tftpboot

    - copy: src=/root/nodes/pxe/files/default.ks.cfg dest=/var/ftp/pub/ks.cfg owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /var/ftp/pub/ks.cfg

    - copy: src=/root/nodes/pxe/files/default.pxelinux.cfg.default dest=/var/lib/tftpboot/pxelinux.cfg/default owner=root group=root mode=0644
    - name: Restore SELinux contexts
      command: restorecon /var/lib/tftpboot/pxelinux.cfg/default

    ###############################################################
    ########### copy minimal iso content and start services
    ###

    - name: Copy minimal iso to /var/ftp/pub/iso/
      shell: cp -Rv /mnt/* /var/ftp/pub/iso/
    - name: Restore SELinux contexts
      command: restorecon -R /var/ftp/pub/

    - name: Start vsftpd
      command: systemctl start vsftpd
    - name: Enable vsftpd
      command: systemctl enable vsftpd
    - name: Restart vsftpd
      command: systemctl restart vsftpd
    - name: Restart xinetd
      command: systemctl restart xinetd
    - name: Set rights on /var/lib/tftpboot 
      command: chmod 777 /var/lib/tftpboot 

    ###############################################################
    ########### disable firewall and set selinux to permissive
    ###

    - name: Stop firewalld
      command: systemctl stop firewalld
    - name: Disable firewalld
      command: systemctl disable firewalld
    - name: Copy SELinux configuration file
      copy: src=/root/nodes/pxe/files/default.selinux dest=/etc/selinux/config owner=root group=root mode=0640
    - name: Restore SELinux contexts
      command: restorecon /etc/selinux/config

    ###############################################################
    ########### check and reboot
    ###

    - name: Make sure vsftpd is running
      service: name=vsftpd state=started
    - name: Make sure xinetd is running
      service: name=xinetd state=started

    - name: Restart node
      command: /sbin/reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: Waiting for node to come back
      local_action: wait_for host={{ inventory_hostname }}
                state=started
                port=22
                delay=1
                timeout=300
      become: false

Puppet

How to debug YAML files: brute force with http://www.yamllint.com/

Errors

From : http://makandracards.com/makandra/29365-vague-puppet-error-messages-with-broken-yaml-files

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: (<unknown>): found character that cannot start any token while scanning for the next token at line 1297 column 3
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: undefined method `empty?' for nil:NilClass at /etc/puppet/environments/production/manifests/nodes.pp:1 on node example.makandra.de
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

Check that any value starting with %{ is wrapped in double quotes (").

Bad:

foo: %{::fqdn}

Good:

foo: "%{::fqdn}"
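Rather than pasting large files into an online checker, a quick grep can point at the suspicious lines. This is a heuristic sketch: find_unquoted_interpolation is a hypothetical name, and the pattern can report a false positive when ": %{" appears inside an already quoted string.

```shell
# Hypothetical helper: list lines where a YAML value starts with an unquoted %{.
# Usage: find_unquoted_interpolation FILE...
find_unquoted_interpolation() {
    # Matches "key: %{" and "- %{"; a value that starts with a quote is not matched.
    # -n prints the line number in front of each hit.
    grep -nE '(:|-)[[:space:]]+%\{' "$@"
}
```

The function returns a non-zero status when nothing is found, so it can also be used as a check before committing a file.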