Setting Up Raspberry Pi Slurm Cluster
-
Flash the disk(s) with Raspberry Pi OS Lite. Insert the disk(s) into the Pi(s) and power them on.
-
Run
sudo raspi-config
and use it to:
- update raspi-config
- change the hostname
- enable ssh
- change the password for the pi user
- set the WLAN country
- set the timezone
-
Set up passwordless (key-based) ssh between nodes.
-
create key on one of the nodes
sudo ssh-keygen # save it somewhere central like in /etc/ssh
-
configure ssh to use the newly created key by editing
/etc/ssh/ssh_config
and adding
IdentityFile /etc/ssh/<key name>
-
make the key readable (it was created with sudo, so root owns it)
sudo chmod +r /etc/ssh/<key name>
-
for each other node, copy ssh key to the nodes (e.g. with
scp
) and repeat above steps.
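The copy step can be sketched as a dry run that only prints the commands for each node. This is a minimal sketch under assumptions: the key name `cluster_key`, the node list and the `pi` account are illustrative values, and `ssh-copy-id` is used because key-based login also needs the public key in the remote `authorized_keys` file, which the steps above do not spell out.

```shell
KEY=/etc/ssh/cluster_key        # hypothetical key name; use your own
NODES="picompute1 picompute2"   # hostnames as listed in /etc/hosts
USER_NAME=pi                    # default account on the nodes (assumption)

# Print (do not run) the commands that push the key to one node.
key_commands() {
    # Authorize the public key for login on the remote node.
    echo "ssh-copy-id -i ${KEY}.pub ${USER_NAME}@$1"
    # Copy the key pair so the node can use the same identity.
    echo "scp ${KEY} ${KEY}.pub ${USER_NAME}@$1:/tmp/"
}

for node in $NODES; do
    key_commands "$node"
done
```

Review the printed commands, run them by hand, then move the copied files into /etc/ssh on each node with sudo and repeat the ssh_config edit there.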
-
-
update
/etc/hosts
with all host names, e.g.:
10.0.0.1 pimanager
10.0.0.2 picompute1
10.0.0.3 picompute2
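For larger clusters the entries can be generated rather than typed. The helper below is a rough sketch that assumes the naming and addressing pattern of the example (manager at 10.0.0.1, compute node N at 10.0.0.N+1); `gen_hosts` is a made-up name.

```shell
# Print /etc/hosts lines for one manager plus N compute nodes.
gen_hosts() {
    n=$1
    echo "10.0.0.1 pimanager"
    i=1
    while [ "$i" -le "$n" ]; do
        # Compute node i gets the address 10.0.0.(i+1).
        echo "10.0.0.$((i + 1)) picompute${i}"
        i=$((i + 1))
    done
}

gen_hosts 2
```

To apply the result, something like `gen_hosts 2 | sudo tee -a /etc/hosts` appends the lines on each node.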
-
Set up a DHCP server on the manager/login node (detailed instructions)
-
Install packages required for this step:
sudo apt install iptables dnsmasq
-
In
/etc/dhcpcd.conf
, add the following:
# for the ethernet network
interface eth0
# provide static ip of 10.0.0.1
static ip_address=10.0.0.1/8
# DNS servers (???)
static domain_name_servers=8.8.8.8,8.8.4.4
# sets up the interface without being attached to ethernet
nolink
-
move the default config file and create a new one with the text below:
sudo mv /etc/dnsmasq.conf /etc/dnsmasq.conf.old
sudo nano /etc/dnsmasq.conf
interface=eth0
listen-address=10.0.0.1
dhcp-range=10.0.0.32,10.0.0.128,12h
dhcp-host=<mac-address of nodes>,10.0.0.2
dhcp-host=<mac-address of nodes>,10.0.0.3
...
server=8.8.8.8
server=8.8.4.4
bind-interfaces
domain-needed
bogus-priv
expand-hosts
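The dhcp-host lines follow a regular pattern, so they can be generated from a list of MAC addresses. A small sketch, assuming addresses are handed out in order starting at 10.0.0.2; the MAC addresses shown are placeholders and `gen_dhcp_hosts` is a hypothetical helper name.

```shell
# Emit one dhcp-host line per MAC address: 10.0.0.2, 10.0.0.3, ...
gen_dhcp_hosts() {
    i=2
    for mac in "$@"; do
        echo "dhcp-host=${mac},10.0.0.${i}"
        i=$((i + 1))
    done
}

# Placeholder MAC addresses; replace with those of your compute nodes.
gen_dhcp_hosts dc:a6:32:00:00:01 dc:a6:32:00:00:02
```

After pasting the output into /etc/dnsmasq.conf, `sudo dnsmasq --test` checks the file for syntax errors without starting the service.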
-
add
sleep 10
to the very start of
/etc/init.d/dnsmasq
(this workaround needs to be improved on)
-
reboot manager/login node
-
in
/etc/sysctl.conf
, uncomment
net.ipv4.ip_forward=1
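Whether the setting has taken effect can be checked by reading the kernel's live value. A minimal sketch; `forwarding_enabled` is a made-up helper that accepts an optional file argument so it can be tried against a test file.

```shell
# Succeeds when the given file (default: the live kernel setting) contains 1.
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

if [ -r /proc/sys/net/ipv4/ip_forward ]; then
    if forwarding_enabled; then
        echo "IPv4 forwarding is on"
    else
        # sysctl -p reloads /etc/sysctl.conf without a reboot.
        echo "IPv4 forwarding is off; run: sudo sysctl -p"
    fi
fi
```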
-
Add iptables rules:
sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT
-
ensure iptables rules are persistent between boots:
sudo apt install iptables-persistent
-
reboot the switch and check that DHCP leases were granted:
cat /var/lib/misc/dnsmasq.leases
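The leases file has one line per lease with the fields expiry time, MAC address, IP address, hostname and client ID. The sketch below summarises it and checks for a particular address; the helper names and the sample lease line are illustrative only.

```shell
LEASES=/var/lib/misc/dnsmasq.leases

# Print "ip hostname mac" for every lease in the file.
list_leases() {
    awk '{ print $3, $4, $2 }' "${1:-$LEASES}"
}

# Succeed if the given IP address appears in the leases file.
has_lease() {
    awk -v ip="$1" '$3 == ip { f = 1 } END { exit (f ? 0 : 1) }' "${2:-$LEASES}"
}

# Demonstration against a made-up sample line rather than the live file.
sample=$(mktemp)
printf '%s\n' '1700000000 dc:a6:32:00:00:01 10.0.0.2 picompute1 01:dc:a6:32:00:00:01' > "$sample"
if has_lease 10.0.0.2 "$sample"; then
    echo "10.0.0.2 has a lease"
fi
```

On the manager node, `list_leases` with no argument reads the live file.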
-
Internet traffic should now be forwarded to the new nodes
-
-
update existing packages on all nodes:
sudo apt update; sudo apt upgrade -y; sudo apt update
-
packages to install:
sudo apt install cmake tcl tcl-dev ntpdate tmux git slurm-wlm openmpi-bin openmpi-common libopenmpi3 \
libopenmpi-dev tree -y
-
turn off the swapfile (swapping wears out the SD card):
sudo dphys-swapfile swapoff
sudo dphys-swapfile uninstall
sudo update-rc.d dphys-swapfile remove
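To confirm that no swap is left active, /proc/swaps can be inspected: it contains a header line followed by one line per active swap area. A small sketch with a hypothetical helper `swap_active`:

```shell
# Succeeds if the file (default: /proc/swaps) lists at least one swap area.
swap_active() {
    awk 'NR > 1 { n++ } END { exit (n ? 0 : 1) }' "${1:-/proc/swaps}"
}

if [ -r /proc/swaps ]; then
    if swap_active; then
        echo "swap is still active"
    else
        echo "no active swap"
    fi
fi
```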
-
Setup NFS for shared filesystem:
-
on the node with the drive connected:
-
install nfs server package:
sudo apt install nfs-kernel-server -y
-
identify partition with
lsblk
and corresponding UUID with
blkid
-
format disk:
sudo mkfs -t ext4 /dev/<partition>
-
make the directory the drive is to be mounted on:
sudo mkdir <directory>
-
mount the drive at boot by editing
sudo nano /etc/fstab
and adding the following (the last field is the fsck pass number: 2 for a non-root filesystem):
UUID=<UUID> <directory to mount to> ext4 defaults 0 2
-
export the directory over NFS by editing
sudo nano /etc/exports
and adding the following:
<directory to export> <network address>/<netmask>(rw,sync,no_root_squash,no_subtree_check)
examples:
/clusterfs 192.168.1.0/255.255.255.0(rw,sync,no_root_squash,no_subtree_check)
/clusterfs 10.0.0.0/255.0.0.0(rw,sync,no_root_squash,no_subtree_check)
-
mount the drive and then export:
sudo mount -a
sudo exportfs -a
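Each exports entry pairs a network address with a netmask; exports(5) also accepts a prefix length instead, e.g. 10.0.0.0/8. If the dotted form is preferred, it can be derived from the prefix length used for the static IP. The helper below is an illustrative sketch (`prefix_to_netmask` is a made-up name) and assumes a prefix between 0 and 32.

```shell
# Convert a prefix length (0-32) to a dotted-decimal netmask.
prefix_to_netmask() {
    p=$1
    mask=""
    for n in 1 2 3 4; do
        if [ "$p" -ge 8 ]; then
            octet=255
            p=$((p - 8))
        else
            # Set the top p bits of this octet; remaining octets become 0.
            octet=$((256 - (1 << (8 - p))))
            p=0
        fi
        mask="${mask:+$mask.}$octet"
    done
    echo "$mask"
}

prefix_to_netmask 8
prefix_to_netmask 24
```

After editing /etc/exports, `sudo exportfs -v` lists what is actually being exported.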
-
-
on the nodes not connected to the drive:
-
install nfs common package:
sudo apt install nfs-common -y
-
make the directory where the NFS share is to be mounted. For MPI programs, use the same directory name on all nodes:
sudo mkdir <directory>
-
mount the share at boot by editing
sudo nano /etc/fstab
and adding the following line (the last field is the fsck pass number, 0 for NFS):
<ip of node with drive>:<directory> <directory> nfs defaults 0 0
-
mount drive:
sudo mount -a
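A quick check that the share really is mounted over NFS can read /proc/mounts, where each line lists device, mount point and filesystem type. A minimal sketch; `is_nfs_mounted` is a hypothetical helper and /clusterfs is the directory used in the examples of this guide.

```shell
# Succeeds if <dir> is mounted with an nfs filesystem type (nfs, nfs4).
# Usage: is_nfs_mounted <dir> [mounts file]
is_nfs_mounted() {
    awk -v d="$1" '$2 == d && $3 ~ /^nfs/ { f = 1 } END { exit (f ? 0 : 1) }' "${2:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    if is_nfs_mounted /clusterfs; then
        echo "/clusterfs is mounted over NFS"
    else
        echo "/clusterfs is not an NFS mount on this machine"
    fi
fi
```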
-
-
-
install spack on the NFS share and set up the spack env to load for root
-
Clone the Spack repo
cd /clusterfs
git clone -c feature.manyFiles=true https://github.com/spack/spack.git
-
add the spack setup script to root's .bashrc by running
sudo nano /root/.bashrc
and adding:
. /clusterfs/spack/share/spack/setup-env.sh
(would prefer this to be set up system-wide)
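One possible way to get a system-wide setup is a snippet in /etc/profile.d, which Debian-based systems source for login shells. This is a sketch under assumptions, not a tested part of this guide: it assumes Spack was cloned to /clusterfs/spack as above, and `gen_profile_snippet` is a made-up helper that only prints the file contents.

```shell
SPACK_ROOT=/clusterfs/spack   # assumed clone location on the NFS share

# Print a profile.d snippet that loads Spack only if the share is mounted.
gen_profile_snippet() {
    cat <<EOF
# Load Spack for login shells when the NFS share is available.
if [ -f "$SPACK_ROOT/share/spack/setup-env.sh" ]; then
    . "$SPACK_ROOT/share/spack/setup-env.sh"
fi
EOF
}

gen_profile_snippet
```

To install it, `gen_profile_snippet | sudo tee /etc/profile.d/spack.sh` writes the file; note that non-login shells do not read /etc/profile.d.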
-