DRBD Device Monitoring

Install

On all Ganeti nodes

Install check_drbd (http://www.monitoringexchange.org/attachment/download/Check-Plugins/Operating-Systems/Linux/check_drbd/check_drbd) somewhere in the path, for example in /usr/local/bin.

By default, this script returns CRITICAL if a device is unconfigured, and UNKNOWN if a device does not exist. With Ganeti, both situations can occur in normal operation when:

  • A VM is down
  • A VM is down and a node has been rebooted (then the drbd device disappears).
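
To see which of these states a node is in before deciding whether to patch the plugin, the connection state of each device can be read from /proc/drbd. The following is a minimal sketch, not part of check_drbd: the function name drbd_states is hypothetical, and it assumes device lines of the form " 0: cs:Connected ...", as DRBD 8.x prints them.

```shell
#!/bin/bash
# Print "minor connection-state" pairs from a /proc/drbd-style status file.
# Assumption: device lines look like " 0: cs:Connected ro:..." (DRBD 8.x).
drbd_states() {
    local status_file="${1:-/proc/drbd}"
    # Device lines start with optional spaces, the minor number and a colon;
    # every other line (version banner, counters) is skipped.
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([A-Za-z]*\).*/\1 \2/p' "$status_file"
}

# Example: list the devices the unpatched plugin would report as CRITICAL
#   drbd_states | grep ' Unconfigured$'
```

A minor that does not show up in this list at all corresponds to the "device does not exist" case.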

You may want to patch check_drbd like this to avoid “false” alerts:

--- check_drbd  2010-01-26 09:10:16.000000000 +0100
+++ /usr/local/bin/check_drbd   2010-01-26 09:06:23.000000000 +0100
@@ -38,7 +38,7 @@
              'WFConnection' => { 'value' => 'CRITICAL', 'type' => 'cs' },
               'WFReportParams' => { 'value' => 'CRITICAL', 'type' => 'cs' },
              'Connected' => { 'value' => 'OK', 'type' => 'cs' },
-             'Unconfigured' => { 'value' => 'CRITICAL', 'type' => 'cs' },
+             'Unconfigured' => { 'value' => 'OK', 'type' => 'cs' },
              # DRBD 0.6
              'SyncingAll' => { 'value' => 'WARNING', 'type' => 'cs' },
               'SyncingQuick' => { 'value' => 'WARNING', 'type' => 'cs' },
@@ -261,7 +261,7 @@
        }
        foreach my $device (@devices) {
                if (!(defined($cs{$device}))) {
-                       &myexit('UNKNOWN',"Could not find device $device");
+                       &myexit('OK',"Could not find device $device");
                }
                $check{$device} = 1;
        }

On Ganeti master

Put this script somewhere in the path (/usr/local/bin/drbd_gen_nagios.sh for example):

#!/bin/bash 
# 
# Copyright (C) 2009 Maxence Dunnewind <maxence@dunnewind.net> 
# 
# This program is free software: you can redistribute it and/or modify 
# it under the terms of the GNU General Public License as published by 
# the Free Software Foundation, either version 3 of the License, or 
# (at your option) any later version. 
# 
# This program is distributed in the hope that it will be useful, 
# but WITHOUT ANY WARRANTY; without even the implied warranty of 
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
# GNU General Public License for more details. 
# 
# You should have received a copy of the GNU General Public License 
# along with this program.  If not, see <http://www.gnu.org/licenses/>. 

# List every instance, then emit one Nagios service per DRBD device and node.
gnt-instance list --no-header --separator=: -o name | while read -r vm; do
        TMP_FILE=$(mktemp)
        gnt-instance info -s "$vm" > "$TMP_FILE"
        # Node names: first comma-terminated word on the nodeA / nodeB lines.
        NODE_A=$(grep "nodeA" "$TMP_FILE" | grep -o "[^ ]\+," | head -1 | sed 's/,//')
        NODE_B=$(grep "nodeB" "$TMP_FILE" | grep -o "[^ ]\+," | head -1 | sed 's/,//')
        # DRBD devices: whatever follows "=" on the same lines.
        DEV_A=$(grep "nodeA" "$TMP_FILE" | cut -d "=" -f 2)
        DEV_B=$(grep "nodeB" "$TMP_FILE" | cut -d "=" -f 2)
        rm -f "$TMP_FILE"
        # $DEV_A and $DEV_B stay unquoted on purpose: one loop pass per device.
        for disk in $DEV_A; do
                echo "define service {"
                echo " host_name                $NODE_A"
                echo " service_description      DRBD instance $vm / device n° $disk"
                echo " check_command            check_drbd!$disk"
                echo " use                      generic-service"
                echo "}"
                echo ""
        done
        for disk in $DEV_B; do
                echo "define service {"
                echo " host_name                $NODE_B"
                echo " service_description      DRBD instance $vm / device n° $disk"
                echo " check_command            check_drbd!$disk"
                echo " use                      generic-service"
                echo "}"
                echo ""
        done
done

Important: this script only defines the DRBD service checks, so the hosts themselves must already be configured in Nagios.
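
Both loops in the script print the same block and differ only in the node name. As an illustration of what one generated definition looks like, here is a sketch of a helper that prints a single block. The name emit_service and its argument order are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: print one Nagios service definition for a DRBD device.
# Arguments: node name, instance name, DRBD device number.
emit_service() {
    local node="$1" vm="$2" disk="$3"
    printf 'define service {\n'
    printf ' host_name                %s\n' "$node"
    printf ' service_description      DRBD instance %s / device n° %s\n' "$vm" "$disk"
    printf ' check_command            check_drbd!%s\n' "$disk"
    printf ' use                      generic-service\n'
    printf '}\n\n'
}

# Example (placeholder names):
#   emit_service node1.example.com vm1.example.com 0
```

With such a helper, each loop body in the generator would reduce to a single call, once with $NODE_A and once with $NODE_B.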

On Nagios

Define check_drbd, which will use check_by_ssh to call the check_drbd script on the nodes.

define command{ 
        command_name    check_drbd 
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -l root -C "/usr/local/bin/check_drbd -d $ARG1$" 
} 

Put this definition in /etc/nagiosX/conf.d/command.cfg, for example.
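
Before relying on Nagios, it can help to run the same command line by hand as the nagios user. The sketch below only builds that command line from a node name and a DRBD device number. The function name drbd_check_cmdline is hypothetical, and the plugin paths are the ones used in the command definition.

```shell
#!/bin/bash
# Hypothetical helper: print the command line Nagios would run for one node
# and one DRBD device, using the same paths as the check_drbd command definition.
drbd_check_cmdline() {
    local host="$1" minor="$2"
    printf '%s -H %s -l root -C "%s -d %s"\n' \
        /usr/lib/nagios/plugins/check_by_ssh "$host" \
        /usr/local/bin/check_drbd "$minor"
}

# Example: run the check manually (node1.example.com is a placeholder)
#   eval "$(drbd_check_cmdline node1.example.com 0)"
```

If the manual run fails, the problem lies in the SSH setup or the plugin rather than in the Nagios configuration.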

Also make sure the nagios user can connect over SSH without a password. If that is not set up yet, generate a key:

# su - nagios
# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/var/run/nagios3/.ssh/id_rsa): /etc/nagios3/id_rsa
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 

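Don't forget to change the file location: if you keep the default under /var/run, the key will be deleted at the next reboot. Also keep the passphrase empty. Because the key is then outside the default ~/.ssh directory, check_by_ssh has to be pointed at it, for example with its -i option. Then append the contents of id_rsa.pub to /root/.ssh/authorized_keys on all your hosts. You can also add a “from” option and a “command” option to limit access: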

from="1.1.1.1",command="/usr/local/bin/check_drbd" ssh-rsa AAAAB3NzaC1yc2EAAAABIwA...
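
One caveat with the “command” option: sshd runs the forced command instead of whatever the client requested, and passes the client's request in the SSH_ORIGINAL_COMMAND environment variable. A forced command of plain /usr/local/bin/check_drbd is therefore run without the -d argument. A possible workaround is a small wrapper that checks the request and then executes it. This is a sketch under assumptions: the function names are hypothetical, and it assumes -d is followed by a device number or a comma-separated list of numbers.

```shell
#!/bin/bash
# Hypothetical wrapper logic for use as the forced command in authorized_keys.

# Succeeds only if the requested command line is an allowed check_drbd call.
# Assumption: -d takes a device number or a comma-separated list of numbers.
is_allowed_drbd_check() {
    local pattern='^/usr/local/bin/check_drbd -d [0-9]+(,[0-9]+)*$'
    [[ "$1" =~ $pattern ]]
}

run_forced_check() {
    if is_allowed_drbd_check "${SSH_ORIGINAL_COMMAND:-}"; then
        # Word splitting is safe here: the pattern admits only the plugin
        # path, "-d", digits and commas.
        exec $SSH_ORIGINAL_COMMAND
    fi
    echo "UNKNOWN: command not allowed"
    return 3   # Nagios plugin exit code for UNKNOWN
}

# In the wrapper script itself, the last line would be:
#   run_forced_check
```

With the wrapper saved as, say, /usr/local/bin/check_drbd_ssh (a hypothetical path), the authorized_keys entry would name that path in its “command” option instead of check_drbd itself.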

Use

To generate the config after an instance has been added or removed, or after a replace-disks, run:

drbd_gen_nagios.sh > drbd.cfg 

Run this on the Ganeti master, then copy the generated file into your Nagios conf.d directory, check the configuration, and restart Nagios.
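
If the generator runs regularly, it may be worth restarting Nagios only when the generated file has actually changed. The sketch below shows one way to do that on the Nagios server. The function name install_if_changed is hypothetical, and the paths in the usage comment assume a Debian-style nagios3 installation.

```shell
#!/bin/bash
# Hypothetical helper: copy a freshly generated file into place only if it
# differs from the installed one.
# Returns 0 if the destination was updated, 1 if nothing changed.
install_if_changed() {
    local new="$1" dest="$2"
    if [ -f "$dest" ] && cmp -s "$new" "$dest"; then
        return 1
    fi
    cp "$new" "$dest"
}

# Example usage (paths and init script are assumptions for a nagios3 setup):
#   if install_if_changed drbd.cfg /etc/nagios3/conf.d/drbd.cfg; then
#       nagios3 -v /etc/nagios3/nagios.cfg && /etc/init.d/nagios3 restart
#   fi
```

Verifying the configuration before the restart keeps a malformed generated file from taking Nagios down.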