problems in dayabay context

  1. do not want the monitoring node to hold ssh keys for every node it monitors


  1. nagios is venerable, C core with an apache-served CGI web interface and many perl plugins, a pain to configure, big community

the macports nagios port is at 3.2.3


  1. Monit, God, Supervisord, Upstart
    1. focus on starting/restarting daemons and services
  2. Munin, Cacti
    1. focus on visualization of RRDTool data
  3. Collectd
    1. focus on collecting and publishing data


Python-based and flexible, more bare-bones: better suited to simple monitoring

  1. google:fabric cuisine watchdog

  2. fabric : python based ssh access to remote nodes, low level

    1. cuisine : simple function extensions using fabric primitives to add file/dir/text/user/group/sudo ops

    2. daemonwatch : (formerly watchdog)

      1. a service is a collection of rules, with an associated frequency
      2. rules can succeed or fail, and produce output
      3. actions are bound to a rule and triggered on its success or failure
  3. I don't see any integration between daemonwatch and the others (fabric, cuisine); daemonwatch looks to be entirely local-node
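
The service / rule / action model listed above can be sketched in plain Python, independent of daemonwatch. All names here (`Rule`, `Service`, `period`, `run_once`) are hypothetical and are not the daemonwatch API. The sketch attaches the frequency to the service, which is an assumption; it could just as well belong to each rule.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# NOTE: hypothetical names throughout; this is not the daemonwatch API.

Action = Callable[[str], None]  # an action receives the rule's output


@dataclass
class Rule:
    """A check that succeeds or fails and produces output."""
    check: Callable[[], Tuple[bool, str]]  # returns (ok, output)
    success: List[Action] = field(default_factory=list)
    fail: List[Action] = field(default_factory=list)

    def run(self) -> bool:
        ok, output = self.check()
        # actions are bound to the rule and picked by its outcome
        for action in (self.success if ok else self.fail):
            action(output)
        return ok


@dataclass
class Service:
    """A named collection of rules, evaluated at a fixed period."""
    name: str
    rules: List[Rule]
    period: float = 60.0  # seconds between evaluations (assumed per-service)

    def run_once(self) -> List[bool]:
        return [rule.run() for rule in self.rules]

    def run_forever(self) -> None:
        while True:
            self.run_once()
            time.sleep(self.period)


# usage: two toy rules whose actions just append the output to a list
log: List[str] = []
disk = Rule(check=lambda: (False, "disk 95% full"), fail=[log.append])
ping = Rule(check=lambda: (True, "pong"), success=[log.append])
service = Service("demo", [disk, ping], period=5.0)
results = service.run_once()
```

Everything here runs on the local node, which matches the observation that daemonwatch looks entirely local; remote checks would need something like fabric inside the `check` callable.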

#!/usr/bin/env python
from watchdog import *

send_email = Email( "name@wherever", "Subj", "config...." )
send_xmpp  = XMPP( "name@jabber", "Subj", "config...." )

Monitor(    # the "main"
  Service(      # a Service monitors the rules
      monitor = (
           HTTP(     # the HTTP rule tests a url (url, frequency, ... arguments elided)
              # simple alternative: run the actions on every failure
              # fail = [
              #    Print("Failed..."), send_email, send_xmpp,
              # ],

              # Incident is a "smart" action: it fires its actions only when
              # failures repeat within a time window
              fail = [
                  Incident( errors=5, during=Time.s(10), actions=[send_email, send_xmpp] )
              ]
           )
      )
  )
).run()
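
The idea behind Incident (act only when failures repeat within a time window) can be sketched as a sliding-window counter. This is a hypothetical re-implementation for illustration, not daemonwatch's own code; the class name `WindowedIncident` and the `record_failure` method are assumptions, and the current time is passed in explicitly to keep the sketch easy to test.

```python
from collections import deque
from typing import Callable, Deque, List

# NOTE: hypothetical sketch; not daemonwatch's Incident implementation.


class WindowedIncident:
    """Fire the actions only when `errors` failures occur within `during` seconds."""

    def __init__(self, errors: int, during: float,
                 actions: List[Callable[[], None]]) -> None:
        self.errors = errors
        self.during = during
        self.actions = actions
        self._stamps: Deque[float] = deque()  # times of recent failures

    def record_failure(self, now: float) -> bool:
        """Register one failure at time `now`; return True if the actions fired."""
        self._stamps.append(now)
        # drop failures that have fallen out of the time window
        while self._stamps and now - self._stamps[0] > self.during:
            self._stamps.popleft()
        if len(self._stamps) >= self.errors:
            for action in self.actions:
                action()
            self._stamps.clear()  # start counting afresh after firing
            return True
        return False


# usage: an isolated failure at t=0 is forgotten; three failures within 10s fire
fired: List[str] = []
incident = WindowedIncident(errors=3, during=10.0,
                            actions=[lambda: fired.append("alert")])
outcomes = [incident.record_failure(t) for t in (0.0, 20.0, 22.0, 24.0)]
```

Clearing the window after firing is a design choice in this sketch: it avoids sending one alert per failure during a sustained outage.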