Value Monitoring

Warning

Keeping this script operational under the ancient Python 2.3 is advantageous.

Simple monitoring and recording of the output of commands that return a single value or a dict string. The results are stored with a timestamp in a SQLite DB.

Usage with the diskmon section is shown below. The name passed with -s must correspond to a section name in the config file, which defaults to ~/.env.cnf:

valmon.py -s diskmon rec rep mon 

Usage from cron:

52 * * * * ( valmon.py -s diskmon rec rep mon ) > $CRONLOG_DIR/diskmon.log 2>&1 

Installation

Using the valmon.py script and the env python modules on which it is based requires these to be installed as described at Installing env. Essentially this just requires a symbolic link from python site-packages and a PATH setting that gives easy access to the scripts, e.g. from /root/env/bin.

Separation of concerns

The value monitoring in valmon.py is kept generic. The specifics of obtaining the values are handled within the command that is called, and the constraints to apply to those values are chosen in the config.

For example, the diskmon section uses the disk_usage.py script, which returns a dict string:

[blyth@cms01 e]$ disk_usage.py 
{'gb_total': '131.74', 'gb_free': '24.90', 'percent_free': '18.90', 'percent_used': '76.02'}

Other sections, such as oomon, monitor the single integer returned by the command below:

[root@cms02 ~]# grep oom /var/log/messages | wc -l 
0 

This approach allows the value monitoring and persistence framework to be reused for any quantity that a command or script can be written to obtain.
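The two kinds of command output described above can be handled along the following lines. This is only a sketch in modern Python 3, not the valmon.py implementation: the function names `run_command` and `interpret` are hypothetical, and `ast.literal_eval` is an assumed way of reading the dict string (it does not exist in Python 2.3).

```python
import ast
import subprocess


def run_command(cmd):
    """Run a shell command line and return (return code, stripped stdout)."""
    proc = subprocess.run(
        cmd, shell=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return proc.returncode, proc.stdout.strip()


def interpret(ret, return_type):
    """Turn raw command output into a dict of named values.

    return_type mirrors the `return` key of a config section:
    "int" yields {"val": <int>}, "dict" parses a dict string.
    """
    if return_type == "int":
        return {"val": int(ret)}
    if return_type == "dict":
        parsed = ast.literal_eval(ret)  # accepts literals only, no arbitrary code
        if not isinstance(parsed, dict):
            raise ValueError("expected a dict string, got %r" % ret)
        return parsed
    raise ValueError("unsupported return type %r" % return_type)
```

With this split, `run_command` never needs to know what is being measured, and `interpret` only needs the `return` type from the config section.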

Command arguments

rec
record status into the SQLite DB, by running the configured command and storing the results
mon
check whether the last entry in the DB table conforms to the expectations; if not, send a notification email
rep
status report based on the DB entries
msg
show what the notification email and subject would be without sending email, a blank msg indicates that no email would be sent
ls
a simple query against the table for the configured section, for debugging

Schema Versions

The variables available in context, which may be constrained, correspond to the fields in the table. The fields have changed between schema versions:

0.1
date, val
0.2
date, val, runtime, rc, ret
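As an illustration of the 0.2 field list, the sketch below creates a table with those five fields, records a row, and fetches the most recent entry as a dict. The column types, the ISO timestamp format and the function names are assumptions made for this example; the actual table definition used by valmon.py is not shown in this document.

```python
import sqlite3
from datetime import datetime

FIELDS = ("date", "val", "runtime", "rc", "ret")  # schema version 0.2


def create_table(conn, tn):
    """Create the table named by the section's `tn` key if it is missing."""
    # SQL parameters cannot name a table, so tn is interpolated directly;
    # it must come from a trusted config file.
    conn.execute(
        "create table if not exists %s "
        "(date text, val real, runtime real, rc integer, ret text)" % tn
    )


def record(conn, tn, val, runtime, rc, ret):
    """Store one measurement together with the current timestamp."""
    conn.execute(
        "insert into %s (date, val, runtime, rc, ret) values (?, ?, ?, ?, ?)" % tn,
        (datetime.now().isoformat(), val, runtime, rc, ret),
    )
    conn.commit()


def last_entry(conn, tn):
    """Return the most recently inserted row as a dict, or None if empty."""
    row = conn.execute(
        "select date, val, runtime, rc, ret from %s order by rowid desc limit 1" % tn
    ).fetchone()
    if row is None:
        return None
    return dict(zip(FIELDS, row))
```

The dict returned by `last_entry` has exactly the names that a constraints expression for a 0.2 section could refer to.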

Configuration

The command to run and the constraints applied to what it returns are obtained from config. This allows the most typical changes, such as adjusting constraints, to be made through configuration alone.
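A section of this kind can be read with the standard library config parser, as sketched below in Python 3. The function name `parse_section` is hypothetical, and expanding `~` in `dbpath` is an assumption about how the path is used. The `%(hostport)s` substitution is the parser's standard interpolation, applied here to a shortened copy of the envmon section from this page.

```python
import configparser
import os

EXAMPLE = """
[envmon]
hostport = dayabay.phys.ntu.edu.tw
cmd = curl -s --connect-timeout 3 http://%(hostport)s/repos/env/ | grep trunk | wc -l
return = int
constraints = ( val == 1, )
dbpath = ~/.env/envmon.sqlite
tn = envmon
"""


def parse_section(text, section):
    """Return one config section as a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cfg = dict(parser.items(section))  # items() applies %(name)s interpolation
    if "dbpath" in cfg:
        cfg["dbpath"] = os.path.expanduser(cfg["dbpath"])
    return cfg


cfg = parse_section(EXAMPLE, "envmon")
print(cfg["cmd"])
```

To read the real file instead of a string, `parser.read(os.path.expanduser("~/.env.cnf"))` could replace the `read_string` call.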

Examples:

[oomon]

note = despite notification being enabled this failed to notify me, apparently the C2 OOM issue made the machine incapable of sending email ?
cmd = grep oom /var/log/messages | wc -l  
return = int
constraints = ( val == 0, )
dbpath = ~/.env/oomon.sqlite
tn = oomon

[diskmon]

note = stores the dict returned by the command as a string in the DB without interpretation
cmd = disk_usage.py /data
valmon_version = 0.2 
return = dict
constraints = ( gb_free > 10, )
dbpath = ~/.env/envmon.sqlite
tn = diskmon

[sshagent_mon]

note = require an sshagent process is running by constraining the return code from the pgrep command 
valmon_version = 0.2
email = blyth@hep1.phys.ntu.edu.tw
cmd = pgrep ssh-agent 
return = int 
constraints = ( rc == 0, )
dbpath = ~/.env/sshagent_mon.sqlite
tn = sshagent_mon

[dbsrvmon]

note = currently set to fail via age
chdir = /var/dbbackup/dbsrv/belle7.nuu.edu.tw/channelquality_db_belle7/archive/10000
cmd = digestpath.py 
valmon_version = 0.2 
return = dict
constraints = ( tarball_count >= 34, dna_mismatch == 0, age < 86400 , age < 1000, )
dbpath = ~/.env/dbsrvmon.sqlite
tn = channelquality_db


[envmon]

note = check C2 server from cron on other nodes
hostport = dayabay.phys.ntu.edu.tw
# from N need to get to C2 via nginx reverse proxy on H
#hostport = hfag.phys.ntu.edu.tw:90  
cmd = curl -s --connect-timeout 3 http://%(hostport)s/repos/env/ | grep trunk | wc -l
return = int
constraints = ( val == 1, )
instruction = require a single trunk to be found, verifying that the apache interface to SVN is working 
observations = May 16, 2013: observing variable response times that trigger notifications with a 3s timeout
dbpath = ~/.env/envmon.sqlite
tn = envmon

[envmon_demo]

note = check C2 server from cron on C, 
cmd = curl -s --connect-timeout 3 http://dayabay.phys.ntu.edu.tw/repos/env/ | grep trunk | wc -l
return = int
valmin = -100
valmax = 100 
constraints = ( val == 1 and val < valmax, val > valmin , val < valmax )
instruction = 
    the simple python `constraints` expression is evaluated within the scope of 
    the section config values (with things that can be coerced to floats so coerced)
    the constraint needs to evaluate to a tuple of one or more bools. 
    To specify a one element tuple a trailing comma is needed, eg "( val > valmin, )"

dbpath = ~/.env/envmon.sqlite
tn = envmon
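The evaluation described in the envmon_demo instruction can be sketched as follows. This is an illustration under assumptions, not the valmon.py code: the names `coerce` and `check_constraints` are hypothetical, and merging the recorded values over the section values is a guess at how the scope is built.

```python
def coerce(value):
    """Convert to float when possible, otherwise return the value unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def check_constraints(expression, section, entry):
    """Evaluate a constraints expression and return a list of bools.

    section: config values for the section (strings)
    entry:   recorded values, e.g. {"val": 1} or the parsed dict string
    """
    scope = {}
    for source in (section, entry):  # recorded values override config values
        for key, value in source.items():
            scope[key] = coerce(value)
    # eval trusts the config file completely; the empty builtins dict only
    # keeps the namespace small, it is not a security boundary.
    results = eval(expression, {"__builtins__": {}}, scope)
    if not isinstance(results, tuple):
        raise TypeError("constraints must be a tuple, eg '( val == 0, )'")
    return [bool(r) for r in results]


demo = {"valmin": "-100", "valmax": "100"}
expr = "( val == 1 and val < valmax, val > valmin , val < valmax )"
print(check_constraints(expr, demo, {"val": 1}))
```

A check of this form would count as passing only when every element of the returned list is true, which is how the dbsrvmon example can be "set to fail via age" by adding a tighter `age < 1000` constraint.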

Source python cron

When forced to use a Python built from source rather than the system Python 2.3 on C2, the cron environment had to be set up accordingly:

SHELL=/bin/bash
HOME=/home/blyth
ENV_HOME=/home/blyth/env
CRONLOG_DIR=/home/blyth/cronlog
PATH=/home/blyth/env/bin:/data/env/system/python/Python-2.5.1/bin:/usr/bin:/bin
LD_LIBRARY_PATH=/data/env/system/python/Python-2.5.1/lib
42 * * * * ( valmon.py -s envmon rec rep mon ) > $CRONLOG_DIR/envmon.log 2>&1

This complication was later avoided by installing SQLite support for the system Python with yum install python-sqlite2; see simtab for notes on this.
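A script that has to run under both setups can pick whichever SQLite binding is present. The sketch below assumes that the python-sqlite2 package exposes the `pysqlite2.dbapi2` module, as pysqlite 2.x does; the helper name `open_db` is hypothetical.

```python
try:
    import sqlite3 as sqlite  # in the standard library from Python 2.5 onward
except ImportError:
    # older system python with the python-sqlite2 package installed
    from pysqlite2 import dbapi2 as sqlite


def open_db(path):
    """Open a connection using whichever binding was importable."""
    return sqlite.connect(path)
```

Both modules follow the same DB-API interface, so the rest of the code can use the `sqlite` name without caring which one was imported.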