Debugging Monit

Notes on debugging Monit start program and stop program commands.

At GoFreeRange I recently spent some time debugging a couple of Monit start program and stop program commands, so I thought I'd share some notes I made in case they're of use to anyone else.

Although we mainly deploy to Ubuntu, I wanted to be able to debug the commands on my local OS X machine. I found I could run Monit in non-daemon mode with verbose logging enabled, but it doesn't show the stdout/stderr generated when it runs your start program or stop program commands. So it seems the best you can do is to create an environment that mimics the one Monit provides and try running the commands from there.

According to the documentation, the Monit process runs as superuser and has only a very limited set of environment variables. Notably, its PATH is set to just /bin:/usr/bin:/sbin:/usr/sbin.
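To get a feel for how sparse that environment is, it can help to print what a command sees when it is started with nothing but that PATH. This is only a sketch: MONIT_PATH and monit_like_env are illustrative names, and it assumes env lives at /usr/bin/env :-

```shell
# Monit's documented default PATH; MONIT_PATH is just a local name.
MONIT_PATH=/bin:/usr/bin:/sbin:/usr/sbin

# Print the environment a command inherits when every variable except
# PATH has been cleared (env -i discards the inherited environment).
monit_like_env() {
  env -i PATH="$MONIT_PATH" /usr/bin/env
}

monit_like_env
```

Anything your command relies on beyond PATH (HOME, RAILS_ENV, and so on) has to be set explicitly in the start program or stop program line.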

In our automated provisioning and deployment solution, we use a Brightbox Ruby Enterprise Edition package which replaces the existing system Ruby, so the default Monit PATH is enough to find the Ruby binaries.

We use Bundler, so we need to have the bundler gem installed as a system gem in the default Ruby. We use bundle install with the --path option to install project gems into a directory under the Capistrano shared directory. This means that when the environment loads, Bundler finds the project gems based on the BUNDLE_PATH specified in the project's .bundle/config. The advantage is that Bundler takes care of locating the gem binaries, so again we can manage with just the default Monit PATH.
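Since Bundler locates the project gems itself, the only executables that have to resolve on Monit's PATH are the interpreter and the bundle command. A rough way to confirm that, assuming a hypothetical helper called check_on_monit_path (it is not part of Monit or Bundler) :-

```shell
MONIT_PATH=/bin:/usr/bin:/sbin:/usr/sbin

# Report where each named command resolves when only Monit's PATH is
# available, or flag it as missing. Returns non-zero if any is missing.
check_on_monit_path() {
  rc=0
  for cmd in "$@"; do
    # command -v prints the resolved path and fails if nothing is found.
    if location=$(env -i PATH="$MONIT_PATH" /bin/sh -c 'command -v "$1"' sh "$cmd"); then
      echo "$cmd: $location"
    else
      echo "$cmd: NOT FOUND on $MONIT_PATH"
      rc=1
    fi
  done
  return $rc
}

# Example: check_on_monit_path ruby gem bundle
```

If any of them is reported as not found, the start program command will need an absolute path or an explicit PATH of its own.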

Next I created a monitrc containing the start program and stop program commands and put it in my local project root. Note that we run the commands as a specified user and group (run_as_username & run_as_group) :-

check process myprocess
  with pidfile /Users/myusername/myproject/log/myprocess.pid
  start program = "/usr/bin/env RAILS_ENV=development /bin/sh -c \
    'cd /Users/myusername/myproject/ && script/daemon script/myscript start'" \
    as uid run_as_username and gid run_as_group
  stop program = "/usr/bin/env RAILS_ENV=development /bin/sh -c \
    'cd /Users/myusername/myproject/ && script/daemon script/myscript stop'" \
    as uid run_as_username and gid run_as_group

Since Monit is run as superuser, I found I needed to change the ownership of this monitrc file so the Monit process could read it :-

$ sudo chown root:wheel /Users/myusername/myproject/monitrc
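
Ownership is not the only check: Monit also refuses a control file whose permissions are looser than 0700, so it is worth looking at the mode as well. A small sketch, assuming a hypothetical helper named monitrc_mode_ok that reads the mode string printed by ls :-

```shell
# Succeeds only if the file grants nothing to group or other.
# Characters 5-10 of the ls mode string (e.g. "-rw-------") hold the
# group and other permission bits.
monitrc_mode_ok() {
  mode=$(ls -ld "$1" | cut -c5-10)
  [ "$mode" = "------" ]
}

# Example, using the monitrc path from above:
# monitrc_mode_ok /Users/myusername/myproject/monitrc || \
#   sudo chmod 600 /Users/myusername/myproject/monitrc
```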

We try to use a standard daemon mechanism for all our background processes and so we put all the pid files in the log directory under the Capistrano shared directory (the project log directory in development). I found I needed to change the ownership of my local log directory to match run_as_username & run_as_group so that the daemon can write and delete the pidfile :-

$ sudo chown run_as_username:run_as_group /Users/myusername/myproject/log

At this point I was in a position to run the start program command either directly in a shell environment mimicking the Monit environment or by running Monit itself. I achieved the former as follows :-

$ sudo su
$ env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh
$ su run_as_username
$ /usr/bin/env RAILS_ENV=development /bin/sh -c \
  'cd /Users/myusername/myproject/ && script/daemon script/myscript start'
Running daemon command 'start' for 'script/myscript' with app name 'myscript'
$ /usr/bin/env RAILS_ENV=development /bin/sh -c \
  'cd /Users/myusername/myproject/ && script/daemon script/myscript stop'
Running daemon command 'stop' for 'script/myscript' with app name 'myscript'
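
When repeating this many times, the same idea can be wrapped in a function that keeps the output in a file. This is a sketch under assumptions: run_like_monit is a hypothetical name, and it does not switch user, so it needs to be called from a shell already running as run_as_username :-

```shell
MONIT_PATH=/bin:/usr/bin:/sbin:/usr/sbin

# Run a command string with a cleared environment and Monit's PATH,
# saving stdout and stderr to a log file and passing on the exit status.
run_like_monit() {
  command_string=$1
  log_file=$2
  env -i PATH="$MONIT_PATH" /bin/sh -c "$command_string" >"$log_file" 2>&1
  rc=$?
  echo "exit status: $rc (output in $log_file)"
  return $rc
}

# Example:
# run_like_monit "RAILS_ENV=development script/daemon script/myscript start" /tmp/start.log
```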

This meant I could see stdout/stderr and diagnose any problems. Note that in my case I had to create a new user and group for run_as_username & run_as_group. Once this was working, I was able to run Monit as follows :-

$ sudo su
$ env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/sh
$ /usr/local/bin/monit -c /Users/myusername/myproject/monitrc -v

You should see Monit trying to start myprocess and checking for the existence of the myprocess.pid file. If the process doesn't start, Monit will eventually time out.
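
While waiting, the pidfile can also be checked by hand from another terminal. A minimal sketch, assuming a hypothetical helper named pidfile_alive; note that kill -0 only succeeds if you are allowed to signal the process, so run it as root or as run_as_username :-

```shell
# Succeeds if the pidfile exists, is non-empty, and names a live process.
# kill -0 sends no signal; it only tests whether the process can be
# signalled by the current user.
pidfile_alive() {
  pidfile=$1
  if [ ! -s "$pidfile" ]; then
    echo "no pidfile at $pidfile"
    return 1
  fi
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is running"
  else
    echo "stale pidfile: process $pid is not running"
    return 1
  fi
}

# Example:
# pidfile_alive /Users/myusername/myproject/log/myprocess.pid
```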