Friday, October 26, 2007

Keeping Mongrel Alive when it Keeps Crashing

For some time, I have had a problem with a Rails deployment of mine. It runs Apache in front of several Mongrel instances. The problem is that the Mongrel processes simply disappear without a trace. Their pid files are still lying around, and the log shows nothing. There appears to be no pattern to it: there can be weeks between crashes or just days, and it can happen at night or during the day. Sucks.

Today, I sat down and wrote a simple bash script that performs some basic surveillance and restarts the cluster if Mongrel processes disappear. It would be nicer if they did not crash in the first place, I know.

If you have the same problem, here is my quick and simple script. It even notifies you by email when a restart has been performed. Of course, comments and improvements are more than welcome.
#!/bin/bash

SLEEP_INTERVAL=60
PID_DIR=/var/opt/rails/yourapp/log
CLUSTER_CTL_CMD=/etc/init.d/mongrel_cluster_ctl
NOTIFY_EMAIL=youremail@goes.here

# Check the effective uid rather than $USER, which may be unset (e.g. at boot)
if [ "$(id -u)" -ne 0 ]; then
    echo "I need to be run as root. From here on, I will quickly grow cold."
    exit 1
fi

# Act a bit like a daemon by ignoring HUP, and be nice by
# moving to / in case someone needs to unmount the directory
# you were started from.
# (A bare "trap 1" would reset HUP to its default; the empty string ignores it.)
trap '' HUP
cd /

function restart_mongrels() {
    echo "stopping..."
    $CLUSTER_CTL_CMD stop
    echo "stopped, will sleep 10 secs"
    sleep 10
    echo "done sleeping, will remove any old pid files and start mongrels"
    rm -f "$PID_DIR"/mongrel.*.pid
    $CLUSTER_CTL_CMD start
    echo "issued start, will wait 10 secs more, to let it fire up properly"
    sleep 10
    # The here-document body and its terminator must stay at column 0
    mail -s "Mongrels restarted" "$NOTIFY_EMAIL" <<ENDMAIL
At $(date) your mongrel instances were restarted.
Have a blast digging around in the logfiles trying to find out why...
.
ENDMAIL
}

while true; do
    # start by checking that there are ANY pid files around
    ls "$PID_DIR"/mongrel.*.pid > /dev/null 2>&1
    ANY_PID_FILES=$?
    if [ $ANY_PID_FILES -ne 0 ]; then
        echo "$(date): oops, found no pid files at all in $PID_DIR, going for a restart"
        restart_mongrels
    fi

    # if there are pid files, check that their processes are running
    for pidfile in "$PID_DIR"/mongrel.*.pid ; do
        # check that the pid file is still here (a restart deeper in this for-loop removes them all)
        if [ ! -f "$pidfile" ]; then
            echo "skipping missing pid file: $pidfile (can happen after a restart)"
            continue
        fi

        PID=$(cat "$pidfile")
        ps -p "$PID" > /dev/null
        PID_CHECK_RESULT=$?
        if [ $PID_CHECK_RESULT -ne 0 ]; then
            echo "$(date): Oops, did not find process for pid $PID in pid file $pidfile, will restart mongrels"
            restart_mongrels
        fi
    done
    echo "$(date): Checked that all mongrel instances are up and running, will sleep for $SLEEP_INTERVAL seconds and check again...Zzzz"
    sleep $SLEEP_INTERVAL
done

6 comments:

Shadowfiend said...

Wouldn't it make more sense to make this a script without a loop and invoke it as a cron job rather than running it in a sleeping loop?

Tech Per said...

Yes, that makes good sense and would be a nice simplification of the script. Thanks for the idea.
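
A minimal sketch of that cron-driven variant: a one-shot script that checks the pid files once and restarts the cluster only if something is missing. The function name `all_mongrels_alive` and the `--run` switch are made up for illustration, the paths are the same placeholders as in the script in the post, and `kill -0` stands in for `ps -p` as the liveness probe (it needs permission to signal the process, so it assumes the script runs as root).

```shell
#!/bin/bash
# One-shot check meant to be started by cron instead of looping forever.
# PID_DIR and CLUSTER_CTL_CMD are placeholders; adjust them to your setup.
PID_DIR=${PID_DIR:-/var/opt/rails/yourapp/log}
CLUSTER_CTL_CMD=${CLUSTER_CTL_CMD:-/etc/init.d/mongrel_cluster_ctl}

# Succeeds only if at least one pid file exists in the given directory
# and every pid file there names a process that is currently running.
all_mongrels_alive() {
    local dir=$1 pidfile pid found=1
    for pidfile in "$dir"/mongrel.*.pid; do
        [ -f "$pidfile" ] || continue          # glob matched nothing
        found=0
        pid=$(cat "$pidfile")
        # kill -0 sends no signal; it only tests that the process exists
        kill -0 "$pid" 2>/dev/null || return 1
    done
    return $found
}

# Only act when called with --run (a hypothetical switch), so the file
# can be sourced without side effects.
if [ "${1:-}" = "--run" ] && ! all_mongrels_alive "$PID_DIR"; then
    $CLUSTER_CTL_CMD stop
    sleep 10
    rm -f "$PID_DIR"/mongrel.*.pid
    $CLUSTER_CTL_CMD start
fi
```

Saved as, say, /usr/local/bin/check_mongrels (a hypothetical path), a line like `* * * * * /usr/local/bin/check_mongrels --run` in root's crontab would run the check once a minute. The email notification from the original script is left out here to keep the sketch short.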

Tech Per said...

BTW: Just checked out rubyshell, as I noticed it on your blog. Nice idea. I have not tried it yet though, as I am currently hurting on windoze.

Shadowfiend said...

Well, it's not quite there yet :-P The latest release, 0.5, actually shows its alphaness, as several annoying bugs snuck in. I'm probably going to make an intermediate 0.5.5 release soon that'll just fix those bugs and a couple of others, as the creation of new features is a bit behind thanks to a busy schedule.

I wanted to support Windows, actually, but it quickly became apparent that Windows's way of providing access to certain things is just absolutely horrid (a console that blocks completely while you're typing was a problem, since initially pipes were spun off into a separate thread). Now that we switched to readline and a few other changes were made, that might no longer be the case, though... I'll have to have a look-see.

Tech Per said...

It sure would be nice to have another option for a better shell on Windows.

The super-lousy cmd.exe just barely qualifies as a shell.

I thought Monad, now named PowerShell or some crap, would have been nice. It has interesting ideas, and I like the binding to .NET for extensibility. But they delivered it in a crappy black box that looks and acts so much like cmd.exe.

Arun Gupta said...

Another recommended option is to deploy Rails apps on GlassFish, either as a Web ARchive (WAR) or using the native GlassFish v3 gem. Read more about the gem at:

http://blogs.sun.com/arungupta/entry/announcing_glassfish_gem_for_rails and
http://headius.blogspot.com/2007/09/end-is-near-for-mongrel.html

Or watch a screencast that shows how WAR can be deployed on GlassFish at:

http://blogs.sun.com/arungupta/entry/screencast_web_9_jruby_on