Things I learned from bashing my head against a wall

Avishai Ish-Shalom (@nukemberg)

A long long time ago...

I needed to write an init script for a JVM service.

This is how it looks in Bash

#!/bin/sh
#
# /etc/init.d/tomcat6 -- startup script for the Tomcat 6 servlet engine
#
# Written by Miquel van Smoorenburg <miquels@cistron.nl>.
# Modified for Debian GNU/Linux	by Ian Murdock <imurdock@gnu.ai.mit.edu>.
# Modified for Tomcat by Stefan Gybas <sgybas@debian.org>.
# Modified for Tomcat6 by Thierry Carrez <thierry.carrez@ubuntu.com>.
# Additional improvements by Jason Brittain <jason.brittain@mulesoft.com>.
#
### BEGIN INIT INFO
# Provides:          tomcat6
# Required-Start:    $local_fs $remote_fs $network
# Required-Stop:     $local_fs $remote_fs $network
# Should-Start:      $named
# Should-Stop:       $named
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start Tomcat.
# Description:       Start the Tomcat servlet engine.
### END INIT INFO

set -e

PATH=/bin:/usr/bin:/sbin:/usr/sbin
NAME="$(basename "$0" | sed 's/^[KS][0-9]\{2\}//')"
DESC="Tomcat servlet engine"
DEFAULT=/etc/default/$NAME
JVM_TMP=/tmp/tomcat6-$NAME-tmp

if [ `id -u` -ne 0 ]; then
	echo "You need root privileges to run this script"
	exit 1
fi
 
# Make sure tomcat is started with system locale
if [ -r /etc/default/locale ]; then
	. /etc/default/locale
	export LANG
fi

. /lib/lsb/init-functions

if [ -r /etc/default/rcS ]; then
	. /etc/default/rcS
fi

# The following variables can be overwritten in $DEFAULT

# Run Tomcat 6 as this user ID and group ID
TOMCAT6_USER=tomcat6
TOMCAT6_GROUP=tomcat6

# this is a work-around until there is a suitable runtime replacement 
# for dpkg-architecture for arch:all packages
# this function sets the variable OPENJDKS
find_openjdks()
{
        for jvmdir in /usr/lib/jvm/java-7-openjdk-*
        do
                if [ -d "${jvmdir}" -a "${jvmdir}" != "/usr/lib/jvm/java-7-openjdk-common" ]
                then
                        OPENJDKS=$jvmdir
                fi
        done
        for jvmdir in /usr/lib/jvm/java-6-openjdk-*
        do
                if [ -d "${jvmdir}" -a "${jvmdir}" != "/usr/lib/jvm/java-6-openjdk-common" ]
                then
                        OPENJDKS="${OPENJDKS} ${jvmdir}"
                fi
        done
}

# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not
# defined in $DEFAULT)

OPENJDKS=""
find_openjdks
JDK_DIRS="/usr/lib/jvm/default-java ${OPENJDKS} /usr/lib/jvm/java-6-sun /usr/lib/jvm/java-1.5.0-sun /usr/lib/j2sdk1.5-sun /usr/lib/j2sdk1.5-ibm"

# Look for the right JVM to use
for jdir in $JDK_DIRS; do
    if [ -r "$jdir/bin/java" -a -z "${JAVA_HOME}" ]; then
	JAVA_HOME="$jdir"
    fi
done
export JAVA_HOME

# Directory where the Tomcat 6 binary distribution resides
CATALINA_HOME=/usr/share/tomcat6

# Directory for per-instance configuration files and webapps
CATALINA_BASE=/var/lib/$NAME

# Use the Java security manager? (yes/no)
TOMCAT6_SECURITY=no

# Default Java options
# Set java.awt.headless=true if JAVA_OPTS is not set so the
# Xalan XSL transformer can work without X11 display on JDK 1.4+
# It also looks like the default heap size of 64M is not enough for most cases
# so the maximum heap size is set to 128M
if [ -z "$JAVA_OPTS" ]; then
	JAVA_OPTS="-Djava.awt.headless=true -Xmx128M"
fi

# End of variables that can be overwritten in $DEFAULT

# overwrite settings from default file
if [ -f "$DEFAULT" ]; then
	. "$DEFAULT"
fi

if [ ! -f "$CATALINA_HOME/bin/bootstrap.jar" ]; then
	log_failure_msg "$NAME is not installed"
	exit 1
fi

POLICY_CACHE="$CATALINA_BASE/work/catalina.policy"

if [ -z "$CATALINA_TMPDIR" ]; then
	CATALINA_TMPDIR="$JVM_TMP"
fi

# Set the JSP compiler if set in the tomcat6.default file
if [ -n "$JSP_COMPILER" ]; then
	JAVA_OPTS="$JAVA_OPTS -Dbuild.compiler=\"$JSP_COMPILER\""
fi

SECURITY=""
if [ "$TOMCAT6_SECURITY" = "yes" ]; then
	SECURITY="-security"
fi

# Define other required variables
CATALINA_PID="/var/run/$NAME.pid"
CATALINA_SH="$CATALINA_HOME/bin/catalina.sh"

# Look for Java Secure Sockets Extension (JSSE) JARs
if [ -z "${JSSE_HOME}" -a -r "${JAVA_HOME}/jre/lib/jsse.jar" ]; then
    JSSE_HOME="${JAVA_HOME}/jre/"
fi

catalina_sh() {
	# Escape any double quotes in the value of JAVA_OPTS
	JAVA_OPTS="$(echo $JAVA_OPTS | sed 's/\"/\\\"/g')"

	AUTHBIND_COMMAND=""
	if [ "$AUTHBIND" = "yes" -a "$1" = "start" ]; then
		JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
		AUTHBIND_COMMAND="/usr/bin/authbind --deep /bin/bash -c "
	fi

	# Define the command to run Tomcat's catalina.sh as a daemon
	# set -a tells sh to export assigned variables to spawned shells.
	TOMCAT_SH="set -a; JAVA_HOME=\"$JAVA_HOME\"; source \"$DEFAULT\"; \
		CATALINA_HOME=\"$CATALINA_HOME\"; \
		CATALINA_BASE=\"$CATALINA_BASE\"; \
		JAVA_OPTS=\"$JAVA_OPTS\"; \
		CATALINA_PID=\"$CATALINA_PID\"; \
		CATALINA_TMPDIR=\"$CATALINA_TMPDIR\"; \
		LANG=\"$LANG\"; JSSE_HOME=\"$JSSE_HOME\"; \
		cd \"$CATALINA_BASE\"; \
		\"$CATALINA_SH\" $@"

	if [ "$AUTHBIND" = "yes" -a "$1" = "start" ]; then
		TOMCAT_SH="'$TOMCAT_SH'"
	fi

	# Run the catalina.sh script as a daemon
	set +e
	touch "$CATALINA_PID" "$CATALINA_BASE"/logs/catalina.out
	chown $TOMCAT6_USER "$CATALINA_PID" "$CATALINA_BASE"/logs/catalina.out
	start-stop-daemon --start -b -u "$TOMCAT6_USER" -g "$TOMCAT6_GROUP" \
		-c "$TOMCAT6_USER" -d "$CATALINA_TMPDIR" -p "$CATALINA_PID" \
		-x /bin/bash -- -c "$AUTHBIND_COMMAND $TOMCAT_SH"
	status="$?"
	set +a -e
	return $status
}

case "$1" in
  start)
	if [ -z "$JAVA_HOME" ]; then
		log_failure_msg "no JDK found - please set JAVA_HOME"
		exit 1
	fi

	if [ ! -d "$CATALINA_BASE/conf" ]; then
		log_failure_msg "invalid CATALINA_BASE: $CATALINA_BASE"
		exit 1
	fi

	log_daemon_msg "Starting $DESC" "$NAME"
	if start-stop-daemon --test --start --pidfile "$CATALINA_PID" \
		--user $TOMCAT6_USER --exec "$JAVA_HOME/bin/java" \
		>/dev/null; then

		# Regenerate POLICY_CACHE file
		umask 022
		echo "// AUTO-GENERATED FILE from /etc/tomcat6/policy.d/" \
			> "$POLICY_CACHE"
		echo ""  >> "$POLICY_CACHE"
		cat $CATALINA_BASE/conf/policy.d/*.policy \
			>> "$POLICY_CACHE"

		# Remove / recreate JVM_TMP directory
		rm -rf "$JVM_TMP"
		mkdir -p "$JVM_TMP" || {
			log_failure_msg "could not create JVM temporary directory"
			exit 1
		}
		chown $TOMCAT6_USER "$JVM_TMP"

		catalina_sh start $SECURITY
		sleep 5
        	if start-stop-daemon --test --start --pidfile "$CATALINA_PID" \
			--user $TOMCAT6_USER --exec "$JAVA_HOME/bin/java" \
			>/dev/null; then
			if [ -f "$CATALINA_PID" ]; then
				rm -f "$CATALINA_PID"
			fi
			log_end_msg 1
		else
			log_end_msg 0
		fi
	else
	        log_progress_msg "(already running)"
		log_end_msg 0
	fi
	;;
  stop)
	log_daemon_msg "Stopping $DESC" "$NAME"

	set +e
	if [ -f "$CATALINA_PID" ]; then 
		start-stop-daemon --stop --pidfile "$CATALINA_PID" \
			--user "$TOMCAT6_USER" \
			--retry=TERM/20/KILL/5 >/dev/null
		if [ $? -eq 1 ]; then
			log_progress_msg "$DESC is not running but pid file exists, cleaning up"
		elif [ $? -eq 3 ]; then
			PID="`cat $CATALINA_PID`"
			log_failure_msg "Failed to stop $NAME (pid $PID)"
			exit 1
		fi
		rm -f "$CATALINA_PID"
		rm -rf "$JVM_TMP"
	else
		log_progress_msg "(not running)"
	fi
	log_end_msg 0
	set -e
	;;
   status)
	set +e
	start-stop-daemon --test --start --pidfile "$CATALINA_PID" \
		--user $TOMCAT6_USER --exec "$JAVA_HOME/bin/java" \
		>/dev/null 2>&1
	if [ "$?" = "0" ]; then

		if [ -f "$CATALINA_PID" ]; then
		    log_success_msg "$DESC is not running, but pid file exists."
			exit 1
		else
		    log_success_msg "$DESC is not running."
			exit 3
		fi
	else
		log_success_msg "$DESC is running with pid `cat $CATALINA_PID`"
	fi
	set -e
        ;;
  restart|force-reload)
	if [ -f "$CATALINA_PID" ]; then
		$0 stop
		sleep 1
	fi
	$0 start
	;;
  try-restart)
        if start-stop-daemon --test --start --pidfile "$CATALINA_PID" \
		--user $TOMCAT6_USER --exec "$JAVA_HOME/bin/java" \
		>/dev/null; then
		$0 start
	fi
        ;;
  *)
	log_success_msg "Usage: $0 {start|stop|restart|try-restart|force-reload|status}"
	exit 1
	;;
esac

exit 0

  • Not portable
  • Horribly long and complex
  • Doesn't actually daemonize
  • Reinvents the wheel

Just switch to runit?

  • If only I could
  • Not an option for packaging
  • Not everything is init.d

So I tried

Interlude

OMG it's everywhere

  • initramfs
  • network management
  • sysv/bsd init
  • about 14% of (my) /usr/bin

 

(systemd to the rescue)

Wait, what's wrong with bash?

EVERYTHING

Just kidding. Mostly.

Shell loves errors

  • set -e (!!!!????)
  • nobody checks $?, ${PIPESTATUS[x]}
  • where did my stderr go?
  • rm -rf $ROOT/share (set -u)
  • syntax (spaces, quoting)
  • text parsing
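
For contrast, a minimal sketch of how the same failure modes surface in Python. The failing command and the DEMO_ROOT variable are made up for illustration. A non-zero exit raises instead of being ignored, stderr is captured rather than lost, and an unset variable is an error rather than an empty string.

```python
import os
import subprocess
import sys

def run(cmd):
    """Run a command; raise CalledProcessError on non-zero exit, keeping stderr."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)

# A failing command cannot be ignored by accident.
failure = None
try:
    run([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"])
except subprocess.CalledProcessError as err:
    failure = (err.returncode, err.stderr)  # exit status and stderr both survive

# An unset variable raises KeyError instead of expanding to "" (think rm -rf /share).
os.environ.pop("DEMO_ROOT", None)  # hypothetical variable, unset for the demo
try:
    share = os.path.join(os.environ["DEMO_ROOT"], "share")
except KeyError:
    share = None
```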

Gimme an API!

Modularity

  • No modules
  • Subshells (yuck)
  • Functions - stdout, stderr, $?
  • Variable scope
  • byref? byval?

upvar() {
    if unset -v "$1"; then           # Unset & validate varname
        if (( $# == 2 )); then
            eval $1=\"\$2\"          # Return single value
        else
            eval $1=\(\"\${@:2}\"\)  # Return array
        fi
    fi
}

What can possibly go wrong with eval...
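
A rough Python counterpart, for comparison: a function hands back a scalar or a list with a plain return, by value, with no eval and no variable names passed around. The function name and inputs are invented for this sketch.

```python
def split_words(text):
    """Return a single string for one word, a list otherwise."""
    words = text.split()
    return words[0] if len(words) == 1 else words

single = split_words("tomcat6")               # one value comes back as a string
several = split_words("start stop restart")   # many values come back as a list
```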

Debugging, anyone?

We have

  • set -e -x
  • extdebug mode
  • RETURN, DEBUG traps

But...

  • Lots of set/shopt options, hidden state
  • Can't inspect properly
  • Hard to deal with processes/pipes

Actually, a properly written bash script solves most of these problems

I have never ever seen one.

Bash is (overly) complex

So, Python

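#!/usr/bin/env python3
# Generic init script: reads /etc/<service>/init.yaml and runs its command as a daemon

import os, pwd, resource, signal, sys
import yaml  # third-party: PyYAML

def getuid(user):
    if isinstance(user, int): return user
    return pwd.getpwnam(user).pw_uid

def close_fds():
    # getrlimit returns (soft, hard); the soft limit bounds the open descriptors
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    os.closerange(0, soft)

def daemonize(conf):
    pid = os.fork()
    if pid == 0:
        os.setsid()
        pid = os.fork()
        if pid == 0:
            if "user" in conf:
                uid = getuid(conf["user"])
                os.setreuid(uid, uid)
            os.chdir(conf.get('chdir', '/'))
            os.umask(0)
            close_fds()
            # With everything closed, /dev/null lands on fd 0; copy it to 1 and 2
            fd = os.open(os.devnull, os.O_RDWR)
            os.set_inheritable(fd, True)  # new descriptors are close-on-exec by default
            os.dup2(fd, 1)
            os.dup2(fd, 2)
            write_pidfile(conf['pidfile'], os.getpid())
            env = {k: str(v) for k, v in conf.get('environment', {}).items()}
            command = conf['command']
            if isinstance(command, str):
                command = command.split()
            # argv must include the program name as its first element
            os.execvpe(command[0], command, env)
        else:
            os._exit(0)  # intermediate child: exit without running cleanup handlers
    else:
        os.waitpid(pid, 0)  # reap the intermediate child so it does not linger as a zombie

def write_pidfile(pidfile, pid):
    with open(pidfile, 'w') as f:
        f.write(str(pid))

def read_pidfile(pidfile):
    # Returns the recorded pid, or None if the pidfile is missing or garbled
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def status(conf):
    # True if the process named in the pidfile is alive
    pid = read_pidfile(conf['pidfile'])
    return pid is not None and os.path.exists("/proc/%d" % pid)

def start(conf):
    daemonize(conf)

def stop(conf):
    pid = read_pidfile(conf['pidfile'])
    if pid is not None and str(pid) in os.listdir("/proc"):
        # Assumption: SIGTERM is enough to stop the service
        os.kill(pid, signal.SIGTERM)

def restart(conf):
    if status(conf): stop(conf)
    start(conf)

def usage():
    print("%s [start|stop|restart|status]" % sys.argv[0])

if __name__ == '__main__':
    svc_name = os.path.basename(sys.argv[0])
    with open("/etc/%s/init.yaml" % svc_name, 'r') as f:
        conf = yaml.safe_load(f)

    command = sys.argv[1] if len(sys.argv) > 1 else None
    if command == "start":
        start(conf)
    elif command == "status":
        running = status(conf)
        print("running" if running else "not running")
        sys.exit(0 if running else 3)  # 3 = not running, as in the init script above
    elif command == "stop":
        stop(conf)
    elif command == "restart":
        restart(conf)
    else:
        print("Unknown command.", file=sys.stderr)
        usage()
        sys.exit(1)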

Sooo much better

  • Cross platform (posix)
  • Generic
  • Simple
  • Debuggable
  • Testable
  • Shorter, more readable
  • Dependencies
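
To illustrate the "Testable" point, a small sketch: pidfile helpers exercised with the standard unittest module and a temporary directory. The helper names follow the script above, but they are redefined here (as assumptions about how they should behave) so the example runs by itself.

```python
import os
import tempfile
import unittest

def write_pidfile(pidfile, pid):
    with open(pidfile, "w") as f:
        f.write(str(pid))

def read_pidfile(pidfile):
    """Return the pid stored in pidfile, or None if it is missing or garbled."""
    try:
        with open(pidfile) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

class PidfileTest(unittest.TestCase):
    def test_round_trip(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "svc.pid")
            write_pidfile(path, 1234)
            self.assertEqual(read_pidfile(path), 1234)

    def test_missing_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            self.assertIsNone(read_pidfile(os.path.join(tmp, "absent.pid")))

# Run the two tests explicitly and keep the result object for inspection
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PidfileTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

No root, no real service, no /var/run: the logic is checked in isolation.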

So if it's good for init scripts, why not elsewhere?

I started writing EVERYTHING in Python

  • Git hooks
  • Init scripts
  • Maintenance scripts
  • Cron jobs

And many more

What have I learned?

Strong cultural resistance

"Scripts" not viewed as "code"

  • Define "script"
  • Results in fragile scripts
  • Unmaintainable scripts
  • VCS, tests, docs, libraries
  • Teaches engineers bad habits

Scripts should have minimal deps

  • Python included in most distros
  • Simple scripts
  • pex
  • python script.zip
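
The last bullet, sketched with the standard-library zipapp module: a directory containing a __main__.py is packed into one file that `python script.zip` can run. The file names and the script body are placeholders.

```python
import os
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.mkdir(src)
    # __main__.py is the entry point the interpreter looks for inside the archive
    with open(os.path.join(src, "__main__.py"), "w") as f:
        f.write('print("hello from a zipped script")\n')

    archive = os.path.join(tmp, "script.zip")
    zipapp.create_archive(src, archive)

    # Equivalent to running: python script.zip
    output = subprocess.run([sys.executable, archive], check=True,
                            capture_output=True, text=True).stdout
```

Pure-Python dependencies can be copied into the source directory before packing, so the target machine needs only the interpreter.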

Environment changed

  • A lot of APIs
  • How many curl calls in your scripts?
  • Serialized data (e.g. json)
  • Binary data (from files mostly)
  • Parallelization (it's a thing)
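
A sketch of why this matters: parsing serialized data and fanning work out in parallel take a few standard-library lines in Python. The payload and the describe function are stand-ins. A real script would call an API here.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an API response body
payload = '{"hosts": [{"name": "web1", "port": 8080}, {"name": "db1", "port": 5432}]}'
hosts = json.loads(payload)["hosts"]

def describe(host):
    """Placeholder for per-host work such as an HTTP health check."""
    return "%s:%d" % (host["name"], host["port"])

# map() runs the calls concurrently and yields results in input order
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(describe, hosts))
```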

Not a good fit everywhere

  • Embedded
  • Initramfs
  • Existing codebase
  • Team skills

Epilogue

  • I still write bash
  • But only small, ad-hoc stuff
  • Heavy use of functions
  • Started factoring out to libraries
  • Now view bash scripts as "code"

Alternative:

Lua
