
Introduction to Shell Scripting

Shell scripts are often used on Linux and Unix machines to automate tasks. For stand-alone or one-off operations it is not usually necessary to build production-quality code: who cares if the script fails when the file system is full, or if temporary files are left lying around? The problem is that this devil-may-care attitude pervades a lot of shell script designs, and the scripts can end up either in production code or delivering production code. The latter case is particularly common – deployment tools often call out to scripts to perform some operation during the deployment process.

Indeed, tools such as DeployHub can deliver the script to the target server and execute it automatically. Now, the script becomes a critical part of the production infrastructure. It is important to make sure that security is not compromised, that the script runs fast and consumes as few resources as possible, and that it can detect and report failures properly.

In this post, I’ll cover some common problems I’ve seen over the years and explain how to write scripts that run quickly and cleanly, detect and report errors properly, and leave as small a footprint as possible.

So this brings us to the first topic which is:

Use the Right Shell

Most UNIX and Linux distros come with a multitude of shells to pick from. At the very least you can expect to find the Korn shell, the ‘C’ shell and the original Bourne shell. Linux distributions have Korn, ‘C’, ash (the Almquist shell) and bash (the Bourne Again Shell). Mac has ksh, csh, bash and sh.

For most shell scripts, Korn and Bash are more or less the de facto standard these days. AIX, for example, has /bin/sh linked to its Korn shell (if you really want the Bourne shell you have to use /bin/bsh). The public domain Korn shell (as supplied with Linux) is more or less identical to the commercial versions, with a few minor differences. Best of all, Korn shell is mostly compatible with scripts written for the original Bourne shell.

For shell scripts on Linux and Mac, the Bash shell is close enough to the Korn shell to make little difference. There are a few implementation differences to do with pipelines and sub-shells, but you can generally work around those with some minor refactoring.
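The best-known difference is where the last element of a pipeline runs: ksh runs it in the current shell, while bash runs it in a subshell (we will come back to this later). Here is a minimal sketch of the kind of refactoring involved:

# Under ksh this sets "name" in the current shell; under bash the
# read runs in a subshell and the value is lost afterwards.
echo "some value" | read name

# A rewrite that behaves the same way in both shells:
name=$(echo "some value")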

If you are writing production code then Korn shell is almost certainly the shell to use. For simple scripts, there is nothing to separate Bourne, Korn, Bash or any similar shell. However, the ‘C’ shell is a total anachronism and should be avoided at all costs.

There are many reasons why the ‘C’ shell is not suitable for writing production shell scripts – it’s buggy, it doesn’t behave in an intuitive way and it does not share the same syntax as the other shells.

If you want to read the ins and outs of the C shell’s shortcomings, Tom Christiansen wrote a (now very famous) article entitled ‘Csh Programming Considered Harmful’. You can find it in a number of places on the web, but here’s a link that should work:

https://www-uxsup.csx.cam.ac.uk/misc/csh.html

The rest of this post contains example code. None of it will work with C shell. Be warned.

Use modern syntax

One powerful technique in shell scripting is to invoke a command, capture its standard output and substitute it into the command line. In early shells this was done by surrounding the command to be executed with grave accents (`). So, for example, to set a variable to the hostname of the local machine you would use:

Hostname=`hostname`

Korn, bash and a few others have replaced grave accents with a new syntax that identifies the command by surrounding it with
$(…)

This syntax is infinitely preferable. Quite aside from the difficulty of finding where the grave accent actually is on a modern keyboard, it’s also a lot clearer. You can also nest commands, which is extremely awkward with grave accents.

For example, suppose our shell script wants to check if our hostname entry is correct:

res=`ping `hostname``

In this case the shell cannot determine what it’s supposed to do. Are you trying to set res to the result of a command ping (with no parameters) followed by the word “hostname” followed by the result of a null command?

However, recode the same command with $(…) syntax and the meaning becomes clear:

res=$(ping $(hostname))

In all the examples that follow we will use this syntax.

Comment everything.

Well, now, that’s obvious, isn’t it? Well, yes, but you’d be surprised how few people bother to comment their shell scripts properly.

UNIX is extremely powerful but its critics always level the same accusation: that its very power makes it difficult to understand. Look at the following shell function. Its job is to return the mount point for a specified file (whether that file exists or not).

function FindMountPoint
{
        mps=$(df -k | awk 'NR>1 {print "|"$NF"|"}')
        t=$1            # File under test
        while [[ -n $t ]]
        do
                echo "$mps" | grep -q "|$t|"
                [[ $? -eq 0 ]] && break;
                t=${t%/*}
                [[ -z $t ]] && t="/"
        done
        echo $t
}

Okay, you can probably figure out how this works. Now compare the same code with suitable comments:
function FindMountPoint
{
        # Returns (to standard output) the mount point
        # for the specified file.
        #
        # Step 1 – derive a list of mount points.
        # ---------------------------------------
        #
        # mps will be set to |/mp1| |/mp2| etc..
        #
        # a) do a df to get the mount points
        # b) ignore the heading line.
        # c) print the mount point surrounded by |..|
        #
        #       a           b              c
        #     -----        ----        ----------
        mps=$(df -k | awk 'NR>1 {print "|"$NF"|"}')
        #
        # Step 2 – find the mount point
        # -----------------------------
        # Reduce the filename a directory at a time
        # until the mount point is found in the “mps”
        # variable.
        #
        # For example if the filename is /export/home/bin/test
        #
        # t=/export/home/bin/test
        # t=/export/home/bin
        # t=/export/home                found in $mps
        #
        t=$1            # File under test
        #
        # Loop until found
        #
        while [[ -n $t ]]
        do
                echo "$mps" | grep -q "|$t|"
                [[ $? -eq 0 ]] && break;        # Found!
                t=${t%/*}
                [[ -z $t ]] && t="/"
        done
        echo $t # Send result to standard out.
}

Much easier to understand, isn’t it? Don’t worry about comments hurting performance – the shell still has to read and parse them, but the impact is negligible. Of much more significance is the number of processes the script invokes, which we will cover later.
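As a quick aside, calling the function is just a matter of capturing its standard output with the $(…) syntax introduced earlier (the path is the illustrative one from the comments):

mp=$(FindMountPoint /export/home/bin/test)
echo "mount point is $mp"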

Use of Temporary Files

Let me make my position quite clear: temporary files are a force for evil in the world of shell scripts and should be avoided if at all possible. If you want justification for this stance, just consider the following:

  • If the script runs as root (from a deployment process for example) then the use of temporary files can be a major security risk.
  • Temporary files need to be removed whenever the script exits. That means you have to trap user exits such as CTRL-C. But what if the script is killed with SIGKILL?
  • Temporary files can fill the filesystem. Even worse, if they are created by a redirect from an application’s standard output then the application may not realize that the output has failed due to the filesystem being full.

Let’s look at these problems in detail:

Problem 1: What happens to a temporary file if the shell script exits unexpectedly?

This problem can happen if the shell script is interactive and the user halts it with CTRL-C. If this is not trapped then the script will exit and the temporary file will be left lying around.

The only solution to this is to write a “trap” function to capture user-generated signals such as CTRL-C and to remove the temporary files. Here’s an example:

function Tidyup
{
    rm -f $tempfile
    exit 1
}

trap Tidyup 1 2 3 15

However, if the script exits for any other reason then the tidy-up function is never invoked. A kill -9 on the script will kill it stone dead and leave the temporary files in existence.
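One refinement worth making: ksh and bash both let you trap the pseudo-signal 0 (EXIT), so the tidy-up also runs on any controlled exit, normal or otherwise. A SIGKILL still defeats it, but the window is much smaller. A minimal sketch:

function Tidyup
{
    rm -f $tempfile                  # idempotent, so safe to call twice
}

trap Tidyup 0                        # 0 = EXIT: any controlled exit
trap 'Tidyup; exit 1' 1 2 3 15       # HUP, INT, QUIT and TERM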

Problem 2: What happens if the filesystem fills up?

Okay, so you’re busy writing your temporary file when the filesystem hits 100% full. What happens now? Well, your script will fail, won’t it? Maybe, but perhaps not in the way you think. Some third-party applications do not check for error returns when they write to standard output.

Be honest, if you’ve written ‘C’ code do you check the return code from “printf” (or write or whatever) to see if it has been written correctly? Probably not – what can go wrong writing to the screen? Not a lot presumably, but if you’ve redirected standard out then output is not going to the screen – it’s going to a file. You’d be amazed how many commercial applications fall victim to this.

The net result is that a command such as third_party_command > /tmp/tempfile may not return a fail condition even if /tmp is full. You then have no way of knowing that the command failed but /tmp/tempfile does not contain the full output from third_party_command. What happens next depends on what your script does but it’s likely to be sub-optimal. We will discuss some workarounds for this later.

Problem 3: Beware of redirection attacks.

Most temporary files are created in /tmp and given a filename containing a $$ sequence. The shell replaces the $$ sequence with the current process id thus creating a unique filename.

However, if the script is run as root and you create the file implicitly like this:

echo "first line" > /tmp/mytempfile$$

then it is possible that the script could be used inadvertently to attack the UNIX system itself. The reason is that most shells – on parsing the redirection operator – create the output file without using the O_EXCL flag. If that flag were specified, the open would return an error if the file already existed. Without it, if the file does exist and is a link, the link will be followed to the existing file. Therein lies the danger.

For example, suppose some unscrupulous individual created a symbolic link in /tmp like this:

ln -s /some_vital_file /tmp/mytempfile12345

Now, what happens when the script runs as PID 12345? The script writes to a file that it thinks it is creating. What it actually does is overwrite /some_vital_file. Imagine if this file were the kernel boot image – it doesn’t bear thinking about.

By creating lots of symbolic links in /tmp, an unscrupulous user could use a badly written root script like this to cripple the entire machine. The only defence is to ensure that the file does not already exist before a redirect is made to it.
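Two ways of making that check safely are sketched below: the shell’s noclobber option makes a plain > redirection fail if the target already exists (typically implemented with the O_EXCL open described above), and where the mktemp utility is available it will create a uniquely named, owner-only file on your behalf.

# Option 1: refuse to write through anything that already exists.
set -o noclobber
echo "first line" > /tmp/mytempfile$$ || {
    echo "temporary file already exists - possible attack" >&2
    exit 1
}

# Option 2: let mktemp pick the name and create the file safely.
tempfile=$(mktemp /tmp/myscript.XXXXXX) || exit 1
echo "first line" >> $tempfile       # append: the file already exists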

Avoiding temporary files

Avoiding temporary files can be difficult but is not necessarily impossible. A lot of UNIX commands will read from standard input (or write to standard output) as well as working with named files. Using pipes to connect such commands together will normally give the desired result without recourse to temporary files.

What if you have two separate commands from which you want to merge and process the output? Let’s assume that we’re going to build some form of control file for the program process_file. The control file is built from some header lines, the actual body of the control file (which our script will generate), and some tail lines to finish the whole thing off.

A common way of building this sort of file is this:

echo "some header info" >  /tmp/tempfile.$$
process_body            >> /tmp/tempfile.$$
echo "some tailer info" >> /tmp/tempfile.$$
process_file /tmp/tempfile.$$
rm -f /tmp/tempfile.$$

However, this code is susceptible to all the problems outlined above.

If we rewrite the code as:

{
echo "some header info"
process_body
echo "some tailer info"
} | process_file

then this brackets all the relevant commands into a list and performs a single redirection of the list’s standard out into the process_file program. This avoids the need to build a temporary file from the various components of the desired input file.

What if process_file is an application that is incapable of taking its input from standard input? Surely then we have to use a temporary file?

Well, you can still avoid temporary files but it takes a bit more effort. Here’s what the code looks like. We’ll examine it line by line.

mknod /tmp/mypipe.$$ p # 1
if [ $? -ne 0 ]
then
    echo "Failed to create pipe" >&2
    exit 1
fi
chmod 600 /tmp/mypipe.$$ # 2
process_file /tmp/mypipe.$$ & # 3
(
   echo "some header info"
   process_body
   echo "some tailer info"
) > /tmp/mypipe.$$ # 4
wait $! # 5
rm -f /tmp/mypipe.$$ # 6

  1. First we create a named pipe. A named pipe is exactly the same as any other pipe except that we can create it explicitly and that it appears in the filesystem. (In other words you can see it with an ls). Now strictly speaking this is a temporary file. However, it is of zero length and therefore will not fill the filesystem. Also, if it cannot be created for any reason (including the file already existing) there is an error return. Therefore redirect attacks are useless. Of course, it is left around by an untrappable kill but we can’t have everything.
  2. We change the access mode so only the user running the script can read or write to it. If we want the pipe to be created with the correct permissions then we can invoke umask 066 before we call mknod.
  3. We set our process_file program running, reading its input from this named pipe. Now, since there is nothing on the pipe (yet) the program’s read call will block. Therefore process_file will hang awaiting input.
  4. We construct the control file for process_file as before except that this time we redirect it to our named pipe. At this point, process_file will unblock and start reading the data just as if it had come from a conventional file.
  5. The wait call will block until the specified child process has exited. The $! is a shell sequence meaning the process ID of the last background process. Since this is the PID of process_file our script will wait until process_file has been completed, just as if it had been invoked in the foreground.
  6. We remove the named pipe.

Korn shell provides another technique you can employ, which is to launch process_file with:

process_file |&

This syntax tells ksh to launch the process in background and connect a pipe to it. This pipe is handled internally by the shell itself. To write to the pipe (in other words to the standard input of process_file) you use
print -p "some data"

and to read from the pipe (to read data from process_file’s standard output) you use:
read -p variable

which then sets variable to the next line of process_file’s standard output.

Using this feature of the Korn shell you can avoid creating a named pipe. This is better, since an untrappable kill will not leave the pipe lying around. However, be careful about when you do the read -p call: if there is no data on the pipe, the call will block.
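Putting those pieces together, a minimal sketch of the coprocess approach under ksh might look like this (process_file and process_body again stand in for whichever programs you are driving):

process_file |&                  # launch process_file as a coprocess

print -p "some header info"      # write to its standard input
print -p "$(process_body)"
print -p "some tailer info"

read -p first_result             # read one line of its output; this
                                 # blocks until something is written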

If you have no choice but to use temporary files then use the following techniques:

  • Do not implicitly create the file with a single redirection operator. Instead, ensure the filename does not already exist (with a -f test operation), then set your umask appropriately so that only the owner of the file can access it. Better still, set up a function to generate a unique filename, create it, and set the access permissions (a sketch of such a function follows this list).
  • Perform sanity checking on the temporary file to ensure that it has been successfully written. Remember that checking $? may not be adequate since the application may not be checking error returns from writes to standard output. Try the following technique instead:
    process_file | tee $output_file > /dev/null
    if [ $? -ne 0 ]
    then
    …
    fi

    since this will make tee responsible for writing to the filesystem and it should flag errors properly. Note that the $? operator will be checking the exit status of tee and not process_file. If you want to know if process_file has worked correctly use the list operator described previously:
    { process_file; res=$?; } | tee $output_file > /dev/null

    When this has completed, res will be the result of the process_file command and $? will be the result of the tee. Note if you use round brackets instead of curly then the commands will be executed in a subshell and you will not be able to access the value of res.
  • Create tidy-up functions to remove the temporary file(s) if the user aborts the script. Call the same functions on a controlled exit (either normal or error) from the script.
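Here is a sketch of the ‘generate a unique filename’ function suggested in the first bullet above (the function name is illustrative, not a standard utility):

function MakeTempFile
{
    # Create a uniquely named temporary file that only the owner
    # can read or write, and echo its name to standard output.
    typeset name=/tmp/${0##*/}.$$.$RANDOM
    [[ -f $name ]] && return 1       # refuse to reuse an existing name
    umask 077                        # owner-only permissions on creation
    set -o noclobber                 # redirection fails if the file appears
    : > $name || return 1
    echo $name
}

# Because the function is called inside $(...), the umask and noclobber
# settings do not leak into the rest of the script.
tempfile=$(MakeTempFile) || exit 1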

Use as few processes as possible.

This would seem to be obvious but it is surprising how many production scripts use more processes than they need to perform their chosen function.

Each program called in a pipeline requires a fork/exec cycle. This can be very expensive in terms of CPU and disk activity. Each process loaded into core reduces the amount of physical memory available for other processes and makes swapping more likely. Reducing the number of commands in a pipeline can aid performance not only of the script itself but of other processes running on the same machine.

For example, this piece of code is often seen – far too often, to be honest:

cat $input_file | grep '^mypattern' | awk '{print $2}'

Although this works it is not particularly efficient. This code will cause the shell to launch 3 child processes – a “cat”, a “grep” and an “awk”. The following code does exactly the same thing but only involves a single child process:
awk '/^mypattern/ {print $2}' $input_file

This uses awk’s pattern matching mechanism to replace the grep and lets awk read the input file directly so avoiding a call to cat.

Let’s take another example. We have a control file that we are going to process. Assume that the control file looks like this:

INPUT:file_a
OUTPUT:file_b
ERRORS:file_c

Now we want our script to parse this control file and set up variables for each of the input, output and error files indicated therein. What follows is an actual example of code I’ve seen running in a production system:
ipfile=$(cat $control_file | grep '^INPUT' | cut -d: -f2)
opfile=$(cat $control_file | grep '^OUTPUT' | cut -d: -f2)
errfile=$(cat $control_file | grep '^ERRORS' | cut -d: -f2)

Okay, so it works. However, the file has been scanned three times and 12 processes have been invoked just to set up the three variables (three subshells, each of which then launches a further three processes).

However, the following code is much better.

awk '
BEGIN {FS=":"}
/^INPUT/ {ipfile=$2}
/^OUTPUT/ {opfile=$2}
/^ERRORS/ {errfile=$2}
END {print ipfile " " opfile " " errfile}' $control_file | \
read ipfile opfile errfile

In this case, a single “awk” script has parsed the file once and has output all three values at the end of its processing. A single shell “read” then copies the “awk” variables into the shell variables. This means that only one process (“awk”) is launched (since “read” is a shell built-in it does not need a separate process).

Note, this works fine under ksh but not in bash, due to the way bash handles pipelines (in bash the final read is done in a subshell, so the values of ipfile, opfile and errfile are lost). However, there is a technique you can use to avoid the call to read and get awk to set the variables directly, thus:

eval $(awk '
BEGIN {FS=":"}
/^INPUT/ {print "ipfile="$2}
/^OUTPUT/ {print "opfile="$2}
/^ERRORS/ {print "errfile="$2}' $control_file)

This is slightly more obscure, however, since we are using “awk” to generate shell commands which are then interpreted by the “eval”.  Such code, although ingenious, requires liberal commenting if anybody else is to understand it!

Use Shell Built-in Commands.

For performance reasons it is often desirable to use shell built-ins rather than external programs. For example, in the function FindMountPoint shown earlier we used the Korn shell’s pattern-removal parameter expansion rather than the external program dirname.

Thus:

t=${t%/*}

is functionally identical to:
t=$(dirname $t)

but is a lot quicker. If this were in a loop (as it was in FindMountPoint) then a noticeable improvement can be made to the execution time of the shell script.

Similarly, one often sees loop counters implemented like this:

k=`expr $k + 1`

in old shell scripts. This is such a common technique that it tends to appear even in Korn shell scripts. But if you use a shell built-in:
k=$((k+1))

then it avoids a fork/exec cycle. This can have a dramatic difference in performance if the code appears in a loop.

For example, consider this bit of script which counts down from 1000 to 1:

#!/bin/bash
k=1000
while [ $k -ge 1 ]
do
k=$(expr $k - 1)
done

Let’s run this on a top-of-the-range MacBook Pro with 16GB of RAM:
$ time /tmp/t

real  0m2.493s
user  0m1.022s
sys  0m0.853s

So that took 2.493 seconds to execute. Let’s modify the code to remove the call to expr and replace it with a shell built-in:
#!/bin/bash
k=1000
while [ $k -ge 1 ]
do
k=$((k-1))
done

Executing this on the same box:
$ time /tmp/t

real  0m0.015s
user  0m0.010s
sys  0m0.003s

Using the built-in makes the code execute in 0.015 seconds: that’s 166 times quicker.

Security Considerations

Be extra careful when writing shell scripts that run as root – all sorts of problems can occur.

Don’t trust the environment.

Here is an example of a piece of code that I’ve seen running in a production environment. A shell script ran periodically, checking for the change from British Summer Time to Greenwich Mean Time (and vice versa). In order to change the system clock it had to run as root. The script was invoked through a batch scheduler, which in effect made it set-UID root, since any user could ask the scheduler to invoke it.

Now, the first line of the shell script sources another script in order to set up environment variables and the like:

. $ENV_SCRIPTS/setup.ksh

The shell script inherits the environment variable ENV_SCRIPTS from the invoking shell. So all a malicious user has to do is:

  • Set ENV_SCRIPTS to point to a local directory
  • Write a malicious script called setup.ksh and put it in that local directory
  • Invoke the calling script through the batch scheduler so that it runs as root.

 

When this is done the malicious setup.ksh will run as root.
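A few defensive lines at the top of any root script close off this kind of attack. Here is a sketch of the sort of thing to do (the fixed path is illustrative):

# Do not trust anything inherited from the caller.
PATH=/usr/bin:/bin:/usr/sbin:/sbin
export PATH
unset IFS

# Source the set-up code from a fixed, root-owned location instead of
# a directory named by an inherited environment variable.
. /opt/scripts/setup.ksh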

Wrapping Up:

Shell scripts in Unix and Linux are powerful development tools, but when they run in production environments they can expose systems to security and performance risks. The techniques outlined above will help mitigate those risks and allow you to write powerful, fast shell scripts.

Learn more about DeployHub’s microservice catalog to deliver microservices at scale.