apache


I want to enumerate a variety of ways to configure apache (v2.2.9) with/without modperl (v2.0.4) for passing requests to perl code. I want to cover both the situations where we have full access to configure apache (in-house server) and those situations where we don’t (third-party server).

To set a base line, we’ll assume there are other apache-driven perl applications on the same server, and the application we’re configuring has multiple modules. I’d like to avoid URL rewriting, but have URLs as clean as is practical. When running from modperl, I’d like to maximise the setup done at server start-up and minimise the work done per hit. I’ll give consideration to efficiency, scalability, and maintainability.

The application is a specialisation of CGI::Application, resides in /srv/trial and has the dir structure suggested at Layout for a CGI::Application. This means we need to prepend ‘/srv/trial/mod:/srv/trial/lib:’ to the @INC path.


Pieces of the puzzle

First we’ll go through the pieces of the jigsaw before looking at how they are typically put together.

Interpreter start up and state isolation

PerlOptions Clone

The PerlOptions +Clone directive shares the parent (apache) perl interpreter but gives the VirtualHost its own interpreter pool. This is most useful if the hosts load distinct (large) modules or load the same modules with distinct parameters. For example, one uses Catalyst and the other uses CGI::Application. Or one loads use Cari::Mysql (cnfdir => '/var/local/auth/abc'); while the other loads use Cari::Mysql (cnfdir => '/var/local/auth/xyz');.

PerlOptions Parent

The PerlOptions +Parent directive creates a new parent perl interpreter (for this scope). This is very similar to Clone above, but it does not inherit from the server's parent interpreter and, going by the example that follows, the scope can be more specific than the VirtualHost.
Example from mod_perl2 docs

<Location /trial>
  PerlOptions +Parent
  PerlSwitches -I/srv/trial/mod -I/srv/trial/lib
  PerlInterpStart 1
  PerlInterpMax 4
</Location>
<Location /trial2>
  PerlOptions +Parent
  PerlSwitches -I/srv/trial2/mod -I/srv/trial2/lib
  PerlInterpStart 1
  PerlInterpMax 2
</Location>

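[NB: Although that example is lifted from the mod_perl documentation, it doesn’t work for me. The mod_perl 2 configuration reference describes +Parent and +Clone in terms of a VirtualHost, which may be why a Location scope fails.]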

Set modules search path

PerlOptions +Parent
PerlSwitches -I/srv/trial/mod
PerlSwitches -I/srv/trial/lib

Which is equivalent to the following. Note the path declarations are swapped: each -Mlib (use lib) prepends to @INC, so the last path declared is the first one searched.

PerlOptions +Parent
PerlSwitches -Mlib=/srv/trial/lib
PerlSwitches -Mlib=/srv/trial/mod
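
The ordering claim can be checked outside apache with the perl command line, which accepts the same switches. This is a minimal sketch, assuming perl is on the PATH; the helper name inc_head is hypothetical and only prints the first two entries of @INC.

```shell
#!/bin/bash
# Hypothetical helper: run perl with the given switches and print the
# first two @INC entries, joined with a colon.
inc_head() {
    perl "$@" -e 'print join(":", @INC[0,1]), "\n"'
}

# Only run the comparison if perl is actually installed.
if command -v perl >/dev/null 2>&1; then
    # -I keeps the order in which the paths are given
    inc_head -I/srv/trial/mod -I/srv/trial/lib
    # -Mlib prepends each path, so the last one given ends up first
    inc_head -Mlib=/srv/trial/lib -Mlib=/srv/trial/mod
fi
```

Both calls should report /srv/trial/mod ahead of /srv/trial/lib, which is the search order we want for the application.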

Alternatively, you can use a startup script

PerlPostConfigRequire /srv/trial/cfg/startup.pl

Passing to the handler

The usual handler type is perl-script. It takes care of setting up and isolating %ENV and ties STDIN and STDOUT to make request object IO easy. To make life simple, always use perl-script when returning a response body.
In those rare situations where you don’t need that support, you can gain a little performance by using modperl instead. Using this route, the only %ENV vars are MOD_PERL, MOD_PERL_API_VERSION, PATH, TZ.
If your handler is written in OO style (i.e. expects a class/object as its first param) then you have a choice between

<Location /gateway>
  PerlResponseHandler Gateway
</Location>

package Gateway;
sub handler : method {
    my ($proto, $r) = @_;
    # ... build the response using $r ...
}

or writing the call within the apache configuration

<Location /gateway>
  PerlResponseHandler Gateway->handler
</Location>

package Gateway;
sub handler {
    my ($proto, $r) = @_;
    # ... build the response using $r ...
}

Other pieces

We haven’t discussed decisions/consequences of MPM choice. (I stick to prefork when I can.)


Example scenarios

Now we’ve seen the key pieces of the puzzle, here are some sample ways of putting them together for various scenarios.

Dirty CGI

The following will spawn a new child perl per request; that is expensive but ensures state changes in the code can’t bleed out to other code, nor even subsequent hits on the same code.
Example from mod_perl2 docs

<Location /cgi-bin>
  PerlOptions +Parent
  PerlInterpMaxRequests 1
  PerlInterpStart 1
  PerlInterpMax 1
  PerlResponseHandler ModPerl::Registry
</Location>

[NB: Although that example is lifted from the mod_perl documentation, it doesn’t work for me]

CGI directory

ScriptAlias /trial/ /srv/trial/cgi/
<Directory /srv/trial/cgi>
  AllowOverride None
  Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
</Directory>

Vanilla CGI setup

<Location /perl>
  SetHandler perl-script
  PerlHandler ModPerl::Registry
  Options ExecCGI
  PerlOptions +ParseHeaders
</Location>
<Location /cgi-bin>
  SetHandler perl-script
  PerlHandler ModPerl::PerlRun
  Options ExecCGI
  PerlOptions +ParseHeaders
</Location>

CGI::Application

DocumentRoot /srv/ebdb/www
<Directory /srv/ebdb/www>
  Options -Indexes -MultiViews +FollowSymLinks
  AllowOverride None
</Directory>
PerlOptions +Parent
PerlSwitches -I/srv/ebdb/mod -I/srv/ebdb/lib
<Location /ebdb>
  PerlInterpStart 1
  PerlInterpMinSpare 1
  PerlInterpMaxSpare 4
  SetHandler perl-script
  PerlHandler C::Dispatch
  PerlSetVar DISPATCH_DEBUG 1
</Location>

Directives scope

Server scoped directives

  • PerlSwitches
  • PerlPostConfigRequire
  • PerlModule
  • PerlInterpStart
  • PerlInterpMax

Directory scoped directives

  • PerlOptions
  • PerlSetVar
  • PerlAddVar
  • PerlSetEnv
  • PerlResponseHandler

It’s easy (in Debian 6.0 at least) to have more than one instance of apache2 running, because the management scripts check whether they were invoked as “somethingapache2” or as “somethingapache2-something”. In the notes below I’m using ‘b’ as the suffix, so my paths will end with ‘apache2-b’.

[Before launching into this, check that your apache start/stop script /etc/init.d/apache2 includes the line

DIR_SUFFIX="-${0##*/apache2-}"

somewhere near the top. (In v2.2.15 it’s at line 15.) If not, these notes will be pretty much no help to you, and you should consider upgrading to a version that has it.]
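
To see what that line does, here is a minimal sketch of the suffix derivation. The function name dir_suffix is hypothetical, and the comparison against the original name is an assumption about how a script can tell the two cases apart; the real init script may differ in detail.

```shell
#!/bin/bash
# Hypothetical helper: derive the instance suffix from the name a script
# was invoked under, using the same ${0##*/apache2-} expansion.
dir_suffix() {
    local invoked_as=$1
    # Strip everything up to and including the last "/apache2-".
    local tail=${invoked_as##*/apache2-}
    if [ "$tail" != "$invoked_as" ]; then
        # The pattern matched, so what is left is the suffix.
        printf '%s\n' "-$tail"
    else
        # No "/apache2-" in the name: this is the main instance.
        printf '\n'
    fi
}

dir_suffix /etc/init.d/apache2-b                        # prints "-b"
dir_suffix /etc/init.d/apache2                          # prints an empty line
echo "/etc/apache2$(dir_suffix /etc/init.d/apache2-b)"  # prints /etc/apache2-b
```

So a link named apache2-b is enough for the script to work out that its configuration lives under a directory ending in ‘apache2-b’.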

First identify what needs to be copied or linked from your original instance of apache2; for that you need a list of the paths installed by your current instance. Generally I use such package info a lot, so on Debian I do (as root):

cd / && ln -s -nf var/lib/dpkg/info dinfo

This means that the path info for package <pkg> is available at /dinfo/<pkg>.list

Paths to be copied

Once I’ve checked the output of the following is sensible, I change the ‘echo’ to ‘cp -a’.

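#!/bin/bash
DIR_SUFFIX="b"
# List the package paths ending in apache2, skipping shared files, the
# init script and libraries, and show each beside its suffixed copy.
for p in $(grep 'apache2$' /dinfo/apache2.2-common.list \
| grep -Fv share \
| grep -Fv init.d \
| grep -Fv lib); do
    echo "$p" "${p}-$DIR_SUFFIX"
done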

On my current setup, that results in the following paths being duplicated with the suffix.

  1. /etc/apache2
  2. /etc/cron.daily/apache2
  3. /etc/default/apache2
  4. /etc/logrotate.d/apache2
  5. /var/cache/apache2
  6. /var/log/apache2


(The dir /var/run/apache2-b will be created automatically.) If you know you’ll never use apache cache, you can skip 3 & 5. Scripts in 1 to 4 then need to be edited to change ‘apache2’ to ‘apache2-b’. I prefer to do this in vim using ‘:%s/apache2/apache2-b/gc’ to step through each edit.
Then check that no ports/sites clash with the original.
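
One way to spot a port clash is to compare the Listen directives of the two instances. The sketch below assumes each instance keeps them in a ports.conf file directly under its configuration directory; the function names listen_ports and clashing_ports are hypothetical. It compares the values as plain strings, so ‘Listen 80’ and ‘Listen 0.0.0.0:80’ would not be reported as a clash.

```shell
#!/bin/bash
# Hypothetical helper: print the Listen values from a config file,
# one per line, sorted and without duplicates. Commented-out lines
# are ignored because their first field is not "Listen".
listen_ports() {
    awk 'tolower($1) == "listen" { print $2 }' "$1" | sort -u
}

# Hypothetical helper: print the Listen values present in both files.
# Each list is already unique, so any repeated value is in both.
clashing_ports() {
    { listen_ports "$1"; listen_ports "$2"; } | sort | uniq -d
}

orig=/etc/apache2/ports.conf      # assumed location for the main instance
new=/etc/apache2-b/ports.conf     # assumed location for the new instance
if [ -r "$orig" ] && [ -r "$new" ]; then
    clashing_ports "$orig" "$new"
fi
```

No output means no identical Listen values were found; sites and virtual hosts still need checking by eye.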

Paths to be linked

Once I’ve checked the output of the following is sensible, I change the ‘echo’ to ‘ln -s -nf’.

#!/bin/bash
DIR_SUFFIX="b"
# For each apache2 command and init script, show the link to create beside it.
for p in $(grep -E "sbin/|init.d/" /dinfo/apache2.2-common.list \
| grep -F 2); do
    d=${p%/*}      # directory part
    f=${p##*/}     # file name part
    if [ "$p" != "$d/$f" ]; then
        echo "Paths got mangled ($d) ($f)" >&2
        exit 1
    fi
    (cd "$d" && echo "$f" "${f}-$DIR_SUFFIX")
done

On my current setup, that results in the following paths being linked with the suffix.

  1. /etc/init.d/apache2
  2. /usr/sbin/a2dismod
  3. /usr/sbin/a2dissite
  4. /usr/sbin/a2enmod
  5. /usr/sbin/a2ensite
  6. /usr/sbin/apache2ctl

If you want the new instance to run as a different user, just edit /etc/apache2-b/envvars.
Then invoke /etc/init.d/apache2-b start
and ps should then show that /usr/sbin/apache2 -d /etc/apache2-b -k start is running.

More paths to be linked

The above all works a treat… until you upgrade the ‘main’ instance of apache and your sibling instances are left out in the cold, possibly broken. Until some handy scripts turn up, the answer is to copy less and link more.

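#!/bin/bash
DIR_SUFFIX="b"
# Replace the copied files in conf.d and mods-available with links
# back to the originals in the main instance.
for d in conf.d mods-available; do
    (cd "/etc/apache2/$d" \
        && find * -maxdepth 0 -type f) \
    | \
    (cd "/etc/apache2-$DIR_SUFFIX/$d" \
        && while read -r f; do
            ln -s -nf "../../apache2/$d/$f"
        done)
done

# Stop if the directory is missing, so no links land in the wrong place.
cd "/etc/apache2-$DIR_SUFFIX" || exit 1
# Can add envvars to following list if running as same user
for f in apache2.conf httpd.conf; do
    ln -s -nf "../apache2/$f"
done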
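
Because the new instance now depends on files owned by the main one, it is worth checking for links left dangling after an upgrade removes or renames a file. A minimal sketch, assuming the new instance lives in /etc/apache2-b; the function name dangling_links is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every symbolic link under the given
# directory whose target no longer exists. "test -e" follows the link,
# so it fails exactly when the target is missing.
dangling_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

conf_dir=/etc/apache2-b     # assumed location of the new instance
if [ -d "$conf_dir" ]; then
    dangling_links "$conf_dir"
fi
```

Any path it prints is a link to re-create or remove before restarting the new instance.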

Custom changes to new instance

So now you have the flexibility to run the new instance as a separate user, to use a separate perl binary, to use the same perl binary but with a completely different modperl environment, and so on. It’s advantageous to keep apache2.conf a symbolic link and never edit it. If you want to have custom settings, eg the number of child processes, just add/edit a file under conf.d.
/etc/apache2-b/conf.d/threads:

<IfModule mpm_prefork_module>
    StartServers 2
    MinSpareServers 2
    MaxSpareServers 4
    MaxClients 150
    MaxRequestsPerChild 0
</IfModule>

So a huge thank-you to the apache folk for removing what was a big headache.