One of the things that attracts (some) people to CGI::Application is the ability to build stripped-down, super-efficient web applications. One of its strengths is that it is versatile enough to offer different things to different people, so in fact a good proportion of CA users don’t care about what I call lean programming: as long as you’re not doing anything fancy you can get away with creating everything afresh for every request that comes in. That’s why some users advocate ignoring efficiency in the early stages; CGI::Application is already a pretty lean perl application framework, so any deployment that works has a good chance of being fairly efficient.

However, there are other users who require lean programming and cater for it, including many of the plugin authors. The technique is very simple: you put your persistent objects/data in class storage and your per-request objects/data in your per-request CA object. (There are many tweaks to that model, including using a traditional RDBMS, no-sql tuple pools, memcache variants, CPAN wrapper modules, etc, but usually the simple flavour is all you need and is easy to implement in CGI::Application.) Look at the CGI::Application::Plugin::Authentication source code if you want to see an example of the inner workings, but that’s not necessary (and the logic flow is not terribly transparent in that plug-in). The pattern goes:

  1. Have a MyApp::Base which is a sub-class of CGI::Application and which all your MyApp (request handler) modules sub-class
  2. In that class, define your cgiapp_init method
  3. In cgiapp_init initialise all your application-wide configuration into class storage (eg %MyApp::Base::__Stash) using a guard to ensure you only do this once per perl interpreter:
    unless ($MyApp::Base::__Stash{config}) {
        $MyApp::Base::__Stash{config} = {};
        # Initialise application-wide values
        # Initialise plug-ins, using those values
        MyApp::Base->authen->config(
            # ...
        );
    }
    
  4. For each plug-in you use, check whether it caters for class access in addition to object access. eg for CGI::Application::Plugin::Authentication, use MyApp::Base->authen->config rather than $myapp->authen->config and it will do the rest for you.
  5. Then (still within your cgiapp_init) initialise any per-request data into your $myapp object:
        # Per-request initialisation
        $self->param(
            foo => 'bar',
            dbh => $MyApp::Base::__Stash{db_accessor}->connect,
            # ...
        );
    

You can take this further by ensuring as much as possible is done in the initial (parent) perl interpreter, for example doing initialisation before the first fork if using mod_perl2 with the ‘prefork’ MPM, but that’s the subject of a separate story.

This is an example “way of doing things” with CGI::Application. It can’t claim to be ‘best practice’ since there are some better sites out there that happen to do things differently. It’s mainly as a reminder to myself and collaborators of how things should hang together, but if you’re starting out with CGI::Application you could do a lot worse than to copy these suggestions; your aims and requirements will be at least a bit different, but it should be easy to extend or take these notes on a different tack.

Directory Layout

These would usually be under /srv/<app>

bin
    Batch scripts (executed by sysadmin)
cfg
    Configuration files. eg cfg/app.cnf
cgi
    Apache-invoked files for where we can’t configure apache. eg cgi/app.cgi
data
    Initialisation or run-time data that is kept in files
doc
    Documentation for everything. eg doc/man/man3/Cari::Mysql.3pm.gz
lib
    Third-party (upstream) modules. eg lib/Cari/Mysql.pm
log
    Log files. eg log/app_err.log
mod/C
    Application controller modules. egs mod/C/Mod1.pm, mod/C/Mod1/Submod1.pm
mod/M
    Application model modules. eg mod/M/User.pm
mod/V
    Application view modules. eg mod/V/Dropdown.pm
test
    Test scripts
tmpl
    Template files. eg tmpl/mod1/rm1.html
www
    Static web-accessible files. eg www/favicon.ico
www/css
    Style sheets. eg www/css/app.css
www/img
    Icons and images. eg www/img/mod1/banner.jpg
www/js
    Javascript. eg www/js/jquery-1.4.2.min.js
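
The layout above can be created in one go. This is only a minimal sketch: make_app_skeleton is a hypothetical helper name (not part of the original notes), and it assumes the application root, eg /srv/trial, is passed as the first argument.

```shell
#!/bin/bash
# Create the directory skeleton described above under an application root.
# make_app_skeleton is a hypothetical helper name used only for illustration.
make_app_skeleton() {
    local root=$1 d
    for d in bin cfg cgi data doc lib log mod/C mod/M mod/V \
             test tmpl www/css www/img www/js; do
        # mkdir -p also creates the intermediate mod and www directories
        mkdir -p "$root/$d" || return 1
    done
}

# Usage (as a user allowed to write under /srv):
#   make_app_skeleton /srv/trial
```

Because mkdir -p is used, running it again on an existing tree should leave everything already there untouched.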

I want to enumerate a variety of ways to configure apache (v2.2.9) with/without modperl (v2.0.4) for passing requests to perl code. I want to cover both the situations where we have full access to configure apache (in-house server) and those situations where we don’t (third-party server).

To set a base line, we’ll assume there are other apache-driven perl applications on the same server, and the application we’re configuring has multiple modules. I’d like to avoid URL rewriting, but have URLs as clean as is practical. When running from modperl, I’d like to maximise the setup done at server start-up and minimise the work done per hit. I’ll give consideration to efficiency, scalability, and maintainability.

The application is a specialisation of CGI::Application, resides in /srv/trial and has the dir structure suggested at Layout for a CGI::Application. This means we need to prepend ‘/srv/trial/mod:/srv/trial/lib:’ to the @INC path.


Pieces of the puzzle

First we’ll go through the pieces of the jigsaw before looking at how they are typically put together.

Interpreter start up and state isolation

PerlOptions Clone

The PerlOptions +Clone directive shares the parent (apache) perl interpreter but gives the VirtualHost its own interpreter pool. This is most useful if the hosts load distinct (large) modules or load the same modules with distinct parameters. For example, one uses Catalyst and the other uses CGI::Application. Or one loads use Cari::Mysql (cnfdir => '/var/local/auth/abc'); while the other loads use Cari::Mysql (cnfdir => '/var/local/auth/xyz');.

PerlOptions Parent

The PerlOptions +Parent directive creates a new parent perl interpreter (for this scope). This is very similar to Clone above, but it does not inherit from above and the scope can be more specific than the VirtualHost.
Example from mod_perl2 docs

<Location /trial>
  PerlOptions +Parent
  PerlSwitches -I/srv/trial/mod -I/srv/trial/lib
  PerlInterpStart 1
  PerlInterpMax 4
</Location>
<Location /trial2>
  PerlOptions +Parent
  PerlSwitches -I/srv/trial2/mod -I/srv/trial2/lib
  PerlInterpStart 1
  PerlInterpMax 2
</Location>

[NB: Although that example is lifted from the modperl documentation, it doesn’t work for me]

Set modules search path

PerlOptions +Parent
PerlSwitches -I/srv/trial/mod
PerlSwitches -I/srv/trial/lib

This is equivalent to the following (note the path declarations are swapped, because each -Mlib prepends its directory to @INC).

PerlOptions +Parent
PerlSwitches -Mlib=/srv/trial/lib
PerlSwitches -Mlib=/srv/trial/mod

Alternatively, you can use a startup script

PerlPostConfigRequire /srv/trial/cfg/startup.pl

Passing to the handler

The usual handler type is perl-script. It takes care of setting up and isolating %ENV and ties STDIN and STDOUT to make request object IO easy. To make life simple, always use perl-script when returning a response body.
In those rare situations where you don’t need that support, you can gain a little performance by using the modperl handler type instead. With this route, the only %ENV vars are MOD_PERL, MOD_PERL_API_VERSION, PATH, TZ.
If your handler is written in OO style (ie expects class/object as first param) then you have a choice between marking the handler with the method attribute

# apache configuration
<Location /gateway>
  PerlResponseHandler Gateway
</Location>
# Gateway.pm
package Gateway;
sub handler : method {
    my ($proto, $r) = @_;
    # ...
}

or writing the call within the apache configuration

# apache configuration
<Location /gateway>
  PerlResponseHandler Gateway->handler
</Location>
# Gateway.pm
package Gateway;
sub handler {
    my ($proto, $r) = @_;
    # ...
}

Other pieces

We haven’t discussed decisions/consequences of MPM choice. (I stick to prefork when I can.)


Example scenarios

Now we’ve seen the key pieces of the puzzle, here are some sample ways of putting them together for various scenarios.

Dirty CGI

The following will spawn a new child perl per request; that is expensive but ensures state changes in the code can’t bleed out to other code, nor even subsequent hits on the same code.
Example from mod_perl2 docs

<Location /cgi-bin>
  PerlOptions +Parent
  PerlInterpMaxRequests 1
  PerlInterpStart 1
  PerlInterpMax 1
  PerlResponseHandler ModPerl::Registry
</Location>

[NB: Although that example is lifted from the modperl documentation, it doesn’t work for me]

CGI directory

ScriptAlias /trial/ /srv/trial/cgi/
<Directory /srv/trial/cgi>
  AllowOverride None
  Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
</Directory>

Vanilla CGI setup

<Location /perl>
  SetHandler perl-script
  PerlHandler ModPerl::Registry
  Options ExecCGI
  PerlOptions +ParseHeaders
</Location>
<Location /cgi-bin>
  SetHandler perl-script
  PerlHandler ModPerl::PerlRun
  Options ExecCGI
  PerlOptions +ParseHeaders
</Location>

CGI::Application

DocumentRoot /srv/ebdb/www
<Directory /srv/ebdb/www>
  Options -Indexes -Multiviews +FollowSymLinks
  AllowOverride None
</Directory>
PerlOptions +Parent
PerlSwitches -I/srv/ebdb/mod -I/srv/ebdb/lib
<Location /ebdb>
  PerlInterpStart 1
  PerlInterpMinSpare 1
  PerlInterpMaxSpare 4
  SetHandler perl-script
  PerlHandler C::Dispatch
  PerlSetVar DISPATCH_DEBUG 1
</Location>

Directives scope

Server scoped directives

  • PerlSwitches
  • PerlPostConfigRequire
  • PerlModule
  • PerlInterpStart
  • PerlInterpMax

Directory scoped directives

  • PerlOptions
  • PerlSetVar
  • PerlAddVar
  • PerlSetEnv
  • PerlResponseHandler

It’s easy (in debian 6.0 at least) to have more than one instance of apache2 running because the management scripts check whether they were invoked as “somethingapache2” or as “somethingapache2-something”. In the notes below I’m using ‘b’ as the suffix, so my paths will end with ‘apache2-b’.

[Before launching into this, check that your apache start/stop script /etc/init.d/apache2 includes the line

DIR_SUFFIX="-${0##*/apache2-}"

somewhere near the top. (In v2.2.15 it’s at line 15.) If not, these notes will be pretty much no help to you, and you should consider upgrading to a version that has it.]

First identify what needs to be copied or linked from your original instance of apache2, so you need a list of what paths are included in your current instance. Generally I use such package info a lot, so on debian I do (as root):

cd / && ln -s -nf var/lib/dpkg/info dinfo

This means that the path info for package <pkg> is available at /dinfo/<pkg>.list

Paths to be copied

Once I’ve checked the output of the following is sensible, I change the ‘echo’ to ‘cp -a’.

#!/bin/bash
DIR_SUFFIX="b"
# Paths ending in 'apache2', skipping any path that contains share, init.d or lib
for p in $(grep 'apache2$' /dinfo/apache2.2-common.list \
| grep -Fv share \
| grep -Fv init.d \
| grep -Fv lib); do
    echo "$p" "${p}-$DIR_SUFFIX"
done

On my current setup, that results in the following paths being duplicated with the suffix.

  1. /etc/apache2
  2. /etc/cron.daily/apache2
  3. /etc/default/apache2
  4. /etc/logrotate.d/apache2
  5. /var/cache/apache2
  6. /var/log/apache2


(The dir /var/run/apache2-b will be created automatically.) If you know you’ll never use apache cache, you can skip 3 & 5. Scripts in 1 to 4 then need to be edited to change ‘apache2’ to ‘apache2-b’. I prefer to do this in vim using ‘:%s/apache2/apache2-b/gc’ to step through each edit.
Then check that no ports/sites clash with the original.
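
One way to spot port clashes is to compare the Listen directives of the two instances. The following is a rough sketch only: listen_ports and clashing_ports are hypothetical helper names, it compares port numbers alone (not address:port pairs), and it does not look at site names at all.

```shell
#!/bin/bash
# List the ports named by Listen directives under an apache config dir.
# listen_ports and clashing_ports are hypothetical helper names.
listen_ports() {
    grep -rhE '^[[:space:]]*Listen[[:space:]]+' "$1" 2>/dev/null \
        | awk '{ print $2 }' \
        | sed 's/.*://' \
        | sort -u
}

# Print the ports that appear in both config dirs (no output means no clash).
clashing_ports() {
    local theirs
    theirs=$(listen_ports "$2")
    [ -n "$theirs" ] || return 0
    # each line of $theirs is treated as a separate fixed-string pattern
    listen_ports "$1" | grep -Fx -e "$theirs" || true
}

# Usage:
#   clashing_ports /etc/apache2 /etc/apache2-b
```

Commented-out Listen lines are ignored, and an address prefix such as 127.0.0.1: is stripped before comparing.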

Paths to be linked

Once I’ve checked the output of the following is sensible, I change the ‘echo’ to ‘ln -s -nf’.

#!/bin/bash
DIR_SUFFIX="b"
for p in $(grep -E "sbin/|init.d/" /dinfo/apache2.2-common.list \
| grep -F 2); do
    d=${p%/*}   # directory part
    f=${p##*/}  # file name part
    if [ "$p" != "$d/$f" ]; then
        echo "Paths got mangled ($d) ($f)" >&2
        exit 1
    fi
    (cd "$d" && echo "$f" "${f}-$DIR_SUFFIX")
done

On my current setup, that results in the following paths being linked with the suffix.

  1. /etc/init.d/apache2
  2. /usr/sbin/a2dismod
  3. /usr/sbin/a2dissite
  4. /usr/sbin/a2enmod
  5. /usr/sbin/a2ensite
  6. /usr/sbin/apache2ctl

If you want the new instance to run as a different user, just edit /etc/apache2-b/envvars.
Then invoke /etc/init.d/apache2-b start
and ps should then show that /usr/sbin/apache2 -d /etc/apache2-b -k start is running.

More paths to be linked

The above all works a treat… until you upgrade the ‘main’ instance of apache and your sibling instances are left out in the cold, possibly broken. Until some handy scripts turn up, the answer is to copy less and link more.

#!/bin/bash
DIR_SUFFIX="b"
# Replace copied files with links back to the main instance
for d in conf.d mods-available; do
    (cd "/etc/apache2/$d" \
        && find * -maxdepth 0 -type f) \
    | \
    (cd "/etc/apache2-$DIR_SUFFIX/$d" \
        && while read -r f; do
            ln -s -nf "../../apache2/$d/$f"
        done)
done

cd "/etc/apache2-$DIR_SUFFIX" || exit 1
# Can add envvars to following list if running as same user
for f in apache2.conf httpd.conf; do
    ln -s -nf "../apache2/$f"
done
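
Since the sibling instance now depends on links into the main instance, it is worth checking after an upgrade that none of those links has been left pointing at a file that no longer exists. A minimal sketch, where dangling_links is a hypothetical helper name:

```shell
#!/bin/bash
# Print symbolic links under a directory whose targets no longer exist.
# dangling_links is a hypothetical helper name used only for illustration.
dangling_links() {
    # test -e follows the link, so it fails only when the target is missing
    find "$1" -type l ! -exec test -e {} \; -print
}

# Usage, eg after upgrading the main instance:
#   dangling_links /etc/apache2-b
```

No output means every link under the directory still resolves.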

Custom changes to new instance

So now you have the flexibility to run the new instance as a separate user, to use a separate perl binary, to use the same perl binary but with a completely different modperl environment, and so on. It’s advantageous to keep apache2.conf a symbolic link and never edit it. If you want to have custom settings, eg the number of child processes, just add/edit a file under conf.d.
/etc/apache2-b/conf.d/threads:

<IfModule mpm_prefork_module>
    StartServers 2
    MinSpareServers 2
    MaxSpareServers 4
    MaxClients 150
    MaxRequestsPerChild 0
</IfModule>

So a huge thank-you to the apache folk for removing what was a big headache.