Next: 4. Inside Apache Up: 3. The Apache HTTP Previous: 3.2 Using Apache   Contents   Index


3.3 Extending Apache: Apache Modules (G)

3.3.1 Introduction

Modules are pieces of code which can be used to provide or extend functionality of the Apache HTTP Server. Modules can be included with the core either statically or dynamically. For static inclusion, the module's source code has to be added to the server's source distribution and the whole server has to be recompiled. Dynamically included modules add functionality to the server by being loaded as shared libraries during start-up or restart of the server. In this case the module mod_so provides the functionality to add modules dynamically. In a current distribution of either Apache 2.0 or Apache 1.3, all but very basic server functionality has been moved to modules.

Modules interact with the Apache server via a common interface. They register handlers for hooks in the Apache core or other modules. When a hook is triggered, the Apache core calls the handlers registered for it. Modules, on the other hand, can interact with the server core via the Apache API. Using that API each module can access the server's data structures, for example for sending data or allocating memory.

Figure 3.3: Apache 2 Module structure

Each module contains a module info, which describes the handlers(G) provided by the module and the configuration directives the module can process. The module info is essential for module registration by the core.

All Apache server tasks, be it master server or child server, contain the same executable code. As the executable code of an Apache task consists of the core, the static modules and the dynamically loaded ones, all tasks contain all modules.

Figure 3.4: Interaction of Core and Modules

As you can see in figure 3.4, Modules and the Core can interact in two different ways. The server core calls module handlers registered in its registry. The modules on the other hand can use the Apache API for various purposes and can read and modify important data structures like the request/response record request_rec and allocate memory in the corresponding pools.

3.3.2 Types of Handlers

A module can provide different kinds of handlers:


3.3.2.1 Handlers for hooks (G)

A hook is a transition in the execution sequence where all registered handlers will be called. It's like triggering an event which results in the execution of the event handlers. The implementation of a hook is a hook function named ap_run_HOOKNAME which has to be called to trigger the hook.

Two types of calling handlers for a hook can be distinguished:

Each module has to register its handlers for the hooks with the server core before the server core can call them. Handler registration is different in Apache 1.3 and 2.0. Apache 1.3 provided 13 predefined hooks. Registration of the module's handlers was done automatically by the core, which read the module info while loading modules. In Apache 2.0, the module info only contains references to four handlers for predefined hooks used for configuration purposes only. All other hook handlers are registered by calling the register_hooks function each module has to provide. This makes it easier to provide new hooks without having to alter the Apache module interface. A module can provide new hooks for which other modules can register handlers as well.

Figure 3.5: Apache 2.0 hook mechanism

Figure 3.5 shows how hooks and handlers interact in Apache: A hook ABC has to be defined by some C macros (AP_DECLARE_HOOK, etc., see bottom line). This results in the creation of a registration procedure ap_hook_ABC, a hook caller procedure ap_run_ABC and an entry in the hook handler registry which keeps information about all registered handlers for the hook with their modules and their order. The module (meta) info at the top points to the hook handler registration procedure (register_hooks), which registers the handlers for the hooks by calling the ap_hook_xxx procedures. At the bottom, an agent called "request processing controller" represents all agents that trigger hooks by calling the ap_run_xxx procedures, which read the hook handler registry and call either all registered handlers or only one.

The order of calling handlers for a hook can be important. In Apache 1.3, the order of the module registration determined the order in which their handlers would be called. The order could be altered in the configuration file but was the same for all 13 hooks. In Apache 2, this has changed. The hook registry can store an individual order of handlers for each hook. When registering a handler for a hook using the ap_hook_xxx procedure, a module can state requirements for its position in the calling sequence. It can name modules whose handlers have to be called before or after its own, or it can try to get the first or the last position.
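The registration and ordering mechanism described above can be illustrated with a small self-contained sketch. It does not use the real Apache headers: the names hook_registry, hook_register, hook_run_first, hook_run_all and the HOOK_* and ORDER_* constants are assumptions made for this illustration only, loosely modeled on the ap_hook_xxx and ap_run_xxx procedures. Naming predecessor or successor modules is left out; only a numeric position is supported.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, modeled on the OK / DECLINED convention. */
enum { HOOK_OK = 0, HOOK_DECLINED = -1 };

/* Hypothetical position requests a handler can make when registering. */
enum { ORDER_FIRST = 0, ORDER_MIDDLE = 10, ORDER_LAST = 20 };

#define MAX_HANDLERS 16

typedef int (*hook_fn)(int *trace);

typedef struct {
    hook_fn fn;
    int order;
} hook_entry;

/* One registry per hook, so every hook has its own calling order. */
typedef struct {
    hook_entry entries[MAX_HANDLERS];
    size_t count;
} hook_registry;

/* Register a handler, keeping the entries sorted by ascending order.
 * Handlers with the same order keep their registration sequence. */
static void hook_register(hook_registry *reg, hook_fn fn, int order)
{
    size_t i;

    assert(reg->count < MAX_HANDLERS);
    i = reg->count;
    while (i > 0 && reg->entries[i - 1].order > order) {
        reg->entries[i] = reg->entries[i - 1];
        i--;
    }
    reg->entries[i].fn = fn;
    reg->entries[i].order = order;
    reg->count++;
}

/* Call handlers in order until one of them does not decline. */
static int hook_run_first(const hook_registry *reg, int *trace)
{
    size_t i;

    for (i = 0; i < reg->count; i++) {
        int rv = reg->entries[i].fn(trace);
        if (rv != HOOK_DECLINED)
            return rv;
    }
    return HOOK_DECLINED;
}

/* Call every handler in order; stop early only on an error value. */
static int hook_run_all(const hook_registry *reg, int *trace)
{
    size_t i;

    for (i = 0; i < reg->count; i++) {
        int rv = reg->entries[i].fn(trace);
        if (rv != HOOK_OK && rv != HOOK_DECLINED)
            return rv;
    }
    return HOOK_OK;
}

/* Example handlers: each appends its own digit to the trace value. */
static int handler_a(int *trace) { *trace = *trace * 10 + 1; return HOOK_DECLINED; }
static int handler_b(int *trace) { *trace = *trace * 10 + 2; return HOOK_OK; }
static int handler_c(int *trace) { *trace = *trace * 10 + 3; return HOOK_OK; }
```

If handler_c, handler_b and handler_a are registered in that sequence with ORDER_LAST, ORDER_MIDDLE and ORDER_FIRST, the registry still calls them in the order a, b, c. hook_run_first stops after handler_b because handler_a declines, while hook_run_all calls all three.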


3.3.2.2 Handlers for Configuration Directives

A module can provide its own set of directives which can be used in the configuration files. The configuration processor of the core delegates the interpretation of a directive to the corresponding command handler which has been registered for the directive. In figure 3.5 the module (meta) info at the top points to the configuration management handlers of the module (create-dir-config, merge-dir-config, etc.) and to the command table which contains configuration directives and the pointers to the corresponding command handlers.

The configuration management handlers allocate memory for configuration data read by the command handlers and decide what to do if configuration parameters differ when configuration data is merged hierarchically during request processing.

3.3.2.3 Optional functions

An Apache 2.0 module can also register optional functions and filters. Optional Functions are similar to hooks. The difference is that the core ignores any return value from an optional function. It calls all optional functions regardless of errors. So optional functions should be used for tasks that are not crucial to the request-response process at all.

3.3.3 Content Handling

The most important step in the request-response loop is calling the content handler which is responsible for sending data to the client.

In Apache 1.3, the content handler is a handler very much like any other. To determine which handler to call, Apache 1.3 uses the type_checker handler, which maps the requested resource to a mime-type or a handler. Depending on the result, the Apache core calls the corresponding content handler, which is responsible for successfully completing the response. It can write directly to the network interface and send data to the client. That makes request handling a simple task but has the disadvantage that usually only one module can take part in handling the request. If more than one content handler has been determined for the resource, the handler that was registered first is called. One handler cannot modify the output of another without additional changes in the source code.

Apache 2.0 extends the content handler mechanism by output filters. Although still only one content handler can be called to send the requested resource, filters can be used to manipulate the data sent by the content handler. Therefore multiple modules can work cooperatively to handle one request. During the mime-type definition phase in Apache 2.0 multiple filters can be registered for one mime-type together with an order in which they are supposed to handle the data. Each mime-type can be associated with a different set of modules and a differing filter order. Since a sequenced order is defined, these filters form a chain called the output filter chain.

When the content handler is called, Apache 2.0 initiates the output filter chain. Within that chain a filter performs actions on the data and, when finished, passes that data to the next filter. That way a number of modules can work together in forming the response. One example is a CGI content handler handing server side include tags down the filter chain so that the include module can handle them.


3.3.4 Apache 2 Filters

Apache 2 Filters(G) are handlers for processing data of the request and the response. They have a common interface and are interchangeable.

Figure 3.6: Apache Filters: The Input / Output filter chain

In figure 3.6 you see two example filter chains: The input filter chain to process the data of the request and the output filter chain to process the data of the response (provided by the content handler). The agent "Request processing" triggers the input filter chain while reading the request. An important use of the input filter chain is the SSL module providing secure HTTP (HTTPS) communication.

The output filter chain is triggered by the content handler. In our example, the Deflate output filter compresses the resource depending on its type.

Figure 3.7: Apache Filters: A Brigade contains a series of buckets

Figure 3.8: Apache Filters: A Filter chain using Brigades for data transport

To improve performance, filters work independently by splitting the data into buckets and brigades (see figure 3.7) and just handing over references to the buckets instead of writing all data to the next filter's input (see figure 3.8). Each request or response is split up into several brigades. Each brigade consists of a number of buckets. One filter handles one bucket at a time and, when finished, hands the bucket on to the next filter. The order in which the filters hand on the data is still kept intact.
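The idea of passing references instead of copying data can be sketched in a few lines of plain C. This is not the APR bucket brigade API: the types bucket, brigade and filter and the function pass_brigade are simplified stand-ins invented for this illustration, and each filter here walks the whole brigade before handing it on.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* A bucket only refers to data; it does not own or copy it. */
typedef struct bucket {
    char *data;
    size_t len;
    struct bucket *next;
} bucket;

/* A brigade is an ordered series of buckets. */
typedef struct {
    bucket *head;
} brigade;

typedef struct filter filter;
struct filter {
    int (*func)(filter *self, brigade *bb);
    filter *next;   /* next filter in the chain, NULL at the end */
    void *ctx;      /* private state of this filter */
};

/* Hand the brigade to the next filter; the end of the chain succeeds. */
static int pass_brigade(filter *next, brigade *bb)
{
    assert(bb != NULL);
    if (next == NULL)
        return 0;
    return next->func(next, bb);
}

/* Example filter: upper-cases every bucket in place, then passes it on. */
static int upcase_filter(filter *self, brigade *bb)
{
    bucket *b;

    for (b = bb->head; b != NULL; b = b->next) {
        size_t i;
        for (i = 0; i < b->len; i++)
            b->data[i] = (char)toupper((unsigned char)b->data[i]);
    }
    return pass_brigade(self->next, bb);
}

/* Example filter: adds the size of every bucket to a counter in ctx. */
static int count_filter(filter *self, brigade *bb)
{
    size_t *total = self->ctx;
    bucket *b;

    for (b = bb->head; b != NULL; b = b->next)
        *total += b->len;
    return pass_brigade(self->next, bb);
}
```

Chaining upcase_filter in front of count_filter and passing in a brigade with the buckets "hello " and "world" changes the original buffers in place and leaves a total of 11 bytes in the counter; no data is copied between the filters.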

Besides separating filters into input and output filters, 3 different categories can be distinguished:

  1. Resource and Content Set Filters
    Resource Filters alter the content that is passed through them. Server Side Includes (SSI) or PHP scripting are typical examples.
    Content Set Filters alter the content as a whole, for example to compress or decompress it (Deflate).
  2. Protocol and Transcode Filters
    Protocol Filters are used to implement the protocol's behavior (HTTP, POP, ...). That way future versions of HTTP could be supported.
    Transcode Filters alter the transport encoding of request or response. For example, the chunk output filter splits a resource into data chunks which it sends to the client one after another.
  3. Connection and Network Filters
    Connection Filters deal with establishing and releasing a connection. For example, establishing an HTTPS connection requires a special handshake between client and server. They may also alter the content, in the HTTPS example by encrypting and decrypting data.
    Network Filters are responsible for interacting with the operating system to establish network connections and complete associated tasks. To support protocols other than TCP/IP, only a module implementing an input and output filter for the specific connection protocol is needed.
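As an illustration of what a transcode filter such as the chunk output filter has to produce, the following sketch encodes one block of data in the HTTP/1.1 chunked transfer encoding: the chunk size in hexadecimal, CRLF, the data, CRLF. The function name encode_chunk and its buffer-based interface are assumptions for this example and are not taken from the Apache sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode len bytes of data as one HTTP/1.1 chunk into out.
 * Returns the number of bytes written, or 0 if out is too small.
 * The result is not NUL-terminated. */
static size_t encode_chunk(const char *data, size_t len,
                           char *out, size_t out_size)
{
    int header;

    assert(data != NULL && out != NULL);

    /* Chunk header: the size in hexadecimal followed by CRLF. */
    header = snprintf(out, out_size, "%zx\r\n", len);
    if (header < 0 || (size_t)header + len + 2 > out_size)
        return 0;

    /* Chunk body followed by the closing CRLF. */
    memcpy(out + header, data, len);
    memcpy(out + header + len, "\r\n", 2);
    return (size_t)header + len + 2;
}
```

Encoding the 9 bytes "Wikipedia" yields the 14 bytes "9\r\nWikipedia\r\n". Encoding an empty block yields "0\r\n\r\n", which is the zero-sized last chunk followed by an empty trailer and therefore terminates the body.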

3.3.5 Predefined Hooks

Even though in Apache 2.0 handlers for hooks are registered differently from Apache 1.3, the predefined hooks are very alike in both versions and can be distinguished into 3 different categories by their purpose and their place in the runtime sequence:

  1. Managing and processing configuration directives
  2. Start-up, restart or shutdown of the server
  3. Processing HTTP requests

3.3.5.1 1. Configuration Management Hooks

During start-up or restart, the Apache master server reads and processes the configuration files. Each module can provide a set of configuration directives. The configuration processor of the core will call the associated command handler every time it encounters a directive belonging to a module. To prepare resources for storing configuration data, a module can register handlers for the following hooks:

Configuration data is organized hierarchically. Rules have to be defined in which cases a configuration parameter of a lower (more specific) level may override the parameter of a higher level. The "merge" handlers can be used for this task.

For more information about Apache configuration and the configuration processor, consult sections 3.2.1 and 4.5.

3.3.5.2 2. Start-up, restart or shutdown of the server

Apache is a multitasking server. During start-up and restart, there is only one task performing initialization and reading configuration. Then it starts spawning child server tasks which will do the actual HTTP request processing. Depending on the multiprocessing strategy chosen, there may be a need for another initialization phase for each child server to access resources needed for proper operation, for example to connect to a database. If a child server terminates during restart or shutdown, it must be given the opportunity to release its resources.

For more information about the multitasking server strategies, MPMs and master and child server tasks, take a look at section 4.3 in the next chapter.

3.3.5.3 3. Receiving and Processing HTTP requests

All following handlers actually deal with request handling and are part of the request-response loop. Figures 3.9 and 3.10 illustrate the sequential structure of that process.

3.3.5.3.1 Establish a connection and read the request

Figure 3.9 shows the behavior of Apache during the request-response loop. Most of the hooks shown here didn't exist in Apache 1.3.

Figure 3.9: Apache request-response loop with module callbacks

3.3.5.3.2 Process a request and send a response

Figure 3.10 shows how Apache processes an HTTP request. The special case of internal requests will not be explained further.

Figure 3.10: Apache request processing with module callbacks

3.3.6 Inside a Module: mod_cgi

This module is discussed in detail to illustrate the structure of Apache Modules by a practical example.

Both distributions of Apache 1.3 and 2.0 include mod_cgi. This module is used to process CGI programs that can create dynamic web content. Due to the architectural differences between versions 1.3 and 2.0 discussed in the previous chapter the two versions of the module are different.

3.3.6.1 Mod_cgi in Apache 1.3

3.3.6.1.1 Module Info and referenced functions

Usually the module info can be found at the end of the main source file for the specific module. In mod_cgi for Apache 1.3 the module info contains references to 2 handlers:

module MODULE_VAR_EXPORT cgi_module =
{
  STANDARD_MODULE_STUFF,
  NULL,              /* initializer */
  NULL,              /* dir config creater */
  NULL,              /* dir merger - default is to override */
  create_cgi_config, /* server config */
  merge_cgi_config,  /* merge server config */
  cgi_cmds,          /* command table */
  cgi_handlers,      /* handlers */
  NULL,              /* filename translation */
  NULL,              /* check_user_id */
  NULL,              /* check auth */
  NULL,              /* check access */
  NULL,              /* type_checker */
  NULL,              /* fixups */
  NULL,              /* logger */
  NULL,              /* header parser */
  NULL,              /* child_init */
  NULL,              /* child_exit */
  NULL               /* post read-request */
};

The first line within the module struct references a macro called STANDARD_MODULE_STUFF which expands to the information each module has to provide. Two functions referenced here are create_cgi_config and merge_cgi_config. The corresponding hooks for these handlers are create server config and merge server config. If you have a look at the two functions you will see that the first allocates and initializes memory for configuration data and the second merges the data stored for each virtual host with data stored for the master server.
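The division of labor between a "create" and a "merge" handler can be sketched without the Apache headers. The record type cgi_conf_sketch, the functions create_conf and merge_conf and the default values are assumptions for this illustration; the real handlers additionally receive a pool and work on void pointers. The merge rule shown, where the more specific level wins as soon as it sets a log name, is one simple possibility.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-server configuration record. */
typedef struct {
    const char *logname;  /* NULL means "not set at this level" */
    long logbytes;        /* maximum size of the log */
    int bufbytes;         /* maximum number of body bytes to record */
} cgi_conf_sketch;

/* "create" handler: initialize a fresh record with default values.
 * The defaults used here are placeholders for the example. */
static void create_conf(cgi_conf_sketch *conf)
{
    assert(conf != NULL);
    conf->logname = NULL;
    conf->logbytes = 1000000L;
    conf->bufbytes = 1024;
}

/* "merge" handler: decide which record is effective for a virtual
 * host. The virtual host overrides the main server only if it
 * configured a log name of its own. */
static const cgi_conf_sketch *merge_conf(const cgi_conf_sketch *base,
                                         const cgi_conf_sketch *override)
{
    assert(base != NULL && override != NULL);
    return override->logname != NULL ? override : base;
}
```

A virtual host without its own log name therefore inherits the complete record of the main server, while a virtual host that sets one uses its own record.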

3.3.6.1.2 The command table

static const command_rec cgi_cmds[] =
{
    {"ScriptLog", set_scriptlog, NULL, RSRC_CONF, TAKE1,
     "the name of a log for script debugging info"},
    {"ScriptLogLength", set_scriptlog_length, NULL, RSRC_CONF, TAKE1,
     "the maximum length (in bytes) of the script debug log"},
    {"ScriptLogBuffer", set_scriptlog_buffer, NULL, RSRC_CONF, TAKE1,
     "the maximum size (in bytes) to record of a POST request"},
    {NULL}
};

The references for command table and content handler do not point to functions but to structs. The command table struct contains references to the functions used to process the different directives that can be used for configuring mod_cgi. Within the command table each function is referenced with the additional keyword TAKE1 which tells the core that only one parameter is accepted.
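How a configuration processor might use such a table can be shown with a reduced, self-contained sketch. The types conf and cmd_entry and the function dispatch are invented for this example and only mimic the shape of command_rec; in particular, every entry is assumed to take exactly one argument, and directive names are compared case-sensitively to keep the code short. As in Apache, a command handler returns NULL on success and an error message otherwise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration record filled in by the handlers. */
typedef struct {
    const char *logname;
    long loglength;
} conf;

/* A command handler returns NULL on success or an error message. */
typedef const char *(*cmd_fn)(conf *c, const char *arg);

typedef struct {
    const char *name;   /* directive name as used in the config file */
    cmd_fn func;        /* handler processing the directive */
    const char *help;   /* description shown on errors */
} cmd_entry;

static const char *set_log(conf *c, const char *arg)
{
    c->logname = arg;
    return NULL;
}

static const char *set_log_length(conf *c, const char *arg)
{
    char *end;
    long value = strtol(arg, &end, 10);

    if (*arg == '\0' || *end != '\0' || value < 0)
        return "expected a non-negative number";
    c->loglength = value;
    return NULL;
}

/* The table ends with an entry whose name is NULL. */
static const cmd_entry cmds[] = {
    {"ScriptLog", set_log, "the name of a log file"},
    {"ScriptLogLength", set_log_length, "the maximum log length"},
    {NULL, NULL, NULL}
};

/* Find the handler registered for a directive and call it. */
static const char *dispatch(const cmd_entry *table, conf *c,
                            const char *name, const char *arg)
{
    const cmd_entry *e;

    assert(table != NULL && c != NULL && name != NULL);
    for (e = table; e->name != NULL; e++) {
        if (strcmp(e->name, name) == 0) {
            if (arg == NULL)
                return "directive takes one argument";
            return e->func(c, arg);
        }
    }
    return "unknown directive";
}
```

Calling dispatch(cmds, &c, "ScriptLogLength", "4096") stores 4096 in the record and returns NULL, whereas an unknown directive name or a malformed number produces an error message that the caller can report.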

3.3.6.1.3 The content handlers

static const handler_rec cgi_handlers[] =
{
    {CGI_MAGIC_TYPE, cgi_handler},
    {"cgi-script", cgi_handler},
    {NULL}
};

The struct for the content handler registers the CGI mime-type as well as the "cgi-script" handler string with the function cgi_handler, which is the function called by the core for the content handler. Using that struct a module can register functions for more than one handler.

When the type_checker has decided that mod_cgi should handle a request and the core then calls the content handler, it actually calls the function cgi_handler.

cgi_handler first prepares for executing a CGI program by checking some preconditions, like "Is a valid resource requested?". Then it creates a child process that will execute the CGI program by calling ap_bspawn_child. Parameters for that function are, among others, the name of the function to be called within the process, here cgi_child, and a child_stuff struct that contains the whole request record. cgi_child itself then prepares to execute the interpreter for the script and calls ap_call_exec, a routine that takes the different operating systems into account and uses the exec routines appropriate for the operating system in use. After that, all output of the script is passed back to the calling functions until it reaches cgi_handler, which then sends the data to the client including the necessary HTTP header.

3.3.6.2 Mod_cgi in Apache 2.0

3.3.6.2.1 Module Info and referenced functions

In the version for Apache 2.0 the module info is much smaller. Most references to handlers for hooks are now replaced by the reference to the function register_hooks. All handlers except the handlers for the configuration management hooks are now dynamically registered using that function.

module AP_MODULE_DECLARE_DATA cgi_module =
{
    STANDARD20_MODULE_STUFF,
    NULL,               /* dir config creater */
    NULL,               /* dir merger -- default is to override */
    create_cgi_config,  /* server config */
    merge_cgi_config,   /* merge server config */
    cgi_cmds,           /* command apr_table_t */
    register_hooks      /* register hooks */
};

Even though syntax may vary, semantically the functions for configuration and the command table perform the same actions as in Apache 1.3. The register_hooks function shows an example of how to influence the order in which handlers are called. While the cgi_post_config function is to be called absolutely first when the hook post_config is triggered, cgi_handler should be called somewhere in the middle when the content handler hook is triggered.

static void register_hooks(apr_pool_t *p)
{
    static const char * const aszPre[] = { "mod_include.c", NULL };
    ap_hook_handler(cgi_handler, NULL, NULL, APR_HOOK_MIDDLE);
    ap_hook_post_config(cgi_post_config, aszPre, NULL, APR_HOOK_REALLY_FIRST);
}

3.3.6.2.2 Request handling employing filters

In mod_cgi for Apache 2.0, the function cgi_handler is the start of the output filter chain. At first it behaves very much like its Apache 1.3 counterpart. It prepares to start a process to execute the CGI program. It then retrieves the response data from that process. Most of the execution is done in the cgi_child function.

After it has received the response from the program, its task is to hand the newly created brigade down the filter chain. That is done at the end of the function with a call to ap_pass_brigade. For example, it is now possible for a CGI program to output SSI (server-side includes) commands which are then processed by the include module. In that context the include module must have registered a filter that now gets the data from mod_cgi. Of course that depends on the configuration for the corresponding MIME type.


3.3.7 The Apache API

The Apache API summarizes all possibilities to change and enhance the functionality of the Apache web server. The whole server has been designed in a modular way so that extending functionality means creating a new module to plug into the server. The previous chapter covered the way in which modules should work when the server calls them. This chapter explains how modules can successfully complete their tasks.

Basically, all the server provides is a big set of functions that a module can call. These functions expect complex data structures as arguments and return complex data structures. These structures are defined in the Apache sources.

Again the available features differ between the two major versions, Apache 1.3 and 2.0. Version 2.0 basically contains all features of 1.3 and additionally includes the Apache Portable Runtime (APR), which enhances existing functionality and adds new functionality.


3.3.7.1 Memory management with Pools

Apache offers functions for a variety of tasks. One major service Apache offers to its modules is memory management. Since memory management is a complex task in C and memory leaks are among the hardest bugs to find in a server, Apache takes care of freeing all used memory after a module has finished its tasks. To accomplish that, all memory has to be allocated through the Apache core. Memory is therefore organized in pools. Each pool is associated with a task and has a corresponding lifetime. The main pools are the server, connection and request pools. The server pool lives until the server is shut down or restarted, the connection pool lives until the corresponding connection is closed, and the request pool is created when a request arrives and destroyed after the request has been processed. Any module can request memory from any of these pools. That way the core knows about all used memory. Once a pool has reached the end of its lifetime, the core deallocates all memory managed by that pool.

If a module needs memory with an even shorter lifetime than any of the available pools, it can ask Apache to create a sub pool. The module can then use that pool like any other. After the sub pool has served its purpose, the module can ask Apache to destroy it. The advantage is that even if a module forgets to destroy the sub pool, the core only has to destroy the parent pool to destroy all sub pools.

Additionally, Apache offers array and table management, which again makes memory management easier. Arrays in Apache can grow in size over time and thus correspond to Vectors in Java. Tables contain key/value pairs and thus are similar to hash tables. Section 4.6 provides further information about the pool technology.
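The lifetime rules for pools and sub pools can be demonstrated with a deliberately naive sketch. It is built directly on malloc and free and does not reflect how Apache implements pools internally; the names pool, pool_create, pool_alloc, pool_destroy and the counter live_blocks are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The extra members only
 * force an alignment suitable for any object stored behind it. */
typedef union block {
    union block *next;
    long double align_ld;
    long long align_ll;
} block;

typedef struct pool {
    struct pool *parent;
    struct pool *first_child;  /* list of sub pools */
    struct pool *sibling;      /* next sub pool of the same parent */
    block *blocks;             /* all memory handed out by this pool */
} pool;

/* Instrumentation for the example: number of blocks not yet freed. */
static int live_blocks = 0;

/* Create a pool; a non-NULL parent makes it a sub pool of that parent. */
static pool *pool_create(pool *parent)
{
    pool *p = calloc(1, sizeof *p);

    if (p == NULL)
        return NULL;
    p->parent = parent;
    if (parent != NULL) {
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

/* Allocate memory that lives exactly as long as the pool. */
static void *pool_alloc(pool *p, size_t size)
{
    block *b;

    assert(p != NULL);
    b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    live_blocks++;
    return b + 1;
}

/* Destroy a pool together with all its sub pools and allocations. */
static void pool_destroy(pool *p)
{
    block *b;

    assert(p != NULL);
    /* Each destroyed child unlinks itself from first_child. */
    while (p->first_child != NULL)
        pool_destroy(p->first_child);

    b = p->blocks;
    while (b != NULL) {
        block *next = b->next;
        free(b);
        live_blocks--;
        b = next;
    }

    /* Remove this pool from its parent's list of sub pools. */
    if (p->parent != NULL) {
        pool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}
```

A module may destroy a sub pool explicitly once it is done with it. If it forgets to do so, destroying the parent pool, for example at the end of a request, still releases every block of every remaining sub pool.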

3.3.7.2 Data types

Apache offers a variety of functions that require special parameters and return values of special structure. Therefore it is essential to know about the data types used by the Apache API. Most fields contained in any of these records should not be changed directly by a module. Apache offers API functions to manipulate those values. These functions take the necessary precautions to prevent errors and ensure compatibility with later versions.

3.3.7.3 API Functions

Besides memory management, Apache assists the modules in various ways. The main goal of the API is full abstraction from the operating system and server configuration. Apache offers functions for manipulating Apache data structures and can take precautions the module does not need to know about. The core can also perform system calls on behalf of the module and will use the routines corresponding to the operating system currently in use. That way each module can be used in any operating system environment. For example, the Apache API includes functions for creating processes, opening communication channels to external processes and sending data to the client. Additionally, Apache offers functions for common tasks like string parsing.

Within the Apache API, functions can be classified into the following groups:

For further information on the Apache 1.3 API and how to write Apache Modules see [4].


3.3.7.4 Apache 2.0 and the Apache Portable Runtime (APR)

Version 2.0 of Apache introduces the Apache Portable Runtime, which extends and enhances the functionality of the Apache API. Thanks to the APR, Apache can be considered a universal network server that could be extended with almost any imaginable functionality. That includes the following platform independent features:

A goal of the APR is to set up a common framework for network servers. Any request processing server could be implemented using the Apache core and the APR.


Apache Modeling Portal Home Apache Modeling Portal
2004-10-29