Next: 4.6 Memory and resource Up: 4. Inside Apache Previous: 4.4 The Request-Response Loop   Contents   Index


4.5 The Configuration Processor


4.5.1 Where and when Apache reads configuration

Figure 4.15: Configuration processing components of Apache

After Apache has been built, the administrator can configure it at start-up via command line parameters and global configuration files. Local configuration files (usually named .htaccess) are processed during each request and can be modified by web authors.

Figure 4.15 shows the system structure of Apache focusing on the configuration processor. At the bottom, we see the administrator modifying global configuration files like httpd.conf, srm.conf, access.conf and local configuration files ht1 to ht5 (.htaccess). The administrator starts Apache passing command line parameters. The config reader then reads and processes the global configuration files and the command line parameters and stores the result in the config data storage, which holds the internal configuration data structures.
For each request sent from a browser, the request processor instructs the 'per request config generator' to generate a per-request configuration valid for this request. The per request config generator processes the .htaccess files it finds along the resource's path and merges them with the config data. The request processor then knows how to map the request URI to a resource and can decide whether the browser is authorized to get this resource.

Figure 4.16: Points in time when Apache reads configuration

Figure 4.16 shows the points in time at which Apache processes configuration; the diagram is an excerpt of figure 4.7. After the master server has read the per-server configuration, it enters the 'Restart Loop'. On every restart, it deletes the old configuration and processes the main configuration file again.

In the 'Request-Response Loop', a child server generates the per-request configuration for each request. As the configuration data concerning the request is tied to the per-request data structure request_rec, it is deleted after the request has been processed.

In the next parts, we first look at the data structures that are generated when processing the global configuration files. After that, we look at the source code responsible for generating them. Finally, we describe the processing of configuration data on requests.


4.5.2 Internal data structures

We will discuss the internal configuration data structures using figure 4.17, which gives an overview of the data structures as an Entity Relationship Diagram, and figure 4.18, which shows an example structure with the module mod_so.

Figure 4.17: Apache data structures concerning configuration

Figure 4.18: Apache configuration data structures in detail

Each virtual host has exactly one per_server_config containing per-server information and one per_directory_config containing defaults for the per-directory configuration (lookup_defaults). per_server_config and per_directory_config are implemented as pointers to arrays of pointers.

For each module, the per_server_config array contains an entry. The entry for the core module (core_server_conf) in turn holds a per_directory_config for each Directory and Location directive found in the configuration files.

Each per_directory_config array likewise contains an entry for each module. The entry for the core module (core_per_dir_conf) in turn holds a per_directory_config for each Files directive found in the configuration files.

Figure 4.18 shows the data structure generated by the core module for each virtual host. server_conf points to the current virtual server; the appropriate one is found by traversing the list of hosts via the 'next' pointers in the server_recs. The server_recs also contain other per-server configuration such as server_admin and server_hostname, as well as pointers to the module_config and lookup_defaults data structures.
Both point to an array whose size is the total number of modules plus the number of dynamically loadable modules, i.e. one entry for each possibly available module. Each field of the array points to a data structure defined by the corresponding module (via the per-server config handlers).
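The "array of pointers, one slot per module" layout can be illustrated with a minimal sketch. All type and function names below (sketch_module, sketch_server, sketch_get_module_config() and so on) are hypothetical simplifications made up for this illustration; they only mirror the idea that each module owns a fixed index into every configuration vector, and they are not Apache's actual declarations.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a module description. */
typedef struct {
    const char *name;
    int module_index;        /* slot of this module in every config vector */
} sketch_module;

/* Hypothetical stand-in for the two vectors a server_rec points to. */
typedef struct {
    void **module_config;    /* per-server vector: one slot per module */
    void **lookup_defaults;  /* per-directory defaults: one slot per module */
} sketch_server;

/* Allocate a vector with one (initially empty) slot per possible module. */
static void **sketch_create_config_vector(int total_modules)
{
    return calloc((size_t)total_modules, sizeof(void *));
}

/* Store the data structure a module has built in the module's own slot. */
static void sketch_set_module_config(void **vec, const sketch_module *m,
                                     void *cfg)
{
    vec[m->module_index] = cfg;
}

/* Fetch the data structure belonging to a module; NULL if none was set. */
static void *sketch_get_module_config(void **vec, const sketch_module *m)
{
    return vec[m->module_index];
}
```

Because every vector is indexed the same way, a module can find its own data in the per-server vector, in the lookup defaults and in any per-directory vector with the same index and without knowing anything about the other modules.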

In figure 4.18 the data structures of the core module and mod_so are shown; the one for mod_so is just the list of loaded modules. The diagram focuses on the data structure of the core module, which is essential for the configuration of Apache and is composed of document_root, access_name, sec_url and sec.

sec points to an array of pointers, one pointer for each directory section found in the configuration files, ordered by the number of slashes in the path from fewest to most. Each of these pointers in turn points to an array that contains an entry for each module. Hence, the command handlers of each module can create data structures and make entries in the corresponding section of each directory (through the per-directory config handlers). The data structure of the core module for each directory section again contains a sec pointer to a similar data structure, this time for the files sections.
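The ordering of the directory sections by the number of slashes can be sketched as follows. This is only an illustration under simplifying assumptions: the type sketch_dir_section and the functions are hypothetical, 'special' sections (regular expressions and paths not starting with '/') are ignored, and the standard qsort() is used, which does not guarantee a particular order for sections with the same number of slashes.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one directory section. */
typedef struct {
    const char *path;   /* directory the section applies to */
} sketch_dir_section;

/* Count the '/' characters in a path. */
static int sketch_count_slashes(const char *path)
{
    int n = 0;
    for (; *path != '\0'; ++path) {
        if (*path == '/')
            ++n;
    }
    return n;
}

/* qsort() comparison: fewer slashes sort first. */
static int sketch_compare_sections(const void *a, const void *b)
{
    const sketch_dir_section *sa = a;
    const sketch_dir_section *sb = b;
    return sketch_count_slashes(sa->path) - sketch_count_slashes(sb->path);
}

/* Reorder the sections in place, from fewest to most slashes. */
static void sketch_reorder_sections(sketch_dir_section *secs, size_t n)
{
    qsort(secs, n, sizeof secs[0], sketch_compare_sections);
}
```

Sorting once at start-up means that a later walk over the array meets general sections before more specific ones, so that more specific settings can override general ones when they are merged.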

Besides sec, the module data structure of the core module contains a pointer called sec_url. It points to a data structure similar to those of the sec pointers mentioned before, but for the location sections in the configuration files.

So, every module gets the opportunity to build its own data structure for each directory, files and location directive. The per-directory config handlers are also responsible for the entries corresponding to the file and location sections.

The data structure to which lookup_defaults points is again similar to the ones already explained. However, its sec pointer only points to an array for files sections, because lookup_defaults holds the data of those files sections that are not in a specific directory context.

Additionally, you can see a part of the module structures at the bottom of the diagram, showing the command_rec containing some of the directives of the core and mod_so modules.


4.5.3 Processing global configuration data at start-up

On start-up, Apache may process the main configuration file twice (see figure 4.16). The first pass is needed, for example, for syntax checking of the global configuration file (command line parameter -t) or if Apache runs in inetd mode. The second pass is needed because Apache has to read the current configuration on every restart.

Apache calls the function ap_read_config() for processing the configuration when starting or restarting. The function is called for the first time in main() and afterwards in the 'Restart Loop'.

4.5.3.1 Processing global configuration data

Figure 4.19: Structure of the configuration processor (focusing on the data flow)

Figure 4.20: Reading configuration data: Layering of function calls

Figure 4.19 shows the data flow in the structure of the global configuration processor:
The agent process_command_config is responsible for reading command line parameters from the storages ap_server_pre_read_config and ap_server_post_read_config, while the agent process_resource_config reads the global configuration files. Both agents pass their data to the line-by-line configuration file processor (ap_srm_command_loop). This is the heart of the configuration processor; it dispatches each directive to the corresponding command handler in the modules.

Figure 4.20 shows the layering of function calls regarding configuration (see footnote 4.7). Only the most important procedures are covered:

ap_read_config() calls the procedures process_command_config() and ap_process_resource_config().

process_command_config() processes directives that are passed to Apache on the command line (command line options -c or -C). The arguments are stored in the arrays ap_server_pre_read_config and ap_server_post_read_config when the command line options are read in main(), depending on whether they should be processed before or after the main configuration file. These arrays are then handled like configuration files and are passed to the function ap_build_config() (ap_srm_command_loop() in Apache 1.3) in a cmd_parms data structure, which contains additional information such as the affected server, memory pointers (pools) and the override information (see also figure 4.20).

ap_process_resource_config() actually processes the main configuration file. Apache has the ability to process a directory structure of configuration files, in case a directory name instead of a filename was given for the main configuration file. The function calls itself recursively for each subdirectory and so processes the directory structure. For each file that has been found at the recursion endpoint, a cmd_parms structure containing a handle to the configuration file is initialized and passed to ap_build_config() (ap_srm_command_loop() in Apache 1.3).

4.5.3.2 Processing a Directive

ap_build_config() (ap_srm_command_loop() in Apache 1.3) processes the directives in the configuration file line by line. To do so, it uses the function ap_cfg_getline() (same name in Apache 1.3), which returns one line of the configuration file after removing leading and trailing white space, removing the backslashes used for line continuation, and so on.
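The clean-up performed when reading one line can be sketched roughly as follows. This is not Apache's implementation: the function sketch_get_line() is hypothetical, it reads from an in-memory string instead of a configuration file handle, and it assumes that a line continuation is a backslash immediately followed by a newline.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy one logical line from *cursor into buf (capacity bufsize) and
 * advance *cursor past it. Backslash-newline pairs are dropped so that
 * continued lines are joined; leading and trailing white space is
 * removed. Characters that do not fit into buf are discarded.
 * Returns 1 if a line was produced, 0 at end of input.
 */
static int sketch_get_line(const char **cursor, char *buf, size_t bufsize)
{
    const char *p = *cursor;
    size_t len = 0;
    size_t lead = 0;

    if (*p == '\0' || bufsize == 0)
        return 0;

    while (*p != '\0') {
        if (p[0] == '\\' && p[1] == '\n') {   /* continuation: drop both */
            p += 2;
            continue;
        }
        if (*p == '\n') {                     /* end of the logical line */
            ++p;
            break;
        }
        if (len + 1 < bufsize)
            buf[len++] = *p;
        ++p;
    }
    buf[len] = '\0';
    *cursor = p;

    /* Remove trailing white space. */
    while (len > 0 && isspace((unsigned char)buf[len - 1]))
        buf[--len] = '\0';

    /* Remove leading white space by shifting the rest to the front. */
    while (buf[lead] != '\0' && isspace((unsigned char)buf[lead]))
        ++lead;
    if (lead > 0)
        memmove(buf, buf + lead, len - lead + 1);

    return 1;
}
```

A caller would invoke the function in a loop until it returns 0 and hand each cleaned line to the directive processor.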

Afterwards, this line is passed to ap_build_config_sub() (ap_handle_command() in Apache 1.3), which simply returns if the line is empty or a comment line. Otherwise it uses ap_find_command_in_modules() (same name in both versions) to find the first module in the list of modules that could handle the command whose name it has extracted from the line. The returned command_rec structure (see section 3.3 on modules for additional information on the command_rec structure) is passed to the procedure execute_now(), which in turn executes invoke_cmd() (Apache 1.3 calls invoke_cmd() directly). If execute_now() (or invoke_cmd()) returns DECLINED, ap_find_command_in_modules() is called again to get the command_rec of the next module that could handle the command.

invoke_cmd() is the procedure that actually invokes the function in the module via a function pointer. Depending on the information in the command_rec, the appropriate number of arguments is extracted (using ap_getword_conf() in both 2.0 and 1.3) and passed to the called function, together with the cmd_parms structure and the module_config. The cmd_parms structure tells the handler where it can write its configuration information.
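The dispatching of a directive to the first module that accepts it can be sketched as follows. Everything here is a hypothetical simplification: the types, the sentinel sketch_decline and the two example handlers are made up for this illustration, handlers take exactly one argument, and directive names are compared with strcmp(). The sketch only shows the principle of a command table with function pointers and of trying the next module when a handler declines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel a handler returns to decline a directive. */
static const char sketch_decline[] = "declined";

/* A handler returns NULL on success, an error text, or sketch_decline. */
typedef const char *(*sketch_handler)(void *config, const char *arg);

typedef struct {
    const char *name;        /* directive name, NULL terminates the table */
    sketch_handler func;     /* function invoked via function pointer */
} sketch_command;

typedef struct {
    const sketch_command *cmds;  /* command table of the module */
    void *config;                /* where the module stores its settings */
} sketch_module_entry;

typedef struct {
    const char *document_root;
} sketch_core_config;

/* Example handler: stores its argument in the module's configuration. */
static const char *sketch_set_document_root(void *config, const char *arg)
{
    ((sketch_core_config *)config)->document_root = arg;
    return NULL;
}

/* Example handler: knows the directive but leaves it to other modules. */
static const char *sketch_always_decline(void *config, const char *arg)
{
    (void)config;
    (void)arg;
    return sketch_decline;
}

/* Find the first module that handles the directive and does not decline. */
static const char *sketch_dispatch(const sketch_module_entry *mods,
                                   size_t nmods,
                                   const char *name, const char *arg)
{
    size_t i;
    for (i = 0; i < nmods; ++i) {
        const sketch_command *c;
        for (c = mods[i].cmds; c->name != NULL; ++c) {
            const char *result;
            if (strcmp(c->name, name) != 0)
                continue;
            result = c->func(mods[i].config, arg);
            if (result != sketch_decline)
                return result;   /* NULL on success, else an error text */
            break;               /* declined: try the next module */
        }
    }
    return "Invalid command";
}
```

Returning an error text instead of a status code lets the caller report the problem together with the file name and line number it is currently processing.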

4.5.3.3 Processing Directory, Files and Location sections

Figure 4.20 also shows some of the command handlers of the core module (names in parentheses give the Apache 1.3 equivalents):

dirsection(), filesection() and urlsection() (same names in Apache 1.3) are the functions corresponding to the <Directory>, <Files> and <Location> directives. They in turn use ap_build_config() (ap_srm_command_loop()) to handle the directives inside a nested section.

Figure 4.21: Processing configuration files with section directives

As an example, we look at how Apache processes a <Directory> directive by invoking the command handler dirsection(), as shown in figure 4.21. dirsection() calls ap_build_config() (ap_srm_command_loop()) to process all directives in this nested section line by line.

If Apache detects the directive </Directory>, it invokes the corresponding command handler, end_nested_section(), which returns the found </Directory> string as an error message, so the processing of lines stops and ap_build_config() (ap_srm_command_loop()) returns. If ap_build_config() returns NULL instead, it has reached the end of the configuration file without finding the corresponding end section tag, and the calling dirsection() function returns a 'missing end section' error. Otherwise, dirsection() adds a per-directory configuration array to the data structures of the corresponding server (virtual host). The end_nested_section() function knows which section end to look for because the name is stored in the cmd_parms structure that is passed to the command handlers.
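The trick of using the end tag as an "error message" to leave the nested line loop can be sketched as follows. The sketch is hypothetical and heavily simplified: it works on an array of already cleaned lines, knows only <Directory> sections, and merely counts directives and closed sections instead of building configuration data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state, standing in for the cmd_parms structure. */
typedef struct {
    const char *const *lines;  /* configuration lines, already cleaned */
    size_t count;
    size_t pos;                /* index of the next line to process */
    int directives_seen;       /* plain directives, counted for illustration */
    int sections_closed;       /* completed <Directory> sections */
} sketch_parser;

static const char *sketch_build_config(sketch_parser *p);

/* Handler for a <Directory ...> line: process the nested lines. */
static const char *sketch_dirsection(sketch_parser *p)
{
    const char *end = sketch_build_config(p);
    if (end == NULL)                       /* input ended inside the section */
        return "<Directory> section without matching </Directory>";
    if (strcmp(end, "</Directory>") != 0)
        return end;                        /* a real error from inside */
    p->sections_closed++;                  /* here the section would be stored */
    return NULL;
}

/* Line loop: returns NULL at end of input, otherwise the "error" text. */
static const char *sketch_build_config(sketch_parser *p)
{
    while (p->pos < p->count) {
        const char *line = p->lines[p->pos++];
        if (strcmp(line, "</Directory>") == 0)
            return line;                   /* the end tag stops this loop */
        if (strncmp(line, "<Directory", 10) == 0) {
            const char *err = sketch_dirsection(p);
            if (err != NULL)
                return err;
            continue;
        }
        p->directives_seen++;              /* an ordinary directive */
    }
    return NULL;
}
```

With this scheme the line loop needs no knowledge of sections: whoever started the nested loop decides whether the returned string is the expected end tag or a genuine error.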

The <VirtualHost> directive works similarly. The Include directive simply calls ap_process_resource_config() again to process an additional configuration file.


4.5.4 Processing configuration data on requests

Figure 4.22: Structure of the per-request configuration processor: The walk procedures

4.5.4.1 Affected data structures

Apache reads configuration data at start-up and stores it in internal data structures. Whenever a child server processes a request, it reads the internal configuration data and merges the data read from the .htaccess files to get the appropriate configuration for the current request.

Figure 4.22 presents the system structure of the configuration processor and its storages containing internal configuration data structures for one virtual host. These configuration data structures have been generated at start-up and are presented in detail in figures 4.17 and 4.18. Here, the sec, sec_url, server_rec and lookup_defaults structures are shown. The name of the local configuration files to be processed on a request is also stored in the core_server_config; it is .htaccess by default. The configuration for the request is generated in the request_rec data structure, which is shown on the right side of the diagram and also provides other required information to the walk functions.

4.5.4.2 Invoked functions

The child server invokes process_request_internal() to process a request. It first retrieves the configuration for the URI from the sec_url structure by calling the location_walk() procedure and passing it the request_rec as a parameter. This is done before a module handler gets the chance to translate the request URI (ap_translate_name()), because the location configuration can influence the way the URI is translated.

After translating the request URI, the child server calls the three walk procedures in the following order: directory walk, file walk and location walk (see figure 4.14).
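The order of these steps can be summarized in a small sketch. The functions below are hypothetical stubs that only record a letter for each step; they do not perform any configuration merging and merely illustrate that the location walk runs twice, once before and once after the URI has been translated.

```c
#include <string.h>

/* Hypothetical, minimal stand-in for the request_rec. */
typedef struct {
    const char *uri;
    const char *filename;
    char trace[16];          /* one letter per processing step */
} trace_request;

/* Append one letter to the trace if there is room left. */
static void trace_step(trace_request *r, char step)
{
    size_t len = strlen(r->trace);
    if (len + 1 < sizeof r->trace) {
        r->trace[len] = step;
        r->trace[len + 1] = '\0';
    }
}

static void trace_location_walk(trace_request *r)  { trace_step(r, 'L'); }
static void trace_directory_walk(trace_request *r) { trace_step(r, 'D'); }
static void trace_file_walk(trace_request *r)      { trace_step(r, 'F'); }

/* Stub translation: pretend the URI maps directly to a file name. */
static void trace_translate_name(trace_request *r)
{
    r->filename = r->uri;
    trace_step(r, 'T');
}

/* The sequence of steps described in the text. */
static void trace_process_request(trace_request *r)
{
    r->trace[0] = '\0';
    r->filename = NULL;
    trace_location_walk(r);    /* may influence the URI translation */
    trace_translate_name(r);
    trace_directory_walk(r);
    trace_file_walk(r);
    trace_location_walk(r);    /* repeated on the final configuration */
}
```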


4.5.4.3 The walk procedures

The directory_walk() procedure clears the per_dir_config, so the first location-walk before URI translation does not influence the further processing here.

The walk procedures all work similarly. They traverse the list of entries in the corresponding data structures and check for matching directory-, files-, or URI-sections.

However, the directory-walk is somewhat more complicated and will therefore be presented in more detail:

Figure 4.23: The directory walk (without error handling)

The directory-walk descends the directory hierarchy, searching for directory sections that apply to the filename in the request_rec, and merges the corresponding entries into the per_dir_config of the request_rec (using ap_merge_per_dir_configs()). Figure 4.23 shows what happens during the directory-walk (error handling has been left out):

First, the lookup_defaults structure is assigned to per_dir_defaults and therefore taken as the basis for further processing. Later, all matching directory sections are merged to the per_dir_defaults (in a pool in the request_rec).

There are three main paths the directory-walk can take, depending on the content of filename. Each path ends with the assignment of per_dir_defaults to the request_rec, thus enabling the file-walk to work on a data structure that contains all relevant file section entries.

  1. The left path is taken if no filename is given; it just sets the URI as filename.
  2. The second path is taken if filename does not start with a '/'. Here, the directory-walk just loops through the directory section entries and compares filename to each entry.
    Depending on the entry, the comparison uses fnmatch (a pattern-matching function specified in POSIX), a regular expression, or a plain string comparison with filename.
    Each section that matches is merged into per_dir_defaults.
  3. The third path uses a nested loop. Here, the order of the entries is of importance (see ap_core_reorder_directories() in http_core.c). The directory sections are ordered such that the 1-component sections come first, then the 2-component, and so on, finally followed by the 'special' sections. A section is 'special' if it is a regex (regular expression), or if it doesn't start with a '/'.
    The outer loop runs as long as the number of slashes in filename is larger than i (a counter which is incremented on each pass). Once i is larger, all sections that could possibly match have already been passed.
    The nested loop walks through the entries, remembering its position in j. If the current entry is a regular expression or its directory name does not start with a '/', the inner loop breaks because it has reached the 'special' sections, and the outer loop finishes, too. Regular expressions are compared later on in a separate loop.
    If the inner loop breaks because the number of slashes in the directory name of the entry is larger than i, the entry is skipped for now and the .htaccess file of the corresponding directory is included if allowed. Then the outer loop starts a new cycle. This way, all relevant .htaccess files are included.
    If no break occurs, the entry has the right number of components. In the inner loop the directory name of the entry is compared using fnmatch or strncmp. On a match the entry is merged into per_dir_defaults.
    The override information is applied, and wherever a .htaccess file is permitted to override anything, the procedure tries to find one.
    If a .htaccess file has to be parsed, ap_parse_htaccess() is invoked. This procedure in turn calls ap_build_config() (see figure 4.20), which works the same way as at start-up for the main configuration files, but this time on the per_dir_config structure of the request_rec.
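The nested loop of the third path can be sketched as follows. This is a loose illustration rather than Apache's code: the type sketch_section and both functions are hypothetical, the configuration of a section is reduced to a single integer, only exact string comparison is used, and 'special' sections, override handling and .htaccess processing are left out. The sketch assumes that the sections are already sorted by their number of components and that their paths end with a '/'.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one directory section. */
typedef struct {
    const char *path;    /* e.g. "/", "/srv/", "/srv/www/" */
    int components;      /* number of slashes in path, precomputed */
    int options;         /* hypothetical setting; 0 means "not set" */
} sketch_section;

/* Merge: a value set by the more specific section overrides the base. */
static int sketch_merge(int base, int add)
{
    return add != 0 ? add : base;
}

/* Walk down the directories of filename and merge all matching sections. */
static int sketch_directory_walk(const sketch_section *secs, size_t nsecs,
                                 const char *filename, int defaults)
{
    int result = defaults;       /* start from the lookup defaults */
    const char *end = filename;
    const char *p;
    size_t j = 0;                /* position in the sorted section array */
    int total = 0;
    int i;

    for (p = filename; *p != '\0'; ++p) {
        if (*p == '/')
            ++total;
    }

    for (i = 1; i <= total; ++i) {
        size_t prefix_len;

        /* The prefix is filename up to and including its i-th slash. */
        end = strchr(end, '/') + 1;
        prefix_len = (size_t)(end - filename);

        /* Visit the sections with at most i components, each only once. */
        for (; j < nsecs && secs[j].components <= i; ++j) {
            if (secs[j].components == i
                && strlen(secs[j].path) == prefix_len
                && strncmp(secs[j].path, filename, prefix_len) == 0)
                result = sketch_merge(result, secs[j].options);
        }
        /* A real server would now look for a .htaccess file in this
           directory, if overriding is allowed there. */
    }
    return result;
}
```

Because the array is sorted, the index j never has to move backwards: each section is inspected exactly once while the walk descends from the root directory to the directory of the requested file.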

file_walk() works only on the per_dir_config of the request_rec, because the directory-walk has already copied the structures for the file directives from the file section tables of core_server_config and lookup_defaults to the per_dir_config of the request_rec. The name of the file to look for is provided by the request_rec.

location_walk() uses sec_url and the URI provided by the request_rec to work on the per_dir_config of the request. Like the other walk functions, it loops through the entries of its corresponding data structure (sec_url) and merges matching entries into the request_rec.



Footnotes

4.7
Layer diagram of function calls: where one line crosses another inside a circle, the box at the start of the horizontal line contains the name of the calling procedure, whereas the box at the start of the vertical line contains the name of the called one.

Apache Modeling Portal
2004-10-29