Alájar

Masters of Engineering Project (CS 790)

by Max Attar Feingold (maf6@cornell.edu)

Project Supervisor:   Professor Bruce Land (bruce@tc.cornell.edu)

Download the code.

Introduction

Alájar is an extensible TCP/IP-based web server for Windows NT that implements a subset of the HTTP 1.1 protocol and provides a foundation upon which web-based network applications can be built.

The objective of Alájar is to make it easy for developers to build interactive web content.  Currently, the most common way to write web applications that generate HTML dynamically in response to form submissions from web clients is CGI scripting.  CGI scripts can be implemented in a variety of languages, and the results are both powerful and flexible.  However, CGI suffers from an important drawback:  a lack of scalability.  Each form submission sent by a client requires the server to launch a new process to handle it.  Most CGI scripts execute and terminate rapidly, so the cost of creating each new process is high compared to the actual work that it performs.  For two or three clients the server will perform reasonably well, but when the server is under stress from many simultaneous hits, the overhead of CGI scripting can quickly become prohibitive.

Different solutions to this problem have been proposed by different web server vendors.  One of the most popular is Microsoft Corp.'s Active Server Pages, which use server-side scripting in various languages (currently VBScript and JavaScript) to provide interactive content.  The problem with this solution is that interpretation of these scripting languages is slow enough that it is usually faster to call out to external compiled COM objects written in C++ or Visual Basic than to perform calculations in the scripts themselves.  This both complicates the development process and makes execution sluggish because of the cost of marshalling parameters and switching context from interpreted to compiled code.  However, the advantage of scripting languages over CGI is that no new processes need to be created, as the interpretation is handled in-process.

The solution proposed by Alájar is to combine the best of both worlds:  under the Alájar programming model, compiled application code is executed in-process, loaded dynamically from DLLs that implement the standard Alájar interface.  In this way the application can control exactly what the web server will send back to the client, and the full power and flexibility of CGI can be obtained without suffering its costs.

Origin

Before taking on a life of its own, Alájar was originally conceived as a light-weight HTTP front end to Almonaster, a web-based multi-player war game I wrote using IIS and Active Server Pages to provide a user interface. Alájar is a complete enough product to be considered an independent entity, but in the future I will use it primarily to develop Almonaster 2.0 to the point where it can be run as a stand-alone application without the need for a third-party web server or database.

Features

Alájar can operate both as a standard web server and as a framework for developers to write custom web applications.  Through the implementation of a standard interface, web administrators can create virtual domains (hereafter referred to as page sources) under which incoming requests from clients will be passed along to application code for processing.  The page sources can respond to requests by providing the server with the data that should be sent to the client in response to the request.  Using this scheme, Alájar allows interactive HTML pages to be built on the fly in a fast and scalable manner.

It should be noted that both Java applets and JavaScript can be used with Alájar, simply because they are primarily client-side technologies that are merely delivered by the server as text or binary files.  Legacy CGI code is not excluded from the Alájar development model either;  future versions of Alájar will include a page source named "cgi-bin" that will launch external applications with the parameters provided in the requested URI.

The services offered by Alájar are the following:

Implementation details

Alájar is written in C++, using the standard C and C++ library for most things, templates for certain data structures and my own OSAL (Operating System Abstraction Layer) libraries for more sophisticated functionality such as mutex locks, threads, unique temporary files, sockets, etc.

The heart of Alájar is the HttpServer object, which contains the core functionality of the web server.  Each HttpServer object contains a FileCache object that provides it with file I/O, a main socket that delegates requests to new execution threads and a table of page sources, as well as all the logic that handles the different configuration options of each page source.

The FileCache provides a standard GetFile() / ReleaseFile() interface for its clients.  The backend is a hash table containing File objects corresponding to the files in the cache.  The FileCache has two low-priority threads running in the background;  one removes files from the cache in FIFO order when its maximum size is exceeded and the other checks the files in the cache to make sure they are up to date.
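The cache behaviour described above can be sketched roughly as follows.  This is a minimal, single-threaded illustration under stated assumptions, not the actual Alájar code:  the names SketchFileCache, CachedFile, Insert(), Trim(), Contains() and Size() are hypothetical, the size limit is assumed to be counted in bytes, pinning files with a reference count is an assumption about what GetFile() / ReleaseFile() imply, and the background eviction thread is replaced by an explicit Trim() call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cached file; the real File class is not shown.
struct CachedFile {
    std::string contents;
    int refCount = 0;   // number of GetFile() calls not yet released
};

class SketchFileCache {
public:
    explicit SketchFileCache(std::size_t maxBytes)
        : m_maxBytes(maxBytes), m_bytes(0) {}

    // Adds a file to the cache (a real server would read it from disk).
    void Insert(const std::string& name, const std::string& contents) {
        assert(m_files.count(name) == 0);   // sketch: each file is inserted once
        m_files[name].contents = contents;
        m_order.push_back(name);
        m_bytes += contents.size();
    }

    // Returns the cached file and pins it, or nullptr on a cache miss.
    const CachedFile* GetFile(const std::string& name) {
        auto it = m_files.find(name);
        if (it == m_files.end()) {
            return nullptr;
        }
        ++it->second.refCount;
        return &it->second;
    }

    // Unpins a file previously obtained with GetFile().
    void ReleaseFile(const std::string& name) {
        auto it = m_files.find(name);
        assert(it != m_files.end() && it->second.refCount > 0);
        --it->second.refCount;
    }

    // Evicts unpinned files, oldest first, until the cache fits its limit.
    // Each file is examined at most once per call.
    void Trim() {
        std::size_t scanned = 0;
        const std::size_t total = m_order.size();
        while (m_bytes > m_maxBytes && scanned < total) {
            std::string name = m_order.front();
            m_order.pop_front();
            ++scanned;
            auto it = m_files.find(name);
            if (it->second.refCount > 0) {
                m_order.push_back(name);    // still in use; keep it for now
                continue;
            }
            m_bytes -= it->second.contents.size();
            m_files.erase(it);
        }
    }

    bool Contains(const std::string& name) const {
        return m_files.count(name) > 0;
    }

    std::size_t Size() const { return m_bytes; }

private:
    std::size_t m_maxBytes;
    std::size_t m_bytes;
    std::unordered_map<std::string, CachedFile> m_files;   // the hash table
    std::deque<std::string> m_order;                       // FIFO insertion order
};
```

In this sketch a file that is still held by a client is skipped during eviction and moved to the back of the queue, so a caller never sees its file disappear between GetFile() and ReleaseFile().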

The HttpServer object uses two special classes to communicate with the page sources:   HttpRequest and HttpResponse.  An HttpRequest object is generated for each new client connection;  it contains logic that, given an open socket, decodes the HTTP headers from the client and makes the information available to its clients with functions like GetURI(), GetMethod() or GetBrowserName().  An HttpResponse object is created each time a page source is asked for a response;  it provides functions like SetStatus() or SetMimeType() for page sources to use when specifying the type of response to send back to the client.  The response can be an error code, a URL redirect, a character buffer with its associated mime type or a file name to be opened by the server and sent over the network.  Response objects also contain lists of cookies to be associated with the client or deleted.
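As an illustration of the decoding step, the first line of an HTTP request can be split into the method, URI and protocol version along the following lines.  This is only a sketch:  RequestLine and ParseRequestLine() are hypothetical names, and the way the real HttpRequest class reads from the socket is not shown here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical holder for the three parts of an HTTP request line.
struct RequestLine {
    std::string method;
    std::string uri;
    std::string version;
};

// Splits a line such as "GET /index.html HTTP/1.1" into its parts.
// Returns false if the line does not have exactly three fields or if
// the third field does not start with "HTTP/".
bool ParseRequestLine(const std::string& line, RequestLine& out) {
    std::istringstream in(line);
    std::string extra;
    if (!(in >> out.method >> out.uri >> out.version)) {
        return false;
    }
    if (in >> extra) {
        return false;   // unexpected fourth field
    }
    return out.version.compare(0, 5, "HTTP/") == 0;
}

// Usage example: a trailing carriage return is treated as whitespace.
void ExampleParseRequestLine() {
    RequestLine request;
    bool ok = ParseRequestLine("GET /game/login.html HTTP/1.1\r", request);
    assert(ok);
    assert(request.method == "GET");
    assert(request.uri == "/game/login.html");
}
```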

Execution of Alájar proceeds as follows:  an HttpServer object is created and the Start() function is called, which spawns the main server thread and returns.  The main function then blocks until a termination signal is received.  It then calls the Stop() function, which terminates the server, and deletes the object.

When the server starts it configures itself according to the configuration files (opening the page sources specified in the configuration directory) and opens up a socket to listen on the specified port number.  When a new connection is made, a new thread is spawned.  The thread creates an HttpRequest object that decodes the request and calls the appropriate functions to handle it.  If the request falls under the aegis of a page source (as determined by the first directory name referenced in the URI), then the page source is called and its response is passed on to the client.  Otherwise, the request is handled like a normal HTTP request, the response being a file or an error message.
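The routing rule described above (the first directory name in the URI selects the page source) might look roughly like this.  The function names FirstDirectory() and SelectPageSource() are hypothetical, the table of page sources is reduced to a set of names, and it is an assumption of this sketch that a URI with no directory component, such as /index.html, falls through to the default page source.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns the first directory name in a URI, e.g. "/game/login" -> "game".
// Returns an empty string when the URI names no directory.
std::string FirstDirectory(const std::string& uri) {
    // Ignore any query string so that a '/' inside it is not misread.
    std::string path = uri.substr(0, uri.find('?'));
    if (path.empty() || path[0] != '/') {
        return "";
    }
    std::string::size_type end = path.find('/', 1);
    if (end == std::string::npos) {
        return "";
    }
    return path.substr(1, end - 1);
}

// Picks the page source responsible for a URI, falling back to "Default".
std::string SelectPageSource(const std::set<std::string>& registered,
                             const std::string& uri) {
    std::string dir = FirstDirectory(uri);
    if (!dir.empty() && registered.count(dir) > 0) {
        return dir;
    }
    return "Default";
}

// Usage example with one registered page source.
void ExampleSelectPageSource() {
    std::set<std::string> registered = {"almonaster"};
    assert(SelectPageSource(registered, "/almonaster/login") == "almonaster");
    assert(SelectPageSource(registered, "/index.html") == "Default");
}
```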

When data is POSTed by a client, the server passes the data on to the appropriate page source for processing.  If a file is submitted via an <input type="file"> form field, then the file is saved as a temporary file and its name is passed to the page source.  If no page source can be identified, then the server discards the submitted data and returns a standard error message.

The page source interface

The interface that every page source must support is the following:

int OnInitialize();
int OnBasicAuthenticate (char* pszLogin, char* pszPassword);
int OnGet (HttpRequest* pHttpRequest, HttpResponse* pHttpResponse);
int OnPost (HttpRequest* pHttpRequest, HttpResponse* pHttpResponse);
int OnFinalize();

OnInitialize() is called when the page source is first initialized.  OnFinalize() is called when the web server is shutting down.  These two functions allow page sources (which may contain exceedingly complex data structures and file I/O schemes) to execute initialization and termination code when appropriate.  Both functions return error codes.

OnBasicAuthenticate() is called when basic authentication is requested by a page source and a client requires authentication to proceed.  It should return 1 if the login and password are accepted or 0 if not.
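A page source might implement this callback along the following lines.  The account table is purely illustrative (the logins and passwords are made up for this sketch), and a real page source would presumably check credentials against its own user database rather than a hard-coded plaintext list.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical account record used only by this sketch.
struct Account {
    const char* login;
    const char* password;
};

// Returns 1 if the login and password are accepted, 0 if not.
int OnBasicAuthenticate(char* pszLogin, char* pszPassword) {
    static const Account kAccounts[] = {
        {"admin", "example-password"},
        {"guest", "guest"},
    };
    if (pszLogin == nullptr || pszPassword == nullptr) {
        return 0;
    }
    for (const Account& account : kAccounts) {
        if (std::strcmp(pszLogin, account.login) == 0 &&
            std::strcmp(pszPassword, account.password) == 0) {
            return 1;
        }
    }
    return 0;
}

// Usage example: the interface takes non-const char*, so use arrays.
void ExampleAuthenticate() {
    char login[] = "guest";
    char good[] = "guest";
    char bad[] = "wrong";
    assert(OnBasicAuthenticate(login, good) == 1);
    assert(OnBasicAuthenticate(login, bad) == 0);
}
```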

OnGet() and OnPost() are called when the corresponding events are generated by the client.  All available information about the request is provided by the HttpRequest object, and all information about the page source's response is provided to the server by calling the HttpResponse object's functions.  The value returned should be an error code.

The HttpResponse object must be used in the following manner:

1) The SetOverride() function must be called to specify whether the page source is providing dynamically-generated data or the request should be treated as usual by the server.  If the value given is false, then nothing more needs to be done.

2) The SetType() function must be called to specify the type of data being returned.  The allowable types of data are RESPONSE_BUFFER or RESPONSE_FILE.  If the latter is used then SetFileName() must be called to specify the name of the file that should be sent.

3) The SetMimeType() function must be called to specify the mime type of the outgoing data, be it buffer or file.

4) The SetStatus() function must be called to specify the HTTP status code of the response.  Allowable codes are:

5) If the data sent is a buffer, then the SetData() / WriteData() methods can be used.   SetData() takes an entire buffer as an argument, whereas WriteData() appends the given data to the currently existing buffer.

6) SetCookie() and DeleteCookie() can be used to create and delete cookies on the client.
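The numbered steps above can be sketched as a small OnGet() handler.  The HttpRequest and HttpResponse classes below are minimal stand-ins written for this example only:  the function names come from the text, but their parameter types, the accessor functions, the use of 200 as the status value and the use of 0 as the "no error" return code are all assumptions.  Cookies are left out because their signatures are not given.

```cpp
#include <cassert>
#include <string>

enum ResponseType { RESPONSE_BUFFER, RESPONSE_FILE };

// Stand-in for the real HttpRequest; only GetURI() is modelled.
class HttpRequest {
public:
    explicit HttpRequest(const std::string& uri) : m_uri(uri) {}
    const std::string& GetURI() const { return m_uri; }
private:
    std::string m_uri;
};

// Stand-in for the real HttpResponse; signatures are assumptions.
class HttpResponse {
public:
    void SetOverride(bool value) { m_override = value; }
    void SetType(ResponseType type) { m_type = type; }
    void SetMimeType(const std::string& mime) { m_mime = mime; }
    void SetStatus(int status) { m_status = status; }
    void SetData(const std::string& data) { m_data = data; }
    void WriteData(const std::string& data) { m_data += data; }

    // Accessors added for this sketch so the result can be inspected.
    bool Override() const { return m_override; }
    int Status() const { return m_status; }
    const std::string& MimeType() const { return m_mime; }
    const std::string& Data() const { return m_data; }

private:
    bool m_override = false;
    ResponseType m_type = RESPONSE_BUFFER;
    std::string m_mime;
    int m_status = 0;
    std::string m_data;
};

// A page source handler that builds a small HTML page in a buffer.
int OnGet(HttpRequest* pHttpRequest, HttpResponse* pHttpResponse) {
    assert(pHttpRequest != nullptr && pHttpResponse != nullptr);

    pHttpResponse->SetOverride(true);           // 1) we provide the data
    pHttpResponse->SetType(RESPONSE_BUFFER);    // 2) data is a buffer
    pHttpResponse->SetMimeType("text/html");    // 3) mime type of the buffer
    pHttpResponse->SetStatus(200);              // 4) assumed "OK" status
    pHttpResponse->SetData("<html><body>");     // 5) start the buffer...
    pHttpResponse->WriteData("You asked for " + pHttpRequest->GetURI());
    pHttpResponse->WriteData("</body></html>"); //    ...and append to it

    return 0;   // assumed to mean "no error"
}
```

SetData() is used once to start the buffer and WriteData() to append to it, which matches the distinction drawn in step 5.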

Aside from these constraints, the page sources can do whatever they want.  Some care must be taken, however, since the page source code runs with the same priority as the Alájar executable itself.

Development tools

To facilitate the development of web applications with Alájar, a textual preprocessor called Asfpp (Alájar Scripting Format PreProcessor) is being developed.  Asfpp will take as input an .asf text file and output compilable C++ code.  To illustrate this procedure, the file test.asf with the following content:

<% #include <stdio.h> %>

<html>

<% for (int i = 0; i < 100; i ++) { %>
<p>
<% } %>

</html>

... would produce the following C++ code:

#include "HttpRequest.h"
#include "HttpResponse.h"

#include <stdio.h>

int RenderTest (HttpRequest* pHttpRequest, HttpResponse* pHttpResponse) {

    pHttpResponse->WriteData ("\n\n<html>\n\n");
    for (int i = 0; i < 100; i ++) {
        pHttpResponse->WriteData ("\n<p>\n");
    }
    pHttpResponse->WriteData ("\n\n</html>");

    return 0;    // assumed here to be the "no error" code
}

The ASF format was designed to resemble other popular scripting languages and facilitate conversion of code from these languages to Alájar.  The "<%" and "%>" markers are used to encapsulate C++ code, while the rest of the text is the HTML to be sent to the Response object.
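The translation from "<%" / "%>" markers to C++ statements can be sketched as follows.  This is not Asfpp itself:  TranslateAsf() and EscapeLiteral() are hypothetical names, the sketch emits only the statements of the function body, and it does not hoist #include lines out of the function or cope with a "%>" that appears inside a C++ string literal.

```cpp
#include <cassert>
#include <string>

// Escapes text so it can be placed inside a C++ string literal.
std::string EscapeLiteral(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            case '\r': break;               // drop carriage returns
            default:   out += c;      break;
        }
    }
    return out;
}

// Turns ASF text into C++ statements: HTML becomes WriteData() calls,
// and anything between "<%" and "%>" is copied through as code.
std::string TranslateAsf(const std::string& source) {
    const std::string::size_type npos = std::string::npos;
    std::string out;
    std::string::size_type pos = 0;

    while (pos < source.size()) {
        std::string::size_type open = source.find("<%", pos);
        std::string html =
            source.substr(pos, open == npos ? npos : open - pos);
        if (!html.empty()) {
            out += "pHttpResponse->WriteData (\"" + EscapeLiteral(html) + "\");\n";
        }
        if (open == npos) {
            break;
        }
        std::string::size_type close = source.find("%>", open + 2);
        // Sketch only: a real tool would report an unterminated block.
        assert(close != npos);
        out += source.substr(open + 2, close - open - 2) + "\n";
        pos = close + 2;
    }
    return out;
}
```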

Asfpp has not been finished at the time of writing.

Installation notes

A configuration file called Alajar.conf must exist in the same directory as the Alájar executable.  This file must contain the following parameters (the values given are examples), where order does not matter, spaces before and after the = sign are forbidden and no slashes at the end of directory names should be used:

Port=80
MaxFileCacheSize=1024
CounterPath=D:\mfeingol\Alajar\Counters
LogPath=D:\mfeingol\Alajar\Logs
PageSourcePath=D:\mfeingol\Alajar\PageSources
ConfigPath=D:\mfeingol\Alajar\Config

The directories specified must exist on the computer running Alájar.
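A reader for this Key=Value format could be sketched as below.  ParseConfig() is a hypothetical name and this is not the parser used by Alájar;  it only illustrates the stated rules that order does not matter and that spaces around the = sign are forbidden.  The rule about trailing slashes in directory names is not enforced in this sketch.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parses "Key=Value" lines into a map.  Returns false on a malformed
// line: one with no '=', an empty key, or a space next to the '='.
bool ParseConfig(std::istream& in, std::map<std::string, std::string>& out) {
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();                // tolerate Windows line endings
        }
        if (line.empty()) {
            continue;
        }
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos || eq == 0) {
            return false;
        }
        if (line[eq - 1] == ' ' ||
            (eq + 1 < line.size() && line[eq + 1] == ' ')) {
            return false;                   // spaces around '=' are forbidden
        }
        out[line.substr(0, eq)] = line.substr(eq + 1);   // value may be empty
    }
    return true;
}

// Usage example: order does not matter, so a map lookup is enough.
void ExampleParseConfig() {
    std::istringstream text("MaxFileCacheSize=1024\r\nPort=80\r\n");
    std::map<std::string, std::string> config;
    bool ok = ParseConfig(text, config);
    assert(ok);
    assert(config["Port"] == "80");
}
```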

In the ConfigPath directory, a file named Default.conf must contain the following parameters:

Name=Default
401File=D:\mfeingol\alajar\401.html
403File=D:\mfeingol\alajar\403.html
404File=D:\mfeingol\alajar\404.html
500File=D:\mfeingol\alajar\500.html
501File=D:\mfeingol\alajar\501.html
AllowDirectoryBrowsing=1
BasePath=d:\mfeingol\alajar\www_root
DefaultFile=index.html
PageSourceLibrary=
PageSourceNameSpace=
OverrideGet=0
OverridePost=0
UseBasicAuthentication=0
UseDailyLogs=1
UseDefaultFile=1
UsePageSource=0
UseLogging=1
UseSSI=1

These parameters are used to configure the default page source, which is the one used when no other page source matches the request URI.  Each page source must have its own file (*.conf) containing the same set of parameters in the ConfigPath directory in order to be registered with Alájar upon startup.

Future enhancements