
Thalassa CMS

Pages served by Thalassa CGI


What the pages are, actually

First things first: if you haven't read the short introduction to pages given within the Thalassa CGI overview, please be sure to read it before you continue.

When you see a URL like

  http://www.example.com/foo/thalcgi.cgi/bar/buzz.html

you might suspect that the page available there is not a real HTML file but something generated by a “CGI script” (which has nothing to do with scripts) named thalcgi.cgi, but this is only because you know what thalcgi.cgi actually is. From the point of view of an unaware observer, this is just a URL of another HTML page, nothing special. Furthermore, client software, including browsers, is not aware of CGIs either, so it doesn't handle such URLs in any special way.

Browsers are even sure that a trailing “/” in a URL means it is a directory. Suppose you've got a link on your page, given as an unqualified file name, like <a href="buzz.html">. If your page is requested as http://example.com/foo/thalcgi.cgi/bar/, the browser will assume the link points to http://example.com/foo/thalcgi.cgi/bar/buzz.html, but if the same page is requested as http://example.com/foo/thalcgi.cgi/bar (note that the trailing slash is absent), the same link will be assumed to point to http://example.com/foo/thalcgi.cgi/buzz.html, because the “bar” path component is now considered a file name rather than a directory name. It doesn't matter that in reality there is neither a file nor a directory.
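The effect of the trailing slash can be reproduced outside a browser. The following is a minimal sketch that uses Python's standard urllib.parse.urljoin, which follows the same relative-reference resolution rules; it is only an illustration and is not part of Thalassa.

```python
from urllib.parse import urljoin

# The same relative link, as it would appear in <a href="buzz.html">.
link = "buzz.html"

# Page requested with and without the trailing slash.
with_slash = urljoin("http://example.com/foo/thalcgi.cgi/bar/", link)
without_slash = urljoin("http://example.com/foo/thalcgi.cgi/bar", link)

print(with_slash)     # http://example.com/foo/thalcgi.cgi/bar/buzz.html
print(without_slash)  # http://example.com/foo/thalcgi.cgi/buzz.html
```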

Hereinafter, we'll use the term “path” to denote the part of a URI which stands to the right of the name of the CGI, starting with the leading slash, unless it is explicitly stated that the word “path” is used in some other meaning.

You can configure your thalcgi.cgi installation to serve as many different pages as you want. They are configured with ini file sections of the page group, headed like [page ID]. For the ID, there are exactly two possibilities: it must be either a “full path”, starting with the “/”, like /bar/buzz.html, or a “tree id”, like bar, which must contain no slashes at all. Sections identified with a full path serve that exact path only, while sections that have a tree id as their ID serve all possible paths with the given first component. E.g., the [page bar] section will serve all paths like /bar, /bar/buzz.html, /bar/abra/schwabra/cadabra.html and so on. In this case, macros will provide you with access to the other path components, and there's even a parameter that allows you to choose which paths to accept and which to reject (with either a “404 path not found” or a “403 forbidden” error). Such page sections are called multipath hereinafter.

It is undefined what will happen in case the ID contains slashes but doesn't start with one (like foo/bar). In the present version such page sections will be silently ignored, but this is likely to change in the future.
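The rules for section IDs can be summarized with a short sketch. The function names classify_page_id and section_serves are hypothetical and exist only for this illustration; the sketch does not try to model which section wins when several of them match the same path, because the text above does not say.

```python
def classify_page_id(page_id: str) -> str:
    """Classify a [page ID] identifier according to the rules above."""
    if page_id.startswith("/"):
        return "full path"
    if "/" not in page_id:
        return "tree id"
    # Slashes present, but no leading one (like foo/bar): undefined behaviour.
    return "undefined"


def section_serves(page_id: str, path: str) -> bool:
    """Tell whether a section with the given ID would serve the given path."""
    kind = classify_page_id(page_id)
    if kind == "full path":
        return path == page_id            # that exact path only
    if kind == "tree id":
        components = path.split("/")      # "/bar/x" -> ["", "bar", "x"]
        return len(components) > 1 and components[1] == page_id
    return False                          # currently such sections are ignored


print(section_serves("bar", "/bar/abra/schwabra/cadabra.html"))  # True
print(section_serves("/bar/buzz.html", "/bar"))                  # False
```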

The [page ] ini section group

The following parameters are recognized by Thalassa CGI within sections that belong to the [page ] group: session_required, embedded, post_allowed, path_predicate, template, selector, check_fnsafe, post_content_limit, post_param_limit, action, reqargs and reqarg. Besides that, parameters with arbitrary names can be added; their values will be accessible with the %[page: ] macro.

The session_required, embedded and post_allowed parameters are effectively boolean: they may be set to yes or no (actually, anything but yes is considered equal to no). They specify, respectively, whether the page requires a work session to be established, whether the page is intended to be embedded somewhere (e.g., into an iframe tag on a statically generated page) and whether the page accepts POST requests. Macroprocessing is not done on these three parameters.

In the present version the embedded parameter's value is only used by the %[page:ifembedded:] macro function and doesn't affect the functioning of the CGI program in any other way. It may be useful if you prefer to have common templates for your HTML header and footer, but don't want to display some elements, such as the site's heading, on embedded pages.

The path_predicate parameter is only used for multipath page sections, that is, sections headed like [page name] where name doesn't contain any slashes. Let's recall that such sections serve all paths with the given first component, such as /name/foo, /name/bar, /name/foo/bar/buzz.html and so on. In case the path_predicate parameter is specified, its value is passed through the macroprocessor, and it should result in either “yes”, “no” or “reject”. If the macroprocessing results in any other string, it is considered equal to “no”. If the value is “yes”, the request is processed; for any other values the CGI program rejects the request, which effectively means it displays the error page; the error is “403 forbidden” if the macroexpansion resulted in the “reject” string, otherwise it is “404 path not found”.
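The decision made from the macroexpanded path_predicate value is a three-way mapping, sketched below. The function name predicate_outcome is hypothetical, and the sketch assumes the expanded string is compared exactly as is; whether the real program trims whitespace or ignores letter case is not stated here.

```python
def predicate_outcome(expanded: str) -> str:
    """Map the macroexpanded path_predicate value to what the CGI does."""
    if expanded == "yes":
        return "process"
    if expanded == "reject":
        return "403 forbidden"
    # "no" and every other string are treated alike.
    return "404 path not found"


for value in ("yes", "no", "reject", "maybe"):
    print(value, "->", predicate_outcome(value))
```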

Within the path_predicate parameter's value, actual path components may be referred to as %1%, %2% etc.; e.g., for the /name/foo/bar/buzz.html path the %1% will expand to foo, %2% to bar and %3% to buzz.html (the first path component, which is the name of the section, name in this example, may be accessed as %0%, but it is not recommended to rely on this). The same is true for all other parameters within multipath [page ] ini sections.
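The numbering of the positionals can be illustrated with a small sketch. The helper name positionals is hypothetical; the sketch only shows which path component each numbered macro stands for and says nothing about positionals that refer to components absent from the path.

```python
def positionals(path: str) -> dict[str, str]:
    """Map %0%, %1%, ... to the components of a request path."""
    components = path.lstrip("/").split("/")
    return {f"%{index}%": part for index, part in enumerate(components)}


for macro, value in positionals("/name/foo/bar/buzz.html").items():
    print(macro, "=", value)
# %0% = name
# %1% = foo
# %2% = bar
# %3% = buzz.html
```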

There's another check for acceptability of a particular path, activated by setting the check_fnsafe parameter. It is useful in case some of the path components are going to be used as file names or file name parts. The value of the parameter, if it is set, gets passed through the macroprocessor; the result is broken down into words, using the apostrophe “'” and the doublequote “"” as grouping symbols (both an apostrophe within doublequotes and a doublequote within apostrophes are considered plain chars), much like in the Shell command line. Every “word” is then tested to see whether it can safely serve as a filesystem path component, which means it contains no whitespace or control characters, no characters from the !"#$%&'()*,/<>=?[\]^`{|}~ set, no characters with codes greater than 126, and doesn't start with either “.” or “-”. If the test fails, the CGI refuses to continue, and the error page is displayed with the “406 path not acceptable” error.

WARNING: Setting the check_fnsafe parameter correctly is critical for security; failure to do so can result in someone getting unauthorized access to your server's filesystem, which might have horrible consequences.
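To make the check_fnsafe rules concrete, here is a minimal sketch of the word splitting and of the per-word test. The names split_words and is_fnsafe are hypothetical, this is not the actual Thalassa implementation, and two points are assumptions not covered by the description: an empty word is treated as unsafe, and an unterminated quote simply runs to the end of the string.

```python
# Characters explicitly forbidden by the description above.
FORBIDDEN = set("!\"#$%&'()*,/<>=?[\\]^`{|}~")


def split_words(text: str) -> list[str]:
    """Split into words; ' and " group, each is plain inside the other."""
    words, current, quote, in_word = [], [], None, False
    for ch in text:
        if quote:
            if ch == quote:
                quote = None              # closing quote ends the group
            else:
                current.append(ch)
        elif ch in "'\"":
            quote, in_word = ch, True     # opening quote starts or extends a word
        elif ch.isspace():
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
        else:
            current.append(ch)
            in_word = True
    if in_word:
        words.append("".join(current))
    return words


def is_fnsafe(word: str) -> bool:
    """Tell whether a word may safely serve as a filesystem path component."""
    if not word:
        return False                      # assumption: empty words are rejected
    if word[0] in ".-":
        return False
    for ch in word:
        code = ord(ch)
        if code <= 32 or code > 126:      # whitespace, control chars, non-ASCII
            return False
        if ch in FORBIDDEN:
            return False
    return True


print(split_words("foo 'bar baz' \"it's\""))   # ['foo', 'bar baz', "it's"]
print(is_fnsafe("buzz.html"), is_fnsafe(".."), is_fnsafe("a/b"))  # True False False
```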

The template parameter is perhaps the simplest to explain: it defines the page's content to be displayed to the user. The value is passed through the macroprocessor.

The selector parameter's value, after macroexpansion, is used as a specifier for values of “unrecognized” parameters available through the %[page: ] macro; see the description of the macro for details.

The rest of the recognized parameters are only used with POST requests. The post_content_limit and post_param_limit are relatively easy to explain: they set page-specific values for the same POST data size limits for which the default (global) values are set by parameters of the same names within the [global] ini section (follow the link for the explanation).

The parameters reqargs and reqarg will be described later in a separate section.

The action parameter is hard to explain without the introduction to POST requests handling in general, so we postpone its description for the webforms handling documentation.

The %[page: ] macro

Once the CGI program has analysed the path and selected which particular [page ] ini section to use, the %[page: ] macro becomes available.

The macro requires at least one argument. If the first argument is one of ifsessionrequired, ifpostallowed or ifembedded (which are the names of the three supported functions), the macro takes two more arguments, for then and else. All three functions check a certain condition and return the then argument if the condition is met, and the else argument otherwise. The conditions are, respectively, whether the page being served is configured to require an active work session, whether it accepts POST requests and whether it is marked as embedded. See the description of the session_required, embedded and post_allowed page section parameters for details.

If the first argument is not one of the three function names mentioned above, it should be the only argument (the rest of the arguments, if there are any, will be ignored). The argument is then used as the name of a parameter within the same [page ] section. Briefly (and incompletely) speaking, the macro call in this case expands to the value of that parameter, after the value is passed through the macroprocessor. This allows one to have more HTML snippets right within the page section.

In contrast with %[html:] snippets, these cannot be used as parametrized templates. This is because within a [page ] ini section the “positionals” (that is, numerical macros like %1%, %12% etc.) refer to the path components, so there would be no way left to refer to template arguments.

However, these parameters are there not because we want to have more snippets, although that can be convenient in itself. Their primary purpose is to make it easier to specify different versions of the same page, and it is achieved with the help of the selector parameter.

Once the [page ] ini section is chosen based on the path from the request being served, the selector parameter's value, if any, gets passed through the macroprocessor and the result is memorized. The result should be a simple identifier, or else something may go wrong. This identifier is used by the %[page: ] macro as the specifier when accessing arbitrary parameters of the [page ] ini section.

For example, if the following parameters are set:

  [page /foo.html]
  selector = buzz
  motto = Whatever hits the fan will not be distributed evenly
  motto:buzz = It is not cheating, it's a team work
  slogan = Life is short, smile while you still have teeth
  slogan:usb = Life is too short to remove USB safely
  slogan:lazy = I am not lazy, I am just on my energy saving mode

and the /foo.html path is requested, %[page:motto] will expand to “It is not cheating, it's a team work”, while %[page:slogan] will become “Life is short, smile while you still have teeth”.

Please note this example is just to show how things work. In real-life configurations, a constant selector (as in this example) makes no sense at all; to be of any use, it must be an expression containing macros, especially conditionals, and sometimes the expression becomes pretty complicated.
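The lookup behaviour shown by the motto and slogan parameters can be sketched as follows. The function name page_param is hypothetical; the sketch models only the choice between the selector-specific parameter and the plain one, as suggested by that example, and leaves out the macroprocessing that the real value goes through. Returning None for a missing parameter is an assumption made for the illustration.

```python
from typing import Optional


def page_param(section: dict, name: str, selector: str) -> Optional[str]:
    """Pick 'name:selector' if the section has it, else fall back to 'name'."""
    if selector:
        specific = section.get(f"{name}:{selector}")
        if specific is not None:
            return specific
    return section.get(name)


# The parameters of the [page /foo.html] section from the example above.
section = {
    "selector": "buzz",
    "motto": "Whatever hits the fan will not be distributed evenly",
    "motto:buzz": "It is not cheating, it's a team work",
    "slogan": "Life is short, smile while you still have teeth",
    "slogan:usb": "Life is too short to remove USB safely",
    "slogan:lazy": "I am not lazy, I am just on my energy saving mode",
}

print(page_param(section, "motto", section["selector"]))
# It is not cheating, it's a team work
print(page_param(section, "slogan", section["selector"]))
# Life is short, smile while you still have teeth
```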

The “request arguments” facility

This section's content may be hard to understand just because it is unclear what it's all about and how to use it. You will not need all this stuff until you begin with the user comments facility. It is absolutely safe to skip this section now and return to it right before reading about user comments.

The so-called “request arguments” are actually values one can set within a [page ] ini section and access within other sections, typically the [comments] section. The most obvious use for them is when outside of the [page ] section (typically a multipath one) you need to access some information passed via the URI: within the [page ] section that information is available via the %1%, %2% and their family, but these “positional arguments” are not available in other places of the configuration file.

Setting request arguments

A [page ] ini section may include two parameters related to the “request argument” facility: reqargs and reqarg.

The reqargs parameter controls which request arguments are to be set; its value is a whitespace-separated list of identifiers you'd like to use to identify your request arguments. The value for each of the arguments is set by adding a reqarg parameter with the argument's identifier as the specifier. For example:

  reqargs = alpha beta gamma
  reqarg:alpha = the value for the “alpha” request arg
  reqarg:beta = this one is for “beta”
  reqarg:gamma = and “gamma” goes here

In a real-life situation, these values will perhaps use macros, especially the “positionals” (%1%, %2% etc.).

The %[reqarg: ] macro

The %[reqarg: ] macro is used to access the request arguments. The macro always accepts exactly one argument — the identifier of the request argument to be queried. It returns the respective value, or an empty string in case there's no request argument with such name.
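The relation between reqargs, reqarg and the %[reqarg: ] macro can be sketched like this. The names collect_reqargs and reqarg_macro are hypothetical, macroexpansion of the values is not modelled, and giving an empty value to an identifier listed in reqargs without a matching reqarg parameter is an assumption of this sketch.

```python
def collect_reqargs(section: dict) -> dict:
    """Build the request arguments from a [page ] section's parameters."""
    names = section.get("reqargs", "").split()   # whitespace-separated list
    return {name: section.get(f"reqarg:{name}", "") for name in names}


def reqarg_macro(args: dict, name: str) -> str:
    """Mimic %[reqarg:name]: the value, or an empty string if unknown."""
    return args.get(name, "")


section = {
    "reqargs": "alpha beta gamma",
    "reqarg:alpha": "the value for the alpha request arg",
    "reqarg:beta": "this one is for beta",
    "reqarg:gamma": "and gamma goes here",
    "reqarg:delta": "not listed in reqargs, so never set",
}

args = collect_reqargs(section)
print(reqarg_macro(args, "beta"))         # this one is for beta
print(repr(reqarg_macro(args, "delta")))  # ''
```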

© Andrey V. Stolyarov, 2023, 2024