id: thalcgi_pages
type: listitem
list: userdoc
tags: userdoc
title: Pages served by Thalassa CGI
format: texbreaks


Contents:<ul>
<li><a href="#what_pages_are">What the pages are, actually</a></li>
<li><a href="#page_ini_section">The <code>[page&nbsp;]</code>
                                ini section group</a></li>
<li><a href="#page_macro">The <code>%[page:&nbsp;]</code> macro</a></li>
<li><a href="#request_args">The &ldquo;request arguments&rdquo; facility</a>
<ul>
  <li><a href="#reqarg_parameters">Setting request arguments</a></li>
  <li><a href="#reqarg_macro">The <code>%[reqarg:&nbsp;]</code> macro</a></li>
</ul>
</li>
</ul>


<h2 id="what_pages_are">What the pages are, actually</h2>

First things first: if you haven't read the short
<a href="thalcgi_overview.html#simple_pages">introduction to pages</a>
given within the <a href="thalcgi_overview.html">Thalassa CGI overview</a>,
please be sure to read it before you continue.

When you see a URL like

<pre>
  http://www.example.com/foo/thalcgi.cgi/bar/buzz.html
</pre>

you might suspect that the page available there is not a real HTML file but
something generated by a &ldquo;CGI script&rdquo; (which has nothing to do with
scripts) named <code>thalcgi.cgi</code>, but this is only because you know
what <code>thalcgi.cgi</code> actually is.  From the point of view of an
unaware observer, this is <em>just a URL of another HTML page</em>,
nothing special.  Furthermore, client software, including browsers, is not
aware of CGIs either, so it doesn't handle such URLs in any special way.

<p class="remark">
Browsers even assume that a trailing &ldquo;<code>/</code>&rdquo; in a URL means
the URL refers to a directory.  Suppose you've got a link on your page, given as an
unqualified file name, like <code>&lt;a href="buzz.html"&gt;</code>.
If your page is requested as
<code>http://example.com/foo/thalcgi.cgi/bar/</code>, the browser will
assume the link points to
<code>http://example.com/foo/thalcgi.cgi/bar/buzz.html</code>, but
in case <em>the same page</em> is requested as
<code>http://example.com/foo/thalcgi.cgi/bar</code> (note that the trailing
slash is absent), the same link will be assumed to point to
<code>http://example.com/foo/thalcgi.cgi/buzz.html</code>, because the
&ldquo;<code>bar</code>&rdquo; path component is now considered a file name rather
than a directory name.  It doesn't matter that in reality there is neither a
file nor a directory.
</p>

Hereinafter, we'll use the term &ldquo;path&rdquo; to denote the part of a URI which
stands <em>to the right</em> of the name of the CGI, starting with the
leading slash, unless it is explicitly stated that the word &ldquo;path&rdquo;
is used in some other meaning.  For the URL shown above, the path is
<code>/bar/buzz.html</code>.

You can configure your <code>thalcgi.cgi</code> installation to serve as
many different pages as you want.  They are configured with ini file
sections of the <code>page</code>
<a href="ini_basics.html#sectiongroups">group</a>, headed like
<code>[page&nbsp;<em>ID</em>]</code>.  For the <em>ID</em>, there are
exactly two possibilities: it must be either a &ldquo;full path&rdquo;, starting with
the&nbsp;&ldquo;<code>/</code>&rdquo;, like <code>/bar/buzz.html</code>, or a &ldquo;tree
id&rdquo;, like <span id="multipath"></span>&ldquo;<code>bar</code>&rdquo;, which must
contain no slashes at all.  Sections identified with a full path serve that
exact path only, while sections that have a tree id as their IDs serve all
possible paths with the given first component.  E.g., the <code>[page
bar]</code> section will serve all paths like <code>/bar</code>,
<code>/bar/buzz.html</code>, <code>/bar/abra/schwabra/cadabra.html</code>
and so on.  In this case, macros will provide you with access to other path
components, and there's even a parameter that lets you choose which paths
to accept and which to reject (with either a &ldquo;<code>404 path not
found</code>&rdquo; or a &ldquo;<code>403 forbidden</code>&rdquo; error).  Such
<code>page</code> sections are called <em>multipath</em> hereinafter.
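
As a minimal sketch of the two kinds of IDs, a configuration might contain
sections like the following.  The <code>/about.html</code> path, the
<code>bar</code> tree id and the template texts are made up for
illustration, and a real template would normally be a complete HTML page.

<pre>
  [page /about.html]
  template = &lt;p&gt;Served for the /about.html path only&lt;/p&gt;

  [page bar]
  template = &lt;p&gt;Served for /bar, /bar/buzz.html and so on&lt;/p&gt;
</pre>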

<strong>It is undefined what will happen in case the <em>ID</em> contains
slashes but doesn't start with one (like <code>foo/bar</code>).  In the
present version such page sections will be silently ignored, but this is
likely to change in the future.</strong>



<h2 id="page_ini_section">The <code>[page&nbsp;]</code> ini section group</h2>

The following parameters are <em>recognized</em> by Thalassa CGI within
sections that belong to the <code>[page&nbsp;]</code> group:
<code>session_required</code>, <code>embedded</code>,
<code>post_allowed</code>, <code>path_predicate</code>,
<code>template</code>, <code>selector</code>, <code>check_fnsafe</code>,
<code>post_content_limit</code>, <code>post_param_limit</code>,
<code>action</code>, <code>reqargs</code> and <code>reqarg</code>.
Besides that, <strong>parameters with arbitrary names can be added; their
values will be accessible with the
<a href="#page_macro"><code>%[page:&nbsp;]</code> macro</a></strong>.

The <span id="page_type_params"></span><code>session_required</code>,
<code>embedded</code> and <code>post_allowed</code> parameters are effectively
boolean values: they may be set to <code>yes</code> or <code>no</code>
(actually, anything but <code>yes</code> is considered to be equal to
<code>no</code>).  They specify, respectively, whether the page requires a
work session to be established, whether the page is intended to be embedded
somewhere (e.g., into an <code>iframe</code> tag on a statically-generated
page) and whether the page accepts POST requests.  Macroprocessing is
<strong>not</strong> done on these three parameters.

<p class="remark">In the present version the <code>embedded</code>
parameter's value is only used by the <code>%[page:ifembedded:]</code> macro
function and doesn't affect the functioning of the CGI program in any other
way.  It may be useful if you prefer to have common templates for your HTML
header and footer, but don't want to display some elements, such as the
site's heading, on embedded pages.</p>
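
For instance, a page meant to be shown inside an <code>iframe</code> might
be marked as in the sketch below.  The <code>/widget.html</code> path and
the two texts are hypothetical; the macro call relies on the
<em>then</em>/<em>else</em> argument order given in the
<a href="#page_macro">description of the <code>%[page:&nbsp;]</code>
macro</a>.

<pre>
  [page /widget.html]
  session_required = no
  embedded = yes
  post_allowed = no
  template = &lt;p&gt;%[page:ifembedded:embedded view:full view]&lt;/p&gt;
</pre>

With <code>embedded = yes</code>, the macro call in this template should
expand to &ldquo;<code>embedded view</code>&rdquo;.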

The <code>path_predicate</code> parameter is only used for
<a href="#multipath">multipath</a> page sections, that is, sections headed
like <code>[page name]</code> where <code>name</code> doesn't contain any
slashes.  Let's recall that such sections serve all paths with the given
first component, such as <code>/name/foo</code>, <code>/name/bar</code>,
<code>/name/foo/bar/buzz.html</code> and so on.  In case the
<code>path_predicate</code> parameter is specified, its value is passed
through the macroprocessor, and it <em>should</em> result in either
&ldquo;<code>yes</code>&rdquo;, &ldquo;<code>no</code>&rdquo; or &ldquo;<code>reject</code>&rdquo;.
If the macroprocessing results in any other string, it is considered equal
to &ldquo;<code>no</code>&rdquo;.  If the value is &ldquo;<code>yes</code>&rdquo;, the request
is processed; for any other values the CGI program rejects the request,
which effectively means it displays the
<a href="thalcgi_baseconf.html#error_page">error page</a>; the error is
&ldquo;<code>403 forbidden</code>&rdquo; if the macroexpansion resulted in the
&ldquo;<code>reject</code>&rdquo; string, otherwise it is
&ldquo;<code>404 path not found</code>&rdquo;.

Within the <code>path_predicate</code> parameter's value, actual path
components may be referred to as <code>%1%</code>, <code>%2%</code> etc.;
e.g., for the <code>/name/foo/bar/buzz.html</code> path the
<code>%1%</code> will expand to <code>foo</code>, <code>%2%</code> to
<code>bar</code> and <code>%3%</code> to <code>buzz.html</code> (the first
path component, which is the name of the section, <code>name</code> in this
example, may be accessed as <code>%0%</code>, but it is not recommended to
rely on this).  <strong>The same is true for all other parameters within
multipath <code>[page&nbsp;]</code> ini sections</strong>.
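
A small sketch of the positionals at work, reusing the <code>name</code>
section from the example above (the template text is made up, and the
constant predicate is only a placeholder that accepts every path, which is
rarely what a real configuration wants):

<pre>
  [page name]
  path_predicate = yes
  template = &lt;p&gt;Components: %1%, %2%, %3%&lt;/p&gt;
</pre>

For the <code>/name/foo/bar/buzz.html</code> path, the template would
expand to &ldquo;<code>Components: foo, bar, buzz.html</code>&rdquo;.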

There's another check for acceptability of a particular path, activated by
setting the <code>check_fnsafe</code> parameter.  It is useful in case some
of the path components are going to be used as file names or file name
parts.  The value of the parameter, if it is set, gets passed through the
macroprocessor; the result is broken down to words, using the apostrophe
&ldquo;<code>'</code>&rdquo; and the doublequote &ldquo;<code>"</code>&rdquo; as grouping
symbols (both an apostrophe within doublequotes and a doublequote within
apostrophes are considered as plain chars), much like in the Shell command
line.  Every &ldquo;word&rdquo; is then tested for whether it can safely serve as a
filesystem path component, which means it contains no whitespace or control
characters, no characters from the
<code>!"#$%&amp;'()*,/&lt;&gt;=?[\]^`{|}~</code> set, no characters
with codes greater than 126, and doesn't start with either
&ldquo;<code>.</code>&rdquo; or &ldquo;<code>-</code>&rdquo;.  If the test fails, the
CGI refuses to continue, and the error page is displayed with the
&ldquo;<code>406 path not acceptable</code>&rdquo; error.

<strong>WARNING: Setting the <code>check_fnsafe</code> parameter correctly
is critical for security; failure to do so can result in someone getting
unauthorized access to your server's filesystem, which might have horrible
consequences.</strong>
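
A minimal sketch, assuming a hypothetical multipath section
<code>docs</code> in which the two path components after the section name
are later used to build a file name:

<pre>
  [page docs]
  check_fnsafe = %1% %2%
</pre>

After macroexpansion the value consists of two words, one per path
component, and each of them is tested separately.  A component such as
<code>.hidden</code> starts with &ldquo;<code>.</code>&rdquo; and would therefore
fail the test.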

The <code>template</code> parameter is perhaps the simplest to explain: it
defines the page's content to be displayed to the user.  The value is
passed through the macroprocessor.

The <span id="selector_param"></span><code>selector</code> parameter's
value, after macroexpansion, is used as the
<a href="ini_basics.html#parameter_specifiers">specifier</a> for the
values of &ldquo;unrecognized&rdquo; parameters available through the
<code>%[page:&nbsp;]</code> macro; see the
<a href="#page_macro">description of the macro</a> for details.

The rest of the <em>recognized</em> parameters are only used with POST
requests.  The <code>post_content_limit</code> and
<code>post_param_limit</code> are relatively easy to explain: they set
page-specific values for the same POST data size limits for which the
default (global) values are set by parameters of the same names within the
<a href="thalcgi_baseconf.html#global_conf"><code>[global]</code> ini
section</a> (follow the link for the explanation).

The parameters <code>reqargs</code> and <code>reqarg</code> will be
described later in a <a href="#reqarg_parameters">separate section</a>.

The <code>action</code> parameter is hard to explain without an
introduction to <code>POST</code> request handling in general, so we
postpone its description until the <a href="thalcgi_webforms.html">webforms
handling documentation</a>.



<h2 id="page_macro">The <code>%[page:&nbsp;]</code> macro</h2>

Once the CGI program has analysed the path and selected which
particular <code>[page&nbsp;]</code> ini section to use, the
<code>%[page:&nbsp;]</code> macro becomes available.

The macro requires at least one argument.  If the first argument is one of
<code>ifsessionrequired</code>, <code>ifpostallowed</code> or
<code>ifembedded</code> (which are the names of the three supported
functions), the macro takes two more arguments, for <em>then</em> and
<em>else</em>.  All three functions check a certain condition and
return the <em>then</em> argument if the condition is met, the
<em>else</em> argument otherwise.  The conditions are, respectively,
whether the page being served is configured to require an active work
session, whether it accepts POST requests and whether it is marked as embedded.
See the <a href="#page_type_params"><code>session_required</code>,
<code>embedded</code> and <code>post_allowed</code></a> page section
parameters' description for details.

If the first argument is not one of the three function names mentioned
above, it should be the only argument (any remaining arguments will be
ignored).  The argument is then used as a name of a parameter
within the same <code>[page&nbsp;]</code> section.  Briefly (and
incompletely) speaking, the macro call in this case expands to the value of
that parameter, after the value is passed through the macroprocessor.  This
allows one to have more HTML snippets right within the page section.

<p class="remark">In contrast with <code>%[html:]</code> snippets, these
cannot be used as parametrized templates.  This is because within a
<code>[page&nbsp;]</code> ini section, the &ldquo;positionals&rdquo; (that is,
numerical macros like <code>%1%</code>, <code>%12%</code> etc.) are used to
refer to the path components, so there would be no way to refer to template
arguments even if they were there.</p>

However, these parameters are there not because we want to have more
snippets, although that can be convenient in itself.  Their primary purpose
is to make it easier to specify <em>different versions</em> of the same
page, and it is achieved with the help of the
<a href="#selector_param"><code>selector</code> parameter</a>.

Once the <code>[page&nbsp;]</code> ini section is chosen based on the path
from the request being served, the <code>selector</code> parameter's value,
if any, gets passed through the macroprocessor and the result is memorized.
The result should be a simple identifier, or else <em>something may go
wrong</em>.  This identifier is used by the <code>%[page:&nbsp;]</code>
macro when accessing arbitrary parameters of the <code>[page&nbsp;]</code>
ini section, as the
<a href="ini_basics.html#parameter_specifiers">specifier</a>.

For example, if the following parameters are set:
<pre>
  [page /foo.html]
  selector = buzz
  motto = Whatever hits the fan will not be distributed evenly
  motto:buzz = It is not cheating, it's a team work
  slogan = Life is short, smile while you still have teeth
  slogan:usb = Life is too short to remove USB safely
  slogan:lazy = I am not lazy, I am just on my energy saving mode
</pre>

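and the <code>/foo.html</code> path is requested,
<code>%[page:motto]</code> will expand to &ldquo;<code>It is not cheating, it's
a team work</code>&rdquo;, because the <code>motto:buzz</code> parameter matches
the selector, while <code>%[page:slogan]</code> will become
&ldquo;<code>Life is short, smile while you still have teeth</code>&rdquo;, because
there is no <code>slogan:buzz</code> parameter and the value given without a
specifier is used instead.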

Please note this example is just to show how things work.  In real-life
configurations, a constant <code>selector</code> (as in this example)
makes no sense at all; to be of any use, it must be an expression
containing macros, especially
<a href="common_macros.html#conditionals">conditionals</a>, and sometimes
the expression becomes pretty complicated.
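
One of the simplest non-constant selectors is a path component of a
multipath section.  The sketch below is only an illustration: the
<code>info</code> tree id, the <code>heading</code> parameter and all the
texts are made up.

<pre>
  [page info]
  selector = %1%
  heading = General information
  heading:contacts = How to reach us
  heading:history = Our history
  template = &lt;h1&gt;%[page:heading]&lt;/h1&gt;
</pre>

For the <code>/info/contacts</code> path the selector becomes
<code>contacts</code>, so <code>%[page:heading]</code> should expand to
&ldquo;<code>How to reach us</code>&rdquo;.  For a component that has no matching
specifier, the plain <code>heading</code> value would be used, just like
<code>slogan</code> in the example above.  Since the selector must be a
simple identifier, a real configuration would also have to restrict which
paths are accepted.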



<h2 id="request_args">The &ldquo;request arguments&rdquo; facility</h2>

<p class="remark">This section's content may be hard to understand just
because it is unclear what it's all about and how to use it.  You will not
need all this stuff until you begin with the
<a href="thalcgi_comments.html">user comments</a> facility.  It is
absolutely safe to skip this section now and return to it right
before reading about user comments.</p>

The so-called &ldquo;request arguments&rdquo; are actually values one can set within a
<code>[page&nbsp;]</code> ini section and access within other sections,
typically the <code>[comments]</code> section.  The most obvious use for
them is when outside of the <code>[page&nbsp;]</code> section (typically a
multipath one) you need to access some information passed via the
URI: within the <code>[page&nbsp;]</code> section that information is
available via the <code>%1%</code>, <code>%2%</code> and their family, but
these &ldquo;positional arguments&rdquo; are not available in other places of the
configuration file.



   <h3 id="reqarg_parameters">Setting request arguments</h3>

A <code>[page&nbsp;]</code> ini section may include two parameters related
to the &ldquo;request argument&rdquo; facility: <code>reqargs</code> and
<code>reqarg</code>.

The <code>reqargs</code> parameter controls <em>which</em> request
arguments are to be set; its value is a whitespace-separated list of
identifiers you'd like to use to identify your request arguments.
The value for each of the arguments is set by adding a <code>reqarg</code>
parameter with <em>the</em> argument's identifier as the
<a href="ini_basics.html#parameter_specifiers">specifier</a>.
For example:

<pre>
  reqargs = alpha beta gamma
  reqarg:alpha = the value for the &ldquo;alpha&rdquo; request arg
  reqarg:beta = this one is for &ldquo;beta&rdquo;
  reqarg:gamma = and &ldquo;gamma&rdquo; goes here
</pre>

In a real-life situation, these values will perhaps use macros, especially
the &ldquo;positionals&rdquo; (<code>%1%</code>, <code>%2%</code> etc.).


   <h3 id="reqarg_macro">The <code>%[reqarg:&nbsp;]</code> macro</h3>

The <code>%[reqarg:&nbsp;]</code> macro is used to access the request
arguments.  The macro always accepts exactly one argument &mdash; the
identifier of the request argument to be queried.  It returns the
respective value, or an empty string in case there's no request argument
with such a name.
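
To tie the two parameters and the macro together, here is a hedged sketch
of a multipath section that exports two of its path components as request
arguments.  The <code>thread</code> tree id and the argument names
<code>topic</code> and <code>item</code> are hypothetical.

<pre>
  [page thread]
  reqargs = topic item
  reqarg:topic = %1%
  reqarg:item = %2%
</pre>

When the <code>/thread/news/42</code> path is requested,
<code>%[reqarg:topic]</code> used in another section should expand to
<code>news</code> and <code>%[reqarg:item]</code> to <code>42</code>, while
<code>%[reqarg:other]</code> would give an empty string because no such
argument is listed.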
