Originally, HTTP was request-based: the client (browser) establishes a connection with the server, sends exactly one request, receives the server's response and closes the connection. Recent implementations of both browsers and servers are able to keep the connection open to save some traffic when several requests from the same client to the same server are needed in a row, which is a typical situation, as a page can contain images, and downloading each of them takes another request. However, even if several requests are performed within one TCP session, each of them is still considered self-contained; that is, from the server's point of view, single requests have no relationship between them.
From the user's point of view, the picture is very different. Web sites tend to “remember” what the user did, and even if they don't, it is a rare case when the user fills in a single web form and gets what (s)he wants right from a single request. A more usual situation is that the site takes the user through several steps of a dialog.
Well, the most common situation nowadays is that the page containing the form changes right under the user's hands, and often it does so even though the user didn't request anything like that. Okay, all this is done with client-side scripting, which we not only don't use; actually, we believe that publishing sites with client-side scripts should be considered criminal and punished with several years in prison.
What's even more important is that sometimes users somehow identify themselves to a site, and all subsequent actions are assumed to be done by the same user. There are also other good reasons (in contrast with bad reasons, like tracking) for a site to remember something about the user; for example, as is the case for Thalassa, once the user has solved a CAPTCHA test, it looks fair not to force the user to do it again and again.
So, it is obvious that web sites sometimes need to identify requests coming from the same user, or, to say it another way, to separate requests coming from different users. Such requests, known to come from the same client, are exactly what a “work session” is. There are two ways to maintain sessions: the site needs either to use a cookie to store the session ID, or to embed the ID into the URL of each page.
The solution with the session ID within the URL has several fundamental flaws. First, users tend to share URLs of what they see, and an average user will not bother stripping off any session IDs from what (s)he copies and pastes into a messenger to send to a friend. The server may get confused seeing the same session continuing from two different locations, and, what's more serious, the “friend” who received the URL containing the session ID may perform some actions on behalf of the user who sent the URL (and the one who sent it will hardly realize that (s)he shared not only the URL but also an unintended “power of attorney”).
Another fundamental flaw of keeping the session ID in the URL is that it simply doesn't work for static HTML pages. Unlike various dynamically generated content, static pages are, well, static. They can “accept” (and ignore) query parameters in a URL, but once the user follows a link from a static page, that link obviously won't contain these query parameters, so the session ID, if any, will be lost. This is not a problem for these “modern” sites that hardly contain a single static page, but for sites made with Thalassa, which consist primarily of static pages (generated beforehand, not when a request is received), this makes URL-based session control effectively impossible.
So, the only thing left to us is a cookie. When a web server sends a response to its client, it can add one or more cookies to the headers of the response. A cookie is effectively a pair of strings: the first is the name, and the second is the value of the cookie. When sending another request to the same web site, the client (unless it is instructed otherwise by the user) adds to the request's headers all cookies earlier set by this site. Actually, cookies are intended to make longer sessions out of otherwise-isolated requests.
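The round trip described above can be sketched with Python's standard http.cookies module. This is only an illustration of the general mechanism, not of anything Thalassa does internally; the cookie name sid and its value are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: attach a cookie (a name/value pair) to the response headers.
# The name "sid" and the value "abc123" are hypothetical.
response_cookies = SimpleCookie()
response_cookies["sid"] = "abc123"
set_cookie_header = response_cookies.output()

# Client side, next request to the same site: the browser sends the pair
# back in a "Cookie:" request header; here the server parses that header.
request_cookies = SimpleCookie()
request_cookies.load("sid=abc123")
received_value = request_cookies["sid"].value
```

Seeing the same value come back in a later request is what lets the server tie otherwise unrelated requests together.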
Unfortunately, cookies often trigger some paranoia, and it has its reasons, as cookies can also be used for tracking and other bad things. It remains unclear, though, why all these people are afraid of cookies but don't give a damn about JavaScript, which is deadly dangerous, much more dangerous and harmful than cookies. Anyway, there's no other option. At least Thalassa only sets a cookie when explicitly requested to do so by the user.
The thalassa program itself is a static content generator, so obviously neither the program itself nor the content it generates needs any sessions. Hence, whenever we talk about work sessions in Thalassa, it must be clear that we only mean the CGI program, thalcgi.cgi.
The CGI program can do some things without a session, too. A session must already be established for any POST request, unless it is the request that actually establishes a new session. A GET request will proceed normally without a session unless the respective virtual page is configured to require a session. In case the session_required parameter is set to yes for the requested page, the CGI program will not send the page to the user; instead, it will send a special page configured with the [nocookiepage] section.
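The rules from the paragraph above can be summarized as a small decision function. This is a sketch of the described behavior only; the function name, its arguments and the returned labels are invented for the illustration and do not correspond to anything in the Thalassa code.

```python
def decide(method: str, has_session: bool, is_setcookie: bool,
           session_required: bool) -> str:
    """Return a (hypothetical) label for what the CGI does with a request."""
    if method == "POST":
        # POST needs an existing session, unless this very request is the
        # one that establishes the session.
        if has_session or is_setcookie:
            return "process"
        return "no_session_error"
    # GET proceeds without a session unless the page demands one.
    if session_required and not has_session:
        return "nocookiepage"
    return "process"
```

For instance, decide("GET", False, False, True) gives "nocookiepage", while the same request for a page that doesn't require a session gives "process".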
The intention here is to display a webform on a page and to direct the POST request that contains the data entered by the user into the form to the same URL. In such a configuration it makes sense to only display the form when the session is already established; otherwise a user may waste some time filling in the form just to get a “no session” error right after pressing the submit button. Furthermore, it is an intended (by design) scenario that a user follows a link that points to a webform, gets the [nocookiepage] special page instead of the form, reads the text about the cookie, decides to set the cookie (and establish the session), does so by solving the CAPTCHA test, and finally gets to the desired webform, because all this time the user stayed at the same virtual location (that is, the URL in the browser's address line didn't change).
Establishing a session requires solving the CAPTCHA test; in the present version the CAPTCHA can't be disabled, and it's unlikely such an option will appear in the future. This is because technically every session is a file within the user database directory, so there's an obvious risk of a DoS attack; the CAPTCHA test reduces this risk, because the session is only created after the test is successfully passed.
The good news for users is that, at least in the present version, establishing the working session is the only situation when the CAPTCHA is to be solved. Once the session is active, the user is assumed to be the same (human).
As we mentioned already, the [nocookiepage] is a special page, which means it has no dedicated path (URI); to see it, a user with no active session must point the browser to any of the pages configured to require a session. After that, if everything is configured properly, the user receives the special page (with a CAPTCHA form), solves the CAPTCHA, submits the solution, optionally (in case either the solution is not correct, or something else went wrong) gets the [retrycaptchapage], which contains an explanation of what went wrong and another CAPTCHA picture, finally submits the correct solution, and after that receives the page (s)he initially requested, as the URL didn't change. But there's one thing to mention here: this is the moment when the session actually starts to exist, and together with the initially requested page, the user receives the cookie that contains the session ID.
From the client's point of view, things look very simple: there's exactly one cookie, named thalassa_sessid (the name is hardcoded in the present version, which is likely to change). The cookie has a value that looks somewhat like PJBKANFHFJGBNJNM_PPIGKEIGKBOFHDKL, and some attentive users may even notice that the first part of the value (the chars before the underscore) remains the same, while the second part (after the underscore) changes every time another page served by the CGI is loaded by the browser.
Let's now explain something. First of all, in case you get confused by these characters: these Latin letters are actually hexadecimal digits, but instead of the traditional 0..9,A..F, just the first 16 letters of the Latin alphabet (A..P) are used, A for zero, P for 15. Well, actually this doesn't really matter; it is only the way chosen for Thalassa CMS to generate random identifier strings: first, the desired amount of random bytes is acquired, and then these bytes are translated, each to two letters, as we've just explained. Thalassa doesn't use the fact that these letters are hex digits, e.g., it never converts them back to numbers; they are just random strings of a certain length, that's all.
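The translation of random bytes into letters can be sketched as follows. This is an illustration only, not Thalassa's actual code: the function names are made up, and the order of the two letters within a byte (high half first) is an assumption, as the text doesn't specify it.

```python
import os

ALPHABET = "ABCDEFGHIJKLMNOP"  # A stands for 0, P stands for 15


def encode_bytes(data: bytes) -> str:
    """Translate each byte into two letters from A..P.

    Assumption: the high four bits come first, as in ordinary hex notation.
    """
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in data)


def random_id(nbytes: int = 8) -> str:
    """Acquire nbytes random bytes and encode them; 8 bytes give 16 letters."""
    return encode_bytes(os.urandom(nbytes))
```

With eight random bytes this yields 16-letter strings of the same shape as the two halves of the cookie value shown above.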
What's more important is that the real session identifier is the part of the cookie's value which doesn't change, that is, the part of the string before the underscore (PJBKANFHFJGBNJNM in our example). The part after the underscore is a so-called token; Thalassa uses it for an additional check against session identifier interception.
The active sessions are stored in the subdirectory named _sessions under the user database directory, where exactly one file is created for each session, and the session ID is used as the file name.
Don't worry much. Before trying to access the file in any way, the CGI program checks whether the name is acceptable. In the present version, this acceptability means the name consists only of uppercase Latin letters, digits and the underscore char, and is exactly 16 chars long; so it is hardly possible to perform any injection here.
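The acceptability rule as stated (uppercase Latin letters, digits, underscore, exactly 16 chars) fits in one regular expression. The sketch below illustrates the rule only; the function name is hypothetical and the real check in the CGI program may be implemented differently.

```python
import re

# Exactly 16 characters, each an uppercase Latin letter, a digit or "_".
_ACCEPTABLE_NAME = re.compile(r"[A-Z0-9_]{16}")


def is_acceptable_session_id(name: str) -> bool:
    """Check a would-be session file name before touching the file system."""
    return _ACCEPTABLE_NAME.fullmatch(name) is not None
```

Since neither a slash nor a dot can pass this check, a value like ../../etc/passwd is rejected before any file is accessed.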
The session file is a text file containing several NAME = VALUE pairs, like this:

    token = PPIGKEIGKBOFHDKL
    oldtoken = ADHFMBOOBGDOIDMB
    created = 1674939409
    expire = 1675256475
If the user logs in or at least tells his/her username without actually logging in (e.g., when requesting new passwords), more pairs get added to the session file, so it can look like this:
    token = OMBBNIGHNPJKPOOF
    oldtoken = JGDGEIINIAPHKHAF
    created = 1674939409
    expire = 1675256475
    user = charlie
    logged_in = yes
    login_time = 1675038541
Furthermore, if the site is configured to allow anonymous comments, the user, while remaining not logged in and even nameless (in the sense of login name), still may use some arbitrary string as his/her “visible name” when posting a comment; this name is stored in the session file as an additional name-val pair, like this:
realname = Johnny the Great
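A file of this shape is easy to read into a dictionary. The following is a minimal sketch assuming one NAME = VALUE pair per line; the function name is invented, and the real parser in Thalassa may treat comments, continuation lines or malformed input differently.

```python
def parse_session_file(text: str) -> dict:
    """Parse "NAME = VALUE" lines into a dict; lines without "=" are skipped."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first "=" only, so the value may contain "=" itself.
        name, value = line.split("=", 1)
        fields[name.strip()] = value.strip()
    return fields
```

Values stay plain strings here, so a field like expire has to be converted with int() before comparing it with the current time.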
It is more or less clear that date/time is stored as a “unix date” value here. Expiration time is the last request time plus 72 hours, which is hardcoded, too (yes, yes, it definitely will change in next versions). The token is, well, the token, that is, the part of the cookie value to the right of the underscore, the part which changes. And now the oldtoken: it's the previous value of the token. Thalassa considers the token valid if the token from the cookie is equal to either the token or the oldtoken field of the session file. This allows the session to remain active in case the script generated the content to be sent to the client, but something went wrong with the connection between the client and the server (the CGI will never know about it), so the client never received the new token and keeps sending the previous one.
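The token check and the expiration rule can be sketched like this. The function names are made up for the illustration, and the session is represented as a plain dictionary of the fields found in the session file.

```python
SESSION_LIFETIME = 72 * 60 * 60  # 72 hours, in seconds


def split_cookie(value: str):
    """Split the cookie value into (session_id, token) at the first underscore.

    Returns None if the value contains no underscore.
    """
    session_id, sep, token = value.partition("_")
    if not sep:
        return None
    return session_id, token


def token_is_valid(cookie_token: str, session: dict) -> bool:
    """Accept the current token or the previous one (oldtoken)."""
    if not cookie_token:
        return False
    return cookie_token in (session.get("token"), session.get("oldtoken"))


def new_expiry(last_request_time: int) -> int:
    """Expiration time is the last request time plus 72 hours."""
    return last_request_time + SESSION_LIFETIME
```

With the first sample session file shown earlier, both PPIGKEIGKBOFHDKL and ADHFMBOOBGDOIDMB pass the check, while any other token is rejected.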
The CAPTCHA implementation in Thalassa uses a simple cryptographic trick explained here, so the CGI doesn't need to store any information locally on the server until the CAPTCHA test is actually passed. What it does need is an arbitrary but not easily guessable string which is kept private (the secret). If unsure, issue a command like the following:
dd if=/dev/urandom bs=16 count=1 status=none | shasum -b
and use the sum as your secret.
Besides the secret string, there's one more parameter to set for the CAPTCHA subsystem: the timeout value. The user can't change the time value sent to the client inside the cookie response form, because otherwise the MD5 hash will not match, so the CGI can check if it took too long for the user to solve the captcha.
Both the secret and the timeout are configured with the [captcha] ini section, in which two parameters are recognized: secret for the secret string and expire for the timeout value, in seconds. For example:

    [captcha]
    secret = 11e61c749de5944b74770b2206c2bfd97472c9d3
    expire = 300
The timeout in this example is 5 minutes (300 seconds).
Suppose the user wants to establish a session, so (s)he solves the CAPTCHA and submits the solution. It is important to understand that the Thalassa CGI program detects this situation by checking that the following conditions are all met at once:
It is worth mentioning that the “form input” named command is actually not used anywhere else in the Thalassa CGI's work, and it is never used with any other value. So the pair command=setcookie should be considered a kind of magic word to detect this (very special) type of request.
Besides the command input value, the following inputs are expected for a setcookie request:

captcha_ip: the IP address of the client, in the most traditional decimal notation, like 198.51.100.173;
captcha_time: the time when the CAPTCHA was issued, in the form of a unix date represented as a decimal, like 1685300179;
captcha_nonce: the randomly-generated single-use number, also known as nonce, represented as a hexadecimal number, like C34A30B848CC2FC1;
captcha_token: the MD5 hash of the string built as a concatenation (in some unspecified order) of the IP, time, the correct CAPTCHA answer and the configured secret value; the hash is represented in base64, not in hex as you might be used to;
captcha_response: the response to the CAPTCHA challenge entered by the user.

Of all these “inputs”, only captcha_response is intended to be a real input field for the user to fill in. All the others clearly should be hidden inputs; indeed, a user can hardly fill in any of them. Instead, the CGI program must “fill” them when a CAPTCHA-containing page is generated. All pages the Thalassa CGI outputs (with the exception of a default error page) are derived from templates set in the configuration file, and this is true for the CAPTCHA pages, too; so one needs some means to access the respective values, as well as the CAPTCHA image. And here the %[captcha: ] macro enters the game.
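To make the idea behind captcha_token more tangible, here is a sketch of how such a hash could be computed and checked. It is not the actual Thalassa algorithm: the concatenation order is unspecified in this document, so the order used below (IP, time, answer, secret, with no separators) is purely an assumption, and the function names are invented. Case normalization of the answer is not handled here.

```python
import base64
import hashlib
import hmac


def make_captcha_token(ip: str, issued_at: int, answer: str, secret: str) -> str:
    """MD5 over a concatenation of the values, represented in base64.

    Assumed order: ip, time, answer, secret (the real order is unspecified).
    """
    data = f"{ip}{issued_at}{answer}{secret}".encode("utf-8")
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def captcha_token_matches(ip: str, issued_at: int, response: str,
                          secret: str, token: str) -> bool:
    """Recompute the hash from the user's response and compare with the form's."""
    expected = make_captcha_token(ip, issued_at, response, secret)
    return hmac.compare_digest(expected.encode("ascii"), token.encode("utf-8"))


def captcha_expired(issued_at: int, now: int, expire: int) -> bool:
    """True if more than `expire` seconds passed since the CAPTCHA was issued."""
    return now - issued_at > expire
```

The point of the trick is that the server keeps nothing between the two requests: without the secret, the client can neither forge a token for another answer nor alter the IP or the time without breaking the hash.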
The %[captcha: ] macro has several functions; when it is expanded for the first time during a particular run of the Thalassa CGI program, no matter which of its functions is used, the macro generates a brand new CAPTCHA challenge and the correct response for it, and memorizes the current time and the client's IP address; after that, all functions of the macro simply return the memorized values. The macro has five functions:

%[captcha:image]: the CAPTCHA challenge image, as a PNG picture, encoded with base64;
%[captcha:ip]: the IP address of the client;
%[captcha:time]: the unix time value;
%[captcha:nonce]: the nonce value;
%[captcha:token]: the MD5 hash to be the value for the captcha_token form input.

For the present version of Thalassa, a CAPTCHA form should appear on two different pages, both “special”: [nocookiepage] and [retrycaptchapage]. The content of these pages is (well, should be) different, so it is recommended to prepare the CAPTCHA response form as a snippet within the [html] ini section. For example, the following may work:
    [html]
    cookieform =
    +<img alt="captcha" style="float:right;"
    + src="data:image/png;base64,%[captcha:image]" />
    +<form name="captcha" action="%[req:script]%[req:path]" method="POST">
    +<input type="hidden" name="captcha_ip" value=%[q:%[captcha:ip]] />
    +<input type="hidden" name="captcha_time" value=%[q:%[captcha:time]] />
    +<input type="hidden" name="captcha_nonce" value=%[q:%[captcha:nonce]] />
    +<input type="hidden" name="captcha_token" value=%[q:%[captcha:token]] />
    +<label for="captcha_input">Please enter the string made by swapping
    + letters as shown at the picture to the right. There are digits and
    + latin letters only, case is ignored:</label><br/>
    +<input type="text" id="captcha_input" name="captcha_response" /><br/>
    +<input type="hidden" name="command" value="setcookie" />
    +<input type="submit" value="Set cookie and create session" />
    +</form>
    +
The No Cookie special page is the content to be displayed to the user in case the user requested a page which is configured as requiring an active session, and there's no active session.
The page is configured with a [nocookiepage] ini section which is presently supposed to contain only one parameter, named template. The parameter's value is passed through the macroprocessor to build the actual HTML document to be sent to the user.
Certainly the No Cookie page must contain the CAPTCHA solution form. The rest is generally up to you, but perhaps the page should display a brief text explaining what's going on, stating that we're going to set a cookie, as well as that all the site's features — except for the interactive — should work without cookies, and that the cookie can be removed from the user's browser at any time.
The Retry Captcha special page is the content displayed to the user in case the user just tried to solve the CAPTCHA (that is, submitted the CAPTCHA form in a request that meets the conditions for a setcookie request) but for any reason the solution is rejected. By the way, this implies that the Retry Captcha page can only be displayed in response to a POST request, which may be important. Actually there are five different reasons why the solution can be rejected, and it seems a good idea to tell the user what went wrong. So the [retrycaptchapage] ini section, which configures the Retry Captcha page, is a bit more complicated than the sections for the two other special pages.
Just like other sections defining pages, this section has the template parameter, whose value is passed through the macroprocessor to build the content to send to the user; only for this parameter's value there's an additional content-specific macro, %errmessage%, which expands to a short piece of text explaining the actual reason the CGI rejected the user's submission. The error messages themselves are configurable, too; for that, there's the second parameter in the same section, named errmessage. The parameter should be set separately for each of the five predefined specifiers, which are pretty self-explanatory: ip_mismatch, expired, broken_data, wrong_answer and unknown. Setting the values for this parameter, you can choose your own wording and, e.g., translate the messages to a language other than English; for English, the following may make a good start:
    [retrycaptchapage]
    errmessage:ip_mismatch = <strong>ip address mismatch</strong>. It is possible you reconnected to the Internet between you received the captcha and submitted the answer.
    errmessage:expired = <strong>time is over</strong> (5 minutes).
    errmessage:broken_data = <strong>broken data in the request</strong>. The most likely it is a server side problem; if it repeats, please contact the site owner.
    errmessage:wrong_answer = <strong>wrong captcha answer</strong>. Please try again.
    errmessage:unknown = <strong>unknown reason</strong>. This must never happen in normal circumstances. Please report this to the site administration.
For errmessage:expired, in case you set the [captcha]/expire parameter to something different from 300, be sure to replace “5 minutes” with whatever explains the expiration period you choose.
Remember that the page must contain the CAPTCHA solution form; the rest is up to you. Even explanations are not necessary, as the user can only end up seeing this page after reading the No Cookie page.
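How the %errmessage% macro ties the template to the per-reason messages can be modelled roughly as follows. This is a simplified sketch, not the real macroprocessor: the function name is invented, only the %errmessage% substitution is performed, and the fallback to unknown for an unrecognized reason is an assumption.

```python
REASONS = ("ip_mismatch", "expired", "broken_data", "wrong_answer", "unknown")


def render_retry_page(template: str, messages: dict, reason: str) -> str:
    """Substitute the message configured for `reason` into the template."""
    if reason not in REASONS:
        reason = "unknown"  # assumed fallback, not confirmed by the docs
    return template.replace("%errmessage%", messages.get(reason, ""))
```

For example, with messages = {"wrong_answer": "wrong captcha answer"} and the template "Rejected: %errmessage%", the reason wrong_answer produces "Rejected: wrong captcha answer".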
%[sess: ] macro

The session status (including absence of a session), as well as all data associated with the session in case it exists, can be examined with the %[sess: ] macro. When the session is associated with a user account (whether logged in or unverified), the macro provides access to the user account's properties as well. Furthermore, the same macro is used to access the moderation queue.
As usual, the macro accepts at least one argument, which must be the function name, and for some functions it accepts more arguments. We'll be documenting the sess macro functions along with the features they are intended to be used with. As of now, we'll only discuss the functions related to work sessions as such.
%[sess:cookie] expands to the cookie value if the session (and the cookie value) exists; otherwise it returns an empty string. It is important to note that this is the cookie's value going to be sent to the client in the HTTP headers, which means, from the client's point of view, that this value is the new value of the cookie. The old value of the cookie, the one which was sent to the server along with the request for this particular page, can be taken with %[req:cookie:thalassa_sessid].
The ifvalid, ifhasuser and ifloggedin functions are conditional checkers; they all take two additional arguments, the then value and the else value, and return the former in case the condition is true and the latter if it is false. For ifvalid, the condition is that there is an established work session, so it can be used like this:

    %[sess:ifvalid:The session is active:You've got no session]
The ifhasuser condition is that the session has an associated user name, and for ifloggedin the condition is that not only is there an associated user name, but the user has also entered a correct password, thus passing authentication; that is, the user's identity is confirmed. If this doesn't seem clear, please wait for the detailed description of user accounts and associated procedures.
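The three conditions can be modelled in a few lines. This is only a model for explanation, not the way the macro is implemented: the function name is invented, the session is a dictionary of session file fields (or None when there's no session), and mapping the conditions onto the user and logged_in fields is an assumption based on the sample session file shown earlier.

```python
def sess_if(function: str, session, then_value: str, else_value: str) -> str:
    """Model of the ifvalid / ifhasuser / ifloggedin checkers."""
    has_session = session is not None
    has_user = has_session and "user" in session
    logged_in = has_user and session.get("logged_in") == "yes"
    conditions = {
        "ifvalid": has_session,
        "ifhasuser": has_user,
        "ifloggedin": logged_in,
    }
    return then_value if conditions[function] else else_value
```

A session that holds a user name but no logged_in = yes pair satisfies ifvalid and ifhasuser but not ifloggedin, which is the “told the username without logging in” situation.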
Functions related to user accounts are described together with user accounts. Functions related to the moderation queue are described along with comments.