Contents:
[pageset ] ini file sections
%[li: ] macro
A page set is basically a directory within your sources (the so-called page set directory) in which you create a headed text file, or a subdirectory containing such a file, and on the next generator run the file becomes an HTML page within your site. Sometimes it is important that no modifications to your configuration files (ini files) are needed to add another page to the set; e.g., such an addition can easily be done programmatically, without any human intervention.
If a page is represented by a directory (as opposed to a file), the directory can contain additional files, such as images. These files will be published, much like a collection.
There are some limitations for this kind of Thalassa object; the most notable are the following:
index.html (this name can be changed, but the overall structure can not);
Set pages may have comment sections.
By default, if a set page's source is a directory, it is generated as a subdirectory of the target directory, and if the source is a single file, the page itself is generated as a file (without making directories), too. This behaviour can be changed; for any particular set it is possible to force all its pages to be generated as files right within the target directory, or to be generated as subdirectories of the target directory, no matter what type of source is used for each page.
Thalassa scans the directory which is set as a source for a page set, and thus determines what the set items are. Directory entries with names starting with “.” (dot; the so-called hidden files) and “_” (underscore) are ignored during this scan.
Files with names starting with “_” residing in the page set source directory are used for various service purposes, primarily to form lists of set items.
Entries with names starting with anything else are expected to be regular files and directories; symbolic links are resolved during the file type examination (and further actions), so they can point at files and directories located elsewhere, if you really need that. Files of all other types (devices, FIFOs and sockets) are silently ignored; regular files and directories found in the set source directory (with appropriate names) become list items, and their names are taken as item IDs.
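The scanning rules above can be illustrated with a short Python sketch. It is not Thalassa's own code; the function names scan_page_set and demo are made up for this illustration, and skipping dangling symbolic links is an assumption, since the text does not say what happens to them.

```python
import os
import stat
import tempfile


def scan_page_set(source_dir):
    """Return the sorted item IDs found in a page set source directory.

    Illustrative only: it mirrors the rules described in the text.
    """
    items = []
    for name in os.listdir(source_dir):
        # Names starting with "." or "_" are skipped during the scan.
        if name.startswith((".", "_")):
            continue
        try:
            # os.stat follows symbolic links, so a link is judged by its target.
            mode = os.stat(os.path.join(source_dir, name)).st_mode
        except OSError:
            continue  # assumption: a dangling symlink is simply skipped
        # Only regular files and directories become set items.
        if stat.S_ISREG(mode) or stat.S_ISDIR(mode):
            items.append(name)
    return sorted(items)


def demo():
    """Build a throwaway source directory and scan it."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("foobar", "_index", ".hidden"):
            open(os.path.join(root, name), "w").close()
        os.mkdir(os.path.join(root, "gallery"))
        return scan_page_set(root)
```

In demo, only foobar (a regular file) and gallery (a directory) are returned as item IDs; _index and .hidden are skipped because of their first character.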
In case a particular item in your page set is represented with a directory, that directory must contain a file named content.txt; there's no way to change the name, in the present version it is hardcoded. This file will play the role of (the real) source for your page.
Suppose, for example, that your source tree is located at /home/lizzie/mysite/, and you create a page set with a source directory named node, so its full path will be /home/lizzie/mysite/node. If, after that, you want your set to contain an item named foobar, then you've got two possibilities:
create a headed text file /home/lizzie/mysite/node/foobar (and that's all);
create a directory /home/lizzie/mysite/node/foobar/, and inside that directory, create a headed text file named content.txt, so its full path will be /home/lizzie/mysite/node/foobar/content.txt. In this case you can add more files (e.g., images, or whatever else you want) to the directory along with content.txt, and these files will be “published” into the same place within your site's tree where the generated page will reside.
There's one more thing you should know: file names starting with “_” are not only ignored in the page set source directory, they are also ignored within item directories (where that content.txt sits), so if you place such a file there, it will not get published.
In the present version of Thalassa this has no particular meaning, but files with such names may start playing some special roles later. As of now, think of them as “reserved”.
The name of the file or directory is taken as the set item ID. The same ID must be given as the value for the id: header field in the item file; it is unspecified what will happen in case they don't match (that is, even if in a particular version of Thalassa nothing happens, this can change in future versions).
The set item ID is used in many situations to refer to the particular item, and the name of whatever is generated (either an HTML file or a directory within your site's tree) is also derived from the ID. Please note that, unless you intentionally want the dot “.” to be in the ID (and chances are that you don't), you shouldn't use it in the name of the source file/directory. For directories this in most cases doesn't cause problems, as people are not generally used to adding “extensions” to directory names; however, for no real reason a lot of people tend to give “extensions” to plain files. Folks, please note one thing: in Unix systems, there's no such thing as a “file name extension”, and even if you name your file like mypage.txt or mypage.src, these .txt and .src ain't no damn “extensions”, they are just parts of names with no special meaning of any kind.
Well, yes, you can name the source file of your page set item something like mypage.txt, but then the ID of the item will be mypage.txt, NOT mypage, as many people, for whatever odd reasons, expect. So it is mypage.txt, not mypage, that you need to put in your id: field. And the page Thalassa generates for you will likely have the name mypage.txt.html, not mypage.html as you could hope.
So, at least when you create items in a page set, and you want a single regular file, not a directory, to be the source of it, forget about extensions. No matter if it is a file or a directory, name it just mypage (or how you like), but not mypage.XXX, where XXX is whatever extension comes to your mind.
To better understand the rest of this documentation page you should recall that set pages are capable of having comment sections. Comments, as well as generation of the sections displaying them, will be discussed on the dedicated page; however, there's one basic property of comment sections we have to mention right now, otherwise it will be hard to get what most of the pageset-related configuration parameters are about. So, keep in mind that there may be more than one comment section for a single published item (be it an item of a page set, or a list page item) in case there are more comments than it is desirable to display on one page.
For example, suppose you write another entry of your blog, and it makes your readers so much interested that they leave 520 comments, while you configured Thalassa to display only 100 comments per page. Actually, 6 pages will have to be generated for your single blog entry in this case: first 5 of them displaying 100 comments each, and the last displaying 20 comments.
What Thalassa does in this situation is repeating your item's content, be it a blog entry or whatever else, on each of the pages generated for that item. So, in the simplest possible case, the first page will contain your entry and the first 100 comments, the second page will contain your entry again and the comments from 101 through 200, etc. It is possible to tell Thalassa to display the comments in reverse order, newer first, and their placement on pages will differ, but the idea is always the same: a number of pages to be generated for a single source item, the item's content is displayed on every page, and each page has its own range of comments.
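The arithmetic of the simplest (oldest first) case can be written down as a small Python sketch. The function name comment_pages is invented for this illustration, and the reverse-order placement mentioned above is deliberately not modelled.

```python
def comment_pages(total, per_page):
    """Split comments 1..total into consecutive per-page ranges, oldest first.

    Returns a list of (first, last) pairs, one pair per generated page.
    An empty list means there are no comment ranges at all.
    """
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    pages = []
    first = 1
    while first <= total:
        last = min(first + per_page - 1, total)
        pages.append((first, last))
        first = last + 1
    return pages
```

For 520 comments at 100 per page this yields six ranges, the last one being (501, 520), which matches the blog entry example.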
For this purpose, pageset configuration had to be made a bit complicated, and it might look confusing at first glance. First of all, the template used for page generation is broken down into two parts, named page_template and page_tail_template. Every generated page consists of three parts: the result of macroprocessing for page_template, the comment section and the result of macroprocessing for page_tail_template. These three parts are simply concatenated to make the whole page's content.
The second question is how to name the generated files. This is a bit tricky and depends on whether the item being generated has its own directory. For an entry named foobar from our example above, there are two possibilities. In case the item doesn't have its own directory, the first (main) file will (relative to the target directory of the whole set) be named just foobar.html, and additional files (containing pages for the same entry, but different portions of comments) will be foobar_2.html, foobar_3.html, ..., foobar_5.html. In case of a separate subdirectory, the main file will (by default) be foobar/index.html, and the additional pages will get names foobar/c2.html, foobar/c3.html, etc. In both cases the main page is considered to be Number One, but the number is hidden (this changes slightly when the comments are placed in reverse order; this will be discussed along with comment sections).
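As a rough Python sketch of the default naming scheme just described (the function page_names is hypothetical, it covers only the default names and the non-reversed comment order, and it does not evaluate any real templates):

```python
def page_names(item_id, page_count, in_subdir):
    """List the default file names for an item that spans page_count pages.

    Page number one is the main page, and its number stays hidden.
    """
    names = []
    for number in range(1, page_count + 1):
        if in_subdir:
            # Item with its own subdirectory: index.html, c2.html, c3.html, ...
            name = "index.html" if number == 1 else "c%d.html" % number
            names.append("%s/%s" % (item_id, name))
        else:
            # Item generated as plain files: foobar.html, foobar_2.html, ...
            suffix = "" if number == 1 else "_%d" % number
            names.append("%s%s.html" % (item_id, suffix))
    return names
```

All returned names are relative to the target directory of the whole set.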
One more thing to keep in mind is that sometimes programs need to determine on which page a particular comment is located. The thalassa program has sufficient information to deduce this during the process of generation, but the CGI program sometimes needs to know this too, and it has no access to the site sources. Hence Thalassa generates an additional file for every item that turned into several pages because of comments; the file is known as a comment map.
Knowing all this, it must be easier to understand what some configuration parameters are for.
[pageset ] ini file sections
A page set is configured with an ini file section that belongs to the pageset group, e.g.
[pageset node]
The name of the section (node in this example) identifies the set as a whole. Both the page set source directory and its target directory default to the pageset name, but this can be overridden by setting the respective parameters. However, even if you don't use these defaults, and even if you only have one page set, be sure to give it a good name because it will likely appear in many other places of your configuration file.
During the generation period of a set page, Thalassa makes one special macro available, which gives access to properties of the page being generated, such as its ID, title, text (body), information from the header and so on. The macro is named %[li: ], for list item; it is named so because exactly the same macro is used in generation of list items. The macro will be discussed later, and as of now, just keep in mind that
%[li:id] is the ID of the page being generated;
%[li:text] is the page's text (well, the body) after applying all transformations related to encoding and format;
Now we can discuss parameters that may appear within a [pageset ] section.
The actual content of generated pages is controlled by two parameters: page_template and page_tail_template. Their values are passed through the macroprocessor; then, the generator concatenates the processing result for page_template, the comment section (if it exists for the page being generated) and the processing result of page_tail_template. The “%[li: ]” macro must be used to access the content of a particular page; at the very least, you can hardly do without %[li:text].
You can use different page templates in the same page set. To do so, add the “type:” header to source files of your pages. For example, you can use
type: blog
for blog entries,
type: news
for news articles, etc. Then, these type identifiers (blog and news in the example, or anything else you want) may be used as parameter specifiers for both page_template and page_tail_template. For example, you can use page_template:blog for your blog entries and page_template with no specifier for everything else.
The sourcedir parameter sets the path for the page set source directory; as usual, the path may be either absolute or relative, and “relative” means relative to the Thalassa working directory. This parameter defaults to the pageset ID, so if the ID is node, as in our example above, the sourcedir parameter is not explicitly specified (that is, omitted) and you run thalassa in /home/lizzie/mysite/, then it takes items for your set from /home/lizzie/mysite/node/.
The setdirname parameter tells thalassa where to place the items it generates. The value is taken relative to your site's tree root, even if it begins with a slash “/”. Just like the previous parameter, setdirname defaults to the pageset ID.
The make_subdirs parameter controls whether to make subdirectories within the setdirname directory for each item. The recognized values are always, never and bysource, which is the default. In case the parameter is omitted, empty or its value is not recognized, it is taken as if it were bysource. The value is case-insensitive; always means to make subdirectories for all items, even for items whose source is a single file; never means not to make any subdirs even for items whose source is a directory with content.txt (and possibly other files) in it, and the default bysource means to create subdirs only for items whose source is a directory.
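The decision described above fits in a few lines of Python. This is only a sketch of the documented behaviour: the function name makes_subdir is invented here, and trimming surrounding whitespace from the value is an assumption.

```python
def makes_subdir(make_subdirs, source_is_dir):
    """Tell whether an item gets its own subdirectory.

    make_subdirs is the raw parameter value (None when omitted);
    source_is_dir says whether the item's source is a directory.
    """
    # Case-insensitive comparison; stripping whitespace is an assumption.
    value = (make_subdirs or "").strip().lower()
    if value == "always":
        return True
    if value == "never":
        return False
    # "bysource", empty, omitted or unrecognized: follow the source type.
    return source_is_dir
```

For example, makes_subdir("Always", False) is true, while makes_subdir("whatever", False) falls back to bysource and is false.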
For items generated as a file, without a subdirectory, the pagefilename parameter sets the file name template. The template should usually contain the %[li:id] macro call, so that the name of the file contains the item ID, but this is not enforced in any way, and it is possible (at least in theory) to use other techniques to make the files have different names. Besides all macros defined in thalassa, the pagefilename parameter can use the index macros (%idx%, %_idx% and %idx0%), which reflect the index of a comment page (generated for the same item). The “main” page always has a special index value, zero, even if it is considered “invisible number one”.
The parameter defaults to “%[li:id]%_idx%.html”; you can safely omit it in most cases.
For items generated with their own subdirectories, the pagefilename parameter is ignored. Instead, three other parameters are used:
pagedirname sets the name for the subdirectory to be created for the item; default is “%[li:id]”;
indexfilename sets the name for the HTML file created inside the directory to represent the item; default is “index.html” (please note this is the name of the file displayed by Apache by default when a directory URI is requested);
compagename is a template for names of additional HTML files, created because of multiple comment sections; index macros work within the value of this parameter; default is “c%idx%.html”.
The comments parameter sets the comment section style, the path to the comment tree (the part of the “database” that holds comments for the particular page) and some additional parameters available in comment-related templates via macro calls. This parameter is discussed in detail along with comment sections. By default this parameter's value is empty, which means no comment section is generated.
The commentmap parameter is a template for comment map file names. The name is relative to your rootdir (the root directory of your site's tree). As of the present version, the default is empty, which means not to create map files (for a given page set) at all. However, please note that some important functions of the CGI program will not work without maps in case you use comment styles that involve a maximum number of comments per page (and multiple pages in case there are more comments). Something like “node/.__%[li:id].map” will work; you can also make a dedicated directory for all your maps, even outside of your tree (because the value may start with “../” and there's no limitation for this), although comment maps don't hold any sensitive information and can perhaps be safely left inside your web tree.
If you want the map file names to be constructed differently for the pages generated as a directory and the pages generated without directories, you can specify commentmap:nodir separately, and it will be used for pages without directories, while the value of commentmap will in this case be used for separate-dir pages only.
Besides all that, [pageset ] sections can contain all parameters responsible for publishing methods, that is, publish_method, publish_symlinks, publish_hidden, publish_recursive and chmod. They control how additional files (for the set items sourced as directories) get into your web tree. See the description of publishing methods for details.
Since every item of a page set is initially represented as a headed text file, it is important to know what header fields are recognized by Thalassa and how they influence the process of generation.
As mentioned in the general description of headed files used in Thalassa, three fields are used by the parser internally and are not accessible outside. These are id, encoding and format.
For page set items, the fields unixtime, type, flags, comments, teaser_len, descr, date, title and tags are recognized by Thalassa, which means their processing is special in at least some sense.
First of all, the fields unixtime, type, flags, comments and teaser_len don't get processed by format and/or encoding filters. Contrary to this, the descr field gets full processing, both for encoding and format, just like the body of the page. All the other fields, including those not listed, only get processed for encoding, but not for the format.
Please remember that in case a field is entered by a user, and you display it on your pages, it is necessary to pass its contents through a filter that strips off all tags and &-entities to avoid HTML injections. Thalassa won't do that for you automatically, because it doesn't know if a particular field is entered by third parties, nor if you're going to display it on an HTML page; but it provides the %[ltgt: ] macro for this purpose, so be sure to use it.
The unixtime field, if it is present, must be an integer representing a date (presumably, the date when the page was created), as the well-known Unix Time value (the number of seconds since Jan 01, 1970).
The type field, if it is there, should contain an identifier used as a specifier for your page_template and page_tail_template. We'd like to repeat one more time that no encoding transformation is applied to this field, and Thalassa itself is mostly codepage-agnostic, so you'd better not try using any non-ASCII chars (and, even further, no chars other than lowercase Latin letters, digits and the underscore) in this field.
Those familiar with Drupal might notice this field is designed to handle Drupal's node types. If you never had to deal with Drupal, you're, first of all, lucky, and, second, you don't want to know more about the type field.
The flags field is designed to contain a comma-separated list of identifiers, of which, in the present version, only one is used by Thalassa itself. This flag is hidden, and it makes Thalassa simply skip generation of the page. So, if your source file's header contains
flags: hidden
or even something like
flags: abra, hidden, cadabra, schwabra
then your page will not be generated, as if it didn't exist at all.
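A minimal Python sketch of how such a flags value can be checked follows. The helper names parse_flags and is_hidden are hypothetical, and trimming spaces around each identifier is an assumption based on the second example.

```python
def parse_flags(value):
    """Split a comma-separated flags value into a list of identifiers."""
    # Assumption: spaces around identifiers are not significant.
    return [part.strip() for part in value.split(",") if part.strip()]


def is_hidden(value):
    """True if the hidden flag is present, so the page would be skipped."""
    return "hidden" in parse_flags(value)
```

Both is_hidden("hidden") and is_hidden("abra, hidden, cadabra, schwabra") are true; a value that merely contains the substring, such as "hidden_draft", is not treated as the flag in this sketch.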
The comments field determines whether comments are available for this page. Possible values are enabled, disabled and readonly, and disabled is the default; actually, any other value is silently considered equal to disabled. The thalassa program only tries to generate the comment section in case this field contains either enabled or readonly, otherwise it concatenates the two parts of the page template inserting nothing between them, even if your “database” contains comments for this page. As for the CGI program, if it is configured properly, it will only allow comments for pages that have enabled in this field (and if it is not properly configured, then chances are it will not allow comments at all, but your mileage may vary).
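The two decisions driven by this field can be summarized in a small Python sketch. The function comment_policy is invented for illustration; exact, case-sensitive matching of the value is an assumption, as the text does not say how case is treated.

```python
def comment_policy(value):
    """Return (show_section, accept_new) for a comments header value.

    show_section: the generator tries to build the comment section.
    accept_new:   a properly configured CGI program allows new comments.
    """
    # Assumption: values are compared exactly; anything unknown
    # behaves like "disabled".
    value = value.strip()
    show_section = value in ("enabled", "readonly")
    accept_new = value == "enabled"
    return show_section, accept_new
```

So readonly keeps existing comments visible without accepting new ones, while an unknown value gives (False, False), the same as disabled.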
The teaser_len and descr fields are both meant to provide a shorter version of the page to be included in lists, be it named a description or a teaser. It is a bit tricky to explain how exactly they work. In case the descr field is given and isn't empty, the teaser_len field is completely ignored. Otherwise Thalassa considers the first N bytes (that's it! not characters, but bytes, so utf8 lovers can have trouble here... well, in case they try to compute the value manually, which it was never intended for) of the body to be the “description” (or “teaser”, no matter), where N is the integral number given as the value for teaser_len. If this number happens to be greater than the body's length, then the whole body is taken as the description. Some obvious troubles can arise in case some HTML tags open within the “teaser” but get closed after its end.
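The selection rule, including the bytes-versus-characters pitfall, can be sketched in Python like this. The function name description is made up, the body is assumed to be available as a bytes object, and treating a negative length as zero is an assumption.

```python
def description(body, descr, teaser_len):
    """Pick the short version of a page.

    body and descr are bytes; teaser_len is a number of BYTES, not
    characters.
    """
    if descr:
        # A non-empty descr field wins; teaser_len is ignored entirely.
        return descr
    # Slicing past the end simply yields the whole body.
    return body[:max(teaser_len, 0)]
```

With a UTF-8 body, a byte count can land in the middle of a character: description("привет".encode("utf-8"), b"", 3) returns three bytes that no longer decode as valid UTF-8, because each of these letters takes two bytes.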
The date field should contain the date (presumably, again, the date when the page was created) in some human-readable format. In case it is present, it will be displayed as the date; otherwise, Thalassa will convert the unixtime field to a human-readable form. Actually, in the present version the rfc2822 date format is used (see the rfcdate macro for explanation), and it's even impossible to set the timezone, it is always UTC. This is very likely to change.
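The fallback order can be illustrated in Python with the standard email.utils.formatdate function, which produces RFC 2822 style dates. This is a sketch only: the function display_date is hypothetical, the exact string Thalassa emits (for instance, how it spells the UTC zone) may differ, and returning an empty string when both fields are missing is an assumption.

```python
from email.utils import formatdate


def display_date(date_field, unixtime_field):
    """Choose the human-readable date for a page.

    Both arguments are header values as strings ("" when absent).
    """
    if date_field:
        # An explicit date field is shown as it is.
        return date_field
    if unixtime_field:
        # Otherwise fall back to the Unix time, rendered in UTC.
        return formatdate(int(unixtime_field), usegmt=True)
    return ""  # assumption: nothing to show when both fields are missing
```

For example, display_date("", "86400") gives "Fri, 02 Jan 1970 00:00:00 GMT", while display_date("Jan 2, 1970", "86400") gives back "Jan 2, 1970" unchanged.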
The title field is just that: a title. If some of you prefer the term “subject”, we're really glad for you.
The tags field contains a comma-separated list of, well, tags. No special handling is done for them, but there's a macro you can use to determine if a particular tag is there or not.
%[li: ] macro
As already mentioned above, li stands for list item, which is because it is used with lists, too, and, even more, sometimes page set items serve as list items themselves. The li macro is available during the time period of a particular set item generation, which means you can use it in your page_template and page_tail_template parameters, as well as in all things “called” from them, like snippets defined in the [html] configuration section.
The macro accepts at least one argument, and often more. The first argument determines what you actually want from the macro; we'll call these names “functions”; please note we already know two of the functions: id and text. Let's repeat that id returns the set item ID (hereinafter, “returns” effectively means that the macro call %[li:id] or the like is replaced with the ID), and text returns your item's body. Until the end of this section, we'll use phrases like “the mumbo function returns jumbo” in the sense that macro call %[li:mumbo: ] (possibly with more than one argument) is macro-expanded to a thing explained by “jumbo”.
Besides the two, the most obvious functions are (likely):
title returns the title, as it is set by the title header field; are you surprised?
unixtime returns the unixtime field, if it exists and is a valid number, otherwise returns an empty string;
date returns the date in human-readable form, either as defined by the date field, or derived from the unixtime field;
descr returns the shorter version of the page text, as set by the teaser_len and descr fields;
tags returns a comma-separated list of tags set by the tags field; this list may differ from the actual field's contents as a string, because internally the tags are stored separately, and the whole string is reconstructed when requested by the macro;
hf (for header field) takes the name of a header field as an additional argument and returns its value; that is, %[li:hf:NAME] returns the value of the header named NAME; please note this only works for header fields not recognized by Thalassa itself, which means you can't get the value of your title or unixtime this way.
Several functions of the li macro allow you to check a certain condition and choose one of two strings depending on it. They may accept one or more arguments for the condition itself (but most of them don't, as the condition is fully defined by the function itself), and two more arguments as the alternatives to return (i.e., the “then” and “else” alternatives). For example, ifcomenabled checks whether comments are enabled or not on the page, which means the condition is true if (and only if) the comments field contains the word enabled. So,
%[li:ifcomenabled:%[html:cmt_form]:<em>comments disabled</em>]
on pages where comments are enabled, will expand to whatever %[html:cmt_form] expands to, while on pages where comments are not enabled (that is, either disabled or readonly), it will expand to the string “<em>comments disabled</em>”.
WARNING! Be sure to read the section devoted to the eager computational model and its consequences.
In this particular example, the %[html:cmt_form] call will be expanded in any case, that is, even if comments aren't enabled, it will expand anyway, but the result of the expansion will be dropped. Sometimes this isn't a problem, but it is necessary to understand what actually happens, or else you'll run into problems, sooner or later.
For the particular example, the obvious solution will be to write
%[html:%[li:ifcomenabled:cmt_form:com_disabled]]
and to define the snippet named com_disabled in your [html] configuration section like this:
[html]
com_disabled = <em>comments disabled</em>
cmt_form = ...
...
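The difference between the two spellings can be mimicked in Python, which also evaluates function arguments before the call. This is an analogy only, not Thalassa code: expand_snippet stands in for %[html:NAME], if_com_enabled for %[li:ifcomenabled:...], and the snippet texts are placeholders.

```python
calls = []  # records which snippets actually got expanded


def expand_snippet(name):
    """Stand-in for %[html:NAME]: note the expansion, return some text."""
    calls.append(name)
    snippets = {
        "cmt_form": "<form>placeholder</form>",
        "com_disabled": "<em>comments disabled</em>",
    }
    return snippets[name]


def if_com_enabled(enabled, then_value, else_value):
    """Stand-in for ifcomenabled: both alternatives arrive already computed."""
    return then_value if enabled else else_value


def eager_variant(enabled):
    # The form snippet is expanded before the condition is even looked at.
    return if_com_enabled(enabled, expand_snippet("cmt_form"),
                          "<em>comments disabled</em>")


def name_first_variant(enabled):
    # Only a snippet NAME is chosen first; exactly one snippet is expanded.
    return expand_snippet(if_com_enabled(enabled, "cmt_form", "com_disabled"))
```

With comments disabled, both variants return the same string, but eager_variant still records an expansion of cmt_form whose result is thrown away, whereas name_first_variant expands com_disabled only.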
The iffile function checks if the set item directory contains a file with the given name among the files to be published. It takes three arguments, for the file name, “then” and “else”. For example:
%[li:iffile:photo.png: <img src="photo.png" alt="the photo"/> : <strong>No photo</strong> ]
The li macro also supports some functions intended to be used with lists; some of them may (in rare circumstances) be useful in sets that don't act as a source for any lists, but it would be hard to explain them without the lists-related background anyway. These functions are: prev, next, ifprev, ifnext, iflong, ifmore, listarraynum, iflistarraynum, listidx, iflistidx. We'll discuss them along with the lists.
If the second argument doesn't match any functions known to the macro, it expands to [li:ARG?!], where ARG is the second parameter's value.