Page sets

Contents:

What are page sets
Page set source directory and item IDs
User comments on page set items
The [pageset ] ini file sections
Header fields having special meaning
The [li: ] macro

What are page sets

A page set is basically a directory within your sources (the so-called page set directory), in which you create a headed text file or a subdirectory containing such a file, and on the next generator run the file becomes an HTML page within your site. Sometimes it may be important that no modifications to your configuration files (ini files) are needed to add another page to the set; e.g., such an addition can easily be done programmatically, without human intervention at all.

If a page is represented by a directory (as opposed to a file), the directory can contain additional files, such as images. These files will be published, much like a collection.

There are some limitations for this kind of Thalassa object; the most notable are the following:

pages of the same set are always created in the same directory of your site's tree; each page is either created as an HTML file or as a directory containing index.html (this name can be changed, but the overall structure can not);
from Thalassa's point of view, page sets as such are unordered, so there's no way to enumerate members of a set in a predictable manner; building lists of set pages involves additional files and configuration objects (well, lists of the “set-based source” kind).

Set pages may have comment sections.

By default, if a set page's source is a directory, it is generated as a subdirectory of the target directory, and if the source is a single file, the page itself is generated as a file (without making directories), too. This behaviour can be changed; for any particular set it is possible to force all its pages to be generated as files right within the target directory, or to be generated as subdirectories of the target directory, no matter what type of source is used for each page.

Page set source directory and item IDs

Thalassa scans the directory which is set as a source for a page set, and thus determines what the set items are. Directory entries with names starting with “.” (dot; the so-called hidden files) and “_” (underscore) are ignored during this scan.

Files with names starting with “_” residing in the page set source directory are used for various service purposes, primarily to form lists of set items.

Entries with names starting with anything else are expected to be regular files and directories; symbolic links are resolved during the file type examination (and further actions), so they can point at files and directories located elsewhere, if you really need that. Files of all other types (devices, FIFOs and sockets) are silently ignored; regular files and directories found in the set source directory (with appropriate names) become list items, and their names are taken as item IDs.
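The scan rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not Thalassa's actual code; the function name scan_page_set and the demo file names are made up for the example, and the sorting is there only to make the demo output stable (the set itself is unordered).

```python
import os
import tempfile

def scan_page_set(source_dir):
    """Return the item IDs found in a page set source directory (hypothetical helper)."""
    ids = []
    for name in os.listdir(source_dir):
        # Names starting with "." or "_" are skipped during the scan.
        if name.startswith((".", "_")):
            continue
        path = os.path.join(source_dir, name)
        # os.path.isfile / os.path.isdir follow symbolic links.
        if os.path.isfile(path) or os.path.isdir(path):
            ids.append(name)
        # Anything else (devices, FIFOs, sockets) is silently ignored.
    return sorted(ids)  # sorted only for a stable demo; sets are unordered

# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as src:
    open(os.path.join(src, "foobar"), "w").close()   # single-file item
    os.mkdir(os.path.join(src, "mypage"))            # directory item
    open(os.path.join(src, "_index"), "w").close()   # service file, ignored
    open(os.path.join(src, ".hidden"), "w").close()  # hidden file, ignored
    demo_ids = scan_page_set(src)

print(demo_ids)
```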

In case a particular item in your page set is represented with a directory, that directory must contain a file named content.txt; there's no way to change the name, in the present version it is hardcoded. This file will play the role of (the real) source for your page.

Suppose, for example, that your source tree is located at /home/lizzie/mysite/, and you create a page set with source directory named node, so its full path will be /home/lizzie/mysite/node. If, after that, you want your set to contain an item named foobar, then you've got two possibilities:

either you create a headed text file with full path /home/lizzie/mysite/node/foobar (and that's all);
or, you make a directory with full path /home/lizzie/mysite/node/foobar/, and inside that directory, create a headed text file named content.txt, so its full path will be /home/lizzie/mysite/node/foobar/content.txt. In this case you can add more files (e.g., images, or whatever else you want) to the directory along with the content.txt, and these files will be “published” into the same place within your site's tree, where the generated page will reside.

There's one more thing you should know: file names starting with “_” are not only ignored in the page set source directory, they are also ignored within item directories (where that content.txt sits), so if you place such a file there, it will not get published.

In the present version of Thalassa this has no particular meaning, but files with such names may start playing some special roles later. As of now, think of them as “reserved”.

The name of the file or directory is taken as the set item ID. The same ID must be given as the value for the id: header field in the item file; it is unspecified what will happen in case they don't match (that is, even if in a particular version of Thalassa nothing happens, this can change in future versions).

The set item ID is used in many situations to refer to the particular item, and the name of whatever is generated (either an HTML file, or a directory within your site's tree) is also derived from the ID somehow. Please note that, unless you intentionally want the dot “.” to be in the ID (and chances are that you don't), you shouldn't use it in the name of the source file/directory. For directories this in most cases doesn't cause problems, as people are not generally used to adding “extensions” to directory names; however, for no real reason a lot of people tend to give “extensions” to plain files. Folks, please note one thing: in Unix systems, there's no such thing as “file name extension”, and even if you name your file like mypage.txt or mypage.src, these .txt and .src ain't no damn “extensions”, they are just parts of names with no special meaning of any kind.

Well, yes, you can name the source file of your page set item something like mypage.txt, but then the ID of the item will be mypage.txt, NOT mypage, as many people, for whatever odd reasons, expect. So it is mypage.txt, not mypage, that you need to put in your id: field. And the page Thalassa generates for you will likely have the name mypage.txt.html, not mypage.html as you could hope.

So, at least when you create items in a page set, and you want a single regular file, not a directory, to be the source of it, forget about extensions. No matter if it is a file or a directory, name it just mypage (or how you like), but not mypage.XXX, where XXX is whatever extension comes to your mind.

User comments on page set items

To better understand the rest of this documentation page you should recall that set pages are capable of having comment sections. Comments, as well as generation of sections displaying them, will be discussed on the dedicated page; however, there's one basic property of comment sections we have to mention right now, otherwise it will be hard to get what most of the pageset-related configuration parameters are all about. So, keep in mind there may be more than one comment section for a single published item (be it an item of a page set, or a list page item) in case there are more comments than it is desirable to display on one page.

For example, suppose you write another entry of your blog, and it makes your readers so much interested that they leave 520 comments, while you configured Thalassa to display only 100 comments per page. Actually, 6 pages will have to be generated for your single blog entry in this case: first 5 of them displaying 100 comments each, and the last displaying 20 comments.

What Thalassa does in this situation is repeating your item's content, be it a blog entry or whatever else, on each of the pages generated for that item. So, in the simplest possible case, the first page will contain your entry and the first 100 comments, the second page will contain your entry again and the comments from 101 through 200, etc. It is possible to tell Thalassa to display the comments in reverse order, newer first, and their placement on pages will differ, but the idea is always the same: a number of pages to be generated for a single source item, the item's content is displayed on every page, and each page has its own range of comments.
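The arithmetic of the example is easy to check with a small sketch. It only covers the simplest case (comments in direct order, oldest first); the function name comment_pages is made up for this illustration.

```python
def comment_pages(total, per_page):
    """Split comment numbers 1..total into (first, last) ranges, one per page."""
    pages = []
    first = 1
    while first <= total:
        last = min(first + per_page - 1, total)
        pages.append((first, last))
        first = last + 1
    return pages

ranges = comment_pages(520, 100)
for number, (first, last) in enumerate(ranges, start=1):
    print(f"page {number}: entry text + comments {first}..{last}")
```

For 520 comments and 100 comments per page this gives six ranges, the last being 501 through 520.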

For the purpose of this, pageset configuration had to be made a bit complicated, and it might look confusing at first glance. First of all, a template used for page generation is broken down into two parts, named page_template and page_tail_template. Every generated page consists of three parts: the result of macroprocessing for page_template, the comment section and the result of macroprocessing for page_tail_template. These three parts are simply concatenated to make the whole page's content.

The second question is how to name the generated files. This is a bit tricky and depends on whether the item being generated has its own directory. For an entry named foobar from our example above, there are two possibilities. In case the item doesn't have its own directory, perhaps (relative to the target directory of the whole set) the first (main) file will be named just foobar.html, and additional files (containing pages for the same entry, but different portions of comments) will be foobar_2.html, foobar_3.html, ..., foobar_5.html. In case of a separate subdirectory, the main file will (by default) be foobar/index.html, and the additional pages will get names foobar/c2.html, foobar/c3.html, etc. In both cases the main page is considered to be Number One, but the number is hidden (this changes slightly when the comments are placed in reverse order; this will be discussed along with comment sections).
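The default naming scheme just described can be summed up in a sketch like the following. The names are hardcoded here exactly as given in the paragraph above; the real names come from configurable templates, and the function page_file_names is purely hypothetical.

```python
def page_file_names(item_id, page_count, own_directory):
    """Default file names (relative to the set's target directory) for one item."""
    names = []
    for page in range(1, page_count + 1):
        if own_directory:
            # Main page is index.html, further comment pages are cN.html.
            names.append(f"{item_id}/index.html" if page == 1
                         else f"{item_id}/c{page}.html")
        else:
            # Main page hides its number, further pages get the _N suffix.
            names.append(f"{item_id}.html" if page == 1
                         else f"{item_id}_{page}.html")
    return names

print(page_file_names("foobar", 3, own_directory=False))
print(page_file_names("foobar", 3, own_directory=True))
```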

One more thing to keep in mind is that sometimes programs need to determine on which page a particular comment is located. The thalassa program during the process of generation has sufficient information to deduce this, but the CGI program sometimes needs to know this too, and it has no access to the site sources. Hence Thalassa generates an additional file for every item that turned into several pages because of comments; the file is known as a comment map.

Knowing all this, it must be easier to understand what some configuration parameters are for.

The [pageset ] ini file sections

A page set is configured with an ini file section that belongs to the pageset group, e.g.

  [pageset node]

The name of the section (node in this example) identifies the set as a whole. Both the page set source directory and its target directory default to the pageset name, but this can be overridden by setting the respective parameters. However, even if you don't use these defaults, and even if you only have one page set, be sure to give it a good name because it will likely appear in many other places of your configuration file.

During the generation period of a set page, Thalassa makes one special macro available, which gives access to properties of the page being generated, such as its ID, title, text (body), information from the header and so on. The macro is named %[li: ], for list item; it is named so because exactly the same macro is used in generation of list items. The macro will be discussed later, and as of now, just keep in mind that

%[li:id] is the ID of the page being generated;
%[li:text] is the page's text (well, the body) after applying all transformations related to encoding and format.

Now we can discuss parameters that may appear within a [pageset ] section.

The actual content of generated pages is controlled by two parameters: page_template and page_tail_template. Their values are passed through the macroprocessor; then, the generator concatenates the processing result for page_template, the comment section (if it exists for the page being generated) and the processing result of page_tail_template. The “%[li: ]” macro must be used to access the content of a particular page; at the very least, you can hardly do without %[li:text].

You can use different page templates in the same page set. To do so, add the “type: ” header to source files of your pages. For example, you can use

  type: blog

for blog entries,

  type: news

for news articles, etc. Then, these type identifiers (blog and news in the example, or anything else you want) may be used as parameter specifiers for both page_template and page_tail_template. For example, you can use page_template:blog for your blog entries and page_template with no specifier for everything else.
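The choice between a type-specific template and the unspecified one can be pictured as a simple lookup with a fallback. The sketch below assumes the section's parameters are held in a plain dictionary keyed as "name:specifier"; this representation, the function pick_template and the template strings are all illustrative assumptions, not Thalassa's internals.

```python
def pick_template(params, name, page_type):
    """Prefer 'name:page_type'; fall back to 'name' with no specifier."""
    if page_type:
        specific = params.get(f"{name}:{page_type}")
        if specific is not None:
            return specific
    return params.get(name, "")

# Hypothetical parameter values of a [pageset ] section.
params = {
    "page_template": "<article>%[li:text]",
    "page_template:blog": "<article class=\"blog\">%[li:text]",
    "page_tail_template": "</article>",
}

print(pick_template(params, "page_template", "blog"))  # type-specific
print(pick_template(params, "page_template", "news"))  # falls back
```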

The sourcedir parameter sets the path for page set source directory; as usual, the path may be either absolute or relative, and “relative” means relative to Thalassa working directory. This parameter defaults to the pageset ID, so if the ID is node, as in our example above, the sourcedir parameter is not explicitly specified (that is, omitted) and you run thalassa in /home/lizzie/mysite/, then it takes items for your set from /home/lizzie/mysite/node/.

The setdirname parameter tells thalassa where to place the items it generates. The value is taken relative to your site's tree root, even if it begins with a slash “/”. Just like the previous parameter, setdirname defaults to the pageset ID.

The make_subdirs parameter controls whether to make subdirectories within the setdirname directory for each item. The recognized values are always, never and bysource, which is the default. In case the parameter is omitted, empty or its value is not recognized, it is taken as if it was bysource. The value is case-insensitive; always means to make subdirectories for all items, even for items whose source is a single file; never means not to make any subdirs even for items whose source is a directory with content.txt (and possibly other files) in it, and the default bysource means to create subdirs only for items whose source is a directory.
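Put as code, the decision looks roughly like this. The function makes_subdir is a made-up name for the sketch; the three values and the fallback to bysource are taken from the description above.

```python
def makes_subdir(make_subdirs, source_is_directory):
    """Decide whether an item is generated as a subdirectory."""
    value = (make_subdirs or "").lower()   # the value is case-insensitive
    if value == "always":
        return True
    if value == "never":
        return False
    # "bysource", empty, omitted or unrecognized: follow the source type.
    return source_is_directory

print(makes_subdir("Always", source_is_directory=False))   # True
print(makes_subdir("never", source_is_directory=True))     # False
print(makes_subdir("whatever", source_is_directory=True))  # True
```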

For items generated as a file, without a subdirectory, the pagefilename parameter sets the file name template. The template should usually contain the %[li:id] macro call, so that the name of the file contains the item ID, but this is not enforced in any way, and it is possible (at least in theory) to use other techniques to make the files have different names. Besides all macros defined in thalassa, the pagefilename parameter can use the index macros (%idx%, %_idx% and %idx0%), which reflect the index of a comment page (generated for the same item). The “main” page always has the special index value zero, even if it is considered the “invisible number one”.

The parameter defaults to “%[li:id]%[_idx].html”; you can safely omit it in most cases.

For items generated with their own subdirectories, the pagefilename parameter is ignored. Instead, three other parameters are used:

pagedirname sets the name for the subdirectory to be created for the item; default is “%[li:id]”;
indexfilename sets the name for the HTML file created inside the directory to represent the item; default is “index.html” — please note this is the name of file displayed by Apache by default when directory URI is requested;
compagename is a template for names of additional HTML files, created because of multiple comment sections; index macros work within the value of this parameter; default is “c%idx%.html”.

The comments parameter sets the comment sections style, the path to the comment tree (the part of the “database” that holds comments for the particular page) and some additional parameters available in comment-related templates by macro calls. This parameter is discussed in detail along with comment sections. By default this parameter's value is empty, which means no comment section is generated.

The commentmap parameter is a template for comment map file names. The name is relative to your rootdir (the root directory of your site's tree). As of the present version, the default is empty, which means not to create map files (for a given page set) at all. However, please note some important functions of the CGI program will not work without maps, in case you use comment styles that involve a maximum of comments per page (and multiple pages in case there are more comments). Something like “node/.__%[li:id].map” will work; you can also make a dedicated directory for all your maps, even outside of your tree (because the value may start with “../” and there's no limitation for this), although comment maps don't hold any sensitive information and can perhaps be safely left inside your web tree.

If you want the map file names to be constructed differently for the pages generated as a directory and the pages generated without directories, you can specify commentmap:nodir separately, and it will be used for pages without directories, while the value of commentmap will in this case be used for separate-dir pages only.
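The relation between the two parameters can be sketched as follows. As before, keeping the parameters in a dictionary is just an assumption made for the illustration, as are the function name and the sample values (the first one is borrowed from the example above, the second is invented).

```python
def comment_map_template(params, has_own_directory):
    """Pick the comment map name template for one item; empty means no map."""
    if not has_own_directory and "commentmap:nodir" in params:
        return params["commentmap:nodir"]
    return params.get("commentmap", "")

both = {
    "commentmap": "node/.__%[li:id].map",
    "commentmap:nodir": "maps/%[li:id].map",   # hypothetical value
}
only_plain = {"commentmap": "node/.__%[li:id].map"}

print(comment_map_template(both, has_own_directory=True))
print(comment_map_template(both, has_own_directory=False))
print(comment_map_template(only_plain, has_own_directory=False))
```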

Besides all that, [pageset ] sections can contain all parameters responsible for publishing methods, that is, publish_method, publish_symlinks, publish_hidden, publish_recursive and chmod. They control how additional files (for the set items sourced as directories) get into your web tree. See the description of publishing methods for details.

Header fields having special meaning

Since every item of a page set is initially represented as a headed text file, it is important to know what header fields are recognized by Thalassa and how they influence the process of generation.

As mentioned in the general description of headed files used in Thalassa, three fields are used by the parser internally and are not accessible outside. These are id, encoding and format.

For page set items, the fields unixtime, type, flags, comments, teaser_len, descr, date, title and tags are recognized by Thalassa, which means their processing is special in at least some sense.

First of all, the fields unixtime, type, flags, comments and teaser_len don't get processed by format and/or encoding filters. Contrary to this, the descr field gets full processing, both for encoding and format, just like the body of the page. All the other fields, including those not listed, only get processed for encoding, but not for the format.
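The three groups of fields can be written down as a tiny lookup. The function filters_for is hypothetical; it merely restates the rules above. The fields used internally by the parser (id, encoding and format) are left out of the picture.

```python
# Fields that are passed through neither the encoding nor the format filter.
UNPROCESSED = {"unixtime", "type", "flags", "comments", "teaser_len"}

def filters_for(field_name):
    """Return the set of filter kinds applied to a header field."""
    if field_name in UNPROCESSED:
        return set()
    if field_name == "descr":
        return {"encoding", "format"}   # processed like the page body
    return {"encoding"}                 # everything else, listed or not

for name in ("flags", "descr", "title", "my_custom_field"):
    print(name, sorted(filters_for(name)))
```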

Please remember that in case a field is entered by a user, and you display it on your pages, it is necessary to pass its contents through a filter that strips off all tags and &-entities to avoid HTML injections. Thalassa won't do that for you automatically, because it doesn't know if a particular field is entered by third parties, nor if you're going to display it on an HTML page; but it provides the %[ltgt: ] macro for this purpose, so be sure to use it.

The unixtime field, if it is present, must be an integer representing a date (presumably, the date when the page was created), as the well-known Unix Time value (the number of seconds since Jan 01, 1970).

The type field, if it is there, should contain an identifier used as a specifier for your page_template and page_tail_template. We'd like to repeat once more that no encoding transformation is applied to this field, and Thalassa itself is mostly codepage-agnostic, so you'd better not try using any non-ASCII chars (and, even further, no chars other than lowercase Latin letters, digits and the underscore) in this field.

Those familiar with Drupal might notice this field is designed to handle Drupal's node types. If you never had to deal with Drupal, you're, first of all, lucky, and, second, you don't want to know more about the type field.

The flags field is designed to contain a comma-separated list of identifiers, of which, in the present version, only one is used by Thalassa itself. This flag is hidden, and it makes Thalassa simply skip generation of the page. So, if your source file's header contains

  flags: hidden

or even something like

  flags: abra, hidden, cadabra, schwabra

then your page will not be generated, as if it didn't exist at all.
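A sketch of how such a field might be checked is shown below. Trimming the whitespace around each flag and matching hidden exactly (case-sensitively) are assumptions of this example; the function names are made up.

```python
def parse_flags(value):
    """Split a comma-separated flags field into a set of identifiers."""
    return {flag.strip() for flag in value.split(",") if flag.strip()}

def is_hidden(header):
    """True if the page must be skipped by the generator."""
    return "hidden" in parse_flags(header.get("flags", ""))

print(is_hidden({"flags": "hidden"}))                           # True
print(is_hidden({"flags": "abra, hidden, cadabra, schwabra"}))  # True
print(is_hidden({"flags": "abra, cadabra"}))                    # False
print(is_hidden({}))                                            # False
```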

The comments field determines whether comments are available for this page. Possible values are enabled, disabled and readonly, and disabled is the default; actually, any other value is silently considered equal to disabled. The thalassa program only tries to generate the comment section in case this field contains either enabled or readonly, otherwise it concatenates the two parts of the page template inserting nothing between them, even if your “database” contains comments for this page. As for the CGI program, if it is configured properly, it will only allow comments for pages that have enabled in this field (and if it is not properly configured, then chances are it will not allow comments at all, but your mileage may vary).
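The difference between the three values is easier to see side by side. The following sketch restates the paragraph above; the function names are invented for the illustration, and the CGI part assumes a properly configured CGI program.

```python
def comment_mode(header):
    """Normalize the comments field; anything unknown counts as disabled."""
    value = header.get("comments", "disabled")
    return value if value in ("enabled", "readonly") else "disabled"

def generator_builds_comment_section(header):
    # The generator shows existing comments for both enabled and readonly.
    return comment_mode(header) in ("enabled", "readonly")

def cgi_accepts_new_comments(header):
    # New comments are only accepted when the field says enabled.
    return comment_mode(header) == "enabled"

for value in ("enabled", "readonly", "disabled", "maybe"):
    header = {"comments": value}
    print(value, generator_builds_comment_section(header),
          cgi_accepts_new_comments(header))
```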

The teaser_len and descr fields are both meant to provide a shorter version of the page to be included in lists, be it named a description or teaser. It is a bit tricky to explain how exactly they work. In case the descr field is given and isn't empty, the teaser_len field is completely ignored. Otherwise Thalassa considers the first N bytes (that's it! not characters, but bytes, so utf8 lovers may have trouble here... well, in case they try to compute the value manually, which it was never intended for) of the body to be the “description” (or “teaser”, no matter), where N is the integer given as the value for teaser_len. If this number happens to be greater than the body's length, then the whole body is taken as the description. Some obvious troubles can arise in case some HTML tags open within the “teaser” but get closed after its end.
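Here is a sketch of that rule, working on bytes to make the point about UTF-8 visible. The function teaser and its defaults are assumptions of the example (in particular, what happens when teaser_len is absent is not covered here), and filters applied to the fields are left out.

```python
def teaser(body: bytes, descr: bytes = b"", teaser_len: int = 0) -> bytes:
    """Return the short version of a page: descr if given, else a byte prefix."""
    if descr:
        return descr          # teaser_len is completely ignored
    return body[:teaser_len]  # a length past the end yields the whole body

body = "héllo, world".encode("utf-8")   # "é" takes two bytes in UTF-8

cut = teaser(body, teaser_len=2)
print(cut)                                   # the cut lands inside "é"
print(cut.decode("utf-8", errors="replace")) # so the text comes out damaged
print(teaser(body, teaser_len=1000) == body)
print(teaser(body, descr=b"short one", teaser_len=2))
```

The first cut ends in the middle of a two-byte character, which is exactly the kind of trouble a manually computed teaser_len can cause.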

The date field should contain the date (presumably, again, the date when the page was created) in some human-readable format. In case it is present, it will be displayed as the date; otherwise, Thalassa will somehow convert the unixtime field to a human-readable form. Actually, in the present version the rfc2822 date format is used (see the rfcdate macro for explanation), and it's even impossible to set the timezone, it is always UTC. This is very likely to change.
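The fallback from date to unixtime might look like the sketch below. It uses formatdate from Python's standard email.utils module to get an RFC 2822 style string in UTC; the exact string Thalassa produces may well differ in details, and returning an empty string when neither field is usable is an assumption of this example.

```python
from email.utils import formatdate

def display_date(header):
    """Prefer the human-written date field; otherwise derive it from unixtime."""
    if header.get("date"):
        return header["date"]
    unixtime = header.get("unixtime", "")
    if unixtime.isdigit():
        # RFC 2822 style, always UTC (written as GMT by this helper).
        return formatdate(int(unixtime), usegmt=True)
    return ""  # assumption: nothing sensible to show

print(display_date({"date": "a rainy Tuesday", "unixtime": "0"}))
print(display_date({"unixtime": "0"}))
```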

The title field is just that: a title. If some of you prefer the term “subject”, we're really glad for you.

The tags field contains a comma-separated list of, well, tags. No special handling is done for them, but there's a macro you can use to determine if a particular tag is there or not.

The %[li: ] macro

As already mentioned above, li stands for list item, which is because it is used with lists, too, and, even more, sometimes page set items serve as list items themselves. The li macro is available during the time period of a particular set item generation, which means you can use it in your page_template and page_tail_template parameters, as well as in all things “called” from them, like snippets defined in the [html] configuration section.

The macro accepts at least one argument, and often more. The first argument determines what you actually want from the macro; we'll call these names “functions”; please note we already know two of the functions: id and text. Let's repeat that id returns the set item ID (hereinafter, “returns” effectively means that the macro call %[li:id] or the like is replaced with the ID), and text returns your item's body. Until the end of this section, we'll use phrases like “the mumbo function returns jumbo” in the sense that macro call %[li:mumbo: ] (possibly with more than one argument) is macro-expanded to a thing explained by “jumbo”.

Besides the two, the most obvious functions are (likely):

title returns the title, as it is set by the title header field; are you surprised?
unixtime returns the unixtime field, if it exists and is a valid number, otherwise returns an empty string;
date returns the date in human-readable form, either as defined by the date field, or derived from the unixtime field;
descr returns the shorter version of the page text, as set by teaser_len and descr fields;
tags returns a comma-separated list of tags set by the tags field; this list may differ from the actual field's contents as a string, because internally the tags are stored separately, and the whole string is reconstructed when requested by the macro;
hf (for header field) takes a name of a header field as an additional argument, and returns its value, that is, %[li:hf:NAME] returns the value of the header named NAME; please note this only works for header fields not recognized by Thalassa itself, which means you can't get the value of your title or unixtime this way.

Several functions of the li macro allow checking a certain condition and choosing one of two strings depending on it. They may accept one or more arguments for the condition itself (but most of them don't, as the condition is fully defined by the function itself), and two more arguments as the alternatives to return (i.e., the “then” and “else” alternatives). For example, ifcomenabled checks whether comments are enabled or not on the page, which means the condition is true if (and only if) the comments field contains the word enabled. So,

  %[li:ifcomenabled:%[html:cmt_form]:<em>comments disabled</em>]

on pages where comments are enabled, will expand to whatever %[html:cmt_form] expands to, while on pages where comments are not enabled (that is, either disabled or readonly), it will expand to the string “<em>comments disabled</em>”.

WARNING! Be sure to read the section devoted to eager computational model and its consequences. In this particular example, the %[html:cmt_form] call will be expanded in any case, that is, even if comments aren't enabled, it will expand anyway, but the result of the expansion will be dropped. Sometimes this isn't a problem, but it is necessary to understand what actually happens, or else you'll run into problems, sooner or later.

For the particular example, the obvious solution will be to write

  %[html:%[li:ifcomenabled:cmt_form:com_disabled]]

and to define the snippet named com_disabled in your [html] configuration section like this:

  [html]
  com_disabled = <em>comments disabled</em>
  cmt_form = ...
  ...
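The same effect exists in any language with eager argument evaluation, so it can be demonstrated outside of Thalassa. In the Python sketch below, expand_snippet stands in for %[html: ] and if_com_enabled for %[li:ifcomenabled: ]; both functions, the snippet bodies and the calls list (used only to record what got expanded) are made up for the illustration.

```python
calls = []   # records which snippets actually got expanded

def expand_snippet(name):
    """Stand-in for %[html:NAME]; the snippet bodies are placeholders."""
    calls.append(name)
    snippets = {
        "cmt_form": "<form>...</form>",
        "com_disabled": "<em>comments disabled</em>",
    }
    return snippets[name]

def if_com_enabled(enabled, then_value, else_value):
    """Stand-in for %[li:ifcomenabled:THEN:ELSE]."""
    return then_value if enabled else else_value

# Variant 1: the argument is expanded before the choice is made.
first = if_com_enabled(False, expand_snippet("cmt_form"),
                       "<em>comments disabled</em>")
calls_after_first = list(calls)    # cmt_form was expanded, result dropped

# Variant 2: choose the snippet name first, expand only the chosen one.
calls.clear()
second = expand_snippet(if_com_enabled(False, "cmt_form", "com_disabled"))
calls_after_second = list(calls)   # only com_disabled was expanded

print(first, calls_after_first)
print(second, calls_after_second)
```

Both variants produce the same text, but only the second one avoids expanding the comment form on pages where it is not needed.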

The iffile function checks if the set item directory contains a file with the given name among the files to be published. It takes three arguments: the file name, “then” and “else”. For example:

  %[li:iffile:photo.png:
     <img src="photo.png" alt="the photo"/>
     :
     <strong>No photo</strong>
  ]

The li macro also supports some functions intended to be used with lists; some of them may (in rare circumstances) be useful in sets that don't act as a source for any lists, but it would be hard to explain them without the lists-related background anyway. These functions are: prev, next, ifprev, ifnext, iflong, ifmore, listarraynum, iflistarraynum, listidx, iflistidx. We'll discuss them along with the lists.

If the second argument doesn't match any functions known to the macro, it expands to [li:ARG?!], where ARG is the second parameter's value.

Thalassa CMS