Macroprocessor introduction

Contents:

Overview of macros in Thalassa CMS
The escape char
Three flavors of macro calls
Eager evaluation and its consequences

Overview of macros in Thalassa CMS

Generally speaking, a macro is a kind of rule for replacing one text with another; a macro call (which is often confused with the macro itself) is a usually small portion of text which gets automatically replaced with some other text according to the rule. The process of this replacement is called macro expansion, and the piece of program code which performs the expansion is called a macroprocessor.

Macros are used heavily in Thalassa CMS configuration files; this is no surprise, because what Thalassa actually does is turn one set of text files (your sources) into another set of text files (your site content).

From the very start it is important to keep in mind that macro expansion is only done in the ini files, and even there only within the values of some (honestly, most) parameters. The documentation for every ini file section explicitly states whether macro expansion is done in all its parameters, only some of them, or none.

Strictly speaking, all macros in Thalassa are built-in in the sense that you can't add new macros without hacking the source code of Thalassa itself. However, you can add various snippets, templates, options and other things accessible via the existing macros, and some of them accept parameters so effectively you can achieve all the same goals as if you could invent user-defined macros.

There are basic macros, which are available both in thalassa and in thalcgi.cgi; there are also macros specific to each of them; and there are even some macros local to a particular configuration file section or a particular parameter.

The escape char

Wherever macro expansion is performed, it uses the percent char “%” as the “escape” character. This means that once the macroprocessor sees the percent char, it expects a macro call right after it. So, if you just need the percent char itself, you must double it, like this: %%. This applies only to text that will be passed through the macroprocessor; remember, not all parameter values go through it, so if in any doubt, be sure to take a look at the documentation for the particular configuration section.
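If literal text is inserted into a macro-expanded parameter by a script, the doubling rule above can be applied mechanically. The following is only a minimal sketch; the helper name escape_percent is hypothetical and is not part of Thalassa.

```python
def escape_percent(text: str) -> str:
    """Double every percent sign so that macro expansion yields it literally.

    Hypothetical helper: use it only for text that will really pass
    through the macroprocessor, otherwise the doubled signs stay doubled.
    """
    return text.replace("%", "%%")


escaped = escape_percent("Discount: 50% off")
print(escaped)
```

Applying the helper to a value that is not macro-expanded would be a mistake, which is why the section documentation should be checked first.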

Three flavors of macro calls

Thalassa uses the macroprocessor implemented by the ScriptPlusPlus library (see http://www.croco.net/software/scriptpp/). This library provides three flavors of macro calls: simple, nesting and lazy. For a macro named foobar, the simple form of the call is written as %foobar%, the nesting form is %[foobar], and the lazy form is %{foobar}.

Before we discuss the difference between them, we need to introduce macro arguments. It is relatively rare for a macro to be called just by its name without any additional information. Sometimes this does happen; for example, the macro named now returns the current time as a Unix datetime value (a decimal integer equal to the number of seconds passed since Jan 01, 1970). We can call it with either %now% or %[now]; in this (very simple) case it makes no difference. Another example of such a “parameter-less” macro is message, which is only available in thalcgi.ini and represents a short message describing the result of the action the user requested and the server has just performed (or at least tried to). Again, we can write either %message% or %[message]; there's no difference.

It is also possible to write %{now} or %{message}, and both will (very likely) work, but this is strongly discouraged and may lead to serious security-related problems. Please don't use the “lazy” flavor of macro calls unless you're absolutely sure you know what you are doing.

Across the whole of Thalassa CMS, these two are the only macros that accept no arguments (strictly speaking, there are also some such macros specific to particular configuration parameters, but let's ignore them for now: they aren't even listed in the macro references and are only mentioned in the descriptions of the respective parameters). All the other macros accept one or more arguments. For example, the macro named ltgt accepts exactly one argument, an arbitrary string, and returns the same string with the characters “<”, “>” and “&” replaced with “&lt;”, “&gt;” and “&amp;” respectively; so the macro call %[ltgt:3 < pi < 4] will be replaced with “3 &lt; pi &lt; 4”. Exactly the same will happen if we write the call in the simple form: %ltgt:3 < pi < 4%

In this example, a colon “:” is used as a delimiter between the name of the macro and its argument, but this is not necessarily so. In the macro system we discuss here, a macro name can only consist of alphanumeric chars ('a'..'z', 'A'..'Z', '0'..'9'), the underscore “_” and the asterisk “*”. When the macroprocessor analyses a macro call, the first char it sees which doesn't belong to this set is taken as the delimiter. In examples all through this documentation, as well as in the example sites provided along with Thalassa, we usually delimit with a colon, but sometimes we have to use something different, like “|” (when a colon has to appear within one of the arguments). However, you can use whatever punctuation character you like.

WARNING: don't use non-ASCII characters as delimiters, especially if you use UTF-8. Problems are almost guaranteed if you do. It is also not a good idea to use whitespace chars in this role.
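The delimiter rule described above can be sketched in a few lines of Python. This is an illustration only, not the ScriptPlusPlus code: the function name split_call is hypothetical, the sketch assumes that the same delimiter char separates all arguments of one call, and it ignores nested calls inside arguments.

```python
import string

# Characters allowed in a macro name, as listed in the text above.
NAME_CHARS = set(string.ascii_letters + string.digits + "_*")


def split_call(body: str):
    """Split the inside of a macro call into (name, delimiter, arguments).

    The first char that cannot belong to a name is taken as the delimiter.
    Assumption: the same delimiter separates every argument of the call.
    """
    i = 0
    while i < len(body) and body[i] in NAME_CHARS:
        i += 1
    name = body[:i]
    if i == len(body):
        # No delimiter found: a call without arguments, like "now".
        return name, None, []
    delim = body[i]
    return name, delim, body[i + 1:].split(delim)


print(split_call("ltgt:3 < pi < 4"))
print(split_call("now"))
print(split_call("opt|a:b|c"))
```

The last call shows why a different delimiter is sometimes needed: with “|” as the delimiter, the colon inside the first argument is kept as ordinary text.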

Very often, the result of one macro needs to be passed to another macro as one of its arguments. For example, the macro named rfcdate turns a Unix datetime value into a human-readable form as defined by RFC 2822, something like “29 Mar 2023 19:15:00 +0000”. So, to get the current date and time, we need to write “%[rfcdate:%[now]]”. The simple form doesn't work as the outer call here: we can write “%[rfcdate:%now%]” and it will work, but if we try “%rfcdate:%now%%”, the macro rfcdate will receive an empty argument (and hence will fail), while the now macro will not be called at all.

Macro calls in their simple form work a bit faster, so sometimes it makes sense to use the simple form, but it doesn't work even for the shortest superpositions (one call nested inside another), like the one we've just discussed. This is why the nesting flavor of macro calls is there.

Before we start discussing the last macro call flavor, the lazy one, we'd like to repeat our warning one more time. It is almost always possible to avoid lazy macro calls, and it is highly recommended that you don't use them at all. You can just skip the text until the next heading, and stay safe. If you don't understand why we give this recommendation, it means you can accidentally introduce a security hole into your site's implementation and remain unaware of it until a catastrophe happens. You've been warned.

When the macroprocessor performs macro expansion, it does precisely the following. First, it analyses the macro call and determines where the call starts and where it ends; for nested calls this is not as easy as it may sound. The next step is to break the macro call down into its parts using the delimiter chars, so that the macroprocessor knows the name of the macro and all its arguments. For the simple flavor of macro calls, we're almost done: superposition is impossible here, so the macroprocessor just calls the function corresponding to the extracted macro name and passes all the arguments to it as an array of strings. The function computes the desired result of macro expansion and returns it, and the macroprocessor appends this result to the text being composed.

For the nesting flavor, things are different, as each of the arguments may contain further macro calls. So in this case the macroprocessor has to effectively create another instance of itself and use that instance to process each of the arguments; only then can it call the corresponding function, this time passing it an array composed of the processing results for each argument instead of the arguments themselves.
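The nesting procedure just described can be modelled with a short toy expander. This is a sketch under stated assumptions, not the real ScriptPlusPlus implementation: it handles only the %[...] form and the %% escape, all function names are hypothetical, and the two macros in the table are stand-ins that return fixed strings rather than real dates.

```python
import string

NAME_CHARS = set(string.ascii_letters + string.digits + "_*")


def find_close(text, start):
    """Return the index of the ']' closing the call whose body begins at start."""
    depth, i = 1, start
    while i < len(text):
        if text.startswith("%[", i):
            depth += 1
            i += 2
            continue
        if text[i] == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated macro call")


def split_top(body, delim):
    """Split body on delim, ignoring delimiters inside nested %[...] calls."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        if body.startswith("%[", i):
            depth += 1
            current.append("%[")
            i += 2
            continue
        ch = body[i]
        if ch == "]" and depth > 0:
            depth -= 1
        if ch == delim and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def call(body, macros):
    """Process every argument first, then call the macro function."""
    n = 0
    while n < len(body) and body[n] in NAME_CHARS:
        n += 1
    raw_args = split_top(body[n + 1:], body[n]) if n < len(body) else []
    args = [expand(arg, macros) for arg in raw_args]  # recursive processing
    return macros[body[:n]](*args)


def expand(text, macros):
    """Expand %[...] calls and %% escapes; other text is copied unchanged."""
    out, i = [], 0
    while i < len(text):
        if text.startswith("%%", i):
            out.append("%")
            i += 2
        elif text.startswith("%[", i):
            end = find_close(text, i + 2)
            out.append(call(text[i + 2:end], macros))
            i = end + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Stand-in macros for illustration only; they do not format real dates.
macros = {
    "now": lambda: "1680117300",
    "rfcdate": lambda ts: "date(" + ts + ")",
}
result = expand("Generated: %[rfcdate:%[now]] (100%%)", macros)
print(result)
```

The recursive expand call inside call plays the role of the "another instance" of the macroprocessor: the stand-in rfcdate receives the already expanded value of now, never the raw text %[now].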

Lazy calls work in a completely different way. Once the macroprocessor has the arguments, it immediately calls the corresponding function, passing it the array of raw (unprocessed) arguments. But once the function returns the result, this result is processed again before it goes to the target text.

Consider, for example, the macro named readfile. It takes a file name as its argument and reads the file; when used with the simple or nesting flavor of calls, this allows you to insert a whole file's content into your parameter (e.g. into an HTML page being generated).

If you use readfile in a lazy macro call, you'll be unable to compute the name of the file, so you have to know it in advance, but this might not be a problem. The real problems may arise from the fact that in this case the file's content will get processed by the macroprocessor, so, should there be pieces of text looking like macro calls, they will get macro-expanded. Again, this might be no problem if the file is written by you and you're sure there's nothing wrong in it. It is even possible that you decide to do this intentionally. But if the file you read this way may (at least in theory) be modified by someone else, then your security hole is ready to serve: everyone who can technically supply such a file to your site's implementation can do whatever can be done with Thalassa CMS macros, and that's a lot of things. It is definitely more than you'd want to allow arbitrary people to do on your server.
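The difference can be illustrated with a deliberately tiny model. It is only a sketch: the expander below understands nothing but argument-less %[name] calls, the macro named secret is hypothetical, and the string untrusted stands for the content of a file that somebody else was able to supply.

```python
import re

# Toy pattern: an argument-less nesting call such as %[secret].
CALL = re.compile(r"%\[([A-Za-z0-9_*]+)\]")


def expand(text, macros):
    """Toy expander: replaces argument-less %[name] calls only."""
    return CALL.sub(lambda m: macros[m.group(1)](), text)


# Hypothetical macro that exposes something visitors must never see.
macros = {"secret": lambda: "s3cr3t-token"}

# Content of a file that an outsider managed to supply.
untrusted = "Nice site! %[secret]"

# Simple or nesting call: the function's result is inserted as it is.
nesting_result = untrusted

# Lazy call: the function's result is passed through the macroprocessor again.
lazy_result = expand(untrusted, macros)

print(nesting_result)
print(lazy_result)
```

In the first case the text that merely looks like a macro call stays inert; in the second it is executed with the full power of the macro system.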

Once again: if all this sounds complicated to you, simply don't use the lazy flavor of macro calls, and that's all.

Eager evaluation and its consequences

As mentioned above, for a nesting macro call, the process of macro expansion involves applying the macroprocessor to every argument of the macro. In other words, every argument of a nesting call gets computed as a separate text to be macroprocessed. In terms of functional programming, all arguments are first evaluated, and only then is the actual macro applied to the results. This is exactly what is usually called the eager evaluation model.

It is important to understand that this happens to every argument of the call, every time the call is processed. The macro system used in Thalassa CMS is not a programming language, and it doesn't provide “special” macros that could skip evaluation of some of their arguments depending on values of other arguments. Such “selective” evaluation is simply impossible in this implementation.

Thalassa CMS provides some “conditional” macros, which allow you to choose one of two or more variants. What is critically important to understand here is that all variants will be computed every time a call to such a macro is processed. The macro implementation will choose one of the variants afterwards, but only after all the variants have been computed.

Let's consider a more or less simple example. The ifeq macro takes exactly four arguments. The first two are stripped of any leading and trailing spaces and compared; if they are equal, the third is returned, otherwise the fourth is returned. Another macro, opt, gives access to various options given in the options section group. The last macro for our example is already known to us: it is readfile, which we explained earlier. Now look at the following:

  %[ifeq:%[opt:scheme:lights]:night
      :%[readfile:night.txt]:%[readfile:default.txt]]

Well, the result of this is more or less obvious: if the option scheme/lights is set to night, then the whole construction will be replaced with the night.txt file's contents, otherwise the contents of default.txt will be used. What is not so obvious is that both files will be read. Sometimes this can be a problem, especially if one of the files doesn't actually exist and an unintentional attempt to read it produces an error.

For this particular example, it is easy to avoid the problem:

  %[readfile
    :%[ifeq:%[opt:scheme:lights]:night:night.txt:default.txt]
  ]

Used this way, ifeq doesn't choose between two readfile call results (after doing them both), as it did in the previous example; instead, it just chooses one of the two file names, and the choice is used as readfile's argument.
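Python evaluates function arguments eagerly too, so the two variants can be mimicked directly. The sketch below is only an analogy: the functions readfile and ifeq are Python stand-ins for the macros of the same names, no real files are touched, and the list reads merely records which file names were requested.

```python
reads = []  # records every file name that readfile was asked for


def readfile(name):
    """Stand-in for the readfile macro: logs the request, reads nothing."""
    reads.append(name)
    return "<contents of " + name + ">"


def ifeq(a, b, then, otherwise):
    """Stand-in for ifeq: compare the stripped values, pick a variant."""
    return then if a.strip() == b.strip() else otherwise


scheme = "night"  # plays the role of %[opt:scheme:lights]

# First variant: both readfile calls run before ifeq is even entered.
first = ifeq(scheme, "night", readfile("night.txt"), readfile("default.txt"))
reads_first = list(reads)

# Second variant: ifeq picks a name, and readfile runs exactly once.
reads.clear()
second = readfile(ifeq(scheme, "night", "night.txt", "default.txt"))
reads_second = list(reads)

print(reads_first)
print(reads_second)
```

Both variants produce the same text, but only the second avoids touching default.txt when the night scheme is selected.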

In fact, Thalassa provides ways to always avoid unnecessary computations, but attention must be paid to this, and, first things first, the person who writes configuration files should at least understand what the problem is.