Setting up Apache2 for a Thalassa-based site

Contents:

Scope and limitations of this instruction
Common preparations
- Getting Thalassa binaries
- Installing Apache and getting it to run
Configuration for a single site
Configuration with virtual sites but without suexec
Configuration with virtual sites and suexec

Scope and limitations of this instruction

This page gives step-by-step instructions on how to set up a fresh Linux machine to serve a Thalassa-based site with the well-known Apache HTTP server. It is assumed that your site is based on the Smoky template, but some advice is given for non-template sites as well.

The instructions assume Debian or a Debian-like distribution. On machines with other distros, things may look slightly different. This question is addressed to some extent, too, but obviously it is not possible to provide step-by-step instructions covering all possible distros and systems.

All HTTPS-related stuff is deliberately ignored here. If you really want/need HTTPS with all its certificates, trusted authorities and all that stuff, well, there are tons of sources on how to make Apache serve it.

If for some reason you don't need the interactive features of Thalassa, then perhaps the simplest way to organize things is to generate the site on your own machine and upload the results to the server; deployment of a completely static site has no Thalassa-specific issues and is not addressed here. The tricky part is making Apache run the Thalassa CGI program properly.

There are three possible configurations for Apache to run a Thalassa-based site:

- without virtual hosts;
- with virtual hosts that belong to the same user, without suexec;
- with virtual hosts, using suexec.

The last one may look the trickiest, but on the other hand it is the most flexible and secure one. If several people run their sites on a single server machine (no matter whether it is physical or virtual), the suexec-based configuration is the only viable option; however, even if you run only one site on your server, it is still much better to run it as a virtual site with suexec, so that the CGI program executes under a (dummy) user different from the one Apache itself runs as. So this configuration is highly recommended.

Common preparations

Getting Thalassa binaries

Generally, there are two ways of getting Thalassa binaries ready to run on your server box: either build them from source right on the server, or build them on your own computer and upload them to the server. Yes, the latter IS possible, as Thalassa is built fully static by default.

In order to build Thalassa from source (which is not necessary in most cases!), you'll need the GNU C++ compiler and GNU make; install them with something like this:

  apt install gcc g++ make

and follow the build instructions; then, as root, copy your binaries to /usr/local/bin (this step is not required, but in the rest of this text we assume you did it):

  cd thalassa_0.3.00/cms
  cp thalassa thalcgi.cgi /usr/local/bin

However, even though Thalassa doesn't need any external libraries or weird tools to build, installing even a minimal C++ toolchain on a low-end VPS may considerably reduce the available disk space, so you might want to avoid having it on your server. Most programs can't simply be copied as executable binaries from one computer to another, but for Thalassa this approach works. The only limitation is that both computers must have the same processor architecture and run the same type of OS kernel (okay, sometimes these may differ and copying the binaries will still work, but in this instruction we will not discuss things like cross-compiling, architecture backward compatibility, foreign binary interfaces and the like). To check whether they match, issue the uname -mo command on both boxes. Most probably, in both cases you'll see “x86_64 GNU/Linux”, which means everything's good (if you see something else, but the same thing on both boxes, everything is okay, too, although this hardly ever happens in real life).
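The comparison can be scripted, too. The following is a minimal sketch; the function name same_platform is our own invention (not part of Thalassa), and it assumes your uname supports -o, as the GNU coreutils one does. It takes the server's output as an argument, so you can feed it straight from ssh:

```shell
# Hypothetical helper: succeed if this machine's "uname -mo" output
# matches the string passed as $1 (typically the server's output).
same_platform() {
    here=$(uname -mo) || return 2
    if [ "$here" = "$1" ]; then
        echo "match: $here"
        return 0
    fi
    echo "mismatch: local '$here' vs remote '$1'" >&2
    return 1
}

# Usage (the server address is a placeholder):
#   same_platform "$(ssh root@your-server.example.com uname -mo)"
```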

So, build Thalassa on your working machine (e.g., your home computer or your office workstation) following the build instructions, then do something like this:

  cd thalassa_0.3.00/cms
  scp thalassa thalcgi.cgi root@your-server.example.com:/usr/local/bin

To check that they really work, log in to your server and try launching the thalassa program, e.g.:

  thalassa show version
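If you'd like to make sure both files landed in place with the execute bit set before trying them, a small check like the one below may help. The function name is made up for this example, and the directory defaults to /usr/local/bin, as assumed throughout this text:

```shell
# Hypothetical helper: verify that both Thalassa binaries are present
# and executable in the given directory (default: /usr/local/bin).
check_thalassa_bins() {
    dir=${1:-/usr/local/bin}
    status=0
    for f in thalassa thalcgi.cgi; do
        if [ ! -f "$dir/$f" ] || [ ! -x "$dir/$f" ]; then
            echo "missing or not executable: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}

# Usage on the server:
#   check_thalassa_bins && thalassa show version
```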

Installing Apache and getting it to run

On Debian-like distros, Apache is installed like this:

  apt install apache2

If your system is systemd-based (which is highly likely in these insane days), try launching Apache with something like

  systemctl start apache2

If you're lucky enough and your server runs a non-systemd system, you'll need a different command to start Apache, such as one of these:

  service apache2 start
  service apache start
  service httpd start

or even

  apachectl start
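If you're not sure which of these applies, the choice can be made automatically. The sketch below is hypothetical: it prints the command instead of running it, relies on the fact that the /run/systemd/system directory exists only when systemd is the running init, and assumes the service is called apache2 (on some other distros it is httpd):

```shell
# Hypothetical helper: print (not run) a plausible command to start Apache.
# /run/systemd/system exists only when systemd is the running init system.
apache_start_cmd() {
    if [ -d /run/systemd/system ]; then
        echo "systemctl start apache2"
    elif command -v apachectl >/dev/null 2>&1; then
        echo "apachectl start"
    else
        echo "service apache2 start"
    fi
}

# Usage: review the output, then run it as root, e.g.
#   cmd=$(apache_start_cmd); echo "$cmd"; $cmd
```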

Then, point your browser to your server's address, and you should see the default Apache web page.

Please note that in Debian and its derivatives, Apache's configuration is found in /etc/apache2, the /var/www directory is considered the “web root” (/var/www/html being the default site's location), and Apache runs under the www-data account (a dummy user). In other distros and systems things may differ; it is useful to figure all this out in advance.

For a Thalassa-based site, you need to enable at least Apache's mod_cgi module, and it is useful (although not strictly required) to enable mod_rewrite as well. In Debian, this is done as follows:

  cd /etc/apache2/mods-enabled
  ln -s ../mods-available/cgi.load .
  ln -s ../mods-available/rewrite.load .

In other circumstances, you might need to place the following lines

  LoadModule cgi_module /usr/lib/apache2/modules/mod_cgi.so
  LoadModule rewrite_module /usr/lib/apache2/modules/mod_rewrite.so

somewhere in Apache's configuration (the symlink chemistry shown just above actually does exactly this, only in a weird manner). Just look through what you have to figure out where the other LoadModule directives are placed (or simply run grep ^LoadModule -R * in the configuration directory), and add these lines there.
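On Debian, the a2enmod helper (e.g., a2enmod cgi rewrite) creates these symlinks for you; note that when a threaded MPM is active it may select mod_cgid instead of mod_cgi. If you'd rather script the symlinks yourself, here is a minimal sketch mirroring the ln -s commands above. The function name is hypothetical, and the configuration root is a parameter, so you can try it on a scratch directory first:

```shell
# Hypothetical helper mirroring the manual symlink approach:
#   enable_mod CONFROOT MODULE
# creates CONFROOT/mods-enabled/MODULE.load -> ../mods-available/MODULE.load
enable_mod() {
    conf=$1
    mod=$2
    if [ ! -f "$conf/mods-available/$mod.load" ]; then
        echo "no such module: $mod" >&2
        return 1
    fi
    link="$conf/mods-enabled/$mod.load"
    # Do nothing if the module is already enabled (idempotent).
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    ln -s "../mods-available/$mod.load" "$link"
}

# Usage (as root):
#   enable_mod /etc/apache2 cgi && enable_mod /etc/apache2 rewrite
```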

Finally, uncomment and edit the ServerName directive for your site. In Debian and friends, it is in the sites-available/000-default.conf file, within the “default” site configuration. Make it look like

          ServerName www.example.com

It might also be useful to let Apache know for sure what your server (that is, the machine, as opposed to the site) is named. To do so, add a line like this to your apache2.conf:

  ServerName your-server.example.com

There's one more thing to address. The CGI configuration files, named thalcgi.ini, have to reside within your web tree (side by side with the CGI programs they configure), but they must never be served through the web server the way normal (.html and other) files are. If you're going to use a suexec-based configuration, this problem is solved with properly set permissions, and the thalcgi.cgi program even refuses to run until you set the permissions for its configuration file properly. If, however, you're not going to use suexec (thus disregarding all our recommendations), you must take other measures to protect these files. The Smoky template does that through the .htaccess files it generates, but even if you do use Smoky, these .htaccess files may accidentally get disabled and ignored (e.g., by AllowOverride None); and if you don't use Smoky, it is very easy to leave one of your thalcgi.ini files exposed to the whole Internet. So it is not a bad idea to add the following to your Apache configuration:

  <FilesMatch "^thalcgi\.ini$">
          Require all denied
  </FilesMatch>

Be sure to run apachectl restart for your changes to take effect.
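Since a syntax error in the configuration keeps Apache from coming back up, it may be worth checking the configuration first with apachectl configtest. A sketch of such a guarded restart follows; the function name and the APACHECTL variable are our own additions, the latter letting you substitute apache2ctl or another control script if that is what your system has:

```shell
# Hypothetical helper: restart Apache only if the configuration parses.
# APACHECTL may name an alternative control script (e.g. apache2ctl).
safe_restart() {
    ctl=${APACHECTL:-apachectl}
    if "$ctl" configtest; then
        "$ctl" restart
    else
        echo "configuration test failed; Apache was NOT restarted" >&2
        return 1
    fi
}

# Usage (as root):
#   safe_restart
```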

Configuration for a single site

One of the simplest approaches is to place your site's source (that is, the configuration files and other stuff used by Thalassa) side by side with that html directory (that is, make a directory like /var/www/Site for your site's source), make the whole /var/www tree owned by the Apache user (www-data in the case of Debian), and perform all site maintenance tasks as that user.

You will also need a directory for your session database. We'll use /var/www/_site directory for this purpose.

Letting the whole /var/www tree be owned by the www-data user is terribly risky. If some bad guys manage to take control of your Apache, they will have write access not only to your site, but to your site's sources as well, and the worst thing here is that all this may go unnoticed for a long time. However, we can't follow the usual permission scheme, in which the files placed in the web space are owned by another user and are only readable for Apache (actually, they are typically made world-readable, Apache being just one of the “others”). This is because, as long as we don't use suexec, the CGI programs are launched by Apache with the same credentials as Apache itself has, and the Thalassa CGI program must be able to update the files within the web space. If you want a secure configuration, please give up this single-site, no-suexec approach in favor of the suexec-based configuration. Yes, it does make sense even if you only run one site on your server.

We will assume you're going to use the Smoky template; it's a good starting point anyway.

Assuming the Site directory isn't there yet, do the following (as root):

  cp -R /your/sources/of/thalassa/example/templ_smoky /var/www/Site
  mkdir /var/www/_site
  chown www-data /var/www -R
  chmod ugo+rX /var/www -R
  chmod 700 /var/www/_site

From now on, you will only need root access to modify the configuration of Apache, while all manipulations with the site's content can be done with the Apache user's (www-data in our example) permissions. If the command su - www-data fails, you might need to change the login shell for the www-data user in your /etc/passwd.
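To see why su fails, check the account's login shell; Debian normally gives www-data the /usr/sbin/nologin shell. As an alternative to editing /etc/passwd, root can override the shell for a single session with su -s /bin/sh - www-data. Here is a sketch of the check; the function name is hypothetical, and since it reads /etc/passwd directly, accounts coming from other sources (LDAP and the like) are not covered:

```shell
# Hypothetical helper: print the login shell recorded in /etc/passwd
# for the given account (7th colon-separated field); fail if absent.
login_shell() {
    awk -F: -v u="$1" '$1 == u { print $7; found = 1 } END { exit !found }' /etc/passwd
}

# Usage:
#   login_shell www-data      # on Debian, typically /usr/sbin/nologin
#   su -s /bin/sh - www-data  # as root: bypass nologin for one session
```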

Prepare your site for generation; if you use Smoky, follow the instructions found in its README file; in particular, set at least the following in your config.ini (with url set to whatever the URL of your site is):

  [options dir]
  target = ../html
  spool = ../_site/_spool
  thalcgi = /usr/local/bin/thalcgi.cgi

  [options site]
  url = http://www.example.com
  briefname = example.com

  [options cgi]
  userdbdir = ../_site
  sitesrc = ../Site
  thalassa_bin = /usr/local/bin/thalassa
  secret = YOUR_VERY_SECRET_PIECE_OF_JUNK

  [general]
  opt_selector:feed = news

Be sure to pay attention to the other configurable options as well. The parameters listed above are more important only in the sense that your site will not work without them, but that's no reason to ignore everything else.
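The secret value is, as its placeholder suggests, just a piece of junk nobody can guess. One way to produce such junk, assuming Thalassa accepts any string of hex digits there, is to read a few random bytes from /dev/urandom; the helper name below is hypothetical:

```shell
# Hypothetical helper: print 32 random hex digits (16 bytes) suitable
# as a value for the "secret" parameter.
gen_secret() {
    od -An -tx1 -N16 /dev/urandom | tr -d ' \n'
    echo
}

# Usage:
#   echo "secret = $(gen_secret)"
```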

Run the thalassa gen -r command in your Site directory (as www-data) and check whether the /var/www/html directory is now populated with files. At this point your site should already be reachable, showing all the static content, but the CGI program is still unable to run.
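A quick way to confirm that generation actually produced something is sketched below with a hypothetical helper; run it as www-data, like all other site maintenance:

```shell
# Hypothetical helper: succeed if the directory exists and is not empty.
dir_populated() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Usage (as www-data):
#   cd /var/www/Site && thalassa gen -r && dir_populated /var/www/html && echo "site generated"
```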

As root, edit your /etc/apache2/apache2.conf. Find the default Directory directives, which usually look like this:

  <Directory />
          Options FollowSymLinks
          AllowOverride None
          Require all denied
  </Directory>

  <Directory /usr/share>
          AllowOverride None
          Require all granted
  </Directory>

  <Directory /var/www/>
          Options Indexes FollowSymLinks
          AllowOverride None
          Require all granted
  </Directory>

Add the following here:

  <Directory /var/www/_site>
          Require all denied
  </Directory>

  <Directory /var/www/Site>
          Require all denied
  </Directory>

  <Directory /var/www/html>
          Options Indexes FollowSymLinks ExecCgi
          AddHandler cgi-script .cgi
          AllowOverride All
          Require all granted
  </Directory>

Now restart your Apache, and everything should work.

Configuration with virtual sites but without suexec

This is perhaps the ugliest thing possible. Only one Apache user (that poor www-data) is available, so you can't give different users of your system access to different sites. You can't, e.g., invite friends to use your Linux box, and this is weird, because even the lowest-end-possible VPS is capable of running hundreds of Thalassa-based sites.

Anyway, if you're that afraid of suexec, what we'd suggest is to create 3 subdirectories under your /var/www for every site you're going to run: e.g., if your site is identified by the word foobar (say, it is http://www.foobar.example.com; well, you get the idea), then let the /var/www/foobar directory be your site's web tree, place your site's source in /var/www/Foobar, and also be sure to make an empty directory named /var/www/_foobar for the session database. This may (and should) be done as the www-data user.
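With several sites, creating this trio by hand gets repetitive. The sketch below is a hypothetical helper that derives all three names from the site's word. If you pass a template directory, it copies it as the source directory, and only when that directory doesn't exist yet, because cp -R into an existing directory would nest the template inside it:

```shell
# Hypothetical helper:  make_site_dirs BASE NAME [TEMPLATE]
# Creates BASE/NAME (web tree) and BASE/_NAME (session database), plus
# the source directory BASE/Name (first letter capitalized), copied
# from TEMPLATE if given, or created empty otherwise.
make_site_dirs() {
    base=$1
    name=$2
    template=$3
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    src="$base/$first$(printf '%s' "$name" | cut -c2-)"
    mkdir -p "$base/$name" "$base/_$name" || return 1
    if [ -e "$src" ]; then
        echo "$src already exists; leaving it alone" >&2
    elif [ -n "$template" ]; then
        # src must not exist yet, or cp -R would nest the template in it
        cp -R "$template" "$src"
    else
        mkdir "$src"
    fi
}

# Usage (as www-data):
#   make_site_dirs /var/www foobar /your/sources/of/thalassa/example/templ_smoky
```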

Naturally, your config.ini should now be slightly different:

  [options dir]
  target = ../foobar
  spool = ../_foobar/_spool
  thalcgi = /usr/local/bin/thalcgi.cgi

  [options site]
  url = http://www.foobar.example.com
  briefname = foobar.example.com

  [options cgi]
  userdbdir = ../_foobar
  sitesrc = ../Foobar
  thalassa_bin = /usr/local/bin/thalassa
  secret = YOUR_VERY_SECRET_PIECE_OF_JUNK

  [general]
  opt_selector:feed = news

Next, as root, create an Apache configuration file for your site. Here's an example:

  <Directory /var/www/_foobar>
          Require all denied
  </Directory>

  <Directory /var/www/Foobar>
          Require all denied
  </Directory>

  <Directory /var/www/foobar>
          Options Indexes FollowSymLinks ExecCgi
          AddHandler cgi-script .cgi
          AllowOverride All
          Require all granted
  </Directory>

  <VirtualHost *:80>
          ServerName foobar.example.com
          ServerAlias foobur.example.com www.foobar.example.com

          ServerAdmin webmaster@remove-this-crap.foobar.example.com
          DocumentRoot /var/www/foobar

          ErrorLog ${APACHE_LOG_DIR}/error.log
          CustomLog ${APACHE_LOG_DIR}/access.log combined
  </VirtualHost>

If your system uses the Debian-style configuration, save the file as /etc/apache2/sites-available/001-foobar.conf (or pick 002-, 003-, etc., in case the 001- prefix is already taken); keep the .conf suffix, as Debian's apache2.conf only includes sites-enabled/*.conf. Then go to your sites-enabled directory and make the appropriate symlink:

  cd /etc/apache2/sites-enabled
  ln -s ../sites-available/001-foobar.conf .
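Debian also ships the a2ensite helper, which creates the same symlink for you (a2ensite 001-foobar). Picking a free numeric prefix by hand is easy, but with many sites a small helper may save some guessing. This one is hypothetical (not part of Apache or Debian) and prints the first three-digit prefix, from 001 upwards, not yet used by any .conf file in the given directory:

```shell
# Hypothetical helper: print the first unused NNN prefix (001..999) among
# the NNN-*.conf files in the given directory (default: Debian's layout).
next_site_prefix() {
    dir=${1:-/etc/apache2/sites-available}
    n=1
    while [ "$n" -le 999 ]; do
        p=$(printf '%03d' "$n")
        if ! ls "$dir/$p"-*.conf >/dev/null 2>&1; then
            echo "$p"
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}

# Usage (as root):
#   p=$(next_site_prefix) && echo "save as /etc/apache2/sites-available/$p-foobar.conf"
```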

If you're not on a Debian-like system, it is still useful to define each virtual site in its own .conf file. You can include them one by one by adding directives like this to your apache2.conf:

  Include foobar.conf

Alternatively, you can make a directory for these files, e.g., /etc/apache2/virtuals/, and include them all at once:

  IncludeOptional virtuals/*.conf

Everything should work when you restart your Apache.

Configuration with virtual sites and suexec

Install the suexec program. If you use a Debian-based distro, you might want to use the custom (in Debian's terms) version of suexec, installed like this:

  apt install apache2 apache2-suexec-custom

However, the default suexec version (as provided by the original Apache package) will work well too; if you decide to use that one, install it like this:

  apt install apache2 apache2-suexec-pristine

The following steps don't depend on which suexec you choose; both will work. On non-Debian systems there's usually no such choice, but there's nothing wrong with using the default suexec.
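suexec can only switch users if its binary is installed setuid root, and Apache enables the feature only when the binary has the proper owner and mode. A minimal check of the setuid bit is sketched below; the function name is hypothetical, and the default path /usr/lib/apache2/suexec is our assumption based on the Debian layout, so adjust it for your system. Running suexec -V as root additionally prints its compile-time settings.

```shell
# Hypothetical helper: succeed if the suexec binary exists, is executable
# and has the setuid bit set (without it suexec cannot switch users).
# The default path is an assumption (Debian layout).
suexec_installed() {
    bin=${1:-/usr/lib/apache2/suexec}
    if [ -f "$bin" ] && [ -x "$bin" ] && [ -u "$bin" ]; then
        return 0
    fi
    echo "suexec not usable at $bin (missing, not executable or not setuid)" >&2
    return 1
}

# Usage:
#   suexec_installed && echo "suexec looks fine"
```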

Enable the module. In Debian, this is done as follows:

  cd /etc/apache2/mods-enabled
  ln -s ../mods-available/suexec.load .

In other circumstances, you might need to place the following line

  LoadModule suexec_module /usr/lib/apache2/modules/mod_suexec.so

somewhere in Apache's configuration. Whatever system you use, be sure to add the following line to your apache2.conf:

  Suexec on

Set up the directory tree. For this approach, we strongly recommend creating per-user directories under your /var/www/, with per-site directories residing under the directories of their owners. For example, if the user lizzie is going to run a site named foobar, then, according to this layout, there must be a per-user directory named /var/www/lizzie/, and the directories /var/www/lizzie/foobar, /var/www/lizzie/Foobar and /var/www/lizzie/_foobar, for the site's web space, the site's sources (the files used by Thalassa to generate the site) and the session database respectively. The whole /var/www/lizzie/ subtree must belong to the user lizzie, and only the /var/www/lizzie/foobar directory has to be world-readable (so that Apache can read it), while the other two directories (Foobar and _foobar) may (and should) be accessible only to their owner. You might want to make a symlink to /var/www/lizzie/ from within lizzie's home directory for the user's convenience; we'll name the symlink “web”.

To set things up this way, do the following as root:

  chown root:root /var/www
  chmod 755 /var/www
  mkdir /var/www/lizzie
  chown lizzie: /var/www/lizzie
  chmod 755 /var/www/lizzie
  ln -s /var/www/lizzie ~lizzie/web

Then, login as lizzie, and do the following:

  cd ~/web
  cp -R /your/sources/of/thalassa/example/templ_smoky ./Foobar
  chmod 700 Foobar
  mkdir _foobar
  chmod 700 _foobar
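To verify the resulting modes against the layout described above, something like the following may help. It is a hypothetical helper; it uses stat -c, which GNU coreutils (and BusyBox) provide but not every system does, and it checks the web tree only if it already exists, since that directory appears when the site is generated:

```shell
# Hypothetical helper: check_site_perms BASE NAME
#   BASE/Name and BASE/_NAME must have mode 700; BASE/NAME (if it
#   already exists) must be readable and searchable, not writable, by others.
check_site_perms() {
    base=$1
    name=$2
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    src="$first$(printf '%s' "$name" | cut -c2-)"
    for d in "$src" "_$name"; do
        m=$(stat -c %a "$base/$d") || return 1
        if [ "$m" != 700 ]; then
            echo "$base/$d has mode $m, expected 700" >&2
            return 1
        fi
    done
    if [ -d "$base/$name" ]; then
        m=$(stat -c %a "$base/$name") || return 1
        case "$m" in
            *5) ;;  # others have r-x
            *) echo "$base/$name: others need r-x (mode is $m)" >&2; return 1 ;;
        esac
    fi
    return 0
}

# Usage (as lizzie):
#   check_site_perms ~/web foobar
```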

After that, cd Foobar and customize the Smoky template to fit your needs, exactly as we did for the non-suexec scheme with multiple sites; since all paths there are relative, nothing actually changes from the point of view of Thalassa's content generator.

Take care of the permissions for your thalcgi.ini. The Smoky template sets the appropriate permissions for the thalcgi.ini file on its own, but if you wrote your site's ini files from scratch, make sure the file has mode 0600 in your web space. You can copy the file to your web space as a “binary” or generate it as a “page” (note that Smoky does the latter); in both cases, be sure to add chmod = 600 to the appropriate section.
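thalcgi.cgi will refuse to run with wrong permissions anyway, but it may be handier to find the offending files in advance. The hypothetical helper below lists every thalcgi.ini under a given tree whose mode is not exactly 0600, using the standard find -perm test:

```shell
# Hypothetical helper: list every thalcgi.ini under the given tree whose
# mode is not exactly 0600 (prints nothing if all of them are fine).
bad_ini_perms() {
    find "$1" -type f -name thalcgi.ini ! -perm 600
}

# Usage (as the site owner):
#   bad_ini_perms ~/web/foobar
```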

Tell your Apache to serve the site. To make Apache recognize (and serve) your site, prepare a virtual site description file in a way similar to the one discussed for the non-suexec approach. The file's content will be a bit different, as you need to adjust the directories and to add the SuexecUserGroup directive, which tells Apache which UID and GID to use for the CGI “scripts” launched via suexec for the particular site. Assuming that lizzie's primary user group is webadmin, the virtual site description for our example will become as follows:

  <Directory /var/www/lizzie/_foobar>
          Require all denied
  </Directory>

  <Directory /var/www/lizzie/Foobar>
          Require all denied
  </Directory>

  <Directory /var/www/lizzie/foobar>
          Options Indexes FollowSymLinks ExecCgi
          AddHandler cgi-script .cgi
          AllowOverride All
          Require all granted
  </Directory>

  <VirtualHost *:80>
          ServerName foobar.example.com
          ServerAlias foobur.example.com www.foobar.example.com

          ServerAdmin webmaster@remove-this-crap.foobar.example.com
          DocumentRoot /var/www/lizzie/foobar

          ErrorLog ${APACHE_LOG_DIR}/error.log
          CustomLog ${APACHE_LOG_DIR}/access.log combined

          SuexecUserGroup lizzie webadmin
  </VirtualHost>
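If you're not sure which group is the user's primary one, id -gn prints it. The small hypothetical helper below builds the SuexecUserGroup line for a given account that way, so it can be pasted into the VirtualHost section:

```shell
# Hypothetical helper: print a SuexecUserGroup directive for the given
# user, using that user's primary group as reported by id.
suexec_directive() {
    user=$1
    group=$(id -gn "$user") || return 1
    printf 'SuexecUserGroup %s %s\n' "$user" "$group"
}

# Usage:
#   suexec_directive lizzie
```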

See the discussion of the non-suexec approach for details on where to place this file and how to activate it.

Thalassa CMS