From an XML File to a Web Site

At some point I got sick of fixing navigation links by hand, yet I didn't want a site without proper navigation either. Things got even worse when I put my pictures online: keeping the hierarchy in order was a major pain, and I finally decided to use a tool to manage at least that part.

I surveyed the field and finally settled on a tool whose name I have unfortunately forgotten (any hints?). Its salient feature was that it created a thumbnail .GIF and a miniature image with the suffix '_sm.jpg' from each of your files...

At the same time, I was starting to look into navigable pages with content flow, and hardcoding each of them was becoming really onerous (especially since I couldn't settle on a style...). I decided it was time to write another website management tool.

Download htmltree here

Since what I wanted to display was an ordered hierarchy, I chose XML as the source format. A utility would take the XML file and a template and generate a directory hierarchy from them. My task would then be to fill in the body of each page with content.


Initially, the only element allowed was <dir>. It took a name and a title attribute: respectively, the name of the directory on disk and the title under which it would appear on the site. The root directory was marked with type="root" for convenience. Directories without a title attribute inherited it from the name. The original file I started from looked like this:

<dir type="root" title="Main">
        <dir name="blog" title="Biking"/>
        <dir name="writing" title="Writing">
                <dir type="flat" name="months" title="Months in the City">
                        <dir name="january"/>
                        <dir name="february"/>
                        <dir name="june"/>
                        <dir name="july"/>
                </dir>
                <dir type="flat" name="essays" title="Essays">
                        <dir name="competence" title="The Curse of the Competent"/>
                        <dir name="factory" title="The Factory vs. the Workshop"/>
                        <dir name="ceo" title="The Good Executive Officer"/>
                </dir>
        </dir>
</dir>
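To make the naming rules concrete, here is a minimal Python sketch of walking such a file with the standard xml.etree.ElementTree module. It is not htmltree's code (the tool is a Tcl script); the function name walk and the "/"-separated path format are assumptions for illustration. It applies the two rules stated above: the root has no name, and a missing title falls back to the name.

```python
import xml.etree.ElementTree as ET

# A small excerpt in the format shown above (closing tags included so it parses).
SITE_XML = """
<dir type="root" title="Main">
  <dir name="blog" title="Biking"/>
  <dir name="writing" title="Writing">
    <dir type="flat" name="months" title="Months in the City">
      <dir name="january"/>
      <dir name="february"/>
    </dir>
  </dir>
</dir>
"""

def walk(elem, parent_path=""):
    """Yield (path, title, type) for every <dir> element, depth first."""
    name = elem.get("name", "")
    # The root carries no name; every other directory nests under its parent.
    path = f"{parent_path}/{name}" if name else parent_path
    # A missing title attribute is inherited from the name.
    title = elem.get("title", name)
    yield path or "/", title, elem.get("type", "")
    for child in elem.findall("dir"):
        yield from walk(child, path)

for path, title, kind in walk(ET.fromstring(SITE_XML)):
    print(f"{path:<28} {title!r} {kind}")
```

Here "/writing/months/january" gets the title 'january', because that element has no title attribute.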


The system needed to take a template and replace markers (directives) with the navigation links. There was no particular reason to prefer one marker syntax over another, so I chose double percent signs. The substitution directives are:

  • %%Title%% - The title of the page, as defined in the XML file
  • %%Timestamp%% - The time at which the page was generated from the template
  • %%Body%% - The body of the file, the content without navigation
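The page-level substitution amounts to plain string replacement. This Python sketch assumes a particular timestamp format, which the text does not specify. It inserts the body last so that any %%...%% text inside the body content is not itself rewritten.

```python
from datetime import datetime

def substitute(template: str, title: str, body: str) -> str:
    """Replace the three page-level directives in a template string."""
    # Assumption: the timestamp format is illustrative only.
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    return (template
            .replace("%%Title%%", title)
            .replace("%%Timestamp%%", stamp)
            # Body goes last so directives appearing inside it are left alone.
            .replace("%%Body%%", body))

page = substitute("<title>%%Title%%</title><p>%%Body%%</p><small>%%Timestamp%%</small>",
                  "Essays", "Hello")
print(page)
```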

To iterate over tree items, the template contains sections of HTML with special replacement variables. Since these sections typically iterate over a collection, each one must be marked with a start and an end directive. The current iteration directives are:

  • %%BeginCurrent%% - %%EndCurrent%% - A section of HTML that is filled in for the current directory. Inside it, you can use the directives starting with %%Dir, explained below.
  • %%BeginSiblings%% - %%EndSiblings%% - A section of HTML that is inserted per sibling of the current directory.
  • %%BeginUp%% - %%EndUp%% - A section of HTML that will be used to navigate to the parent directory.
  • %%BeginSub%% - %%EndSub%% - A section of HTML that will be replaced per child of the current directory.
  • %%BeginNavigate%% - %%EndNavigate%% - A section of HTML that will be replaced with a set of links to ancestors of the current directory for quicker navigation.

Inside each of these sections, the following directives are available:

  • %%DirTitle%% - The title of the directory, as in the XML file
  • %%DirLocation%% - A URL link to the directory for navigation
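A section such as %%BeginSub%% ... %%EndSub%% can be expanded by repeating its body once per item and filling in %%DirTitle%% and %%DirLocation%% each time. The Python sketch below shows this with a non-greedy regular expression. The function name and the example link targets are hypothetical. An empty item list removes the section entirely, which is what you would want for the Up section at the root.

```python
import re

def expand_section(template, section, items):
    """Repeat the text between %%Begin<section>%% and %%End<section>%% once per
    (title, location) pair, filling in %%DirTitle%% and %%DirLocation%%."""
    pattern = re.compile(
        rf"%%Begin{re.escape(section)}%%(.*?)%%End{re.escape(section)}%%", re.DOTALL)

    def render(match):
        chunk = match.group(1)
        # One copy of the chunk per item; an empty list yields an empty string.
        return "".join(chunk.replace("%%DirTitle%%", title)
                            .replace("%%DirLocation%%", location)
                       for title, location in items)

    return pattern.sub(render, template)

template = ('<ul>%%BeginSub%%<li><a href="%%DirLocation%%">%%DirTitle%%</a></li>'
            '%%EndSub%%</ul>')
html = expand_section(template, "Sub", [("Months in the City", "months/index.html"),
                                        ("Essays", "essays/index.html")])
print(html)
```

Passing a function to re.sub, rather than a replacement string, keeps backslashes in titles from being interpreted as group references.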

This worked quite well, especially after I added the %%Body%% directive. Early on, I needed to keep the content of each file from being overwritten on every update, so I added include files: if a file named 'x.body' exists in the directory where the file 'x' is to be written, its contents are pasted into 'x' in place of the %%Body%% directive.
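The include rule can be sketched in a few lines of Python. This sketch takes "x.body" literally, so the body for index.html is index.html.body. It also assumes that a missing .body file leaves the body empty, which the text does not say.

```python
import tempfile
from pathlib import Path

def fill_body(page_html: str, out_file: Path) -> str:
    """Paste '<name>.body' (next to the output file) in place of %%Body%%.

    Assumption: a missing .body file leaves the body empty.
    """
    body_file = out_file.with_name(out_file.name + ".body")
    body = body_file.read_text(encoding="utf-8") if body_file.is_file() else ""
    return page_html.replace("%%Body%%", body)

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as site:
    out = Path(site) / "index.html"
    (Path(site) / "index.html.body").write_text("<p>Hand-written text</p>", encoding="utf-8")
    with_body = fill_body("<div>%%Body%%</div>", out)
    without_body = fill_body("<div>%%Body%%</div>", Path(site) / "other.html")

print(with_body)      # <div><p>Hand-written text</p></div>
print(without_body)   # <div></div>
```

Because the hand-written text lives in a separate file, regenerating the site can overwrite 'x' freely without losing content.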


Later on, I decided to merge the web gallery functionality into this utility. The changes to the code were fairly profound, but the directory XML file changed only minimally: all that was added were two attributes on the <dir> element:

  1. type="images" - marks the current directory as containing images, and hence as following the special rules described below.
  2. source="dirlocation" - indicates where the original images for this directory come from.

Image directories are handled differently in many respects:

  • image directories are recursed into automatically
  • the name of the directory is automatically its title
  • the attribute type is set to imgdir for all descendants of the image directory
  • most importantly, the source directory is consulted for content. If an image file is present in the source, it is resized to thumbnail and web preview size, and the two generated pictures are put into the destination
  • the generated index file shows a thumbnail view of all pictures in the %%Body%% section of the template and provides navigation.
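Automatic recursion combined with the skip switches described later (-imageskipglob and -imageskipdir) might look like the following Python sketch. The set of image suffixes and the "DSC*" camera-name pattern are assumptions, because the program's actual defaults are not given here. Pruning os.walk's dirnames list in place is what keeps skipped directories such as CVS from being visited at all.

```python
import fnmatch
import os
import tempfile
from pathlib import Path

# Assumption: which suffixes count as images is not specified in the text.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}

def collect_images(source, skip_globs, skip_dirs):
    """Walk `source` recursively and return image paths that survive the skip rules."""
    found = []
    for dirpath, dirnames, filenames in os.walk(source):
        # Editing dirnames in place stops os.walk from descending into skipped dirs.
        dirnames[:] = sorted(d for d in dirnames
                             if not any(fnmatch.fnmatchcase(d, g) for g in skip_dirs))
        for name in sorted(filenames):
            if Path(name).suffix.lower() not in IMAGE_SUFFIXES:
                continue
            if any(fnmatch.fnmatchcase(name, g) for g in skip_globs):
                continue
            found.append(Path(dirpath, name))
    return found

# Demonstration: a camera-named file and a CVS directory are skipped.
with tempfile.TemporaryDirectory() as src:
    for rel in ["trip/DSC0001.JPG", "trip/harbor.jpg", "CVS/old.jpg", "notes.txt"]:
        p = Path(src, rel)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
    kept = [p.relative_to(src).as_posix()
            for p in collect_images(src, skip_globs=["DSC*"], skip_dirs=["CVS"])]

print(kept)   # ['trip/harbor.jpg']
```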

A few of the command line switches deal specifically with image directories.

Flat Directories

The special type value flat (type="flat") marks directories that have no children on the file system, even though they have structure in the XML file. Instead of a directory being generated for each child dir element, each child of a flat directory becomes a file named after its name attribute, with ".html" appended.

In the example above, consider the dir element named months. It has four subelements, named after months. As a result, once the utility has run, you will have a directory called 'months' containing five files: index.html, january.html, february.html, june.html, and july.html.

Recursion within flat directories is not currently implemented, so the children of a flat directory must be leaves.
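The mapping from XML to output files, including the flat case, can be sketched in Python as follows. The sketch is illustrative and not the tool's code. It lists files relative to the site root and, to reflect the limitation just described, rejects a child of a flat directory that has children of its own.

```python
import xml.etree.ElementTree as ET

def join(*parts):
    """Join non-empty path parts with '/' (the root <dir> has no name)."""
    return "/".join(p for p in parts if p)

def output_files(elem, parent=""):
    """List the HTML files a <dir> element produces, relative to the site root."""
    here = join(parent, elem.get("name", ""))
    files = [join(here, "index.html")]
    flat = elem.get("type") == "flat"
    for child in elem.findall("dir"):
        if not flat:
            files.extend(output_files(child, here))   # ordinary subdirectory
        elif child.find("dir") is not None:
            # Nested structure under a flat directory is not supported.
            raise ValueError(f"{join(here, child.get('name'))}: flat children must be leaves")
        else:
            files.append(join(here, child.get("name") + ".html"))
    return files

MONTHS = """<dir type="flat" name="months" title="Months in the City">
  <dir name="january"/><dir name="february"/><dir name="june"/><dir name="july"/>
</dir>"""
print(output_files(ET.fromstring(MONTHS)))
# ['months/index.html', 'months/january.html', 'months/february.html',
#  'months/june.html', 'months/july.html']
```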

Command Line Switches

This utility accepts a set of command line switches. To help you remember them (and their default values), try htmltree.tcl -help. The following is a complete list of switches, their default values, and their meanings.

General Command Line Options

  • -to (default: ".", the current directory) - The destination directory, which is the final site directory (or a temporary directory if you prefer).
  • -xmltree (default: dirtree.xml) - The name of the XML file that describes the site tree.
  • -template (default: template.html) - The name and location of the template file.
  • -start (default: {}, empty) - The name of a starting element, in directory notation. If non-empty, all computation and replacement is still performed, but a file is written only if its name begins with the -start value.
  • -from (default: ".", the current directory) - The directory from which body files and images are taken.
  • -output (default: short) - The type of output. Currently, only short and long are supported.

Command Line Options Relating to Image Directories

  • -imagetitlesep (default: " : ", space-colon-space) - The separator used between hierarchy levels.
  • -imagerowcount (default: 3) - The number of images to put in one row.
  • -imageskipglob (default: see the program) - The glob patterns that determine which files to skip. I typically don't delete the image files taken with my camera but leave them with their default names, and rename only the ones I want to put online to something meaningful. This switch ensures that only those get published.
  • -imageconvert (default: false) - Whether to convert images in image directories to thumbnails. It defaults to false for file integrity.
  • -imageskipdir (default: CVS) - The glob patterns of directories to skip entirely, even if they are part of an image tree.
  • -imagethumbsize (default: 128x128) - The UNIX geometry (widthxheight) of generated thumbnails. This geometry sets only the maximum size: images are scaled preserving their aspect ratio so that they fit within the rectangle.
  • -imagesmallsize (default: 600x600) - The maximum frame for the reduced-size image, interpreted as for -imagethumbsize. With the default setting, a 1024x768 image becomes a 600x450 image.
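The bounding-box arithmetic behind -imagethumbsize and -imagesmallsize can be checked with a short Python sketch. The scale factor is the smaller of the two ratios box/image, so the longer side hits the limit and the other side follows proportionally. Leaving images that are already smaller than the box untouched is an assumption; for comparison, ImageMagick's -resize 600x600 also enlarges smaller images unless the geometry ends in ">".

```python
def fit_within(width: int, height: int, box: str) -> tuple[int, int]:
    """Scale (width, height) to fit inside a 'WxH' geometry, keeping aspect ratio.

    Assumption: images already inside the box are not enlarged (scale capped at 1).
    """
    max_w, max_h = (int(v) for v in box.lower().split("x"))
    scale = min(max_w / width, max_h / height, 1.0)
    return round(width * scale), round(height * scale)

print(fit_within(1024, 768, "600x600"))   # (600, 450)
print(fit_within(1024, 768, "128x128"))   # (128, 96)
```

For the 1024x768 example, the scale is min(600/1024, 600/768) = 0.5859375, giving exactly 600x450.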