Marco's Blog

All content personal opinions or work.

Comparison Shopping: Tcl, Perl, PHP, Python

2008-06-18 7 min read Comparisons marco

I have always been fascinated by programming languages, and scripting languages have always had a particular place in my heart. After all, they allow you to develop without much encumbrance, going from nothing to a working program in no time. There are no lengthy build and compilation cycles, and sometimes you can even use dynamic language features to make your changes to a running application – neat!

For me, it all started with Forth, which cannot really be called a scripting language at all. It compiles functions into bytecode as soon as they are defined, and the only feature that reminds one of scripting languages is just how easy it is to write and rewrite. Since Forth reacts dynamically to changes and is interpreted,

After that, I discovered Tcl, which quickly became my favorite programming language. It had a rich set of extensions that included GUIs (albeit at that point only on UNIX) and highly dynamic capabilities. It seemed like an ideal choice.

In the years after that, I came close to working for the inventor of Tcl, John Ousterhout. In 1999 I even had a written offer for a position at his company, Scriptics, a spin-off from Sun that went absolutely nowhere really, really fast. After that, it became imperative to branch off the beaten path, since Tcl was becoming less and less of a viable option.

Fortunately, a series of scripting languages decided that the approach taken by Tcl was the right one and followed many of its choices while avoiding many of its pitfalls.

The first that became prominent was Larry Wall’s Perl. Originally a replacement of and improvement on awk (and who aside from me remembers awk?) it grew quite rapidly into the leading scripting language of the nascent Web. Its main advantages: lightning speed and huge bench strength where it mattered the most, string processing.

[As an aside, I truly believe that it was mostly its atrocious string processing that killed Java as the dominant server-side Web language.]

Perl’s strengths were unfortunately accompanied by atrocious weaknesses that derived in large part from its history. Its syntax is atrocious, focused on saving bits and bytes and trying to accommodate everyone. The language ballooned where it didn’t matter, in the support of competing, similar features, and it didn’t improve much where it did, in the support of higher-level programming interfaces. A programming language written by system administrators for system administrators had little chance of spreading to the new world of software developers.

After Perl came the huge moment of PHP. Allegedly, the acronym originally stood for Personal Home Page, and the name still is a good indicator of what the language was meant for: rapid development and deployment of web sites. As such, the language early on got outstanding string support and excellent bindings for most important Web “stuff”, like databases and network protocols.

I love PHP. I have used it with success in a series of web sites, and to this day PHP is probably the first skill I would want to see in a software developer who is interviewing with me. PHP is practical, non-dogmatic, and proven in the field. The only thing missing from it is first-class support for high-traffic web sites (things like memory caching of data queries and the like).

Last on the list, since I just started using it, is Python. I always hated the notion of it, since I always disliked its inventor, Guido van Rossum. I witnessed a talk by Guido a while back, read about his work, and found him oddly opinionated and forcefully assertive on matters that were clearly just his personal crusades. What sticks out most of all is a purely syntactical choice: that of making whitespace the block marker.

What does that mean? Most programming languages are very explicit about statements: each one has to end in a character, typically a semicolon, that doesn’t show up in the rest of the text, and statements are grouped together in blocks that are typically enclosed in braces. That’s true of C, C++, Java, and most compiled languages. Among the scripting languages, the newline character usually replaces the semicolon as statement terminator, but for the most part the braces are retained.

Guido decided that he wanted indentation to mark blocks instead of braces. In theory, that makes for more readable programs, since there are no braces cluttering the code. In practice, though, indentation is terrible, since it is typically produced with a mix of tabs and spaces – and Python looks at the actual whitespace characters, not at how your editor renders them. This means that if you set your editor to display a tab as four spaces, what looks like the same indentation could be two tabs or eight spaces or one tab and four spaces – to Python, that’s all different.
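To make the tabs-versus-spaces problem concrete, here is a minimal sketch. It assumes a Python 3 interpreter, which is stricter than the Python 2 of this post's era (Python 2 silently expanded each tab to the next multiple of eight columns). The helper name check_indentation and the three sample snippets are made up for illustration.

```python
def check_indentation(source):
    """Return 'ok' if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        # IndentationError and TabError are both subclasses of SyntaxError.
        return type(exc).__name__
    return "ok"


# Three hypothetical snippets with the same block, indented differently.
SPACES_ONLY = "if True:\n        x = 1\n        y = 2\n"
TABS_ONLY = "if True:\n\tx = 1\n\ty = 2\n"
# First line of the block uses a tab, the second uses eight spaces.
MIXED = "if True:\n\tx = 1\n        y = 2\n"

for label, snippet in [("spaces", SPACES_ONLY), ("tabs", TABS_ONLY), ("mixed", MIXED)]:
    print(label, "->", check_indentation(snippet))
```

Either style works as long as it is used consistently; it is the mixture that the interpreter refuses, no matter how neatly the lines align in an editor.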

I decided to try it out, anyway. I wanted to write a KDE application, and I needed a scripting language. Of all the scripting languages around, only Python had a long relationship with Qt (the framework around which KDE is built), and while PyQt is not available on Windows, my primary focus was KDE.

I confess I was pleasantly surprised. Once you get over the (still extremely idiotic) choice of whitespace as block marker, you learn the language and you just have to fall in love with the cleanliness with which the Pythonites work. Unlike Perl, where the inventor is proud of “there’s more than one way to do it”, or Tcl, where different parts of the language follow different syntax, in Python everything is clean. Basic syntax, argument lists, return values, function definitions – everything is consistent, making for great usability as a programmer.

Even better is the philosophy of “batteries included”. Python tries to ship with a critical mass of extensions that allow you to do pretty much anything you’d want. So you find database bindings, XML parsers, string processors, and a bunch of useful utilities in the main distribution.
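As a rough illustration of “batteries included”, the sketch below uses only modules from the standard distribution: xml.etree.ElementTree for XML parsing, sqlite3 for a database binding, and re for string processing. The function names, the table layout, and the sample XML are assumptions made up for this example.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative sample data, not taken from any real feed.
SAMPLE_XML = """<languages>
  <language name="Tcl" year="1988"/>
  <language name="Perl" year="1987"/>
  <language name="Python" year="1991"/>
  <language name="PHP" year="1995"/>
</languages>"""


def load_languages(xml_text):
    """Parse <language name=... year=.../> elements into (name, year) tuples."""
    root = ET.fromstring(xml_text)
    return [(el.get("name"), int(el.get("year"))) for el in root.iter("language")]


def names_before(rows, year):
    """Store the rows in an in-memory SQLite table and query them by year."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE languages (name TEXT, year INTEGER)")
    conn.executemany("INSERT INTO languages VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM languages WHERE year < ? ORDER BY year", (year,)
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names


rows = load_languages(SAMPLE_XML)
older = names_before(rows, 1990)
# Regular expressions cover the string-processing side.
starts_with_p = [name for name, _ in rows if re.match(r"P", name)]
print(older, starts_with_p)
```

Nothing here needs to be installed separately, which is the whole point of the philosophy.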

And here is where I come full circle back to Tcl, where I started. To this day, if I want to write a cross-platform utility, I would use Tcl. Its support for cross-platform GUIs is outstanding, with the later versions having entirely inherited the native look and feel. I write one app, and on Linux it looks like my desktop, while on Windows it morphs to look like a Windows app. That’s wonderful!

At the same time, Tcl’s syntax is awkward. It is marred by an insanely stupid parser that looks at things from the point of view of a process invocation, parsing by whitespace and performing variable and command substitution like a shell. That almost works in the shell (to this day, just try working with file names that have a space in them on UNIX), but in a programming language, this need to distinguish between substitution levels is inane. [Note: that’s a word, I didn’t mean “insane”]

Even worse, Tcl is horribly inconsistent. It has only two basic data types, strings and lists. While all lists are also strings, not all strings are lists. Now, string processing commands behave completely differently than list processing commands, and list processing commands are inconsistent in behavior.

Example: there is only one “string” command that performs most of the things you’d want to do with a string. If you want to get a substring of a string, you use the “string range” command, and the result is the substring. If you want to get a range of a list, you use the special “lrange” command. Why? Why isn’t there a “list range” command?

Tcl’s GUI support is wonderful. You create objects by name, then you operate on the objects. Say you want a button: you just type “button .b” and there you have it. Then, if you want the button to do something, you just tell it. “.b invoke”, for instance, would run whatever action you associated with the button. Unfortunately, even here you’ll find that same inconsistency: if you want to inquire about a property of the object, you cannot just ask for it by name, you have to go through the cget command. Instead of saying “.b text” to get the button text, you have to use “.b cget -text”.