Posts Tagged ‘Python’

Languages that are easy to pick back up

Tuesday, July 1st, 2008

Some programming languages are much easier to come back to than others. In my previous post I mentioned that Mathematica is easy to come back to, put Perl is not.

I found it easy to come back LaTeX after not using it for a while. It has a few quirks, but it’s basically consistent. The LaTeX commands for Greek letters are their names, lower case names for lower case letters, upper case names for upper case letters. The command mathematical symbols is usually the name a mathematician would give the symbol. Modes always begin with \begin and end with \end.

Python also has a consistent syntax that make it easier to come back to the language after a break. Someone has said that Python is similar to Perl, except that the word “except” does not appear nearly so often in the Python documentation.

It’s more important that a language be internally consistent than conventional. Each of the languages I mentioned have their peculiarities. Mathematica uses square brackets for function argument arguments. LaTeX uses percent signs for comments. Python uses indention to denote blocks. Each of these take a little getting used to, but each makes sense in its own context.

A special case of consistency is using full names for keywords. Mathematica always spells out words in full. For example, the gamma distribution object is named GammaDistribution. I don’t mind a little extra typing. I’d rather optimize for recall and readability than minimize keystrokes since I spend more time recalling and reading than typing. (One flaw in LaTeX is that it occasionally uses unnecessary abbreviations. For example, \infty for infinity. The corresponding Mathematica keyword is Infinity.)

Porting Visual C++ code to Linux/gcc

Thursday, May 29th, 2008

Here are a few lessons learned from porting a numerical library recently from Windows/Visual C++ to Linux/gcc.

Some of our code only runs on Windows, and only needs to run on Windows. Our first thought was to put #ifdef WIN32 directives around that code. Clift Norris came up with the clever idea of using #ifndef EXCLUDE_WINDOWS_ONLY_CODE instead. That way we could do a preliminary test of the portable subset of the code while still working on Windows where we’re more comfortable. Also, by not referring specifically to 32-bit Windows, we’re OK moving the code to 64-bit Windows.  

Visual C++ does not require source and header files to end with newline characters, but gcc does. We got hundreds of warnings of the form warning: no newline at end of file when we first attempted to compile our code on Linux. Apparently there’s no gcc switch to turn this off, and it may not be prudent to turn it off if you could. As I understand it, Visual Studio inserts a linebreak after including header files, but gcc may not and so gcc needs to issue a warning in this case while Visual Studio does not. We copy our source tree to a Linux box then run the following Python code on that box to insert the extra newline characters when needed.

import os  

# List of directories of files to add newlines to.
# Script must be in the same location as these directories.
directories = ["Banana", "Apple", "Peach"] 

for dir in directories:
    for file in os.listdir(dir):
        if file.endswith(".h") or file.endswith(".cpp"):
            path = dir + "/" + file
            handle = open(path, "r")
            slurp = handle.read()
            handle.close() 

            if not slurp.endswith("\n"):
                retcode = os.system("chmod +w " + path)
                if retcode != 0:
                    print "chmod returned " + retcode + " on " + path
                else:
                    handle = open(path, "a")
                    handle.write("\n")
                    handle.close()

There were several places in our code where a variable was deliberately unused but retained in a function signature. Suppose a function has signature void foo(int a, int b) but b is unused. We had usually handled that by making b; the first line of the implementation. That would suppress unused variable warnings in Visual C++, but not in gcc. When we changed the function signature to void foo(int a, int /* b */), that made both compilers happy.

We started out using autoconf, but that was overkill for our project. Our build process became two orders of magnitude simpler when we switched over to a crude, old-fashioned make file. After porting the library to Linux, we built it on OS X without any issue.

This wasn’t an issue for us, but a potential problem when moving numerical code between Unix-like systems is that the function gamma computes different things on different systems. On Linux it computes the logarithm of the mathematical gamma function but on OS X it computes the gamma function itself. See the last two paragraphs of how to calculate binomial probabilities and Thomas Guest’s comment on that post for a full explanation.

This library had been ported to Linux years ago, but nobody used it on Linux and so development continued only on Windows. When we first ported the code, gcc and Visual C++ seemed to have incompatible requirements, especially with templates. The more recent port described here was much easier now that both compilers are more compliant with the C++ standard.

Code to make an XML sitemap

Monday, February 25th, 2008

Here’s some Python code to create a sitemap in the format specified by sitemaps.org and read by search engines. Download the file sitemapmaker.txt and change the extension from .txt to .py.

Change the url variable in the script before running it or else you’ll point search engines to my web site rather than yours. Also, edit the file extensions_to_keep variable if you want to index any file types besides HTML and PDF.

Copy the file sitemapmaker.py to the directory on your computer where you have your files. Run the script and direct its output to a file, sitemapmaker.py > sitemap.xml. See sitemaps.org for instructions on how to let search engines know about your sitemap.

This code assumes all the files to index in your sitemap are in one directory, the directory you run the script from. It also assumes the timestamps on your computer match those on your web server. Optional fields are left out of the sitemap.