Tip: Dictionary Substitution With Python's print

09:30 Thu, 23 Sep 2010

You have no doubt been using Python's print statement for yonks and are pretty familiar with it. You know that you can use it with %, the string formatting (or interpolation) operator. Most people use the standard conversion types such as %s and %d. I recently switched to the less common dictionary (mapping) form, and here I go through its benefits and why I changed.

Briefly, % is the operator that lets you do C-style sprintf operations. If you don't know them, they let you substitute values into a string, like this: print "%s is the time" % "now", which gives the result "now is the time". The %s is a placeholder meaning "insert a string here", and the string to insert is on the right-hand side of the expression, the "now". Strings are not the only option: there are also conversion types for integers, floats, scientific notation and so on.
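Here is a small sketch of the operator on its own. It is written in Python 3 syntax, where print is a function. The % operator behaves the same way as in the Python 2 print statements used in this post. The variable names are just for illustration.

```python
# Python 3 sketch: the % operator substitutes values into a string.
sentence = "%s is the time" % "now"
print(sentence)  # now is the time

# A few of the other conversion types. A single value does not need
# to be wrapped in a tuple.
count = "%d items" % 42           # integer
price = "%.2f dollars" % 3.14159  # fixed-point float, 2 decimal places
big = "%e" % 123456.0             # scientific notation
print(count, price, big)          # 42 items 3.14 dollars 1.234560e+05
```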

If you do any sort of web programming, you use the print statement a lot. After all, the purpose of your program is to generate HTML for the web server to serve up to your users, and you make the HTML by outputting strings using print.

If your coding is anything like mine, you start off by using lots and lots of print statements, just to get the ball rolling and get some sort of output. Often the prints are interspersed with a bunch of calculation statements, creating a bit of a jumble. Later, I go back and refactor the block to bring all the calculations to one place and all the prints to another, then try to consolidate them to reduce the number of statements. (Although the server does some caching to reduce the IO between server and app, it still makes sense to hand your output to the server in as few statements as possible.) In particular, I try to move print statements around so I can consolidate them into big """ ... """ blocks. This has two benefits. It reduces the number of print statements, and, perhaps more importantly, it lets me see the structure of the HTML more easily, since I can format the tags in logical blocks rather than spreading them over one long line.

By the way, while doing this I discovered that print """ ... """ can use the % operator too. I hadn't known that, and had assumed for some reason that it worked only with "normal" one-line strings. In hindsight it makes sense: % is applied to the string, not to print, and a triple-quoted string is just another string. That was quite a discovery, and it made the HTML much easier both to create and to debug. So now I can write print """ ... %s """ % "now" and get the correctly formatted substitution in my output.
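A quick sketch of the same thing. It uses Python 3 syntax, so print needs parentheses, and the variable name page is just for illustration:

```python
# Python 3 sketch: % is applied to the string itself, so a
# triple-quoted literal works just like a one-line string.
page = """
<h1>The time</h1>
<p>%s is the time</p>
""" % "now"
print(page)
```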

However, using substitution in large blocks of HTML becomes cumbersome. Say you want to insert a new value in the middle of a block of HTML. You now have to count the parameters and hope you are inserting in the correct place:

    print """
    <h1>some word %s inserted</h1>
    <p>another word %s inserted</p>
    <p>another word %s inserted</p>
    <p>another word %s inserted</p>
    <p>another word %s inserted</p>
    <p>another word %s inserted</p>
    <p>final word %s inserted</p>
    """ % ('now', 'is', 'the', 'time', 'all', 'good', 'men')

Tricky. Especially when you come to maintain it six months later.

I've intentionally made a mistake in the above example: the values are missing the word 'for' near the start of the well-known phrase "now is the time for all good men to come to the aid of their party". How would you fix it? First you need to work through the %s's to figure out where to insert the extra %s. Then you need to count through the substitution values to figure out where to add the new value 'for'. It is very easy to make a mistake, and a pain to actually do.
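For comparison, here is what the fixed positional version has to look like, as a sketch in Python 3 syntax. The new %s line and the 'for' value must both land in the fifth position of their respective sequences. That is exactly the counting described above.

```python
# Positional (tuple) version with 'for' added: an eighth placeholder
# line and an eighth value, each fifth in its own sequence.
page = """
<h1>some word %s inserted</h1>
<p>another word %s inserted</p>
<p>another word %s inserted</p>
<p>another word %s inserted</p>
<p>another word %s inserted</p>
<p>another word %s inserted</p>
<p>another word %s inserted</p>
<p>final word %s inserted</p>
""" % ('now', 'is', 'the', 'time', 'for', 'all', 'good', 'men')
print(page)
```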

Aside: this is probably not really an issue when you're dealing with many separate print statements, since each one probably has a substitution list of only a few values. But once you have consolidated print statements, as in this use case of writing HTML, each print may have a dozen or more substitutions. That's when it gets tricky.

There is a better way to do string formatting, and that is to use a dictionary. If they are anything like me, many Python people don't use it because the docs page does a pretty poor job of explaining it, at least in terms of why you would want to. The docs show the syntax, but not an example of its benefits. We will come to the benefits in a second.

This is how you use a dictionary instead of a tuple of values. Create a dictionary (called phrase in this example) that holds all the values, each under a key that represents it. It's not much extra work. The values fall out when you are doing the calculations to generate them, and it is a simple matter to put them in a dictionary at that time. Then, in the print statement, use the keys to insert the values from the dictionary. Here's the dictionary:

    phrase = {
        'word1': 'now',
        'word2': 'is',
        'word3': 'the',
        'word4': 'time',
        'word5': 'for',
        'word6': 'all',
        'word7': 'good',
        'word8': 'men',
        'word9': 'to',
        'word10': 'come',
        'word11': 'to',
        'word12': 'the',
        'word13': 'aid',
        ...
    }

Not a great choice of key names, but I don't want to get bogged down in details. I want to make the example clear, so I chose each key to reflect the word's position within the phrase. (In practice, you would choose key names that reflect the purpose of the variable, such as 'name' or 'email'.)

Here's the print statement:

    print """
    <h1>some word %(word1)s inserted</h1>
    <p>another word %(word2)s inserted</p>
    <p>another word %(word3)s inserted</p>
    <p>another word %(word4)s inserted</p>
    <p>another word %(word5)s inserted</p>
    <p>another word %(word6)s inserted</p>
    <p>final word %(word7)s inserted</p>
    """ % phrase

Notice the change: the value's key is placed in parentheses between the '%' and the conversion type 's'.
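Here is a runnable version of the dictionary approach. It is a sketch in Python 3 syntax, with the dictionary cut down to the first eight words. Note that word8 is in the dictionary but not used by the template, and that causes no error:

```python
# Python 3 sketch of dictionary substitution. Each placeholder names
# its key, so the order of the dictionary entries does not matter.
phrase = {
    'word1': 'now', 'word2': 'is', 'word3': 'the', 'word4': 'time',
    'word5': 'for', 'word6': 'all', 'word7': 'good', 'word8': 'men',
}

page = """
<h1>some word %(word1)s inserted</h1>
<p>another word %(word2)s inserted</p>
<p>another word %(word3)s inserted</p>
<p>another word %(word4)s inserted</p>
<p>another word %(word5)s inserted</p>
<p>another word %(word6)s inserted</p>
<p>final word %(word7)s inserted</p>
""" % phrase
print(page)
```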

Some points. First, the key goes in parentheses between the substitution operator '%' and the conversion type 's' (or d, f, g, etc.). Second, the key is not quoted; it is typed as it appears in the dictionary, without the enclosing quotes. Finally, you only need to give the dictionary name after the % at the end of the print statement.
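To illustrate these points, here is a small sketch with other conversion types. It uses Python 3 syntax, and the keys name, count and total are hypothetical. The last part shows what happens if you do put quotes around the key: Python looks for a key that literally includes the quote characters, and raises KeyError.

```python
# Keys work with any conversion type: the key sits in parentheses
# between % and the type letter (s, d, f, ...).
# 'name', 'count' and 'total' are hypothetical keys for illustration.
order = {'name': 'Alice', 'count': 3, 'total': 12.5}
line = "%(name)s ordered %(count)d items costing %(total).2f" % order
print(line)  # Alice ordered 3 items costing 12.50

# Quoting the key is a mistake: Python looks up the key "'name'"
# (quotes included), which is not in the dictionary.
quoted_key_failed = False
try:
    "%('name')s" % order
except KeyError:
    quoted_key_failed = True
print(quoted_key_failed)  # True
```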

You can see this is very easy to maintain. If I want to insert a value, I don't need to count anything to make sure I'm specifying the correct value in the correct place; I just refer to it by key name. It's much harder to get wrong.

An added benefit is that there is (unintuitively) less typing. In the template you write only the key names, rather than dictionary['key'] for every value.

Finally, and this is the real kicker, the number of keys in the dictionary doesn't have to match the number of substitutions. This is such a benefit that by itself it makes a dictionary much better than positional (tuple) substitution. With "normal" substitution, the tuple has to have the same number of items as there are substitutions (unless you can slice the tuple). The values also have to be in the order you want to use them, rather than unordered. You need a one-to-one match between each %s and each value. You can't have too few values (obviously), but you also can't have too many, which is sometimes annoying when working with tuples and lists. With a dictionary, you can pre-compute any number of values and just pick off the ones you need.
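Here is a short sketch of the difference, in Python 3 syntax; the keys and values are just for illustration. Extra dictionary keys are ignored. A tuple with too many or too few values raises TypeError unless you slice it down to size:

```python
# Python 3 sketch: extra dictionary keys are simply ignored...
values = {'first': 'now', 'second': 'is', 'unused': 'ignored'}
text = "%(first)s %(second)s" % values
print(text)  # now is

# ...but a tuple must match the number of %s placeholders exactly.
words = ('now', 'is', 'extra')

too_many_failed = False
try:
    "%s %s" % words
except TypeError:  # not all arguments converted during string formatting
    too_many_failed = True

too_few_failed = False
try:
    "%s %s %s %s" % words
except TypeError:  # not enough arguments for format string
    too_few_failed = True

# Slicing the tuple down to size is the usual workaround.
sliced = "%s %s" % words[:2]
print(sliced)  # now is
```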

So what? Say you are generating a <form> that re-uses values from a previous POST. Suppose the form is quite complex, with about 30 different elements ranging from text boxes and drop-down lists to radio buttons. If you use "normal" %s substitution, the print statements become very complex and hard to maintain and change. (I had real-life experience of this when making my ephemeris.) Instead, store all the values in a dictionary and recall them by key name, and you will find that maintaining the form becomes a snap.
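Here is a cut-down sketch of that idea, with three fields instead of thirty, in Python 3 syntax. The field names are hypothetical, and the posted dictionary stands in for however your framework hands you the previous POST data. I have also passed the text values through html.escape. That is a sensible precaution when echoing user input back into HTML, though the post itself does not cover it.

```python
from html import escape

# Hypothetical values from a previous POST; in a real script these would
# come from whatever your web framework or CGI module provides.
posted = {'name': 'Alice', 'email': 'alice@example.com', 'units': 'metric'}

# Pre-compute everything the form might need. Text values are escaped
# because they are echoed back into HTML attributes.
form_values = {
    'name': escape(posted.get('name', '')),
    'email': escape(posted.get('email', '')),
    'metric_checked': ' checked' if posted.get('units') == 'metric' else '',
    'imperial_checked': ' checked' if posted.get('units') == 'imperial' else '',
    'submitted': 'yes',  # not used by the template below; that's fine
}

form_html = """
<form method="post">
  <input type="text" name="name" value="%(name)s">
  <input type="text" name="email" value="%(email)s">
  <input type="radio" name="units" value="metric"%(metric_checked)s> Metric
  <input type="radio" name="units" value="imperial"%(imperial_checked)s> Imperial
  <input type="submit">
</form>
""" % form_values
print(form_html)
```

Adding a field means adding one key to the dictionary and one %(key)s to the template, with no counting.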

I hope this gives you a real-life example of the benefits of using a dictionary in string formatting. I find it both better and easier to use for anything involving intermixed calculations and print.

Categories: programming, python
