There are several ways to quote strings in Python. Triple quotes let strings span multiple lines. Line breaks in your source file become line break characters in your string. A triple-quoted string in Python acts something like a “here doc” in other languages.
However, Python’s indentation rules complicate matters because the indentation becomes part of the quoted string. For example, suppose you have the following code outside of a function.
x = """\
abc
def
ghi
"""
Then you move this into a function foo and change its name to y.
def foo():
    y = """\
    abc
    def
    ghi
    """
Now x and y are different strings! The former begins with the letter a and the latter begins with four spaces. (The backslash after the opening triple quote prevents the following newline from being part of the quoted string. Otherwise x and y would begin with a newline.) The string y also has four spaces in front of def and four spaces in front of ghi. You can’t push the string contents to the left margin because that would violate Python’s formatting rules. (Update: Oh yes you can! See Aaron Meurer’s comment below.)
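To see the difference concretely, here is a small sketch that builds both strings and prints their repr. The return statement in foo is an addition for illustration, so that the string can be inspected outside the function.

```python
# Top-level version: the contents sit flush against the left margin.
x = """\
abc
def
ghi
"""

def foo():
    # Same text, but indented to line up with the function body.
    y = """\
    abc
    def
    ghi
    """
    return y  # returned only so we can inspect it (not in the original)

y = foo()
print(repr(x))  # 'abc\ndef\nghi\n'
print(repr(y))  # '    abc\n    def\n    ghi\n    '
```

Note that y also picks up four trailing spaces from the line holding the closing triple quote.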
We now give three solutions to this problem.
Solution 1: textwrap.dedent
There is a function in the Python standard library that will strip the unwanted space out of the string y.
import textwrap

def foo():
    y = """\
    abc
    def
    ghi
    """
    y = textwrap.dedent(y)
This works, but in my opinion a better approach is to use regular expressions [1].
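As a quick check of the textwrap.dedent approach, the sketch below returns the dedented string and prints its repr. The return statement is again an illustrative addition.

```python
import textwrap

def foo():
    y = """\
    abc
    def
    ghi
    """
    # dedent removes the whitespace common to all lines; the final
    # whitespace-only line is normalized to an empty line.
    return textwrap.dedent(y)

result = foo()
print(repr(result))  # 'abc\ndef\nghi\n'
```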
Solution 2: Regular expression with a flag
We want to remove white space, and the regular expression for a white space character is \s. We want to remove one or more white spaces so we add a + on the end. But in general we don’t want to remove all white space, just white space at the beginning of a line, so we stick ^ on the front to say we want to match white space at the beginning of a line.
import re

def foo():
    y = """\
    abc
    def
    ghi
    """
    y = re.sub(r"^\s+", "", y)
Unfortunately this doesn’t work. By default ^ only matches the beginning of a string, not the beginning of a line. So it will only remove the white space in front of the first line; there will still be white space in front of the following lines.
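A minimal sketch of that failure, using a literal with the same contents as the indented string y from above:

```python
import re

# Same contents as the indented triple-quoted string y.
y = "    abc\n    def\n    ghi\n    "

# Without any flag, ^ anchors only at the very start of the string.
stripped = re.sub(r"^\s+", "", y)
print(repr(stripped))  # 'abc\n    def\n    ghi\n    '
```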
One solution is to add the flag re.MULTILINE to the substitution function. This will signal that we want ^ to match the beginning of every line in our multi-line string.
y = re.sub(r"^\s+", "", y, re.MULTILINE)
Unfortunately that doesn’t quite work either! The fourth positional argument to re.sub is a count of how many substitutions to make. It defaults to 0, which actually means infinity, i.e. replace all occurrences. You could set count to 1 to replace only the first occurrence, for example. If we’re not going to specify count we have to set flags by name rather than by position, i.e. the line above should be
y = re.sub(r"^\s+", "", y, flags=re.MULTILINE)
That works.
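The sketch below spells out what the positional call amounts to. It passes the flag's integer value as count by keyword, which should behave the same as putting re.MULTILINE in the fourth position, and avoids relying on positional count (which newer Python versions discourage). The variable names are only for illustration.

```python
import re

y = "    abc\n    def\n    ghi\n    "

# re.MULTILINE is an integer-valued flag.
print(int(re.MULTILINE))  # 8

# Flag value lands in count: at most 8 substitutions, and ^ still
# matches only at the start of the string.
misplaced = re.sub(r"^\s+", "", y, count=int(re.MULTILINE))

# Flag passed by name: ^ now matches at the start of every line.
by_name = re.sub(r"^\s+", "", y, flags=re.MULTILINE)

print(repr(misplaced))  # 'abc\n    def\n    ghi\n    '
print(repr(by_name))    # 'abc\ndef\nghi\n'
```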
You could also abbreviate re.MULTILINE to re.M. The former is more explicit and the latter is more compact. To each his own. There’s more than one way to do it. [2]
Solution 3: Regular expression with a modifier
In my opinion, it is better to modify the regular expression itself than to pass in a flag. The modifier (?m) specifies that in the rest of the regular expression the ^ character should match the beginning of each line.
y = re.sub(r"(?m)^\s+", "", y)
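As a sanity check, this sketch compares the inline modifier with the flags argument on the same input; the two should give identical results.

```python
import re

y = "    abc\n    def\n    ghi\n    "

with_flag = re.sub(r"^\s+", "", y, flags=re.MULTILINE)
with_modifier = re.sub(r"(?m)^\s+", "", y)

print(repr(with_modifier))         # 'abc\ndef\nghi\n'
print(with_flag == with_modifier)  # True
```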
One reason I believe this is better is that it moves information from a language-specific implementation of regular expressions into a regular expression syntax that is supported in many programming languages.
For example, the regular expression (?m)^\s+ would have the same meaning in Perl and Python. The two languages have the same way of expressing modifiers [3], but different ways of expressing flags. In Perl you paste an m on the end of a match operator to accomplish what Python does with setting flags=re.MULTILINE.
One of the most commonly used modifiers is (?i) to indicate that a regular expression should match in a case-insensitive manner. Perl and Python (and other languages) accept (?i) in a regular expression, but each language has its own way of setting the equivalent flag. Perl adds an i after the match operator, and Python uses flags=re.IGNORECASE or flags=re.I as a function argument.
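On the Python side, the two spellings can be compared directly. The sample text here is made up for illustration.

```python
import re

text = "Perl and PYTHON and python"  # hypothetical sample text

# Case-insensitive matching via the inline modifier...
inline = re.findall(r"(?i)python", text)
# ...and via the function argument.
flagged = re.findall(r"python", text, flags=re.IGNORECASE)

print(inline)             # ['PYTHON', 'python']
print(inline == flagged)  # True
```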
More on regular expressions
- Regular expressions in Perl and Python
- Regular expressions with Hebrew and Greek
- Why are regular expressions difficult
[1] Yes, I’ve heard the quip about two problems. It’s funny, but it’s not a universal law.
[2] “There’s more than one way to do it” is a mantra of Perl and contradicts The Zen of Python. I use the line here as a good-natured jab at Python. Despite its stated ideals, Python has more in common with Perl than it would like to admit and continues to adopt ideas from Perl.
[3] Python’s re module doesn’t support every regular expression modifier that Perl supports. I don’t know about Python’s regex module.
Heh. I wondered if it was deliberate when you invoked TIMTOWTDI. I think of Python and Perl as two brothers who disagree more intensely *because* they are so similar.
I use similar techniques to convert canonical JSON to single-line form. Didn’t know about (?m). Thanks!
ICYMI: indentation is not required for multiline strings, so you can have exactly the whitespace you want. Emacs’s Python mode e.g. handles this correctly. Unindented multiline string literals in otherwise indented code are not exactly pretty, however. People tend to put them at the unindented top-level for that reason, or, alternatively, use multiple string fragments and explicit newlines (\n) with line continuations or in parentheses.
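A small sketch of the two alternatives mentioned in this comment: contents pushed to the left margin inside an indented function, and adjacent string fragments with explicit newlines inside parentheses. The function names foo and bar are just placeholders.

```python
def foo():
    # The lines inside a string literal are not subject to Python's
    # indentation rules, so they may sit at the left margin.
    y = """\
abc
def
ghi
"""
    return y

def bar():
    # Adjacent string literals inside parentheses are concatenated.
    y = (
        "abc\n"
        "def\n"
        "ghi\n"
    )
    return y

print(repr(foo()))     # 'abc\ndef\nghi\n'
print(foo() == bar())  # True
```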
The regex solution has the downside that any significant whitespace at the beginning of a line (e.g. indentation) will also be removed. Dedent only removes leading whitespace matching that of the line where the triple-quoted string begins.
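To illustrate the point, here is a sketch with a quoted snippet that has its own internal indentation. The quoted text is made up for the example.

```python
import re
import textwrap

# A quoted snippet whose second line is indented relative to the first.
quoted = """\
    def f(x):
        return x + 1
    """

by_regex = re.sub(r"(?m)^\s+", "", quoted)
by_dedent = textwrap.dedent(quoted)

print(repr(by_regex))   # 'def f(x):\nreturn x + 1\n'
print(repr(by_dedent))  # 'def f(x):\n    return x + 1\n'
```

The regular expression flattens every line, while dedent removes only the shared margin and keeps the relative indentation.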
@Waldir: Thanks. I didn’t think about that. I’ve only used triple quoted strings that are flush left, but I could see how you might quote code or prose that has its own levels of indentation.
You could change the regex from \s+ to \s{4} to remove four spaces, but of course then the number would have to change if the level of indentation changes. But dedent keeps track of that for you.
I would prefer inspect.cleandoc() over textwrap.dedent() as the former deals correctly with the first line. Also if you have text with different indentation it only removes the baseline for all lines, not all the leading space of each line.
---
import sys

def cleandoc(doc):
    """Clean up indentation from docstrings.

    Any whitespace that can be uniformly removed from the second line
    onwards is removed."""
    try:
        lines = doc.expandtabs().split('\n')
    except UnicodeError:
        return None
    else:
        # Find minimum indentation of any non-blank lines after first line.
        margin = sys.maxsize
        for line in lines[1:]:
            content = len(line.lstrip())
            if content:
                indent = len(line) - content
                margin = min(margin, indent)
        # Remove indentation.
        if lines:
            lines[0] = lines[0].lstrip()
        if margin < sys.maxsize:
            for i in range(1, len(lines)): lines[i] = lines[i][margin:]
        # Remove any trailing or leading blank lines.
        while lines and not lines[-1]:
            lines.pop()
        while lines and not lines[0]:
            lines.pop(0)
        return '\n'.join(lines)
---
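A short sketch of the first-line difference between inspect.cleandoc and textwrap.dedent, using a string whose text starts right after the opening quotes (no backslash). The return statement is added for illustration.

```python
import inspect
import textwrap

def foo():
    # No backslash: the first line starts right after the quotes,
    # so it has no indentation while the later lines do.
    y = """abc
    def
    ghi
    """
    return y

y = foo()
print(repr(textwrap.dedent(y)))   # 'abc\n    def\n    ghi\n'
print(repr(inspect.cleandoc(y)))  # 'abc\ndef\nghi'
```

Here dedent finds no common margin because the first line is unindented, while cleandoc computes the margin from the second line onward. Note that cleandoc also drops trailing blank lines, so the final newline is gone.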
> You can’t push the string contents to the left margin because that would violate Python’s formatting rules.
Yes you can. I do this all the time.
Thanks! I didn’t realize that. Was it not allowed at some point in the past? Maybe I just assumed it wouldn’t work. Will update the post.