I was looking recently at the Python module toolz
, a collection of convenience functions. A lot of these functions don’t do that much. They don’t save you much code, but they do make your code more readable by making it more declarative. You may not realize need them until you see them.
For example, there is a function partitionby
that breaks up a sequence at the points where a given function’s value changes. I’m pretty sure that function would have improved some code I’ve written recently, making it more declarative than procedural, but I can’t remember what that was.
Although I can’t think of my previous example, I can think of a new one, and that is Descartes’ rule of signs.
Given a polynomial p(x), read the non-zero coefficients in order and keep note of how many times they change sign, either from positive to negative or vice versa. Call that number n. Then the number of positive roots of p(x) either equals n or n minus a positive even number.
For example, suppose
p(x) = 4x4 + 3.1x3 − x2 − 2x + 6.
The coefficients are 4, 3.1, −1, −2, and 6. The list of coefficients changes signs twice: from positive to negative, and from negative to positive. Here’s a first pass at how you might have Python split the coefficients to look sign changes.
from toolz import partitionby coefficients = [4, 3.1, -1, -2, 6] parts = partitionby(lambda x: x > 0, coefficients) print([p for p in parts])
This prints
[(4, 3.1), (-1, -2), (6,)]
The first argument to partitionby
an anonymous function that tests whether its argument is positive. When this function changes value, we have a sign alteration. There are three groups of consecutive coefficients that have the same sign, so there are two times the signs change. So our polynomial either has two positive roots or no positive roots. (It turns out there are no positive roots.)
The code above isn’t quite right though, because Descartes said to only look at non-zero coefficients. If we change our anonymous function to
lambda x: x >= 0
that will work for zeros in the middle of positive coefficients, but it will give a false positive for zeros in the middle of negative coefficients. We can fix the code with a list comprehension. The following example works correctly.
coefficients = [4, 0, 3.1, -1, 0, -2, 6] nonzero = [c for c in coefficients if c != 0] parts = partitionby(lambda x: x > 0, nonzero) print([p for p in parts])
If our coefficients were in a NumPy array rather than a list, we could remove the zeros more succinctly.
from numpy import array c = array(coefficients) parts = partitionby(lambda x: x > 0, c[c != 0])
The function partitionby
returns an iterator rather than a list. That’s why we don’t just print parts
above. Instead we print [p for p in parts]
which makes a list. In applications, it’s often more efficient to have an iterator than a list, generating items if and when they are needed. If you don’t need all the items, you don’t have to generate them all. And even if you do need all the items, you could save memory by not keeping them all in memory at once. I’ll ignore such efficiencies here.
We don’t need the partitions per se, we just need to know how many there are. The example that escapes my mind would have been a better illustration if it needed to do more with each portion than just count it. We could count the number of sign alternations for Descartes rule as follows.
len([p for p in parts]) - 1
The pythonic idiom to create a list from an iterator is to use the list function rather than a list comprehension:
print(list(parts))
You have to be careful with iterators though, since converting them to a list will exhaust the iterator and leave it empty. So this won’t work the way you might expect when parts is an iterator rather than a sequence:
print(list(parts))
# this will print 0 now since the iterator has been exhausted
print(len(list(parts))
Using list comprehensions here will behave the same.
Probably it’s simplest to convert to a list right away:
parts = list(partitionby(lambda x: x > 0, nonzero))
Defensively using list when you don’t know whether you have a list or an iterator works since if x is a list then list(x) will return a copy of x. If x might be large and you’re worried about performance, there’s a somewhat opaque idiom that takes advantage of the fact that iter(x) returns x unchanged if x is already an iterator, so
if iter(x) is not x:
x = list(x)
will avoid making a copy of x if it is already a list.
There’s also starred unpacking, which is the most succinct:
>>> print(*range(5))
0 1 2 3 4
You could also have used one of the piping or threading functions to add the filter! One of my favorite aspects of toolz, actually – less Cognitive Load from naming things. Especially for those little “whoops, turns out I needed to do something upstream” moments.
So something like like this:
I think I’m actually gonna use this as an example for a talk I’m giving on Thursday on FP in Data Science, if that’s okay! https://www.toptal.com/community/events/2020-09-24/functional-programming-for-data-scientists