Louis Abraham's Home Page

Generating closures

29 Nov 2022

Last week, a link to the article Closures And Objects Are Equivalent was posted on Hacker News. It reminded me of an old university project of mine that uses generators to generate closures (pun intended) and to add caching mechanisms to online algorithms.

Closures are equivalent to objects

The article explains why closures and objects are equivalent. As a reminder, here is what a closure looks like:

def Counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

The article explains that the closure above is equivalent to the following object:

class Counter:
    def __init__(self):
        self.count = 0
    def __call__(self):
        self.count += 1
        return self.count
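
Both versions behave identically from the caller's point of view:

counter = Counter()  # the closure or an instance of the class
counter()  # 1
counter()  # 2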

You might argue that the object allows you to access the value of count directly, while the closure does not. This can be fixed:

def Counter():
    count = 0
    def counter(increment=True):
        nonlocal count
        count += increment
        return count
    return counter

Now, counter(False) will return the current value of count without incrementing it.
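
For example:

counter = Counter()
counter()       # 1
counter()       # 2
counter(False)  # 2, reads the value without incrementing it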

You might also argue that the object can implement multiple methods, while the closure can only implement one. While this is true, the closure can receive any argument and return any value, so it can be used to implement multiple methods:

def Counter():
    count = 0
    def counter(name="increment", *args, **kwargs):
        nonlocal count
        if name == "increment":
            count += 1
            return count
        elif name == "decrement":
            count -= 1
            return count
        elif name == "get":
            # reading count does not need the nonlocal declaration
            return count
        else:
            raise ValueError(f"Unknown method: {name}")
    return counter
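
Here is how such a closure-based counter is used:

counter = Counter()
counter()             # 1 (the default method is "increment")
counter("increment")  # 2
counter("decrement")  # 1
counter("get")        # 1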

You can see how one would implement any “object” with a closure. Of course, just because you can does not mean you should.

What about generators?

In general, people will use classes (objects) over closures, as the syntax is nicer: it is much more readable to write counter.decrement() than counter('decrement').

There is a special use case where a third option exists: iterators and generators.

Iterators are simply objects that implement the __next__ special method:

class Counter:
    def __init__(self):
        self.count = 0
    def __next__(self):
        self.count += 1
        return self.count

We can call an instance with: next(counter). This is equivalent to counter.__next__().
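
For instance:

counter = Counter()
next(counter)  # 1, same as counter.__next__()
next(counter)  # 2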

Nothing new here, but Python also happens to have a special syntax for iterators: generators. Generators are functions that use the yield keyword instead of return:

def Counter():
    count = 0
    while True:
        count += 1
        yield count
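
Calling this Counter() does not run any code yet: it returns a generator object, which is an iterator and supports next just like the class above:

counter = Counter()
next(counter)  # 1
next(counter)  # 2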

Generators as improved closures

Calling a generator function produces an iterator, but generators are more powerful than class-based iterators because the yield keyword is embedded in the code itself: unlike a return statement in a __next__ method, it suspends the execution of the function instead of terminating it.

As an illustration, it is very easy to replicate a generator from a class-based iterator:

def generator(IteratorClass):
    iterator = IteratorClass()
    while True:
        yield next(iterator)
    # or simply:
    # yield from iterator
    # but that would hide the point

An iterator just has some state that is initialized by __init__ and updated by __next__. Hence we could easily transform the code above to remove the need for IteratorClass.
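
For the Counter iterator above, inlining the state gives back exactly the generator we already wrote:

def generator():
    count = 0  # state formerly initialized by __init__
    while True:
        count += 1  # update formerly performed by __next__
        yield count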

On the other hand, it is much harder, from a refactoring point of view, to replace a generator with a class-based iterator, i.e. to forget about the yield statement, because yield is very expressive. It does multiple things, among others:

- it returns a value to the caller, like a return statement;
- it can receive a value from the caller (passed with the send method) and evaluate to it inside the function;
- it suspends the execution of the function and saves its whole local state, including the instruction pointer, so that execution can resume exactly where it stopped.

The last feature is the most powerful and least known one. As such, generators are closures since they store both a function and an environment. But they are also more expressive than closures because they can resume execution at multiple points in the code: the instruction pointer is part of the state.
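
A small example makes this concrete: each call to send resumes the function exactly where it was last suspended, at whichever yield it stopped on.

def demo():
    x = yield "first stop"               # execution pauses here first
    y = yield f"second stop, got {x!r}"  # ...then here
    yield f"third stop, got {y!r}"

g = demo()
next(g)    # 'first stop'
g.send(1)  # 'second stop, got 1'
g.send(2)  # 'third stop, got 2'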

The point I wanted to make is that automatically capturing the environment of a function, including the instruction pointer, is a very powerful feature: it can semantically transform any function into a closure. The next section shows a case where this is genuinely useful.

A practical example: AWESOME ONLINE MEMOISED DIJKSTRA

The best use of generators I have seen was in a university project I did 6 years ago: we had to compute a bunch of shortest paths in a large graph during the computation of the reach of a node. We were given queries with starting and ending nodes, but we could not predict them in advance. However, the starting nodes were all close to the node whose reach we wanted to compute, hence there were not many of them.

The assignment was quite computationally intensive, and the instructions advised using C++ or Java.

Since the graph was quite big (4.1M vertices, 9.4M directed edges), the only algorithm that could be used was Dijkstra’s algorithm.

I tried two optimizations that each worked well but pulled in exactly opposite directions:

- memoising the complete run of Dijkstra's algorithm from each starting node, so that no computation is ever repeated across queries, at the cost of always exploring the whole graph;
- stopping the search as soon as the target node is settled, so that each query returns as fast as possible, at the cost of throwing the partial computation away.

But what if we could continue the computation from the last known state of Dijkstra's algorithm?

That would give the best of both worlds (given enough memory): no computation would ever be wasted since new queries would resume from the last known state, and the result would be returned as soon as possible.

To implement it, I could either:

- write a class holding the whole state of the algorithm (the priority queue and the dictionary of settled distances) together with a method that resumes the search until a given target is settled; or
- write a generator and let yield capture that state for me.

Of course, the difference is not that big, but the second option is much more elegant and concise. The resulting code of an “AWESOME ONLINE MEMOISED DIJKSTRA”, as I wrote in the docstring back then, is stupidly small and simple to read:

from heapq import heappop, heappush

@memoize_generator
def dijkstra_with_target(graph, start):
    # receive the first target before doing any work
    target = yield
    dist = {}  # settled nodes and their distances from start
    ans = None

    Q = [(0, start)]

    while Q:

        # as long as the current query can be answered, yield the
        # answer and receive the next target
        while ans is not None:
            target = yield ans
            ans = dist.get(target)

        d, s = heappop(Q)

        if s not in dist:
            dist[s] = d
            if s == target:
                ans = d
            for v in graph[s]:
                if v not in dist:
                    # facultative condition for optimisation
                    heappush(Q, (d + graph[s][v], v))

    # the whole graph has been explored: answer all remaining
    # queries directly from dist
    while True:
        target = yield dist[target]

The generator is produced with gen = dijkstra_with_target(graph, start) and queried with gen.send(target).

Since the generator is memoized, we can always reach the same instance with a one-liner: dijkstra_with_target(graph, start).send(target).
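
As a quick illustration, here is what a session could look like on a made-up toy graph, represented as a dict mapping each node to a dict of neighbours and edge weights, with the memoize_generator decorator defined below:

graph = {"a": {"b": 1, "c": 4}, "b": {"c": 1}, "c": {}}

dijkstra_with_target(graph, "a").send("c")  # 2, explores a, b and c
dijkstra_with_target(graph, "a").send("b")  # 1, answered from the saved state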

I reproduced my implementation of memoize_generator below. The graph is not a hashable object, so we have to remember it by its id.

from collections.abc import Hashable
from functools import wraps

def memoize_generator(f):
    @wraps(f)
    def g(*args):
        # unhashable arguments (like the graph) are keyed by their id
        targs = tuple(a if isinstance(a, Hashable) else id(a)
                      for a in args)
        if targs not in g.mem:
            ans = f(*args)
            next(ans)  # prime the generator up to its first yield
            g.mem[targs] = ans
        return g.mem[targs]
    g.mem = {}

    def clean():
        for gen in g.mem.values():
            gen.close()
        g.mem = {}
    g.clean = clean

    return g

Note the first call to next, which is used to initialize the generator: without it, the first call to send would throw TypeError: can't send non-None value to a just-started generator.
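
You can check this behaviour with any generator that expects a value at its first yield:

def gen():
    x = yield
    yield x

g = gen()
# g.send(42) here would raise the TypeError above
next(g)     # runs the body up to the first yield
g.send(42)  # 42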

I also included a clean function that forces the generators to release their memory. Note that g.mem = {} alone would be enough if the generators are not referenced anywhere else: dropping the dictionary releases the last reference to them, which triggers their __del__ method, a wrapper for close() as PEP 342 states.

In the end, my Python code (executed with PyPy) outperformed all C++ and Java implementations by an order of magnitude.