*This post was originally written on Codeforces; relevant discussion can be found here.*

**TL;DR**

- The Floyd-Warshall algorithm is a special case of a more general problem: aggregation over paths in a graph. In this post, we look at a few examples and point to some references for further study.
- The algebraic path problem constitutes a family of problems that can be solved using similar techniques.
- This and this paper develop the theory of semirings in this context, and this treats some computational aspects of it.
- Kleene algebras are what end up being used in most of the examples given here, but semirings are more general structures.
- While not discussed in the post, the idea of a valuation algebra is a further generalization that shows connections to more problems of various types.

## Introduction

Most people doing competitive programming learn the Floyd-Warshall algorithm in the context of computing the lengths of the shortest paths between all pairs of vertices in a graph on \(n\) vertices in \(O(n^3)\) time, which is better than naively running Dijkstra or other single-source shortest-path algorithms from every vertex (with the exception of Johnson's algorithm in the sparse case); it can also identify negative cycles.

However, this is not how things unfolded historically. Kleene's algorithm for converting a DFA to a regular expression was published in 1956, and it resembles the Floyd-Warshall algorithm in many respects. Bernard Roy solved the transitive closure problem in a 1959 paper, which also mentioned an extension to the all-pairs-shortest-paths (APSP) problem, but the publication went unnoticed. Stephen Warshall solved the transitive closure problem in 1962, and Robert Floyd solved the APSP problem in a 1962 paper; all of these used essentially the same algorithm. Peter Ingerman then came up with the three-nested-loops formulation that we know and use today.

As it turns out, the basic idea behind these algorithms is the same. They’re all dynamic programming algorithms, and they use the same states as well. We’ll see a couple of these applications and then put forth a generalization that can be black-boxed. Feel free to skip through these sections if you already know about the connections between them.

## All-Pairs Shortest Paths (APSP)

The actual statement of all-pairs-shortest-paths is slightly different, but we rephrase it here in order to fit Floyd-Warshall’s capabilities.

### Original statement

Given a weighted directed graph \(G = (V, E, w)\), construct a matrix \(D\) with \(d_{ij}\) being the length of the shortest path from \(i\) to \(j\), where the length of a path is defined as the sum of weights of the edges in the path.

The Floyd-Warshall algorithm really solves the following reformulation, which coincides with the original statement in the absence of negative cycles (and, in particular, when the edge weights are non-negative).

**Statement:** Given a weighted directed graph \(G = (V, E, w)\),
construct a matrix \(D\) with \(d_{ij}\) being the length of the
shortest walk from \(i\) to \(j\), where the length of a walk is defined
as the sum of the weights of the edges (with multiplicity) in the walk.

There are a few natural questions that might pop up, and they’re addressed as follows:

### What about the undirected case?

An undirected graph is a special case of this setup — for an undirected edge \(\{u, v\}\), add directed edges \((u, v)\) and \((v, u)\).

### Is D always well-defined?

When \(w\) admits a negative cycle (i.e. a cycle whose edge weights sum to a negative number), then for some pairs of vertices \(d_{ij}\) may not be defined: given a walk between the two vertices, we can always find a shorter walk. For example, consider vertices \(u\) and \(v\) such that the negative cycle is reachable from \(u\) and \(v\) is reachable from the negative cycle. Then we can go around the negative cycle any number of times to get an arbitrarily short walk. It can also be shown that this is the only case where such a thing happens. Fortunately, it is possible to detect this scenario using the output of the algorithm.

### Why do we want a shortest walk and not a shortest path?

Informally, a walk is a sequence of edges connected at their endpoints, and all edges (when connected at the endpoints) point in the same direction along the sequence. By most definitions, a path is a walk without any repeated vertices.

So the shortest path problem asks for a shortest walk without any repeated vertices. However, when negative edge weights are allowed, even in an undirected setting, the Hamiltonian path problem reduces to the decision version of this problem, and hence the decision version of the shortest path problem is NP-complete. For the reduction, this would be a good resource.

However, in the common case of non-negative edge weights, shortest path is equivalent to shortest walk (by eliminating any cycles).

Now that these are out of our way, we will make the following simplifying assumption, that will make the output of the algorithm the answer to the problem:

- There is no negative cycle in the graph.

### Note for undirected graphs

Note that in the case of undirected graphs, this assumption also rules out negative edges, since the pair of directed edges \((u, v)\) and \((v, u)\) forms a negative cycle. If you want a shortest *path* in the presence of a negative edge, that is NP-complete too, as mentioned earlier.

Now let's look at how we would solve the problem using a recursion. Order the vertices arbitrarily (say \(1, 2, \dots, n\)). Define \(d(i, j, k)\) as the length of the shortest walk from \(i\) to \(j\) subject to the constraint that all internal vertices of the walk (i.e., all vertices except the two endpoints) have index \(\le k\). Let's call a shortest such walk a *\(k\)-good walk*.

The base case is \(d(i, j, 0)\), where no internal vertices are allowed, so a walk is either a single edge or (for \(i = j\)) empty. By our assumption that there are no negative cycles (in particular, no negative self-loops), \(d(i, i, 0) = 0\). For \(i \ne j\), \(d(i, j, 0)\) is \(w(i, j)\) if there is an edge from \(i\) to \(j\) and \(\infty\) otherwise. Later, we will again use the fact that it is always optimal to ignore self-loops.

Let’s say we have \(d(i, j, k - 1)\) for all \(i, j\). How do we extend this to \(d(i, j, k)\)? Consider the following two cases:

- There is a \((k-1)\)-good walk that is also \(k\)-good: in this case, we have \(d(i, j, k) = d(i, j, k - 1)\).
- There is no \((k-1)\)-good walk that is also \(k\)-good: in this case, vertex \(k\) must show up on every \(k\)-good walk. Consider a \(k\)-good walk with the smallest number of appearances of \(k\). We claim that \(k\) appears exactly once on such a walk: if this were not the case, we could clip cycles from the walk one by one, never increasing its length (as there are no negative cycles), until every vertex appears at most once. Hence \(d(i, j, k) = d(i, k, k - 1) + d(k, j, k - 1)\) (it is optimal to ignore any self-loop at \(k\)).

Hence, an algorithm would look like this:

```
Initialize d(i, j, 0) for all i, j
for k = 1 to n:
    for i = 1 to n:
        for j = 1 to n:
            d(i, j, k) = min(d(i, j, k - 1), d(i, k, k - 1) + d(k, j, k - 1))
```

We can get away with just \(O(n^2)\) memory by maintaining two matrices: one for \(k - 1\) and one for \(k\). In fact, a single matrix suffices in this special case. Since entries only ever decrease, any value we read during round \(k\) is at most the corresponding \(d(\cdot, \cdot, k - 1)\); and every value stored in the matrix always corresponds to some valid walk from \(i\) to \(j\) with all internal vertices \(\le k\), so it is never smaller than the true \(d(i, j, k)\) either. The computed value is therefore sandwiched between the two, and is exactly correct at the end of round \(k\).

So the algorithm reduces to the following:

```
Initialize d(i, j) for all i, j
for k = 1 to n:
    for i = 1 to n:
        for j = 1 to n:
            d(i, j) = min(d(i, j), d(i, k) + d(k, j))
```
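As a concrete reference, here is a straightforward Python sketch of this single-matrix version (the interface, e.g. representing a missing edge by `INF`, is my choice):

```python
INF = float("inf")  # stands in for "no edge" / unreachable

def floyd_warshall(w):
    """All-pairs shortest walks via the single-matrix recurrence.
    w is an n x n matrix with w[i][j] the weight of edge (i, j),
    or INF if the edge is absent. Assumes no negative cycles."""
    n = len(w)
    # base case d(i, j, 0): self-loops are never useful, so d(i, i) = 0
    d = [[0 if i == j else w[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Example: edges 0->1 (4), 1->2 (-2), 2->0 (1); the cycle has positive total.
d = floyd_warshall([[0, 4, INF], [INF, 0, -2], [1, INF, 0]])
# d[0][2] == 2 (via 0 -> 1 -> 2), d[1][0] == -1 (via 1 -> 2 -> 0)
```

Note that negative edges are fine here, as long as no negative cycle exists.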

I would refer you to this cp-algorithms article for negative cycle detection and shortest path reconstruction using the same algorithm.

## Transitive closure

**Statement:** The transitive closure of a binary relation \(R\) on a set
\(X\) is the smallest relation \(S\) on \(X\) that contains \(R\) and is
transitive, i.e., if \(x S y\) and \(y S z\), then \(x S z\). Compute
the transitive closure of a given binary relation \(R\) on a finite set
\(X\) of \(n\) elements.

In other words, if we treat the relation as a graph, with vertices being the elements of \(X\) and the directed edges \((u, v)\) existing iff \(uRv\), then we just need to compute the pairs \((u, v)\) of vertices such that there is a path from \(u\) to \(v\) of length at least \(1\).

All of the following approaches need a few minor changes for paths of length at least \(1\), but they work without any changes for the reflexive transitive closure problem (where \(S\) is also required to be reflexive).

This can independently be solved using the APSP setup we developed before, as well as some other ad-hoc algorithms. Let’s describe a few approaches:

**Reduction to APSP:** For every edge \((u, v)\), assign it a cost of \(0\) (and the cost of every other pair is \(\infty\), as usual). Then there is a path from \(u\) to \(v\) iff the shortest walk from \(u\) to \(v\) has length \(0\).

This is equivalent to the following:

```
for k = 1 to n
    for i = 1 to n
        for j = 1 to n
            reachable(i, j) = reachable(i, j) or (reachable(i, k) and reachable(k, j))
```

**BFS**: Running a BFS from every vertex leads to an \(O(VE)\) algorithm.

**SCCs and finding reachable nodes in a DAG**: As the name suggests, we preprocess the graph by condensing each strongly connected component into a single vertex, and then apply the first technique on the resulting DAG.

**Algebra**: Let's consider one iteration of a level-wise traversal (rather than a BFS traversal) beginning at a vertex \(i\). Suppose we have a boolean vector whose \(j\)-th element tells us whether \(j\) is reachable from \(i\) using the currently traversed edges. Consider the variant of "matrix multiplication" where \(+\) is replaced by boolean OR and \(\cdot\) is replaced by boolean AND. If we "multiply" this boolean vector by the adjacency matrix \(M\) of the graph (with diagonal entries set to \(1\)), the resulting vector corresponds to the next level of the level-wise traversal. Starting with the vector \((0, 0, \dots, 1, \dots, 0)\), where the \(1\) is at the \(i\)-th position, and multiplying it by \(M^{n-1}\) (exponentiation done with this "matrix multiplication", which is also associative) gives us the reachability vector of vertex \(i\) (since there are at most \(n-1\) levels). In other words, the answer is just \(M^{n-1}\).
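The algebraic approach can be sketched in Python as follows (helper names are mine); as noted above, this computes the reflexive transitive closure, in \(O(n^3 \log n)\) via binary exponentiation:

```python
def bool_mat_mult(a, b):
    """'Matrix multiplication' with + replaced by OR and . by AND."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def reachability(adj):
    """Returns M^(n-1), where M is the boolean adjacency matrix with
    True on the diagonal; entry (i, j) says whether j is reachable
    from i (in at most n - 1 steps, which always suffices)."""
    n = len(adj)
    m = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    result = [[i == j for j in range(n)] for i in range(n)]  # identity
    e = n - 1
    while e:  # binary exponentiation of m
        if e & 1:
            result = bool_mat_mult(result, m)
        m = bool_mat_mult(m, m)
        e >>= 1
    return result
```

Packing each row into a machine-word bitset (or a Python integer) speeds up the inner loop by a word-size factor, which is the usual trick in practice.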

Take a moment to consider how the reduction to APSP and the matrix exponentiation algorithm are related — it is left as an exercise for the reader.

## Kleene’s algorithm

**Statement**: Given a DFA \(D\), find a regular expression that accepts
the same language as \(D\).

A very short description of what we want to do in this problem (after reducing some obvious cases to fit the current context): we’re given a graph \(G\) with labelled edges (such that labels for outgoing edges from a vertex are distinct), and there is a source and a sink. We want to find a regular expression that recognizes all possible strings formed by sequentially reading labels on edges of a walk from the source to the sink.

At this point, it should be obvious how we'll approach this problem: let \(r(i, j, k)\) be a regular expression that recognizes all strings formed by walks from \(i\) to \(j\) with all internal vertices of the walk indexed \(\le k\). The DP transitions are left as an exercise to the reader; if you get stuck somewhere, refer to this. Remember that self-loops are now important.

## Generalization

In all of these problems, you might have noticed a pattern. There are aggregations *across* paths and aggregations *along* paths. These also need to be compatible in certain ways (i.e., we can combine the results of aggregations along paths with some kind of distributivity). For instance, in the Floyd-Warshall algorithm, we needed the `min` operation to be compatible with the `+` operation.

### A restrictive generalization: Kleene algebras

Kleene algebras are the underlying structures on which the Floyd-Warshall algorithm works. For ease of presentation, we will first look at the definition (you could instead take a proof of the Floyd-Warshall algorithm as applied to the problems above and derive these conditions yourself, but that is quite a bit of work):

A Kleene algebra is a set \(K\) with two binary operators \(+, \cdot : K \times K \to K\), and a unary operator \(* : K \to K\), that satisfies the following axioms:

- Semiring axioms on \((K, +, \cdot)\): in other words, \((K, +, \cdot)\) should satisfy the following properties for all \(a, b, c \in K\):
  - Associativity of \(+\): \(a + (b + c) = (a + b) + c\) (allows us to unambiguously write \(a + b + c\)).
  - Commutativity of \(+\): \(a + b = b + a\).
  - Existence of an additive identity: there exists \(0 \in K\) such that \(a + 0 = a\) for all \(a \in K\).
  - Associativity of \(\cdot\): \(a(bc) = (ab)c\) (allows us to unambiguously write \(abc\)).
  - Existence of a multiplicative identity: there exists \(1 \in K\) such that \(a1 = 1a = a\) for all \(a \in K\).
  - Additive identity as an annihilator: \(a0 = 0a = 0\).
  - Left and right distributivity of \(\cdot\) over \(+\): \(a(b + c) = ab + ac\) and \((b + c)a = ba + ca\).

- Idempotency of \(+\): That is, \(a + a = a\) for all \(a \in K\) — this leads to a partial order \(\le\) on \(K\) defined by \(a + b = b \iff a \le b\). Checking that this satisfies reflexivity, antisymmetry and transitivity is fairly straightforward, and follows from idempotence, commutativity and associativity of \(+\).
- Kleene axioms on \(*\):
  - \(1 + aa^* \le a^*\)
  - \(1 + a^*a \le a^*\)
  - \(ax \le x \implies a^*x \le x\)
  - \(xa \le x \implies xa^* \le x\)

Note that given a structure without \(*\) but satisfying the first two properties (such a structure is called an *idempotent semiring*; a structure satisfying just the first property is called a *semiring*), we can derive a natural closure operator \(*\) as follows (it is easy to check that it satisfies the above properties):

\[a^* = \sum_{i=0}^\infty a^i = 1 + a + aa + aaa + \dots\]

Obviously, this definition is subject to the expression on the right being well-defined (you need a concept of an infinite sum, which inevitably brings the idea of a limit into the picture, but I won’t go into those details).
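As a quick sanity check (this worked example is mine), consider the tropical semiring used for APSP below, where \(+\) is \(\min\) and \(\cdot\) is ordinary addition. The natural closure of a single element is

\[a^* = \min(0, a, a + a, a + a + a, \dots) = \begin{cases} 0 & \text{if } a \ge 0, \\ \text{undefined} & \text{if } a < 0, \end{cases}\]

which matches the earlier discussion: a non-negative cycle is never worth traversing, while a negative cycle can be traversed arbitrarily many times, so its closure fails to exist in \(K = \mathbb{R} \cup \{\infty\}\).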

Now that we have this definition of a Kleene algebra, let’s look at the underlying Kleene algebras for the above problems:

**APSP**: Here, \(K = \mathbb{R} \cup \{\infty\}\), \(+\) is the usual \(\min\), \(\cdot\) is the usual \(+\), \(1\) is the usual \(0\), \(0\) is the usual \(\infty\), and \(*\) is the natural closure operator of this semiring. This idempotent semiring is also called the tropical semiring, and tropical geometry, which arose from its study, is an active field of research. This paper is a great introduction to idempotent/tropical mathematics that I recommend if you are interested in this semiring. The semiring has seen some activity in competitive programming as well; for instance, the min-plus/max-plus convolution was discussed in this post.

**Transitive closure**: Here, \(K = \{0, 1\}\), \(+\) is the usual OR, \(\cdot\) is the usual AND, \(0\) is the usual \(0\), \(1\) is the usual \(1\), and \(*\) is the natural closure operator of this semiring. When another operation, logical NOT, is added to this semiring, it becomes the two-element Boolean algebra. It is quite well studied, and the first chapter of this book is a great reference on some parts of the research happening in this field.

**Kleene's algorithm**: Here, \(K\) is the set of regular expressions over an alphabet, \(+\) is the usual \(\cup\), \(\cdot\) is the usual concatenation, \(0\) is \(\emptyset\) (the null regular expression), \(1\) is \(\varepsilon\) (the empty word), and \(*\) is the natural closure operator of this semiring, which turns out to be the Kleene star.

As we will see below, the algebraic path problem captures all of these problems in a single framework. Note that you don't really need idempotence to define the problem (or even the natural closure), but it is needed for the algorithm to work.

Consider the following problem on an idempotent semiring:

Given a weighted directed graph with \(n\) vertices and with edge weights coming from an idempotent semiring, find the sum of weights of all walks from a vertex \(i\) to another vertex \(j\). The weight of a walk is defined as the product of weights along the walk.

Note that \(n \times n\) matrices with entries coming from an idempotent semiring also form an idempotent semiring, as should be easy to verify. Now the adjacency matrix \(M\) of this graph is a member of this idempotent semiring too (and correspondingly, its natural Kleene algebra).

Using induction, we see that the sum of weights of walks of length \(k\) from vertex \(i\) to vertex \(j\) is the \((i, j)\)-th entry of \(M^k\). So the sum of weights over all walks corresponds to the matrix \(\sum_{i=0}^\infty M^i = M^*\).

This works for all idempotent semirings, and Kleene's algorithm runs unchanged over the Kleene algebra induced by any idempotent semiring: only the interpretations of \(+\), \(\cdot\), and \(*\) change.

As a consequence, we also get the fact that for any idempotent semiring, Kleene’s algorithm leads to an efficient algorithm for computing the natural closure of any matrix on that idempotent semiring.
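To make the black box concrete, here is a Python sketch of this generic closure algorithm (the function name and interface are mine; the semiring is passed in as its operations and constants):

```python
def kleene_closure(m, plus, times, star, one):
    """Compute the closure M^* of an n x n matrix over an idempotent
    semiring, via the three-nested-loops (Floyd-Warshall / Kleene) scheme:
    d[i][j] += d[i][k] * (d[k][k])^* * d[k][j] for each pivot k."""
    n = len(m)
    d = [row[:] for row in m]
    for k in range(n):
        kk = star(d[k][k])  # closure of the self-loop at the pivot
        for i in range(n):
            for j in range(n):
                d[i][j] = plus(d[i][j], times(d[i][k], times(kk, d[k][j])))
    for i in range(n):                 # M^* also counts the empty walk,
        d[i][i] = plus(d[i][i], one)   # so add the identity matrix
    return d

# Tropical semiring (min, +) gives APSP; star(a) = 0 is valid when
# there are no negative cycles.
INF = float("inf")
dist = kleene_closure([[INF, 4, INF], [INF, INF, 2], [1, INF, INF]],
                      min, lambda a, b: a + b, lambda a: 0.0, 0.0)
# Boolean semiring (or, and, star(a) = True) gives the reflexive
# transitive closure from the previous section.
```

Unlike the plain shortest-path code, this version must handle self-loops at the pivot via `star`, which is exactly what Kleene's algorithm for regular expressions needs.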

**Application to transitive reduction:** Note that this is quite useful for reasoning about transitive reduction as well. In the Boolean semiring, \(M \cdot M \cdot M^*\) tells us whether there is a path of length at least \(2\) between two vertices (note that \(M \cdot M^*\) alone captures paths of length at least \(1\), since \(M^*\) contains the identity), so the transitive reduction of a DAG is just \(M \land \lnot(M \cdot M \cdot M^*)\).

### Further reading

To avoid making this post too long and technical, I would recommend this paper and this paper for treatments that consider matrix algorithms reminiscent of Gaussian elimination to solve path problems using matrix algebra in the semiring context. You can refer to this paper that treats some computational aspects of it. The idea of a valuation algebra is a further generalization that shows connections to more problems of various types.

## Problems

There aren’t really many problems based on this topic (let’s call them Kleene-type algorithms) that I know of, but here are a few that require similar methods. Let me know in the comments if you know some more related problems.

- Given the capacities of edges in a directed graph, find the maximum amount of flow that you can pass through a single path from the source to the sink, for every possible choice of source and sink vertices.
- Given the weights of edges in a directed graph, find the minimum possible weight of the heaviest edge on any path from the source to the sink, for every possible choice of source and sink vertices.
- https://codeforces.com/contest/590/problem/E
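For reference, under my reading of the statements, the first two problems above fit the same three-loop scheme with different semirings: maximum bottleneck uses \((\max, \min)\), and the minimax heaviest edge uses \((\min, \max)\). A Python sketch (function names are mine):

```python
INF = float("inf")

def max_bottleneck(cap):
    """cap[i][j]: capacity of edge (i, j), 0 if absent. Returns, for
    every (i, j), the maximum over paths of the minimum capacity on
    the path: the (max, min) semiring."""
    n = len(cap)
    d = [row[:] for row in cap]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = max(d[i][j], min(d[i][k], d[k][j]))
    return d

def minimax(w):
    """w[i][j]: weight of edge (i, j), INF if absent. Returns, for
    every (i, j), the minimum over paths of the heaviest edge on the
    path: the (min, max) semiring."""
    n = len(w)
    d = [row[:] for row in w]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], max(d[i][k], d[k][j]))
    return d
```

In both cases, idempotence of the "addition" (\(\max\) or \(\min\)) is what makes the single-matrix update valid.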