This post was originally written on Codeforces; relevant discussion can be found here.
This post will be more about developing ideas about certain structures rather than actually coding up stuff. The ideas we’ll be developing here have many applications, some of which involve:
- Number Theoretic Möbius Inversion
- Principle of Inclusion-Exclusion
- Directed Acyclic Graphs
- Subset Sum Convolution
- Finite Difference Calculus (and prefix sums)
Note that some of the terminology used below might be non-standard. Also, some parts of the post make a reference to abstract algebra without much explanation; such parts are just for people who know some algebra, and can safely be skipped by those who don’t know those concepts at all.
Prerequisites:
- Some knowledge about graphs.
- Matrix multiplication
- Basic combinatorics
Definition of a poset and some intuition
All of the applications listed above have one thing in common: they have an underlying “poset”. Let’s understand what a poset is.
Rigorously speaking, a poset is a partially ordered set, i.e., a set \(S\) equipped with a relation \(\le\) that is reflexive, antisymmetric, and transitive.
More precisely,
- \(\le\) is reflexive means that \(a \le a\) for all \(a\) in \(S\).
- \(\le\) is antisymmetric means that if \(a \le b\) and \(b \le a\), then \(a = b\).
- \(\le\) is transitive means that if \(a \le b\) and \(b \le c\), then \(a \le c\).
To get some intuition about posets, it’s helpful to think of them in terms of directed acyclic graphs (DAGs), equipped with the relation of reachability; i.e., vertex \(a\) is related to vertex \(b\) iff there is a path from \(a\) to \(b\). In fact, for every poset there is at least one DAG whose reachability relation coincides exactly with the poset’s relation.
To make this notion a bit clearer, we construct a “DAG” (we allow self-loops, so it is not precisely a DAG, but we can get rid of these self-loops) as follows: construct a vertex for each element of the set, and add an edge from \(u\) to \(v\) iff \(u \le v\). For lack of a better word, we shall call this the “relational DAG” of the poset. Note that this terminology is not standard.
Remark: By the properties of a poset, this “DAG” has the property that if there is a path from \(u\) to \(v\), there is an edge between \(u\) and \(v\) (i.e., reachability and adjacency are equivalent in this graph). In other words, this graph is its own reflexive transitive closure. Also, a topological sort of this “DAG” (ignoring the self-loops) is called a linear extension of the poset (flattening out the “DAG” induces a natural linear order on the vertices, which are just the elements of the poset).
It is usually easier to look at this graph after removing all redundancies (i.e., drop the self-loops, and whenever \(u < v\) and \(v < w\), remove the edge \((u, w)\) if it exists). The resulting graph is called the Hasse diagram of the poset, and is super useful for reasoning about posets.
Note that it is NOT necessary that for any two elements \((u, v)\), one of \(u \le v\) and \(v \le u\) must hold. For instance, consider the poset given by the set \(S = \{a_1, a_2, a_3, a_4\}\) and the relation \(\le\) given by \(\{(a, a) \mid a \in S\} \cup \{(a_1, a_2), (a_1, a_3), (a_1, a_4), (a_2, a_4), (a_3, a_4)\}\). This satisfies the constraints that \(\le\) must be reflexive, antisymmetric and transitive; however, neither \(a_2 \le a_3\) nor \(a_3 \le a_2\) holds.
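As a quick sanity check, here’s a minimal Python sketch (purely illustrative) verifying the three axioms for this example:

```python
S = ["a1", "a2", "a3", "a4"]
rel = {(a, a) for a in S} | {("a1", "a2"), ("a1", "a3"), ("a1", "a4"),
                             ("a2", "a4"), ("a3", "a4")}

assert all((a, a) in rel for a in S)                         # reflexive
assert all(a == b for (a, b) in rel if (b, a) in rel)        # antisymmetric
assert all((a, d) in rel
           for (a, b) in rel for (c, d) in rel if b == c)    # transitive
assert ("a2", "a3") not in rel and ("a3", "a2") not in rel   # a2, a3 incomparable
```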
For the remainder of this post, it will help to think of \(\le\) in terms of the Hasse diagram of the poset.
Note: when we say \(a < b\), we mean that \(a \le b\) and \(a \ne b\).
Some examples of posets
- Any of the sets \(\mathbb{R}, \mathbb{N}, \mathbb{Z}, \mathbb{Q}\) equipped with the usual \(\le\) is a poset.
- For any set \(A\), the set of its subsets, equipped with the relation \(\subseteq\) (or \(\supseteq\)) is a poset.
- The set \(\mathbb{N}\) equipped with the relation \(\mid\) (divisibility) is a poset.
The incidence algebra of a poset
Note for the pedantic reader: in what follows, we only look at locally finite posets (i.e., posets in which the set of \(z\) such that \(x \le z \le y\) is finite for all \(x, y\)). This means we can use the following ideas with infinite DAGs too, though infinitely large (but “locally sparse”) matrices need to be interpreted with some care.
In particular, the following doesn’t apply to the posets \((\mathbb{R}, \le), (\mathbb{Q}, \le)\) and so on.
Let \((S, \le)\) be some poset. Consider the adjacency matrix \(M\) of its relational DAG. It has the property that if \(x \not\leq y\), then the entry at \((x, y)\) is \(0\), and otherwise it is \(1\). We can also interpret this matrix \(M\) as a function \(S \times S \to \{0, 1\}\).
We generalize this notion a bit. Rather than having \(M(x, y) = 1\) whenever \(x \le y\), we assign arbitrary complex numbers to \(M(x, y)\) (which is equivalent to assigning arbitrary edge weights (possibly \(0\)) to edges in the relational DAG).
This leads to a set of functions \(\alpha : S \times S \to \mathbb{C}\), with the property that \(x \not\leq y \implies \alpha(x, y) = 0\). We call this the “incidence algebra” of the poset (for those who know what an algebra is, we’ll see why this is an algebra). We won’t distinguish between these functions and their corresponding matrices in what follows.
The main idea behind the whole approach is to look at these functions as matrices and extract useful information using matrix operations.
To define the “product” on this set of functions, we simply borrow the semantics of matrix multiplication. That is, for functions \(\alpha, \beta\) in the incidence algebra, we define the product as \((\alpha\beta)(x, y) = \sum_{z \in S} \alpha(x, z) \beta(z, y)\), which is some kind of convolution.
We can see that the incidence algebra is closed under this product (intuitively, think of it as aggregating some combination of edge weights over paths of length 2; if there is no path from \(x\) to \(y\), the set of such paths is empty anyway, so the answer is still \(0\), the identity of the aggregation operation \(+\)).
Similarly, we can define scalar multiplication and addition in terms of matrix operations, and see that the incidence algebra is closed under these operations too.
Important remark: Note that if \(f\) is a function from \(S\) to \(\mathbb{C}\), then the vector with \(f(x)\) at the position corresponding to \(x\) satisfies the usual matrix multiplication identities with the elements of the incidence algebra; multiplying on the right by such a column vector aggregates over paths that start at a given vertex, and multiplying a row vector on the left aggregates over paths that end at a given vertex.
Optional remark: For those who know what an algebra is, matrix multiplication is bilinear, so the product above is the bilinear operation of the incidence algebra. A further generalization would be to replace the underlying field \(\mathbb{C}\) by a different one and choose another bilinear operation, but we won’t look into this further.
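To make this concrete, here’s a minimal Python sketch (an illustration, not a serious implementation) that encodes the four-element poset from the earlier example as indices \(0, 1, 2, 3\) and implements the product; addition and scalar multiplication are just the usual entrywise matrix operations:

```python
# The 4-element poset from earlier, with a1..a4 mapped to indices 0..3
# (this ordering happens to be a linear extension).
n = 4
leq = [[False] * n for _ in range(n)]
for (u, v) in [(0, 0), (1, 1), (2, 2), (3, 3),
               (0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]:
    leq[u][v] = True

def product(alpha, beta):
    """Matrix product of two members of the incidence algebra.
    If alpha and beta vanish outside <=, so does the result,
    i.e., the incidence algebra is closed under this product."""
    return [[sum(alpha[x][z] * beta[z][y] for z in range(n))
             for y in range(n)] for x in range(n)]
```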
Some special members of the incidence algebra
Okay, now that we have defined this set and some operations on it, let’s see some examples of members of this set, what information we can get out of these operations, and what kind of operations are “nice” in this context.
Firstly, the most obvious choice of a matrix is \(O\), the zero matrix (which is clearly a member of the incidence algebra). This doesn’t really give us any information, though it still plays a role as the additive identity of the algebra.
The second most obvious choice is \(I\), the identity matrix. Note that this is a member of the incidence algebra due to reflexivity of \(\le\). This is a multiplicative identity, and that’s something we will need for Möbius inversion.
Thirdly, let’s consider the adjacency matrix \(M\) of the relational DAG. Let the corresponding function be \(\zeta\). Clearly, this is also in the incidence algebra. We have the property that \(\zeta(x, y) = 1\) if \(x \le y\), and \(0\) otherwise.
Spoiler alert: the inverse of this matrix is the Möbius function for the poset and corresponds to things like the number theoretic Möbius function in the divisibility poset and so on.
The Möbius function
Finally, we define the Möbius function \(\mu\) as follows (it might not feel intuitive at first, but bear with me for a while):
For \(x \not\leq y\), \(\mu(x, y) = 0\), \(\mu(x, x) = 1 \, \forall x \in S\), and inductively define
\[\mu(x, y) = -\sum_{x \le z < y} \mu(x, z)\]
Note that \(\mu(x, y)\) is always an integer. Also, note that \(\sum_{x \le z \le y} \mu(x, z) = 1\) if \(x = y\) and \(0\) otherwise (in the case \(x < y\), this follows from the inductive definition; in the case when \(x, y\) are not comparable, the sum is empty; and in the case when \(x = y\), it consists of only one term, equal to \(1\)).
Hence we have for all \(x, y \in S\),
\[\sum_{z \in S} \mu(x, z) \zeta(z, y) = \sum_{x \le z \le y} \mu(x, z) = \begin{cases} 1 & \text{if } x = y\\ 0 & \text{otherwise} \end{cases} = I(x, y)\]
However, the expression on the left is just \((\mu \zeta)(x, y)\). Since this holds for all \(x, y\), we have \(\mu\zeta = I\).
As we mentioned earlier, this means that \(\mu\) is the inverse of \(\zeta\), so we must also have \(\zeta\mu = I\) (and expanding this gives \(\sum_{x \le z \le y} \mu(z, y)\) is \(1\) if \(x = y\) and \(0\) otherwise).
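Here’s how this looks in code, continuing the sketch from before (it reuses `n`, `leq`, and `product`): we compute \(\mu\) from the inductive definition and check both identities.

```python
zeta = [[1 if leq[x][y] else 0 for y in range(n)] for x in range(n)]

# mu from the inductive definition; since indices 0..3 follow a linear
# extension, every z with z < y in the poset has a smaller index than y.
mu = [[0] * n for _ in range(n)]
for x in range(n):
    mu[x][x] = 1
    for y in range(x + 1, n):
        if leq[x][y]:
            mu[x][y] = -sum(mu[x][z] for z in range(x, y)
                            if leq[x][z] and leq[z][y])

identity = [[int(x == y) for y in range(n)] for x in range(n)]
assert product(mu, zeta) == identity and product(zeta, mu) == identity
```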
The Möbius inversion formula
The Möbius inversion formula in this setting just reduces to a simple identity corresponding to the multiplication of a matrix and a vector.
Formally, the Möbius inversion formula states that if \(f, g\) are functions from \(S\) to \(\mathbb{C}\) such that \(g(x) = \sum_{y \le x} f(y)\), then we have \(f(x) = \sum_{y \le x} \mu(y, x) g(y)\). The converse also holds true.
For a proof, consider a vector \(F\) with the element at position corresponding to \(x\) being \(f(x)\). Define \(G\) similarly.
Now the condition merely means that \(G = \zeta^\intercal F\). Left-multiplying this by \(\mu^\intercal\), we get \(\mu^\intercal G = \mu^\intercal \zeta^\intercal F = IF = F\). Expanding this gives the Möbius inversion formula. The converse can be proved by reversing all these steps, since \(\mu^\intercal\) is invertible.
Note that this formula is just a trivial consequence of some matrix multiplication properties; so there’s a lot more to the theory we have developed above than just this formula. However, we shall limit ourselves to only the applications of this formula to get to some interesting results.
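For concreteness, here’s the formula at work on the same small poset (a sketch reusing `n`, `leq`, and `mu` from the previous snippets; the values of `f` are arbitrary):

```python
f = [5, 7, 11, 13]   # arbitrary values of f on a1..a4

# g(x) = sum of f(y) over y <= x
g = [sum(f[y] for y in range(n) if leq[y][x]) for x in range(n)]

# Mobius inversion: f(x) = sum of mu(y, x) * g(y) over y <= x
recovered = [sum(mu[y][x] * g[y] for y in range(n) if leq[y][x])
             for x in range(n)]
assert recovered == f
```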
Some applications
The generalized principle of inclusion-exclusion
Let’s first talk about the contexts in which we use generalized inclusion-exclusion. We usually want to compute some function \(f\) of a finite set \(A\), but it is much easier to compute the sum of \(f\) over all subsets of \(A\).
That is, it is easy to compute \(g(A) = \sum_{B \subseteq A} f(B)\), but hard to compute \(f(A)\) itself.
The generalized principle of inclusion-exclusion states that \(f(A) = \sum_{B \subseteq A} (-1)^{|A| - |B|} g(B)\).
For a proof, let’s look at the poset corresponding to the power set of a finite set \(F\) such that \(A \subseteq F\), equipped with the \(\subseteq\) relation. Clearly, using the Möbius inversion formula, we have \(f(A) = \sum_{B \subseteq A} \mu(B, A) g(B)\).
We claim that \(\mu(B, A) = (-1)^{|A| - |B|}\) for \(B \subseteq A\). We’ll do this by fixing \(B\) and doing an induction on \(|A|\).
For \(|A| = |B|\), this is clear. Now suppose this is true for all \(|A| < k\). Consider the case when \(|A| = k\). We have (from the definition of the Möbius function) that
\[\mu(B, A) = -\sum_{B \subseteq C \subsetneq A} (-1)^{|C| - |B|} = -\sum_{\emptyset \subseteq C \setminus B \subsetneq A \setminus B} (-1)^{|C \setminus B|} = (-1)^{|A| - |B|}\]
where the last equality follows from the binomial theorem (the alternating sum \(\sum_{D \subseteq E} (-1)^{|D|}\) vanishes for any non-empty set \(E\), and here \(E = A \setminus B\) is non-empty since \(|A| = k > |B|\)), and the proof is complete.
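Here’s a small sketch of this formula on the power set of a three-element set, with subsets encoded as bitmasks (bit \(i\) set iff \(i\) is in the subset):

```python
n_bits = 3
f = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary values of f, indexed by subset bitmask
g = [sum(f[B] for B in range(1 << n_bits) if B & A == B)   # g(A) = sum_{B subseteq A} f(B)
     for A in range(1 << n_bits)]

for A in range(1 << n_bits):
    fA = sum((-1) ** (bin(A).count("1") - bin(B).count("1")) * g[B]
             for B in range(1 << n_bits) if B & A == B)
    assert fA == f[A]          # f(A) = sum_{B subseteq A} (-1)^{|A| - |B|} g(B)
```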
The usual principle of inclusion-exclusion as a special case
Now let’s see how the usual inclusion-exclusion principle is just a special case of this generalized formula.
The inclusion-exclusion principle states that for sets \(A_1, A_2, \ldots, A_n\), we have
\[|A_1 \cup \cdots \cup A_n| = \sum_{1 \le i \le n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \cdots + (-1)^{n-1}|A_1 \cap \cdots \cap A_n|\]
For a proof, consider the poset corresponding to the power set of \([n] := \{1, \ldots, n\}\), equipped with \(\supseteq\).
Also define \(U = A_1 \cup \ldots \cup A_n\).
For a set \(S \subseteq [n]\), define
\(f(S) =\) the number of elements that lie in \(A_i\) for every \(i \in S\), but in no other \(A_j\).
\(g(S) =\) the number of elements that lie in \(A_i\) for every \(i \in S\), and possibly in other \(A_j\) as well.
Then we have \(g(S) = |\cap_{i \in S} A_i|\). Also, we have \(g(S) = \sum_{T \supseteq S} f(T)\). Using the Möbius inversion formula, we get
\[0 = f(\emptyset) = \sum_{A \supseteq \emptyset} \mu(A, \emptyset) g(A) = \sum_{A \in 2^{[n]}} (-1)^{|A|} |\cap_{i \in A} A_i|\]
Since the term for \(A = \emptyset\) is \(g(\emptyset) = |U|\), moving all the other terms to the right-hand side gives exactly the inclusion-exclusion formula, which completes the proof.
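A quick numeric sanity check of the formula on three small concrete sets:

```python
from itertools import combinations

sets = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
union = set().union(*sets)   # {1, 2, 3, 4, 5}

total = 0
for k in range(1, len(sets) + 1):
    for combo in combinations(sets, k):
        total += (-1) ** (k - 1) * len(set.intersection(*combo))
assert total == len(union)
```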
Subset sum convolution
I won’t be explaining the ideas behind subset sum convolution in too much detail, but will refer you to this post, which covers the necessary material.
The poset in consideration is the power set of \([n]\), and its elements are in bijection with \(n\)-bit integers where the \(i^\mathrm{th}\) bit is \(1\) iff \(i\) is in the set.
The transforms \(z\) and \(\mu\) from that post are the \(\zeta\) and \(\mu\) functions here. (Note that a two-variable function in the incidence algebra is just a map that transforms a one-variable function into another one-variable function, namely via the multiplication of the matrix corresponding to the two-variable function with the vector corresponding to the one-variable function.) The transform \(\sigma\) corresponds to a diagonal matrix whose diagonal entry is \(1\) if the corresponding element of the poset has an even number of elements, and \(-1\) otherwise.
The rest of that post can conveniently be studied in terms of the incidence algebra for this poset.
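For reference, here’s a sketch of the \(\zeta\) and \(\mu\) transforms on the subset lattice as they usually appear in code: the standard in-place “sum over subsets” DP, one bit at a time. As expected, the two transforms are inverses of each other.

```python
def zeta_transform(a, n_bits):
    """b[mask] = sum of a[sub] over all sub that are subsets of mask."""
    a = a[:]
    for i in range(n_bits):
        for mask in range(1 << n_bits):
            if mask >> i & 1:
                a[mask] += a[mask ^ (1 << i)]
    return a

def mobius_transform(a, n_bits):
    """The inverse of zeta_transform."""
    a = a[:]
    for i in range(n_bits):
        for mask in range(1 << n_bits):
            if mask >> i & 1:
                a[mask] -= a[mask ^ (1 << i)]
    return a

a = [3, 1, 4, 1, 5, 9, 2, 6]
assert mobius_transform(zeta_transform(a, 3), 3) == a
```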
The number-theoretic Möbius function
For this, we will need to cover a bit more ground, since the divisibility poset has nicer properties compared to other posets (it is what is called a lattice).
What is a lattice
A lattice is a special kind of a poset, where for each pair of elements \((x, y)\), there exist elements \(x \land y\) (called the meet of \(x, y\)) and \(x \lor y\) (called the join of \(x, y\)) such that
- \(z \le x\) and \(z \le y\) \(\implies\) \(z \le x \land y\)
- \(x \land y \le x\) and \(x \land y \le y\)
- \(x \le z\) and \(y \le z\) \(\implies\) \(x \lor y \le z\)
- \(x \le x \lor y\) and \(y \le x \lor y\)
Intuitively, this means that in the Hasse diagram (with edges oriented from smaller to larger elements), every pair of elements has a greatest common “ancestor” (the meet) and a least common “descendant” (the join). For instance, the power set poset and the divisibility poset are both lattices (exercise: what are the meet and the join functions for these lattices?).
Note that inductively, every finite subset of a lattice has a meet and a join. Also, due to antisymmetry of \(\le\), the meet and the join are unique for each pair \(x, y\). Hence for finite lattices \(L\), there is a unique “maximum” (say \(\top_L\)) and a unique “minimum” (say \(\bot_L\)).
A theorem for lattices
Theorem: For a lattice \((L, \le)\), and \(\bot_L < a \in L\), \(\sum_{x \lor a = \top_L} \mu(\bot_L, x) = 0\).
Essentially, this theorem states that the sum of \(\mu(\bot_L, x)\) over all \(x\) whose join with \(a\) is \(\top_L\) equals \(0\).
For a proof, we first note that by the second Möbius identity (expanding the product \(\zeta\mu = I\)), the inner sum \(\sum_{x \lor a \le y} \mu(y, \top_L)\) below is \(1\) if \(x \lor a = \top_L\) and \(0\) otherwise, so we have:
\[\sum_{x \lor a = \top_L} \mu(\bot_L, x) = \sum_{x} \mu(\bot_L, x) \sum_{x \lor a \le y} \mu(y, \top_L)\]
Using the definition of \(\lor\) (the condition \(x \lor a \le y\) is equivalent to \(x \le y\) and \(a \le y\)) and rearranging the sums reduces the sum to \[\sum_{a \le y} \mu(y, \top_L) \sum_{x \le y} \mu(\bot_L, x)\]
By the first Möbius identity, the inner sum is \(1\) if \(y = \bot_L\) and \(0\) otherwise; but \(a \le y\) and \(a > \bot_L\) force \(y \ne \bot_L\), so the whole expression reduces to \(0\), as needed.
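Here’s a minimal sketch verifying the theorem on the divisor lattice of \(12\), where the meet is the GCD and the join is the LCM:

```python
from math import gcd

N = 12
L = [d for d in range(1, N + 1) if N % d == 0]   # divisors: bottom 1, top 12

def lcm(a, b):
    return a * b // gcd(a, b)

# mu(1, y) from the inductive definition, over the divisor lattice
mu = {1: 1}
for y in L[1:]:
    mu[y] = -sum(mu[z] for z in L if y % z == 0 and z != y)

# the theorem: for every a > bottom, summing mu(1, x) over all x
# whose join with a is the top element gives 0
for a in L:
    if a > 1:
        assert sum(mu[x] for x in L if lcm(x, a) == N) == 0
```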
Now let’s try to compute \(\mu(a, b)\) for \(a, b\) in the divisibility poset. Note that if \(a\) doesn’t divide \(b\), then \(a \not\le b\) in this poset, so \(\mu(a, b)\) is just \(0\). So it suffices to consider the case where \(b = ax\) for some natural number \(x\).
Let’s look at all positive integers \(r\) such that \(1 \mid r \mid x\). This forms a lattice \(L\). Similarly, the set of positive integers \(r'\) such that \(a \mid r' \mid ax\) also forms a lattice, which is isomorphic to \(L\). Also, by the inductive definition, \(\mu(a, ar)\) depends only on the elements \(s\) such that \(a \mid s \mid ar\), so we must have \(\mu(a, ar) = \mu(1, r)\). So it suffices to compute \(\mu(1, x)\) for all \(x\).
We claim that \(\mu(1, x)\) is \((-1)^k\) if \(x\) is square-free with \(k\) distinct prime factors, and \(0\) otherwise. To show this, we induct on \(x\).
Note that \(1 = \bot_L\) and \(x = \top_L\). The claim is trivial for \(x = 1\). Now suppose \(p\) is a prime that divides \(x\). Using the previously proved theorem, we get \(\mu(1, x) = -\sum_{y \ne x, y \lor p = x} \mu(1, y)\).
If \(p^2\) divides \(x\), then the sum on the right is empty (as \(\lor\) is just the LCM, any \(y\) with \(y \lor p = x\) must contain the full power of \(p\) dividing \(x\), forcing \(y = x\)). Otherwise, the sum contains only one \(y\), namely \(x / p\), so using the inductive hypothesis, we are done.
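In code, this gives the familiar sieve for the number-theoretic Möbius function. The sketch below computes `mu[n]` \(= \mu(1, n)\) by enforcing the identity \(\sum_{d \mid n} \mu(d) = [n = 1]\) (the expansion of \(\mu\zeta = I\) at \((1, n)\)), and checks it against the closed form we just proved:

```python
LIMIT = 1000
mu = [0] * (LIMIT + 1)
mu[1] = 1
for i in range(1, LIMIT + 1):
    # mu[i] is final here: only proper divisors of i (all < i) modify it
    for j in range(2 * i, LIMIT + 1, i):
        mu[j] -= mu[i]

def mu_closed_form(n):
    """(-1)^k if n is a product of k distinct primes, else 0."""
    k, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0   # p^2 divides the original n
            k += 1
        p += 1
    if n > 1:
        k += 1
    return (-1) ** k

assert all(mu[n] == mu_closed_form(n) for n in range(1, LIMIT + 1))
```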
As a side-note, for “reasonable” functions that only depend on the ratio between their parameters, we can “compress” them into a function that takes a single parameter. The “product” in that case becomes Dirichlet convolution, which has tons of interesting properties. A relevant post for that can be found here.
Also note that if we restrict ourselves to the divisibility posets of square-free numbers \(n\), they are isomorphic to the power set poset of the set of prime divisors of \(n\), so all power set posets are just special cases of divisibility posets!
Finite Difference Calculus
In this case, the poset is particularly simple: it is \((\mathbb{N} \cup \{0\}, \le)\). The function \(\mu\) takes the value \(1\) at \((n, n)\), \(-1\) at \((n, n + 1)\), and \(0\) everywhere else. “Convolution” with \(\mu\) is thus the backward difference operator, and, by Möbius inversion, “convolution” with \(\zeta\) is the prefix sum operator.
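A tiny sketch of this, with arbitrary values:

```python
f = [2, 7, 1, 8, 2, 8]   # arbitrary values of f on 0, 1, ..., 5

# zeta: g(x) = sum of f(y) over y <= x, i.e., a prefix sum
g = [sum(f[: x + 1]) for x in range(len(f))]

# mu: f(x) = g(x) - g(x - 1), i.e., the backward difference
recovered = [g[x] - (g[x - 1] if x else 0) for x in range(len(g))]
assert recovered == f
```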
Final notes and remarks
Firstly, the content of this post might not be super-useful in competitive programming, but the intent was to introduce a very standard combinatorial technique that is far more flexible than people tend to realize.
Secondly, it might not be easy to understand on a first reading, so you’re encouraged to work through the math as you read, and understand why and how these things are tied together. It will also help to draw Hasse diagrams of some posets, and to look at concrete examples of the operations described here.
Finally, there might be errors and typos in the post, as well as better ways to approach the content of the post. If you find any, please let me know in the comments!