*This post was originally written on Codeforces, relevant discussion can be found here.*

This post is intended to be an introduction to permutations for beginners, as there seem to be no resources other than cryptic (for beginners) editorials that talk about some cycles/composition, and people unfamiliar with permutations are left wondering what those things are.

We’ll break the post into three major parts based on how we most commonly look at permutations. Of course, even though we will treat these approaches separately (though not so much for the last two approaches), in many problems, you might need to use multiple approaches at the same time and maybe even something completely new. Usually, you’d use these ideas in conjunction with something like greedy/DP after you realize the underlying setup.

An ordering of the sections in decreasing order of importance according to the current permutations meta would be: cycle decomposition, permutation composition, ordering.

However, the topics are introduced in a way so that things make more sense, because there are some dependencies in definitions.

## Table of contents

- Definition, basic properties, and notation
- Definition
- Two-line notation
- Permutation invariants

- The ordering perspective
- Fixed points
- Derangements
- Inversions
- Longest increasing subsequences, and Erdos Szekeres theorem
- Next permutation
- Random permutations

- The cycle decomposition perspective
- Cycles
- Cycle notation
- Finding \(k\)-th next, functional graphs
- Swaps/transpositions
- Swaps on cycles

- The permutation composition perspective
- Definition
- Examples — cycles and swaps
- Parity of permutations
- Parity under composition
- Inverse of permutations
- Involutions
- \(k\)-th power of permutations
- Order of permutations
- Square-root of permutations
- Conjugation and conjugacy classes

- Miscellaneous topics (brief discussion)
- Group theory
- Counting: Burnside, Polya enumeration, Stirling numbers
- Coordinate compression
- Connected component DP
- Young tableaus
- Schreier Sims algorithm
- Permutation tree

- Problems

## Definition, basic properties, and notation

To start with, here’s the definition of a permutation that you might find in many problems on Codeforces:

**Definition:** A permutation of size \(n\) is an array of size \(n\)
where each integer from \(1\) to \(n\) appears exactly once.

But why do we care about these arrays/sequences? The reason behind this is simple.

Consider any sequence \(a\) of \(n\) integers. We can index them with integers from \(1\) to \(n\). So the \(i\)-th integer in the sequence is \(a_i\). To understand the following, it helps to consider the example where \(a_i = i\) for all \(i\).

Let’s now shuffle this sequence (elements are allowed to stay at their original places). We will try to construct a permutation from this shuffling in two ways:

- What is the new index of the element that was initially at position \(i\)? (In the case when \(a_i = i\), this just means finding the position of \(i\) in the new array). Let’s say this index was \(f(i)\). Then \(f\) is a function from \(\{1, 2, \dots, n\}\) to itself. Now notice that when we write out the array \([f(1), f(2), \dots, f(n)]\), this satisfies the definition of a permutation! And we can easily construct a shuffling operation corresponding to every permutation just by reversing this process.
- What was the old index of the element that is now at position \(i\)? (In the case when \(a_i = i\), this just means looking at position \(i\) in the new array and reporting the number there). Let’s say this index is \(g(i)\). Then \(g\) is again a function from \(\{1, 2, \dots, n\}\) to itself. When we write out the array \([g(1), g(2), \dots, g(n)]\), this is again a permutation! Note that \(f\) and \(g\) are not defined the same way, and the generated permutations are, in fact, distinct. In fact, it should not be hard to see that \(f(g(x)) = g(f(x)) = x\) for any \(x\) in \(\{1, 2, \dots, n\}\).

This means shuffling around elements (or **permuting** them, in standard
terminology) is another way to think about permutations.

This shows us how permutations are related to functions from
\(\{1, 2, \dots, n\}\) to itself such that all images are distinct
(and all elements have a unique pre-image). In other words, a
permutation can be thought of simply as a bijection on a set, which is
the most natural definition to use for quite a few applications. The
second interpretation is what corresponds to this definition, and the
thing in the first definition is what we sometimes call the **position
array** of the permutation in the second definition (this name is not
standard). We will later see that it is what is called the **inverse** of
the permutation.

Note that the above explanation is quite important for having an intuition on what a permutation is. If you don’t understand some parts, I recommend re-reading this and trying a few simple examples until you’re comfortable with what the permutation and functions \(f\) and \(g\) have to do with each other.

#### Two-line notation

Note that a permutation \([a_1, a_2, \dots, a_n]\) of \([1, 2, \dots, n]\) corresponds to a function \(f\) on \(\{1, 2, \dots, n\}\) defined by \(f(i) = a_i\). Implicitly speaking, the set of pairs \((i, a_i)\) uniquely determines the permutation (it is just the set of mappings from position to element) and vice versa. To understand compositions and inverses (which will be introduced later on) better, the following notation can come in handy:

Here we have just arranged the pairs vertically, from left to right. Note that these pairs can be shuffled around (like dominoes), and this can lead to some nice properties:

\[\begin{pmatrix} 1 & 2 & \cdots & n \\\ a_1 & a_2 & \cdots & a_n \end{pmatrix}\]

- What happens when we invert the pairs (that is, replace \((i, a_i)\) by \((a_i, i)\) for all \(i\), which corresponds to swapping the two rows)? If the shown permutation corresponds to the function \(g\), the permutation after inverting the pairs corresponds to the function \(f\)! This should be clear by looking at how \(f\) and \(g\) were defined earlier. So this sort of operation leads to a function inverse.
- What happens if we try to combine two permutations by trying to match the lower half of the first permutation \(p_1\) with the upper half of the second permutation \(p_2\)? If the \(g\) for \(p_1\) is \(g_1\), and that for \(p_2\) is \(g_2\), then it can be verified that the new permutation’s \(g\) is just the function formed by applying \(g_1\), then \(g_2\). So this sort of operation leads to function composition.

#### Permutation invariants

Note that when a permutation is sorted, it leads to \([1, 2, \dots, n]\). So any accumulation operation on the permutation array (like sum, product, xor, sum of squares, number of odd integers, etc.) that doesn’t rely on the order of elements in an array will give the same value for all permutations of the same length.

## The ordering perspective

#### Fixed points

A fixed point of a permutation \(a\) is an index \(i\) such that \(a_i = i\). These are essentially the values that are not affected by the permutation at all, so if we disregard what happens to these values, we won’t be losing out on any information for that permutation \(a\). This is also why these \(i\)-s are sometimes skipped in the two-line notation (and also in the cycle notation, which will be introduced in the next section).

**Note:** If you are familiar with cycles or are on your second reading,
you might note that this idea is probably more appropriate for cycles
and compositions, but I kept it in this section since it shares some
ideas with other subsections in this section. The same goes for the next
section.

#### Derangements

A derangement is a permutation with no fixed points. That is, for every \(i\), we have \(a_i \ne i\). One useful thing to know is how many of the \(n!\) permutations of size \(n\) are derangements. Let’s say this number is \(D_n\). There are at least three distinct ways to count them:

Solving linear equations

In this approach, we group all possible permutations by the number of their fixed points. If there are \(i\) fixed points, then upon relabeling the remaining \(n - i\) elements from \(1\) to \(n - i\), the resulting permutation is a derangement. So, the number of permutations on \(n\) elements with \(i\) fixed points is exactly \(\binom{n}{i} \cdot D_{n - i}\).

Summing this over all \(i\) gives us the identity \(n! = \sum_{i = 0}^n \binom{n}{i} \cdot D_{n - i}\).

Along with the fact that \(D_0 = 1\) and \(D_1 = 0\), we can use induction to show that \(D_n = n! \cdot \sum_{i = 0}^n \frac{(-1)^i}{i!}\). We can also use the following identity to come up with the same result:

## Special case of Mobius inversion

If there are functions \(f\) and \(g\) from \(\mathbb{Z}_{\ge 0}\) to \(\mathbb{R}\), then the following two statements are equivalent:

- \(g(n) = \sum_{i = 0}^n (-1)^i \binom{n}{i} f(i)\)
- \(f(n) = \sum_{i = 0}^n (-1)^i \binom{n}{i} g(i)\)

Recursion

Let’s count the number of derangements using a recursion. Suppose \(a_n = k \ne n\). We look at \(a_k\). If \(a_k\) is \(n\), then there are \(D_{n - 2}\) ways of assigning the remaining numbers to the permutation. If \(a_k\) is not \(n\), then we note that this reduces to the number of derangements on \(n - 1\) numbers, by renaming \(n\) (to be assigned in the remaining \(n - 2\) slots) to \(k\).

So the recurrence becomes \(D_n = (n - 1) \cdot (D_{n - 1} + D_{n - 2})\), and this can be solved using induction too.

Using the principle of inclusion and exclusion

We can use an inclusion-exclusion argument to solve this problem too. Let \(S_i\) be the set of permutations that leave the \(i\)-th element fixed (the remaining are free to be permuted).

Then we need \(n! - |\cup_i S_i|\). Using the inclusion-exclusion principle, since the intersection of any \(k\) different \(S_i\)’s has size \((n - k)!\), we again get the same expression for \(D_n\).

#### Inversions

An inversion in an array (not necessarily a permutation) is any pair of indices \((i, j)\) such that \(i < j\) and \(a_i > a_j\). What is the minimum number of inversions? It is 0, because you can just enforce \(a_i = i\). What is the maximum number of inversions? It is \(n(n - 1)/2\) because you can just enforce \(a_i = n - i + 1\), and then every unordered pair of distinct indices corresponds to an inversion.

Let’s consider the following task: For a given permutation (or an array with distinct elements), you are allowed to do the following operation any number of times:

Pick any \(1 \le i < n\) and swap \(a_i\) and \(a_{i + 1}\).

Is it possible to sort this array using these operations only?

If yes, what is the minimum number of operations if you want to sort this array using these operations only?

Note that if it is possible for a permutation, it is also possible for any array with distinct elements (we can just replace the elements in the array by their ranks when sorting in increasing order).

The answer to the first part is yes. The main idea is to pick the smallest element and keep swapping it with the previous element until it reaches the first position. Then do the same thing recursively for \([2, \dots, n]\).

Let’s think about the number of operations here. How many operations do we do in the first phase? Note that everything coming before \(1\) in the original permutation contributed to an inversion where the second element was \(1\). Now after \(1\) is in its intended place, there will be no more inversions that are related to \(1\) in any way. For the rest of the permutation, we can do this argument recursively. But for that, we need to see what happens to inversions where \(1\) was not involved. Note that the relative order of no other pair changes, so the other types of inversions remain the same! So, the total number of operations is clearly the number of inversions in the permutation.

Is this the minimum number of operations you need to sort the array? The answer turns out to be yes. Consider any operation that is done. Since it flips the order of two adjacent things, there is at most one inversion it can reduce. So the decrease in the number of inversions is at most \(1\). In a sorted array, there are no inversions. So the number of operations must be at least the number of inversions in the original permutation, and thus we are done.

Now you might think — how do we actually count the number of inversions in a permutation? The answer is you can do that using merge sort or by some point-update-range-sum data structure like Fenwick tree or segment tree. The merge sort solution can be found here, and for the data structure solution, consider the following:

- Initially, you have an array \(A\) of size \(n\) filled with \(0\) (on which the data structure will be constructed).
- For each \(i\) starting from \(1\) to \(n\), find the sum of the prefix of \(A\) of size \(a_i - 1\) (that is, sum of \(A_j\) for \(0 \le j < a_i\)), and add it to the answer. Increment \(A_i\) by \(1\).

#### Longest increasing subsequences and Erdos Szekeres theorem

An increasing subsequence of an array \(a\) (not necessarily a permutation) is a sequence of indices \(i_1 < i_2 < \dots < i_k\) such that \(a_{i_1} < a_{i_2} < \dots < a_{i_k}\). Decreasing subsequences are defined similarly.

An algorithm to find a longest increasing (or, with a few modifications, non-decreasing) subsequence of an array can be found in this nice video tutorial. But that is not what we are concerned with at the moment.

We are concerned with bounds on the length of any longest increasing subsequence (LIS from here on). However, for decreasing subsequences, an LIS has length \(1\). The Erdos Szekeres theorem tells us that in such cases, the length of the longest decreasing subsequence will be large.

More formally, the theorem states that in any permutation (or array with distinct elements) of size at least \(xy + 1\), there is either an increasing subsequence of length \(x + 1\) or a decreasing subsequence of length \(y + 1\).

The easiest way to prove this theorem is via an application of the Pigeonhole principle.

Suppose for the sake of contradiction that the theorem is false for some permutation \(a\). For every \(i\), consider the length of a longest increasing subsequence ending at index \(i\) and the length of a longest decreasing subsequence ending at index \(i\). Let’s call these numbers \(x_i\) and \(y_i\). Note that all \(x_i\)-s are integers between \(1\) and \(x\), and all \(y_i\)-s are integers between \(1\) and \(y\). So there can be at most \(xy\) distinct pairs \((x_i, y_i)\). By the Pigeonhole principle, there are \(i < j\) such that \((x_i, y_i) = (x_j, y_j)\). Since all elements are distinct, \(a_i < a_j\) or \(a_i > a_j\). In the first case, it is impossible that \(x_i = x_j\), and in the latter, it is impossible that \(y_i = y_j\). This is a contradiction, and we are done.

A more sophisticated and deeper proof of this theorem can be done using Dilworth’s theorem. This blog uses it to prove a special case of this theorem, though the proof can be modified easily to work for the complete proof too.

#### Next permutation

Note that permutations are just sequences of integers, so it is possible to sort the set of all possible sequences of size \(n\) lexicographically (i.e., in the order you would find words in a dictionary). This defines a natural indexing of each permutation. How do we find the next permutation from a given permutation?

The easiest way (in C++) is to use `std::next_permutation`

, but we’ll
briefly describe how it works.

Let’s find the first index \(i\) from the right so that \(a_i > a_{i - 1}\). Since \(i\) was the first index satisfying this condition, all indices \(i\) to \(n\) must form a decreasing sequence. Note that the smallest number in this sequence that is larger than \(a_i\) will be the new element at position \(i\), and the rest of them (along with the current \(a_i\)) will be sorted in increasing order after it. All of this can be implemented in time \(O(n - i + 1)\).

Note that starting from the lexicographically smallest permutation (which is \([1, 2, \dots, n]\)), the number of permutations between (both inclusive) this permutation and a permutation whose first position \(i\) such that \(a_i \ne i\) is \(k\), is at least \((n - k)! + 1\). This means that if you apply next_permutation even a large number of times, the number of elements in the permutation that will ever change will not be large (unless you start from a specific permutation, and even in that case, apart from a single change for potentially all indices, the same conclusion holds).

So if for a permutation of length \(n\), you do the next_permutation operation (as implemented above) \(O(n^k)\) times, the time taken will be (much faster than) \(O(k n^k \log n)\). You can even bound the number of operations to perform next_permutation \(r\) times by \(O(r + n)\). The analysis is similar to the complexity analysis when you increment the binary representation of an \(n\)-bit number \(r\) times.

#### Random permutations

There are a total of \(n!\) permutations of size \(n\). How do we construct a random permutation if all we have is a random number generator that can give us integers in a range we can choose?

For thinking about how we can incrementally construct such a permutation, let’s try to remove all elements \(> k\) for some \(k\). Note that the position of \(k\) in this array is equally likely to be any integer from \(1\) to \(k\). However, this won’t lead to a very efficient algorithm right away.

We would want to rather construct a random permutation from left to right. Suppose we have a prefix chosen already. Note that the permutations of all elements on the right are equally likely. Also, all elements on the right are equally likely to be the first element of the suffix (i.e., the last element of the just-larger prefix). This observation leads to Durstenfeld’s variation of the Fisher-Yates shuffle.

Now let’s think about some statistics of random permutations.

Firstly, what is the expected length of the greedily picked increasing sequence by this algorithm:

- If there is nothing in the sequence or the new element is greater than the last picked element, pick this element.

Note that an element is picked if and only if it is more than everything before it. That is, we can do the following using linearity of expectation:

\(E[l] = E[\sum_i(a_i \text{ chosen})] = \sum_i P[a_i \text{ chosen}] = \sum_i P[a_i > \max(a_1, \dots, a_{i - 1})] = \sum_i \frac{1}{i}\), which is the harmonic sum, and is approximately \(\ln n\).

However, it can be shown that the expected length of the LIS is \(\Theta(\sqrt{n})\), so a greedy approach is much worse than computing the LIS systematically.

For more random permutation statistics that also have some information on statistics derived from the next few sections, I would refer you to this.

## The cycle decomposition perspective

We will now switch gears and note a very important part of the theory of permutations — the cycle decomposition. You will find yourself referring to this quite a lot while solving problems on permutations.

#### Cycles

Suppose we have a permutation \(a\). Let’s fix some \(i\). Let’s try to look at the sequence \(i\), \(a_i\), \(a_{a_i}\) and so on. Eventually, it must repeat because there are at most \(n\) values that this sequence can take. For the sake of convenience, let’s call the \(j\)-th element of this sequence \(b_j\).

Then for some \(k < l\), we have \(b_k = b_l\). Let’s take \(k\) to be the smallest such index, and let \(l\) be the smallest such index for that corresponding \(k\). If \(k\) is not \(1\), then we must have \(b_{k - 1} = b_{l - 1}\), because \(a\) is a bijection. However, this contradicts the minimality of \(k\)! This means that the first element that repeats in the sequence is, in fact, \(i\).

Now suppose something in the sequence between \(1\) and \(l\) repeats
(let’s say at positions \(1 < m < o < l\)). Then by repeating the
bijection argument \(m - 1\) times, we must have
\(b_{o - m + 1} = b_1 = i\), so \(o - m + 1 \ge l\). But this is a
contradiction. This means that the sequence repeats starting from \(l\)
and forms a **cycle**.

We call this a cycle because if we made the graph where there was a directed edge from \(i\) to \(a_i\), then this \(i\) would be in a cycle of length \(l - 1\).

Now, this was done for a single \(i\). Doing this for all \(i\) means that all elements are in a cycle in this graph. Clearly, since the indegree and outdegree of each \(i\) are \(1\), this means that these cycles are all disjoint.

Let’s take an example at this point. Consider the permutation \([2, 3, 1, 5, 4]\). The cycles for each element are as follows:

- \(1 \to 2 \to 3 \to 1\)
- \(2 \to 3 \to 1 \to 2\)
- \(3 \to 1 \to 2 \to 3\)
- \(4 \to 5 \to 4\)
- \(5 \to 4 \to 5\)

These correspond to the cycles \(1 \to 2 \to 3 \to 1\) and \(4 \to 5 \to 4\) in the graph.

#### Cycle notation

Note that these cycles are independent. That is, we can just say that for these cycles, any element outside the cycle is irrelevant. So in this sense, for any permutation, we can just list out its cycles and we will be able to determine the permutation uniquely.

So we just represent a permutation as a list of its cycles. For example, the permutation \([2, 3, 1, 5, 4]\) can be represented as \((1 2 3)(4 5)\).

Note: there is a deeper meaning behind this notation that is related to composition and how disjoint cycles commute for composition.

#### Finding \(k\)-th next, functional graphs

Consider the following task: for a permutation, compute the value of \(a_{a_{\ddots_{i}}}\) for each \(i\) where there are \(k\) \(a\)-s, \(k \le 10^{18}\).

Note that if you can find cycles, you can do a cyclic shift for each cycle (appropriately computed modulo the cycle length) and give the answer.

What if it was not guaranteed that \(a\) is a permutation, but all elements in \(a\) are in \([1, n]\) nevertheless? The cycle argument breaks down here, but you can show the following fact:

- Each weakly connected component consists of two parts: a directed cycle and directed trees rooted at distinct vertices in the cycle, directed towards the root (also called reverse-oriented arborescences).

The same problem can be solved in this setting, too, by using binary lifting for each vertex till it reaches the “main” cycle of its component.

In the language of graph theory, such graphs are called functional graphs (which have outdegree of each vertex = 1).

Some problems to practice on are Planet Queries (I and II) on CSES.

#### Swaps/transpositions

Let’s start from the array \([1, 2, \dots, n]\) and try to apply swaps on this array till we reach some desired permutation \(a = [a_1, a_2, \dots, a_n]\). Note that we can always apply such swaps, by trying to build the permutation from left to right. These swaps are also called transpositions.

#### Swaps on cycles

Let’s now think about what happens when we apply a swap to a permutation. Suppose the swap swaps elements at positions \(i\) and \(j\). Let’s look at the graph associated with this permutation. In this graph, swapping two elements at positions \(x\) and \(z\) is equivalent to breaking two edges \(x \to y\) and \(z \to w\) and making two edges \(x \to w\) and \(z \to y\) (in something like a crossing manner).

We make two cases:

- \(i\) and \(j\) are in different cycles: this joins the two cycles together.
- \(i\) and \(j\) are in the same cycle: this breaks the common cycle into two.

A relevant problem at this point would be 1768D.

Let’s consider another permutation sorting problem, but here rather than just adjacent elements being swapped, we allow swapping any elements. What is the minimum number of swaps to sort a permutation this way?

Note that in the final result, there are exactly \(n\) cycles (singletons). Let’s say we currently have \(c\) cycles. Since the number of cycles increases by at most 1 each time, there must be at least \(n - c\) swaps to sort this permutation.

Is this achievable? Yes. We can keep swapping elements in the same cycle while there is a cycle of length at least \(2\), and this will sort the permutation.

## The permutation composition perspective

#### Definition

Suppose we have two permutations \(a\) and \(b\) of the same size \(n\). Now we want to define a composition of them as follows: \(ab\) is defined as the array \(c\) where \(c_i = a_{b_{i}}\).

This essentially corresponds to finding the composition of the \(g\)-functions of \(a\) and \(b\), with \(g\) for \(b\) being applied first.

Let’s get an intuitive sense of what it does, by getting some information out of the definition. Let’s first deal with the case when \(b\) is the “identity” permutation: \(b_i = i\). Then \(c = a\) according to this definition. Similarly, if \(a\) is the identity permutation, then \(c = b\). So, composing with the identity permutation in any order gives the original permutation back.

Now let’s understand this by using the two-line notation. The idea is to take the two-line notation for both permutations and do some sort of pattern matching to get to their composition.

Consider the following permutations \(a\) and \(b\).

\[\begin{pmatrix} 1 & 2 & \cdots & n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}\]

and

\[\begin{pmatrix} 1 & 2 & \cdots & n \\ b_1 & b_2 & \cdots & b_n \end{pmatrix}\]

Let’s reorder the columns of the two-line notation for \(a\) so that the lower row of \(b\) and the upper row of \(a\) match. However, this corresponds to shuffling the columns of \(a\) according to \(b\):

\[\begin{pmatrix} b_1 & b_2 & \cdots & b_n \\ a_{b_1} & a_{b_2} & \cdots & a_{b_n} \end{pmatrix}\]

Note that this is still a valid two-line notation for \(a\). Now, if we “merge” this with the two-line notation for \(b\), this gives us the two-line notation for the composition of \(a\) and \(b\).

Note how this idea of composition arises naturally from the interpretation of a permutation as a bijection on a set.

We start with the identity permutation, apply the first permutation to it, and then apply the second permutation to it. Applying means replacing \(x \mapsto a_x\). The two-line notation for a permutation \(a\) has the following property:

- If the permutation in the first row is \(p\), then the permutation in the second row is \(ap\).

#### Examples — cycles and swaps

Note that we can associate a cycle itself with a permutation, where the non-cycle elements are fixed points, and the remaining are mapped according to the cycle. In this sense, cycles are permutations.

Also note that a swap itself is a permutation in the same manner. In fact, a swap is just a cycle of length 2.

It is helpful to play around with a few examples of swaps and cycles and write them in two-line notation to understand their mappings and try composing them.

At this point, you should also try to recall the following two things:

When we introduced cycles, we wrote them in a certain format. In fact, you can compose disjoint cycles without worrying about the order in which they are composed. This should be obvious from the fact that they are independent in a way (an element that gets moved in one permutation is a fixed point of the other). You can also see that the notation \((1 2 3)(4 5)\) also conveniently captures that this permutation is a composition of the cycles \((1 2 3)\) and \((4 5)\).

When we discussed swaps, we noted that you could “apply” swaps. This corresponds to left-multiplying the permutation with the corresponding swap permutation. It is instructive to come up with a few examples of swaps and how they compose at this point.

It is important to note that (by using the fact that function composition is associative) permutation composition is associative.

Let’s say we have a permutation where we came to by applying the following three permutations: \((12)\), \((312)(43)\) and \((53241)\) in this order, we would write it as \((53241)(312)(43)(12)\). Note that here \((12)\) and \((43)\) are cycles of length \(2\), i.e., they are swaps.

Now how do we reduce this back to a normal permutation? One way would be to write out the permutation for each cycle and then keep composing it. A cleaner way of doing the same thing would be to do this: for every element, start from the right, and replace it with what the cycle maps it to. For instance, if we want to find the image of \(1\), the first cycle maps \(1\) to \(2\), the second cycle maps \(2\) to \(2\), the third cycle maps \(2\) to \(3\), and the fourth cycle maps \(3\) to \(2\).

#### Parity of permutations

Recall that we mentioned earlier that an adjacent swap reduces the number of inversions by at most \(1\). Well, to be more precise, it changes the number of inversions by \(\pm 1\), and hence flips the parity of the number of inversions of the permutation.

We define the **parity** of the permutation to be the parity of the number
of inversions of the permutation.

The reason why this is important is that it gives a handle on the parity of swaps (not just adjacent swaps) as well as some information about the cycle parities, as follows.

Let’s consider a swap that swaps elements at positions \(i < j\). Then a way to get to this would be to apply \(j - i\) adjacent swaps to bring \(a_j\) to position \(i\), and \(j - i - 1\) adjacent swaps to bring \(a_i\) (from its new position) to position \(j\). This takes an odd number of total adjacent swaps, so the final number of inversions is also flipped.

As a corollary, applying any swap changes the parity of the permutation, and in particular, all swap permutations have odd parity (parities add up modulo \(2\) over composition, as can be seen easily).

Now note that for a cycle, we can construct it by doing a few swaps. More specifically, if the cycle is \((i_1, i_2, \dots, i_k)\), then we can do \(k - 1\) swaps to apply the same transformation as the cycle permutation.

So all even-sized cycles have odd parity, and all odd-sized cycles have even parity.

Let’s consider any decomposition of a permutation into (possibly non-disjoint) cycles. As a result of the above discussion, the parity of the permutation is odd iff the number of even-sized cycles is odd.

In particular, for a decomposition of a permutation into swaps, the parity of the permutation is the same as the parity of the number of swaps in the decomposition. Note that this also implies that we can’t have two decompositions with an odd difference in the number of swaps.

Rephrasing the above result, we have the fact that if it takes an odd number of swaps to get to a permutation, the permutation has odd parity and vice versa.

#### Parity under composition

Clearly, from the above discussion, it can be seen that parities add modulo \(2\) over composition. This, however, is important enough to warrant its own section because it summarizes all of the discussion above.

#### Inverse of permutations

Firstly, let’s think about permutation in an application sense. Is there a way to get back the original permutation after some permutation has been applied to it? From our remarks on the two-line notation, it suffices to do this for the initial permutation being the identity permutation. Let’s say the applied permutation was \(a\):

\[\begin{pmatrix} 1 & 2 & \cdots & n \\ a_1 & a_2 & \cdots & a_n \end{pmatrix}\]

By our remarks on composition using the two-line notation, we need something that swaps these two rows as follows:

\[\begin{pmatrix} a_1 & a_2 & \cdots & a_n\\ 1 & 2 & \cdots & n \end{pmatrix}\]

Note that this is easy: consider the position array \(p\) of \(a\) as we defined earlier; i.e., \(p_i\) is the position where \(a\) equals \(i\), i.e., \(a_{p_i} = i\). Note that \(a\) is \(a_i\) at \(i\), so \(p_{a_i}\) is the position where \(a\) is \(a_i\), i.e., \(p_{a_i} = i\). This shows that \(pa = ap\), and this common value is the identity permutation (which we shall denote by \(\text{id}\) in what follows).

In other words, \(p\) and \(a\) are inverses of each other under composition. It is trivial to note that \(a\) is the position array of \(p\) as well. We denote \(p\) by \(a^{-1}\).

Now that we have some intuition about the inverse in terms of the two-line notation, we think about it in terms of cycles, swaps, and compositions:

- The inverse of a cycle \((i_1, i_2, \dots, i_k)\) is simply \((i_k, \dots, i_2, i_1)\). This is done by reversing all the edges in the related directed graph.
- The inverse of a swap is itself. This also follows from the fact that a swap is a 2-cycle.
- The inverse of a composition \(ab\) is \(b^{-1}a^{-1}\). This corresponds to undoing the permutation application on a stack and also makes algebraic sense: \(abb^{-1}a^{-1} = aa^{-1} = \text{id} = b^{-1}b = b^{-1}a^{-1}ab\).

In particular, for a decomposition of a permutation into (not necessarily disjoint) cycles, we can just take a mirror image of the decomposition, and it will correspond to the inverse of the permutation!

Also, the parity of the inverse is the same as the parity of the original permutation.

#### Involutions

An involution is a permutation whose inverse is itself. Consider the disjoint cycle decomposition of the permutation \(a = c_1 \dots c_k\). \(a = a^{-1}\) means \(\text{id} = a^2 = \prod_i c_i^2\). Note that we are able to write this because disjoint cycles commute, and so do identical cycles. \(c_i^2\) should be an identity map on the elements of \(c_i\) because else the resulting permutation won’t be \(\text{id}\). This means that \(c_i\) is either of size \(1\) or \(2\). Conversely, it can be checked that these permutations are involutions.

#### \(k\)-th power of permutations

Since permutation composition is associative, we can use binary exponentiation to compute the permutation. A more mathematical way to think about it is to use the same trick as in the previous subsection. With definitions the same as in the previous section, for any \(a\), \(a^k = \prod_i c_i^k\), so the cycle structure is determined by the powers of the cycles.

Finding the \(k\)-th power of a cycle just corresponds to mapping it to its \(k\)-th next neighbor in the cycle. If the length of the cycle is \(c\), and \(g = \gcd(c, k)\), then the disjoint cycle decomposition of \(c^k\) consists of \(g\) cycles, with each of them corresponding to a stride of \(k\) along the original cycle.

#### Order of permutations

The order of a permutation \(a\) is defined as the least \(k\) such that \(a^k\) is the identity permutation. Let’s again look at the cycle decomposition as in the previous subsection. For a cycle \(c\) of length \(l\) to be “dissolved” into singletons, it is necessary and sufficient to have the power of the permutation divisible by \(l\). Hence, the order of the permutation is just the LCM of all cycle lengths.

#### Square-root of permutations

For this section, we will consider the following problem: 612E. The solution is left as an exercise to the reader (it shouldn’t be hard, considering what we have discussed in the last few sections).

#### Conjugation and conjugacy classes

Suppose \(a, b\) are two permutations. We think about the following permutation: \(aba^{-1}\). If we think about how the two-line notation for these permutations combines, we can note that this is just the permutation whose cycle decomposition is almost the same as \(b\), but \(a\) is applied to each element in the cycles. In other words, we “reinterpret” the cycles by mapping to and from a separate “ground set” via permutation \(a\).

For a formal statement, let the cycle decomposition of \(b\) be \(c_1 \cdots c_k\), where \(c_i = (x_{i1}, \dots, x_{il_i})\). We claim that \(aba^{-1}\) is \(c’_1 \cdots c’_k\) where \(c’_i = (a_{x_{i1}}, \dots, a_{x_{il_i}})\).

The proof formalizes the earlier argument. Consider an element \(i\), and we look at what happens to \(i\) as each permutation is applied to it:

- \(aba^{-1}\): There is a \(j\) such that \(a_j = i\) since \(a\) is a permutation. So \((aba^{-1})_i = a_{b_{a^{-1}_i}} = a_{b_j}\).
- \(c’_1 \cdots c’_k\): Without loss of generality, let’s say \(i\) is the first element of \(c’_1\), i.e., \(i = a_{x_{i1}}\). Then \(j = x_{i1}\). Now it is mapped to \(a_{x_{i2}} = a_{b_{x_{i1}}} = a_{b_j}\).

Since both of them give the same result, we are done.

This says that the operation \(aba^{-1}\) is nothing but a translation of the labels of \(b\) according to \(a\).

Note that by this result, since we are just applying the permutation to everything inside the cycles, we can map any product of disjoint cycles to any other product of disjoint cycles, provided the multiset of cycle sizes doesn’t change.

Let’s call \(b\) and \(c\) to be conjugate if there is an \(a\) such that \(c = aba^{-1}\), and call this operation conjugation.

Thus this leads to a partition of all permutations into equivalence classes (after verifying a couple of axioms), where everything inside a class is conjugate to everything else in the class. Note that any equivalence class is uniquely determined by the multiset of cycle sizes. These equivalence classes are called conjugacy classes.

## Miscellaneous topics

This section will mostly be about some very brief references and is intended to tell people about some topics they might not have seen or heard about before, which are relevant to this discussion. There might be posts in the future regarding these topics, but it’s still a good idea to read up on these yourself.

#### Group theory

The set of permutations of size \(n\) forms what is called a group and is denoted by \(S_n\), and it turns out that all finite groups of size \(n\) are isomorphic to some subgroup of \(S_n\). This is why there has been a lot of work on permutations from a group theoretic perspective. The section on permutation composition is best viewed under a group theoretic lens.

The analysis of a lot of games can be done using group theory, for instance, 15-game, Rubik’s cube, Latin squares, and so on.

There is a subgroup of permutations that corresponds to all even
permutations. This group is also known to be *simple* and can be used to
show results about the 15-game, for example.

#### Counting: Burnside, Polya enumeration, Stirling numbers

Using group theoretic ideas, the Burnside lemma and the Polya enumeration theorem come into the picture, and they help to count classes of objects (equivalent under some relation) rather than objects themselves.

Stirling numbers are quite useful when counting things related to permutations, and they deserve a whole post for themselves.

#### Coordinate compression

Note that in the above discussions, especially in the section on ordering, we used arrays of distinct elements and permutations interchangeably. Whenever something only depends on the \(\le, <, >, \ge, =\) relations between the elements and not the elements themselves, it makes sense to replace elements with their rank. In the case of distinct elements, this “compressed version” becomes a permutation.

#### Connected component DP

Just like the failed idea in the generation of permutations, that idea can also be used for dp. More details can be found in this post.

#### Young tableaus

Closely related to the LIS and other ordering concepts is the idea of Young tableaus. This has a very interesting theory, and this post is a good reference for some of the results.

#### Schreier Sims algorithm

Suppose you have a subgroup generated by some permutations, and you want to find the size of the subgroup. The Schreier Sims algorithm helps in finding this size, and a good post can be found here.

#### Permutation tree

There is a cool and interesting data structure that can do different kinds of queries on permutations and subarrays. This post is a good introduction to the data structure and what it can do.

## Problems

I have added some standard problems whenever suitable, but for practicing permutations well, it makes sense to practice on a larger set of problems. Since there is a very large number of problems on permutations, maybe just adding a few of them that I know would not do justice to the variety of problems that can be made on permutations. A couple of possible methods of practicing are the following:

- Search on the CF problemset for problems with the name permutation in them.
- Look out for permutation problems on sources for standard problems like CSES.