This post was originally written on Codeforces; relevant discussion can be found here.

Introduction

A lot of people shy away from solving (mathematical) recurrences because the theory seems unclear or unapproachable without an advanced background in math.

As a consequence, the usual ways of solving recurrences tend to be:

  • Find the first few terms on OEIS.
  • Guess terms from the rate of growth of the recurrence (exponential rate of growth means you can sometimes estimate the exponential terms going from largest to smallest — though this fails in cases where there is a double-root of the characteristic equation)
  • Use some theorem whose validity you can’t prove (the so-called characteristic equation method)
  • Overkill using generating functions

But this doesn’t have to be the case, because there is a nice method you can apply to solve recurrences reliably. I independently came up with this method back when I was in middle school, and surprisingly a lot of people have no idea that you can solve recurrences like this.

The method is more principled and reliable than the first two guesswork methods, the third method might fail to apply in some cases, and the fourth method requires knowing what an infinite series is. An added bonus is that it prevents you from forgetting special cases that often plague recurrences! Meanwhile, as we show in the last section, having intuition about the method also allows you to apply the ideas developed to things that are only vaguely related to recurrences.

The best thing about the method is that it requires nothing more than school-level algebra and potentially some intuition/common sense.

I’m not saying that you should only use this method, just that keeping it in mind is pretty helpful whenever you’re faced with a recurrence (or even more fundamentally, linear combinations of numbers).

A simple example

Note: I recommend working through this section (with pen and paper) using the examples \(x_k = 2x_{k-1} + 3\) and \(x_k = x_{k-1} + 5\). Symbols alone can be intimidating for some, so new ideas are best presented alongside examples, but it is also important to understand the intuition behind the underlying algebra.

Let’s first consider a recurrence \(x_k = ax_{k-1} + b\). We employ wishful thinking: what if there were no \(b\) in the recurrence? In that case, we would have had \(x_k = x_0 \cdot a^k\), and that would be the end of the story. So can we reduce the general case to one where \(b = 0\)?

If you define \(y_k = x_k + c\) (i.e., vertically shift the graph of \((i, x_i)\) by some offset), then we have \(y_k = a(y_{k-1} - c) + b + c\). Now we have one degree of freedom, and we can choose \(c\) such that the constant term \(-ac + b + c\) is \(0\), i.e., \(c = \frac{b}{a-1}\). Note that we immediately run into a special case here: \(a = 1\) — in this case, such a \(c\) is impossible to choose. So we make two cases to finish off the problem:

  • \(a = 1\): In this case, we have \(x_k = x_{k-1} + b\). This means that \(x\) is an arithmetic progression and not a geometric progression, and \(x_k = x_0 + bk\).
  • \(a \ne 1\): In this case, there is a \(c\) such that \(y_k = ay_{k-1}\), so \(y_k = a^k y_0\). Translating it back into \(x\)-space (expressing \(y\) in terms of \(x\)), we have \(x_k + c = a^k (x_0 + c)\) which gives us \(x_k = a^k x_0 + \frac{a^k - 1}{a - 1}b\).
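
If you want to sanity-check both cases, here is a minimal Python sketch (all names are mine) comparing the closed forms against direct iteration, on the two examples from the note at the start of this section:

```python
def solve_first_order(a, b, x0, k):
    """Closed form for x_k = a*x_{k-1} + b (integer inputs, exact arithmetic)."""
    if a == 1:
        return x0 + b * k                         # arithmetic progression
    return a**k * x0 + (a**k - 1) // (a - 1) * b  # (a - 1) divides a^k - 1 exactly

def iterate(a, b, x0, k):
    x = x0
    for _ in range(k):
        x = a * x + b
    return x

# x_k = 2x_{k-1} + 3 and x_k = x_{k-1} + 5, from the note above
for a, b in [(2, 3), (1, 5)]:
    for k in range(10):
        assert solve_first_order(a, b, 7, k) == iterate(a, b, 7, k)
```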

Exercise: Try to solve the recurrence \(x_k = 2 x_{k-1}^2\) using this method.

A slightly harder example

Note: I recommend working through this section with two separate examples in mind: \(x_k = 3x_{k-1} + 10x_{k-2}\) and \(x_k = 4x_{k-1} - 4x_{k-2}\).

Let’s now consider the recurrence \(x_k = a x_{k-1} + b x_{k-2} + c\). For the sake of simplicity of presentation, we note that we can do something like the previous example to get rid of the \(c\) (spelled out below), so let’s assume \(c = 0\). So we want to solve \(x_k = a x_{k-1} + b x_{k-2}\).
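
Spelling out that reduction, mirroring the first example: substitute \(w_k = x_k + t\), so that

\[ w_k = a w_{k-1} + b w_{k-2} + \big(c + t(1 - a - b)\big), \]

and choose \(t = \frac{c}{a + b - 1}\) to kill the constant term, with \(a + b = 1\) as the special case that needs separate handling.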

Now how do we deal with the two lagging \(x_{k-1}\) and \(x_{k-2}\) terms instead of just one? The trick is to try and find \(d\) such that \(y_k = x_k - d \cdot x_{k-1}\) becomes a geometric progression (we can also use a \(+d\) instead of \(-d\) like we did in the previous solution, but that does not matter).

We can also look at it in this way: we want to subtract a suitable multiple of \(x_{k-1}\) from both sides, so that \(x_k - d x_{k-1}\) becomes a multiple of its lagged sequence.

For that to happen, notice that we must have \(y_k = e y_{k-1}\). This translates to \(x_k - d x_{k-1} = e (x_{k-1} - d x_{k-2})\). Comparing it with the original recurrence, we must have \(d + e = a\) and \(de = - b\). So we have \(|d - e| = \sqrt{(d + e)^2 - 4de} = \sqrt{a^2 + 4b}\) (note how this relates to solving a quadratic equation: \(x^2 - ax - b\) needs to be factorized into \((x - d)(x - e)\) — this will be helpful later on). Using this, we can solve for \(d\) and \(e\).
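
For the two examples from the note, this works out as:

\[ x_k = 3x_{k-1} + 10x_{k-2}: \quad d + e = 3,\ de = -10 \implies \{d, e\} = \{5, -2\}, \]
\[ x_k = 4x_{k-1} - 4x_{k-2}: \quad d + e = 4,\ de = 4 \implies d = e = 2. \]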

Now that we have \(d\) and \(e\), we have \(y_k = e y_{k-1}\), so \(y_k = e^{k-1} y_1\). Let’s now, as earlier, go back to \(x\)-space. We have \(x_k - d x_{k-1} = e^{k-1} y_1\): note that \(y_1 = (x_1 - d x_0)\) is a constant we already know.

We now have two methods of solving this: one is to divide the whole equation by \(d^k\) and simply add these equalities for different \(k\) in a telescoping manner (spelled out below); the other is to repeat the substitution trick. Just for the sake of showing the usage of the method a bit more, I will go with the latter.
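
For completeness, the telescoping route (assuming \(d \ne 0\)) looks like this: dividing by \(d^k\) gives

\[ \frac{x_k}{d^k} - \frac{x_{k-1}}{d^{k-1}} = \frac{y_1}{d} \left(\frac{e}{d}\right)^{k-1}, \]

and summing over \(k = 1, \dots, n\) telescopes the left side to \(\frac{x_n}{d^n} - x_0\), leaving a geometric sum on the right (where, once again, \(d = e\) and \(d \ne e\) must be treated separately).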

Looking at the new recurrence we got for \(x\), we again want to define some \(z_k\) as a function of \(x_k\), such that \(z_k = d z_{k-1}\). Since there is a term proportional to \(e^{k-1}\) on the right, we will choose \(z_k = x_k + f \cdot e^{k-1}\).

Going back to the recurrence in \(x\), we get \(z_k - f \cdot e^{k-1} - d \cdot z_{k-1} + d \cdot f \cdot e^{k-2} = e^{k-1} \cdot y_1\).

To get rid of the non-\(z\) terms, we must have \(f \cdot (d - e) = ey_1\). We are again faced with two cases, as in the previous example.

The case with \(d \ne e\) (the “distinct-root” case) is easy: we can solve the problem analogously to what we did in the simpler example.

The only remaining case is \(d = e\) (the “repeated-root” case): let’s look closely at why we failed here. The two terms coming from \(x_k\) and from \(x_{k-1}\) cancel out completely. What can we do to get different contributions of \(e^{k-1}\) from these two terms? Let’s choose \(z_k = x_k + g(k) \cdot e^{k-1}\) instead. Then we must have \(g(k - 1) - g(k) = y_1\). This means \(g\) is in fact a linear function of \(k\). That is, we have \(g(k) = hk + i\) for some \(h, i\): here \(h = -y_1\) and \(i\) can be set arbitrarily (for now let’s say \(i = 0\)). So we choose \(g(k) = -y_1 k\), i.e., \(z_k = x_k - y_1 \cdot k \cdot e^{k-1}\).

Since we know \(z_k = dz_{k-1}\), we know that \(z_k = d^k z_0\). Going back into \(x\)-space (note that \(z_0 = x_0\) since \(g(0) = 0\)), we get \(x_k = e^k x_0 + y_1 \cdot k \cdot e^{k-1}\), which solves the equation.
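
Here is a minimal Python sketch (names are mine) of the full second-order solution we just derived, restricted to integer \(d\) and \(e\) so that the check stays exact; it covers both cases, using the two examples from the note:

```python
from math import isqrt

def solve_second_order(a, b, x0, x1, k):
    """Closed form for x_k = a*x_{k-1} + b*x_{k-2}, following the derivation above.
    Assumes a^2 + 4b is a perfect square, so that d and e are integers."""
    disc = a * a + 4 * b
    r = isqrt(disc)
    assert r * r == disc, "this sketch only handles integer d, e"
    d, e = (a + r) // 2, (a - r) // 2        # d + e = a, d * e = -b
    y1 = x1 - d * x0
    if d != e:                               # distinct-root case
        return d**k * x0 + y1 * (d**k - e**k) // (d - e)
    return e**k * x0 + y1 * k * e**(k - 1)   # repeated-root case

def iterate(a, b, x0, x1, k):
    xs = [x0, x1]
    for _ in range(k - 1):
        xs.append(a * xs[-1] + b * xs[-2])
    return xs[k]

# x_k = 3x_{k-1} + 10x_{k-2} (d, e = 5, -2) and x_k = 4x_{k-1} - 4x_{k-2} (d = e = 2)
for a, b in [(3, 10), (4, -4)]:
    for k in range(2, 12):
        assert solve_second_order(a, b, 1, 3, k) == iterate(a, b, 1, 3, k)
```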

A short sketch of how to deal with higher order recurrences

In this section, just by generalizing the approach in the previous section, we will show how you can in fact derive the method of characteristic equations while solving recurrences.

Recall the note about solving for \(d\) and \(e\): it amounts to finding a factorization of \(x^2 - ax - b\) in the form \((x - d)(x - e)\).

When you try to use the method above for third-order recurrences, you will quickly realize that the first step amounts to finding a factorization of \(x^3 - ax^2 - bx - c\) into \((x - d)(x^2 - ex + f)\), and as you solve the lower-order recurrence at each step, you will also need to factorize \(x^2 - ex + f\) into two linear polynomials.

If the roots \(r_i\) (\(i = 1\) to \(3\)) of \(x^3 - ax^2 - bx - c\) are all distinct, you will get \(x_k\) in the nice form \(x_k = \sum_{i=1}^3 c_i r_i^k\). But if a root has multiplicity more than 1 (i.e., \((x - r)^s\) divides the polynomial for some \(s > 1\)), then the corresponding \(c_i\) becomes a polynomial in \(k\) of degree \(s - 1\).

Using induction, you can show that a result of this form is indeed true for higher order recurrences as well.
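
As a quick illustration of where this lands in the distinct-root case, here is a Python sketch (the example recurrence and all names are my own choices; it uses floating point, so the comparison is approximate) that recovers the \(c_i\) from the roots of the characteristic polynomial and the initial values:

```python
import numpy as np

a, b, c = 6, -11, 6             # x_k = 6x_{k-1} - 11x_{k-2} + 6x_{k-3}
x0, x1, x2 = 2, 0, 5            # arbitrary initial values

roots = np.roots([1, -a, -b, -c])           # roots of t^3 - at^2 - bt - c (here 1, 2, 3)
V = np.vander(roots, 3, increasing=True).T  # column i is (1, r_i, r_i^2)
coeffs = np.linalg.solve(V, [x0, x1, x2])   # solve for c_i from the initial values

xs = [x0, x1, x2]
for _ in range(7):
    xs.append(a * xs[-1] + b * xs[-2] + c * xs[-3])
for k, xk in enumerate(xs):
    closed = sum(ci * ri**k for ci, ri in zip(coeffs, roots))
    assert abs(closed - xk) < 1e-6          # x_k = sum_i c_i * r_i^k
```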

An example slightly unrelated to recurrences

There’s a funny story related to this method. Back when I was testing [contest:1909], problem D was not in the problemset. When it was added (with \(k = 1\) as the easier version; the problem was supposed to be problem B in the set), I solved it instantly because I had used this method for solving recurrences before, and I thought its difficulty should be no higher than that of problem A. A lot of people disagreed with me on this, and it was eventually agreed to use general \(k\) as a harder problem. I still disagreed with this for a while, but then I realized that not a lot of people know this method of approaching recurrences/linear combinations.

This was partly the motivation behind writing this post (the other part was the existence of posts like this).

Here’s how you solve that problem using this method.

In this problem, you are given a reverse recurrence of the form \(y + z = x + k\). Inspired by the method developed above, we will just subtract \(k\) from both sides, and let \(x' = x - k\), \(y' = y - k\), \(z' = z - k\), to get \(y' + z' = x'\).

This means that if we shift all the array elements vertically by \(-k\), we are just splitting elements into other elements \(\ge 1 - k\).

(Arguably) the hardest part of the problem is to realize that you can shift numbers and deal with them as usual, after the transformation. But to a person who knows this method well, it might seem like this problem was made specifically to be solved by it.
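
To make the transformation concrete, here is a Python sketch of the resulting solution, under my reading of the problem (split an element \(x\) into \(y + z = x + k\), minimizing the number of splits needed to make all elements equal); treat it as an illustration of the shift, not a reference implementation:

```python
from functools import reduce
from math import gcd

def min_splits(a, k):
    """Minimum number of (x -> y, z with y + z = x + k) operations to make
    all elements of a equal, or -1 if impossible. Based on my reading."""
    b = [x - k for x in a]                  # shift: splits become y' + z' = x'
    if all(v == 0 for v in b):
        return 0
    # every shifted element must split into pieces of one common value t',
    # so all shifted elements must share a sign, and t' must divide them all
    if any(v == 0 for v in b) or (min(b) < 0 < max(b)):
        return -1
    g = reduce(gcd, (abs(v) for v in b))    # the largest valid |t'| is the gcd
    return sum(abs(v) // g for v in b) - len(b)

print(min_splits([3, 5, 7], 1))  # made-up data: shifts to [2, 4, 6], prints 3
```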

Conclusion

In this post, we saw that we can solve recurrences of the form \(a_k = f(a_{k-1}, \dots, a_{k - l})\) by trying to impose easy-to-analyze structure on expressions of the form \(b_k = g(a_{k}, a_{k-1}, \dots, a_{k - l + 1})\). In each example, the structure was of the form \(b_k = r b_{k-1}\), which gives us a way to compute \(b_k\) directly, thereby reducing the “order” of the problem from \(l + 1\) to \(l\). If you tried the exercise, the structure was supposed to be of the form \(b_k = b_{k-1}^2\).

In a way, this is an application of the divide and conquer way of thinking (in the general sense). This kind of divide and conquer, but in a bottom-up way rather than a top-down way, is also used when you derive things like Hensel’s lifting lemma.