This post was originally written on Codeforces; relevant discussion can be found here.
In this post, I will focus on the following problem: 920E. More generally, we will look at how to efficiently traverse the complement graph of a given graph in both dfs and bfs orders. In particular, the bfs order also allows us to find a shortest-path tree of the complement graph. This is my first post, so please let me know of any errors/typos that might have crept in.
Note 1: I will present the algorithms in the order I came up with them, so this might not be the cleanest presentation, but I hope it gives some insight into how to come up with new ideas in graphs. Note that I don't claim to be the first person to come up with these ideas (especially the naive algorithm, which is identical to the solution in the editorial), but I haven't seen any of these implementation tricks being used in any implementation before. Resources pointing to similar ideas/implementations are welcome!
Note 2: All adjacency relations will be in terms of the original graph (which is given to us as an input, and is sparse).
(Not-so-important) note 3: We won't be using lists (or an implementation of lists using arrays, popularized by the usual adjacency-list implementation used by a lot of Chinese competitive programmers). Instead, in the final implementation, we will use an application of DSU that implements deletions + traversal in a sorted list efficiently.
TL;DR
If you’re looking for the implementations, here they are:
- Naive DFS in \(O((n + m) \log n)\): [submission:125679159]
- BFS without sorting adjacency lists in \(O(n + m)\): [submission:125679188]
- BFS with sorting adjacency lists (using radix sort) in \(O(n + m)\): [submission:125679224]
- DFS with sorting adjacency lists (using radix sort) in (conjectured) \(O(n + m)\) (guaranteed \(O((n + m) \log n)\)): [submission:125679248]
Solving the problem using DFS in \(O((n + m) \log n)\) time
Note that in a usual DFS, we do something of the following sort while running it on a vertex \(u\):
mark u as visited
for v adjacent to u:
    if v is not already visited:
        dfs(v)
Note that the roles of the loop and the condition can be swapped, so we can do something like the following too:
mark u as visited
for v in the set of unvisited vertices:
    if v is adjacent to u:
        dfs(v)
Now if we want to do a DFS on the complement graph, we can merely change the condition to "if v is not adjacent to u", and we will be done.
How will we go about implementing this algorithm efficiently? Suppose we maintain a set of unvisited vertices, and the adjacency lists are sets instead of vectors/lists. Then we can go through the set of unvisited vertices, and if an unvisited vertex is not in the adjacency set of the current vertex, we can recursively do a DFS on that vertex.
Why is this efficient? Note that in any iteration, we either skip a vertex or we don’t. In the case we skip a vertex, the vertex has to be in the set of vertices that are adjacent to the current vertex, so we don’t do more than \(O(m)\) skips. In the case we don’t skip, we remove a vertex from the unvisited set, so we don’t do more than \(O(n)\) iterations of this type either. So the total number of iterations is \(O(m + n)\), and the cost of every iteration is upper bounded by \(O(\log n)\), which gives us the bound.
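For concreteness, here is a minimal sketch of this idea (the variable names, the 1-based input format, and the component-size output are illustrative assumptions, not taken from the linked submission):

#include <bits/stdc++.h>
using namespace std;

int n;
vector<set<int>> adj;  // adjacency sets of the original (sparse) graph
set<int> unvisited;    // global set of unvisited vertices
int component_size;    // size of the current complement component

void dfs(int u) {
    unvisited.erase(u);
    ++component_size;
    for (auto it = unvisited.begin(); it != unvisited.end();) {
        int v = *it;
        if (adj[u].count(v)) {
            ++it;  // v is adjacent to u in the original graph, so skip it
        } else {
            dfs(v);  // (u, v) is an edge of the complement graph
            it = unvisited.upper_bound(v);  // dfs(v) erased v (and possibly more)
        }
    }
}

int main() {
    int m;
    cin >> n >> m;
    adj.assign(n, {});
    for (int i = 0; i < m; ++i) {
        int u, v;
        cin >> u >> v;
        --u, --v;
        adj[u].insert(v);
        adj[v].insert(u);
    }
    for (int u = 0; u < n; ++u) unvisited.insert(u);
    while (!unvisited.empty()) {
        component_size = 0;
        dfs(*unvisited.begin());
        cout << component_size << '\n';  // one complement component per line
    }
}

Note that the recursion depth can be as large as \(n\), so depending on the judge you might need a bigger stack or an iterative rewrite.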
Implementation: [submission:125679159]
Solving the problem using BFS in \(O(n + m)\) time
Let’s come to our first linear implementation. I switched over to BFS, since you can still find components using BFS, and it is easier to reason about an iterative algorithm than a recursive algorithm with global variables.
The first change is that we don't use a set; instead, we use a vector to store the unvisited vertices. Initially, all the vertices are unvisited.
Let’s see what happens when we process a vertex \(u\) in this algorithm.
Firstly, we note down all the unvisited vertices that are adjacent to the vertex \(u\). Let’s call them blocked vertices (for \(u\)), and mark them in a global array (we will unmark them when we finish the iteration corresponding to \(u\)).
Then we iterate over all the unvisited vertices, and we can push an unvisited vertex into the queue if and only if it is not \(u\) and it is not blocked. Then the only remaining unvisited vertices are the ones that are blocked, so we replace the set of unvisited vertices by the set of blocked vertices. This can be seen to be identical to a usual BFS, but with minor modifications.
Why is this efficient? Suppose the order in which vertices are popped from the queue is \(u_1, u_2, \ldots, u_k\). Then when we finish processing \(u_i\), the unvisited set is \(V \cap N(u_1) \cap \cdots \cap N(u_i) \subseteq N(u_i)\), so the vector we process while handling \(u_{i+1}\) has size at most \(\deg(u_i)\). Summing over all popped vertices, the total size of the vectors we process is \(O(n + m)\), and hence so is the total number of iterations. Note that this argument doesn't depend on the connectedness of the graph.
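Here is a minimal sketch of this algorithm (again, the names and the input/output format are illustrative assumptions):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, m;
    cin >> n >> m;
    vector<vector<int>> adj(n);
    for (int i = 0; i < m; ++i) {
        int u, v;
        cin >> u >> v;
        --u, --v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    vector<int> unvisited(n);
    iota(unvisited.begin(), unvisited.end(), 0);
    vector<char> visited(n, false), is_blocked(n, false);
    while (!unvisited.empty()) {
        // start a BFS of the complement graph from some unvisited vertex
        int s = unvisited.back();
        unvisited.pop_back();
        visited[s] = true;
        queue<int> q;
        q.push(s);
        int component_size = 0;
        while (!q.empty()) {
            int u = q.front();
            q.pop();
            ++component_size;
            vector<int> blocked;  // unvisited vertices adjacent to u
            for (int v : adj[u])
                if (!visited[v]) {
                    is_blocked[v] = true;
                    blocked.push_back(v);
                }
            for (int v : unvisited)
                if (!is_blocked[v]) {
                    visited[v] = true;  // (u, v) is a complement edge
                    q.push(v);
                }
            for (int v : blocked) is_blocked[v] = false;
            unvisited.swap(blocked);  // only the blocked vertices stay unvisited
        }
        cout << component_size << '\n';
    }
}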
Implementation: [submission:125679188]
Solving the problem using BFS in \(O(n + m)\) time, using yet another algorithm
Remember how we implemented DFS? We can do a similar thing for BFS, and we will modify the algorithm in the previous section a bit to somehow match what we did in the DFS case. We can try sorting the adjacency lists. However, sorting each adjacency list separately can cost \(O(m \log n)\) in the worst case. How do we do better than that? Instead of sorting adjacency lists separately, we can sort all edges using radix sort in \(O(n + m)\), and then create a graph from those edges, which will automatically have sorted adjacency lists.
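One way to implement this step (a sketch; build_sorted_adj is a hypothetical helper name) is a counting sort of all directed edges by target, followed by bucketing by source, which is itself a stable counting sort:

#include <bits/stdc++.h>
using namespace std;

// Returns adjacency lists that come out already sorted, in O(n + m) total.
// edges should contain both directions (u, v) and (v, u) of every edge.
vector<vector<int>> build_sorted_adj(int n, const vector<pair<int, int>>& edges) {
    // counting sort of all directed edges by their target vertex
    vector<int> cnt(n + 1, 0);
    for (auto& [u, v] : edges) ++cnt[v + 1];
    for (int i = 0; i < n; ++i) cnt[i + 1] += cnt[i];
    vector<pair<int, int>> by_target(edges.size());
    for (auto& [u, v] : edges) by_target[cnt[v]++] = {u, v};
    // bucketing by source is stable, so targets stay sorted inside each list
    vector<vector<int>> adj(n);
    for (auto& [u, v] : by_target) adj[u].push_back(v);
    return adj;
}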
Now our invariant will be that the set of unvisited vertices is sorted. So when we process a vertex, instead of marking the blocked vertices in advance, we can use two pointers to check whether an unvisited vertex is adjacent to \(u\) or not. The rest of the algorithm is pretty much identical to the remaining part of the previous algorithm.
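The two-pointer check can look like the following sketch (split_unvisited is a hypothetical helper; unvisited is the sorted vector of unvisited vertices and adj_u is the sorted adjacency list of the current vertex):

#include <bits/stdc++.h>
using namespace std;

// Split the sorted unvisited vertices into those blocked by u (adjacent to u,
// so they stay unvisited) and the complement-neighbors of u (to be pushed).
pair<vector<int>, vector<int>> split_unvisited(const vector<int>& unvisited,
                                               const vector<int>& adj_u) {
    vector<int> blocked, to_visit;
    size_t j = 0;
    for (int v : unvisited) {
        while (j < adj_u.size() && adj_u[j] < v) ++j;  // advance to first >= v
        if (j < adj_u.size() && adj_u[j] == v)
            blocked.push_back(v);   // v is adjacent to u
        else
            to_visit.push_back(v);  // (u, v) is a complement edge
    }
    return {blocked, to_visit};
}

Since both lists are scanned once, each call costs \(O(|unvisited| + \deg(u))\), which is exactly what the argument from the previous section charges for.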
Implementation: [submission:125679224]
Solving the problem using DFS in conjectured \(O(n + m)\) time
This is (in my opinion) the most interesting algorithm among all of the ones I have presented so far. However, I don’t have a correct proof of the time complexity. We will use the idea from the previous implementation here for sorting the adjacency list, and all we need to do now is to make the following operations in the naive DFS implementation fast enough:
- Erasing an element from a set
- Iterating over a set
- Finding an element in the adjacency list of the current vertex
Note that the two-pointers idea still works here while iterating over the global unvisited set (which is not implemented using a std::set, since erasing would be too slow), as long as we are guaranteed that the unvisited set is sorted during iteration.
Consider the state of the unvisited set. At any point, it consists of some vertices \(-1 = a_0 < a_1 < a_2 < \cdots < a_k < n = a_{k+1}\) (the first and the last elements are just for the sake of convenience in the following definition). Define the operation \(root\) as follows: for any \(a_i < x \le a_{i+1}\), we have \(root(x) = a_{i+1}\); in other words, \(root(x)\) is the smallest unvisited vertex that is at least \(x\) (with \(n\) acting as a sentinel when no such vertex exists). The computation of \(root\) will be done lazily, whenever we actually need its value. Then if we have such an operation, we can do the needed operations as follows:
- Erasing an element from a set: set \(root[x] = x + 1\) for now (when the actual value of \(root(x)\) is needed, we will compute it then and there, and update \(root[x]\) and all its dependencies to speed up further computations). This links together the two ranges: the one ending at \(x\) and the one starting at \(x + 1\).
- Iterating over the set: We can do this using a single for loop as follows (by the definition of the function \(root\)):
for (int it = root(0); it < n; it = root(it + 1))
- Finding an element in the adjacency list of the current vertex can be done using two pointers, since we are guaranteed by the definition of \(root\) that we iterate over the unvisited vertices in increasing order.
Now how do we lazily compute \(root\) when needed? We shall do so in a DSU-like manner with path compression as follows:
int get_root(int u) {
    if (root[u] == u) return u;
    return root[u] = get_root(root[u]);
}
Why is this efficient? Path compression alone guarantees that the total time taken is \(O((m + n) \log n)\), as in a usual DSU. However, running some stress tests suggests that this is in fact much better than that, and that the number of compressions done is \(O(n + m)\). I tried proving it but found a fatal error in a step, so if anyone can come up with a proof (or a counter-example) for this, please let me know!
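For concreteness, here is a sketch of how the pieces fit together (illustrative names; it reuses build_sorted_adj from the earlier sketch, and root has size \(n + 1\) so that \(root[n] = n\) acts as the sentinel):

#include <bits/stdc++.h>
using namespace std;

int n;
vector<vector<int>> adj;  // sorted adjacency lists
vector<int> root;         // root[x]: smallest unvisited vertex >= x (lazily)
int component_size;

int get_root(int u) {
    if (root[u] == u) return u;
    return root[u] = get_root(root[u]);  // path compression
}

void dfs(int u) {
    root[u] = u + 1;  // erase u from the unvisited set
    ++component_size;
    size_t j = 0;     // two-pointer index into adj[u]
    for (int v = get_root(0); v < n; v = get_root(v + 1)) {
        while (j < adj[u].size() && adj[u][j] < v) ++j;
        if (j < adj[u].size() && adj[u][j] == v) continue;  // v is blocked
        dfs(v);  // (u, v) is a complement edge; dfs(v) erases v
    }
}

int main() {
    int m;
    cin >> n >> m;
    vector<pair<int, int>> edges;
    for (int i = 0; i < m; ++i) {
        int u, v;
        cin >> u >> v;
        --u, --v;
        edges.push_back({u, v});
        edges.push_back({v, u});
    }
    adj = build_sorted_adj(n, edges);   // counting sort sketch from earlier
    root.resize(n + 1);
    iota(root.begin(), root.end(), 0);  // initially every vertex is unvisited
    for (int u = 0; u < n; ++u)
        if (get_root(u) == u) {         // u is still unvisited
            component_size = 0;
            dfs(u);
            cout << component_size << '\n';
        }
}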
Implementation: [submission:125679248]