WQU with Path Compression
- Review of Weighted Quick Union
- Path Compression
- Proof of amortized find for WQU-PC
Review of Weighted Quick Union
A WQU data structure lets us efficiently represent and interact with disjoint
sets. Given elements, which start off as one-element sets, we can
union on any two elements to union the two sets they belong to. Weighted
quick union gives us a runtime bound of for any call to
This is due to the weighted component of the algorithm: the union will always
make the set with a less elements a child of the set with more elements. See
the below image (taken from this page):
The only way to increase the height of a set’s tree representation is if you union two sets with the same number of elements (try to draw a few examples and visualize this). As a result, the upper bound on the height of a set’s tree is . To end up with a tree with height , pairwise union all equal-sized sets, and do this times. Each time you union two equal-sized sets, the height of the resulting set is increased by , so the tree at the end has height . See below for :
This means the bound on the runtime of calling
finds the root of a set based on an element, is , since an element
at the bottom of the tree would have to traverse the entire height of the tree
to reach the root.
find asymptotically faster, we will use path compression every time
Here is the pseudo-code of
find with path compression:
def find(el): if el is not the root: el.parent = find(el.parent) return el.parent
This above pseudo-code is quite dense, try to work through it. Path compression
takes all the nodes in the path traversed (starting at
el and ending at the
root) and makes each node a child of the root (hence, it is compressing the
path). Here is an example:
A / | \ B C D / \ E F / \ G H
find(G) will transform the tree to become this:
A / / | | \ \ B C E D E G / | F H
E got moved up to be children of the root).
The key insight to why path compression gives a better runtime for calls
find (if ) is because everytime you call
find, you do some work
that flattens the tree, and ultimately gives
find an amortized runtime of
Proof of amortized find for WQU-PC
The iterated logarithm, is defined as the number of times you apply until you get . For example, . grows very slow, see the below example for some intuition why:
In general, an algorithm with time complexity would be very fast as grows larger (just below constant time!), so this is quite desirable.
First, let’s define the of a root to be the height of its underlying
tree. Before any
union calls are made, each element, which is the root of its
own one-element set, has . When we union two sets with equal
, we increment the root of the unioned set’s by , because
the underlying tree has had its height increased by .
I will use “root” and “tree” interchangeably, so when I say a “the number of elements in a root of ”, I really mean “the number of elements in the tree that has a root”.
Here are two useful properties of that will be used in our proof later:
(1) For every element ,
If , that means it was created by unioning two elements whose sets had .
(2) If there are elements, there are at most elements of
An important observation: elements that are no longer roots still have a because in the beginning, they started out as roots of their own one-element set. Note that their can no longer increase once they become a child of another root.
First, let’s prove that a root with has at least elements.
The total number of elements for a root of is something we can lower bound. is at least the sum of the number of elements in the two equal-ranked trees that were merged. Observe that this is a lower bound because you could always increase the number of elements without increasing the rank by unioning in more roots with .
Expanding and simplifying :
This means that if we have elements, there are at most elements of , since each root with has at least elements. This means the highest possible is since there is at most one root with elements.
Onto the proof…
What we’re going to try prove is that for a call to
find, the time complexity
is . Some calls to
find might take some extra work, but we
will be able to bound the total amount of extra work over all calls to
as a function of the number of elements ().
We will then show that if you calls to
find, becomes negligible
when , . This is the soul of amortization.
According to (2), the possible range of ranks an element could have is . Let’s divide this range of non-zero ranks into the following groups:
Each group has the form . In total, there are groups.
Recall in (2) that once an element is no longer a root, it’s rank can
no longer increase. We will give each child element with the
ability to take total extra steps over all calls to
is the first value of the group that falls in.
Claim: The total amount of extra steps we give is at most . That is, .
Proof: Child elements with that are in the group each get extra steps. From (2) we can upper bound the total amount of child elements with a rank in that group:
For each group, the total amount of extra steps given is at most , and there are groups.
We will define the amount of work done by
find(e) as the length of the path
e and the root. Consider the elements in this path, ranks
ascending (see (1)). Each element has . Each
element’s parent has a that is either in a
larger group or the same group as .
- At most elements would have parents with a rank in a larger group. (Why? That’s how many groups there are.)
- If has a rank that is in the same group as the rank of , will use one of its allocated extra work steps.
Observation 1 implies a time complexity of
find if we can
prove that each child element does not spend more work steps than it was
Every time a child element uses one of its extra work steps, it’s parent becomes one with a higher rank (path compression changes its parent). A child element needs at most extra steps to change to a parent that has a rank in a higher group. Once its parent’s rank is in a higher group, it does not need any more extra steps because of Observation 1.
The total time complexity of calls to
find on elements is:
At last, we have proved that the amortized runtime of
You can prove an even tighter upper bound, see this paper for more.
Corollary: Amortized union
union’s time complexity depends on
union also has an amortized
runtime of .