# WQU with Path Compression

## Review of Weighted Quick Union

A WQU data structure lets us efficiently represent and interact with disjoint
sets. Given elements, which start off as one-element sets, we can
call `union`

on any two elements to union the two sets they belong to. Weighted
quick union gives us a runtime bound of for any call to `union`

.
This is due to the *weighted* component of the algorithm: the union will always
make the set with a less elements a child of the set with more elements. See
the below image (taken from this page):

The only way to increase the height of a set’s tree representation is if you union two sets with the same number of elements (try to draw a few examples and visualize this). As a result, the upper bound on the height of a set’s tree is . To end up with a tree with height , pairwise union all equal-sized sets, and do this times. Each time you union two equal-sized sets, the height of the resulting set is increased by , so the tree at the end has height . See below for :

This means the bound on the runtime of calling `find`

, which
finds the root of a set based on an element, is , since an element
at the bottom of the tree would have to traverse the entire height of the tree
to reach the root.

To make `find`

asymptotically faster, we will use path compression every time
we call `find`

.

## Path Compression

Here is the pseudo-code of `find`

with path compression:

```
def find(el):
if el is not the root:
el.parent = find(el.parent)
return el.parent
```

This above pseudo-code is quite dense, try to work through it. Path compression
takes all the nodes in the *path* traversed (starting at `el`

and ending at the
root) and makes each node a child of the root (hence, it is compressing the
path). Here is an example:

```
A
/ | \
B C D
/ \
E F
/ \
G H
```

Calling `find(G)`

will transform the tree to become this:

```
A
/ / | | \ \
B C E D E G
/ |
F H
```

(`C`

and `E`

got moved up to be children of the root).

The key insight to why path compression gives a better runtime for calls
to `find`

(if ) is because everytime you call `find`

, you do some work
that flattens the tree, and ultimately gives `find`

an amortized runtime of
.

## Proof of amortized find for WQU-PC

The iterated logarithm, is defined as the number of times you
apply until you get . For example, . grows *very slow*, see the below example for some intuition
why:

In general, an algorithm with time complexity would be very fast as grows larger (just below constant time!), so this is quite desirable.

First, let’s define the of a root to be the height of its underlying
tree. Before any `union`

calls are made, each element, which is the root of its
own one-element set, has . When we union two sets with equal
, we increment the root of the unioned set’s by , because
the underlying tree has had its height increased by .

I will use “root” and “tree” interchangeably, so when I say a “the number of elements in a root of ”, I really mean “the number of elements in the tree that has a root”.

### Two Properties

Here are two useful properties of that will be used in our proof later:

#### (1) For every element ,

If , that means it was created by unioning two elements whose sets had .

#### (2) If there are elements, there are at most elements of

An important observation: elements that are no longer roots still have a because in the beginning, they started out as roots of their own one-element set. Note that their can no longer increase once they become a child of another root.

First, let’s prove that a root with has at least elements.

The total number of elements for a root of is something we
can lower bound. is *at least* the sum of the number of elements in the
two equal-ranked trees that were merged. Observe that this is a lower bound
because you could always increase the number of elements without increasing the
rank by unioning in more roots with .

Expanding and simplifying :

This means that if we have elements, there are at most elements of , since each root with has at least elements. This means the highest possible is since there is at most one root with elements.

Onto the proof…

### Amortization

What we’re going to try prove is that for a call to `find`

, the time complexity
is . Some calls to `find`

might take some extra work, but we
will be able to bound *the total amount of extra work over all calls* to `find`

as a function of the number of elements ().

We will then show that if you calls to `find`

, becomes negligible
when , . This is the soul of amortization.

According to (2), the possible range of ranks an element could have is . Let’s divide this range of non-zero ranks into the following groups:

Each group has the form . In total, there are groups.

#### Bounding

Recall in (2) that once an element is no longer a root, it’s rank can
no longer increase. We will give each *child element* with the
ability to take total extra steps over all calls to `find`

, where
is the first value of the group that falls in.

**Claim:** The total amount of extra steps we give is at most .
That is, .

**Proof:**
Child elements with that are in the group
each get extra steps. From
(2) we can upper bound the total amount of child elements with a rank
in that group:

For each group, the total amount of extra steps given is at most , and there are groups.

#### Bounding find

We will define the amount of work done by `find(e)`

as the length of the path
between `e`

and the root. Consider the elements in this path, ranks
ascending (see (1)). Each element has . Each
element’s parent has a that is either in a
larger group or the same group as .

Two observations:

- At most elements would have parents with a rank in a larger group. (Why? That’s how many groups there are.)
- If has a rank that is in the same group as the rank of , will use one of its allocated extra work steps.

Observation 1 implies a time complexity of `find`

if we can
prove that each child element does not spend more work steps than it was
allocated.

Every time a child element uses one of its extra work steps, it’s parent becomes one with a higher rank (path compression changes its parent). A child element needs at most extra steps to change to a parent that has a rank in a higher group. Once its parent’s rank is in a higher group, it does not need any more extra steps because of Observation 1.

The total time complexity of calls to `find`

on elements is:

At last, we have proved that the amortized runtime of `find`

is
.

You can prove an even tighter upper bound, see this paper for more.

### Corollary: Amortized union

Since `union`

’s time complexity depends on `find`

, `union`

also has an amortized
runtime of .