Disjoint Set

Two operations

  1. Ask which set an element belongs to $$x \in S$$
  2. Combine two sets. (union) $$S_1 \cup S_2$$

In graph theory, we can view a "set" as a connected component, which means a disjoint set can be used to ask whether two vertices are in the same connected component.

We assign each set an "index".

The idea behind the disjoint set is

"Let one of the elements in each set be the leader"

So every element in the same set has the same leader.

In other words, each element knows which set it belongs to as long as it knows who its leader is.

Let

boss[i]

be the leader of node i.

In the beginning, every element is a set containing only itself.

When combining two sets, we are given two elements,

and we combine the two sets that these two elements belong to.

Suppose the leaders of these two sets are \(a\) and \(b\).

We write

boss[b] = a;

 

Is that enough?

According to our definition, it requires more work.

We must also update every element \(i\) that has \(boss[i] = b\), which costs \(O(N)\) per union.
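To make that cost concrete, here is a minimal sketch of the direct-leader version, where a merge relabels every member of one set. The struct name `BossSet` and the method name `unite` are illustrative and do not come from the slides.

```cpp
#include <numeric>
#include <vector>

// Hypothetical sketch: boss[i] always stores the leader of i directly.
struct BossSet {
    std::vector<int> boss;

    explicit BossSet(int n) : boss(n) {
        std::iota(boss.begin(), boss.end(), 0);  // every element leads itself
    }

    // O(1): the leader is stored directly.
    int find(int x) const { return boss[x]; }

    // Merge the set of y into the set of x.
    // O(N): every element whose leader is b has to be relabelled to a.
    void unite(int x, int y) {
        int a = boss[x];
        int b = boss[y];
        if (a == b) return;
        for (int& leader : boss)
            if (leader == b) leader = a;
    }
};
```

Lookups are cheap here, but each merge scans the whole array, which is what motivates changing the definition.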

 

How about we change the definition from "boss" to

"superior"

super[i] only stores the node directly above node i.

The boss is the only one whose superior is itself.

int find(int x){
    // the boss is the only node that is its own superior
    if(x == super[x])
        return x;
    // otherwise keep asking the superior
    return find(super[x]);
}

We use recursion to find our boss by asking each superior in turn.

 

[Figure: tree diagrams over nodes labelled 1 to 12, illustrating a merge. Caption: merge 8, 3 or (1, 6)]

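// a and b are in the same set when they have the same boss
bool same(int a, int b){
    return find(a) == find(b);
}
// the boss of a's set gets the boss of b's set as its superior
void Union(int a, int b){
    super[find(a)] = find(b);
}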
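As a usage sketch, the three functions can answer the graph question from the beginning: how many connected components does a graph have? The struct `SuperiorSet` and the helper `countComponents` are hypothetical names chosen for this example. The initialization (every node is its own superior) follows the "every element is a set containing only itself" rule.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical wrapper around the super[] array from the slides.
struct SuperiorSet {
    std::vector<int> super;  // super[i] is the superior of node i

    explicit SuperiorSet(int n) : super(n) {
        std::iota(super.begin(), super.end(), 0);  // each node starts as its own boss
    }

    int find(int x) const { return x == super[x] ? x : find(super[x]); }
    bool same(int a, int b) const { return find(a) == find(b); }
    void Union(int a, int b) { super[find(a)] = find(b); }
};

// Count connected components of an undirected graph with vertices 0..n-1.
int countComponents(int n, const std::vector<std::pair<int, int>>& edges) {
    SuperiorSet s(n);
    int components = n;  // n singleton sets at the start
    for (const auto& e : edges) {
        if (!s.same(e.first, e.second)) {
            s.Union(e.first, e.second);  // two components become one
            --components;
        }
    }
    return components;
}
```

Each edge that joins two different sets reduces the component count by one; edges inside an existing component change nothing.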

What about the cost of a query?

If the tree degenerates into a linked list,

the time complexity of a single find becomes \( O(N) \)
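The worst case is easy to produce with the plain rule `super[find(a)] = find(b)`. The sketch below builds such a chain and measures how many links a find has to follow; `buildChain` and `depth` are hypothetical helper names.

```cpp
#include <numeric>
#include <vector>

// Number of superior links followed from x up to its boss.
int depth(const std::vector<int>& super, int x) {
    int steps = 0;
    while (x != super[x]) {
        x = super[x];
        ++steps;
    }
    return steps;
}

// Build the degenerate case: the current boss is always hung under a
// brand-new single node, which is what Union(0, i) does for i = 1..n-1.
std::vector<int> buildChain(int n) {
    std::vector<int> super(n);
    std::iota(super.begin(), super.end(), 0);
    int boss = 0;  // boss of the growing set
    for (int i = 1; i < n; ++i) {
        super[boss] = i;  // old boss now reports to node i
        boss = i;
    }
    return super;
}
```

With N nodes, node 0 ends up N - 1 links away from its boss, so one find on it walks the whole list.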

 

Is there any way to keep the tree height as low as possible?

We maintain another array \( rank \)

\( rank[i] \) represents

"when node i is the boss, the largest number of recursive steps a find can take"
In short,

the "tree height"

When unioning two sets, we always attach the boss with the lower rank under the boss with the higher rank. If both ranks are equal, either boss may go on top, and its rank increases by one.
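A minimal sketch of this rule, assuming ranks start at 0 and only grow when two bosses of equal rank are joined. The struct name `RankedSet` is illustrative.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical sketch of union by rank (no path compression yet).
struct RankedSet {
    std::vector<int> super;  // super[i]: superior of node i
    std::vector<int> rank;   // rank[i]: height of the tree when i is the boss

    explicit RankedSet(int n) : super(n), rank(n, 0) {
        std::iota(super.begin(), super.end(), 0);
    }

    int find(int x) const { return x == super[x] ? x : find(super[x]); }

    void Union(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;                      // already in the same set
        if (rank[a] < rank[b]) std::swap(a, b);  // make a the higher-ranked boss
        super[b] = a;                            // lower rank goes underneath
        if (rank[a] == rank[b]) ++rank[a];       // equal ranks: height grows by one
    }
};
```

The same sequence of unions that produced a linked list before now yields a tree of height 1, because every new single node is attached under the existing boss.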

Proof

To increase the rank of a tree, it must be merged with another tree of the same rank.

So a tree of rank \(x\) contains at least \(2^x\) elements,
and building it takes at least \(2^x - 1\) unions.

So 

$$ \text{tree height} \leq \log_2{N} $$
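The counting argument can be written out as a short induction. Here \(\text{size}(x)\) is shorthand introduced for this note (it is not in the slides) for the minimum number of elements in a tree of rank \(x\).

$$ \text{size}(0) = 1, \qquad \text{size}(x+1) \geq \text{size}(x) + \text{size}(x) = 2\,\text{size}(x) $$

$$ \Rightarrow \quad \text{size}(x) \geq 2^x $$

$$ 2^{\text{rank}} \leq \text{size}(\text{rank}) \leq N \quad \Rightarrow \quad \text{rank} \leq \log_2{N} $$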

 

The time complexity of \(find\) is \( O(\log N) \)

This is also called heuristic merging (union by rank)

Can we do better?

Here is the idea.

When we changed the boss definition to a superior definition,

we were actually aiming for an \(O(1)\) union.

In reality, we only need to know the boss.


int find(int x){
    if(x == super[x])
        return x;
    // point x directly at the boss while returning it
    return super[x] = find(super[x]);
}

[Figure: a tree over nodes 1 to 7, shown before and after calling find(7);]

 

Every operation of the disjoint set relies on "find",

and the cost of "find" depends on the height of the tree.

 

The method we just introduced is called

path compression
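The flattening effect can be observed directly. The sketch below starts from a hand-built chain and checks the links after one find; the struct name `CompressingSet` and its constructor taking a prepared link array are assumptions made for the demonstration.

```cpp
#include <utility>
#include <vector>

// Hypothetical sketch: find with path compression on a prepared link array.
struct CompressingSet {
    std::vector<int> super;  // super[i]: superior of node i

    explicit CompressingSet(std::vector<int> links) : super(std::move(links)) {}

    int find(int x) {
        if (x == super[x]) return x;
        // every node on the path is re-pointed at the boss on the way back
        return super[x] = find(super[x]);
    }
};
```

Starting from the chain 4 -> 3 -> 2 -> 1 -> 0 (that is, `super = {0, 0, 1, 2, 3}`), a single `find(4)` returns 0 and leaves every node pointing straight at 0, so later finds on those nodes take one step.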

Its amortized (average per operation) time complexity is \(O(\log N)\)

The proofs of the following time complexities are skipped (so hard)

From CLRS

n elements, f operations
Time complexity: \(\Theta(n+f\cdot(1+\log_{2+f/n}n))\)

Time complexity when heuristic merging (union by rank) is used as well

(So hard)

\(O(\alpha(N)) \)

\(\alpha(N)\) is the inverse of the
Ackermann function

\(\alpha(N)\) grows reallyyyyy slowly.
You can treat it as a constant.
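Putting the two ideas together gives the version usually used in practice. This is a sketch under the same assumptions as before (nodes numbered from 0, ranks starting at 0); the struct name `DisjointSet` is illustrative. With path compression, `rank` is no longer the exact height but an upper bound on it.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical sketch: path compression plus union by rank.
struct DisjointSet {
    std::vector<int> super;  // super[i]: superior of node i
    std::vector<int> rank;   // rank[i]: upper bound on the height under boss i

    explicit DisjointSet(int n) : super(n), rank(n, 0) {
        std::iota(super.begin(), super.end(), 0);
    }

    int find(int x) {
        if (x == super[x]) return x;
        return super[x] = find(super[x]);  // path compression
    }

    bool same(int a, int b) { return find(a) == find(b); }

    void Union(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (rank[a] < rank[b]) std::swap(a, b);  // a is the higher-ranked boss
        super[b] = a;
        if (rank[a] == rank[b]) ++rank[a];
    }
};
```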


By tunchin kao
