I see a red node, and I want it painted black!

本文是阅读《Painting Nodes Black With Red-Black Trees》的读书笔记,文章来自Vaidehi Joshi在medium上发布baseCS系列,该栏目用可爱的漫画来生动地解释计算机基础、数据结构、算法、理论在实践中的应用。


Intro

  • Different approaches to solving similar problems.

  • We depend on as programmers daily are actually built upon more naive solutions that were initially invented by someone else:

  • It is wise to lean on solutions that have already been created by others and build upon them to create our own.

  • AVL trees, a type of self-balancing tree that build upon the ideas of binary search trees.
  • Red-black tree builds upon both of these two tree structures.

A very disciplined tree

  • Red-black trees were invented in 1978, by two researchers named Leonidas J. Guibas and Robert Sedgewick, at Xerox PARC, a research and development company based in Palo Alto, California.

Definition

  • a type of self-balancing binary search tree,very similar to other self-balancing trees, such as AVL trees.
  • adhere to a strict set of rules to maintain a log time complexity and stays balanced.

Rules

  1. Every single node in the tree must be either red or black.
  2. The root node of the tree must always be black.
  3. Two red nodes can never appear consecutively, one after another; a red node must always be preceded by a black node (it must have a black parent node), and a red node must always have black children nodes.
  4. Every branch path — the path from a root node to an empty (null) leaf node — must pass through the exact same number of black nodes. A branch path from the root to an empty leaf node is also known as an unsuccessful search path, since it represents the path we would take if we were to search for a node that didn’t exist within the tree.

Rules of a red-black tree

Example

  • a null leaf node is always considered to be a black node, not red.

Fulfilling the rules of a red-black tree

  • these 4 rules are super strict and very easy to break.

Proof

  • A chain of three nodes cannot possibly every be a valid red-black tree.

A chain of 3 nodes is not possible in a red-black tree


Following the rules of red

  • The trickiest time to follow the rules of red-black trees is when we’re growing or shrinking the tree.

  • A perfectly-balanced red-black tree is great, but most data structures tend to have elements and data inserted and removed from them. When it comes to red-black trees, this can be a little daunting at first.

inserting a node

strategy 1 : red

  • Inserting a node and immediately coloring it red makes it much easier to identify and subsequently fix any violations.

How would we insert a new node with a value of 3

  • The first step is basically ignoring the red-black rules, and initially just figuring out where the node would go according to the rules of a normal binary search tree.

  • At first, when we insert this node, we’ll break rule four.Instead, if we recolor our inserted node with a value of 3 to be red, instead of black, we don’t violate any of our four red-black tree rules.


insert another node

strategy 2 : recoloring

  • recoloring nodes is one well-used technique for handling insertions and deletions into a red-black tree.

Inserting nodes, continued

  • locate the position for the node and recolor it to be red.

  • However, we could also recolor the black parent node, 4, to be red instead of black.


strategy 3 : rotations

  • Rotations tend to move around and restructure the subtrees of a larger red-black tree, which can also be helpful in preventing any rule violations.

The power of left and right rotations

  • left-rotation : left-rotating on the root node, 4, so that it’s right child, 11, becomes the new parent ; the subtree of 9-5-10 moved from the right subtree to the left.
  • right-rotation : right-rotate on the new parent node, 11, and shift it down so that it once again becomes the right child of 11 ; The same subtree of 9-5-10 moved from the left subtree back to the right.

Example 1

Remember

  • it’s easiest to start off by always inserting a red node, and then recoloring and rotation as necessary, afterwards.

Handling insertion and deletion by rotating and recoloring.

  1. start with a single root node of 21, which will be red. recolor node 21 to be black.

  2. insert a node with a value of 8 , as the left child of the root. We can insert it as a red node.

  3. insert a node with a value of 3. rotate the grandparent node (the root) and then recolor.

    Rotate and recolor, no rules are violated

  4. right-rotate the root node, and shift 21 down to become the right child of 8.

  5. recolor our original parent node of the newly-inserted node (which is now the right child), 21, and the root node, 8. If we recolor 8 and 21, our root node is back to being black, and our two child nodes are both red.


The benefits of painting it black

what about inserting elements into a much larger red-black tree?

Complicated Example 2

What about inserting into a much larger red-black tree

Node 16 violates the tree, so it must be rotated.

  1. recoloring the parent nodes of our newly-inserted node, 10. This at least solves the problem of rule three being broken!
  2. can’t recolor 16 to be black, because if we did that, we’d be violating rule four.
  3. there are no parent nodes left to recolor (except for the root node, which we don’t want to make red!), we can now lean on rotations to help us out.

Now, node 21 is the node that is violating the tree

  1. rotating node 16 to the right, we’ve effectively pushed down nodes 21, 27, and 29.注意这里将16的右子树移到21的左边了!
  2. two consecutive red nodes that are still breaking rule three: 16 and 21.
  3. left-rotating node 16 and making it the new root node. perform a left-rotation on nodes 9 and 10 so that they both are on the left subtree rather than the right subtree.

In order to finally finish balancing the tree, we need to recolor the root node, 16

  1. recolor our root (as well as it’s left child, to make sure that we don’t violate rule 4).

Time Complexity and Space Complexity

Red-black trees are logarithmically efficient for insertion & deletion given how many rotations they require

  • Compared to AVL trees, red-black trees are less-perfectly balanced.
  • generally need fewer rotations during insertion/deletion with a red-black tree, which makes red-black trees superior for growing or shrinking a free, but less efficient for searching compared to AVL trees.
  • The best example of a red-black trees in use today is the Linux kernel’s Completely Fair Scheduler (CFS), which was introduced as recently as 2007. This scheduler handles resource allocation for executing process from within the CPU, and actually uses red-black trees under the hood!
  • The duality二元性 of color in red-black trees is a key part to all four of its rules, and an important part of keeping the tree balanced.
  • As we already know, each node in a red-black tree must be either red or black. This is effectively an added piece of information that every single node must keep track of; at first, this might seem silly, because we’re tracking yet another piece of information in each node, which takes up space!

Tracking the color of a node requires only 1 bit of information

  • However, tracking the color of a node requires only 1 bit of storage.
  • 1 bit is a single binary digit , is a super minuscule amount of space to store either the color red or black.

Story

Robert Sedgewick, who is now a professor at Princeton (and teaches an online algorithms course!), explained this story himself:

A lot of people ask why did we use the name red–black. Well, we invented this data structure, this way of looking at balanced trees, at Xerox PARC which was the home of the personal computer and many other innovations that we live with today entering[sic] graphic user interfaces, ethernet and object-oriented programmings[sic] and many other things. But one of the things that was invented there was laser printing and we were very excited to have nearby color laser printer that could print things out in color and out of the colors the red looked the best. So, that’s why we picked the color red to distinguish red links, the types of links, in three nodes.


C++的有序关联式容器底部支持

我容易弄混map和unordered_map的底部支持。学了红黑树之后,了解到rbt具有有序的特性,而unordered_map是一种无序的关联式容器,所以unordered_map是用hash table实现的!!!C++11以前GUN版本的unordered_map是叫做hash_map。

以下内容来自侯捷老师的《STL标准库与泛型编程》课程中的测试代码,演示了各种有序关联式容器的使用效果。

multiset

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
#include <set>
void test_multiset(long& value)
{
cout << "\ntest_multiset().......... \n";

multiset<string> c;
char buf[10];
clock_t timeStart = clock();
for (long i = 0; i< value; ++i)
{
try {
snprintf(buf, 10, "%d", rand());
c.insert(string(buf));
}
catch (exception& p) {
cout << "i=" << i << " " << p.what() << endl;
abort();
}
}
cout << "milli-seconds : " << (clock() - timeStart) << endl;
cout << "multiset.size()= " << c.size() << endl;
cout << "multiset.max_size()= " << c.max_size() << endl; //214748364

string target = get_a_target_string();
{
timeStart = clock();
auto pItem = find(c.begin(), c.end(), target); //比 c.find(...) 慢很多
cout << "std::find(), milli-seconds : " << (clock() - timeStart) << endl;
if (pItem != c.end())
cout << "found, " << *pItem << endl;
else
cout << "not found! " << endl;
}

{
timeStart = clock();
auto pItem = c.find(target); //比 std::find(...) 快很多
cout << "c.find(), milli-seconds : " << (clock() - timeStart) << endl;
if (pItem != c.end())
cout << "found, " << *pItem << endl;
else
cout << "not found! " << endl;
}
c.clear();
}

multiset

  • 最后清空容器

multimap

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include <map>
void test_multimap(long& value)
{
cout << "\ntest_multimap().......... \n";

multimap<long, string> c;
char buf[10];

clock_t timeStart = clock();
for (long i = 0; i< value; ++i)
{
try {
snprintf(buf, 10, "%d", rand());
//multimap 不可使用 [] 做 insertion
c.insert(pair<long, string>(i, buf));
}
catch (exception& p) {
cout << "i=" << i << " " << p.what() << endl;
abort();
}
}
cout << "milli-seconds : " << (clock() - timeStart) << endl;
cout << "multimap.size()= " << c.size() << endl;
cout << "multimap.max_size()= " << c.max_size() << endl; //178956970

long target = get_a_target_long();
timeStart = clock();
auto pItem = c.find(target);
cout << "c.find(), milli-seconds : " << (clock() - timeStart) << endl;
if (pItem != c.end())
cout << "found, value=" << (*pItem).second << endl;
else
cout << "not found! " << endl;

c.clear();
}

multimap

  • multimap 不可使用 [] 做 insertion,要用内置类型pair<key,vale> ,因为key是可能重复的。

set

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#include <set>
void test_set(long& value)
{
cout << "\ntest_set().......... \n";

set<string> c;
char buf[10];

clock_t timeStart = clock();
for (long i = 0; i< value; ++i)
{
try {
snprintf(buf, 10, "%d", rand());
c.insert(string(buf));
}
catch (exception& p) {
cout << "i=" << i << " " << p.what() << endl;
abort();
}
}
cout << "milli-seconds : " << (clock() - timeStart) << endl;
cout << "set.size()= " << c.size() << endl;
cout << "set.max_size()= " << c.max_size() << endl; //214748364

string target = get_a_target_string();
{
timeStart = clock();
auto pItem = find(c.begin(), c.end(), target); //比 c.find(...) 慢很多
cout << "std::find(), milli-seconds : " << (clock() - timeStart) << endl;
if (pItem != c.end())
cout << "found, " << *pItem << endl;
else
cout << "not found! " << endl;
}

{
timeStart = clock();
auto pItem = c.find(target); //比 std::find(...) 快很多
cout << "c.find(), milli-seconds : " << (clock() - timeStart) << endl;
if (pItem != c.end())
cout << "found, " << *pItem << endl;
else
cout << "not found! " << endl;
}
c.clear();
}

set

  • 这里set的实际大小是32768,因为每个key是不重复的,他的类型是long,4个字节,范围是0~32767,2^15-1.
  • 用容器对应的find()方法比std全局的find()方法要快很多,百万级数据是20ms的差别!

map

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#include <map>
void test_map(long& value)
{
cout << "\ntest_map().......... \n";

map<long, string> c;
char buf[10];

//
clock_t timeStart = clock();
for(long i=0; i< value; ++i)
{
try {
snprintf(buf, 10, "%d", rand());
c[i] = string(buf);
}
catch(exception& p) {
cout << "i=" << i << " " << p.what() << endl;
abort();
}
}
cout << "milli-seconds : " << (clock()-timeStart) << endl;
cout << "map.size()= " << c.size() << endl;
cout << "map.max_size()= " << c.max_size() << endl;

long target = get_a_target_long();
timeStart = clock();
auto pItem = c.find(target);
cout << "c.find(), milli-seconds : " << (clock()-timeStart) << endl;
if (pItem != c.end())
cout << "found, value=" << (*pItem).second << endl;
else
cout << "not found! " << endl;

c.clear();
}

map

  • 构建百万级数据容器的时候花了22秒,但是在搜索的时候很快!说明rbt很适合搜索!!
  • 而由于rbt比avl的平衡性差一些,所以在搜索方面avl更快,而rbt在建树速度上优于avl!!
  • 可以直接用[],c[i] = string(buf);。i是不重复的。

Resources

  1. Coursera Lecture 45 — Red-Black BSTs, Robert Sedgewick
  2. Generic Red-Black Tree, Dr. Richard Wiener
  3. Red-Black Trees, Professor Jim Skrentny
  4. Red/Black Tree Visualization, Professor David Galles
  5. Red-Black Tree Introduction, GeeksforGeeks
Thanks for Support.