B tree,B+ tree,B* tree,RBT,AVL

whats B tree?

  • self balancing search tree.
  • each node can hold multiple keys.
  • all leaf nodes are the same distance from the root.
  • an extension of self-balancing binary search trees.
  • allow each node to hold multiple keys and have multiple children.
  • designed to take advantage of systems that can read and write in large blocks, and are commonly used in databases and file systems.

B-Tree,RBT or AVL tree?

  • When using a RB tree, B- tree or an AVL tree?
  • What are the key points before deciding on the choice?
  • explain with a scenario .

Answer

  • B-tree 适合存海量数据

    • When you’re managing more than thousands of items and you’re paging them from a disk or some slow storage medium.
    • Can have variable number of children which allow it to hold many records but still maintain a short height tree.
    • A good general-purpose ordered container data structure, even in main memory. Even when virtual memory isn’t an issue, cache-friendliness缓存友好 often is.
    • B+ trees are particularly good for sequential access - the same asymptotic渐近 performance as a linked list, but with cache-friendliness close to a simple array. All this and O(log n) search, insert and delete.
    • B+ trees do have problems, though - such as the items moving around within nodes when you do inserts/deletes, invalidating pointers to those items. this means you get fairly easy O(n) merging of trees - convert both trees to lists, merge them, then convert back to a tree.
  • RB tree 适合增删

    • when you’re doing fairly frequent inserts, deletes and retrievals检索 on the tree.
    • Has less strict rules around rebalancing which make insertions/deletions quicker than AVL tree.
    • RB trees also have better performance O(1) on the rebalance which makes them more suitable for persistent datastructures with roll-back and roll-forward.
  • AVL tree 适合检索

    • when your inserts and deletes are infrequent relative to your retrievals.
    • More strictly balanced so lookups are faster than RB tree.

B-Tree vs Hash Table

In MySQL, an index type is a b-tree, and access an element in a b-tree is in logarithmic amortized time O(log(n)).

On the other hand, accessing an element in a hash table is in O(1).

Why is a hash table not used instead of a b-tree in order to access data inside a database?

Answer

  • You can only access elements by their primary key in a hashtable.
  • This is faster than with a tree algorithm (O(1) instead of log(n)), but you cannot select ranges (everything in between xand y). Tree algorithms support this in Log(n) whereas hash indexes can result in a full table scan O(n).
  • Also the constant overhead of hash indexes is usually bigger (which is no factor in theta notation, but it still exists). Also tree algorithms are usually easier to maintain, grow with data, scale, etc.
  • Hash indexes work with pre-defined hash sizes, so you end up with some “buckets” where the objects are stored in. These objects are looped over again to really find the right one inside this partition.
  • Todays hash tables algorithms usually scale, but scaling can be inefficient.

There are indeed scalable hashing algorithms. they evolved(进化) from scalable replication where re-hashing is not easy.

Its called RUSH - Replication Under Scalable Hashing, and those algorithms are thus called RUSH algorithms.

  • However there may be a point where your index exceeds a tolerable size compared to your hash sizes and your entire index needs to be re-built. Usually this is not a problem, but for huge-huge-huge databases, this can take days.
  • The trade off for tree algorithms is small and they are suitable for almost every use case and thus are default.

B trees and B+ trees?

  • In a b- tree you can store both keys and data in the internal and leaf nodes
  • In a b+ tree you have to store the data in the leaf nodes only.

Is there any advantage of doing the above in a b+ tree?

Why not use b-trees instead of b+ trees everywhere, as intuitively they seem much faster?

I mean, why do you need to replicate the key(data) in a b+ tree?

Answear

B and B+ tree

Advantages of B+ trees:

  • B+ trees don’t have data associated with interior内部 nodes, more keys can fit on a page of memory. Therefore, it will require fewer cache misses in order to access data that is on a leaf node.
  • The leaf nodes of B+ trees are linked, so doing a full scan of all objects in a tree requires just one linear pass through all the leaf nodes. A B tree would require a traversal of every level in the tree. This full-tree traversal will likely involve more cache misses than the linear traversal of B+ leaves.

Advantage of B trees:

  • Because B trees contain data with each key, frequently accessed nodes can lie closer to the root, and therefore can be accessed more quickly.

What’s a B*Tree?

Did they just mean binary search tree?

Answer

  • No.
  • A node in a B*Tree can have many keys (which point to many children). They operate by comparing keys in order to select a child node, much like a binary tree. But, the intent is that each node is stored on disk, and can be read into memory at once. Thus, the number of disk accesses required would match the depth of the tree.

  • Note that the * indicates the nodes are at least 2/3 full.


Reference

  1. When to choose RB tree, B-Tree or AVL tree?
  2. B-Tree vs Hash Table
  3. Differences between B trees and B+ trees
  4. What’s a B*Tree?

to be continued…

Further Reading

  1. Painting Nodes Black With Red-Black Trees

  2. Busying Oneself With B-Trees

  3. How does a red-black tree work?

  4. How to easily remember Red-Black Tree insert and delete?

  5. Red Black Tree versus B Tree

  6. Applications of red-black trees

  7. Red-Black Trees

  8. What additional rotation is required for deletion from a Top-Down 2-3-4 Left-leaning Red Black tree?

  9. Statistical performance of purely functional maps and sets

  10. Red black tree over avl tree

  11. Difference between red-black trees and AVL trees

  12. bitmap

Thanks for Support.