Thursday, May 20, 2021

How BTree speeds up a Database

When data size increases you will realize your database queries responds slowly. Lets talk how databases handles such situations using B-Trees.


Disk Drive:

Each data access call needs to read/write to Disk Drive. 
Hard Disk Drive has circular disks divided in Tracks & Sectors. An area marked between a Track & a Sector is a Block. A Block is of usually 512 bytes and contains data in from of bits and is read via a Disk Arm . 

Disk Arm
Sectors

To access any data you would need Block Address(based on Track & Sector) & Offset in the Block to identify the data/record address.

Database:

Enough about Disk Drives, lets talk about Database. Consider a database table Employee having columns EmpId, Name, Address. Each column taking 40 bytes, so each record totaling 120 bytes. With 1000 such records we would need 120x1000 = 120 000 Bytes = 120 KB space that would need 120,000/512  = 234 blocks on Hard Disk. A database query will require to search through all 234 Blocks to get the required record.

Indexing:

To speed up the record search lets bring in Indexing, for that we need a index table IndexTable1 that points to Record Address. Each index will then have  a EmpId(40 Byte) and a Record Address Pointer(10Byte) totaling 50 Byte. 50Byte *1000Records = 50 000 Bytes which will need only 50000/512 = 100 Blocks thus reducing search time in database with a trade of extra space. Each block can store 10 Records.



Multi Level Indexing:

We need database to respond in no time, current approach still takes much time, lets bring in multi level indexing . Lets bring another index table IndexTable2 that keeps track of Disk block that keeps that index record. It tells that EmpId 1 to 10 are stored in Block 1 and so on. This reduces time of access by a lot of margin. Similarly more Index Tables can be brought in to further speed up the data access.
On further analysis you will identify indexes are just rotated trees.

M-Way Search Trees:

M-way Search Trees are just like Binary Search Trees with a difference that it can have more than 2 children. An M-way search tree will have utmost M children and M-1 keys. The keys are in sorted order such that k1 < k2 < k3<...<km-1 .
Here in root node ,keys are in sorted order and like BST all values less than 10 falls in its left children & values greater than 10 & less than 20( 10 < child node < 20 ) lies in child node between them , this goes on for all the nodes.
For our index tree will look something like this including one Record Pointer through each key.
2-3 tree is a M-way Search Tree with M=3 .

Though the M-way Search Tree looks fast but it has no rule for filling keys or filling child nodes first. This could degrade performance wildly.

B-Tree:

B-Tree is a M-way Search Tree & resolves issues faced with M-way Search Tree with certain rules. Rules are:
  • M/2 keys must be added to a node before creating its child nodes
  • Root Node must have at-least 2 children
  • Leaf nodes are at the same level
  • Bottom up creation of tree
Now try adding few records to our B-Tree with M=4 to see the rules in action. 
[10, 20, 40, 50, 60 , 3, 70, 80, 15, 5, 12] 
  • Step 1: Create a root node & add key 10 to it
  • Step 2: Since root node has available space lets add 20 to root Node in sorted order
  • Step 3: Root Node has available space for 1 more key lets add 40 to it. Root Node is full now.


  • Step 4: Since root is full lets break down root node into [10,20] & [50]. Sending mid element 40 up making it the root node.


  • Step 5: Lookup where 60 will be placed, since 60 is greater than 50 & its node has available space lets add 60 besides 50
  • Step 6: Lookup 3, since 3 is less than 10, it is placed before 10
  • Step 7: Lookup 70, since 70 is greater than 60, placing it next to 60    
  •  

  • Step 8: Lookup 80, 80 is greater than 80, but 70's node is full with [50,60,70] & 80 . Lets split into [50,60] & [80] & sending mid element 70 up in Root Node.


  • Step 9: Lookup 15, 15 falls with [3,10,20]. Lets sort it to 3, 10,15,20 . Split into two [3,10] & [20]. Sending 15 mid element up to root node.


  • Step 10: Lookup 5, 5 falls in [3.10] Node & added to it.


  • Step 11; Lookup 12, it falls in [3,5,10] Node. Split it into [3,5]& [10,12]& sending mid element 10 up. Root node is full hence split into [10,15] & [70] & sending mid element 40 up creating it th root node.


B+Tree

This is same as B-Tree only difference is :
  • Leaf nodes has records/data
  • All keys are present in leaf node, duplicates may exist
  • All leafs are connected as a LinkedList
BTree & B+Tree are widely used in optimizing database access optimization. CloudDb, Postgres, MySQL, SQLite & many.


References:
https://www.youtube.com/channel/UCZCFT11CWBi3MHNlGf019nw
https://blogs.umass.edu/
https://dba.stackexchange.com

No comments:

Post a Comment