Thursday, May 20, 2021

How BTree speeds up a Database

When data size increases you will realize your database queries responds slowly. Lets talk how databases handles such situations using B-Trees.


Disk Drive:

Each data access call needs to read/write to Disk Drive. 
Hard Disk Drive has circular disks divided in Tracks & Sectors. An area marked between a Track & a Sector is a Block. A Block is of usually 512 bytes and contains data in from of bits and is read via a Disk Arm . 

Disk Arm
Sectors

To access any data you would need Block Address(based on Track & Sector) & Offset in the Block to identify the data/record address.

Database:

Enough about Disk Drives, lets talk about Database. Consider a database table Employee having columns EmpId, Name, Address. Each column taking 40 bytes, so each record totaling 120 bytes. With 1000 such records we would need 120x1000 = 120 000 Bytes = 120 KB space that would need 120,000/512  = 234 blocks on Hard Disk. A database query will require to search through all 234 Blocks to get the required record.

Indexing:

To speed up the record search lets bring in Indexing, for that we need a index table IndexTable1 that points to Record Address. Each index will then have  a EmpId(40 Byte) and a Record Address Pointer(10Byte) totaling 50 Byte. 50Byte *1000Records = 50 000 Bytes which will need only 50000/512 = 100 Blocks thus reducing search time in database with a trade of extra space. Each block can store 10 Records.



Multi Level Indexing:

We need database to respond in no time, current approach still takes much time, lets bring in multi level indexing . Lets bring another index table IndexTable2 that keeps track of Disk block that keeps that index record. It tells that EmpId 1 to 10 are stored in Block 1 and so on. This reduces time of access by a lot of margin. Similarly more Index Tables can be brought in to further speed up the data access.
On further analysis you will identify indexes are just rotated trees.

M-Way Search Trees:

M-way Search Trees are just like Binary Search Trees with a difference that it can have more than 2 children. An M-way search tree will have utmost M children and M-1 keys. The keys are in sorted order such that k1 < k2 < k3<...<km-1 .
Here in root node ,keys are in sorted order and like BST all values less than 10 falls in its left children & values greater than 10 & less than 20( 10 < child node < 20 ) lies in child node between them , this goes on for all the nodes.
For our index tree will look something like this including one Record Pointer through each key.
2-3 tree is a M-way Search Tree with M=3 .

Though the M-way Search Tree looks fast but it has no rule for filling keys or filling child nodes first. This could degrade performance wildly.

B-Tree:

B-Tree is a M-way Search Tree & resolves issues faced with M-way Search Tree with certain rules. Rules are:
  • M/2 keys must be added to a node before creating its child nodes
  • Root Node must have at-least 2 children
  • Leaf nodes are at the same level
  • Bottom up creation of tree
Now try adding few records to our B-Tree with M=4 to see the rules in action. 
[10, 20, 40, 50, 60 , 3, 70, 80, 15, 5, 12] 
  • Step 1: Create a root node & add key 10 to it
  • Step 2: Since root node has available space lets add 20 to root Node in sorted order
  • Step 3: Root Node has available space for 1 more key lets add 40 to it. Root Node is full now.


  • Step 4: Since root is full lets break down root node into [10,20] & [50]. Sending mid element 40 up making it the root node.


  • Step 5: Lookup where 60 will be placed, since 60 is greater than 50 & its node has available space lets add 60 besides 50
  • Step 6: Lookup 3, since 3 is less than 10, it is placed before 10
  • Step 7: Lookup 70, since 70 is greater than 60, placing it next to 60    
  •  

  • Step 8: Lookup 80, 80 is greater than 80, but 70's node is full with [50,60,70] & 80 . Lets split into [50,60] & [80] & sending mid element 70 up in Root Node.


  • Step 9: Lookup 15, 15 falls with [3,10,20]. Lets sort it to 3, 10,15,20 . Split into two [3,10] & [20]. Sending 15 mid element up to root node.


  • Step 10: Lookup 5, 5 falls in [3.10] Node & added to it.


  • Step 11; Lookup 12, it falls in [3,5,10] Node. Split it into [3,5]& [10,12]& sending mid element 10 up. Root node is full hence split into [10,15] & [70] & sending mid element 40 up creating it th root node.


B+Tree

This is same as B-Tree only difference is :
  • Leaf nodes has records/data
  • All keys are present in leaf node, duplicates may exist
  • All leafs are connected as a LinkedList
BTree & B+Tree are widely used in optimizing database access optimization. CloudDb, Postgres, MySQL, SQLite & many.


References:
https://www.youtube.com/channel/UCZCFT11CWBi3MHNlGf019nw
https://blogs.umass.edu/
https://dba.stackexchange.com

Tuesday, May 11, 2021

Designing Tiktok



Build a TikTok of your own. TikTok is a short-form, video-sharing app that allows users to create and share 15-second videos, on any topic.


Courtesy of Bytedance

Developing an app that can let users create & share videos could be a cakewalk for many seasoned developers of you, but designing the complete infra of an app with 100 million monthly active users is a different game.



Lets walk through a small story to understand the Logical Design :

X created a lovely Video Sharing App after months of hard work. His 2 friends liked it & started using it.

His friends uploaded daily 2 videos that gets stored on a 10GB Server Space that X bought.

Number of users start increasing once X's app is on TV, a sudden increase of 10,000 users. Server Space soon exhausted & the new users are unable to upload new videos.

X bought more server space, but that too soon exhausted with increasing users.

X realized he need to reduce video size before upload. He wrote a smart code that reduces video size without compromising on video quality. Now clients have to do too much work before upload so X moved this code to its Application Servers.

X bought new Application Servers with high processing powers for quick turnaround time. (Sponsors were attracted).

Another TV ad brought total users to be 1million, things started falling apart, Application Server couldn't reply all of the requests & finally it went down. Users were angry, Sponsors were unhappy.

X realized he needs Replicas of Application Servers such that when one Application Server dies another takes up its place & user has a notion that they are being served by single server . X brought 10 new Application Servers & a Load Balancer that evenly distributed load among the Application Server. Everyone is happy again.

With more Server Replicas in place , each video is being uploaded to all the Application Server which takes a lot of time, X wants the system to be quick & consistent. He decides the app can be eventually consistent by copying the video file to 2/3rd of servers and rest can keep uploading in background. This made the app availability high.

Each Server replica communicates to each other via Network, X designed a wonderful algorithm to make the system Network Failure Tolerant by integrating polling, retries.

As the system scales, more are the components required to handle the load. This story covers the Logical Designing of the app, majorly involving CAP Theorem.