Thursday, June 10, 2021

StackOverflow Architecture


StackOverflow.com is a widely used question-and-answer website for professional and enthusiast programmers. It is the flagship site of the Stack Exchange Network. The scale of this site can be understood by looking at its user-base stats:



With a whopping user base of 15 million, Stack Overflow tops the chart. As of today the overall Stack Exchange network gets 1.3 billion page views per month, which translates to about 55 TB of data per month. Designing the architecture of such a site network is a humongous task in itself.

What is going to blow your mind: Stack Exchange employs a two-tier architecture, with no sharding and no Big Data. It's fascinating that they run one of the most trafficked sites on the web (one that also uses long-polling "real-time" messaging) on just 25 servers, most of them at around 10% load all the time.

StackOverflow's Nick Craver, Joel Spolsky, and Marco Cecconi are big advocates of Scaling Up. At a time when the world is moving to Scaling Out by adding more machines, StackOverflow still lives in the old world of Scaling Up, since they get the best performance out of it, with just 5% average CPU load on each of their servers while achieving roughly 15 ms page load times. Let's talk about how they achieve this level of performance while serving 1.3 billion page views per month:

Caching



Caching is done at multiple levels. 
  • Browser/client level: images, JavaScript, etc. are cached until their TTL (time to live) expires.
  • Load-balancer level: the load balancer serves cached responses instead of making another query to the web server behind it, which improves response times for those requests and reduces load on the web servers.
  • Via Redis: a Redis server caches data in memory rather than hitting the database for every request. This cache is kept highly consistent so that hot data stays in memory, reducing the database footprint; an in-memory hit can be roughly 10x faster than a database hit (see the sketch after this list).
  • Database level: SQL Server also provides its own internal caching to avoid frequent disk reads.
  • Via Elasticsearch: an index server that powers quick searching. Around 90% of the queries are read-only, and this read-only path is what helps render page views so quickly.
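To make the Redis layer concrete, here is a minimal cache-aside sketch in Python. This is an illustration only, not StackOverflow's actual code (they are a .NET shop); the redis-py client calls are real, but fetch_question_from_db and the key format are hypothetical, and it assumes a Redis server on localhost.

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def fetch_question_from_db(question_id):
    # Hypothetical stand-in for the real SQL query.
    return {"id": question_id, "title": "How do I exit Vim?"}

def get_question(question_id, ttl_seconds=300):
    key = f"question:{question_id}"                  # hypothetical key format
    cached = r.get(key)
    if cached is not None:                           # cache hit: no DB round-trip
        return json.loads(cached)
    question = fetch_question_from_db(question_id)   # cache miss: go to the DB
    r.setex(key, ttl_seconds, json.dumps(question))  # store with a TTL
    return question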

Scale Up

StackOverflow ensures High Availability across all its services by maintaining layers of redundancy. There has been an age-old debate on whether to scale up or scale out.

Scale Out, being the newer approach, is also the basis of Big Data and has proven performant for very large systems; that's why everybody is rushing towards it. Scaling out is horizontal scaling, i.e. distributing heavy processing jobs across many small nodes/machines.

Scale Up (vertical scaling) means increasing a machine's processing power (memory, CPU) rather than adding more machines.

Why did StackOverflow choose to Scale Up?

At StackOverflow, they believe 'performance is a feature'. To get the best performance they maintain their own servers and scale them up when needed.

When a new requirement for more resources comes up, they decide based on these parameters:

  • How much Redundancy do we need?
  • Do we actually need to Scale Out?
  • How much disk storage will be used?
    • SSDs or spinning disk drives?
  • How much memory?
    • Is the application memory-intensive?
    • Will it be used in parallel?
  • What CPU?
    • How many cores, based on the parallelism needed?
  • Network: what's the transaction rate?
Based on these questions, they usually drill down to a single machine with large RAM and strong processing power. Most of their servers run at an average 5% CPU load. That's a blow to all the horizontal-scaling followers.

Shared Nothing Architecture

Another factor in StackOverflow's performance is its Shared Nothing Architecture.


According to the definition by Wikipedia, “A shared-nothing architecture (SN) is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage.”

StackOverflow distributes work evenly among dedicated servers: an Elasticsearch server for fast reads, a writable DB server for update operations, and caching at each layer.
A dedicated server per task helps get the best performance out of each server.


References:
  • https://stackoverflow.com/company
  • https://www.dev-metal.com/architecture-stackoverflow/
  • https://mobisoftinfotech.com/resources/mguide/shared-nothing-architecture-the-key-to-scale-your-app-to-millions-of-users/
  • https://www.infoq.com/news/2020/04/Stack-Overflow-New-Architecture/
  • https://nickcraver.com/blog/2016/03/29/stack-overflow-the-hardware-2016-edition/

Thursday, May 20, 2021

How a B-Tree speeds up a Database

As data size increases, you will notice your database queries responding slowly. Let's talk about how databases handle such situations using B-Trees.


Disk Drive:

Each data access ultimately needs to read from or write to the disk drive.
A hard disk drive has circular platters divided into tracks and sectors. The area marked between a track and a sector is a block. A block is usually 512 bytes, contains data in the form of bits, and is read via the disk arm.

(Figure: a disk platter divided into tracks and sectors, read by the disk arm.)

To access any data you need the block address (based on track & sector) and an offset within the block to identify the record's address.

Database:

Enough about disk drives; let's talk about the database. Consider a database table Employee having columns EmpId, Name, and Address. With each column taking 40 bytes, each record totals 120 bytes. With 1,000 such records we need 120 x 1,000 = 120,000 bytes = 120 KB of space, which takes 120,000 / 512 ≈ 235 blocks on disk. Without an index, a query may have to scan through all 235 blocks to find the required record.

Indexing:

To speed up the record search, let's bring in indexing. For that we need an index table, IndexTable1, whose entries point to record addresses. Each index entry holds an EmpId (40 bytes) and a record address pointer (10 bytes), totaling 50 bytes. 50 bytes x 1,000 records = 50,000 bytes, which needs only 50,000 / 512 ≈ 98 blocks, reducing search time at the cost of extra space. Each block can store 10 index entries (512 / 50).
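A quick back-of-the-envelope sketch of this arithmetic in Python (the 512-byte block, 120-byte record, and 50-byte index-entry sizes are the assumptions from the text, with items packed contiguously):

import math

BLOCK = 512   # bytes per disk block (assumed)

def blocks_needed(item_size, n_items):
    # Blocks to store n_items of item_size bytes, packed contiguously.
    return math.ceil(item_size * n_items / BLOCK)

data_blocks  = blocks_needed(120, 1000)   # full Employee records -> 235 blocks
index_blocks = blocks_needed(50, 1000)    # (EmpId, pointer) entries -> 98 blocks
print(data_blocks, index_blocks)          # a scan touches 235 vs 98 blocks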



Multi-Level Indexing:

We need the database to respond in no time, and the current approach still takes too long, so let's bring in multi-level indexing. We add another index table, IndexTable2, that keeps track of which disk block holds each range of index records: it tells us, for example, that EmpIds 1 to 10 are stored in block 1, and so on. This reduces access time by a large margin. Similarly, more index tables can be layered on top to further speed up data access.
On further analysis you will notice that these layered indexes are just a tree turned on its side.

M-Way Search Trees:

M-way search trees are just like binary search trees, with the difference that a node can have more than 2 children. An M-way search tree has at most M children per node and M-1 keys, kept in sorted order such that k1 < k2 < k3 < ... < k(M-1).
In the root node the keys are in sorted order and, like a BST, all values less than 10 fall in its left child, while values greater than 10 and less than 20 (10 < key < 20) lie in the child node between those two keys; this goes on for all the nodes.
Our index then looks like such a tree, with one record pointer hanging off each key.
A 2-3 tree is an M-way search tree with M = 3.

Though the M-way search tree looks fast, it has no rules for filling keys or for when child nodes get created. For example, nothing stops inserts from creating one key and one child per node, degenerating the tree into a linked list; this can degrade performance wildly.

B-Tree:

A B-Tree is an M-way search tree that resolves the issues of the plain M-way search tree with certain rules:
  • Roughly M/2 keys must be added to a node before its child nodes are created
  • The root node must have at least 2 children
  • All leaf nodes are at the same level
  • The tree is created bottom-up
Now let's add a few records to a B-Tree with M = 4 (at most 3 keys per node) to see the rules in action:
[10, 20, 40, 50, 60, 3, 70, 80, 15, 5, 12]
  • Step 1: Create a root node & add key 10 to it
  • Step 2: Since the root node has available space, add 20 to it in sorted order
  • Step 3: The root node has space for 1 more key, so add 40. The root node is full now.


  • Step 4: Inserting 50 overflows the full root, so break it down into [10,20] & [50], sending the mid element 40 up to become the new root.


  • Step 5: Look up where 60 will be placed: since 60 is greater than 50 and that node has available space, add 60 beside 50
  • Step 6: Look up 3: since 3 is less than 10, it is placed before 10
  • Step 7: Look up 70: since 70 is greater than 60, place it next to 60

  • Step 8: Look up 80: 80 is greater than 70, but its node [50,60,70] is already full. Adding 80 gives [50,60,70,80]; split into [50,60] & [80], sending the mid element 70 up into the root node.


  • Step 9: Look up 15: 15 falls within [3,10,20]. Sorted, that gives [3,10,15,20]; split into [3,10] & [20], sending the mid element 15 up to the root node.


  • Step 10: Look up 5: 5 falls in the [3,10] node & is added to it.


  • Step 11: Look up 12: it falls in the [3,5,10] node. Adding it gives [3,5,10,12]; split into [3,5] & [12], sending the mid element 10 up. The root node [15,40,70] is now full too, so split it into [10,15] & [70], sending the mid element 40 up to become the new root.
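To make these splits concrete, here is a minimal B-Tree insertion sketch in Python. It uses the classic top-down preemptive-split insertion (a full node is split on the way down) rather than the bottom-up overflow splits of the walkthrough above, so splits can fire a step earlier and pick a different middle key, but the splitting mechanics are the same. With minimum degree t = 2 a node holds at most 2t - 1 = 3 keys, matching M = 4.

class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf

class BTree:
    def __init__(self, t=2):                        # t=2 -> at most 3 keys per node (M=4)
        self.t = t
        self.root = Node()

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:   # root full: grow the tree upward
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _split_child(self, parent, i):
        t = self.t
        child = parent.children[i]
        sibling = Node(leaf=child.leaf)
        parent.keys.insert(i, child.keys[t - 1])    # middle key moves up
        parent.children.insert(i + 1, sibling)
        sibling.keys = child.keys[t:]
        child.keys = child.keys[:t - 1]
        if not child.leaf:
            sibling.children = child.children[t:]
            child.children = child.children[:t]

    def _insert_nonfull(self, node, key):
        if node.leaf:
            node.keys.append(key)
            node.keys.sort()                        # keep keys in sorted order
            return
        i = len(node.keys)
        while i > 0 and key < node.keys[i - 1]:     # find the child to descend into
            i -= 1
        if len(node.children[i].keys) == 2 * self.t - 1:
            self._split_child(node, i)              # split ahead of time
            if key > node.keys[i]:
                i += 1
        self._insert_nonfull(node.children[i], key)

tree = BTree(t=2)
for k in [10, 20, 40, 50, 60, 3, 70, 80, 15, 5, 12]:
    tree.insert(k)
print(tree.root.keys)   # keys in the root after all inserts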


B+Tree

A B+Tree is the same as a B-Tree; the only differences are:
  • Leaf nodes hold the records/data
  • All keys are present in the leaf nodes, so duplicates of internal keys exist there
  • All leaves are connected as a linked list
B-Trees and B+Trees are widely used to optimize database access: CloudDb, Postgres, MySQL, SQLite & many others use them.
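The linked leaves are what make B+Trees so good at range queries: once the first matching leaf is found, the scan just walks the leaf chain instead of re-descending the tree. A toy sketch of that leaf-level walk in Python (the tree descent to the starting leaf is omitted, and the leaf contents are hypothetical):

class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys        # sorted keys; records would hang off these
        self.next = next_leaf   # the linked-list pointer that a plain B-Tree lacks

def range_scan(leaf, lo, hi):
    # Yield every key in [lo, hi] by walking the leaf chain left to right.
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return
            if k >= lo:
                yield k
        leaf = leaf.next

# Hypothetical leaf chain: [3,5] -> [10,12] -> [15,20]
l3 = Leaf([15, 20])
l2 = Leaf([10, 12], l3)
l1 = Leaf([3, 5], l2)
print(list(range_scan(l1, 5, 15)))   # [5, 10, 12, 15]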


References:
  • https://www.youtube.com/channel/UCZCFT11CWBi3MHNlGf019nw
  • https://blogs.umass.edu/
  • https://dba.stackexchange.com

Tuesday, May 11, 2021

Designing TikTok



Build a TikTok of your own. TikTok is a short-form video-sharing app that allows users to create and share 15-second videos on any topic.


Courtesy of Bytedance

Developing an app that lets users create & share videos could be a cakewalk for many of you seasoned developers, but designing the complete infrastructure of an app with 100 million monthly active users is a different game.



Let's walk through a small story to understand the logical design:

X created a lovely Video Sharing App after months of hard work. His 2 friends liked it & started using it.

His friends uploaded 2 videos daily, which got stored on the 10 GB of server space that X bought.

The number of users started increasing once X's app was featured on TV: a sudden influx of 10,000 users. Server space was soon exhausted & new users were unable to upload videos.

X bought more server space, but that too was soon exhausted as users kept growing.

X realized he needed to reduce video size before upload. He wrote smart code that reduces video size without compromising video quality. But now clients had to do too much work before each upload, so X moved this code to his Application Servers.

X bought new Application Servers with high processing power for quick turnaround time. (Sponsors were attracted.)

Another TV ad brought the total users to 1 million, and things started falling apart: the Application Server couldn't serve all of the requests & finally went down. Users were angry, sponsors were unhappy.

X realized he needed replicas of the Application Server, such that when one Application Server dies another takes its place & users still have the notion of being served by a single server. X brought in 10 new Application Servers & a Load Balancer that evenly distributed load among them. Everyone is happy again.

With more server replicas in place, each video was being uploaded to every Application Server, which takes a lot of time; X wants the system to be quick & consistent. He decides the app can be eventually consistent: a write is acknowledged once the video file reaches 2/3 of the servers, and the rest keep copying in the background. This kept the app's availability high.
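A minimal sketch of that write path in Python, assuming a hypothetical upload_to(server, video) call: the write is acknowledged once a quorum (2/3 of the replicas) has the file, and the remaining replicas are filled in by a background thread.

import math
import threading

def upload_to(server, video):
    # Hypothetical stand-in for copying the video file to one replica.
    print(f"copied {video} to {server}")

def quorum_write(servers, video):
    quorum = math.ceil(2 * len(servers) / 3)   # e.g. 7 of 10 replicas
    for server in servers[:quorum]:            # synchronous part: wait for these
        upload_to(server, video)
    rest = servers[quorum:]
    threading.Thread(                          # eventual part: copy in background
        target=lambda: [upload_to(s, video) for s in rest]
    ).start()
    return "acknowledged"                      # the uploader sees success now

quorum_write(["app1", "app2", "app3"], "cat_video.mp4")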

Each server replica communicates with the others over the network, so X designed an algorithm to make the system tolerant of network failures by building in polling and retries.
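Retries are usually paired with exponential backoff so a flaky network isn't hammered harder when it is already struggling. A small sketch, with send(peer, msg) as a hypothetical network call:

import random
import time

def send(peer, msg):
    # Hypothetical network call; fails randomly to simulate a flaky link.
    if random.random() < 0.5:
        raise ConnectionError(f"lost packet to {peer}")
    return "ok"

def send_with_retries(peer, msg, attempts=5):
    for attempt in range(attempts):
        try:
            return send(peer, msg)
        except ConnectionError:
            time.sleep(0.1 * 2 ** attempt)   # back off: 0.1s, 0.2s, 0.4s, ...
    raise ConnectionError(f"{peer} unreachable after {attempts} attempts")

print(send_with_retries("app2", "heartbeat"))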

As the system scales, more components are required to handle the load. This story covers the logical design of the app, mainly involving the trade-offs of the CAP theorem.

Friday, April 15, 2016

Extracting all HBase columns

Been fiddling with HBase? Or new to HBase, and well-versed with our favourite Hive? Here is a quick post on getting all the columns in an HBase table.

Here is the use case. HBase is:
"Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. Apache HBase is an open-source, distributed, versioned, non-relational database."
Since it's not a relational database, there is no fixed schema of columns; there is just data, in the form of key-value pairs. It's like JSON, where each JSONObject may have unique data regardless of any fixed structure.

So here the keys are the columns, and their values make up the tuple for that JSONObject. In real-world scenarios an HBase table has a whole lot of JSONObjects, and each object has a whole lot of key-values. Now the hard part is finding out which keys exist, so you can structure the data, convert it to table format, or serve any other business use case.

Step 1: You have this HBase table, hivehbase. On the HBase shell:
create 'hivehbase', 'ratings'
put 'hivehbase', 'row1', 'ratings:userid', 'user1'
put 'hivehbase', 'row1', 'ratings:bookid', 'book1'
put 'hivehbase', 'row1', 'ratings:rating', '1'
 
put 'hivehbase', 'row2', 'ratings:userid', 'user2'
put 'hivehbase', 'row2', 'ratings:bookid', 'book1'
put 'hivehbase', 'row2', 'ratings:rating', '3'
 
put 'hivehbase', 'row3', 'ratings:userid', 'user2'
put 'hivehbase', 'row3', 'ratings:bookid', 'book2'
put 'hivehbase', 'row3', 'ratings:rating', '3'
 
put 'hivehbase', 'row4', 'ratings:userid', 'user2'
put 'hivehbase', 'row4', 'ratings:bookid', 'book4'
put 'hivehbase', 'row4', 'ratings:rating', '1'
 

This creates a column family named 'ratings' and the columns inside it.

 
Step 2: Now get the column family names as below:

hbase(main):017:0> describe 'hivehbase'
Table hivehbase is ENABLED                                                                                                                  
hivehbase                                                                                                                                   
COLUMN FAMILIES DESCRIPTION                                                                                                                 
{NAME => 'ratings', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}       

Here the column family name is 'ratings'.
 
 
Step 3: Follow https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration to integrate HBase with Hive, and add the HBase-Hive storage handler jars to the Hive libs.
On the Hive shell:
hive>add jar /usr/lib/hbase/lib/hbase-common.jar;
add jar /usr/lib/hbase/lib/hbase-client.jar;
add jar /usr/lib/hbase/lib/zookeeper.jar;
add jar /usr/lib/hbase/lib/hbase-common-0.98.0.2.1.1.0-385-hadoop2-tests.jar;
add jar /usr/lib/hbase/lib/guava-12.0.1.jar;
hive> CREATE EXTERNAL TABLE hbase_table_hive(peData map<string,string>, row_key int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "ratings:,:key")
TBLPROPERTIES ("hbase.table.name" = "hivehbase");
 
 
Step 4: Now your HBase table is also visible in Hive as a table. To view all the columns of this table, simply perform a select:
hive>select * from hbase_table_hive;
OK
{"bookid":"book1","rating":"1","userid":"user1"}    NULL
{"bookid":"book1","rating":"3","userid":"user2"}    NULL
{"bookid":"book2","rating":"3","userid":"user2"}    NULL
{"bookid":"book4","rating":"1","userid":"user2"}    NULL

This lists all the keys in the column family 'ratings'. Similarly, you can get all the columns of all column families. If there is no value for a column, it will show the key with a null value.
You can also convert this map of columns into separate columns in another Hive step, as shown below.
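For example, here is a sketch of that follow-up step in Hive (assuming the map column is named peData as in the table definition above; Hive's bracket syntax pulls individual keys out of a map column):

hive> SELECT row_key,
             peData['userid'] AS userid,
             peData['bookid'] AS bookid,
             peData['rating'] AS rating
      FROM hbase_table_hive;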

Friday, March 25, 2011

/dev/random/2


yes, my blog has a new address now: dotSlashA.in
yes, you guessed it right! & in case you didn't guess, here is what ./a.in is:
"an antonym to ./a.out"
which again expresses my love for open source.
Oh c'mon, ./a.out is ... defined at Wikipedia.

So nowadays I'm spending less time on SO & busy moving all my stuff to ./a.in.

Though I believe ./a.in should be a collaborative site to post ramblings... but maybe later. Or... yes, mail admin@dotSlashA.in to express interest in having a username created for you and start contributing.

& yeah, Annual Stack Overflow Meetup Day! In Indore!! Count me in.

./Saurabh

Wednesday, February 2, 2011

/dev/random



So again, it's been a really long time since my last post.


Amazing this time is this guy, whose blog forced me to write this post. Okay, enough praising.


Did we talk about commandlinefu? Nope! I think I forgot. It's been one of the longest-pending post ideas. So start tricking other users by ...


Here ends my random post generator.
bash: ./saurabh: command not found

Wednesday, December 22, 2010

The Fun Part



So it isn't new, but... it sounds great,
so let's go ahead...

Here’s where it gets fun: many of these devices use hard-coded SSL keys that are baked into the firmware. That means that if Alice and Bob are both using the same router with the same firmware version, then both of their routers have the same SSL keys. All Eve needs to do in order to decrypt their traffic is to download the firmware from the vendor’s Web site and extract the SSL private key from the firmware image.
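To see how trivial that extraction step can be, here is a minimal sketch in Python that scans an unpacked firmware image for PEM-encoded private keys. It assumes the key sits in the image as plain PEM text (in practice a tool such as binwalk is used to unpack the firmware first); the file name is hypothetical.

import re
import sys

PEM = re.compile(
    rb"-----BEGIN (?:RSA )?PRIVATE KEY-----.*?-----END (?:RSA )?PRIVATE KEY-----",
    re.DOTALL,
)

def find_private_keys(path):
    with open(path, "rb") as f:
        blob = f.read()                      # whole firmware image in memory
    return [m.group(0) for m in PEM.finditer(blob)]

for key in find_private_keys(sys.argv[1]):   # e.g. python findkeys.py firmware.bin
    print(key.decode("ascii", errors="replace"))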