Have been fiddling with HBase? or new to HBase? and well-versed with our very favourite Hive. Here is a quick post to get all the columns in a HBase table.
Here is the usecase, Hbase is :
So here keys are the columns and their value is tuple for that JSONObject. In realtime scenarios a HBase table has a whole lot of JSONObjects and each object has whol lots of key-values. Now the hard part is how to find how many keys are there to make it structured or to convert it to Table format or any other business use-case.
 
 
 
 
 
Here is the usecase, Hbase is :
Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. Apache HBase is an open-source, distributed, versioned, non-relational databaseSince its not a relational database so no concept of tables, rows, columns. Just data, that could be in any form of key value pair. So its like a JSON where each JSONObject may have unique data regardless of any fixed structure.
So here keys are the columns and their value is tuple for that JSONObject. In realtime scenarios a HBase table has a whole lot of JSONObjects and each object has whol lots of key-values. Now the hard part is how to find how many keys are there to make it structured or to convert it to Table format or any other business use-case.
- Step1:
You have this HBase table 
On Hbase shellhivehbasecreate 'hivehbase', 'ratings'put 'hivehbase', 'row1', 'ratings:userid', 'user1'put 'hivehbase', 'row1', 'ratings:bookid', 'book1'put 'hivehbase', 'row1', 'ratings:rating', '1'put 'hivehbase', 'row2', 'ratings:userid', 'user2'put 'hivehbase', 'row2', 'ratings:bookid', 'book1'put 'hivehbase', 'row2', 'ratings:rating', '3'put 'hivehbase', 'row3', 'ratings:userid', 'user2'put 'hivehbase', 'row3', 'ratings:bookid', 'book2'put 'hivehbase', 'row3', 'ratings:rating', '3'put 'hivehbase', 'row4', 'ratings:userid', 'user2'put 'hivehbase', 'row4', 'ratings:bookid', 'book4'put 'hivehbase', 'row4', 'ratings:rating', '1' 
2.Now just get the column family names as below:
hbase(main):017:0> describe 'hivehbase'
Table hivehbase is ENABLED
hivehbase
COLUMN FAMILIES DESCRIPTION
{NAME => 'ratings', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
Table hivehbase is ENABLED
hivehbase
COLUMN FAMILIES DESCRIPTION
{NAME => 'ratings', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
Here column family name is 'ratings'
3.Follow https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration to integrate HBase to Hive and add Hbase-Hive Storage Handler to Hive libs
 
 
On the Hive shell 
hive>add jar /usr/lib/hbase/lib/hbase-common.jar;
add jar /usr/lib/hbase/lib/hbase-client.jar;
add jar /usr/lib/hbase/lib/zookeeper.jar;
add jar /usr/lib/hbase/lib/hbase-common-0.98.0.2.1.1.0-385-hadoop2-tests.jar;
add jar /usr/lib/hbase/lib/guava-12.0.1.jar;
hive> CREATE EXTERNAL TABLE hbase_table_hive(peData map, row_key int) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "ratings:,:key")
TBLPROPERTIES ("hbase.table.name" = "hivehbase"); 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "ratings:,:key")
TBLPROPERTIES ("hbase.table.name" = "hivehbase");
4.Now your HBase table is also visible on Hive as  a table. To view all the columns of this table simply perform a select:
hive>select * from hbase_table_hive; 
OK
{"bookid":"book1","rating":"1","userid":"user1"} NULL
{"bookid":"book1","rating":"3","userid":"user2"} NULL
{"bookid":"book2","rating":"3","userid":"user2"} NULL
{"bookid":"book4","rating":"1","userid":"user2"} NULL
It will list all the keys in the system in column family 'ratings' . Similarly you can get all the columns of all column families. If there is no value for a column it will show key and value is null.
OK
{"bookid":"book1","rating":"1","userid":"user1"} NULL
{"bookid":"book1","rating":"3","userid":"user2"} NULL
{"bookid":"book2","rating":"3","userid":"user2"} NULL
{"bookid":"book4","rating":"1","userid":"user2"} NULL
It will list all the keys in the system in column family 'ratings' . Similarly you can get all the columns of all column families. If there is no value for a column it will show key and value is null.
Also you can convert this map JSON list of columns to separate column in another Hive step.
  
