Big Data

Introduction to Hadoop, Hive, and HBase

Objective

By the end of this guide, you will have installed Hadoop, Hive, and HBase on your Mac, and you’ll be ready to start implementing Big Data projects. This blog covers installation steps, configuration instructions, a proposed architecture framework, sample projects, and suggestions for further learning.


Table of Contents

  1. Introduction to Hadoop, Hive, and HBase
  2. Setting Up Hadoop, Hive, and HBase on macOS Sierra
    • Prerequisites
    • Installing Hadoop
    • Installing Hive
    • Installing HBase
  3. Proposed Architecture Framework
  4. Sample Project: Log Analysis with Hadoop, Hive, and HBase
  5. Data Flow Architecture Diagram
  6. Project Folder Structure
  7. Next Steps in Learning Hadoop, Hive, and HBase

1. Introduction to Hadoop, Hive, and HBase

Hadoop, Hive, and HBase are core components of the Big Data ecosystem:

  • Hadoop: An open-source distributed framework for storing and processing large datasets.
  • Hive: A data warehousing and SQL-like query language layer on top of Hadoop.
  • HBase: A distributed, scalable, big data store built on top of Hadoop.

These tools work together to manage and analyze Big Data by storing, querying, and retrieving data in efficient, structured, and scalable ways.


2. Setting Up Hadoop, Hive, and HBase on macOS Sierra

Prerequisites

Hardware:

  • Model: MacBook Pro (MacBookPro12,1)
  • Processor: Intel Core i7, 3.1 GHz, 2 cores
  • Memory: 16 GB

Software:

  • OS: macOS Sierra (10.12)
  • Package Manager: Homebrew 1.18
  • Java: JDK 1.8 or later (required for Hadoop)

Installing Hadoop

  1. Set JAVA_HOME

    • Verify Java installation:
      bash
      $ which java
      $ java -version
    • Set JAVA_HOME (on newer macOS releases, prefer export JAVA_HOME=$(/usr/libexec/java_home)):
      bash
      export JAVA_HOME=/Library/Java/Home
      echo $JAVA_HOME
    • For persistence, add JAVA_HOME to your ~/.profile:
      bash
      echo "export JAVA_HOME=/Library/Java/Home" >> ~/.profile
      source ~/.profile
  2. Install Homebrew

    • Open Terminal and run:
      bash
      /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
  3. Install Hadoop

    bash
    brew install hadoop
  4. Configure Hadoop

    • Edit hadoop-env.sh (located at /usr/local/Cellar/hadoop/<version>/libexec/etc/hadoop/; substitute the version Homebrew actually installed).
    • Set the HDFS directories in core-site.xml, mapred-site.xml, and hdfs-site.xml (a sample configuration sketch follows this list).
  5. Start Hadoop

    bash
    hdfs namenode -format
    hstart  # 'hstart' is a common user-defined alias for start-dfs.sh && start-yarn.sh
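
For reference, here is a minimal sketch of the kind of properties these files hold for a single-node setup. The NameNode port and replication factor below are common tutorial defaults, not values mandated by Hadoop; adjust them to your environment.

xml
<!-- core-site.xml: point clients at the local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value> <!-- port 9000 is an assumption -->
  </property>
</configuration>

<!-- hdfs-site.xml: single node, so no replication -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>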

Installing Hive

  1. Install Hive

    bash
    brew install hive
  2. Configure Hive

    • Set environment variables in ~/.bashrc, substituting the versions Homebrew actually installed:
      bash
      export HADOOP_HOME=/usr/local/Cellar/hadoop/<version>/libexec
      export HIVE_HOME=/usr/local/Cellar/hive/<version>/libexec
    • Configure hive-site.xml for a MySQL metastore and its JDBC driver (a sample sketch follows this list).
  3. Start Hive

    bash
    hive
    hive> show tables;
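
As a reference for the metastore step above, here is a minimal hive-site.xml sketch. The property names are Hive's standard metastore settings; the database name and credentials are placeholder assumptions.

xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- assumes a local MySQL database named 'metastore' -->
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value> <!-- placeholder credentials -->
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>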

Installing HBase

  1. Install Zookeeper (required by HBase)

    bash
    brew install zookeeper
    brew services start zookeeper
  2. Install HBase

    bash
    brew install hbase
    brew services start hbase
  3. Open the HBase Shell (the server was already started by brew services)

    bash
    hbase shell
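
To confirm the installation works end to end, you can create, populate, and scan a throwaway table from the shell; the table and column-family names here are arbitrary examples.

bash
hbase(main):001:0> create 'test_table', 'cf'
hbase(main):002:0> put 'test_table', 'row1', 'cf:msg', 'hello hbase'
hbase(main):003:0> scan 'test_table'
hbase(main):004:0> disable 'test_table'
hbase(main):005:0> drop 'test_table'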

3. Proposed Architecture Framework

Here’s a high-level architecture framework using Hadoop, Hive, and HBase.

  1. Data Ingestion

    • Source: Application logs or streaming data, ingested into HDFS.
  2. Data Processing with Hadoop

    • Tools: Hadoop MapReduce and Hive.
    • Function: Perform ETL, run SQL-like queries using Hive for data aggregation.
  3. Data Storage in HBase

    • Use Case: Real-time access for specific data subsets.
    • Integration: Hive queries pull HDFS data into HBase for fast querying.
  4. Data Analytics and Visualization

    • Tools: Hive for reporting, HBase for rapid data access.
    • Visualization: Connect to tools like Grafana for live dashboards.

4. Sample Project: Log Analysis with Hadoop, Hive, and HBase

Objective: Perform log analysis to monitor and flag unusual activity patterns in application logs.

Step 1: Set Up Data Ingestion with Hadoop

  1. Load Log Files to HDFS

    bash
    hdfs dfs -mkdir /logs
    hdfs dfs -put /path/to/log/files /logs
  2. Define Hive Table for Log Data

    sql
    CREATE EXTERNAL TABLE logs (
      `timestamp` STRING,  -- backticks: TIMESTAMP is a reserved type name in Hive
      log_level STRING,
      message STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
    STORED AS TEXTFILE
    LOCATION '/logs';
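
To sanity-check the ingestion, list the HDFS directory and sample a few rows through Hive; this assumes the table above was created without errors.

bash
hdfs dfs -ls /logs
hive -e "SELECT * FROM logs LIMIT 10;"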

Step 2: Process Data with Hive

  1. Analyze Logs by Severity

    sql
    SELECT log_level, COUNT(*) AS log_count
    FROM logs
    GROUP BY log_level;
  2. Filter and Save Suspicious Logs to HBase (hbase_table is a Hive table mapped onto HBase; see the sketch after this list)

    sql
    INSERT INTO TABLE hbase_table
    SELECT * FROM logs WHERE log_level = 'ERROR';
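
For completeness, here is one way hbase_table could be defined so that Hive writes land in HBase. This sketch uses Hive's HBaseStorageHandler; the column family (cf) and the target HBase table name are illustrative assumptions.

sql
CREATE TABLE hbase_table (
  rowkey STRING,
  log_level STRING,
  message STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- :key maps the first column to the HBase row key
  "hbase.columns.mapping" = ":key,cf:log_level,cf:message"
)
TBLPROPERTIES ("hbase.table.name" = "suspicious_logs");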

Step 3: Visualize Data

Use Grafana or a similar tool to set up dashboards for monitoring log trends.
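
If your dashboard tool cannot query Hive directly, one simple pattern (a sketch, assuming comma-delimited files are acceptable input) is to export the aggregates from Hive and point the visualization tool at the output directory; the path below is an assumption.

sql
-- write per-level counts as CSV files under /output/log_counts
INSERT OVERWRITE DIRECTORY '/output/log_counts'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT log_level, COUNT(*) FROM logs GROUP BY log_level;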


5. Data Flow Architecture Diagram

(Diagram: log sources → HDFS ingestion → Hadoop/Hive processing → HBase storage → Grafana dashboards, mirroring the architecture in Section 3.)

6. Project Folder Structure

Below is a suggested folder structure for this project to keep code organized.

text
log-analysis/
│
├── src/
│   ├── main/
│   │   ├── hadoop/
│   │   │   └── logAnalysisJob.scala
│   │   ├── hive/
│   │   │   └── hiveQueries.sql
│   │   └── hbase/
│   │       └── hbaseStore.scala
├── resources/
│   └── config/
│       ├── hadoop-env.sh
│       ├── hive-site.xml
│       └── hbase-site.xml
├── data/
│   └── logs/
│       └── sample.log
├── docs/
│   └── architecture-diagram.png
└── README.md
  • src/hadoop: Hadoop job files.
  • src/hive: Hive query files.
  • src/hbase: Scripts to interact with HBase.
  • resources/config: Configuration files for Hadoop, Hive, and HBase.
  • data: Directory for sample log files.

7. Next Steps in Learning Hadoop, Hive, and HBase

  1. Deep Dive into Hadoop Ecosystem

    • Learn advanced MapReduce concepts, YARN, and HDFS optimizations.
  2. Mastering Hive

    • Practice writing complex Hive queries and working with partitions and bucketing (see the sketch after this list).
  3. Real-Time Applications with HBase

    • Study HBase architecture and explore schema design best practices.
  4. Build End-to-End Projects

    • Implement full data pipelines, integrating Hadoop, Hive, and HBase.
  5. Learn Visualization

    • Connect Hadoop/Hive/HBase with BI tools like Grafana or Power BI for real-time analytics.
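
For the Hive item above, here is a small sketch of a partitioned, bucketed table to experiment with; the table and column names are illustrative and not part of the log-analysis project.

sql
-- Partition by day so queries scan only the relevant directories;
-- bucket by user_id to speed up joins and sampling.
CREATE TABLE events (
  user_id BIGINT,
  action STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 16 BUCKETS
STORED AS ORC;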

Conclusion

With this guide, you now have Hadoop, Hive, and HBase installed and ready to use. You’ve also set up a sample project for analyzing log data, which showcases how to store and analyze Big Data using these tools. In future posts, we’ll cover deeper aspects of Hadoop, Hive, and HBase, including advanced configurations, optimization, and real-time applications. Happy Big Data journey!