Big Data, OTF

Iceberg Basics

February 25, 2021 | November 4, 2024 by Kinshuk Dutta

In a recent post, I described the different data collection mechanisms available and how modern requirements led to the rise of modern data lakes. Iceberg is one solution that has emerged as a strong contender.

What is Apache Iceberg?

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to engines such as Trino and Spark, using a high-performance format that works just like a SQL table. It places special emphasis on:

User Experience
Reliability and Performance
Open Standards
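To make the "works just like a SQL table" point concrete, here is a minimal sketch of the kind of SQL one might send to Spark for an Iceberg table. The catalog name `local`, the namespace `db`, the table `events`, and its columns are all hypothetical, and the helper `run_statements` is only an illustration: it hands each statement to whatever callable you give it, so with a real Spark session that has an Iceberg catalog configured you would pass `spark.sql`.

```python
# Hypothetical names: catalog "local", namespace "db", table "events".
TABLE = "local.db.events"

STATEMENTS = [
    # USING iceberg asks Spark to create the table in the Iceberg format.
    f"CREATE TABLE IF NOT EXISTS {TABLE} (id BIGINT, category STRING) USING iceberg",
    # From here on, the table is written and queried with ordinary SQL.
    f"INSERT INTO {TABLE} VALUES (1, 'click'), (2, 'view')",
    f"SELECT category, count(*) AS n FROM {TABLE} GROUP BY category",
]


def run_statements(execute):
    """Pass each statement to `execute` (for example `spark.sql`) and collect the results."""
    return [execute(sql) for sql in STATEMENTS]


if __name__ == "__main__":
    # Without a Spark session at hand, just print the SQL that would be submitted.
    run_statements(print)
```

Keeping the statements separate from the session makes the sketch runnable anywhere; nothing here is specific to Spark apart from the `USING iceberg` clause.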

What makes it special is its unique table design for big data.

Susan Hall covers this well in her recent blog post, “Apache Iceberg: A Different Table Design for Big Data”. It also explains why Apache Iceberg graduated from the Apache Software Foundation's incubator to become a top-level project in May 2020.

Apache Iceberg is still evolving, but the comparison matrix below shows how it currently fits alongside similar solutions.

[Figure: comparison matrix of Apache Iceberg and similar solutions, published by Qubole]

Installation

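Install on macOS using Homebrew. Note that the `iceberg` cask used below downloads Iceberg, a macOS package builder hosted at s.sudre.free.fr, which is a different tool from Apache Iceberg. Apache Iceberg itself is a library that is added to an engine such as Spark or Trino rather than installed as a standalone application.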

brew install --cask iceberg
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/core and homebrew/cask).
==> New Formulae
rdkit
==> Updated Formulae
Updated 18 formulae.
==> Updated Casks
Updated 13 casks.

==> Downloading http://s.sudre.free.fr/Software/files/Iceberg.dmg
######################################################################## 100.0%
Warning: No checksum defined for cask 'iceberg', skipping verification.
==> Installing Cask iceberg
==> Running installer for iceberg; your password may be necessary.
Package installers may write to any location; options such as `--appdir` are ignored.
Password:
installer: Package name is Iceberg
installer: Installing at base path /
installer: The install was successful.
installer: The install requires restarting now.
🍺  iceberg was successfully installed!
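Apache Iceberg is usually pulled into a query engine as a library rather than installed as a desktop package. As a rough sketch of what that can look like with Spark, the snippet below builds the settings for a Hadoop-type Iceberg catalog and the matching `spark-sql` command line. The catalog name `local`, the warehouse path, and the helper names are hypothetical, and the Maven coordinate is an assumption: pick the `iceberg-spark-runtime` artifact that matches your own Spark and Scala versions.

```python
# Assumed Maven coordinate; choose the artifact matching your Spark and Scala versions.
ICEBERG_RUNTIME = "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0"


def iceberg_spark_conf(catalog="local", warehouse="/tmp/iceberg-warehouse"):
    """Return Spark settings that register a Hadoop-type Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Spark downloads the Iceberg runtime JAR from Maven at startup.
        "spark.jars.packages": ICEBERG_RUNTIME,
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def spark_sql_command(conf):
    """Build the spark-sql argument list for the given settings."""
    args = ["spark-sql", "--packages", conf["spark.jars.packages"]]
    for key, value in conf.items():
        if key != "spark.jars.packages":
            args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so no Spark install is needed here.
    print(" ".join(spark_sql_command(iceberg_spark_conf())))
```

The same dictionary could also be applied key by key to a `SparkSession` builder; printing the command simply keeps the example independent of a local Spark installation.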