Iceberg Basics
In my recent post I tried explaining how different data collection mechanisms are available and how due to modern day requirement, modern data lakes were formed. Iceberg is one such solution that came out really strong.
What is Apache ICEBERG?
Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format that works just like a SQL table. With special emphasis on
- User Experience
- Reliability and Performance &
- Open Standards
What makes it special is its unique table design for big data.
This is explained brilliantly and covered well in the recent blog titled “Apache Iceberg: A Different Table Design for Big Data” by Susan Hall. It explains why Apache Iceberg emerged from the Apache Software Foundation‘s incubator as a top-level project last May (2020).
Apache Iceberg is still evolving but this is how it fits the current comparison matrix with similar solutions.
Installation
Install on MacOS using Homebrew
brew install --cask iceberg
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 2 taps (homebrew/core and homebrew/cask).
==> New Formulae
rdkit
==> Updated Formulae
Updated 18 formulae.
==> Updated Casks
Updated 13 casks.
==> Downloading http://s.sudre.free.fr/Software/files/Iceberg.dmg
######################################################################## 100.0%
Warning: No checksum defined for cask 'iceberg', skipping verification.
==> Installing Cask iceberg
==> Running installer for iceberg; your password may be necessary.
Package installers may write to any location; options such as `--appdir` are ignored.
Password:
installer: Package name is Iceberg
installer: Installing at base path /
installer: The install was successful.
installer: The install requires restarting now.
? iceberg was successfully installed!