In my recent post I tried to explain the different data collection mechanisms that are available and how modern-day requirements led to the modern data lake. Iceberg is one solution that has emerged as a strong option.

What is Apache Iceberg?

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines such as Trino and Spark, using a high-performance format that works just like a SQL table. It places special emphasis on user experience, reliability and performance, and open standards. What makes it special is its unique table design for big data. This is explained brilliantly and covered well…
Data Lake

The modern enterprise runs on data. However, storing that data has always been challenging and expensive, and it often results in data silos. A data lake consists of a cost-effective, scalable storage system along with one or more compute engines. Data lakes are consolidated, centralized storage areas for raw, unstructured, semi-structured, and structured data, taken from multiple sources and stored without a predefined schema. They were created to keep data that "may have value." A data lake supports a broad range of essential functions, from traditional decision support to business analytics to data science. The value of data and the insights…