As we dive deeper into Apache Pinot, this post will guide you through setting up a sample project. This hands-on project aims to demonstrate Pinot’s real-time data ingestion and query capabilities and provide insights into its application in industry scenarios. Whether you’re looking to power recommendation engines, enhance user analytics, or build custom BI dashboards, this blog will help you establish a foundation with Apache Pinot.

Introduction to the Sample Project
The sample project will simulate a real-time analytics dashboard for a social media application. We’ll analyze user interactions in near real time, covering the setup from data ingestion through to visualization.…
-
-
Introduction
In this third part of our Apache Druid series, we’ll explore how to get the most out of Druid’s powerful real-time analytics capabilities. After setting up your Druid cluster and understanding industry use cases, it’s time to learn the nuances of performance tuning, query optimization, and advanced ingestion techniques to maximize efficiency. This post will cover optimization strategies, advanced query configurations, and data ingestion tips to enhance performance and responsiveness. We’ll also revisit our E-commerce Sales Analytics Dashboard sample project from the previous post, applying these techniques to build a more robust and responsive real-time analytics solution.

1. Performance…
-
Introduction
Following our initial blog on Apache Druid basics, this guide dives into more advanced configurations and demonstrates a sample project. Apache Druid’s speed and scalability make it a go-to choice for real-time analytics across many industries. This blog covers setting up an analytics dashboard for a sample project, showcases Druid’s use in industry, and provides case studies highlighting the business benefits of Druid.

Sample Project: E-commerce Sales Analytics Dashboard
In this project, we’ll set up an analytics dashboard for an e-commerce platform. The dashboard will use Apache Druid to track, analyze, and visualize sales, customer behavior, and product interactions…
-
What is Apache Druid?
Apache Druid is a high-performance, real-time analytics database designed for fast, interactive queries on large datasets. It is optimized for applications that require quick, ad-hoc queries on event-driven data, such as real-time reporting, monitoring, and dashboarding.

Key Features of Apache Druid
Real-time Data Ingestion: Druid supports continuous ingestion from a variety of sources (e.g., Kafka, Kinesis, Hadoop) and can run analytics in real time as new data arrives.
High Query Performance: Druid delivers sub-second query performance by combining a columnar storage format with distributed, massively parallel processing, making it ideal for high-performance,…
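To make the query side concrete, here is a minimal sketch of querying Druid through its SQL-over-HTTP endpoint (`/druid/v2/sql`) using only the Python standard library. The broker URL, the `pageviews` datasource, and the column names are assumptions for illustration; the request is built but not sent, since that requires a running cluster.

```python
# Sketch: building a request for Druid's SQL API (stdlib only).
# The broker URL and the "pageviews" datasource are illustrative assumptions.
import json
import urllib.request

DRUID_SQL_URL = "http://localhost:8082/druid/v2/sql"  # default broker port

def build_sql_request(sql: str) -> urllib.request.Request:
    """Wrap a SQL string in the JSON payload the /druid/v2/sql endpoint expects."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        DRUID_SQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Example: top pages by event count over the last hour.
sql = """
SELECT page, COUNT(*) AS events
FROM pageviews
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY page
ORDER BY events DESC
LIMIT 10
"""
req = build_sql_request(sql)

# Against a live cluster you would send it like this:
# with urllib.request.urlopen(req) as resp:
#     rows = json.load(resp)
```

Druid returns query results as a JSON array of row objects, so the response parses directly with `json.load`.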
-
In my recent post, I explained the different data collection mechanisms available and how modern requirements gave rise to modern data lakes. Iceberg is one solution that has emerged as a particularly strong one.

What is Apache Iceberg?
Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Trino and Spark that use a high-performance format and work just like SQL tables, with special emphasis on user experience, reliability and performance, and open standards. What makes it special is its unique table design for big data. This is explained brilliantly and covered well…
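To illustrate the point that Iceberg tables “work just like a SQL table,” here is a sketch of the Spark SQL statements involved, held in plain strings so they can be adapted. The `demo` catalog and the table/column names are assumptions; running them requires a Spark session configured with an Iceberg catalog (e.g., `spark.sql(ddl)`).

```python
# Sketch: Iceberg tables behave like ordinary SQL tables in Spark.
# Catalog, table, and column names below are illustrative assumptions.

# Create a partitioned Iceberg table; days() is an Iceberg partition transform.
ddl = """
CREATE TABLE demo.db.events (
    event_time TIMESTAMP,
    user_id    BIGINT,
    action     STRING
)
USING iceberg
PARTITIONED BY (days(event_time))
"""

# Plain SQL writes work as usual.
dml = "INSERT INTO demo.db.events VALUES (TIMESTAMP '2024-01-01 00:00:00', 1, 'click')"

# Schema evolution is a metadata-only ALTER -- no table rewrite needed.
evolve = "ALTER TABLE demo.db.events ADD COLUMN session_id STRING"

# Time travel: query an earlier snapshot (snapshot id is a placeholder).
time_travel = "SELECT * FROM demo.db.events VERSION AS OF <snapshot_id>"

# With a live session: spark.sql(ddl); spark.sql(dml); spark.sql(evolve)
```

The schema-evolution and time-travel statements are where the “unique table design” shows up: both are served from Iceberg’s snapshot metadata rather than by rewriting data files.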
-
Data Lake
The modern enterprise runs on data. However, storing it has always been challenging and expensive, and it often results in data silos. A data lake consists of a cost-effective, scalable storage system along with one or more compute engines. Data lakes are consolidated, centralized storage areas for raw, unstructured, semi-structured, and structured data, taken from multiple sources and lacking a predefined schema. Data lakes were created to save data that “may have value.” They support a broad range of essential functions, from traditional decision support to business analytics to data science. The value of data and the insights…
-
In recent years, data storage has undergone significant transformation. While data lakes have become central to modern data architecture, a new contender has emerged: the data lakehouse. With its blend of traditional data lake flexibility and data warehouse reliability, the lakehouse model aims to address some of the challenges that data lakes face today, including data integrity and workload diversity. This blog explores the evolution from data lakes to data lakehouses and highlights key differences that are redefining how organizations manage their data. The Role of Data Lakes in Modern Data Management A data lake is a centralized repository designed…
-
Table of Contents
Introduction to NoSQL
A Brief History of NoSQL
MongoDB
Install MongoDB on Mac
Sample Project: Real-Time Data Storage with MongoDB
Project Structure
CRUD Operations
Testing and Validation
Conclusion and Next Steps

Introduction to NoSQL
A Brief History of NoSQL
The journey of NoSQL databases spans several decades, with origins in hierarchical and file-based databases long before the term “NoSQL” emerged. NoSQL was first coined in 1998 by Carlo Strozzi as the name of a file-based database he developed. Ironically, this NoSQL database was actually relational but didn’t use SQL as its interface. Later, in 2009, the…
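The CRUD operations in the table of contents all revolve around plain JSON-like documents. As a minimal sketch, here are the filter and update documents involved, assuming an illustrative `users` collection; the pymongo calls that would consume them are shown as comments, since they need a running server.

```python
# Sketch: the documents behind MongoDB CRUD operations.
# Collection and field names are illustrative assumptions.

# Create: the document to insert.
new_user = {"user_id": 1, "name": "Ada", "tags": ["admin"]}

# Read: a filter document matching by field value.
read_filter = {"user_id": 1}

# Update: a filter plus an update document using operators like $set / $push.
update_filter = {"user_id": 1}
update_doc = {"$set": {"name": "Ada Lovelace"}, "$push": {"tags": "editor"}}

# Delete: the same filter shape as a read.
delete_filter = {"user_id": 1}

# e.g. with pymongo against a local server (not executed here):
# from pymongo import MongoClient
# db = MongoClient("mongodb://localhost:27017")["app"]
# db.users.insert_one(new_user)
# db.users.update_one(update_filter, update_doc)
# db.users.delete_one(delete_filter)
```

Because these are ordinary Python dicts, the same shapes work unchanged in the mongo shell and in most driver languages.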
-
Mastering MongoDB Realm: Advanced Features, Third-Party Integrations, and Custom UI
April 30, 2011 by Kinshuk Dutta

As we conclude our MongoDB series, this final installment dives deep into MongoDB Realm’s advanced features. We’ll explore integrating with third-party APIs, building custom UI components, managing granular permissions, and even setting up app-wide workflows. MongoDB Realm’s flexibility and extensive toolkit allow developers to build complex applications with robust backend features—all without managing a dedicated server.

Table of Contents
Introduction to Advanced MongoDB Realm Features
Integrating with Third-Party APIs
Custom UI Components with MongoDB Realm
Granular Permissions and Role-Based Access Control
App-Wide Workflows and…
-
Exploring MongoDB Realm: Real-Time Sync, Serverless Applications, and Custom Functions
April 15, 2011 by Kinshuk Dutta

MongoDB Realm, an extension of MongoDB Atlas, provides powerful tools for building mobile and web applications with real-time synchronization, serverless functions, and a rich set of services for managing complex data workflows. Whether you’re building a mobile app that requires real-time data sync or a serverless application that responds to data changes instantly, MongoDB Realm simplifies development by handling much of the infrastructure complexity. In this blog, we’ll dive into the key features of MongoDB Realm and walk through building a sample application with…