• AI, ML & Data Science - Big Data - iPaaS - Analytics & Reporting

    Big Data in 2024: From Hype to AI Powerhouse—What’s the Real Story?

    Introduction: A Decade of Big Data Blogging. When I began writing about Big Data in 2013, it was an exciting new frontier in data management and analytics. My first blog, “What’s So BIG About Big Data”, introduced the core pillars of Big Data, the “4 Vs”: Volume, Velocity, Variety, and Veracity. As the years passed, I expanded into related topics with posts like “Introduction to Hadoop, Hive, and HBase”, “Data Fabric and Data Mesh”, and “Introduction to Data Science with R & Python”. Each blog marked the evolution of Big Data and reflected the shifting focus in the field as data…

  • Big Data - Analytics & Reporting - AI, ML & Data Science

    Demystifying the World of AI, ML, and Data Science: A New Structured Learning Journey

    Welcome to an exciting new chapter in exploring the world of AI, Machine Learning (ML), and Data Science! Over the years, I have posted on a variety of topics, covering everything from Python basics to the intricacies of neural networks. But now, it’s time for something bigger—a cohesive, structured series that will demystify these domains, guiding you step-by-step from foundational concepts to advanced applications. In this revamped series, I will reorganize my previously published blogs, presenting them in a logical progression so you can easily follow along, regardless of your current experience level. Alongside these, I’ll also introduce new posts…

  • Big Data

    SOLR Search – COOK BOOK

    Solr is the popular, blazing-fast, open-source enterprise search platform built on Apache Lucene™. Here is an example of how Solr might be integrated into an application. This blog has a curated list of Solr packages and resources. It starts with how to install Solr and then shows some basic implementation and usage. Installing Solr: To install on my Mac, I typically use Homebrew. First update your brew: brew update (output: Updated Homebrew from 37714b5ce to 373a454ac.), then install Solr: brew install solr. However, this time I am going to show a step-by-step installation on Mac as explained in…

  • Big Data

    ELASTIC Search – COOKBOOK

    Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java. Following an open-core business model, parts of the software are licensed under various open-source licenses (mostly the Apache License), while other parts fall under the proprietary (source-available) Elastic License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. Original author: Shay Banon (talking about Elasticsearch at Berlin Buzzwords 2010). Initial release: 8 February 2010. Written in: Java. License: various (open-core model), e.g. Apache License 2.0 (partially; open source), Elastic License (proprietary; source-available). Website…

  • Big Data

    Pinot™ Basics

    Weekend started, poured myself a glass of Long Meadow Ranch Anderson Valley Pinot Noir. It smelled like cherry cola, cinnamon, and a forest in autumn. Probably not the right time to think or even blog about OLAP. – Kinshuk Dutta. Online analytical processing, or OLAP, is an approach to answering multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar…

  • Data Storage - Data Lake

    Data Lake vs. Data Lakehouse: Evolution and Key Differences

    In recent years, data storage has undergone significant transformation. While data lakes have become central to modern data architecture, a new contender has emerged: the data lakehouse. With its blend of traditional data lake flexibility and data warehouse reliability, the lakehouse model aims to address some of the challenges that data lakes face today, including data integrity and workload diversity. This blog explores the evolution from data lakes to data lakehouses and highlights key differences that are redefining how organizations manage their data. The Role of Data Lakes in Modern Data Management: A data lake is a centralized repository designed…

  • Big Data

    Big Data Search

    In order to understand the criticality of Big Data Search, we need to understand the enormity of data. A terabyte is just over 1,000 gigabytes and is a label most of us are familiar with from our home computers. Scaling up from there, a petabyte is just over 1,000 terabytes. That may be far beyond the kind of data storage the average person needs, but the industry has been dealing with data in these sorts of quantities for quite some time. In fact, way back in 2008, Google was said to process around 20 petabytes of data a day (Google doesn’t release information on how much data it processes today). To put…

  • SCALA - AI, ML & Data Science

    Scala Basics

    Originally posted October 2, 2018 by Kinshuk Dutta. Table of Contents: What is Scala?; Comparison Between Scala and Java; Installing Scala on macOS; Setting Up Your Development Environment; Scala Basics with REPL; Data Types, Variables, and Immutability; Next Steps in Scala Learning. What is Scala? Scala is a general-purpose programming language that blends object-oriented and functional programming, providing powerful support for concurrency and a strong static type system. It’s designed to be concise and expressive, particularly in comparison to Java. Scala’s compatibility with Java makes it a popular choice in Big Data applications, notably with frameworks like Apache Spark. Comparison…

  • AI, ML & Data Science

    Spark Basics

    Spark Basics: A Complete Guide to Installing and Using Apache Spark on macOS Sierra. Apache Spark is a powerful open-source tool designed for large-scale data processing, analytics, and machine learning. This guide walks you through installing Apache Spark on macOS Sierra, explains its core components, and provides practical project examples and real-life scenarios to help you understand its capabilities. Table of Contents: Introduction to Apache Spark; Setting Up Apache Spark on macOS Sierra; Understanding Core Components of Apache Spark; Getting Started with Spark Shell; Sample Projects and Real-Life Scenarios Using Spark; Real-Life Applications of Apache Spark; Next Steps in Learning…