  • Big Data

    SOLR Search – COOK BOOK

    Solr is the popular, blazing-fast, open-source enterprise search platform built on Apache Lucene™. Here is an example of how Solr might be integrated into an application. This blog has a curated list of Solr packages and resources. It starts with how to install Solr and then shows some basic implementation and usage. Installing Solr: To install on my Mac, I typically use Homebrew. First, update your brew: brew update (output: Updated Homebrew from 37714b5ce to 373a454ac.). Then install Solr: brew install solr. However, this time I am going to show a step-by-step installation on Mac, as explained in…

  • Big Data

    ELASTIC Search – COOKBOOK

    Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java. Following an open-core business model, parts of the software are licensed under various open-source licenses (mostly the Apache License 2.0), while other parts fall under the proprietary (source-available) Elastic License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. Elasticsearch was originally authored by Shay Banon, and its initial release was on 8 February 2010…

  • Big Data

    Big Data Search

    In order to understand the criticality of Big Data Search, we need to understand the enormity of data. A terabyte is just over 1,000 gigabytes and is a label most of us are familiar with from our home computers. Scaling up from there, a petabyte is just over 1,000 terabytes. That may be far beyond the kind of data storage the average person needs, but the industry has been dealing with data in these sorts of quantities for quite some time. In fact, way back in 2008, Google was said to process around 20 petabytes of data a day (Google doesn’t release information on how much data it processes today). To put…

  • SCALA - AI, ML & Data Science

    Scala Basics

    Originally posted October 2, 2018 by Kinshuk Dutta. Table of Contents: What is Scala?; Comparison Between Scala and Java; Installing Scala on macOS; Setting Up Your Development Environment; Scala Basics with REPL; Data Types, Variables, and Immutability; Next Steps in Scala Learning. What is Scala? Scala is a general-purpose programming language that blends object-oriented and functional programming, providing powerful support for concurrency and a strong static type system. It’s designed to be concise and expressive, particularly in comparison to Java. Scala’s compatibility with Java makes it a popular choice in Big Data applications, notably with frameworks like Apache Spark. Comparison…

  • AI, ML & Data Science

    Spark Basics

    Spark Basics: A Complete Guide to Installing and Using Apache Spark on macOS Sierra. Apache Spark is a powerful open-source tool designed for large-scale data processing, analytics, and machine learning. This guide walks you through installing Apache Spark on macOS Sierra, explains its core components, and provides practical project examples and real-life scenarios to help you understand its capabilities. Table of Contents: Introduction to Apache Spark; Setting Up Apache Spark on macOS Sierra; Understanding Core Components of Apache Spark; Getting Started with Spark Shell; Sample Projects and Real-Life Scenarios Using Spark; Real-Life Applications of Apache Spark; Next Steps in Learning…