  • Data Virtualization - Data Integration - Integration

    Trino Series: Building a Sample Project on Local Installation

    Why a Trino Series Instead of Presto? If you followed the initial post in this series, you may recall that we discussed the history of Presto and its recent transformation into what is now known as Trino. Originally developed as Presto at Facebook, this distributed SQL query engine has evolved considerably since then. The transition to Trino represents the evolution of PrestoSQL into a more robust, community-driven platform focused on advanced distributed SQL features. The rebranding to Trino wasn’t merely a name change: it reflects a shift toward greater community collaboration, improved flexibility, and extended support for analytics across a wide variety…

  • Big Data

    SOLR Search – COOKBOOK

    Solr is the popular, blazing-fast, open-source enterprise search platform built on Apache Lucene™. Here is an example of how Solr might be integrated into an application. This post provides a curated list of Solr packages and resources. It starts with installation and then shows some basic implementation and usage. Installing Solr: To install on my Mac, I typically use Homebrew. First, update Homebrew: brew update (which prints “Updated Homebrew from 37714b5ce to 373a454ac.”). Then install Solr: brew install solr. However, this time I am going to show a step-by-step installation on macOS as explained in…

  • AI, ML & Data Science

    Introduction to Data Science with R & Python

    What is Data Science? Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is related to data mining, machine learning, and big data. Data science is a “concept to unify statistics, data analysis, and their related methods” in order to “understand and analyze actual phenomena” with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, domain knowledge, and information science (Wikipedia: Data science). R or Python? Why use R for Data…

  • DevOps and IT Operations

    Deploying Applications Using JENKINS

    Introduction: The aim of this post is to provide guidelines for building sophisticated continuous integration and continuous delivery pipelines. Continuous integration will be performed by Jenkins and many of its plugins, especially the Pipeline plugins. Jenkins is an open-source, platform-agnostic tool written in Java for implementing DevOps pipelines. It is supported by a large and active open-source community that updates the product frequently. How Does Jenkins Work? Jenkins works by using various plugins for different activities. The image below provides a visual representation of plugins per activity in Jenkins. Installation: In order…

  • Big Data

    ELASTIC Search – COOKBOOK

    Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java. Following an open-core business model, parts of the software are licensed under various open-source licenses (mostly the Apache License),[2] while other parts fall under the proprietary (source-available) Elastic License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene. Original author: Shay Banon. Initial release: 8 February 2010. Written in: Java. License: various (open-core model), e.g. Apache License 2.0 (partially; open source) and Elastic License (proprietary; source-available). Website…

  • DevOps and IT Operations

    CICD Basics

    Why do we need CICD? It reduces code risk by integrating code from various sources at all phases of the SDLC; increases confidence among developers; improves code quality; enables ready-to-ship code through its branching and shipping mechanism; provides code lineage and lifecycle management through systematic versioning; supports code quality and trend analysis; delivers a faster and more consistent time to market; and reduces cost. CI: Continuous Integration. The CI part of CICD can be summarized as follows: all parts of what goes into making your application should go to the same place and run through the same processes, with results published to an easy-to-access place.…

  • DevOps and IT Operations

    DevOps Basics

    What Is DevOps? DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary to Agile software development; several DevOps aspects came from Agile methodology. It involves communication and collaboration among all participants in the software development life cycle (SDLC). DevOps focuses on creating an ongoing feedback loop of analyzing, building, and testing while leveraging automation to speed up the entire software delivery process. To achieve this kind of seamless and constant loop of software development and testing, you need to create cross-functional teams that can work together…

  • MDM

    MDM Solution for the Finance Industry

    Master Data Management (MDM) Solutions for the Finance Industry: A Pathway to Operational Excellence. In today’s financial landscape, data has become the cornerstone of decision-making, compliance, and operational efficiency. For financial institutions, managing data effectively is not just a competitive advantage; it is essential for survival in a highly regulated, data-driven environment. Master Data Management (MDM) offers a structured approach to consolidating and governing critical data, ensuring that it remains accurate, consistent, and readily available across the organization. MDM solutions allow financial institutions to build a single source of truth for their most important data assets, such as customer information, legal…