
Pinot™ Basics

This entry is part 6 of 6 in the series Pinot Series

Weekend started, poured myself a glass of Long Meadow Ranch Anderson Valley Pinot Noir. It smelled like cherry cola, cinnamon, and a forest in autumn. Probably not the right time to think or even blog about OLAP.

– Kinshuk Dutta
Online analytical processing, or OLAP

OLAP is an approach to answering multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

Pinot is among the most popular open source OLAP systems for big data that run analytical queries over large volumes (terabyte scale) of data with interactive latencies.

In this blog we will try to cover the basics of Pinot. Pinot supports SQL queries. My main interest is to try out the Presto-Pinot connector developed by Uber Engineering, which lets users perform joins on data in Pinot. Ultimately, I would like to create a dashboard using Superset as the front end, with Pinot as the backend.

Apache Pinot™ (Incubating)

Pinot was first developed by LinkedIn in 2014 as an internal analytics infrastructure. It originated from the need to scale out OLAP systems to support low-latency real-time queries on huge volumes of data. It was open-sourced in 2015 and entered the Apache Incubator in 2018.

At the time of writing this blog, Pinot™ is still in the incubating stage under the Apache projects. It is a real-time distributed OLAP datastore, designed to answer OLAP queries with low latency. It is proven at scale at LinkedIn, where it powers 50+ user-facing apps and serves 100k+ queries.

Ingest and Query Options

If you are interested in knowing more about the Pinot story, I highly recommend reading the "Introducing Apache Pinot 0.3.0" blog published in April 2020 by Mayank Srivastava from LinkedIn.

Installation

There is more than one way to run Pinot on your Mac. You can choose any of the following:

  • Running Pinot locally: download the latest Apache release and install it locally
  • Build From Source: download the source, build, and run
  • Docker

 

Running Pinot locally

This quick start guide will help you bootstrap a Pinot standalone instance on your Mac.

Download Apache Pinot

curl -o ./apache-pinot-incubating-0.6.0-bin.tar.gz https://mirror.olnevhost.net/pub/apache/incubator/pinot/apache-pinot-incubating-0.6.0/apache-pinot-incubating-0.6.0-bin.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  254M  100  254M    0     0   9.8M      0  0:00:25  0:00:25 --:--:-- 11.6M
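
Before unpacking a 254 MB download, it may be worth checking that the archive arrived intact. Apache releases publish a SHA-512 digest next to each artifact; the sketch below assumes you have copied that digest by hand from the release page. The helper name verify_sha512 is hypothetical and is not part of Pinot.

```shell
#!/usr/bin/env bash
# verify_sha512 FILE EXPECTED_HEX
# Hypothetical helper (not shipped with Pinot): succeeds only if FILE's
# SHA-512 digest equals EXPECTED_HEX, ignoring letter case.
verify_sha512() {
  local file="$1" expected="$2" actual
  if command -v sha512sum >/dev/null 2>&1; then
    actual="$(sha512sum "$file" | awk '{print $1}')"      # GNU coreutils
  else
    actual="$(shasum -a 512 "$file" | awk '{print $1}')"  # stock macOS
  fi
  # Normalise both digests to lower case before comparing.
  actual="$(printf '%s' "$actual" | tr 'A-F' 'a-f')"
  expected="$(printf '%s' "$expected" | tr 'A-F' 'a-f')"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A call would look like verify_sha512 apache-pinot-incubating-0.6.0-bin.tar.gz "<digest from the release page>" && echo "checksum OK", where the digest placeholder is yours to fill in.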

Once you have the tar file, untar it

ls -l
total 524768
-rw-r--r--  1 kinshukdutta  staff  267003296 Feb 27 13:05 apache-pinot-incubating-0.6.0-bin.tar.gz
kinshukdutta@Kinshuks-MacBook-Pro-15 PINOT % tar -zxvf apache-pinot-incubating-0.6.0-bin.tar.gz 
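
If you would rather know what the archive unpacks into before running tar -zxvf, listing its contents is enough. This is a minimal sketch; archive_top_level is a hypothetical helper name, and it only assumes a gzipped tarball.

```shell
#!/usr/bin/env bash
# archive_top_level ARCHIVE
# Hypothetical helper: print each distinct top-level entry of a gzipped
# tarball without extracting it.
archive_top_level() {
  # tar -t lists member paths; awk keeps the first path component and
  # prints it only the first time it is seen.
  tar -tzf "$1" | awk -F/ '!seen[$1]++ { print $1 }'
}
```

Running archive_top_level apache-pinot-incubating-0.6.0-bin.tar.gz should print a single directory name if the release unpacks into one folder, which is what the ls -l output in the next step suggests.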

Navigate to the directory containing the launcher scripts

ls -l
total 524768
drwxr-xr-x  11 kinshukdutta  staff        352 Feb 27 13:06 apache-pinot-incubating-0.6.0-bin
-rw-r--r--   1 kinshukdutta  staff  267003296 Feb 27 13:05 apache-pinot-incubating-0.6.0-bin.tar.gz
kinshukdutta@Kinshuks-MacBook-Pro-15 PINOT % cd apache-pinot-incubating-0.6.0-bin
kinshukdutta@Kinshuks-MacBook-Pro-15 apache-pinot-incubating-0.6.0-bin % ls -l
total 112
-rw-r--r--   1 kinshukdutta  staff    551 Oct 21 01:41 DISCLAIMER
-rw-r--r--   1 kinshukdutta  staff  22448 Nov  2 12:41 LICENSE
-rw-r--r--   1 kinshukdutta  staff  27682 Nov  2 12:41 NOTICE
drwxr-xr-x  19 kinshukdutta  staff    608 Nov  5 16:42 bin
drwxr-xr-x  18 kinshukdutta  staff    576 Oct 21 01:41 conf
drwxr-xr-x   4 kinshukdutta  staff    128 Feb 27 13:06 examples
drwxr-xr-x   3 kinshukdutta  staff     96 Feb 27 13:06 lib
drwxr-xr-x  37 kinshukdutta  staff   1184 Oct 21 01:41 licenses
drwxr-xr-x   6 kinshukdutta  staff    192 Feb 27 13:06 plugins
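
The launcher scripts themselves live in the bin directory shown above. A quick way to see which ones are available is to list the executable shell scripts there. This is only a sketch: list_launchers is a hypothetical helper, and any specific script name (the Pinot documentation of this period mentions quick-start-batch.sh) should be treated as an assumption until you see it in the output.

```shell
#!/usr/bin/env bash
# list_launchers PINOT_HOME
# Hypothetical helper: print the names of executable *.sh files under
# PINOT_HOME/bin, one per line. Defaults to the current directory.
list_launchers() {
  local home="${1:-.}" script
  for script in "$home"/bin/*.sh; do
    # Skip the literal pattern (no match) and non-executable files.
    if [ -f "$script" ] && [ -x "$script" ]; then
      basename "$script"
    fi
  done
}
```

From inside apache-pinot-incubating-0.6.0-bin, calling list_launchers with no argument lists whatever your download actually ships.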

Build From Source

This one has not worked for me since I updated to macOS Big Sur. The errors are at the end of this section. I will fix them and update the post. You can skip this and go straight to the demo section.

Clone the repo

git clone git@github.com:apache/incubator-pinot.git

Cloning into 'incubator-pinot'...
Warning: Permanently added the RSA host key for IP address '140.82.114.4' to the list of known hosts.
remote: Enumerating objects: 849, done.
remote: Counting objects: 100% (849/849), done.
remote: Compressing objects: 100% (442/442), done.
remote: Total 239682 (delta 200), reused 553 (delta 103), pack-reused 238833
Receiving objects: 100% (239682/239682), 213.93 MiB | 8.95 MiB/s, done.
Resolving deltas: 100% (113197/113197), done.
Updating files: 100% (3302/3302), done.

Change working directory to the downloaded repo

cd incubator-pinot 
kinshukdutta@Kinshuks-MacBook-Pro-15 incubator-pinot % 
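
Since the build can run for a long time, it may help to confirm up front that the tools it relies on are on your PATH. The sketch below only checks for presence and does not enforce any version; require_cmd and check_build_tools are hypothetical helper names, and the tool list (git, java, mvn) is an assumption based on the commands used in this post.

```shell
#!/usr/bin/env bash
# require_cmd NAME
# Hypothetical helper: succeed if NAME is on PATH, otherwise print a hint.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required tool: $1" >&2
  return 1
}

# check_build_tools
# Report every missing tool instead of stopping at the first one.
check_build_tools() {
  local missing=0 tool
  for tool in git java mvn; do
    require_cmd "$tool" || missing=1
  done
  return "$missing"
}
```

A typical use is check_build_tools && java -version && mvn -v, which also prints the versions you are about to build with.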

Build Pinot

mvn clean install -DskipTests -Pbin-dist

[INFO] Scanning for projects...
Downloading from central: https://repo.maven.apache.org/maven2/kr/motd/maven/os-maven-plugin/1.6.2/os-maven-plugin-1.6.2.pom
...
...
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-dependency-convergence) @ pinot ---
[INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce (enforce-maven-version) @ pinot ---
[INFO] --- jacoco-maven-plugin:0.7.7.201606060606:prepare-agent (default) @ pinot ---
[INFO] argLine set to -javaagent:/Users/kinshukdutta/.m2/repository/org/jacoco/org.jacoco.agent/0.7.7.201606060606/org.jacoco.agent-0.7.7.201606060606-runtime.jar=destfile=/Users/kinshukdutta/incubator-pinot/target/jacoco.exec -Xms4g -Xmx4g
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.6.0:process (process-resource-bundles) @ pinot ---
...
...
[INFO] Enabled default license matchers.
[INFO] Will parse SCM ignores for exclusions...
[INFO] Parsing exclusions from /Users/kinshukdutta/incubator-pinot/.gitignore
[INFO] Finished adding exclusions from SCM ignore files.
[INFO] 117 implicit excludes (use -debug for more details).
[INFO] 39 explicit excludes (use -debug for more details).
[INFO] 92 resources included (use -debug for more details)
[INFO] Rat check: Summary over all files. Unapproved: 0, unknown: 0, generated: 0, approved: 47 licenses.
[INFO] 
[INFO] --- maven-install-plugin:2.5.2:install (default-install) @ pinot ---
[INFO] Installing /Users/kinshukdutta/incubator-pinot/pom.xml to /Users/kinshukdutta/.m2/repository/org/apache/pinot/pinot/0.7.0-SNAPSHOT/pinot-0.7.0-SNAPSHOT.pom
[INFO] 
[INFO] ---------------------< org.apache.pinot:pinot-spi >---------------------
[INFO] Building Pinot Service Provider Interface 0.7.0-SNAPSHOT          [2/46]
[INFO] --------------------------------[ jar ]---------------------------------

Issues & Resolution

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  02:40 h
[INFO] Finished at: 2021-02-26T23:06:06-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1:compile (default) on project pinot-common: Execution default of goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1:compile failed: Plugin org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1 or one of its dependencies could not be resolved: Failed to collect dependencies at org.xolstice.maven.plugins:protobuf-maven-plugin:jar:0.6.1 -> org.apache.maven.plugin-tools:maven-plugin-annotations:jar:3.5.2: Failed to read artifact descriptor for org.apache.maven.plugin-tools:maven-plugin-annotations:jar:3.5.2: Could not transfer artifact org.apache.maven.plugin-tools:maven-plugin-annotations:pom:3.5.2 from/to central (https://repo.maven.apache.org/maven2): Transfer failed for https://repo.maven.apache.org/maven2/org/apache/maven/plugin-tools/maven-plugin-annotations/3.5.2/maven-plugin-annotations-3.5.2.pom: Unknown host repo.maven.apache.org -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :pinot-common
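
The root cause reported above is "Unknown host repo.maven.apache.org", which usually points at DNS or network connectivity rather than at the Pinot code. A quick probe with curl can tell the two apart. This is a sketch under that assumption; explain_curl_exit and check_maven_central are hypothetical helper names, and the exit codes handled (6, 7, 28) are curl's documented codes for resolve, connect, and timeout failures.

```shell
#!/usr/bin/env bash
# explain_curl_exit CODE
# Map a few well-known curl exit codes to a plain-language diagnosis.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "could not resolve host (DNS)" ;;
    7)  echo "could not connect" ;;
    28) echo "timed out" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

# check_maven_central
# Hypothetical wrapper: probe the repository URL from the error message
# and explain the result. Needs network access, so it is not called here.
check_maven_central() {
  local code=0
  curl --silent --head --max-time 15 --output /dev/null \
    "https://repo.maven.apache.org/maven2/" || code=$?
  explain_curl_exit "$code"
  return "$code"
}
```

Once check_maven_central reports "reachable", the build can be resumed from the failed module as Maven suggests, for example mvn clean install -DskipTests -Pbin-dist -rf :pinot-common.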
