Big Data, Enterprise Application Integration, Integration, Data Integration, Data Virtualization

PRESTO / Trino Basics

This entry is part 1 of 5 in the series TRINO Series

Introduction: My Journey into Presto

My interest in Presto was sparked in early 2021 after an enriching conversation with Brian Luisi, PreSales Manager at Starburst. His insights into distributed SQL query engines opened my eyes to the unique capabilities and performance advantages of Presto. Eager to dive deeper, I joined the Presto community on Slack to keep up with developments and collaborate with like-minded professionals. This blog series is an extension of that journey, aiming to demystify Presto and share my learnings with others curious about distributed analytics solutions.

What is PRESTO

Presto is a high performance, distributed SQL query engine for big data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, PostgresSQL, Cassandra, Kafka, MongoDB and Teradata. One can even query data from multiple data sources within a single query. Wikipedia

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto

Often people are confused between PrestoDB and PrestoSQL. Follow the link to understand the difference between the two. In order to remove confusion in Dec 2020 PrestoSQL was rebranded as Trino.

Presto DB Presto SQL
Martin Traverso, David Phillips, Dain Sundstrom, and Eric Hwang created PrestoDB in 2012 while at Facebook. Martin, Dain, and David left Facebook to pursue building the Presto Open Source Community full-time, under the new name PrestoSQL in 2018
Presto project is supported by the Presto Foundation, an independent nonprofit organization with open and neutral governance, hosted under the Linux Foundation® The Trino project is community driven project under the non-profit Trino Software Foundation.
Community Contribution Comparison

While both Presto and Trino serve similar purposes, the TRINO community has emerged as particularly vibrant, attracting a diverse array of contributors and user scenarios. I’ve found the activity on the Trino community Slack invaluable for staying informed about the latest improvements and real-world applications.


Why Use PRESTO?

Presto offers a unique advantage by allowing you to query structured and unstructured data sources with high performance. This makes it an excellent choice for enterprises needing flexibility in handling mixed data formats while retaining SQL’s familiar syntax. Moreover, its distributed nature and scalability mean it can handle massive datasets efficiently, even with complex joins and aggregations.

Installation Process: Getting PrestoDB on Your Mac

One can follow the step by step installation process as described by the Presto 0.247 Documentation from here.

As you all are aware I use Homebrew for all my installation wherever possible. I will be doing the same for this one as well.

brew install presto

If presto db is already installed. You will can either reinstall it by

brew reinstall prestodb

or upgrade it by

brew upgrade prestodb

Installation log:

==> Downloading https://search.maven.org/remotecontent?filepath=com/facebook/presto/presto-cli/0.247/presto-cli-0.247-executable.jar
==> Downloading from https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.247/presto-cli-0.247-executable.jar
######################################################################## 100.0%
==> Downloading https://search.maven.org/remotecontent?filepath=com/facebook/presto/presto-server/0.247/presto-server-0.247.tar.gz
==> Downloading from https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.247/presto-server-0.247.tar.gz
######################################################################## 100.0%
==> Reinstalling prestodb 
==> Caveats
Add connectors to /usr/local/opt/prestodb/libexec/etc/catalog/. See:
https://prestodb.io/docs/current/connector.html

To have launchd start prestodb now and restart at login:
  brew services start prestodb
Or, if you don't want/need a background service you can just run:
  presto-server run
==> Summary
?  /usr/local/Cellar/prestodb/0.247: 1,537 files, 940.2MB, built in 32 seconds

As part of the installation process. The following files are created at the

/usr/local/opt/prestodb/libexec/etc 

folder of the home directory.


kinshukdutta@Kinshuks-MacBook-Pro-15 ~ % cd /usr/local/opt/prestodb/libexec/etc
kinshukdutta@Kinshuks-MacBook-Pro-15 etc % ls -l
total 32
drwxr-xr-x  3 kinshukdutta  admin   96 Feb 16 21:38 catalog
-rw-r--r--  1 kinshukdutta  admin  201 Feb 16 21:38 config.properties
-rw-r--r--  1 kinshukdutta  admin  205 Feb 16 21:38 jvm.config
-rw-r--r--  1 kinshukdutta  admin   24 Feb 16 21:38 log.properties
-rw-r--r--  1 kinshukdutta  admin  114 Feb 16 21:38 node.properties

It also creates a sample JMX connector under the etc/catalog folder.

kinshukdutta@Kinshuks-MacBook-Pro-15 catalog % ls -l
total 8
-rw-r--r--  1 kinshukdutta  admin  18 Feb 16 21:38 jmx.properties

Next, add a connector. Here’s the list of available ones.

For PostgreSQL, create 

/usr/local/Cellar/prestodb/0.247/libexec/etc/postgresql.properties with:

connector.name=postgresql
connection-url=jdbc:postgresql://localhost:5432/externalDatabase
connection-user=postgres
connection-password=postgres

Starting PrestoDB

brew services start prestodb
==> Successfully started `prestodb` (label: homebrew.mxcl.prestodb)

Troubleshoot

Console log

JMX agent started and listening on Kinshuks-MacBook-Pro-15.local:64827
2021-02-17T11:53:37.234-0500	ERROR	main	com.facebook.presto.server.PrestoServer	Unable to create injector, see the following errors:

1) Error in custom provider, java.lang.NullPointerException: Cannot invoke "com.facebook.airlift.discovery.client.AnnouncementHttpServerInfo.getHttpUri()" because "this.httpServerInfo" is null
  at com.facebook.airlift.discovery.client.DiscoveryBinder.bindServiceAnnouncement(DiscoveryBinder.java:79)
  while locating com.facebook.airlift.discovery.client.ServiceAnnouncement annotated with @com.google.inject.internal.Element(setName=,uniqueId=365, type=MULTIBINDER, keyType=)
  while locating java.util.Set<com.facebook.airlift.discovery.client.ServiceAnnouncement>
    for the 2nd parameter of com.facebook.airlift.discovery.client.Announcer.<init>(Announcer.java:68)
  at com.facebook.airlift.discovery.client.DiscoveryModule.configure(DiscoveryModule.java:64)
  while locating com.facebook.airlift.discovery.client.Announcer
    for the 2nd parameter of com.facebook.airlift.discovery.client.DiscoveryModule.createMergingServiceSelectorFactory(DiscoveryModule.java:122)
  at com.facebook.airlift.discovery.client.DiscoveryModule.createMergingServiceSelectorFactory(DiscoveryModule.java:122)
  while locating com.facebook.airlift.discovery.client.MergingServiceSelectorFactory
  at com.facebook.airlift.discovery.client.DiscoveryModule.configure(DiscoveryModule.java:73)
  while locating com.facebook.airlift.discovery.client.ServiceSelectorFactory
    for the 1st parameter of com.facebook.airlift.discovery.client.ServiceSelectorProvider.setServiceSelectorFactory(ServiceSelectorProvider.java:50)
  at com.facebook.airlift.event.client.HttpEventModule.configure(HttpEventModule.java:43)
Caused by: java.lang.NullPointerException: Cannot invoke "com.facebook.airlift.discovery.client.AnnouncementHttpServerInfo.getHttpUri()" because "this.httpServerInfo" is null
	at com.facebook.airlift.discovery.client.DiscoveryBinder$HttpAnnouncementProvider.get(DiscoveryBinder.java:128)
	at com.facebook.airlift.discovery.client.DiscoveryBinder$HttpAnnouncementProvider.get(DiscoveryBinder.java:108)

Followed the solution as per the link and updated the postgresql.properties by removing the <databaseName>

Restarted the server. But this time, I was facing a different error. In terminal

Caused by: java.net.BindException: Address already in use
	at java.base/sun.nio.ch.Net.bind0(Native Method)
	at java.base/sun.nio.ch.Net.bind(Net.java:550)
	at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:249)
	at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:88)
	at com.facebook.airlift.http.server.HttpServerInfo.createChannel(HttpServerInfo.java:160)
	... 43 more

By default, most Java web application servers are listening on port 8080, and it can easily cause the popular 8080 port conflict error. In Mac OSX, you can use sudo lsof -i :8080 | grep LISTEN to find out what program is listening on port 8080 :

In terminal

lsof -i :8080 | grep LISTEN
java    46655 kinshukdutta   45u  IPv6 0x451da3ea396614ab      0t0  TCP *:http-alt (LISTEN)

The name “java” doesn’t tell you anything, to get the detail, ps the java PID 46655 like this :

% ps -ef 46655
  UID   PID  PPID   C STIME   TTY           TIME CMD
  501 46655     1   0 11:53AM ??         0:05.14 /usr/local/opt/openjdk/bin/java -Dzookeeper.log.dir=/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../logs -Dzookeeper.log.file=zookeeper-kinshukdutta-server-Kinshuks-MacBook-Pro-15.local.log -Dzookeeper.root.logger=INFO,CONSOLE -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p -cp /usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../zookeeper-server/target/classes:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../build/classes:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../zookeeper-server/target/lib/*.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../build/lib/*.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/zookeeper-prometheus-metrics-3.6.2.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/zookeeper-jute-3.6.2.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/zookeeper-3.6.2.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/snappy-java-1.1.7.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/slf4j-log4j12-1.7.25.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/slf4j-api-1.7.25.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/simpleclient_servlet-0.6.0.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/simpleclient_hotspot-0.6.0.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/simpleclient_common-0.6.0.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/simpleclient-0.6.0.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-transport-native-unix-common-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-transport-native-epoll-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-transport-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-resolver-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-handler-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-common-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-codec-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/netty-buffer-4.1.50.Final.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/metrics-core-3.2.5.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/log4j-1.2.17.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/json-simple-1.1.1.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jline-2.14.6.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jetty-util-9.4.24.v20191120.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jetty-servlet-9.4.24.v20191120.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jetty-server-9.4.24.v20191120.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jetty-security-9.4.24.v20191120.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jetty-io-9.4.24.v20191120.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jetty-http-9.4.24.v20191120.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/javax.servlet-api-3.1.0.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jackson-databind-2.10.3.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jackson-core-2.10.3.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/jackson-annotations-2.10.3.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/commons-lang-2.6.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/commons-cli-1.2.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../lib/audience-annotations-0.5.0.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../zookeeper-*.jar:/usr/local/Cellar/zookeeper/3.6.2_1/libexec/bin/../zookeeper-server/src/main/resources/lib/*.jar:/usr/local/etc/zookeeper: -Xmx1000m -Dapple.awt.UIElement=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/local/etc/zookeeper/zoo.cfg

The Zookeeper is using the port 8080. I stopped both KAFKA and zookeeper.

brew services stop kafka
Stopping `kafka`... (might take a while)
==> Successfully stopped `kafka` (label: homebrew.mxcl.kafka)
kinshukdutta@Kinshuks-MacBook-Pro-15 catalog % brew services stop zookeeper
Stopping `zookeeper`... (might take a while)
==> Successfully stopped `zookeeper` (label: homebrew.mxcl.zookeeper)

Restarted PrestoDB

presto-server run           

This time got the success message in terminal

2021-02-23T12:20:18.426-0500	INFO	main	com.facebook.presto.server.PluginManager	Registering connector tpch
2021-02-23T12:20:18.429-0500	INFO	main	com.facebook.presto.server.PluginManager	-- Finished loading plugin /usr/local/var/presto/data/plugin/tpch --
2021-02-23T12:20:18.450-0500	INFO	main	com.facebook.presto.metadata.StaticCatalogStore	-- Loading catalog properties etc/catalog/postgresql.properties --
2021-02-23T12:20:18.452-0500	INFO	main	com.facebook.presto.metadata.StaticCatalogStore	-- Loading catalog postgresql --
2021-02-23T12:20:19.135-0500	INFO	main	Bootstrap	PROPERTY                                  DEFAULT     RUNTIME                           DESCRIPTION
2021-02-23T12:20:19.135-0500	INFO	main	Bootstrap	case-insensitive-name-matching            false       false
2021-02-23T12:20:19.135-0500	INFO	main	Bootstrap	case-insensitive-name-matching.cache-ttl  1.00m       1.00m
2021-02-23T12:20:19.135-0500	INFO	main	Bootstrap	connection-password                       [REDACTED]  [REDACTED]
2021-02-23T12:20:19.135-0500	INFO	main	Bootstrap	connection-url                            ----        jdbc:postgresql://localhost:5432
2021-02-23T12:20:19.135-0500	INFO	main	Bootstrap	connection-user                           ----        postgres
2021-02-23T12:20:19.135-0500	INFO	main	Bootstrap	allow-drop-table                          false       false                             Allow connector to drop tables
2021-02-23T12:20:19.475-0500	INFO	main	com.facebook.airlift.bootstrap.LifeCycleManager	Life cycle starting...
2021-02-23T12:20:19.475-0500	INFO	main	com.facebook.airlift.bootstrap.LifeCycleManager	Life cycle startup complete. System ready.
2021-02-23T12:20:19.477-0500	INFO	main	com.facebook.presto.metadata.StaticCatalogStore	-- Added catalog postgresql using connector postgresql --
2021-02-23T12:20:19.477-0500	INFO	main	com.facebook.presto.metadata.StaticCatalogStore	-- Loading catalog properties etc/catalog/jmx.properties --
2021-02-23T12:20:19.478-0500	INFO	main	com.facebook.presto.metadata.StaticCatalogStore	-- Loading catalog jmx --
2021-02-23T12:20:20.041-0500	INFO	main	Bootstrap	PROPERTY         DEFAULT  RUNTIME  DESCRIPTION
2021-02-23T12:20:20.041-0500	INFO	main	Bootstrap	jmx.dump-period  10.00s   10.00s
2021-02-23T12:20:20.041-0500	INFO	main	Bootstrap	jmx.dump-tables  []       []
2021-02-23T12:20:20.041-0500	INFO	main	Bootstrap	jmx.max-entries  86400    86400
2021-02-23T12:20:20.371-0500	INFO	main	com.facebook.presto.metadata.StaticCatalogStore	-- Added catalog jmx using connector jmx --
2021-02-23T12:20:20.393-0500	INFO	main	com.facebook.presto.security.AccessControlManager	-- Loading system access control --
2021-02-23T12:20:20.395-0500	INFO	main	com.facebook.presto.security.AccessControlManager	-- Loaded system access control allow-all --
2021-02-23T12:20:20.395-0500	INFO	main	com.facebook.presto.storage.TempStorageManager	-- Loading temp storage --
2021-02-23T12:20:20.399-0500	INFO	main	com.facebook.presto.storage.TempStorageManager	-- Loaded temp storage local --
2021-02-23T12:20:20.434-0500	INFO	main	com.facebook.presto.server.PrestoServer	======== SERVER STARTED ========

Accessing the Web UI:

After starting presto server, you can access the web UI using this link

Cluster Overview

Command-line interface CLI

The Presto CLI provides a terminal-based interactive shell for running queries. The CLI is a self-executing JAR file, which means it acts like a normal UNIX executable.

We can use the CLI to connect to the presto server.

cd /usr/local/Cellar/prestodb/0.247/libexec
kinshukdutta@Kinshuks-MacBook-Pro-15 libexec % ls -l
total 29960
drwxr-xr-x    6 kinshukdutta  admin       192 Feb  9 16:18 bin
drwxr-xr-x    7 kinshukdutta  admin       224 Feb 23 09:28 etc
drwxr-xr-x  169 kinshukdutta  admin      5408 Feb 23 09:27 lib
drwxr-xr-x   33 kinshukdutta  admin      1056 Feb 23 09:28 plugin
-rw-r--r--    1 kinshukdutta  staff  15335794 Feb  9 16:26 presto-cli-0.247-executable.jar

As a best practice. Take a backup of the original jar. Then make it executable with chmod +x, and run it.

On terminal

ls -l
total 59920
drwxr-xr-x    6 kinshukdutta  admin       192 Feb  9 16:18 bin
drwxr-xr-x    7 kinshukdutta  admin       224 Feb 23 09:28 etc
drwxr-xr-x  169 kinshukdutta  admin      5408 Feb 23 09:27 lib
drwxr-xr-x   33 kinshukdutta  admin      1056 Feb 23 09:28 plugin
-rw-r--r--    1 kinshukdutta  admin  15335794 Feb 23 14:23 presto-cli-0.247-executable-ori.jar
-rwxr-xr-x    1 kinshukdutta  staff  15335794 Feb  9 16:26 presto-cli-0.247-executable.jar

execute the CLI.

./presto-cli-0.247-executable.jar --server localhost:8080 --catalog postgresql --schema public
presto:public> 

Basic Operations

Use Case 1: Transferring selective records from one table to another.

Create a Table


presto:public> CREATE TABLE orders ( orderkey bigint, orderstatus varchar, totalprice double, orderdate date ) ;
CREATE TABLE
presto:public> 
Table Created in PostgresSQL

Create another table for active orders.

presto:public> CREATE TABLE activeorders ( orderkey bigint, orderstatus varchar, totalprice double, orderdate date );
CREATE TABLE

Using pgAdmin4 inserted data to the order table. Some orders are active, some are inactive.

The goal is to now fetch the active orders and push them into active orders table.

presto:public> INSERT INTO activeorders SELECT * FROM orders WHERE orders.orderstatus='active';
INSERT: 25 rows

Query 20210223_224621_00018_g3pg4, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:01 [25 rows, 0B] [22 rows/s, 0B/s]

presto:public>

For more specific guidance, check out Presto’s troubleshooting documentation.


Interactive Queries and Sample Use Case

One of the most compelling aspects of Presto is its ability to seamlessly join data from different sources. Here’s a simple use case to illustrate how you might use Presto to transfer selective records between tables in a PostgreSQL database:

Example: Transfer Active Orders to a New Table

sql
CREATE TABLE activeorders AS SELECT * FROM orders WHERE orderstatus = 'active';

The ease with which Presto allows users to move and analyze data makes it a powerful tool for interactive analytics.


Expanding on Community Resources

To deepen your knowledge and stay connected with others in the Presto ecosystem, I recommend joining the Prestocommunity on Slack. Here, you’ll find active discussions, support from fellow users, and updates on new developments.

As I continue this series, I aim to cover more advanced topics, including optimizing queries for performance and integrating Presto with other tools like Kafka for real-time analytics. By sharing these insights, I hope to make Presto more accessible and useful for both newcomers and seasoned data engineers.


Conclusion

Presto is a versatile and high-performance choice for big data analytics, especially in environments where querying multiple data sources in real-time is crucial. This series will continue to explore advanced functionalities and practical use cases to help you harness the full potential of Presto.

Stay tuned for the next post, where I’ll delve into query optimizations and performance tuning strategies. Join me on this journey as we uncover the power of distributed SQL with Presto!

 


Note on Terminology: Presto and Trino

In this series, you may see us use “Presto” and “Trino” interchangeably. This is because, in December 2020, PrestoSQL—the version most commonly used by the open-source community—was officially rebranded as Trino. This rebranding clarified the distinction between PrestoDB, which is managed by the Presto Foundation and rooted in the original Facebook project, and Trino, the community-driven evolution of PrestoSQL, led by the same key developers who initially built Presto.

We’ll primarily refer to Trino going forward, as it reflects the ongoing work and advancements made by the broader open-source community. However, the fundamentals remain the same, and much of the knowledge applies to both PrestoDB and Trino. This approach allows us to keep the series relevant while acknowledging Trino as the most up-to-date and actively maintained branch of the technology.


Series NavigationTrino Series: Building a Sample Project on Local Installation >>