SOLR Search – COOK BOOK

Solr is the popular, blazing-fast, open-source enterprise search platform built on Apache Lucene™.

Here is an example of how Solr might be integrated into an application.

This post offers a curated list of Solr packages and resources. It starts with installation, then shows some basic implementation and usage.

Installing Solr

To install Solr on my Mac, I normally use Homebrew.

First, update Homebrew:

brew update    
Updated Homebrew from 37714b5ce to 373a454ac.

Then install Solr:

brew install solr 

This time, however, I am going to show the step-by-step installation on macOS, as explained in the Apache Solr Reference Guide.

Starting Solr

Once the archive is extracted, you are ready to run Solr:

bin/solr start

*** [WARN] *** Your open file limit is currently 2560.  
It should be set to 65000 to avoid operational disruption. 
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] ***  Your Max Processes Limit is currently 5568. 
It should be set to 65000 to avoid operational disruption. 
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8983 [-]  
Started Solr server on port 8983 (pid=71357). Happy searching!

This will start Solr in the background, listening on port 8983.
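As a quick sanity check from code, the sketch below tests whether anything is accepting TCP connections on the Solr port. It is a minimal illustration, not part of Solr itself: the function name is_port_open is hypothetical, and the host and port defaults assume the local setup shown above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8983, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper; the defaults assume Solr was started locally
    on its default port 8983, as in the output above.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open() after bin/solr start should return True once Solr has finished starting. An open port only shows that some process is listening there, so bin/solr status remains the more reliable check.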

Check if Solr is Running

bin/solr status
Found 1 Solr nodes: 
Solr process 71357 running on port 8983

When you stop Solr (for example with bin/solr stop -p 8983), it reports:

Sending stop command to Solr running on port 8983 … waiting up to 180 seconds to allow Jetty process 71357 to stop gracefully.

Interfaces

Use a web browser to open the admin console:

http://localhost:8983/solr

Other Interfaces

  • Appleseed Search Web User Interfaces – AngularJS 1 search interfaces for Solr and Elastic.
  • Blacklight A multi-institutional open-source collaboration building a better discovery platform framework.
  • Solr PHP UI Solr client and user interface for search (UI).
  • AJAX Solr AJAX Solr is a JavaScript library for creating user interfaces to Apache Solr.
  • Spyglass Simple search results with Solr and EmberJS.
  • Splainer Angular JS Solr and Elasticsearch Diagnostic Search Services.
  • Solrstrap Solrstrap is a Query-Result interface for Solr.
  • ngSolr Easy faceted search for Apache Solr.
  • SOLR-AJAX Single Page Faceted Search Interface to Apache Solr/Lucene.
  • Solstice A simple Solr wrapper for AngularJS apps.
  • SolrDora A quick and easy way to explore the data in your Solr core.

Create a Core

bin/solr create -c 

The -c option takes the name of the new core. The command creates a core that uses a data-driven schema, which tries to guess the correct field type when you add documents to the index.
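To see field-type guessing in action, you can post a small JSON document to the core's update handler. The sketch below only builds the HTTP request, so it can be inspected without a running server. The helper name build_update_request, the core name "mycore", and the sample document are assumptions made for illustration.

```python
import json
import urllib.request


def build_update_request(core, docs, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST request that indexes `docs` into `core`.

    Hypothetical helper: `core` is whatever name you passed to
    `bin/solr create -c`, and base_url assumes a local Solr on port 8983.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{core}/update?commit=true",  # commit so the docs become searchable
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative document; with a data-driven schema, Solr guesses the field types.
request = build_update_request("mycore", [{"id": "1", "title": "Hello Solr", "price": 9.99}])
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen(request) while Solr is running.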

To see all available options for creating a new core, execute:

bin/solr create -help

Once the core is created, you can see it in the Solr admin console.
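The same information is available programmatically through Solr's CoreAdmin API. The following is a rough sketch, assuming a standalone local Solr on port 8983. The helper names are hypothetical, and the parsing relies on the STATUS response keeping its cores under a top-level "status" object.

```python
import json
import urllib.request
from urllib.parse import urlencode


def core_status_url(base_url="http://localhost:8983/solr"):
    """URL of the CoreAdmin STATUS action (base_url is an assumed local default)."""
    return f"{base_url}/admin/cores?" + urlencode({"action": "STATUS", "wt": "json"})


def core_names(status_payload):
    """Extract core names from a decoded STATUS response, sorted alphabetically."""
    return sorted(status_payload.get("status", {}))


def list_cores(base_url="http://localhost:8983/solr"):
    """Fetch and return the core names; requires a running Solr instance."""
    with urllib.request.urlopen(core_status_url(base_url)) as response:
        return core_names(json.load(response))
```

With Solr running, list_cores() should include the name of the core you just created.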

Indexing Exercise:

I followed the indexing exercise from the 

Indexing Tech products Example Data

This exercise will walk you through how to start Solr as a two-node cluster (both nodes on the same machine) and create a collection during startup. Then you will index some sample data that ships with Solr and do some basic searches.

Launch Solr in SolrCloud mode

To launch Solr in SolrCloud mode on Unix or macOS, run: bin/solr start -e cloud
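The basic searches mentioned in this exercise go through the collection's select handler. Here is a minimal sketch of building such a query URL. The collection name "techproducts" and the search term "foundation" are assumptions based on the sample data, and build_select_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_select_url(collection, query, rows=10, base_url="http://localhost:8983/solr"):
    """Build a query URL for the /select handler of `collection`.

    Hypothetical helper; base_url assumes a local Solr node on port 8983.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    # urlencode escapes special characters such as ':' in field queries
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed collection name and search term, for illustration only.
print(build_select_url("techproducts", "foundation", rows=5))
```

Opening the printed URL in a browser, or fetching it with urllib.request.urlopen, should return the matching documents as JSON once the sample data has been indexed.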

Searching Tools

Projects

  • Transformalize This tool expedites mundane data processing tasks like cleaning, reporting, and denormalization. Specifically, it can quickly process data from SQL/MySQL/PostgreSQL to Solr/Elasticsearch.
  • JesterJ A new, highly flexible, highly scalable document ingestion system.
  • Spark-Solr Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
  • Flink Solr Connector Apache Flink Sink for Solr.
  • Apache Flume Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
  • Storm Solr Tools for building Storm topologies for indexing data into SolrCloud.
  • Kafka Connector for Solr Sink (https://github.com/MSurendra/kafka-connect-solr) Kafka Connect Solr for writing data to Solr.
  • SolrMQ SolrMQ is a plugin for Solr that allows you to send updates to Solr using an AMQP messaging queue. We use the RabbitMQ library.

Clients

  • SolrJ Java Solr Client.
  • SolrNet .NET Solr Client.
  • Solr Scala Client.
  • solrs An async, non-blocking solr client for java/scala, providing a query interface like SolrJ.
  • Scalikesolr Apache Solr client for Scala/Java.
  • Solr Play Scala Client A Scala library in Play framework for indexing and searching documents within Apache Solr.
  • Python Solr Clients Reference to multiple Python Solr Clients.
  • Python: SolrClient SolrClient is a simple Python library for Solr, built in Python 3 with support for the latest features of Solr.
  • mysolr was born to be a fast and easy-to-use client for Apache Solr’s API and because existing Python clients didn’t fulfill these conditions.
  • rsolr A ruby client for Solr.
  • Sunspot Solr-powered search for Ruby objects.
  • Solarium is a Solr client library for PHP.
  • Solr PHP extension The Solr extension allows you to communicate effectively with the Apache Solr Server in PHP.
  • Go-Solr A solr library written in Go.
  • go-solr Solr client in Go, core admin, add docs, update, delete, search, and more.
  • Gora A simple Solr client for Go.
  • CPAN Apache::Solr Perl Apache Solr.
  • Solrclj A Clojure client for Apache Solr.
  • flux A Clojure based Solr client.
  • solr-node-client A Solr client for Node.js for indexing, adding, deleting, committing, and searching documents within an Apache Solr installation.

Kinshuk Dutta
New York