What’s in a name in the Big Data world?

Datum Scientia
Dec 20, 2020

“What’s in a name? That which we call a rose by any other name would smell as sweet.”

Make Mr. William Shakespeare a Big Data admin for a day, and he would surely go around scratching this line out of every copy of Romeo and Juliet there is.

The Hadoop ecosystem is an amalgamation of numerous services, each versioned independently of the others. The services are generally tightly coupled, though, so it isn’t surprising that their names often get used interchangeably and set off a chain of confusion.

Let’s say the HiveServer2 Interactive service needs to be restarted, but the user leaves the word “Interactive” out of the email because they thought it wasn’t relevant. In reality, the shorter name refers to a different service altogether, so the service that actually needed a restart during the maintenance window is left untouched. Imagine the frustration, and the pile of emails that has to go around again to get it done right the second time. So we decided to clearly differentiate the components with similar names and save some time and emails for everyone running Hortonworks Data Platform (HDP) clusters.

HiveServer2

HiveServer2 (HS2) is a service that enables clients to execute queries against Hive. It is the successor to HiveServer1, which is now deprecated. The 2 at the end signifies the version of the HiveServer itself, not the version of Hive. Support for multiple concurrent connections and for authentication are the main selling points of this upgrade over HiveServer1. Put simply, it is the service that accepts JDBC/ODBC connections so that a user can execute Hive queries.

In this example, HiveServer2 is a service running on one of the nodes, whose IP is 10.0.0.1, and it listens on port 10000 of that node.

How do you hit exactly this service?

By passing the IP and the port of the service in the JDBC URL you try to connect with. We’ll discuss the significance and details about this service in a different post.
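As a rough sketch of what "passing the IP and the port in the JDBC URL" looks like, the snippet below assembles a URL in the `jdbc:hive2://host:port/database` form that HiveServer2 clients use. The helper name `hive_jdbc_url` and the `default` database are assumptions made for illustration; the IP and port are the ones from the example above.

```python
def hive_jdbc_url(host: str, port: int, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL: jdbc:hive2://<host>:<port>/<database>.

    Hypothetical helper for illustration; "default" is an assumed database name.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:hive2://{host}:{port}/{database}"


# The HiveServer2 instance from the example: node 10.0.0.1, port 10000.
HS2_URL = hive_jdbc_url("10.0.0.1", 10000)
print(HS2_URL)  # jdbc:hive2://10.0.0.1:10000/default
```

A client such as Beeline would then take this string through its `-u` option; the host and port in it are what pin the connection to exactly this service.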

Which Hive version am I using if I’m connecting to HiveServer2?

You would be using Hive 1.2.x if you are connecting to HiveServer2. The Hive version determines which features are available and how a query is read and processed.

Is it also called Hive Thrift Server?

Yes, this can also be referred to as the Hive Thrift Server, but you’ll soon see why it is better to just call it HiveServer2.

HiveServer2 Interactive

The keyword here is “Interactive”. Live Long And Process (LLAP) is the new way of querying Hive that comes with the HDP 2.x.x series.

How is HS2 different from HS2 Interactive?

Though they share a first name, they run completely different versions of Hive on different execution frameworks. “Interactive” refers specifically to the LLAP framework, which the server uses in the background to execute the query (we will discuss the details in another post).

Do we need to have both on the cluster? Why?

You will generally find both installed on a cluster. This keeps things backward compatible for users who are not yet comfortable with the LLAP architecture and would like to keep using Hive the old, familiar way.

How do I differentiate between the two?

The URLs are different: the two services either run on different nodes or listen on different ports of the same node.
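The two scenarios can be sketched as follows. Only 10.0.0.1 and port 10000 come from this post; the second node 10.0.0.2 and port 10500 for HiveServer2 Interactive are assumptions for illustration (check the actual values in your cluster configuration), as are the `Endpoint` type and the `default` database.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    """A named Hive endpoint (hypothetical type for this sketch)."""
    service: str
    host: str
    port: int

    def jdbc_url(self, database: str = "default") -> str:
        return f"jdbc:hive2://{self.host}:{self.port}/{database}"


# Scenario 1: both services on the same node, told apart by port.
# Port 10500 is an assumed value for HiveServer2 Interactive.
same_node = [
    Endpoint("HiveServer2", "10.0.0.1", 10000),
    Endpoint("HiveServer2 Interactive", "10.0.0.1", 10500),
]

# Scenario 2: the services on different nodes.
# 10.0.0.2 is a made-up address for the second node.
different_nodes = [
    Endpoint("HiveServer2", "10.0.0.1", 10000),
    Endpoint("HiveServer2 Interactive", "10.0.0.2", 10500),
]

for label, scenario in (("same node", same_node), ("different nodes", different_nodes)):
    for endpoint in scenario:
        print(f"{label}: {endpoint.service} -> {endpoint.jdbc_url()}")
```

In both scenarios the URL scheme is the same `jdbc:hive2://`, so nothing in the URL says "Interactive"; only the host and port tell the two services apart.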

Which version of Hive am I using when I’m using HiveServer2 Interactive?

You would be using Hive 2.x.x, with the LLAP daemon framework running in the background.

Is it also called Hive Thrift Server?

Yes, this can also be referred to as the Hive Thrift Server, but you can see from the above the kind of confusion that would cause.

Can you restart Hive Thrift Server? Which one? The one on 10.0.0.1? There are two. Umm, I’m not sure which one, then. I’ll just restart both. You know what, just to be safe, let’s restart all the services, because that’s how this world runs. RESTART EVERYTHING!
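One way to avoid that exchange is to refuse ambiguous names before anyone touches a service. The sketch below is purely illustrative: the `SERVICES` and `ALIASES` tables, the `resolve_service` helper and port 10500 are all assumptions, not part of any HDP or Ambari tooling.

```python
# Exact service names mapped to (host, port). Port 10500 is an assumed value.
SERVICES = {
    "HiveServer2": ("10.0.0.1", 10000),
    "HiveServer2 Interactive": ("10.0.0.1", 10500),
}

# Informal names people use in emails, and every service each one could mean.
ALIASES = {
    "Hive Thrift Server": ["HiveServer2", "HiveServer2 Interactive"],
}


def resolve_service(name: str) -> str:
    """Return the exact service name, or fail loudly if the request is unclear."""
    if name in SERVICES:
        return name
    candidates = ALIASES.get(name, [])
    if len(candidates) == 1:
        return candidates[0]
    if candidates:
        raise ValueError(f"'{name}' is ambiguous, it could mean: {', '.join(candidates)}")
    raise KeyError(f"unknown service: {name}")


print(resolve_service("HiveServer2 Interactive"))  # HiveServer2 Interactive

try:
    resolve_service("Hive Thrift Server")
except ValueError as err:
    print(err)
```

The exact name resolves straight away, while "Hive Thrift Server" is rejected with a list of what it could mean, which is the question the admin would otherwise have to ask by email.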

To summarize,

HiveServer2: Hive 1.x, on HDP 2.x

HiveServer2 Interactive: Hive 2.x (with LLAP), on HDP 2.x

We are just getting started with this; a sequel to this post is coming soon.

References:

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Overview

https://cwiki.apache.org/confluence/display/Hive/LLAP
