‘What’s in a name?’ returns

Datum Scientia
Dec 28, 2020

Couldn’t think of a better name for the most awaited sequel (after Avengers and Game of Thrones, of course). Please read the first post before this one, because unlike the X-Men movies, we actually follow a timeline.

Let’s get started busting some more naming myths.

Aren’t LLAP and Hive 2 the same thing?

Nope, not even close.


Hive 2 is a version of Hive itself: the layer that understands your Hive query and the logic of its operations, which the query engine can then break into smaller tasks. The version upgrade supports more functions, letting you write queries with simpler operations (for example, a function to extract the month from a date field). It also overcomes limitations of the previous version, bringing Hive queries closer to the ones written in the RDBMS world.

LLAP (Live Long and Process), on the other hand, is a daemon framework, a daemon being a continuously running background process that is always ready to service requests quickly. It is responsible for executing the small tasks and is itself NOT an engine (like MapReduce or Tez). In layman’s terms, its sole responsibility is to execute tasks using the resources of the cluster as instructed. It has no logic of its own and does what it is told.
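To make the split a bit more concrete, here is a loose Python analogy (not Hive or LLAP code). The hypothetical `plan_query` plays the part of the engine that decides how to break the work up, and a long-lived thread pool plays the part of the always-on daemons that simply run whatever they are handed. Every name in it is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the query engine: it owns the logic and
# decides how a job is split into small, independent tasks.
def plan_query(rows, chunk_size):
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

# A task carries its own instructions; the workers add no logic of their own.
def count_rows(chunk):
    return len(chunk)

# Stand-in for the daemons: a pool started once and kept running,
# so each task avoids paying a start-up cost.
daemons = ThreadPoolExecutor(max_workers=4)

def run_query(rows, chunk_size=3):
    tasks = plan_query(rows, chunk_size)             # the engine plans
    partial_counts = daemons.map(count_rows, tasks)  # the daemons execute
    return sum(partial_counts)

print(run_query(list(range(10))))  # prints 10
```

The pool never decides what to compute; swap `count_rows` for any other task and the workers stay exactly the same, which is the sense in which LLAP "performs what it is told".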

Ambari Hive View/Hive View 2.0

From the perspective of Ambari, Hive View is miles away from the conventional RDBMS views or materialised views we generally know. Hive View (Ambari) is like a window into the data on the cluster, a web-based data design tool that facilitates SQL analytics, development and design (something like SQL/Aginity Workbench) for analysts and DBAs.

Ambari 2.5 introduced Hive View 2.0 (a version upgrade for the web-based data design tool) with a brand-new user experience plus a slew of new tools to help DBAs run Hive jobs faster and more effectively. With Hive View 2.0, users can create tables, export DDL, export column information, check the visual explain plan of a query, save queries in worksheets, store output locally or in HDFS, and compute statistics, all within a few clicks.

Ambari Hive Views are deprecated as of Ambari 2.7.5. To provide the same kind of data access across the enterprise now, go with Zeppelin Notebook, DAS (Data Analytics Studio) or HUE on CDH (Cloudera Distribution).

Does Hive View 2.0 connect to Hive 2?

This is tricky. By default it is configured to connect to HiveServer2, but the admin can configure it to connect to HiveServer2 Interactive (the LLAP-backed server) instead.
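From a client's point of view, the difference between the two servers mostly shows up as a different endpoint. The sketch below builds the JDBC URL for each one. The helper name `hive_jdbc_url` and the host are hypothetical, and the ports are assumptions: 10000 is the stock HiveServer2 port, and 10500 is the value commonly seen for HiveServer2 Interactive on HDP, so check your own cluster's configuration.

```python
# Assumed defaults, not read from any cluster:
# 10000 is the stock HiveServer2 port, 10500 is commonly used
# for HiveServer2 Interactive on HDP.
HIVESERVER2_PORT = 10000
HIVESERVER2_INTERACTIVE_PORT = 10500

def hive_jdbc_url(host, interactive=False, database="default"):
    """Build a jdbc:hive2 URL for either server (hypothetical helper)."""
    port = HIVESERVER2_INTERACTIVE_PORT if interactive else HIVESERVER2_PORT
    return f"jdbc:hive2://{host}:{port}/{database}"

print(hive_jdbc_url("hive.example.com"))
print(hive_jdbc_url("hive.example.com", interactive=True))
```

Both URLs use the same `jdbc:hive2://` scheme; only the port changes, which is part of why the naming is so easy to mix up.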

Views as we’ve known them from the RDBMS world

In RDBMS systems, a view is a logical subset of data from one or more tables: basically a simpler handle for a query that may be large or complicated. When someone is named Charles Philip Arthur George Mountbatten-Windsor, you don’t call them by the full name every single time; you find a shorter, more convenient way to identify them, in this case Prince Charles. A view as we know it has no physical manifestation and stores no data of its own, which means the underlying query runs in the background to fetch you the required results. Indexed views optimised this, making access through the view quicker than firing the entire query each time, so the view performed slightly better than the underlying query itself.

There are no such optimisations for Hive views at the moment. Hive View DDLs can be found here. Hive 2.x has no support for materialized views either; that support was added in Hive 3.0.
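The point that a plain view stores no data of its own is easy to see in a tiny experiment. The sketch below uses Python's built-in `sqlite3` module as a stand-in RDBMS simply because it runs anywhere; it is not Hive, and the `orders` table and `big_orders` view are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "asha", 120.0), (2, "ben", 40.0), (3, "asha", 75.5)],
)

# The view is just a stored query: a short name for a longer SELECT.
conn.execute("""
    CREATE VIEW big_orders AS
    SELECT customer, amount FROM orders WHERE amount > 50
""")

def big_orders():
    query = "SELECT customer, amount FROM big_orders ORDER BY amount"
    return conn.execute(query).fetchall()

before = big_orders()

# Nothing was copied into the view, so a new row in the base table
# shows up the next time the view is queried.
conn.execute("INSERT INTO orders VALUES (4, 'chen', 300.0)")
after = big_orders()

print(before)
print(after)
```

The second result includes the new row without the view being touched, because the underlying query is re-run on every access. A materialized view behaves differently: it keeps a stored copy of the result that has to be refreshed.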

References:

https://cwiki.apache.org/confluence/display/Hive/LLAP

https://hortonworks.com/blog/3-great-reasons-to-try-hive-view-2-0/

https://cwiki.apache.org/confluence/display/Hive/Materialized+views
