Evolution of our Hadoop Architecture — Chapter 4: Divide and Conquer

Datum Scientia
4 min read · Jan 29, 2021

Finally, the most awaited day has come: here is the finale of the series. You’re the real MVP if you followed our journey from the pilot episode.

The previous chapters covered the LLAP saga: encountering issues and applying patches provided by the LLAP authors, which let our ETL and BI cars continue on the same road, but with one slowing down the other. With such interruptions, the only logical solution was to isolate them and run them on separate queues and servers.

“You mean adding one more Hive interactive server? Okay, but why and how?”

Climbing the scaling ladder with an ever-increasing ingestion workload is bound to challenge the performance consistency of LLAP. Keeping ETL and BI in separate rooms helps deal with their diverse downstream sub-problems more effectively: interactive query optimization on one side, and dynamically scaling ETL ingestion jobs for larger partition values on the other. The following were the fundamental steps to redesign our Hadoop architecture, along with the runbook mentioned in the published appendix:

  1. Node labelling for the ETL and BI workloads
  2. A second HiveServer2 Interactive for the ETL workload
  3. Containerized mode of LLAP

Node Labelling: ETL & BI

To isolate the workloads, label the nodes and assign them to only run containers designated for the specified jobs. Here is how we performed capacity planning for segregating the nodes:

  Nodemanager max memory = 160GB

  Before separation, LLAP (ETL + BI):
  Total LLAP Heap Size = 10 (Num of Daemons) * 54GB (Daemon Heap Size) = 540GB
  LLAP Resource Allocation -> 1TB (Total LLAP Heap Size = 540GB, Total Cache = 360GB,
    Total Headroom = 100GB, Nodes for Daemons = 10)

  ETL after separation:
  # To maintain an equivalent heap size at the container level
  ETL -> 4 nodes = 160GB * 4 = 640GB ~ 540GB (Old Total LLAP Heap Size)

  BI (LLAP) after separation:
  BI (LLAP) -> 900GB (Total LLAP Heap Size = 540GB (Old Total LLAP Heap Size),
    Total Cache = 340GB, Total Headroom = 30GB, Nodes for Daemons = 6)

Thus the heap size in the new configuration stays equivalent to the old one.

From here, we can later increase the number of nodes based on resource requirements.
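For reference, here is a minimal sketch of the YARN node-labelling steps. The label names (etl, bi), queue names, and hostnames below are illustrative assumptions, not our exact runbook:

  # Create exclusive labels on the ResourceManager (hypothetical label names)
  yarn rmadmin -addToClusterNodeLabels "etl(exclusive=true),bi(exclusive=true)"

  # Pin nodes to a label: 4 ETL nodes and 6 BI nodes, per the sizing above
  yarn rmadmin -replaceLabelsOnNode "etl-node-1=etl etl-node-2=etl etl-node-3=etl etl-node-4=etl"
  yarn rmadmin -replaceLabelsOnNode "bi-node-1=bi bi-node-2=bi bi-node-3=bi bi-node-4=bi bi-node-5=bi bi-node-6=bi"

  # capacity-scheduler.xml: tie each queue to its label (queue names are assumptions)
  yarn.scheduler.capacity.root.etl.accessible-node-labels=etl
  yarn.scheduler.capacity.root.etl.accessible-node-labels.etl.capacity=100
  yarn.scheduler.capacity.root.etl.default-node-label-expression=etl
  yarn.scheduler.capacity.root.llap.accessible-node-labels=bi
  yarn.scheduler.capacity.root.llap.accessible-node-labels.bi.capacity=100
  yarn.scheduler.capacity.root.llap.default-node-label-expression=bi

With something like this in place, anything submitted to the etl queue can only land on ETL-labelled nodes, while the LLAP daemons stay pinned to the BI nodes.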

Add HiveServer2 Interactive: ETL workload

We needed another interactive server for ETL because, on the distribution version available to us (HDP 2.6.5), Hive 2.x was only available with LLAP, which enables ORC headers while writing data. Recent distributions no longer have this dependency on LLAP, and provisioning multiple interactive servers is one click away. But our journey was filled with challenges, so we needed a runbook to hack together multiple interactive servers.
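Our exact steps live in the runbook, but conceptually a second HiveServer2 Interactive is a clone of the first with its own port, ZooKeeper namespace, and queue. A hedged sketch of the overrides the ETL instance would need (the property names are standard Hive settings; the values are assumptions, not our actual configuration):

  # Second HiveServer2 Interactive (ETL): overrides relative to the BI instance
  hive.server2.thrift.port=10501                     # avoid clashing with the BI instance
  hive.server2.zookeeper.namespace=hiveserver2-etl   # separate service-discovery namespace
  hive.server2.tez.default.queues=etl                # submit work to the ETL-labelled queue
  hive.llap.daemon.queue.name=etl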

Containerized mode of LLAP

Another daemon-backed LLAP service for ETL could have led to the same issues mentioned in the third chapter. So, in order to keep using Hive 2.x, we enabled LLAP in container mode, running a pseudo-single daemon instance (just to keep the LLAP service live) along with the additional config changes discussed in our runbook.
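As a rough illustration of what container mode means in configuration terms (the actual changes are in the runbook; the values below are assumptions), the ETL instance keeps one minimal daemon alive purely so the interactive server can start, while queries run as plain Tez containers:

  # ETL HiveServer2 Interactive: run queries outside the LLAP daemons
  hive.execution.mode=container       # plain Tez containers instead of LLAP executors
  hive.llap.execution.mode=none       # never push operators into the LLAP daemons
  num_llap_nodes=1                    # pseudo-single daemon, just to keep the service live
  hive.llap.daemon.num.executors=1    # minimal footprint for that one daemon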

Impact Analysis: Pros & Cons

Pros:

  1. A record-breaking ZERO incidents reported from the LLAP application
  2. Increased performance due to a higher cache hit rate and lower cache eviction
  3. Dynamic increase of ETL container allocation based on the ingested table size

Cons:

  1. Lower data locality, as data now moves from ETL nodes to BI nodes
  2. Losing the LLAP cache benefit on ETL workloads
  3. No live-running ETL containers, which could be mitigated with prewarm containers (see the sketch below)
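On the third con: Hive and Tez expose knobs to keep warm containers around between queries. A minimal sketch, assuming default container sizing (the counts are illustrative):

  # Mitigating cold starts on the ETL side
  hive.prewarm.enabled=true               # spin up Tez containers when a session opens
  hive.prewarm.numcontainers=10           # how many containers to prewarm
  tez.am.session.min.held-containers=10   # keep idle containers attached to the session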

This brings us to the end of our journey through “Evolution of our Hadoop Architecture”. It took us four months to reach this milestone. We hope we answered most of your queries and can steer you towards the right decisions when planning your own evaluation of Hive LLAP for your workloads.

Links to check:

Appendix:

Chapters:
