Evolution of our Hadoop Architecture — Chapter 1: Everyone needs some cache

Datum Scientia
2 min read · Jan 4, 2021

If you landed here for the first time (but how?), we would highly recommend going through the introduction before reading on so you have the right context.

We started off with a basic Hadoop setup of about 50 AWS i3 instances (we also like to call them nodes), and during the migration we also enabled LLAP (please read this article to understand LLAP), the BI framework offering supported by the then Hortonworks.

We spent almost 3 months stabilizing the platform post migration. The indexing solution we used, Sparkline (now acquired by Oracle), which generated indexes on our most used data, was massively impacted by the migration. The expected SLA for BI queries on Sparkline (via a modified Spark thrift server) was 1–3 seconds, which was the sole reason for purchasing this framework. That leaves very little room for error, so the erratic, highly variable run-times were easily noticeable. The RCA identified that Sparkline works by creating local caches of the indexes on each executor node, and every time it spawned an executor on a node it hadn't seen before, the local cache (~300 GB per node) had to be rebuilt from scratch, which explained the intermittent behavior. Trust me, this single line of RCA took weeks of effort to narrow down.

The above observation led to the first major change to our Hadoop architecture. The initial suggestion was to carve out a separate mini-cluster for Sparkline alone, which was not something we wanted: it would mean additional operational overhead, a dependency on S3 for storage, and an impact on every other service. We somehow had to ensure that the executors always spawned on the designated nodes and never jumped around to other nodes, to prevent any cache rebuilding.

That led us to create a logical isolation layer, similar to how we create folders on a hard disk. The underlying hardware is shared, but folders let us keep relevant data segregated together; that idea led to the birth of node labels in our cluster. As the name suggests, we had to label the nodes, assign a particular queue to each label, and in turn assign specific users access to the queue (learn how to create queues here and node labels here).
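To make this concrete, here is a minimal sketch of what enabling node labels and tagging the designated nodes can look like; the label name, host names, and store path below are illustrative (not our exact values), and the exact steps vary slightly by Hadoop/HDP version:

```
# yarn-site.xml: turn on node labels (property names as in Apache Hadoop YARN)
#   yarn.node-labels.enabled           = true
#   yarn.node-labels.fs-store.root-dir = hdfs:///yarn/node-labels

# Create an exclusive label so only queues mapped to it can use these nodes
yarn rmadmin -addToClusterNodeLabels "sparkline(exclusive=true)"

# Pin the designated nodes to the label (illustrative host names)
yarn rmadmin -replaceLabelsOnNode "node01.example.com=sparkline node02.example.com=sparkline"

# Verify the labels are registered
yarn cluster --list-node-labels
```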

User -> Queue -> Node Label -> Application (Sparkline)

So in short: users are assigned to a queue, the queue is assigned to a label, and in our case the label is dedicated to an application. This way we established a logical partition within the cluster and resolved the original problem of Sparkline executors spawning on and jumping around across nodes.
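As a rough sketch, the user -> queue -> label wiring lives in the Capacity Scheduler configuration; the queue name, user name, and capacities below are illustrative, not our exact values:

```
# capacity-scheduler.xml (shown as property = value for brevity)

# Two queues: one dedicated to Sparkline, one for everything else
yarn.scheduler.capacity.root.queues = sparkline,default

# Only the sparkline queue can see the labelled nodes, and it gets all of their capacity
yarn.scheduler.capacity.root.accessible-node-labels.sparkline.capacity           = 100
yarn.scheduler.capacity.root.sparkline.accessible-node-labels                    = sparkline
yarn.scheduler.capacity.root.sparkline.accessible-node-labels.sparkline.capacity = 100

# Anything submitted to the sparkline queue lands on the labelled nodes by default
yarn.scheduler.capacity.root.sparkline.default-node-label-expression = sparkline

# Only the Sparkline service user may submit to this queue
yarn.scheduler.capacity.root.sparkline.acl_submit_applications = sparkline_svc
```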

With this slight tweak to the YARN setup, the earlier issue of dispersed caching across nodes was gone. The cluster was essentially labelled as Sparkline and non-Sparkline, the latter serving every other application we had.
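For completeness, pinning a Spark-on-YARN application (such as the modified Spark thrift server Sparkline ran through) to those nodes can look roughly like this; the class name and jar path are purely illustrative:

```
# Submit to the dedicated queue; the AM and executors stay on the labelled nodes
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue sparkline \
  --conf spark.yarn.am.nodeLabelExpression=sparkline \
  --conf spark.yarn.executor.nodeLabelExpression=sparkline \
  --class com.example.SparklineThriftServer \
  /path/to/sparkline-thrift-server.jar
```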

Watch out for the next set of chapters and follow us through this evolution story!

Links to check:
