Apache Phoenix: The critical co-processor metadata cache

Apache Phoenix is a pretty powerful analytic query layer built on top of Apache HBase. In this post, I will go through one of the ‘not so obvious’ issues you can run into while using Apache Phoenix.

Normally, Apache Phoenix is a relatively thin layer, so the majority of day-to-day maintenance issues (performance degradation, query failures or other reliability-related problems) originate from HBase. However, that’s not always the case, and it’s a good idea to also check the Phoenix client and its co-processors whenever you can’t find the root cause in HBase.

Now, the particular issue I am going to cover here caused such serious performance degradation that it rendered the entire Phoenix cluster essentially unusable. To make the situation even worse, there were no clear failure messages for this problem.

I will start with the general setup of the cluster and the symptoms of the issue. The cluster consists of around 25 nodes with a total capacity of around 100TB. One Phoenix table occupies the majority of that space, and it is both write and read intensive. Things ran fine for a pretty long time after the table was launched, then one day everything suddenly went south. Below are the three main symptoms:

  1. The write/insertion speed suddenly dropped to around 10% of its original value
  2. Read operations against the largest table became essentially unresponsive (timing out all the time)
  3. Requests per second dropped to literally 0 for every region server except one, according to the HBase master page (this turned out to be the key to the problem)

Standard recovery operations like restarting HBase and restarting HDFS were tried multiple times, but none of them resolved the problem. After many rounds of debugging, we narrowed it down to the observation that all the other, smaller tables worked fine; only the largest table was affected. So we stopped writing to and reading from that table and re-created it as a new table instead. After that, the Phoenix cluster went back to normal for about 3 months, until the problem reoccurred.

This time, I decided to dig deeper and see what I could find. After a fair amount of log reading, code reading and, of course, guessing, the root cause of the problem was finally identified. Phoenix has something called the metadata cache. All Phoenix table information is stored in a system table called SYSTEM.CATALOG, which is served out of a single region, so that region becomes critical to the normal operation of the whole system (much like a master). Every query you execute first needs to go through SYSTEM.CATALOG to obtain the ‘necessary’ table information (I did not dig deep enough to fully understand the details).

Because of the critical role this region plays, Phoenix implements a cache for it in the co-processor (the client has one as well) to help performance. Here is where the problem comes in: when that cache runs out, performance degrades dramatically. One reason is that this region is not splittable, so all your queries end up congested on a single node. That is exactly the symptom we saw during the incident, where only one region server appeared to receive any requests (all requests were queued up behind the region that serves the Phoenix metadata).
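To make the role of SYSTEM.CATALOG a little more concrete, here is a minimal sketch of a Phoenix JDBC client that queries it directly; the ZooKeeper quorum host is a placeholder and the phoenix-client jar is assumed to be on the classpath. Every Phoenix connection resolves table metadata from this same table, going through the client and co-processor caches along the way.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CatalogPeek {
    public static void main(String[] args) throws Exception {
        // "zk-host:2181" is a placeholder for your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             // SYSTEM.CATALOG holds the schema of every Phoenix table; normal
             // queries resolve their metadata from it (through the caches).
             ResultSet rs = stmt.executeQuery(
                     "SELECT TABLE_SCHEM, TABLE_NAME, COLUMN_NAME "
                     + "FROM SYSTEM.CATALOG LIMIT 20")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "." + rs.getString(2)
                        + " -> " + rs.getString(3));
            }
        }
    }
}
```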

Long story short, below are the two critical configurations that need to be changed so that this cache has enough memory.

  1. phoenix.coprocessor.maxMetaDataCacheSize (needs to be changed in every HBase region server’s hbase-site.xml)
  2. phoenix.client.maxMetaDataCacheSize (this is the client-side cache)

In our case, both parameters were raised to 100MB (from the 20MB/10MB defaults).
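For the client side, the value (in bytes) can also be supplied programmatically when opening the connection, as in the sketch below. This is only an illustration, assuming your Phoenix version honors client-side properties passed this way (setting it in the client’s hbase-site.xml works as well), and the ZooKeeper host is again a placeholder. The server-side phoenix.coprocessor.maxMetaDataCacheSize has no client-side shortcut: it goes into each region server’s hbase-site.xml and takes effect after a restart.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class LargerClientMetaDataCache {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Client-side Phoenix metadata cache, value in bytes (~100MB here).
        props.setProperty("phoenix.client.maxMetaDataCacheSize", "104857600");

        // The server-side counterpart, phoenix.coprocessor.maxMetaDataCacheSize,
        // is set in each region server's hbase-site.xml, not from the client.

        // "zk-host:2181" is a placeholder for your ZooKeeper quorum.
        try (Connection conn =
                     DriverManager.getConnection("jdbc:phoenix:zk-host:2181", props)) {
            // ... run your upserts / queries as usual ...
        }
    }
}
```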

Haijie Wu
