Slowness in a Java Application Due to Increased Full GC Events – G1GC

In this blog, we will look at an issue (and its solution) that I found on one of our production servers, where our Java application was becoming slow due to frequent GC pauses.

I will explain one particular cause that can lead to more GC pauses.

To understand this article, you should have basic knowledge of Java's G1GC garbage collection algorithm.

Don’t worry if you are not familiar with G1GC; I will write articles on the basics of G1GC later, and you can come back to this one afterwards.

So, let’s start with the issue we were facing.

Issue: The application becomes unresponsive intermittently, and quite frequently.

Analysis:

  • After debugging JVM-level stats obtained by dumping JMX beans, it was clear that GC collection time was spiking heavily at times.
  • Heap usage was also increasing.

After that, we enabled GC logging by adding -Xlog:gc=debug:file=/tmp/gc.log to the JVM arguments when starting the application.
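
For reference, this is roughly what the start command looks like; a minimal sketch, assuming a JDK 9+ build (where the unified -Xlog option exists) and a hypothetical jar name, with the classic JDK 8 style flags shown for comparison:

java -Xlog:gc=debug:file=/tmp/gc.log -jar my-app.jar
java -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/gc.log -jar my-app.jar   # JDK 8 equivalent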

Analyzing gc.log, we found that Full GC was being triggered many times, and whenever a Full GC triggers it generally stops the application for some time; in Java terms this is called a STW (Stop the World) pause.

Generally, there are the following types of events in G1GC:

  • Minor: Eden + Survivor From -> Survivor To
  • Mixed: Minor + (# reclaimable Tenured regions / -XX:G1MixedGCCountTarget) regions of Tenured
  • Full GC: All regions evacuated
  • Minor/Mixed + To-space exhaustion: Minor/Mixed + rollback + Full GC
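
As a quick sanity check on your own log, assuming the unified (JDK 9+) log format produced by -Xlog, where young and full collections appear as "Pause Young" and "Pause Full", you can count how often each pause type occurs (the counts are approximate if a pause is logged across multiple lines):

grep -c "Pause Young" /tmp/gc.log
grep -c "Pause Full" /tmp/gc.log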

In a smoothly running application, batches of Minor events should alternate with batches of Mixed events, and that is all you should see. Full GC events and To-space exhaustion are things you absolutely don’t want to see when running G1GC; they need to be detected and eliminated, and if they do occur they should only be caused by some external trigger (for example a live heap dump or histogram taken with jmap, an explicit System.gc() call, etc.).

For in-depth details of these events, as already stated, I plan to write a blog series explaining G1GC concepts; for now you can search the web.

Now, coming back to our debugging.

We checked that no external command for taking a thread dump, heap dump, or histogram had been issued that could possibly have initiated a Full GC event.

So the question now was: why is this Full GC triggering?

On further research we found that Humongous objects can be one of the reasons for triggering Full GC events.

Now, what are Humongous objects?

A brief definition: any single data allocation ≥ G1HeapRegionSize/2 is considered a Humongous object. It is allocated out of contiguous regions of Free space, which are then added to Tenured. Because Humongous objects are allocated out of Free space, allocation failures there trigger GC events, and if an allocation failure from Free space triggers a GC, that GC event will be a Full GC, which is very undesirable in most circumstances. To avoid Full GC events in an application with lots of Humongous objects, one must ensure the Free space pool is large enough compared to Eden that Eden always fills up first.
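
To know what "humongous" means for your heap, check the region size G1 actually picks; anything at or above half of it is humongous. A hedged way to check is to ask a JVM started with the same heap settings as your application (the -Xmx4g here is a placeholder for your real heap size):

java -XX:+UseG1GC -Xmx4g -XX:+PrintFlagsFinal -version | grep G1HeapRegionSize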

So, we started checking whether our application was generating Humongous objects.

And from gc.log we found that lots of Humongous objects were being created, which was the reason for the frequent Full GC events.

I used the following commands (on Linux) to check for Humongous objects:

Step 1: Run the following commands on your gc.log.

Command 1:

grep "source: concurrent humongous allocation" /tmp/gc.log | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' > humongous_sizes.txt

Command 2:

awk '{sum+=$1} END{print sum}' humongous_sizes.txt

This gives you the total size, in bytes, of the humongous allocation requests made by the application.
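
Two optional follow-ups on the same file, if you also want to know how many humongous allocation requests were logged and how big the largest one was:

wc -l humongous_sizes.txt                  # number of humongous allocation requests
sort -n humongous_sizes.txt | tail -1      # largest single request, in bytes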

We were running a version older than Oracle JDK 8u45; for versions newer than that, the release notes state that unreachable Humongous objects can also be collected during Minor (young) events.

Search for “G1 now collects unreachable Humongous objects during young collections” in the JDK release notes.

So we upgraded our JDK, and the frequency of the issue dropped dramatically, as these objects no longer trigger a major event like a Full GC.
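
If you try the same fix, it is worth confirming which JDK version the production application is actually running on before and after the upgrade; for example (the <pid> is a placeholder for your Java process id):

java -version
jcmd <pid> VM.version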

But one should still be careful about generating these large objects in the first place.

So we also analyzed one of the heap dumps and corrected the code so that it does not generate these big objects when they are not needed.
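
A related tuning option, separate from the code fix we made: since the humongous threshold is half the region size, explicitly raising -XX:G1HeapRegionSize (a power of two between 1 MB and 32 MB) can stop borderline allocations from being classified as humongous. Treat this as a sketch to experiment with, not a guaranteed fix; the jar name is hypothetical:

java -XX:+UseG1GC -XX:G1HeapRegionSize=16m -Xlog:gc=debug:file=/tmp/gc.log -jar my-app.jar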

I hope this blog will be helpful. Please comment, share and subscribe.

Debug Java Application Issues – Part 2 (Issue Debugging – Application Unresponsive)

In the last blog we learned about the different thread states in Java; in this blog we will see how to use that knowledge to debug application issues using the fastthreadanalyzer tool.

Let's take one type of issue we see with Java applications:

Application Unresponsive Issue

When we say an application is unresponsive (it can mean different things to different people), here we mean that the application is not responding to external API calls.

Let's take the example of a Spring Boot application not responding to HTTP API calls. There can be several reasons for it:

  • All HTTP threads in Tomcat (or whatever container Spring is using) are consumed
    • Causes:
      • It could be because some high-CPU work is being done in those threads and all of them are stuck doing that work <to connect this with thread states: threads consuming CPU are in RUNNABLE state, so we should be looking for lots of RUNNABLE threads in the jstack>
      • It could be because those threads are waiting on some external IO <to connect this with thread states: threads logically stuck waiting for IO to complete are in WAITING/BLOCKED state, so we should be looking for threads in those states>
    • How to debug via jstack:
      • Take multiple jstacks, a few seconds apart (see the sketch just after this list)
      • Now, to understand what the different threads are doing in each jstack and in which states they are stuck, we will use the tool https://fastthreadanalyzer.com/. Just upload your jstack to this tool.
      • It will show you a table like this <thread group (HTTP threads are generally part of a group named like http-bio-443-exec) vs count (total number of threads in this group) vs thread states (count of each thread state for these threads)>:
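
A minimal sketch of taking those dumps, assuming the JDK tools are on the PATH and using a hypothetical way of finding the process id:

PID=$(pgrep -f my-spring-boot-app.jar)
for i in 1 2 3; do jstack -l "$PID" > /tmp/jstack_"$i".txt; sleep 10; done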

Now, to make sense of the information above, we will first check whether our HTTP threads are available or not:

  • If the thread count shown in the tool for the HTTP group is equal to the max thread count configured in tomcat.conf (or whatever container configuration you use), that means all HTTP threads are busy and new requests cannot be processed (you can also cross-check these counts straight from the dump, as shown after this list)
    • Yes
      • Now we look at what those threads are doing by checking their thread states; if most of them are RUNNABLE
        • it means something in your application is taking a long time to complete, or the system is being bombarded with many HTTP calls
      • If you see these threads stuck in WAITING/TIMED_WAITING/BLOCKED, that most probably means they are doing some IO and waiting on it
    • No
      • The issue is likely something else, possibly related to the JVM itself
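
If you want to cross-check the tool's numbers directly from a dump on the command line, here is a rough sketch (the "http-" thread-name prefix is Tomcat's usual convention, so adjust it to whatever your container uses; the dump file name matches the hypothetical one used earlier):

grep -c '"http-' /tmp/jstack_1.txt
grep -o 'java.lang.Thread.State: [A-Z_]*' /tmp/jstack_1.txt | sort | uniq -c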

Now, to dig further into exactly where the threads are waiting or stuck, you can click on the corresponding thread group and the tool will show what those threads are doing, grouped by similar stack and thread state, e.g.:

Now you can see the threads grouped by state and stack; using this information you can figure out which service in the application is actually consuming the threads.
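
A rough command-line approximation of that grouping, relying on the fact that jstack separates thread entries with blank lines (same hypothetical dump file name as above):

awk '/"http-/,/^$/' /tmp/jstack_1.txt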

There could be many other reasons, like the JVM being stuck or the machine being unresponsive; we are not going into detail on those here.

With the fastthread tool you can debug many such issues; we will cover more types of issues in future posts.