Slowness in Java Application due to increased FullGC events – G1GC

In this blog, we will look at an issue, and its solution, that I found when our Java application on one of our production servers became slow due to frequent GC pauses.

I will explain one particular cause that can trigger more GC pauses.

To follow this article, you need basic knowledge of Java’s G1GC garbage collection algorithm.

Don’t worry if you don’t know G1GC yet; I will write articles on the basics of G1GC later, and you can then read this article again.

So, let’s start with the issue we were facing.

Issue: the application became unresponsive very frequently.

Analysis:

  • After examining JVM-level stats dumped from the JMX beans, it was clear that GC collection time increased sharply during those periods.
  • Heap usage was also increasing.

After that we enabled GC logging by adding -Xlog:gc=debug:file=/tmp/gc.log to the JVM arguments when starting the application.

Analyzing gc.log, we found that Full GC was being triggered many times. Whenever a Full GC triggers, it generally stops the application for some time; in Java terms we call this STW (Stop the World).

Generally, there are the following types of events in G1GC:

  • Minor: Eden + Survivor From -> Survivor To
  • Mixed: Minor + (# reclaimable Tenured regions / -XX:G1MixedGCCountTarget) regions of Tenured
  • Full GC: All regions evacuated
  • Minor/Mixed + To-space exhaustion: Minor/Mixed + rollback + Full GC

In a smoothly running application, only batches of Minor events alternating with batches of Mixed events should be present. Full GC events and To-space exhaustion are things you absolutely don’t want to see when running G1GC; they need to be detected and eliminated, and if they do occur they should only be triggered by an external request (such as a jmap heap dump or histogram).

For in-depth details of these events, as already stated, I will write a blog series explaining G1GC concepts; for now you can search on the net.

Now, coming back to our debugging.

We checked that no external command (thread dump, heap dump or histogram) had been run that could possibly have initiated a Full GC event.

So the question now was why Full GC was being triggered.

On further research, we found that Humongous objects can be one of the reasons for triggering Full GC events.

Now, what are Humongous objects?

A brief definition: any single data allocation ≥ G1HeapRegionSize/2 is considered a Humongous object. It is allocated out of contiguous regions of Free space, which are then added to Tenured. Allocation failures trigger GC events, and because Humongous objects are allocated out of Free space, an allocation failure there triggers a Full GC, which is very undesirable in most circumstances. To avoid Full GC events in an application with lots of Humongous objects, one must ensure the Free space pool is large enough, compared to Eden, that Eden will always fill up first.
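To make the threshold concrete, here is a minimal sketch of the rule described above. The class and method names are made up for illustration, and the 1 MB region size is only an assumed example; the actual region size depends on the heap size or on the -XX:G1HeapRegionSize setting.

```java
// Illustrative only: mirrors the "allocation >= region size / 2" rule described above.
class HumongousRule {

    // True when a single allocation of allocationBytes counts as Humongous
    // for the given G1 region size.
    static boolean isHumongous(long allocationBytes, long regionSizeBytes) {
        return allocationBytes >= regionSizeBytes / 2;
    }

    // Number of contiguous regions needed to hold an object of this size.
    static long regionsNeeded(long allocationBytes, long regionSizeBytes) {
        return (allocationBytes + regionSizeBytes - 1) / regionSizeBytes;
    }

    public static void main(String[] args) {
        long regionSize = 1024 * 1024; // assumed 1 MB region size
        System.out.println(isHumongous(400 * 1024, regionSize));  // false
        System.out.println(isHumongous(512 * 1024, regionSize));  // true
        System.out.println(regionsNeeded(2_500_000, regionSize)); // 3
    }
}
```

With a 1 MB region, a 512 KB byte array is already Humongous, which is why large buffers and big collections are the usual suspects.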

So we started checking whether our application was generating Humongous objects.

And from gc.log we found that lots of Humongous objects were being created, which were the reason for the Full GC events.

I used the following commands to check for Humongous objects, specifically on Linux:

Step 1: Run the following commands on your gc.log.

Command 1 (extract the size of each humongous allocation request):

grep "source: concurrent humongous allocation" /tmp/gc.log | sed 's/.*allocation request: \([0-9]*\) bytes.*/\1/' > humongous_sizes.txt

Command 2 (sum the extracted sizes):

awk -F',' '{sum+=$1} END{print sum;}' humongous_sizes.txt

This prints the total size, in bytes, of the humongous objects generated by the application.
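If shell tools are not available, the same sum can be computed in Java. This is only a sketch: it assumes the log lines contain the same "source: concurrent humongous allocation" and "allocation request: N bytes" fragments that the grep and sed commands rely on, and the class name is made up.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HumongousLogSum {

    // Same fragment the sed expression captures.
    private static final Pattern REQUEST =
            Pattern.compile("allocation request: (\\d+) bytes");

    // Sums the requested sizes of all humongous allocation lines.
    static long totalHumongousBytes(List<String> logLines) {
        long sum = 0;
        for (String line : logLines) {
            if (!line.contains("source: concurrent humongous allocation")) {
                continue; // same filter as the grep command
            }
            Matcher m = REQUEST.matcher(line);
            if (m.find()) {
                sum += Long.parseLong(m.group(1));
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HumongousLogSum.java /tmp/gc.log
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(totalHumongousBytes(lines));
    }
}
```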

We were running a Java version older than Oracle JDK 8u45. For Java versions newer than this, the release notes state that these Humongous objects are also collected during Minor events.

Search for “G1 now collects unreachable Humongous objects during young collections” in the JDK release notes.

So we upgraded our JDK, and the frequency of the issue dropped significantly, as these objects no longer trigger a major event like Full GC.

But one should also take care not to generate such large objects.

So we also analyzed one of the heap dumps and corrected the code so that it does not generate these big objects when they are not needed.

I hope this blog will be helpful. Please comment, share and subscribe.

Debug Java Application Issues – Part -2 (Issue Debugging – Application Unresponsive)

In the last blog we learned about the different thread states in Java. In this blog we will see how to use that knowledge to debug application issues using the fasthreadanalyzer tool.

Let’s take one type of issue we see with Java applications:

Application Unresponsive Issue

When we say an application is unresponsive (it can mean different things to different people), here we mean that the application is not responding to external API calls.

Let’s take the example of a Spring Boot application not responding to HTTP API calls. There can be several reasons for this:

  • All HTTP threads in Tomcat (or whatever container Spring is using) are consumed
    • Causes:
      • Some high-CPU work is being done in those threads, and all of them are stuck doing that work (to connect with thread states: threads consuming CPU would be in the RUNNABLE state, so we should look for lots of RUNNABLE threads in the jstack)
      • Those threads are waiting on some external IO (to connect with thread states: those threads are logically stuck waiting for some IO to complete, which typically means they would be in the WAITING/BLOCKED state, so we should look for threads in such states)
    • How to debug via jstack:
      • Take multiple jstacks
      • To understand what the different threads are doing in each jstack, and in which states they are stuck, we will use the fasthreadanalyzer tool. Just upload your jstack to the tool.
      • The tool shows a table of thread group (HTTP threads are generally part of a group named like http-bio-443-exec) vs count (total number of threads in the group) vs thread states (count of each thread state for these threads).

To make sense of this information, we first check whether our HTTP threads are available or not:

  • Is the thread count shown in the tool equal to the max thread count configured in tomcat.conf (or any other container configuration)? If it is, all HTTP threads are busy and new requests cannot be processed.
    • Yes
      • Check what the threads are doing from their thread states. If most of them are RUNNABLE
        • it means something in your application is taking a long time to complete, or the system is being bombarded with many HTTP calls
      • If you see these threads stuck in WAITING/TIMED_WAITING/BLOCKED, most probably they are doing some IO and waiting on it
    • No
      • It is some other issue, possibly related to the JVM
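The check above can also be done by hand on a raw jstack file. The following is a rough sketch, not the tool’s implementation: it counts thread states for threads whose name starts with a given prefix. The class name and the shortened sample dump lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ThreadStateCounter {

    private static final String STATE_PREFIX = "java.lang.Thread.State:";

    // Counts threads per state among threads whose name starts with namePrefix.
    static Map<String, Integer> countStates(List<String> dumpLines, String namePrefix) {
        Map<String, Integer> counts = new TreeMap<>();
        boolean inGroup = false;
        for (String line : dumpLines) {
            if (line.startsWith("\"")) {
                // Thread header line, e.g. "http-bio-443-exec-1" #20 ...
                inGroup = line.startsWith("\"" + namePrefix);
            } else if (inGroup && line.trim().startsWith(STATE_PREFIX)) {
                // Keep only the state name, dropping details like "(on object monitor)".
                String state = line.trim().substring(STATE_PREFIX.length()).trim().split(" ")[0];
                counts.merge(state, 1, Integer::sum);
                inGroup = false;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up, shortened dump lines for illustration.
        List<String> dump = Arrays.asList(
                "\"http-bio-443-exec-1\" #20 prio=5 runnable",
                "   java.lang.Thread.State: RUNNABLE",
                "\"http-bio-443-exec-2\" #21 prio=5 in Object.wait()",
                "   java.lang.Thread.State: WAITING (on object monitor)",
                "\"http-bio-443-exec-3\" #22 prio=5 in Object.wait()",
                "   java.lang.Thread.State: WAITING (on object monitor)",
                "\"main\" #1 prio=5 runnable",
                "   java.lang.Thread.State: RUNNABLE");
        System.out.println(countStates(dump, "http-bio-443-exec")); // {RUNNABLE=1, WAITING=2}
    }
}
```

Comparing the sum of these counts with the configured max thread count answers the Yes/No question above.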

To dig further into exactly where the threads are waiting or stuck, you can click on the corresponding thread group, and the tool will show what those threads are doing, grouped by similar stack and thread state.

With the threads grouped by state and stack, you can figure out which service in the application is actually consuming the threads.

There could be many other reasons, like a stuck JVM or an unresponsive machine; we are not going into detail on them here.

With the fast thread tool you can debug many such issues; we will cover more types of issues in future posts.

Debug Java Application Issues – Part -1 (Understand Java Thread States)

The purpose of this blog series is to learn how to debug your Java application issues. For this, we will first understand what the different thread states are.

Let’s look at the different states of a Java thread.

A Java thread can be in one of 6 states:

  • New: When a new thread is created, it is in the new state. The thread has not yet started to run; its code is yet to be executed.
  • Runnable/Running: A thread that is ready to run is moved to the runnable state. In this state, a thread might actually be running, or it might be ready to run at any instant. It is the responsibility of the thread scheduler to give the thread time to run. The scheduler gives each thread a slice of CPU time: a thread runs for a short while, then pauses and relinquishes the CPU so that other threads get a chance to run. Both the threads that are ready and waiting for the CPU and the currently running thread are in the runnable state.
  • Timed Waiting: A thread is in the timed waiting state when it calls a method with a timeout parameter. It stays in this state until the timeout elapses or a notification is received. For example, when a thread calls sleep or a wait with a timeout, it moves to the timed waiting state.
  • Waiting: A thread is in the waiting state when it waits for another thread on a condition. When this condition is fulfilled, the scheduler is notified and the waiting thread is moved to the runnable state.
  • Blocked: A thread is in the blocked state when it tries to access a protected section of code that is currently locked by some other thread. When the protected section is unlocked, the scheduler picks one of the threads blocked on that section and moves it to the runnable state. A thread in this state cannot continue its execution until it is moved to the runnable state. A thread in the waiting, timed waiting or blocked state does not consume any CPU cycles.
  • Terminated: A thread terminates for either of the following reasons:
    • Because it exits normally. This happens when the thread’s code has been executed entirely.
    • Because some unusual erroneous event occurred, like a segmentation fault or an unhandled exception.
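The first and last states in this list are easy to observe from code with Thread.getState(). A minimal sketch (the class name is made up for illustration):

```java
class ThreadLifecycleDemo {

    // Returns the state seen before start() and the state seen after join().
    static Thread.State[] observe() {
        Thread t = new Thread(() -> { /* no work to do */ });
        Thread.State before = t.getState(); // NEW: created but not started
        t.start();
        try {
            t.join(); // wait until the thread has finished
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        Thread.State after = t.getState();  // TERMINATED: run() has completed
        return new Thread.State[] { before, after };
    }

    public static void main(String[] args) {
        Thread.State[] states = observe();
        System.out.println(states[0] + " -> " + states[1]); // NEW -> TERMINATED
    }
}
```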

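Sample code for creating threads in the different thread states:

public class ThreadStatesDemo {

    // Ends up in WAITING: calls wait() without a timeout and is never notified
    public static class WaitingThread extends Thread {
        public void run() {
            Object o = new Object();
            try {
                synchronized (o) {
                    o.wait();
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    // Ends up in TIMED_WAITING (sleeping); the sleep duration is an assumed value
    public static class SleepingThread extends Thread {
        public void run() {
            try {
                Thread.sleep(100000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    // Stays RUNNABLE: spins in a loop forever
    public static class RunningThread extends Thread {
        public void run() {
            for (int i = 1; i > 0;) {
            }
        }
    }

    // Ends up in TIMED_WAITING (on object monitor): wait() with a timeout
    public static class TimedWaitingThread extends Thread {
        public void run() {
            Object o = new Object();
            try {
                synchronized (o) {
                    o.wait(100000);
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public static Integer mutex = 0;

    // Ends up BLOCKED: sleeps first (assumed 1s) so the blocking thread takes the lock
    public static class BlockedThread extends Thread {
        public void run() {
            try {
                Thread.sleep(1000);
                synchronized (mutex) {
                }
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    // Holds the mutex forever while spinning, so it stays RUNNABLE
    public static class BlockingThread extends Thread {
        public void run() {
            synchronized (mutex) {
                for (int i = 1; i > 0;) {
                }
            }
        }
    }

    public static void main(String[] args) {
        Thread wTh = new WaitingThread();
        wTh.setName("waiting");
        Thread sTh = new SleepingThread();
        sTh.setName("sleeping");
        Thread rTh = new RunningThread();
        rTh.setName("running");
        Thread twTh = new TimedWaitingThread();
        twTh.setName("timed waiting");
        Thread bldTh = new BlockedThread();
        bldTh.setName("blocked");
        Thread blcTh = new BlockingThread();
        blcTh.setName("blocking");

        for (Thread t : new Thread[] { wTh, sTh, rTh, twTh, bldTh, blcTh }) {
            t.start();
        }

        try {
            // Keep main alive while you take the jstack (assumed duration)
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}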

When you run the application and take a thread dump with the jstack command, you will get output like this:

#command to take jstack 

jstack -l <pid>

2021-10-18 17:20:34
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.221-b11 mixed mode):

"blocking" #15 prio=5 os_prio=0 tid=0x00007f1ee411e800 nid=0xc99 runnable [0x00007f1eae09d000]
   java.lang.Thread.State: RUNNABLE
        at ThreadStatesDemo$
        - locked <0x000000076e5c0bb0> (a java.lang.Integer)

   Locked ownable synchronizers:
        - None

"blocked" #14 prio=5 os_prio=0 tid=0x00007f1ee411c800 nid=0xc98 waiting for monitor entry [0x00007f1eae19e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at ThreadStatesDemo$
        - waiting to lock <0x000000076e5c0bb0> (a java.lang.Integer)

   Locked ownable synchronizers:
        - None

"timed waiting" #13 prio=5 os_prio=0 tid=0x00007f1ee411b000 nid=0xc97 in Object.wait() [0x00007f1eae29f000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000076e938550> (a java.lang.Object)
        at ThreadStatesDemo$
        - locked <0x000000076e938550> (a java.lang.Object)

   Locked ownable synchronizers:
        - None

"running" #12 prio=5 os_prio=0 tid=0x00007f1ee4119000 nid=0xc96 runnable [0x00007f1eae3a0000]
   java.lang.Thread.State: RUNNABLE
        at ThreadStatesDemo$

   Locked ownable synchronizers:
        - None

"sleeping" #11 prio=5 os_prio=0 tid=0x00007f1ee4117000 nid=0xc95 waiting on condition [0x00007f1eae4a1000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at ThreadStatesDemo$

   Locked ownable synchronizers:
        - None

"waiting" #10 prio=5 os_prio=0 tid=0x00007f1ee4115800 nid=0xc94 in Object.wait() [0x00007f1eae5a2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000076e7fae38> (a java.lang.Object)
        at java.lang.Object.wait(
        at ThreadStatesDemo$
        - locked <0x000000076e7fae38> (a java.lang.Object)

   Locked ownable synchronizers:
        - None

Now in this stack trace you can see the Thread State via the line:

java.lang.Thread.State: <THREAD STATE>


java.lang.Thread.State: WAITING (on object monitor)

In the code above, a thread is created for each thread state, and in some cases multiple threads for the same state (to show different ways a particular thread state is reached).

Let’s see:

  • Thread State RUNNABLE: a thread named running, whose code just spins in a single loop.
  • Thread State WAITING: a thread named waiting, whose code calls wait() on an Object.
  • Thread State BLOCKED: a thread named blocked, whose code tries to enter a synchronized block on an Object whose lock is already held by the thread named blocking (the blocking thread almost always takes the lock first because of the sleep in BlockedThread).
  • Thread State TIMED_WAITING: a thread named timed waiting, whose code calls wait() with a timeout (100000) on an Object, and a thread named sleeping, whose code calls Thread.sleep().
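The same states can be read from inside the JVM with Thread.getState(), without taking a jstack. Below is a small sketch for the WAITING case only; the class name is made up, and the 5 second polling limit is an arbitrary safety bound.

```java
class WaitingStateProbe {

    // Starts a thread that waits on a private lock, polls until it reports WAITING,
    // then interrupts it so the JVM can exit.
    static Thread.State probe() {
        final Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait(); // no timeout, nobody notifies: WAITING
                    }
                } catch (InterruptedException e) {
                    // expected: interrupted by probe() once the state was read
                }
            }
        }, "waiting");
        t.start();

        Thread.State seen = t.getState();
        long deadline = System.currentTimeMillis() + 5000;
        while (seen != Thread.State.WAITING && System.currentTimeMillis() < deadline) {
            Thread.yield();
            seen = t.getState();
        }
        t.interrupt();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(probe()); // normally prints WAITING
    }
}
```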

By now you should have an understanding of which Java thread states exist and how a thread can end up in each of them. There are more ways to reach these states than the ones shown here.

How to start your journey into Microservices Part -2 (DDD)

In the last blog we learned that there are various areas of designing microservices.

In this blog we will work on decomposing a larger problem into smaller microservices, and look at the subtle but very important aspects of why to break it up and how simple it can be.

For this we will be using DDD. We will not go into detail on what exactly DDD is; we will work on the strategic design part of DDD. But let’s still go over some of its principles.

One principle behind DDD is to bridge the gap between domain experts and developers by using the same language to create the same understanding. You must have seen cases where it becomes difficult for product managers or domain experts to iteratively add features because the PM and the developers speak different languages.

Another principle is to reduce complexity by applying object-oriented design and design patterns, to avoid reinventing the wheel.

But what is a Domain? A Domain is a “sphere of knowledge”, for instance the business the company runs. A Domain is also called a “problem space”, so the problem for which we have to design a solution.

Let’s choose a business problem, like building an E-Commerce site (such as Amazon), to which we will try to apply DDD. We will not be able to go into the complete depth of the business problem. Here are the very basic features required for it to work:

  1. User searches for a product
  2. User places an order for that product
  3. User pays online
  4. Delivery Management processes that order and starts the delivery
  5. Delivery updates the Order Status

Now let’s try to design this in the traditional way and create the schema:

  • Tables
    • Product
      • id
      • name
      • image url
      • price
      • count available
      • ….
    • Product Packaging Info
      • id
      • product id
      • size
      • isFragile
      • …..
    • Order
      • product id
      • user id
      • delivery address details
      • paid
    • Order Delivery Status
      • id
      • order id
      • delivery company name
      • delivery company id
      • delivery company code
    • Delivery Company
      • id
      • name
      • ….
    • User
      • id
      • name
      • address details
    • User Preferences
      • id
      • name
      • preferences

We can create a table structure like this in our system (there will be a lot more tables). Looking at those tables, you can see that not every table needs to be understood by every developer. For example, someone working on delivery management might not be interested in UserPreferences (used for searching), and someone working on search might not be interested in OrderDeliveryStatus.

From this you can see that we need to break the structure into smaller areas.

To design this in a way that adds more business context and gives us smaller structures to manage, let’s apply DDD.

A Domain in DDD refers to a specific area of knowledge. In a larger sense, the E-commerce domain can be divided internally into various subdomains, like:

  1. Identity Management
    1. User
  2. Inventory Management
    1. Product
  3. Product Search
    1. Product
    2. UserPreferences
  4. Order
    1. Order
    2. Order Delivery Status
  5. Delivery Management
    1. Order
    2. Product Packaging Info
    3. Delivery Company
    4. Order Delivery Status
  6. Billing

The separated Domain can easily be visualized. In DDD terms this is called a Context Map, and it is the starting point for any further modeling. Essentially, what we have done is break up the larger problem into smaller interconnected ones.

Now we need to align each Subdomain, aka problem space, with our solution design; we need to form a solution space. A solution space in DDD lingo is also called a Bounded Context, and it is best to align one problem space/Subdomain with one solution space/Bounded Context.

Here we can think of each subdomain as a different microservice. Microservices are not complete without their dependencies. Let’s see them in terms of bounded contexts:

  • Product Search – dependent on – Inventory Management
  • Delivery Management – dependent on – Order
  • Product Search – dependent on – User
  • Billing – dependent on – Order
  • … and so on

You can see that there is a dependency between the Order and Billing services. They talk via a common shared object model, which each of them can represent separately rather than using a single complete model, which is cumbersome. For example, the order object in the Order service (which cares about status) is different from the order in the Billing service (which cares about amount). This is the benefit of breaking them into smaller, separated domains.
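As a rough illustration of the Order vs Billing example, each bounded context can keep its own small order model, and the two can exchange a shared contract object. All class and field names below are hypothetical; they only sketch the idea and are not taken from any framework.

```java
import java.math.BigDecimal;

class BoundedContextOrders {

    // Order as the Order service sees it: it cares about status.
    static final class OrderServiceOrder {
        final long orderId;
        final String status;

        OrderServiceOrder(long orderId, String status) {
            this.orderId = orderId;
            this.status = status;
        }
    }

    // Order as the Billing service sees it: it cares about amount.
    static final class BillingOrder {
        final long orderId;
        final BigDecimal amount;

        BillingOrder(long orderId, BigDecimal amount) {
            this.orderId = orderId;
            this.amount = amount;
        }
    }

    // Shared contract passed from the Order service to the Billing service.
    static final class OrderPlaced {
        final long orderId;
        final BigDecimal amount;

        OrderPlaced(long orderId, BigDecimal amount) {
            this.orderId = orderId;
            this.amount = amount;
        }
    }

    // Billing translates the shared contract into its own model.
    static BillingOrder toBillingOrder(OrderPlaced event) {
        return new BillingOrder(event.orderId, event.amount);
    }

    public static void main(String[] args) {
        OrderServiceOrder order = new OrderServiceOrder(42L, "PLACED");
        BillingOrder bill = toBillingOrder(new OrderPlaced(order.orderId, new BigDecimal("19.99")));
        System.out.println(bill.orderId + " " + bill.amount); // 42 19.99
    }
}
```

Neither service needs the other’s fields: Billing never sees the status, and the Order service does not have to carry billing details.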

To define such contracts one can also use ContextMapper.

There are a few takeaways from this:

  • Breaking the system into smaller pieces is very important, so that it becomes easy for people to work on different parts of the business.
  • It is very simple once we are clear about the business subdomains.

After this, I recommend going deeper into DDD and looking at one more example.

In the next blog we will look at authentication mechanisms.

Tool for debugging Java Stack Traces – Untangle Threads

FastThreadAnalyzer is a tool that helps you debug your Java stack traces.

It provides various insights about the stacks, clubs similar stacks together, and gives you a direction or, in some cases, the exact issue.

Some of the screenshots of the tools:

Threads of same thread pool grouped via thread state:

This provides a clear view, and you can very easily deduce whether all threads of a particular pool should be busy or not.

Threads grouped by similar stacks, which gives great insight into which types of flow are stuck:

Overall Thread State Wise Grouping:

Thread Dependencies:

You can very easily see which thread IDs these threads are blocked by.

There are various other types of stack grouping as well, for example based on stack length, deadlocks and so on.

If your org works on Java, then this should be the default tool to debug application issues.

Bulk Updation/Insertion of Database Tables in Java using Hibernate – Optimized Way

Hibernate is the most popular ORM framework used to interact with databases in Java. In this article we will look at the various ways in which bulk selection and updating of a table can be done, and which is the most effective when using the Hibernate framework in Java.

I experimented with three approaches, which are as follows:

  • Using Hibernate’s Query.list() method.
  • Using ScrollableResults with FORWARD_ONLY scroll mode.
  • Using ScrollableResults with FORWARD_ONLY scroll mode in a StatelessSession.

To decide which one gives the best performance for our use case, I performed the following test with each of the three approaches listed above.

  • Select and update 1000 rows.

Let’s look at the code and results for each of the three approaches, one by one.

Using Hibernate’s Query.list() method.

Code Executed : 

        List rows;
        Session session = getSession();
        Transaction transaction = session.beginTransaction();
        try {
            Query query = session.createQuery("FROM PersonEntity WHERE id > :maxId ORDER BY id")
                    .setParameter("maxId", MAX_ID_VALUE);
            rows = query.list();
            int count = 0;
            for (Object row : rows) {
                PersonEntity personEntity = (PersonEntity) row;
                // ... modify personEntity here (the update logic is not shown in the original) ...
                session.update(personEntity);
                // Always flush and clear the session after every 50 rows (the configured JDBC batch size)
                if (++count % 50 == 0) {
                    session.flush();
                    session.clear();
                }
            }
            transaction.commit();
        } finally {
            if (session != null && session.isOpen()) {
                session.close();
            }
        }

Tests Results : 

  • Time taken:- 360s to 400s
  • Heap Pattern:- gradually increased from 13mb to 51mb (measured using jconsole).

Using ScrollableResults with FORWARD_ONLY scroll mode.

With this approach we expect less memory consumption than with the first approach. Let’s see the results.

Code Executed : 

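        Session session = getSession();
        Transaction transaction = session.beginTransaction();
        ScrollableResults scrollableResults = session
                .createQuery("FROM PersonEntity WHERE id > " + MAX_ID_VALUE + " ORDER BY id")
                .scroll(ScrollMode.FORWARD_ONLY);
        int count = 0;
        try {
            while (scrollableResults.next()) {
                PersonEntity personEntity = (PersonEntity) scrollableResults.get(0);
                // ... modify personEntity here (the update logic is not shown in the original) ...
                session.update(personEntity);
                if (++count % 50 == 0) {
                    session.flush();
                    session.clear();
                }
            }
            transaction.commit();
        } finally {
            if (session != null && session.isOpen()) {
                session.close();
            }
        }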

Tests Results : 

  • Time taken:- 185s to 200s
  • Heap Pattern:- gradually increased from 13mb to 41mb (measured same using jconsole)

Using ScrollableResults with FORWARD_ONLY scroll mode in a StatelessSession.

A stateless session does not implement a first-level cache nor interact with any second-level cache, nor does it implement transactional write-behind or automatic dirty checking, nor do operations cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate’s event model and interceptors.   

This type of session is always recommended for bulk updates, as we really do not need the overhead of these Hibernate features in such use cases.

Code Executed : 

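        StatelessSession session = getStatelessSession();
        Transaction transaction = session.beginTransaction();
        ScrollableResults scrollableResults = session
                .createQuery("FROM PersonEntity WHERE id > " + MAX_ID_VALUE + " ORDER BY id")
                .scroll(ScrollMode.FORWARD_ONLY);
        try {
            while (scrollableResults.next()) {
                PersonEntity personEntity = (PersonEntity) scrollableResults.get(0);
                // ... modify personEntity here (the update logic is not shown in the original) ...
                session.update(personEntity);
            }
            transaction.commit();
        } finally {
            if (session != null && session.isOpen()) {
                session.close();
            }
        }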

Tests Results : 

  • Time taken:- 185s to 200s
  • Heap Pattern:- gradually increased from 13mb to 39mb

I also performed the same tests with 2000 rows and the results obtained were as follows:-


  • Using list():- time taken:- approx 750s, heap pattern:- gradually increased from 13mb to 74 mb
  • Using ScrollableResultSet:- time taken:- approx 380s, heap pattern:- gradually increased from 13mb to 46mb
  • Using Stateless:- time taken:- approx 380s, heap pattern:- gradually increased from 13mb to 43mb

Blocker Problem with All of the Above Approaches

ScrollableResults and stateless ScrollableResults give almost the same performance, which is much better than Query.list(). But there is still one problem with all of the above approaches: locking. All of them select and update the data in the same transaction. This means that for as long as the transaction is running, the rows on which updates have been performed will be locked, and any other operations on them will have to wait for the transaction to finish.

Solution:

There are two things we should do to solve this problem:

  • We need to select and update data in different transactions.
  • Updates of this type should be done in batches.

So I performed the same tests again, but this time the update was performed in a different transaction, which was committed in batches of 50.

Note:- In the case of Scrollable and Stateless we also need a different session, as we need the original session and transaction to scroll through the results.
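The batching logic itself is independent of Hibernate. The sketch below only illustrates the "commit every N rows, then commit whatever is left" pattern; it does no database work, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BatchCommitSketch {

    // Returns the sizes of the batches that would be committed for totalRows updates.
    static List<Integer> batchSizes(int totalRows, int batchSize) {
        List<Integer> committed = new ArrayList<>();
        int inCurrentBatch = 0;
        for (int row = 0; row < totalRows; row++) {
            inCurrentBatch++;                  // stands in for "update one row"
            if (inCurrentBatch == batchSize) {
                committed.add(inCurrentBatch); // stands in for "commit, then begin a new transaction"
                inCurrentBatch = 0;
            }
        }
        if (inCurrentBatch > 0) {
            committed.add(inCurrentBatch);     // the final, partial batch must be committed too
        }
        return committed;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(120, 50)); // [50, 50, 20]
    }
}
```

The last step is easy to forget: without the final commit, the rows after the last full batch would never be written.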

Results using Batch Processing

  • Using list():- time taken:- approx 400s, heap pattern:- gradually increased from 13mb to 61 mb
  • Using ScrollableResultSet:- time taken:- approx 380s, heap pattern:- gradually increased from 13mb to 51mb
  • Using Stateless:- time taken:- approx 190s, heap pattern:- gradually increased from 13mb to 44mb

Observation:- The time performance of ScrollableResults dropped to become almost equal to that of Query.list(), but the performance of Stateless remained almost the same.

Summary and Conclusion

From all the above experiments, in cases where we need to do bulk selection and updating, the best approach in terms of memory consumption and time is as follows:

  • Use ScrollableResults in a StatelessSession.
  • Perform selection and updating in different transactions, in batches of 20 to 50 (batch processing). Note: the right batch size can vary from case to case.

  Sample Code with the best approach

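        StatelessSession session = getStatelessSession();
        Transaction transaction = session.beginTransaction();
        ScrollableResults scrollableResults = session
                .createQuery("FROM PersonEntity WHERE id > " + MAX_ID_VALUE + " ORDER BY id")
                .scroll(ScrollMode.FORWARD_ONLY);
        int count = 0;
        try {
            // Separate session and transaction for the updates
            StatelessSession updateSession = getStatelessSession();
            Transaction updateTransaction = updateSession.beginTransaction();
            while (scrollableResults.next()) {
                PersonEntity personEntity = (PersonEntity) scrollableResults.get(0);
                // ... modify personEntity here (the update logic is not shown in the original) ...
                updateSession.update(personEntity);
                // Commit every 50 rows and start a new update transaction
                if (++count % 50 == 0) {
                    updateTransaction.commit();
                    updateTransaction = updateSession.beginTransaction();
                }
            }
            updateTransaction.commit(); // commit the last, possibly partial, batch
            updateSession.close();
            transaction.commit();
        } finally {
            if (session != null && session.isOpen()) {
                session.close();
            }
        }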

With Java frameworks like Spring, this code may be even smaller, as one does not need to take care of things like closing the session. The code above is written in plain Java using Hibernate.

Please try it with large data sets and share your results in the comments. Also, if you have a better approach, please comment.

Thank you for reading the article.

How to start your journey into Microservices Part -1

Architecting an application using microservices can be very confusing for first-timers. This article is very relevant if

  • You are beginning to develop an application that can scale like Amazon, Facebook, Netflix, and Google.
  • You are doing this for the first time.
  • You have already done research and decided that microservices architecture is going to be your secret sauce.

Microservices architecture is believed to be the simplest way of scaling without limits. However, when you get started, a lot of considerations are going to confuse you. The following questions arose as I spent time learning about it online or discussing it with a team:

  1. What exactly is a microservice?
    1. Some said it should not exceed 1,000 lines of code.
    2. Some say it should fit one bounded context (if you don’t know what a bounded context is, don’t bother with it right now; keep reading).
  2. Even before deciding on what the “micro”service will be, what exactly is a service?
  3. Microservices do not allow updating multiple entities at once; how will I maintain consistency between entities?
  4. Should I have a single database cluster for all my microservices?
  5. What is this eventual consistency thing everyone is talking about?
  6. How will I collate data which is composed of multiple entities residing in different services?
  7. What would happen if one service goes down? How would the dependent services behave?
  8. Should I make a sync invocation between microservices to always get consistent data?
  9. How will I manage version upgrades to a few or all microservices? Is it always possible to do it without downtime?
  10. And the last unavoidable question – how do I test the entire application as an integrated application?

Hmm… All of the above questions must be answered to be able to understand and deploy applications based on microservices.

Let’s first list all the things we should cover to understand microservices:

  • Decomposition – Breaking of system in Microservices and contracts between them
  • Authentication – how authentication info is passed from service to service , how diff. services validate the session
  • Service Discovery – hard coded , external system based like consul , eureka , kubernetes
  • Data Management – Where to store the data , whether to share the DB or not
  • Auditing – very important in business applications to have audit information about who updated or created something
  • Transactional Messaging – when you require very high consistency between DB operation and its event passed onto diff. services
  • Testing – What all to test , Single Service , Cross Service Concerns
  • Deployment Pattern – serverless , docker based
  • Developer Environment Setup – All Services running on developer machine , or single setup
  • Release Upgrades – How to do zero downtime release upgrades , blue green deployments
  • Debugging – Pass tracing id between services , track time taken between services , log aggregation
  • Monitoring – API time taken , System Health Check
  • UI Development – Single Page Applications or Micro Front Ends and client side composition
  • Security – for internal users

The first thing we should learn is how to decompose, or build, the services:

Domain Driven Design:

While researching the methodology to break up the application into these components and also define the contract between them, we found the Domain Driven Design philosophy to be the most convincing. At a high level, it guided us on

  • How to break a bigger business domain into smaller bounded contexts(services).
  • How to define contracts between them(services).

These smaller components are the microservices. At a finer level, domain driven design (aka DDD) provides tactical methods to help us with

  • How to write code in a single bounded context(services).
  • How to break up the entities(business objects within the service).
  • How to become eventually consistent(as our data is divided into multiple services we cannot have all of them consistent every moment).
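The ideas above (bounded contexts, a contract between them, and eventual consistency) can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `OrderService`, `InventoryService`, `OrderPlaced`, and `EventBus` are hypothetical names invented here, and the in-memory queue merely stands in for a real message broker.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class BoundedContextSketch {

    /** The contract between the two services: an immutable event (hypothetical). */
    static final class OrderPlaced {
        final String productId;
        final int quantity;

        OrderPlaced(String productId, int quantity) {
            this.productId = productId;
            this.quantity = quantity;
        }
    }

    /** In-memory stand-in for a message broker; events wait here until delivered. */
    static final class EventBus {
        private final Queue<OrderPlaced> pending = new ArrayDeque<>();

        void publish(OrderPlaced event) {
            pending.add(event);
        }

        /** Delivers all pending events to the consumer and returns how many were delivered. */
        int deliverTo(InventoryService consumer) {
            int delivered = 0;
            OrderPlaced event;
            while ((event = pending.poll()) != null) {
                consumer.on(event);
                delivered++;
            }
            return delivered;
        }
    }

    /** Ordering bounded context: owns orders and only publishes events about them. */
    static final class OrderService {
        private final EventBus bus;
        private int ordersPlaced = 0;

        OrderService(EventBus bus) {
            this.bus = bus;
        }

        void placeOrder(String productId, int quantity) {
            ordersPlaced++;
            bus.publish(new OrderPlaced(productId, quantity));
        }

        int ordersPlaced() {
            return ordersPlaced;
        }
    }

    /** Inventory bounded context: owns stock levels and changes them only when an event arrives. */
    static final class InventoryService {
        private final Map<String, Integer> stock = new HashMap<>();

        void setStock(String productId, int quantity) {
            stock.put(productId, quantity);
        }

        void on(OrderPlaced event) {
            // Subtract the ordered quantity from the current stock level.
            stock.merge(event.productId, -event.quantity, Integer::sum);
        }

        int stockOf(String productId) {
            return stock.getOrDefault(productId, 0);
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        OrderService orders = new OrderService(bus);
        InventoryService inventory = new InventoryService();
        inventory.setStock("book", 10);

        orders.placeOrder("book", 3);
        // The order exists, but inventory has not seen the event yet.
        System.out.println("before delivery: " + inventory.stockOf("book"));

        bus.deliverTo(inventory);
        // After delivery the two contexts agree again.
        System.out.println("after delivery: " + inventory.stockOf("book"));
    }
}
```

Between `placeOrder` and `deliverTo`, the two services briefly disagree about the state of the world; that window is what "eventually consistent" refers to.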

After getting the answer to “how to break the application into microservices,” we needed a framework for writing code along these guidelines. We could have come up with a framework of our own, but we chose not to reinvent the wheel. Lagom, Axon, and Eventuate are all Java-based frameworks that provide the functionality we require to do microservices, like

  1. Writing services using DDD.
  2. Testing services.
  3. Integration with messaging frameworks such as Kafka, RabbitMQ …, for message passing between services.

This is Part 1 in a series of articles. I have explained how we found our direction for getting started with microservices. In the next article, we will discuss a sample application and how to break it up using DDD.

Recommended References

Thanks to the blogs by Vaughn Vernon, Udi Dahan, Chris Richardson, and Microsoft. A few specific references:

  1. Youtube for Event Sourcing, CQRS, Domain Driven Design, Lagom.
  4. Domain-Driven Design Distilled and Implementing Domain-Driven Design, by Vaughn Vernon.
  5. by Chris Richardson.
  6. Event Storming, by Alberto.

Want to be a better programmer – Read, Read, Read – But How?

If one wants to become a better programmer, one thing is for sure: one needs to understand other people's code.

As beginners, when we start writing code and use even very basic library classes like List and Set, we get stuck on basic questions like:

  • What does this class do?
  • How do I use this class?

This happens because we are missing one important trick:

  • First associate a purpose with the class by reading its definition and its name.
  • Once you have the purpose, you will automatically be able to make out what the functions in this class should be.
  • The same goes for the functions: first associate a purpose.

If you start reading the code by understanding the purpose of the class, you will be able to understand and use that class.

Once you have understood the purpose, the next thing you should do is:

  • look at their internal implementations
  • understand the data structure or variables they have declared
  • try to reason about why they have been declared like that
  • think of any pros / cons, or alternative implementations.

I bet that if one starts doing this in the initial part of their programming career, it will help them:

  • to reason lots about other libraries
  • understand new libraries faster
  • also while writing new libraries one will be able to choose right data structures.

Let's see a demo of the above theory:

We all know there is the Collections framework in Java, which looks something like this:

Java Collection Structure

Now let's start by assigning a purpose and see how things follow automatically:

  • Collection Interface – it says "I can hold some objects".
    • so the functions in this interface should be
      • one function to add an object
      • one function to remove an object
  • List Interface – it says it is a collection, but with an ordering: you get objects back in the order you put them in
    • so the extra functions in this interface should be
      • get the object at a particular position: get(int i)
      • add an object at a particular position
  • Set Interface – it also says it is a collection, but it contains only unique objects and does not care about ordering
    • so the functions in this interface
      • one is contains, to check whether an object already exists, since it will not add the same object twice
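The purposes described above can be checked against the real `java.util` interfaces with a short sketch. The class name `CollectionPurposeDemo` and the helper `addAll` are made up for this illustration. One detail differs from the guess above: in the actual JDK, `contains` is declared on `Collection` itself, so `List` has it too; what `Set` adds is the guarantee that a duplicate is not stored.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollectionPurposeDemo {

    /** Works with any Collection: it relies only on "I can hold some objects". */
    static int addAll(Collection<String> target, String... items) {
        for (String item : items) {
            target.add(item);
        }
        return target.size();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();

        // The same three additions for both, one of them a duplicate.
        addAll(list, "a", "b", "a");
        addAll(set, "a", "b", "a");

        System.out.println("list size: " + list.size()); // the List keeps the duplicate
        System.out.println("set size: " + set.size());   // the Set keeps unique objects only

        // List purpose: ordering, so positional access and insertion make sense.
        System.out.println("second element: " + list.get(1));
        list.add(0, "z");
        System.out.println("after insert at 0: " + list);

        // Set purpose: uniqueness, so add() reports whether the object was new.
        System.out.println("set contains a: " + set.contains("a"));
        System.out.println("adding a again changed the set: " + set.add("a"));

        // Collection purpose: an object can be removed again.
        list.remove("z");
        System.out.println("after remove: " + list);
    }
}
```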

You see, just by understanding the purpose of each class, we could easily make out its functions and their purpose.

I am leaving the next part, looking at the internal implementation, to you. I will write another blog post on this.

Naming is the most important thing

There are only two hard things in Computer Science: cache invalidation and naming things.

This is one of my favorite programming quotes. Here we will discuss the naming things part, and why it matters.

In your entire coding career, what you will do most is read other people's code, libraries, or new technologies and try to understand them, and solve bugs in existing code. If you agree, let's move forward.

Now a small story. There was a bug in our existing system, and one of my fellow programmers was working on it. The bug was in the core libraries written years back, and to add to his woes there was no documentation, neither external nor in the code.

He had been working on it for days with no result. While we were discussing the problem, he said he was unable to understand the code. I started debugging with him, and the bug was solved in minutes (not to brag about my skills). He asked how I was able to get to the problem class and function so easily. I said it was just intuition from the names of the classes and functions.

That is when I understood that naming conventions are still not natural or important for many people.

Story part over.

Let's discuss what this intuition is all about. What I have learnt in my coding career is this: before naming a class, first associate a purpose with it, and choose a name from which one can understand most of that purpose. Then start adding functions to the class; any function that is not aligned with the class's purpose should not be there. Likewise, from a function's name one should be able to make out what it does without looking at the code.
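A small sketch of this idea in Java follows. Everything in it is hypothetical and chosen only for illustration (`ShoppingCart`, `addItem`, `totalPriceInCents`, and the contrasting `DataManager` in the comment); it is not taken from the code base in the story.

```java
import java.util.ArrayList;
import java.util.List;

public class NamingSketch {

    // Hard to guess: what does a "DataManager" manage, and what does process() do?
    //   class DataManager { int process(List<Integer> d) { ... } }

    /** Purpose readable from the name: holds the prices of the items one shopper wants to buy. */
    static final class ShoppingCart {
        private final List<Integer> itemPricesInCents = new ArrayList<>();

        /** Adds one item; the name says what happens without reading the body. */
        void addItem(int priceInCents) {
            itemPricesInCents.add(priceInCents);
        }

        boolean isEmpty() {
            return itemPricesInCents.isEmpty();
        }

        int totalPriceInCents() {
            int total = 0;
            for (int price : itemPricesInCents) {
                total += price;
            }
            return total;
        }

        // A method such as sendPromotionalEmail() would not belong here:
        // it does not serve the purpose "hold the items of one cart".
    }

    public static void main(String[] args) {
        ShoppingCart cart = new ShoppingCart();
        cart.addItem(250);
        cart.addItem(100);
        System.out.println("total in cents: " + cart.totalPriceInCents());
    }
}
```

A reader hunting for a wrong total would go straight to `ShoppingCart.totalPriceInCents` from the names alone, which is the kind of intuition described in the story.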

People who read lots of code will realize that this is how most good code is written, and most of the libraries you come across will follow it.

With the intuition you get from the name, you can make out the functions without reading the code, and so on. This happens in real life as well: if I call something a screen, you mostly know what it does (if you think in the right context).

Similarly, if we start naming our classes and functions correctly, this helps not just us but also the other programmers who will work on the code later.

So be very careful when you name anything (class, function): the name should convey the purpose.

Also, as a developer, you must build this intuition from names and read code with it.

Start naming things correctly and read code with the right intuition – don't just jump into the code.

A Database Session Leak Can Slow Down Your Database


Session leaks are very common in developers’ lives. In this article, I will explain some consequences of session leaks that I faced when an issue came to me, and how I got to the root cause of the issue.

Issue and Analysis

Issue: Load is increasing on the server, and Postgres queries are consuming CPU and taking a long time.

I had to resolve this issue and find the root cause of it.

Analysis: After all my debugging, I came to the following findings:

There is a table that has 200 rows, but the number of live tuples reported for it is far higher (around 60K). We are using PostgreSQL 9.3.

The following are the queries that I ran.

select count(*) from subscriber_offset_manager;
200
(1 row)

SELECT schemaname, relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'subscriber_offset_manager' ORDER BY n_dead_tup;
schemaname | relname                   | n_live_tup | n_dead_tup
public     | subscriber_offset_manager | 61453      | 5
(1 row)

But as seen from pg_stat_activity and pg_locks, we are not able to track any open connection.

SELECT query, state,locktype,mode FROM pg_locks JOIN pg_stat_activity USING (pid) WHERE relation::regclass = 'subscriber_offset_manager'::regclass ; 
query | state | locktype | mode 
(0 rows)

I also tried a full vacuum on this table. Below were the results:

  • Every time, no rows were removed.
  • Many times, all the live tuples became dead tuples.

Here is the output of the running vacuum command:

vacuum FULL VERBOSE ANALYZE subscriber_offset_manager;
INFO: vacuuming "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": found 0 removable, 67920 nonremovable row versions in 714 pages
DETAIL: 67720 dead row versions cannot be removed yet.
CPU 0.01s/0.06u sec elapsed 0.13 sec.
INFO: analyzing "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": scanned 710 of 710 pages, containing 200 live rows and 67720 dead rows; 200 rows in sample, 200 estimated total rows
VACUUM

After that, I checked the live and dead tuples for that table as follows:

SELECT schemaname, relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'subscriber_offset_manager' ORDER BY n_dead_tup;

schemaname | relname                   | n_live_tup | n_dead_tup
public     | subscriber_offset_manager | 200        | 67749

After 10 seconds:

SELECT schemaname, relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'subscriber_offset_manager' ORDER BY n_dead_tup;

schemaname | relname                   | n_live_tup | n_dead_tup
public     | subscriber_offset_manager | 68325      | 132

Instead of being cleaned up, all the dead tuples were counted as live tuples again.

One more interesting observation: when I stopped my Java app and then ran a full vacuum, it worked fine (the number of rows and the number of live tuples became equal). So something goes wrong when we select and update continuously from the Java app.

After all the research and analysis, with help from Stack Overflow and after following many links, I found the following root cause.

Root Cause:

When there is a long-running transaction or a database session leak, dead tuples created after the start of that transaction will not be cleaned up by vacuum, and this applies to every table in that database. This is because the PostgreSQL vacuum process only cleans dead rows whose transaction ID is lower than the transaction ID of the oldest open transaction, and transaction IDs are generated globally rather than per table.

When I checked, I found a transaction that had been open for too long, and when I killed it, the vacuum worked fine.
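The check for a long-open transaction can also be done from the Java side. The following is a minimal sketch, assuming a JDBC `Connection` to the same PostgreSQL database and a role that may read other sessions' rows in `pg_stat_activity`. The class name `LongTransactionFinder`, its methods, and the 30-minute threshold are made up for illustration. It is not the exact query used during the investigation above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

public class LongTransactionFinder {

    /** Sessions whose current transaction started more than the given number of minutes ago. */
    static final String LONG_TRANSACTION_SQL =
            "SELECT pid, state, xact_start, query "
          + "FROM pg_stat_activity "
          + "WHERE xact_start IS NOT NULL "
          + "AND xact_start < now() - (? * interval '1 minute') "
          + "ORDER BY xact_start";

    /** Formats one session as a single log line. */
    static String formatRow(int pid, String state, Timestamp transactionStart, String query) {
        return "pid=" + pid + " state=" + state + " xact_start=" + transactionStart + " query=" + query;
    }

    /** Runs the query and returns one formatted line per long-running transaction. */
    static List<String> findLongTransactions(Connection connection, int olderThanMinutes)
            throws SQLException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the statement and result set even if an exception is thrown.
        try (PreparedStatement statement = connection.prepareStatement(LONG_TRANSACTION_SQL)) {
            statement.setInt(1, olderThanMinutes);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    result.add(formatRow(
                            rows.getInt("pid"),
                            rows.getString("state"),
                            rows.getTimestamp("xact_start"),
                            rows.getString("query")));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Without a database at hand, just show the query that would be executed.
        System.out.println(LONG_TRANSACTION_SQL);
    }
}
```

A call such as `findLongTransactions(connection, 30)` would list sessions whose transaction has been open for more than 30 minutes. Sessions in the `idle in transaction` state are typical candidates for a leaked session.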

Please read the links given below for detailed information on the effects of not ending a database transaction and on some Postgres internals. These links helped me a lot in solving the issue.

1. Question asked by me on Stack Overflow:

2. When PostgreSQL Vacuum does not Work:

3. Consequences of not ending a database transaction:

4. Some Postgres internals and how the vacuum works internally in Postgres