1. TROUBLESHOOTING CPU SPIKES IN JAVA APPLICATIONS

Possible causes of high CPU usage in Java apps might be related to:

  • Garbage Collector (GC) executing Major or Full collections too frequently without freeing much memory as a result of a memory leak in one of the applications served by the servlet container or a leak in the servlet container itself.
  • GC often executing Major or Full collections (similar to previous suggestion) because in fact, the application needs more memory.
  • Issues with the application like resource contention, long running jobs, expensive computation, …

The intention of this entry is to document a simple process to troubleshoot high CPU usage in Java apps once it has been observed that it’s not related to GC.

2. FIND THE JVM PROCESS

ps aux | grep java
tomcat   26551  128 30.9 3946348 1215360 ?     Sl   13:43  26:21 /usr/java/latest/bin/java ........ org.apache.catalina.startup.Bootstrap start

3. GENERATE THREAD DUMP OF THE JAVA MAIN THREAD FOUND IN 2.

JVM main thread = Linux process, other Java threads = Lightweight process (LWP)

jstack 26551 > threaddump.log

4. FIND SUSPECT LIGHTWEIGHT PROCESSES (LWPs) WITH HIGH CPU USAGE

top -H
top - 13:59:28 up 29 days, 11:15,  1 user,  load average: 5.15, 4.75, 3.38
Tasks: 512 total,   8 running, 504 sleeping,   0 stopped,   0 zombie
Cpu0  : 84.9%us,  9.4%sy,  0.0%ni,  3.7%id,  0.7%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu1  : 84.7%us,  8.0%sy,  0.0%ni,  5.0%id,  0.7%wa,  0.0%hi,  1.7%si,  0.0%st
Mem:   3924512k total,  3784340k used,   140172k free,   148876k buffers
Swap:  2093052k total,        0k used,  2093052k free,  1995456k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
26605 tomcat    20   0 3853m 1.1g  12m R 32.2 29.8   3:10.85 java
26602 tomcat    20   0 3853m 1.1g  12m R 14.1 29.8   1:26.46 java
26771 tomcat    20   0 3853m 1.1g  12m S  2.6 29.8   0:11.74 java
26558 tomcat    20   0 3853m 1.1g  12m S  2.3 29.8   0:06.19 java
26742 tomcat    20   0 3853m 1.1g  12m S  2.3 29.8   0:11.74 java
26792 tomcat    20   0 3853m 1.1g  12m S  2.3 29.8   0:11.82 java
26555 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:06.07 java
26556 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:05.95 java
26743 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.63 java
26744 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.89 java
26748 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.60 java
26770 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.76 java
26772 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.67 java
26776 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.51 java
26794 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.70 java
26795 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.91 java
26796 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.53 java
26800 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.70 java
26806 tomcat    20   0 3853m 1.1g  12m S  2.0 29.8   0:11.56 java

It can be observed there are two LWPs using 32% and 14% of the CPU where the rest stays around the same percentage.

5. CONVERT LWP IDs FROM DECIMAL TO HEXADECIMAL FOR PIDs WITH HIGH CPU USAGE

Decimal Hexadecimal
26605 67ED
26602 67EA

6. OPEN THREADDUMP FROM 3. AND LOCATE HEX VALUES 0x67ed and 0x67ea

....
"Dispatcher-Thread-5" daemon prio=10 tid=0x00007faee4862800 nid=0x67ed runnable [0x00007faf0f8ea000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Start.match(Pattern.java:3408)
    at java.util.regex.Matcher.search(Matcher.java:1199)
    at java.util.regex.Matcher.find(Matcher.java:592)
    at java.util.regex.Matcher.replaceAll(Matcher.java:902)
    at java.lang.String.replaceAll(String.java:2162)
    at com.xxxxx.xxxxx.common.log4j.layout.PCIPatternLayout.format(PCIPatternLayout.java:122)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:302)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:263)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    - locked <0x0000000781277710> (a com.xxxxxx.xxxxxx.common.pci.log4j.DateRollingFileAppender)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.AsyncAppender$Dispatcher.run(AsyncAppender.java:583)
    - locked <0x0000000781276590> (a org.apache.log4j.helpers.AppenderAttachableImpl)
    at java.lang.Thread.run(Thread.java:745)

...
"Dispatcher-Thread-2" daemon prio=10 tid=0x00007faee47b9000 nid=0x67ea runnable [0x00007faf0fbed000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
    at java.util.regex.Pattern$Start.match(Pattern.java:3408)
    at java.util.regex.Matcher.search(Matcher.java:1199)
    at java.util.regex.Matcher.find(Matcher.java:592)
    at java.util.regex.Matcher.replaceAll(Matcher.java:902)
    at java.lang.String.replaceAll(String.java:2162)
    at com.xxxxxx.xxxxxx.common.log4j.layout.PCIPatternLayout.format(PCIPatternLayout.java:122)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:302)
    at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:263)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    - locked <0x000000078126a9b0> (a org.apache.log4j.RollingFileAppender)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.AsyncAppender$Dispatcher.run(AsyncAppender.java:583)
    - locked <0x000000078126a968> (a org.apache.log4j.helpers.AppenderAttachableImpl)
    at java.lang.Thread.run(Thread.java:745)

7. TROUBLESHOOT

From previous stack trace samples, seems high CPU usage is related to log4j async appender, in this case very likely involving PCIPatternLayout.format() method.

8. CONCLUSION

In both instances I was involved with such troubleshooting, the solution was to uplift snmp4j to the latest version at the time of the issue was found and to remove custom layout from log4j appender configuration file for this application.

If you found this post helpful and would like to receive updates when content like this is published, sign up to the newsletter.