Thursday, 30 November 2017

Java Performance tuning


Overview
Not every application requires tuning. If an application performs as well as expected, you don't need to spend additional effort on its performance. However, it is unrealistic to expect an application to reach its target performance as soon as debugging is finished. This is when tuning is required. Regardless of the implementation language, tuning an application requires considerable expertise and concentration. Also, the method that worked for one application may not work for another, because each application behaves differently and uses resources in its own way. For this reason, tuning an application requires broader background knowledge than writing one. For example, you need knowledge of virtual machines, operating systems and computer architecture. When you apply that knowledge to the application domain, you can tune an application successfully.
Sometimes Java application tuning requires only changing JVM options, such as the garbage collector, but sometimes it requires changing the application source code. Whichever method you choose, you need to monitor the running Java application first. For this reason, this article deals with the following questions:
• How can I monitor a Java application?
• What JVM options should I give?
• How can I know whether modifying the source code is required?

Knowledge Required to Tune the Performance of Java Applications

Java applications operate inside Java Virtual Machine (JVM). Therefore, to tune a Java application, you need to understand the JVM operation process. I have previously blogged about Understanding JVM Internals where you can find great insights about JVM.
In this article, knowledge of how the JVM operates mainly means knowledge of Garbage Collection (GC) and HotSpot. Although knowledge of GC or HotSpot alone will not let you tune every kind of Java application, these two factors influence the performance of Java applications in most cases.
Note that, from the perspective of an operating system, a JVM is just another application process. To create an environment in which a JVM can operate well, you should understand how an OS allocates resources to processes. This means that, to tune the performance of Java applications, you should understand the OS and hardware as well as the JVM itself.
Knowledge of the Java language domain is also important: you should understand locks and concurrency, and be familiar with class loading and object creation.
When you carry out Java application performance tuning, you should approach it by integrating all this knowledge.

JVM distribution model

A JVM distribution model concerns the decision of whether to run Java applications on a single JVM or on multiple JVMs. You can decide according to availability, responsiveness and maintainability requirements. When running JVMs on multiple servers, you can also decide whether to run multiple JVMs on a single server or a single JVM per server. For example, on each server you can run a single JVM with a heap of 8 GB, or four JVMs each with a heap of 2 GB. Of course, the number of JVMs to run on a single server depends on the number of cores and the characteristics of the application. In terms of responsiveness, a heap of 2 GB might be more advantageous than 8 GB for the same application, because a full garbage collection takes less time on a 2 GB heap. With a heap of 8 GB, however, full GCs occur less often, and if the application uses an internal cache, the larger heap can also improve responsiveness by increasing the hit rate. Therefore, choose a distribution model by weighing the characteristics of the application, and plan how to compensate for the disadvantages of the model you chose for its advantages.

JVM architecture

Selecting a JVM architecture means deciding whether to use a 32-bit JVM or a 64-bit JVM. Under the same conditions, you had better choose a 32-bit JVM, because its smaller pointers generally let it perform somewhat better than a 64-bit JVM. However, the maximum logical heap size of a 32-bit JVM is 4 GB, and the size you can actually allocate is only about 2-3 GB on both 32-bit and 64-bit operating systems. It is appropriate to use a 64-bit JVM when a larger heap is required.
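Before deciding on a distribution model or an architecture, it can help to check what the current JVM actually reports. The following is a minimal sketch; the class name JvmInfo is made up for illustration, and the sun.arch.data.model property is HotSpot-specific, so other JVMs may not set it.

```java
final class JvmInfo {
    /** "32" or "64" on HotSpot; "unknown" on JVMs that do not set this property. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("os.arch    : " + System.getProperty("os.arch"));
        System.out.println("data model : " + dataModel());
        System.out.println("max heap   : " + maxHeapMiB() + " MiB");
        System.out.println("cores      : " + Runtime.getRuntime().availableProcessors());
    }
}
```

The core count and maximum heap printed here are the two numbers the 8 GB versus 4 x 2 GB decision depends on.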


The next step is to run the application and to measure its performance. This process includes tuning GC, changing OS settings and modifying codes. For these tasks, you can use a system monitoring tool or a profiling tool.
Note that tuning for responsiveness and tuning for throughput can call for different approaches. Even when throughput per unit time is high, responsiveness is reduced if a stop-the-world pause occurs from time to time, for example for a full garbage collection. You also need to consider trade-offs, and not only between responsiveness and throughput. You may need to use more CPU resources to reduce memory usage, or put up with reduced responsiveness or throughput. Since the opposite cases occur as well, approach tuning according to your priorities.
 
JVM Options

I will explain how to specify suitable JVM options mainly for a web application server. Although it does not apply to every case, the best GC algorithm, especially for web server applications, is the Concurrent Mark Sweep GC, because what matters there is low latency. Of course, when using the Concurrent Mark Sweep, a very long stop-the-world pause can sometimes take place due to fragmentation. Nevertheless, this problem is likely to be resolved by adjusting the new area size or the fraction ratio.
Specifying the new area size is as important as specifying the entire heap size. You had better specify the ratio between the old area and the new area by using -XX:NewRatio, or specify the desired new area size by using the -XX:NewSize option. Specifying a new area size is important because most objects do not survive long. In web applications, most objects, except cache data, are created while the HttpResponse to an HttpRequest is being produced. This rarely takes more than a second, which means the life of these objects does not exceed a second either. If the new area is not large enough, live objects must be moved to the old area to make space for newly created objects. The cost of GC for the old area is much higher than for the new area; therefore, it is good to make the new area sufficiently large.

If the new area size exceeds a certain level, however, responsiveness will be reduced. This is because garbage collection for the new area basically copies data from one survivor area to the other. Also, stop-the-world occurs when performing GC for the new area as well as for the old area. If the new area becomes bigger, the survivor areas grow, and thus the amount of data to copy grows as well. Given these characteristics, it is good to set a suitable new area size by referring to the NewRatio of the HotSpot JVM for each OS.
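To see how the options you passed actually divide the heap, you can ask the running JVM for its memory pools. This is only a sketch using the standard java.lang.management API; the class name HeapPools is made up, and the pool names you will see (eden, survivor, old generation) depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {
    /** One line per heap pool (new and old areas), with used and max sizes. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            String maxText = max < 0 ? "undefined" : (max / 1024) + " KiB";
            lines.add(pool.getName() + ": used=" + (usage.getUsed() / 1024)
                    + " KiB, max=" + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Shows the options the JVM was started with, e.g. -Xmx or -XX:NewRatio.
        System.out.println("JVM options: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        describeHeapPools().forEach(System.out::println);
    }
}
```

Running it once with and once without a -XX:NewRatio setting makes the effect on the new area size visible.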


Measuring the Performance of Applications

To grasp the performance of an application, acquire the following information:
• TPS (OPS): The information required to understand the performance of an application conceptually.
• Requests Per Second (RPS): Strictly speaking, RPS is different from responsiveness, but you can understand it as responsiveness. Through RPS, you can check the time it takes for the user to see the result.
• RPS Standard Deviation: You should aim for an even RPS if possible. If a deviation occurs, check GC tuning or the interworking systems.
To obtain a more accurate performance result, measure it after warming up the application sufficiently, because the hot bytecode needs to be compiled by the HotSpot JIT first. In general, you can measure actual performance values after applying load to a certain feature for at least 10 minutes by using the nGrinder load testing tool.
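The idea of warming up first and then looking at both the average and the deviation can be sketched in a few lines. This is not a replacement for a load testing tool such as nGrinder; the class LoadStats and its method names are hypothetical, and it times a single in-process task rather than real requests.

```java
final class LoadStats {
    /** Runs the task unmeasured first (warm-up), then records one sample per run. */
    static long[] measureNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT a chance to compile the hot path
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) sum += s;
        return sum / samples.length;
    }

    /** Population standard deviation of the samples. */
    static double stdDev(long[] samples) {
        double m = mean(samples);
        double squares = 0;
        for (long s : samples) squares += (s - m) * (s - m);
        return Math.sqrt(squares / samples.length);
    }

    /** Operations per second if the samples had run back to back. */
    static double throughputPerSecond(long[] nanos) {
        double total = 0;
        for (long s : nanos) total += s;
        return nanos.length / (total / 1e9);
    }

    public static void main(String[] args) {
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) sb.append(i);
            if (sb.length() == 0) throw new IllegalStateException();
        };
        long[] samples = measureNanos(task, 10_000, 1_000);
        System.out.printf("mean=%.0f ns, stddev=%.0f ns, ops/s=%.0f%n",
                mean(samples), stdDev(samples), throughputPerSecond(samples));
    }
}
```

A large standard deviation relative to the mean is the signal to look at GC behavior or at the systems the application talks to.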

Tuning in Earnest

You don't need to tune the performance of an application if the result of the execution of nGrinder meets the expectation. If the performance does not meet the expectation, you need to carry out tuning to resolve problems. Now you will see the approach by case.

In the event the Stop-the-World takes long

Long stop-the-world time could result from inappropriate GC options or incorrect implementation. You can determine the cause from the result of a profiler or a heap dump, that is, by checking the type and number of objects in the heap. If you find many unnecessary objects, you had better modify the source code. If you find no particular problem in the way objects are created, you had better simply change GC options.
To adjust GC options appropriately, you need GC logs collected over a sufficient period of time, so that you can understand in which situations stop-the-world takes a long time. For more information on the selection of appropriate GC options, read my colleague's blog about How to Monitor Java Garbage Collection.
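Besides GC logs, the JVM exposes cumulative GC counters that you can sample from inside the application. The sketch below uses the standard GarbageCollectorMXBean API; the class name GcSnapshot is made up, and the counters only show totals, so they complement rather than replace a GC log.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSnapshot {
    /** Total number of collections so far, across all collectors. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount()); // -1 means undefined
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime()); // -1 means undefined
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```

Sampling these values periodically and comparing the differences gives a rough picture of how often each collector runs and how much time it takes.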

In the event CPU usage rate is low

When blocking occurs, both TPS and CPU usage rate decrease. This might result from a problem with interworking systems or with concurrency. To analyze this, you can use a thread dump or a profiler. For more information on thread dump analysis, read How to Analyze Java Thread Dumps.
You can conduct a very accurate lock analysis by using a commercial profiler. In most cases, however, you can obtain a satisfactory result with only the CPU analyzer in jvisualvm.
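A quick first look at blocking can also be taken from inside the JVM, by counting threads per state and asking for deadlocks. This is a rough sketch with the standard ThreadMXBean API; the class name ThreadStates is hypothetical, and a full thread dump gives far more detail.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

final class ThreadStates {
    /** Number of live threads in each state (RUNNABLE, BLOCKED, WAITING, ...). */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Number of threads currently involved in a deadlock, or 0 if none. */
    static int deadlockedThreadCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()          // monitors and ownable synchronizers
                : mx.findMonitorDeadlockedThreads();  // monitors only
        return ids == null ? 0 : ids.length;          // null means no deadlock found
    }

    public static void main(String[] args) {
        System.out.println("threads by state: " + countByState());
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Many BLOCKED or WAITING threads together with low CPU usage point toward lock contention or a slow interworking system.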

In the event CPU usage rate is high

If TPS is low but CPU usage rate is high, this is likely to result from inefficient implementation. In this case, you should find the location of the bottlenecks by using a profiler. You can analyze this by using jvisualvm, TPTP of Eclipse or JProbe.
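Before reaching for a profiler, you can get a coarse idea of which threads consume the CPU. The sketch below assumes the JVM supports per-thread CPU time measurement (it returns an empty map otherwise); the class name CpuByThread is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

final class CpuByThread {
    /** CPU time consumed so far by each live thread, in milliseconds. */
    static Map<String, Long> cpuMillisByThread() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> result = new LinkedHashMap<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return result; // measurement is not available on this JVM
        }
        for (long id : mx.getAllThreadIds()) {
            long nanos = mx.getThreadCpuTime(id);     // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id);   // null if the thread has died
            if (nanos < 0 || info == null) {
                continue;
            }
            result.put(info.getThreadName() + "#" + id, nanos / 1_000_000);
        }
        return result;
    }

    public static void main(String[] args) {
        cpuMillisByThread().forEach(
                (name, millis) -> System.out.println(name + ": " + millis + " ms"));
    }
}
```

This only tells you which threads are busy, not which methods; for the location of the bottleneck you still need a profiler.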

Approach for Tuning

You are advised to use the following approach to tune applications.
First, you should check whether performance tuning is necessary. Measuring performance is not easy work, and you are not guaranteed a satisfactory result every time. Therefore, if the application already meets its target performance, you don't need to invest additionally in performance.
Second, assume the problem lies in a single place, and all you have to do is fix it. The Pareto principle applies to performance tuning as well. This does not mean that the low performance of a certain feature necessarily results from a single problem. Rather, it means you should focus on the one factor that has the biggest influence on performance. You can handle the next problem after fixing the most important one. You are advised to fix just one problem at a time.
Third, you should consider the balloon effect: decide what to give up in order to gain something. You can improve responsiveness by applying a cache, but if the cache size increases, the time it takes to carry out a full GC increases as well. In general, if you want low memory usage, throughput or responsiveness may deteriorate. Thus, you need to consider what is most important and what is less important.
So far, you have read about the method for Java application performance tuning. To introduce a concrete procedure for performance measurement, I had to omit some details. Nevertheless, I think this should cover most cases of tuning Java web server applications.

Happy Learning

Iterator Design Pattern


Motivation
One of the most common data structures in software development is what is generically called a collection. A collection is just a grouping of objects. They can have the same type, or they can all be cast to a base type like Object. A collection can be a list, an array, a tree, and so on.
But what is more important is that a collection should provide a way to access its elements without exposing its internal structure. We should have a mechanism to traverse a list or an array in the same way. It doesn't matter how they are internally represented.
The idea of the iterator pattern is to take the responsibility for accessing and passing through the objects of the collection and put it in the iterator object. The iterator object maintains the state of the iteration, keeping track of the current item and having a way of identifying which elements are next to be iterated.

Intent
Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
The abstraction provided by the iterator pattern allows you to modify the collection implementation without making any changes outside of collection. It enables you to create a general purpose GUI component that will be able to iterate through any collection of the application.

Applicability
Use the Iterator pattern
1. to access an aggregate object's contents without exposing its internal representation.
2. to support multiple traversals of aggregate objects.
3. to provide a uniform interface for traversing different aggregate structures (that is, to support polymorphic iteration).

Benefit(s)
1. It supports variations in the traversal of an aggregate.
2. Iterators simplify the Aggregate interface.
3. More than one traversal can be pending on an aggregate.
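As an illustration, here is a minimal sketch of a collection that hides its array and hands out iterators. The class NameList is hypothetical; java.util.Iterator and java.lang.Iterable are the standard interfaces.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class NameList implements Iterable<String> {
    private final String[] names; // internal representation, never exposed

    NameList(String... names) {
        this.names = names.clone();
    }

    @Override
    public Iterator<String> iterator() {
        // Each iterator keeps its own position, so several traversals can be pending.
        return new Iterator<String>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                return cursor < names.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return names[cursor++];
            }
        };
    }

    public static void main(String[] args) {
        NameList list = new NameList("Ann", "Bob", "Cid");
        for (String name : list) { // the for-each loop uses iterator() behind the scenes
            System.out.println(name);
        }
    }
}
```

If the array were later replaced by a linked list or a tree, only NameList would change; the client loop would stay the same.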

Happy Learning.

Chain of Responsibility Design Pattern


Motivation
In writing an application of any kind, it often happens that the event generated by one object needs to be handled by another one. And, to make our work even harder, we also happen to be denied access to the object which needs to handle the event. In this case there are two possibilities: there is the beginner/lazy approach of making everything public, creating reference to every object and continuing from there and then there is the expert approach of using the Chain of Responsibility.
The Chain of Responsibility design pattern allows an object to send a command without knowing what object will receive and handle it. The request is sent from one object to another making them parts of a chain and each object in this chain can handle the command, pass it on or do both. The most usual example of a machine using the Chain of Responsibility is the vending machine coin slot: rather than having a slot for each type of coin, the machine has only one slot for all of them. The dropped coin is routed to the appropriate storage place that is determined by the receiver of the command.

Intent
Avoid coupling the sender of a request to its receiver by giving other objects the possibility of handling the request too.
The objects become parts of a chain, and the request is passed from one object to the next along the chain until one of the objects handles it.
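The coin slot can be sketched as a short chain of handlers. All class names and the coin values below are made up for illustration; the point is only that the sender calls the head of the chain and does not know which handler ends up storing the coin.

```java
abstract class CoinHandler {
    private CoinHandler next;

    /** Appends a handler and returns it, so that calls can be chained. */
    CoinHandler linkWith(CoinHandler next) {
        this.next = next;
        return next;
    }

    /** Handles the coin here or passes it along the chain. */
    final String handle(int cents) {
        if (accepts(cents)) {
            return store(cents);
        }
        if (next != null) {
            return next.handle(cents);
        }
        return "rejected"; // end of chain: receipt is not guaranteed
    }

    abstract boolean accepts(int cents);

    abstract String store(int cents);
}

final class DenominationHandler extends CoinHandler {
    private final int cents;
    private final String bin;

    DenominationHandler(int cents, String bin) {
        this.cents = cents;
        this.bin = bin;
    }

    @Override
    boolean accepts(int cents) {
        return this.cents == cents;
    }

    @Override
    String store(int cents) {
        return bin;
    }
}

final class CoinSlot {
    /** Builds the chain 5 -> 10 -> 25 and returns its head. */
    static CoinHandler build() {
        CoinHandler head = new DenominationHandler(5, "nickel bin");
        head.linkWith(new DenominationHandler(10, "dime bin"))
            .linkWith(new DenominationHandler(25, "quarter bin"));
        return head;
    }

    public static void main(String[] args) {
        CoinHandler slot = build();
        System.out.println(slot.handle(25));
        System.out.println(slot.handle(3));
    }
}
```

Adding a new denomination means linking one more handler; the code that drops the coin does not change.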

Applicability
Use Chain of Responsibility when
1. more than one object may handle a request, and the handler isn't known a priori. The handler should be ascertained automatically.
2. you want to issue a request to one of several objects without specifying the receiver explicitly.
3. the set of objects that can handle a request should be specified dynamically.

Benefit(s)
1. Reduced coupling.
2. Added flexibility in assigning responsibilities to objects.

Disadvantage(s)
1. Receipt isn't guaranteed.

Happy Learning.

Abstract Factory Design Pattern


Motivation
Modularization is a big issue in today's programming. Programmers all over the world are trying to avoid adding code to existing classes in order to make them support more general information. Take the case of an information manager which manages phone numbers. Phone numbers are generated according to particular rules that depend on areas and countries. If at some point the application has to be changed in order to support numbers from a new country, the code of the application would have to be changed, and it would become more and more complicated.
In order to prevent it, the Abstract Factory design pattern is used. Using this pattern a framework is defined, which produces objects that follow a general pattern and at runtime this factory is paired with any concrete factory to produce objects that follow the pattern of a certain country. In other words, the Abstract Factory is a super-factory which creates other factories (Factory of factories).

Intent
Abstract Factory offers the interface for creating a family of related objects, without explicitly specifying their classes.
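A minimal sketch of the phone number case might look as follows. Every type name here is hypothetical, and a postal code is added as a second product only so that each factory produces a family of related objects.

```java
interface PhoneNumber {
    String dialable();
}

interface PostalCode {
    String label();
}

/** The abstract factory: one creation method per product of the family. */
interface ContactFactory {
    PhoneNumber phoneNumber(String national);

    PostalCode postalCode(String code);
}

final class UsContactFactory implements ContactFactory {
    @Override
    public PhoneNumber phoneNumber(String national) {
        return () -> "+1 " + national;
    }

    @Override
    public PostalCode postalCode(String code) {
        return () -> "ZIP " + code;
    }
}

final class UkContactFactory implements ContactFactory {
    @Override
    public PhoneNumber phoneNumber(String national) {
        return () -> "+44 " + national;
    }

    @Override
    public PostalCode postalCode(String code) {
        return () -> "Postcode " + code;
    }
}

final class ContactCard {
    /** Client code: depends only on the interfaces, never on a country. */
    static String render(ContactFactory factory, String phone, String code) {
        return factory.phoneNumber(phone).dialable() + ", "
                + factory.postalCode(code).label();
    }

    public static void main(String[] args) {
        System.out.println(render(new UsContactFactory(), "555 0100", "12345"));
        System.out.println(render(new UkContactFactory(), "20 7946 0000", "SW1A 1AA"));
    }
}
```

Supporting a new country means adding one more factory class, while ContactCard stays untouched. Adding a new kind of product, on the other hand, would require changing the ContactFactory interface and every implementation of it.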

Where to use
Abstract Factory should be used when:
A system should be configured with one of multiple families of products
A system should be independent of how its products are created, composed and represented
Products from the same family should be used all together, products from different families should not be used together, and this constraint must be ensured.
Only the product interfaces are revealed; the implementations remain hidden from the clients.

Common Usage
Examples of abstract factories:
java.awt.Toolkit - the abstract superclass of all actual implementations of the Abstract Window Toolkit. Subclasses of Toolkit are used to bind the various components to particular native toolkit implementations(Java AWT).
javax.swing.LookAndFeel - an abstract Swing factory to switch between several looks and feels for the components displayed (Java Swing).
java.sql.Connection - an abstract factory which creates Statements, PreparedStatements, CallableStatements,... for each database flavor.
Example: Gui Look & Feel in Java

Applicability
Use the Abstract Factory pattern when
1. a system should be independent of how its products are created, composed, and represented.
2. a system should be configured with one of multiple families of products.
3. a family of related product objects is designed to be used together, and you need to enforce this constraint.
4. you want to provide a class library of products, and you want to reveal just their interfaces, not their implementations.

Benefit(s)
1. It isolates concrete classes.
2. It makes exchanging product families easy.
3. It promotes consistency among products.

Disadvantage(s)
1. Supporting new kinds of products is difficult.

Happy Learning.

Factory Design Pattern


Motivation
The Factory Design Pattern is probably the most used design pattern in modern programming languages like Java and C#. It comes in different variants and implementations. If you are searching for it, most likely, you'll find references about the GoF patterns: Factory Method and Abstract Factory.

Intent
Creates objects without exposing the instantiation logic to the client.
Refers to the newly created object through a common interface.
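Both points can be seen in a small sketch. The Shape types and ShapeFactory below are hypothetical and stand for any family of classes with a common interface.

```java
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    @Override
    public double area() {
        return Math.PI * radius * radius;
    }
}

final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

final class ShapeFactory {
    /** The only place that knows which concrete class belongs to which name. */
    static Shape create(String kind, double size) {
        switch (kind) {
            case "circle":
                return new Circle(size);
            case "square":
                return new Square(size);
            default:
                throw new IllegalArgumentException("unknown shape: " + kind);
        }
    }

    public static void main(String[] args) {
        Shape shape = ShapeFactory.create("square", 2.0); // client sees only Shape
        System.out.println(shape.area());
    }
}
```

This is the simple (static) variant of the factory; the GoF Factory Method variant moves create() into a class that subclasses override.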

Where to use
Factory pattern should be used when:
- a framework delegates the creation of objects derived from a common superclass to the factory
- we need flexibility in adding new types of objects that must be created by the class

Common Usage
Along with the singleton pattern, the factory is one of the most used patterns. Almost any application has some factories. Here are some examples in Java:
- factories providing an xml parser: javax.xml.parsers.DocumentBuilderFactory or javax.xml.parsers.SAXParserFactory
- java.net.URLConnection - allows users to decide which protocol to use.

Applicability
Use the Factory Method pattern when
1. a class can't anticipate the class of objects it must create.
2. a class wants its subclasses to specify the objects it creates.
3. classes delegate responsibility to one of several helper subclasses, and you want to localize the knowledge of which helper subclass is the delegate.

Benefit(s)
1. Provides hooks for subclasses.
2. Connects parallel class hierarchies.

Disadvantage(s)
A potential disadvantage of factory methods is that clients might have to subclass the Creator class just to create a particular ConcreteProduct object. Subclassing is fine when the client has to subclass the Creator class anyway, but otherwise the client now must deal with another point of evolution.


Happy Learning.

Singleton Design Pattern


Motivation
Sometimes it's important to have only one instance for a class. For example, in a system there should be only one window manager (or only a file system or only a print spooler). Usually singletons are used for centralized management of internal or external resources and they provide a global point of access to themselves.
The singleton pattern is one of the simplest design patterns: it involves only one class, which is responsible for instantiating itself and for making sure it creates no more than one instance; at the same time, it provides a global point of access to that instance. The same instance can then be used from everywhere, and it is impossible to invoke the constructor directly.

Intent
Ensure that only one instance of a class is created.
Provide a global point of access to the object.

Where to use
The Singleton pattern should be used when we must ensure that only one instance of a class is created and when the instance must be available throughout the code. Special care should be taken in multi-threaded environments, where multiple threads must access the same resources through the same singleton object.

Common Usage
There are many common situations when singleton pattern is used:
- Logger Classes
- Configuration Classes
- Accessing resources in shared mode
- Other design patterns implemented as Singletons: Factories and Abstract Factories, Builder, Prototype

Example: Lazy Singleton in Java, Early Singleton in Java
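The two variants can be sketched as follows. The class names are made up; the lazy version uses a synchronized accessor, which is the simplest thread-safe form, although other idioms exist.

```java
/** Early (eager) singleton: the instance is created when the class is loaded. */
final class EarlyConfig {
    private static final EarlyConfig INSTANCE = new EarlyConfig();

    private EarlyConfig() {
        // private constructor: no one else can create an instance
    }

    static EarlyConfig getInstance() {
        return INSTANCE;
    }
}

/** Lazy singleton: the instance is created on first use. */
final class LazyLogger {
    private static LazyLogger instance;

    private LazyLogger() {
    }

    // synchronized so that two threads cannot both see null and create two instances
    static synchronized LazyLogger getInstance() {
        if (instance == null) {
            instance = new LazyLogger();
        }
        return instance;
    }
}

final class SingletonDemo {
    public static void main(String[] args) {
        System.out.println(EarlyConfig.getInstance() == EarlyConfig.getInstance());
        System.out.println(LazyLogger.getInstance() == LazyLogger.getInstance());
    }
}
```

The early variant is simpler and thread-safe by construction; the lazy variant avoids creating the object if it is never used, at the cost of synchronizing every call.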

Applicability
Use the Singleton pattern when
1. there must be exactly one instance of a class, and it must be accessible to clients from a well-known access point.
2. the sole instance should be extensible by subclassing, and clients should be able to use an extended instance without modifying their code.

Benefit(s)
1. Controlled access to sole instance.
2. Reduced name space.
3. Permits refinement of operations and representation.
4. Permits a variable number of instances.
5. More flexible than class operations.

Happy Learning.