Benchmarking synchronous and asynchronous logging

Log4j 1.2 as well as logback have supported asynchronous logging for many years by the way of AsyncAppender. This appender essentially collects newly created logging events, as produced by the application, into a circular buffer. The events in this circular buffer are then processed by a dedicated worker thread which writes the events to their destination, be it a file, a remote server or a database.

Thus, from the application's point of view, the cost of logging is reduced to placing a logging event in a buffer local to the JVM. Such a handing-off operation can be performed with a throughput in the ballpark of roughly 1'000'000 operations per second or a service duration of a microsecond per operation. If the destination of the event is a database with a throughput of 200 operations per second, the performance gain can be tremendous. This very favorable situation occurs if the rate of logging event arrival is slower than the rate of service, that is the rate at which events are transmitted to destination and taken off the circular buffer.

Asynchronous logging is not a panacea however. It still has to obey they laws of the physical universe. In particular, in case the rate of arrival of logging events is consistently higher than the rate at which the events can be written to destination, the circular buffer within AsyncAppender will eventually become full, as infinite sized buffers do not exist in our universe, and the application will be able to hand off events to AsyncAppender's buffer only at the rate the events are actually written to destination. This is an essential limitation of asynchronous logging and all queued/buffered systems in general. By the way, computers can be viewed as a system of queues.

Here is a more concrete example: suppose for a given hardware configuration FileAppender can write to disk at the rate of 100'000 events per second, that is with a duration of 10 microseconds per event written, then, if your application logs 101'000 events per second, then the throughput observed by your application will be 100'000 events written per second. AsyncAppender's buffer will not be able to compensate for arrival of the extra 1000 events per second. We say that the AsyncAppender is overloaded by the arrival rate of logging events.

When the AsyncAppender is overloaded, your application will be paying the cost of moving events in and out of AsyncAppender's circular buffer in addition to sending events to destination. Thus, when we talk about asynchronous logging performance, we need to consider the performance of logging when the AsyncAppender is overloaded. A second consideration is whether asynchronous logging buckles under the pressure of many producer threads all contending for AsyncAppender's circular buffer.

As mentioned earlier, if the rate of event arrival is slower than the service rate, the application will only have to hand-off logging events to a circular buffer, significantly lowering the perceived cost of logging. At the same time, we must also consider the worst-case scenario where the rate of logging event arrival is higher than the service rate. Such a scenario is not that uncommon as slow appenders abound. Therefore, as a worst-case analysis, an asynchronous logging benchmark should yield information about the overhead of asynchronous logging (when overloaded) and its behavior in presence of many producer threads.

We have chosen to use FileAppender in conjunction with AsyncAppender. Not only is FileAppender the most often used appender, it is fast enough so that differences in asynchronous logging overhead can be noticeable. Had we used DBAppender, all asynchronous implementations would have the same performance with little or no noticeable differences.

Comparing log4j, log4j2 and logback

The source code for the benchmarks can be found in the logback-perf github repository. We compare the performance of log4j version 1.2.17, log4j 2.14.1 and logback 1.3.0-alpha10.

We have taken care to compare apples to apples. All tests specify the same buffer size, i.e. 256KB, for all FileAppender instances across frameworks and using the same identical pattern, that is "%d %p [%t] %c - %m%n". Lossy behavior (when the buffer is full) is turned off for logback's AsyncAppender. It is turned off by default for log4j 1.2 as well as log4j2.

Regarding the circular buffer size for AsyncAppender or log4j's LMAX ring buffer, we have chosen to leave the default values as is. By default, log4j 1.2 uses a circular buffer size of 128, log4j2 ring buffer is set to 262'144 by default and logback is set to a buffer size of 256. In principle, the much larger ring buffer should favor log4j2.

Below are the benchmark figures when ran under Windows 10, JDK 16, Intel i7-6770HQ CPU and an WDC NVMe WDS512G1XOC hard disk with an advertised (and actual) 800MB/sec sequential write throughput.

Threads log4j synchronous log4j asynchronous log4j2 synchronous log4j2 asynchronous logback 1.3.0 synchronous logback 1.3.0 asynchronous Unit
1 987.08 745.34 884.33 844.67 2,139.83 1,760.30 ops/ms
2 542.27 716.09 1,220.76 819.40 2,276.77 1,821.36 ops/ms
4 639.86 676.35 1,406.60 770.27 1,836.99 1,799.39 ops/ms
8 633.13 726.21 1,257.63 733.25 1,787.62 1,774.99 ops/ms
16 585.13 693.74 1,211.31 722.34 1,813.09 1,815.10 ops/ms
32 643.85 657.08 1,203.27 704.08 1,782.81 1,751.21 ops/ms
64 576.67 696.21 1,236.37 726.15 1,740.27 1,644.81 ops/ms

Here is the table presented as a chart.

Click to enlarge

The above results show that throughput in synchronous logging is actually higher than that of asynchronous logging. This is true across frameworks. This result can be attributed to the fact that in case of a fast appender, the extra work involved in accessing an overloaded ring buffer can be significant. In case of log4j2, this extra work is as high as 70%.

Another interesting point is that throughput degrades as the number of producer threads increase. Fortunately, the degradation is usually within a few percentage points across frameworks.

logback version 1.3-alpha10 performs about 3 times faster than log4j and about 1.6 times faster than log4j2 in case of synchronous logging. For asynchronous logging, logback 1.3 performs 2.5 faster than log4j and 2.3 times faster than log4j2.

While these points may be somewhat subtle, the not so subtle observation is that logback version 1.3-alpha10 performs about 3 times faster than log4j and about 1.6 times faster than log4j2 in case of synchronous logging. For asynchronous logging, logback 1.3 performs 2.5 faster than log4j and 2.3 times faster than log4j2.

Note that with 2'200'000 events per second, and 209 bytes written per event, logback's FileAppender is generating output at 474 MB/sec, or at 59% of the hard drive's throughput capacity of 800 MB/sec. If nothing else, this is quite a testament for the performance of the JVM.

Given that logback uses off the shelf JDK components in contrast to log4j2 which uses the LMAX Disruptor, that is highly optimized code pervasively using 64-byte cache alignment, a.k.a. mechanical sympathy, the above results can be quite the surprise.

How can code optimized for CPU cache alignment be significantly slower than code which is supposedly less optimized?

While we do not have a definite answer, it is likely that algorithmic differences account for logback's better performance. Moreover, in a multicore CPU running many threads, concurrent access requires use of the shared L3 cache, perhaps mitigating the advantages of L1 and L2 caches.

Replication of results

The results have been replicated by the author on several other computers, including an Intel i7-8565U running Windows, an Intel Xeon E5-2650L running Linux under a KVM hypervisor and an AMD A10 Micro-6700T running Linux.