JCrete2013: Your profiler is lying to you
Convenor: Kirk Pepperdine
Talk motivated by audience members reporting good experiences with profilers, alongside stories of tuning done by folklore.
We have to be critical of profiler results. One good approach is to know what to look for *before* actually starting a profiler. There is no profiler that does not lie: for instance, Kirk ran 5 profilers on one application; 2 of them agreed on the main issue and the other 3 each reported different problems.
Some profilers come with a default configuration that hides the very problem you are looking for. Even with a proper environment, the right filters and the right profiler, it can still lie to you.
Example of a lie: sampling requires a safepoint, so threads can run into trouble without the profiler reporting it if they never reach a safepoint. No one actually knows exactly how often code gets seeded with safepoint polls. They are put in obvious places like I/O operations and JNI calls, but it is possible to completely freeze a JVM with code that contains no safepoint.
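A minimal sketch of that effect (class and method names are my own): on HotSpot, a loop with an `int` counter is typically compiled as a "counted loop" with no safepoint poll on the back edge, so a safepoint-based sampler cannot interrupt the thread until the loop finishes, while the `long`-counted variant usually keeps its polls and stays visible.

```java
public class SafepointDemo {

    // int-counted loop: HotSpot's JIT usually treats this as a "counted loop"
    // and emits no safepoint poll inside it, so a safepoint-based sampler
    // cannot stop the thread mid-loop (the loop is invisible to it).
    static long intCounted(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // long counter: not a counted loop, so the back edge normally keeps its
    // safepoint poll and the loop remains visible to safepoint-based tools.
    static long longCounted(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(intCounted(1_000_000));
        System.out.println(longCounted(1_000_000L));
    }
}
```

Whether the poll is actually elided depends on the JVM version and JIT decisions, so treat this as an illustration, not a guarantee.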
Profilers use the same safepoint mechanism as GC, JIT compilation, biased locking, deoptimization, stack-trace generation... Now the tricky part: take a thread dump and you can see two threads locking the same monitor. That means their stacks were captured at different times, but since stack-trace generation requires a safepoint, how do we explain that?
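The safepoint activity itself can be made visible on HotSpot of that era (JDK 7/8); a sketch of the relevant diagnostic flags, where `app.jar` is a placeholder for your application:

```shell
# Report how long application threads were stopped (all safepoints, not just GC)
# and print per-safepoint statistics: what triggered each one and how long
# reaching the safepoint (time-to-safepoint) took.
java -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintSafepointStatistics \
     -XX:PrintSafepointStatisticsCount=1 \
     -jar app.jar
```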
Tip: turn on the memory profiler with every CPU profiling run so that allocation hot spots are more visible.
It is really hard to understand what is going on in a system without remembering queuing theory. There is no silver bullet for system tuning: you have to know the constraining resource of your system to size it properly. We used to be able to define the number of threads in a system based on the number of CPUs/cores; today, a system is often constrained by network operations, not by CPU power.
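Queuing theory gives a first-order sizing rule via Little's Law (L = λ × W); a minimal sketch with invented numbers of what that implies for concurrency:

```java
public class LittlesLaw {

    // Little's Law: average concurrency L = arrival rate (lambda, req/s)
    // times average time in system (W, seconds). When the constraining
    // resource is the network, W is dominated by waiting, not CPU time,
    // so CPU/core count alone is the wrong sizing input.
    static double requiredConcurrency(double arrivalRatePerSec, double avgTimeInSystemSec) {
        return arrivalRatePerSec * avgTimeInSystemSec;
    }

    public static void main(String[] args) {
        // e.g. 200 requests/s, each spending 50 ms in the system:
        // roughly 10 requests in flight on average, so ~10 workers needed.
        System.out.println(requiredConcurrency(200.0, 0.050));
    }
}
```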
Microbenchmarking is dangerous if you measure a piece of code that saturates the CPU, because you might be looking at the wrong spot. However, you can benchmark your infrastructure to learn your bandwidth and estimate the resources you can use.
At the end of the day, profiling is just about collecting data and statistically analyzing it. You want the bad data points to be weighted so that they become more visible.
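A minimal sketch of what keeping the bad data points visible can mean in practice (class name and numbers are invented): report a high percentile instead of a mean, so one pathological sample is not averaged away.

```java
import java.util.Arrays;

public class PercentileReport {

    // Nearest-rank percentile: sort the samples and pick the value at
    // rank ceil(p/100 * n). Unlike a mean, a high percentile keeps the
    // worst samples visible instead of diluting them.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // nine fast requests and one pathological one (latencies in ms)
        double[] latenciesMs = {1, 1, 1, 1, 1, 1, 1, 1, 1, 500};
        double mean = Arrays.stream(latenciesMs).average().getAsDouble();
        System.out.println("mean = " + mean);                    // ~51 ms: hides the outlier
        System.out.println("p99  = " + percentile(latenciesMs, 99)); // 500.0: exposes it
    }
}
```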
Our code is reshaped, inlined and optimized, and still we get a perfect stack trace of our original code every time. So we do not get information about the real code that is executed.
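To glimpse the real, reshaped code rather than the "perfect" stack trace, HotSpot can log what the JIT actually did; a sketch of the flags (`app.jar` is a placeholder, and the output format varies by JVM version):

```shell
# Log each JIT compilation, and which calls were inlined (or why they were not).
java -XX:+PrintCompilation \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
     -jar app.jar
```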
The next profiling step consists of looking at the CPU hardware counters and trying to correlate them with program execution. Intel VTune does not support Java properly anymore, and we lack some of the tooling C has for that level of detail. The JVM also lacks hooks to get low-level information like String table size, safepoint operations or code cache utilization.
The weak generational hypothesis does not really represent our applications anymore. A strong generational hypothesis can be applied to cache-intensive applications, which means current GC strategies might also be invalid. Currently, Azul Zing's C4 is the only GC that addresses these issues.