JCrete2013:Your profiler is lying to you

Your profiler is lying to you
Convenor: Kirk Pepperdine
  • Marek Nowicki
  • Pierre Laporte
  • Angelo D'Agnano
  • Dmitry Vyazelenko
  • Chris Newland
  • Stelios Ntilis
  • ...
Day 2, Session 2 - Your profiler is lying to you - JCrete2013

The talk was motivated by the audience's good experiences with profilers and by reports of tuning by folklore.

Profiling implies benchmarking, since no one wants to profile production. The problem is that a profiler will always find something wrong in your application.

We have to be critical of profiler results. One good approach is to know what to look for *before* actually starting a profiler. There is no profiler that does not lie. For instance, Kirk ran 5 profilers on one application: 2 of them agreed on the main issue and the other 3 reported other problems.

Some profilers come with a default configuration that hides the problem you are looking for. Even with a proper environment, the right filters and the right profiler, the results can still lie to you.

Thanks to Martin, there is a mailing list that brings interested people together. It has allowed us to find words that describe such facts simply, like the coordinated omission we talked about yesterday.

Example of a lie: sampling requires a safepoint, so a thread that does not reach a safepoint can be in trouble without the profiler ever reporting it. No one knows exactly how often code gets salted with safepoints. They are put in obvious places such as I/O operations and JNI calls, but it is possible to completely freeze a JVM with code that contains no safepoint.

Profilers use the same safepoints as GC, JIT compilation, biased locking, deoptimization, stack trace generation... Now the tricky part is when you take a thread dump and see two threads shown as holding the same monitor. It means they were stopped at different times, but since stack trace generation requires a safepoint, how can we explain that?

Turn on the memory profiler with every CPU profiling run so that allocation hot spots are more visible.

It is really hard to understand what is going on in a system without remembering queuing theory. There is no silver bullet for system tuning: you have to know the constraining resources of your system to size it properly. In the past, we could define the number of threads in a system based on the number of CPUs or cores. Today, a system is often constrained by network operations, not by CPU power.
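As a rough illustration of the queuing-theory point, Little's Law (L = λ × W) relates the average number of requests in flight to the arrival rate and the average time each request spends in the system. The sketch below uses it to estimate a thread count. The class name, the method name and the figures are hypothetical, and it assumes a steady-state system in which every in-flight request holds exactly one thread.

```java
/** Hypothetical helper: estimates threads in use from Little's Law, L = lambda * W. */
final class LittlesLawSizing {

    /**
     * @param arrivalRatePerSecond   average requests arriving per second (lambda)
     * @param avgTimeInSystemSeconds average time a request holds a thread, I/O waits included (W)
     * @return average number of requests in flight, rounded up
     */
    static int threadsInFlight(double arrivalRatePerSecond, double avgTimeInSystemSeconds) {
        if (arrivalRatePerSecond < 0 || avgTimeInSystemSeconds < 0) {
            throw new IllegalArgumentException("rate and time must be non-negative");
        }
        return (int) Math.ceil(arrivalRatePerSecond * avgTimeInSystemSeconds);
    }

    public static void main(String[] args) {
        // Assumed figures: 200 requests/s, each spending 250 ms mostly waiting on the network.
        System.out.println(threadsInFlight(200.0, 0.25));
    }
}
```

With these assumed figures the estimate is 50 threads, far more than a rule based on core count would give on a typical server, because most of the time is spent waiting on the network and not computing.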

Microbenchmarking is dangerous if you monitor a piece of code that saturates the CPU, because you might be looking at the wrong spot. However, you can benchmark your infrastructure to learn your bandwidth and estimate the resources you can use.

At the end of the day, profiling is just about collecting data and statistically analyzing it. You want the bad data points to be weighted so that they are more visible.
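One simple way to make bad data points more visible is to report high percentiles next to the mean. The sketch below is only an illustration of that idea, not a method presented in the session. The class name and the sample data are hypothetical, and it uses the nearest-rank definition of a percentile.

```java
import java.util.Arrays;

/** Hypothetical helper comparing the mean with nearest-rank percentiles. */
final class LatencyStats {

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile: the smallest sample such that at least p percent of samples are <= it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    /** Assumed data set: 98 fast responses of 1 ms and 2 stalls of 1000 ms. */
    static long[] sampleData() {
        long[] samples = new long[100];
        Arrays.fill(samples, 1L);
        samples[10] = 1000L;
        samples[60] = 1000L;
        return samples;
    }

    public static void main(String[] args) {
        long[] samples = sampleData();
        System.out.println("mean = " + mean(samples));
        System.out.println("p50  = " + percentile(samples, 50));
        System.out.println("p99  = " + percentile(samples, 99));
    }
}
```

On this assumed data the median is 1 ms and the mean is about 21 ms, which describes none of the requests, while the 99th percentile lands on the 1000 ms stalls.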

Jolokia is a nice way of accessing JMX data on a remote server over HTTP.
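To show the kind of data involved, the sketch below reads the `HeapMemoryUsage` attribute of the standard `java.lang:type=Memory` MBean from the local platform MBean server using plain JMX. It does not use Jolokia itself: a Jolokia agent exposes this same MBean attribute over HTTP, and the agent setup is not covered here. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

/** Hypothetical helper: reads used heap bytes through the local platform MBean server. */
final class HeapUsageViaJmx {

    static long usedHeapBytes() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            // HeapMemoryUsage is exposed as CompositeData with init, used, committed and max.
            CompositeData usage = (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException e) {
            throw new IllegalStateException("could not read java.lang:type=Memory", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("used heap bytes: " + usedHeapBytes());
    }
}
```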

We have to be careful when we instrument code, since it is quite common to generate a lot more garbage by turning on a profiler, which increases pressure on the GC and hides the other important problems.

Our code is reshaped, inlined and optimized, and still we get a perfect stack trace of our original code every time. So we do not get information about the real code that is executed.

The next profiling step consists of looking at the CPU counters and trying to correlate them with program execution. However, Intel VTune does not support Java properly anymore, and we lack some of the tooling C has for that level of detail. The JVM also lacks hooks to get low-level information like String table size, safepointing operations or code cache utilization.

The PermGen going away might also be a huge problem in the coming years, because part of it is going to the heap and part of it to Metaspace. The class loading model is not good at all, but it was contained in Perm; now it is being released into the wild. So with a class leak, the problem space will grow but will not be reported by heap reporting.
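Because this growth happens outside the heap, it helps to look at the non-heap memory pools as well. The sketch below lists them through the standard `java.lang.management` API. On HotSpot versions that have Metaspace, it is expected to show up as one of these pools, but pool names are JVM-specific, so that is an assumption and not a guarantee. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports used bytes for every non-heap memory pool. */
final class NonHeapPools {

    static Map<String, Long> usedBytesByNonHeapPool() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // heap pools are what ordinary heap reporting already covers
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (usage != null) {
                result.put(pool.getName(), usage.getUsed());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        usedBytesByNonHeapPool().forEach(
                (name, used) -> System.out.println(name + ": " + used + " bytes"));
    }
}
```

Sampling these values over time and watching for a pool that only ever grows is one way to notice a class leak that a heap report would not show.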

The weak generational hypothesis no longer really represents our applications. The strong generational hypothesis can be applied to cache-intensive applications, which means GC strategies might also be invalid. Currently, Zing's C4 is the only GC to address these issues.

