From WikiEducator
Convenor: Martin Thompson

We want short feedback loops from our CI platforms. Can we get the same for capacity planning, with some form of continuous estimation of the resources needed to handle a given load?

Problem: it increases development time up front. But at the end of the day, it speeds up development by:
  • helping the developers build another mental model of the application
  • fixing performance issues earlier
  • enabling proper capacity planning

So how can we make sure that everyone in a team builds the right mental model? How can we make sure that collective ownership of the code also applies to performance?

Most of an application's value lies in the parts that change often, and those are the parts we are always working on. So we should focus our performance tests on the parts that actually make us money.

Onion approach:
  • profile the application from the outside, with a high-level mental model
  • then go deeper

Use a build wall and a performance wall:
  • Build wall as usual
  • Performance wall telling you whether you meet your requirements, plus TRENDS (are you trending up, or sitting right below the threshold?)
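As a rough illustration of what a performance wall could compute, here is a minimal Java sketch. It assumes one latency measurement per build (lower is better) and fits a straight line through the recent builds to see whether the trend would cross the threshold soon. The class name `PerformanceWall`, the three statuses and the `lookaheadBuilds` parameter are all made up for this example; a real wall would probably want something more robust than a linear fit.

```java
import java.util.List;

/** Hypothetical sketch of the check behind a "performance wall". */
final class PerformanceWall {

    enum Status { GREEN, TRENDING_TOWARDS_THRESHOLD, RED }

    /** Least-squares slope of the values against the build index 0..n-1. */
    static double slopePerBuild(List<Double> values) {
        int n = values.size();
        if (n < 2) {
            return 0.0; // a single point has no trend
        }
        double meanX = (n - 1) / 2.0;
        double meanY = values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (values.get(i) - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    /**
     * latenciesMs: one measurement per build, oldest first (assumption: lower is better).
     * lookaheadBuilds: how many builds ahead the trend is extrapolated (assumption).
     */
    static Status evaluate(List<Double> latenciesMs, double thresholdMs, int lookaheadBuilds) {
        if (latenciesMs.isEmpty()) {
            throw new IllegalArgumentException("no measurements");
        }
        double latest = latenciesMs.get(latenciesMs.size() - 1);
        if (latest > thresholdMs) {
            return Status.RED; // requirement already broken
        }
        double projected = latest + slopePerBuild(latenciesMs) * lookaheadBuilds;
        return projected > thresholdMs ? Status.TRENDING_TOWARDS_THRESHOLD : Status.GREEN;
    }

    public static void main(String[] args) {
        // Still under the 100 ms requirement, but gaining 4 ms per build.
        List<Double> p99 = List.of(80.0, 84.0, 88.0, 92.0, 96.0);
        System.out.println(evaluate(p99, 100.0, 3)); // prints TRENDING_TOWARDS_THRESHOLD
    }
}
```

The point of the middle status is that a build can pass its requirement today and still deserve attention.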

Be careful not to start optimizing prematurely!

Also, this assumes that the performance testing itself is correct:
  • Diverse dataset
  • Production-like volume of data
  • Production-like scenarios
  • Do not microbenchmark

For instance, if you test your system with only 2 clients at a time and define throughput = 1/response time, everything may fit in your L3 cache during the test, but once in production, your application breaks.
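To make the arithmetic concrete: throughput = 1/response time only holds when a single request is in flight. For a closed test with N concurrent clients and no think time, Little's Law gives throughput = N / response time. The small Java sketch below just evaluates that formula; the class and method names are hypothetical, and the response times in `main` are invented for illustration, not measurements.

```java
/** Hypothetical helper applying Little's Law to a closed load test. */
final class ClosedLoopThroughput {

    /** Little's Law with zero think time: X = N / R. */
    static double requestsPerSecond(int concurrentClients, double meanResponseTimeSeconds) {
        if (concurrentClients <= 0 || meanResponseTimeSeconds <= 0.0) {
            throw new IllegalArgumentException("clients and response time must be positive");
        }
        return concurrentClients / meanResponseTimeSeconds;
    }

    public static void main(String[] args) {
        // Invented numbers: 2 test clients seeing 0.25 s each.
        System.out.println(requestsPerSecond(2, 0.25));
        // Invented numbers: 200 production clients; the response time is assumed
        // to have degraded to 2 s because the working set no longer fits in cache.
        System.out.println(requestsPerSecond(200, 2.0));
    }
}
```

The formula only relates the numbers you measured. The response time at production concurrency has to be measured at production concurrency; it cannot be extrapolated from the 2-client run.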

Also, when you fix a problem, make sure the fix improves overall performance and does not create an even worse bottleneck. You have to be able to back off if it was not a good decision.

For instance, if you have 3 problems and fixing #1 and #2 makes #3 vanish, then life is good. However, if fixing #1 and #2 puts so much load on #3 that the system breaks, then back off if #3 is too expensive to fix.

People are too specialized. We have to know a bit of everything (hardware, OS, JVM, virtualization, frameworks).

Missing abstractions (a business idea that you cannot find in a single place in the code), feature envy and lack of cohesion in the code bring bad performance, yet they were introduced for the sake of performance! Even worse, they make the system fragile, and then any change breaks multiple features.

Aim for high cohesion, low coupling and small composable methods, and most of the time the code will be fast thanks to the JIT making good decisions. If even you cannot understand a method, it is very unlikely that the JIT compiler will be able to do anything with it.

Card marking (used by the GC to avoid scanning the whole heap) often causes false sharing between threads and ties threads together, so you have to enable conditional card marking (-XX:+UseCondCardMark on HotSpot). The Fork/Join framework is going to suck, since conditional card marking is disabled by default.
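The difference between the two barriers can be sketched with a toy model in Java. This is not HotSpot's implementation, only an illustration of the idea: the 512-byte card size, the `CLEAN`/`DIRTY` values and the class name `CardTableModel` are assumptions made for the example. What matters is that the unconditional barrier writes to the card table on every reference store, while the conditional one reads first and writes only when the card is still clean.

```java
/** Toy model of a GC card table; NOT HotSpot's actual implementation. */
final class CardTableModel {
    static final int CARD_SHIFT = 9;   // assumption: 512-byte cards
    static final byte CLEAN = 0;       // illustrative values only
    static final byte DIRTY = 1;

    final byte[] cards;
    long cardStores;                   // writes actually issued to the card table

    CardTableModel(int heapBytes) {
        cards = new byte[(heapBytes >>> CARD_SHIFT) + 1];
    }

    /** Unconditional barrier: every reference store also writes the card byte. */
    void markUnconditional(int address) {
        cards[address >>> CARD_SHIFT] = DIRTY;
        cardStores++;
    }

    /** Conditional barrier: read first, write only if the card is still clean. */
    void markConditional(int address) {
        int index = address >>> CARD_SHIFT;
        if (cards[index] != DIRTY) {
            cards[index] = DIRTY;
            cardStores++;
        }
    }

    public static void main(String[] args) {
        CardTableModel unconditional = new CardTableModel(4096);
        CardTableModel conditional = new CardTableModel(4096);
        // 1000 reference stores alternating between two neighbouring cards.
        for (int i = 0; i < 1000; i++) {
            int address = (i % 2) * 512;
            unconditional.markUnconditional(address);
            conditional.markConditional(address);
        }
        System.out.println("unconditional card stores: " + unconditional.cardStores); // 1000
        System.out.println("conditional card stores: " + conditional.cardStores);     // 2
    }
}
```

Neighbouring card bytes share a cache line, so with the unconditional barrier threads that write references into nearby heap regions keep invalidating each other's copy of that line. With the conditional barrier most of those writes become reads, and the line can stay shared.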

Use network probes in production to get an idea of the shape of the data being transferred and of the packet sizes. On test platforms, make sure these probes generate requests that can be replayed.

Know that load injectors suck. Find one that you know well and compensate for its suckiness, because it will fail you at some point.

Design your application so that it knows its queue lengths, its resource utilization, everything related to how it performs. Then build reports on top of that.
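One way to read this, sketched in plain Java with only the standard library: wrap each internal queue so that it tracks its own depth, high-water mark and rejections, and let the reporting layer read those numbers. The class `InstrumentedQueue` and its method names are hypothetical, and the counters chosen here are just one plausible set.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical bounded work queue that exposes the numbers a report would need. */
final class InstrumentedQueue<T> {
    private final BlockingQueue<T> queue;
    private final int capacity;
    private final LongAdder accepted = new LongAdder();
    private final LongAdder rejected = new LongAdder();
    private final AtomicLong maxDepth = new AtomicLong();

    InstrumentedQueue(int capacity) {
        this.capacity = capacity;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking enqueue; a rejection means consumers are not keeping up. */
    boolean offer(T item) {
        if (!queue.offer(item)) {
            rejected.increment();
            return false;
        }
        accepted.increment();
        // Record the high-water mark (approximate under concurrency).
        maxDepth.accumulateAndGet(queue.size(), Math::max);
        return true;
    }

    /** Returns the next item, or null if the queue is empty. */
    T poll() {
        return queue.poll();
    }

    int depth() { return queue.size(); }
    double utilization() { return (double) queue.size() / capacity; }
    long accepted() { return accepted.sum(); }
    long rejected() { return rejected.sum(); }
    long maxDepth() { return maxDepth.get(); }

    /** One-line snapshot that a periodic reporter could log or publish. */
    String snapshot() {
        return "depth=" + depth() + " maxDepth=" + maxDepth()
                + " accepted=" + accepted() + " rejected=" + rejected();
    }

    public static void main(String[] args) {
        InstrumentedQueue<String> q = new InstrumentedQueue<>(2);
        q.offer("a");
        q.offer("b");
        q.offer("c"); // rejected: the queue is full
        System.out.println(q.snapshot());
    }
}
```

A scheduled task that logs `snapshot()` every few seconds would already give a basic report of queue behaviour over time.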

The 'Metrics' project released by Yammer is actually pretty good at giving a lot of valuable information about a system. Netflix projects are also very good at that. Coda Hale has given lots of talks about measuring the performance of systems.

Debate about sampling: good or bad? Are you going to catch statistically significant results before they kill your system? Otherwise you will need a lot of infrastructure to support those measurements!

