Perfmon2: a leap forward in performance monitoring

Abstract
This paper describes the software component, perfmon2, that is about to be added to the Linux kernel as the standard interface to the Performance Monitoring Unit (PMU) on common processors, including x86 (AMD and Intel), Sun SPARC, MIPS, IBM Power and Intel Itanium. It also describes a set of tools for doing performance monitoring in practice and details how the CERN openlab team has participated in the testing and development of these tools.

1. Introduction: Justifying performance monitoring
There are multiple reasons why performance tuning of an application or a subsystem is still worth the effort, even in an era where hardware is seen as "cheap" and manpower is considered expensive. The first case is when computers are purchased for tens of millions of euros, so that savings of even a few percent can compensate for the salaries of the people doing performance work. A second case, which is fairly recent, is when computer centres fill up to their power and thermal limits, with the consequence that no more servers can be installed. If additional capacity is nevertheless needed, one may be forced either to replace some of the equipment with more thermally efficient hardware (if it exists) or to turn to performance tuning in order to squeeze more performance out of the existing systems. A third case is when high-cost personnel wait for computers to complete their calculations. Percentage gains in turn-around time then translate directly into more efficient use of manpower. An additional incentive is no doubt the personal pride of the software designer or programmer.

Ideally, one wants performance analysis to be performed throughout the entire development cycle of an application, so that the application does not exhibit "inefficient" behaviour or excessive consumption of computing resources.

2. The initial Itanium development
When the Itanium processor was designed (jointly by HP and Intel), a Performance Monitoring Unit (PMU) was added to the processor architecture as a complete and consistent facility. The PMU presented a well-defined interface to the operating system for both the programming of the unit and the corresponding data collection. For counting events, a vast number of counters were added, so that, for instance, every cycle in the execution pipelines could be accounted for or every action in the cache hierarchy could be explained. In addition, several advanced features, such as Branch Trace Buffers, were introduced. When Linux was initially ported to the Itanium in the late nineties (1), Stéphane Eranian from HP Labs took on the task of developing the required software to exploit the PMU in order to monitor the