|
|||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||
In software engineering, performance analysis, more commonly profiling, is the investigation of a program's behavior using information gathered as the program runs (i.e. it is a form of dynamic program analysis, as opposed to static code analysis). The usual goal of performance analysis is to determine which parts of a program to optimize for speed or memory usage.
Use of profilersA profiler is a performance analysis tool that measures the behavior of a program as it runs, particularly the frequency and duration of function calls. (An instruction set simulator which is also - by necessity - a profiler can measure the totality of a programs behaviour from invokation to termination.) The output may be a stream of recorded events (a trace) or a statistical summary of the events observed (a profile) or an ongoing interaction with the hypervisor. Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, and performance counters. The usage of profilers is called out in the performance engineering process. As the summation in a profile often is done related to the source code positions where the events happen, the size of measurement data is linear to the code size of the program. In contrast, the size of a (full) trace is linear to the program's execution time, making it somewhat impractical. For sequential programs, a profile is usually enough, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring the full trace to get an understanding of the problem.
HistoryPerformance analysis tools existed on IBM/360 and IBM/370 platforms from the early 1970s, usually based on timer interrupts which recorded the Program status word (PSW) at set timer intervals to detect "hot spots" in executing code. This was an early example of sampling (see below). In early 1974, Instruction Set Simulators permitted full trace and other performance monitoring features. Profiler-driven program analysis on Unix dates back to at least 1979, when Unix systems included a basic tool "prof" that listed each function and how much of program execution time it used. In 1982, gprof extended the concept to a complete call graph analysis (Gprof: a Call Graph Execution Profiler [1]) In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation published a paper describing ATOM [2]. ATOM is a platform for converting a program into its own profiler. That is, at compile time, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique, modifying a program to analyze itself, is known as "instrumentation". In 2004, both the Gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers of all time. [3] Profiler Types based on OutputFlat profilerFlat profilers compute the average call times, from the calls, and do not breakdown the call times based on the callee or the context. Call-Graph profilerCall Graph profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee. However context is not preserved. Methods of data gatheringEvent based profilersThe programming languages listed here have event-based profilers:
Statistical profilersSome profilers operate by sampling. A sampling profiler probes the target program's program counter at regular intervals using operating system interrupts. Sampling profiles are typically less accurate and specific, but allow the target program to run at near full speed. Some profilers instrument the target program with additional instructions to collect the required information. Instrumenting the program can cause changes in the performance of the program, causing inaccurate results and heisenbugs. Instrumenting can potentially be very specific but slows down the target program as more specific information is collected. The resulting data are not exact, but a statistical approximation. The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods. [4] Some of the most commonly used statistical profilers are GNU's gprof, Oprofile, AMD's CodeAnalyst and SGI's Pixie. Instrumentation
Hypervisor/Simulator
Simple manual techniqueA variation on the sampling technique, called "deep sampling", places less importance on accurate measurement, and more importance on achieving optimization. It requires no instrumentation and takes only a small number of random time samples (e.g. 20). This can be done by running the program under a debugger and interrupting it manually at unpredictable times using a Break key or Esc key. (These samples are taken during the time when the program is being subjectively slow. If needed, a temporary outer loop can be added, to make it run long enough to take manual samples.) At each sample, the entire call stack consisting of the program counter plus the stack of return addresses is recorded. Then the call stack samples are examined for addresses that appear on multiple samples. Any address that is on the call stack X% of the time identifies an instruction which, if removed, will save X% of running time. For example, if the program is spending X = 60% of it's time doing something wasteful, such as an unnecessary function call (whether distributed over one invocation or many), it will be seen doing that on 12 out of 20 samples, more or less. The more time is being wasted, the fewer samples are needed to see it. Since the exact instruction is indicated by the samples, there is no guesswork and no need to look elsewhere for the problem (no matter how large the software is). When it is repaired, a performance factor of 1/(1-X) is gained. In this case 1/(1-0.6) = 2.5x. The entire process can be repeated multiple times, until no more wasteful activities are captured, often resulting in substantial cumulative speedups. A name for a wasteful piece of program code is "slug" (slowness bug). It is not really a bug because it does not produce incorrect results. New programs generally contain both bugs and slugs. Bugs are usually removed during program testing, while slugs are usually not, unless performance analysis is employed during development. There are different kinds of slugs. Generally, things that could be done intentionally to make a program run longer can also occur unintentionally. One commonly accepted kind of slug is a hot spot, which is a tight inner loop where the program counter spends much of its time. For example, if one often finds at the bottom of the call stack a linear search algorithm instead of binary search, this would be a true hot spot slug. However, if another function is called in the search loop, such as string compare, then the string compare function would be found at the bottom of the stack, and the call to it in the loop would be at the next level up. In this case, the loop would still be a slug, but it would not be a hot spot. In all but the smallest programs, hot spot slugs are rare, but slugs are quite common. Data structures that are too general for the problem at hand might also impair performance. For example, if a collection of objects remains small, a simple array with linear search could be much faster than something like a "dictionary" class, complete with hash coding. With this kind of slug, the program counter is most often found in "housekeeping" such as dynamic memory allocation/de-allocation as these collections are being constructed and then 'destructed'. Another common motif is that a powerful function is written to collect a set of useful information (from a database, for example). Then that function is called multiple times, rather than taking the trouble to save the results from a prior call. A possible explanation for this could be that it is beyond a programmer's comprehension that a function call might take a million times as long to execute as an adjacent assignment statement. A contributing factor could be information hiding, in which external users of a module can be ignorant of what goes on inside it. See also
References
External links
|
| All Right Reserved © 2007, Designed by Stylish Blog. |