程序员社区

Identifying and Eliminating Performance Bottlenecks (code profiler)

When working with large programs, even knowing where to focus our optimization efforts can be difficult. In this section
we describe how to use code profilers, analysis tools that collect performance data about a program as it executes. We also present a general principle of system optimization known as Amdahl’s law.

Program Profiling

Program profiling involves running a version of a program in which instrumentation code has been incorporated to determine how much time the different parts of the program require. It can be very useful for identifying the parts of a program we should focus on in our optimization efforts. One strength of profiling is that it can be performed while running the actual program on realistic benchmark data.

Unix systems provide the profiling program

G

P

R

O

F

GPROF

GPROF. This program generates two forms of information. First, it determines how much CPU time was spent for each of the functions in the program. Second, it computes a count of how many times each function gets called, categorized by which function performs the call. Both forms of information can be quite useful. The timings give a sense of the relative importance of the different functions in determining the overall run time. The calling information allows us to understand the dynamic behavior of the program.

Profiling with

G

P

R

O

F

GPROF

GPROF requires three steps, as shown for a C program prog.c, which runs with command line argument file.txt:

  1. The program must be compiled and linked for profiling. With gcc (and other C compilers) this involves simply including the run-time flag ‘-pg’ on the command line:
    unix> gcc -O1 -pg prog.c -o prog
  2. The program is then executed as usual:
    unix> ./prog file.txt
    It runs slightly (around a factor of 2) slower than normal, but otherwise the only difference is that it generates a file gmon.out.
  3. G

    P

    R

    O

    F

    GPROF

    GPROF is invoked to analyze the data in gmon.out.
    unix> gprof prog

The first part of the profile report lists the times spent executing the different functions, sorted in descending order. As an example, the following listing shows this part of the report for the three most time-consuming functions in a program:

% 		cumulative 		self 					self 	total
time 	  seconds 	 	seconds 	calls 		s/call 	s/call 	name
97.58 		173.05 		173.05 		1 			173.05 	173.05 	sort_words
2.36 		177.24 		4.19 		965027 		0.00 	0.00 	find_ele_rec
0.12 		177.46 		0.22 		12511031 	0.00 	0.00 	Strlen

Each row represents the time spent for all calls to some function. The first column indicates the percentage of the overall time spent on the function. The second shows the cumulative time spent by the functions up to and including the one on this row. The third shows the time spent on this particular function, and the fourth shows how many times it was called (not counting recursive calls). In our example, the function sort_words was called only once, but this single call required 173.05 seconds, while the function find_ele_rec was called 965,027 times (not including recursive calls), requiring a total of 4.19 seconds. Function Strlen computes the length of a string by calling the library function strlen. Library function calls are normally not shown in the results by gprof. Their times are usually reported as part of the function calling them. By creating the “wrapper function” Strlen, we can reliably track the calls to strlen, showing that it was called 12,511,031 times, but only requiring a total of 0.22 seconds.

The second part of the profile report shows the calling history of the functions. The following is the history for a recursive function find_ele_rec:

						  158655725 		find_ele_rec [5]
				4.19 0.02 965027/965027 	insert_string [4]
[5] 	2.4 	4.19 0.02 965027+158655725 	find_ele_rec [5]
				0.01 0.01 363039/363039 	new_ele [10]
				0.00 0.01 363039/363039 	save_string [13]
						  158655725 		find_ele_rec [5]

This history shows both the functions that called find_ele_rec, as well as the functions that it called. The first two lines show the calls to the function: 158,655,725 calls by itself recursively, and 965,027 calls by function insert_string (which is itself called 965,027 times). Function find_ele_rec in turn called two other functions, save_string and new_ele, each a total of 363,039 times.

/576

赞(0) 打赏
未经允许不得转载:IDEA激活码 » Identifying and Eliminating Performance Bottlenecks (code profiler)

一个分享Java & Python知识的社区