Debugging and Optimizing
This page is dedicated to debugging methods for HPC codes. These tools and tips come from my own experience; HPC developers should know these basic options to save time in their work. All the methods here provide a way to trace a bug, that is, to find the exact line of code that generates it.
Compilers used : gfortran and gcc (GNU), ifort and icc (Intel).
Tools used : Valgrind and gdb.
When developing HPC programs, the bugs encountered are often the same. Here is a list of the most common bugs :
There are many other types of bugs, but these are the most common and the easiest to solve with the appropriate tools.
How to get the exit code of a program ?

    ~$ gfortran myokprog.f90
    ~$ ./a.out
     Hello world !
    ~$ echo $?
    0

    ~$ gfortran mybugprog.f90
    ~$ ./a.out
    Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
    Backtrace for this error:
    #0 0x7FFC993C87D7
    #1 0x7FFC993C8DDE
    #2 0x7FFC9901FC2F
    Segmentation fault (core dumped)
    ~$ echo $?
    139

Most of the time, these compilation options will find your bug :
Compiler | Compiler options |
---|---|
gfortran | -Wuninitialized -O -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid -fbounds-check -fimplicit-none -ftrapv |
gcc | -g -Wall |
ifort | -g -traceback -fpe0 -check all -ftrapuv -fp-stack-check -warn all -no-ftz |
icc | Test 1 : -g -traceback -check=uninit -fp-stack-check -no-ftz Test 2 : -g -traceback -check-pointers=rw |
For C code, try the FPE strategy (see below).
If this is not enough, compile with :
Compiler | Compiler options |
---|---|
gfortran | -g -fbacktrace |
gcc | -g |
ifort | -g -traceback |
icc | -g -traceback |
And launch the program with valgrind :
~$ valgrind ./myprog.exe
Most of the time this will catch the error. If not, see the tables below.
There are three types of FPE :
Behavior : by default, an FPE does not generate any error, neither at compilation time nor at runtime (GCC/INTEL).
Compiler | Way to trace bug |
---|---|
gfortran | Compiler flags : -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid. The fpe will be explicitly displayed at runtime. |
ifort | Compiler flags : -g -traceback -fpe0. The fpe will be explicitly displayed at runtime. |
Example :

    program myprog
      implicit none

      real(8) :: d1,d2,d3
      d2 = 10.0d0
      d3 = 0.0d0
      d1 = d2 / d3
    end program myprog

    ~$ gfortran -g -fbacktrace -ffpe-trap=zero,underflow,overflow,invalid myprog.f90
    ~$ ./a.out
    Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
    Backtrace for this error:
    #0 0x7FC0FE8877D7
    #1 0x7FC0FE887DDE
    #2 0x7FC0FE4DEC2F
    #3 0x4006DD in myprog at myprog.f90:7
    Floating point exception (core dumped)

Bug is at line 7.
Compiler | Way to trace bug |
---|---|
gcc and icc | Add #define _GNU_SOURCE and #include <fenv.h> at the top of the main source file (feenableexcept is a GNU extension), then call feenableexcept(FE_DIVBYZERO| FE_INVALID|FE_OVERFLOW); just after the start of main. Compiler flags : -g. The FPE will generate a floating point error at runtime. Then use gdb to get information on the code line generating the FPE. |
Example :

    #define _GNU_SOURCE   /* feenableexcept is a GNU extension */
    #include <fenv.h>
    int main(int argc, char **argv)
    {
      feenableexcept(FE_DIVBYZERO| FE_INVALID|FE_OVERFLOW);   /* trap these FPEs */
      double d1,d2,d3;
      d2 = 10.0;
      d3 = 0.0;
      d1 = d2 / d3;   /* division by zero: raises SIGFPE */
    }

    ~$ gcc -g myfile.c -lm
    ~$ ./a.out
    Floating point exception (core dumped)
    ~$ gdb a.out
    (gdb) run
    Starting program: /home/spehn/a.out
    Program received signal SIGFPE, Arithmetic exception.
    0x0000000000400637 in main (argc=1, argv=0x7fffffffdf78) at myfile.c:9
    9         d1 = d2 / d3;
    (gdb)

Bug is at line 9.
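This bug occurs when the program reads a variable that has not been initialized. The program may not stop, and all subsequent calculations are then based on an arbitrary value. This is common in MPI programs (ghost cells, etc.).
Three main types of uninitialized variables : static variables, dynamic (allocated) variables, and variables that were never allocated.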
Behavior :
Valgrind's Memcheck lets the program run and use uninitialized values, keeping track of these operations. It only complains when such a value “goes out” of the program (printing in the terminal, writing in a file, etc.). The error is reported at the line of this print/write. To get more information on the uninitialized variable, use the Valgrind flag --track-origins=yes.
Compiler | Variable type | Way to trace bug |
---|---|---|
gfortran | static variable | Compiler options : -Wuninitialized -O -g -fbacktrace. A warning is displayed at compilation time. To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”. |
gfortran | dynamic variable | Compiler options : -g -fbacktrace. Use Valgrind --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”. |
gfortran | not allocated variable | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
ifort | static variable | Compiler options : -check all. The error will be explicitly displayed at runtime. To replace all uninitialized values by a huge number, use -ftrapuv. |
ifort | dynamic variable | Compiler options : -g -traceback. Use Valgrind --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”. |
ifort | not allocated variable | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |

    program myprog
      implicit none

      real(8) :: d1,d2

      d1 = d2*10.0d0
    end program myprog

    ~$ gfortran -Wuninitialized -g -fbacktrace myprog.f90
    myprog.f90: In function ‘myprog’:
    myprog.f90:6:0: warning: ‘d2’ is used uninitialized in this function [-Wuninitialized]
       d1 = d2*10.0d0
     ^
    ~$ ifort -fpp -Duninitstatic myprog.f90 -g -check all
    ~$ ./a.out
    forrtl: severe (193): Run-Time Check Failure. The variable 'myprog_$D2' is being used without being defined
    Image              PC                Routine            Line        Source
    a.out              0000000000402336  Unknown               Unknown  Unknown
    libc.so.6          00007F3785537EC5  Unknown               Unknown  Unknown
    a.out              0000000000402229  Unknown               Unknown  Unknown

Error is coming from variable D2. Adding -traceback would provide line information.

    program myprog

      real(8), allocatable, dimension(:) :: d1,d2

      allocate(d1(1:10), d2(1:10))
      d1(3) = d2(4)*10.0d0
      print *,d1(3),d2(4)
      deallocate(d1)
    end program myprog

    ~$ ifort myprog.f90 -g -traceback
    ~$ valgrind --track-origins=yes ./a.out
    [...]
    ==21655== Conditional jump or move depends on uninitialised value(s)
    ==21655==    at 0x448595: cvt_ieee_t_to_text_ex (in /home/sphen/Downloads/a.out)
    ==21655==    by 0x426F22: for__format_value (in /home/sphen/Downloads/a.out)
    ==21655==    by 0x40AD5A: for_write_seq_lis_xmit (in /home/sphen/Downloads/a.out)
    ==21655==    by 0x4025C6: MAIN__ (myprog.f90:7)
    ==21655==    by 0x402335: main (in /home/sphen/Downloads/a.out)
    ==21655==  Uninitialised value was created by a heap allocation
    ==21655==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==21655==    by 0x406518: for_alloc_allocatable (in /home/sphen/Downloads/a.out)
    ==21655==    by 0x4024C5: MAIN__ (myprog.f90:5)
    ==21655==    by 0x402335: main (in /home/sphen/Downloads/a.out)
    [...]

Error is at line 7 and variable was created at line 5.
Compiler | Variable type | Way to trace bug |
---|---|---|
gcc | static variable | Compiler options : -Wuninitialized or -Wall. A warning is displayed at compilation time. To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”. |
gcc | dynamic variable | Compiler options : -g. Use Valgrind --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”. |
gcc | not allocated variable | Compiler options : -Wuninitialized or -Wall. A warning is displayed at compilation time. To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”. For even more information, use gdb and ask for a backtrace. |
icc | static variable | Compiler options : -Wuninitialized. A warning is displayed at compilation time. With -g -traceback -check=uninit, the error will be explicitly displayed at runtime. |
icc | dynamic variable | Compiler options : -g -traceback. Use Valgrind --track-origins=yes. The error will be “Conditional jump or move depends on uninitialised value(s)”. |
icc | not allocated variable | Compiler options : -Wuninitialized. A warning is displayed at compilation time. With -g -traceback -check=uninit, the error will be explicitly displayed at runtime. |
Compiler | Error | Way to trace bug |
---|---|---|
gfortran | free a non allocated variable | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
gfortran | allocate an already allocated variable | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
gfortran | not freed memory | Compiler options : -g -fbacktrace. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
ifort | free a non allocated variable | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |
ifort | allocate an already allocated variable | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |
ifort | not freed memory | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
Compiler | Error | Way to trace bug |
---|---|---|
gcc | free a non allocated variable | Compiler options : -Wuninitialized or -Wall. A warning is displayed at compilation time. To get more information, use Valgrind. The error will be “Conditional jump or move depends on uninitialised value(s)”. |
gcc | allocate an already allocated variable | Compiler options : -g. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
gcc | not freed memory | Compiler options : -g. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
icc | free a non allocated variable | Compiler options : -Wuninitialized. A warning is displayed at compilation time. With -g -traceback -check=uninit, the error will be explicitly displayed at runtime. |
icc | allocate an already allocated variable | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
icc | not freed memory | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace -fbounds-check (spelled -fcheck=bounds on recent gfortran versions). The error will be explicitly displayed at runtime. |
ifort | Compiler options : -g -traceback -check all (or -check bounds). The error will be explicitly displayed at runtime. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use Valgrind, the error will be an “Invalid read/write of size 8/16”. Or patch gcc and recompile it with bounds checking (http://sourceforge.net/projects/boundschecking/) |
icc | Compiler options : -g -traceback -check-pointers=rw. The error will be explicitly displayed at runtime. Warning : when activated, -check-pointers=rw disables all other debugging options, be careful. |
IO errors are often very explicit, so there is no need for a debugging tool. However, Valgrind and the FPE options can detect some related errors (a bad read leads to a badly initialized value or to an FPE, etc.).
Do not forget to set -g -fbacktrace (gfortran) or -g -traceback (icc/ifort) to get useful error information.
Simply be careful and secure every read/write : retrieve its status code and check it.
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
ifort | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
icc | Compiler options : -g -traceback. Use Valgrind --leak-check=full. Look for LEAK SUMMARY, definitely lost. |
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with a backtrace, but gives little information. |
ifort | Compiler options : -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with a backtrace, but gives little information. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with a backtrace, but gives little information. |
icc | Compiler options : -g -traceback. Use Valgrind. Look for “Stack overflow in thread X” or “Access not within mapped region”. gdb will also catch it with a backtrace, but gives little information. |
Compiler | Way to trace bug |
---|---|
gfortran | Compiler options : -g -fbacktrace. The error will be explicitly displayed at runtime. |
ifort | Compiler options : -g -traceback. The error will be explicitly displayed at runtime. |
Compiler | Way to trace bug |
---|---|
gcc | Compiler options : -g. Use gdb. Ask for a backtrace after the error, it gives a lot of information. |
icc | Compiler options : -g -traceback -check-pointers=rw. The error will be explicitly displayed at runtime. Warning : when activated, -check-pointers=rw disables all other debugging options, be careful. |
This page is dedicated to debugging tools and options, and to optimizing tools and options. Of course there are other solutions, but I found these extremely useful.
Note that a code should be tested with a debugger after each new implementation/modification ! And one should never try to optimize a code without debugging it first.
I will start from scratch, assuming you do not know the basics.
There are many ways of debugging. The most common one is to print “hello1”, “hello2”, etc. everywhere in the code, see which was the last “hello” displayed, and converge to the bug by iteration. This “brute force” method can be used in some cases, but it requires a lot of time, and some errors (memory errors most of the time) appear randomly, which makes them difficult to locate.
First, backup your code. When debugging, you may do bad things.
Then the best approach is to apply methods and tools iteratively :
In any case : do not debug for more than 3 hours in a row, and take a break every hour. Even if you think otherwise, you will no longer be efficient : you will make the code worse, you will fix your error but add others, and you will go crazy. Debug in the morning, and do something else in the afternoon.
Before using a debugger or an optimizing tool, the first things to use are compiler options (also called flags). They are sufficient for more than 90 percent of bugs, and the optimization levels they provide are enough for most serial (i.e. non multithread/multiprocess) codes.
The compilation flags I often use :
Debug (gfortran) :
-g -Wuninitialized -O -fbacktrace -fbounds-check -ffpe-trap=zero,underflow,overflow,invalid -ftrapv -fimplicit-none -fno-automatic
Preprocessing (gfortran) :
-cpp -DMYPREPRO
Light Debug (ifort) :
-g -debug -traceback -fp-stack-check
Hard Debug (ifort) :
-g -debug -traceback -check all -implicitnone -warn all -fpe0 -fp-stack-check -ftrapuv -heap-arrays -gen-interface -warn interface
Preprocessing (ifort) :
-fpp -DMYPREPRO
Important : in order to use the debugging tools presented below, gcc/g++/gfortran programs need to be compiled with the -g flag (and no other debug flags) plus the desired optimization flags. In the same way, icc/icpc/ifort programs need to be compiled with the -g -traceback flags plus the desired optimization flags.
Consider this program in Fortran (Fortran is very similar to C) :

    Program Bug

      implicit none
      real, allocatable, dimension(:) :: tab

      allocate(tab(1:10))
      tab(:) = 1.0
      call Buggy()
      deallocate(tab)

    contains

      Subroutine Buggy()

        print *, tab(11)
      End Subroutine Buggy

    End Program Bug

If it is compiled with no options, or with optimization options only, and then executed :
gfortran bug.f90
./a.out
You get, with no errors or warnings :
1.85398793E-40
The same happens with the ifort compiler. You know this result is absurd, and you want to locate the error. When compiled with debug options :

    gfortran bug.f90 -g -Wuninitialized -O -fbacktrace -fbounds-check -ffpe-trap=zero,underflow,overflow,invalid -ftrapv -fimplicit-none -fno-automatic
    ./a.out

You get :

    At line 15 of file bug.f90
    Fortran runtime error: Array reference out of bounds for array 'tab', upper bound of dimension 1 exceeded (11 > 10)

    Backtrace for this error:
      + function buggy (0x400A70)
        at line 15 of file bug.f90
      + function bug (0x400BE4)
        at line 9 of file bug.f90
      + /lib/libc.so.6(__libc_start_main+0xfd) [0x7fa49f04ac4d]

This is simple to use : you made an error at line 15 of file bug.f90, where the array tab is accessed at index 11 while its size is only 10 (in Fortran, arrays start at 1 by default).
Now, using ifort :

    ifort bug.f90 -g -debug -traceback -check all -implicitnone -warn all -fpe0 -fp-stack-check -ftrapuv -heap-arrays -gen-interface -warn interface
    ./a.out

    forrtl: severe (408): fort: (2): Subscript #1 of the array TAB has value 11 which is greater than the upper bound of 10

    Image              PC                Routine            Line        Source
    a.out              000000000046AA2E  Unknown               Unknown  Unknown
    a.out              00000000004694C6  Unknown               Unknown  Unknown
    a.out              0000000000422242  Unknown               Unknown  Unknown
    a.out              0000000000404AFB  Unknown               Unknown  Unknown
    a.out              0000000000405011  Unknown               Unknown  Unknown
    a.out              000000000040356E  bug_IP_buggy_              15  bug.f90
    a.out              0000000000403252  MAIN__                      8  bug.f90
    a.out              0000000000402B8C  Unknown               Unknown  Unknown
    libc.so.6          00007F9CBFFFBEA5  Unknown               Unknown  Unknown
    a.out              0000000000402A89  Unknown               Unknown  Unknown

This is also easy to understand (using the Line and Source columns, you can see that main calls buggy at line 8, and that buggy generates the error at line 15).
Using these methods, you can locate most bugs.
If this is not enough, or if your bug disappears when using these options (this can happen), then you may need to use a debugger.
Some say gdb is better, others say valgrind is better. In fact, both are good. I am just used to valgrind, so I will present this one. Note that valgrind can also be used to profile the code, check memory leaks, test cache use, etc. We will see that in the optimization section. Note also that valgrind supports MPI implementations if it was built with them. Last point : valgrind will slow down your execution A LOT and is extremely talkative. If the bug appears after a long run time and you know in which part of the code it occurs, you may use special flags to tell valgrind to monitor only this part (see the valgrind documentation).
Let's re-use our previous code. To use valgrind, you have to compile with the -g option, combined with the optimization flags if your code normally uses them.

    gfortran bug.f90 -g -O3
    valgrind ./a.out
    ==25150== Memcheck, a memory error detector
    ==25150== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
    ==25150== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==25150== Command: ./a.out
    ==25150==
    ==25150== Invalid read of size 4
    ==25150==    at 0x4F13EF0: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
    ==25150==    by 0x4F15AAE: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
    ==25150==    by 0x4F165FE: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0)
    ==25150==    by 0x40093B: MAIN__ (bug.f90:15)
    ==25150==    by 0x4007AC: main (bug.f90:9)
    ==25150==  Address 0x5c634e8 is 0 bytes after a block of size 40 alloc'd
    ==25150==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
    ==25150==    by 0x4008B1: MAIN__ (bug.f90:6)
    ==25150==    by 0x4007AC: main (bug.f90:9)
    ==25150==
       0.00000000
    ==25150==
    ==25150== HEAP SUMMARY:
    ==25150==     in use at exit: 0 bytes in 0 blocks
    ==25150==   total heap usage: 23 allocs, 23 frees, 12,076 bytes allocated
    ==25150==
    ==25150== All heap blocks were freed -- no leaks are possible
    ==25150==
    ==25150== For counts of detected and suppressed errors, rerun with: -v
    ==25150== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

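OK, this is more difficult to read, but valgrind locates nearly everything and is made for more advanced users, so you will have to get used to it. Here, “Invalid read of size 4” at bug.f90:15 means that a 4-byte real was read just after the end of a 40-byte block (10 reals of 4 bytes), and that this block was allocated at bug.f90:6.
Some of the errors reported by valgrind are :
A few last things on valgrind :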
To use it in parallel, using MPI :
mpirun -np 4 valgrind ./myprog.exe
Note that valgrind will display many identical errors, even when there is only one (because the same error may be triggered many times). Try to find the first error, and then use this message as a starting point.
But ! Some libraries (MPI libraries, etc.) also contain bugs, often at start-up, and valgrind will display them. I strongly suggest adding a print at the very beginning of your code (first line) and then, when analysing the valgrind output, ignoring the errors reported before this print.
To get the best performance from the compilers, and assuming you compile on the same computer architecture (CPU, motherboard, etc.) as the one you run calculations on, use the following options :
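Standard (gcc/gfortran) :
-O2 -march=native -mtune=native
Hard optimization (gcc/gfortran; use carefully, it may slow the code down or give wrong results) :
-O4 -ffast-math -fforce-addr -fstrength-reduce -frerun-cse-after-loop -fexpensive-optimizations -fcaller-saves -funroll-loops -funroll-all-loops -fno-rerun-loop-opt
Note that on recent GCC versions -O4 behaves as -O3, and some of these flags (-fforce-addr, -fstrength-reduce, -fno-rerun-loop-opt) are only kept for backward compatibility and have no effect.
Intel compilers (icc/ifort) :
-O2 -fast
If you get problems with -fast (probably because of missing static libraries), replace it with -xHost -no-prec-div -ipo. If you still have problems (at link time), replace -ipo with -ip.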