Optimizing code is a full-time job. This page gathers a few tips and the main compiler options for getting good performance.
Many tutorials are available on the Internet for going further.
Before attempting aggressive optimization, developers should track down common mistakes such as bad memory management, bad I/O, loop-independent operations left inside loops, stripping, etc. The algorithms used should also be checked.
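As an illustration of a loop-independent operation left inside a loop, here is a minimal sketch in C. The function names (total, weights_naive, weights_hoisted) are invented for this example, and the code assumes that out and w do not overlap. The naive version recomputes the same sum on every iteration, while the second version computes it once before the loop.

```c
#include <stddef.h>

/* Sum of the n values in w (helper for this example only). */
static double total(const double *w, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += w[i];
    }
    return s;
}

/* Naive version: total(w, n) does not depend on i, yet it is
 * recomputed on every iteration, so the loop costs O(n^2). */
void weights_naive(double *out, const double *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = w[i] / total(w, n);
    }
}

/* Hoisted version: the invariant sum is computed once, so the
 * loop costs O(n). Assumes out and w do not overlap. */
void weights_hoisted(double *out, const double *w, size_t n)
{
    const double t = total(w, n);
    for (size_t i = 0; i < n; i++) {
        out[i] = w[i] / t;
    }
}
```

Because out and w could alias as far as the compiler can tell, it generally cannot do this hoisting on its own, whatever the optimization flags. This is why such mistakes are worth fixing by hand before tuning compiler options.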
I try to keep these pages up to date, but some flags may be deprecated.
A good order for optimization work is:
Algorithm optimization (thinking future parallelization) ⇒ Code optimization ⇒ Parallelization
http://wiki.gentoo.org/wiki/GCC_optimization/en
https://software.intel.com/en-us/articles/step-by-step-optimizing-with-intel-c-compiler
To get the most out of the compilers, and assuming you compile on the same architecture (CPU, motherboard, etc.) as the one that will run the calculations, use the following options:
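GCC, standard:
-O3 -march=native -mtune=native
GCC, hard optimization (use carefully: it may slow the code down or give wrong results):
-O4 -ffast-math -fforce-addr -fstrength-reduce -frerun-cse-after-loop -fexpensive-optimizations -fcaller-saves -funroll-loops -funroll-all-loops -fno-rerun-loop-opt
Note: GCC treats any level above -O3 as -O3, and in recent GCC releases -fforce-addr, -fstrength-reduce and -fno-rerun-loop-opt are accepted only for backward compatibility and do nothing.
Intel compilers:
-O3 -fast -xHost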
If -fast causes problems (probably because static libraries are missing), replace it with -O3 -xHost -no-prec-div -ipo. If linking still fails, replace -ipo with -ip. As a last resort, -O3 -xHost is enough if the other flags do not work properly.
Warning: if you need precision (for example in long computations such as CFD), do not use -no-prec-div; keep only -xHost. -no-prec-div may reduce the precision of division operations.
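To make the warnings above about wrong results and reduced precision more concrete, here is a small sketch in C of a routine that depends on exact floating-point semantics: compensated (Kahan) summation. It was picked as an illustration and is not part of the original tips; the function names are invented. The correction term c is algebraically zero, so flags that let the compiler reorder floating-point arithmetic (such as -ffast-math) may simplify it away and silently turn the routine back into a plain sum.

```c
#include <stddef.h>

/* Plain summation: rounding errors accumulate. */
double sum_naive(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Kahan (compensated) summation: c carries the low-order bits
 * lost by each addition and feeds them back into the next one. */
double sum_kahan(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = sum + y;
        /* Algebraically (t - sum) - y is zero; in IEEE arithmetic it
         * is the rounding error of the addition. A compiler allowed to
         * reassociate (e.g. with -ffast-math) may fold it to zero. */
        c = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

With strict IEEE double arithmetic, summing 1.0 followed by ten values of 1e-16 gives exactly 1.0 with sum_naive, because each small term is rounded away, while sum_kahan returns a value slightly above 1.0. Comparing the two functions with and without the aggressive flags is a simple way to check whether a given build still respects this kind of code.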
With GCC, to get information on what the compiler vectorizes, add:
-ftree-vectorizer-verbose=2
To change the verbosity, change the number at the end (0 to 6). Recent GCC releases deprecate this flag in favor of -fopt-info-vec.
With the Intel compilers, to get information on what the compiler vectorizes, add:
-qopt-report=1
Note: vec-report is now deprecated; use qopt-report instead. The report is saved in an .optrpt file.
To change the verbosity, change the number at the end (0 to 5), for example:
-qopt-report=5
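To try the reporting flags above, a small loop that compilers can usually vectorize is handy. The following is a minimal sketch in C; the function name daxpy and the file name used in the commands further down are chosen for this example only. The restrict qualifiers tell the compiler that x and y do not overlap, which removes a common obstacle to vectorization.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * restrict promises the compiler that x and y never overlap, so the
 * iterations are independent and can be processed several at a time
 * with SIMD instructions. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

Assuming the function is saved in a file named daxpy.c, it can be compiled with, for example, gcc -O3 -march=native -fopt-info-vec -c daxpy.c or icc -O3 -xHost -qopt-report=5 -c daxpy.c. The report should then say whether the loop was vectorized.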
To see which SIMD instructions are used, add -fcode-asm -Faasm.s to get the assembly generated by the compiler, written here to the file asm.s:
ifort -O3 -qopt-report=1 test.f90 -fcode-asm -Faasm.s
Then take a look at asm.s. For example, SSE2 instructions look like:
009a5 f2 44 0f 58 c7 addsd %xmm7, %xmm8 #test.f90:35.83
009aa f2 44 0f 59 c0 mulsd %xmm0, %xmm8 #test.f90:35.83
And if you used -xHost or similar optimization options, AVX and FMA instructions may show up:
00942 c4 42 e9 a9 d8 vfmadd213sd %xmm8, %xmm2, %xmm11 #test.f90:34.25
00991 c5 4d 58 0b vaddpd (%rbx), %ymm6, %ymm9 #test.f90:35.61