====== Using HDF5-PHDF5 for data storage ======
\\
The aim of this tutorial is to help new HDF5 users to find how to use this library, from the source package, through binaries building, to serial and parallel (MPI) write and read.
The following knowledge are required to follow this tutorial :
* Linux bash basis, to compile and install library
* Fortran 90 basis, to use HDF5 (C HDF5 is similar; just remove the "_f" at end of subroutine.)
* MPI basis, to use MPI HDF5 write/read
Note that you can find the official and full documentation here : http://www.hdfgroup.org/HDF5/doc/UG/index.html
\\
===== Why HDF5 ? =====
\\
* It's a standard, which means it can be read directly by multiple of data processing software (Paraview, TechPlot, etc).
* It helps to structure data, with the ability to group them like with folders, and to tag them with metadata.
* It has the ability to compress data on the fly, resulting in significant speedup for some I/O operations.
* It can be used in parallel (using MPIIO), meaning that all MPI threads can write/read in the same file at the same time, resulting in significant speedup for large I/O operations.
* It improves super calculator stability by avoiding multiple files.
* However, it is difficult to implement, and the documentation is sometime difficult to read.
* Compression is not supported with MPI I/O operations.
* Poor performances if using MPIIO without tuning, prefer posix and post-process to HDF5 if you do not know the machine tuning parameters.
\\
===== Install HDF5 =====
\\
==== Preparing sources ====
\\
You can download sources from the HDF5 web site: http://www.hdfgroup.org/ftp/HDF5/current/src/
We will suppose you do not have root privilege, so library will need to be installed in $HOME directory.
cd
mkdir $HOME/HDF5-Build
mkdir $HOME/HDF5
cd $HOME/HDF5-Build
wget http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.10/src/hdf5-1.8.10.tar.gz
tar xvzf hdf5-1.8.10.tar.gz
cd hdf5-1.8.10
\\
==== Compile, check and install ====
\\
First thing to do is to check which compiler we are using to compile the library. Note that after that, the library will be dependent of this compiler, and the code using this library will probably have to be compiled also with the same compiler.
We will consider the gcc compiler, but it is the same with intel. Just replace gcc by icc and gfortran by ifort when needed.
Check that gcc and gfortran are available, with MPI wrappers if you plan to use parallel HDF5 :
MPI :
which mpicc; which mpif90
/usr/bin/mpicc
/usr/bin/mpif90
Serial :
which gcc; which gfortran
/usr/bin/gcc
/usr/bin/gfortran
Then, let prepare the makefile.\\
MPI :
./configure CC=mpicc FC=mpif90 --prefix=$HOME/HDF5 --libdir=$HOME/usr/lib --bindir=$HOME/usr/bin --enable-parallel --enable-shared --with-pic --enable-fortran --with-zlib=/usr/lib
Serial :
./configure CC=gcc FC=gfortran --prefix=$HOME/HDF5 --libdir=$HOME/usr/lib --bindir=$HOME/usr/bin --enable-parallel --enable-shared --with-pic --enable-fortran --with-zlib=/usr/lib
For Ubuntu 12.04 or > based ubuntu systems, replace --with-zlib=/usr/lib with --with-zlib=/usr/lib/x86_64-linux-gnu.
Now, make, test (check) and install. These commands can take a lot of time to achieve. If you get an error, try to find it on internet. If you get an error with zlib, consider if you need compression in HDF5. If not, remove the --with-zlib=/usr/lib. Note also that parallel HDF5 (PHDF5) cannot use compression, no need to activate it if you plan to use MPI HDF5 only.
make
make check
make install
cd $HOME
rm -R HDF5-Build
(If you get an error at test, see in Libraries section of this website for more information).
You are done for the install. Now, you can load the library using this :
export LD_LIBRARY_PATH=$HOME/HDF5-Build/lib:$LD_LIBRARY_PATH
export PATH=$HOME/HDF5-Build/bin:$PATH
Or if you want to make it permanent :
echo 'export LD_LIBRARY_PATH=$HOME/HDF5-Build/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
echo 'export PATH=$HOME/HDF5-Build/bin:$PATH' >> ~/.bashrc
\\
===== Using HDF5 =====
\\
==== Hello World ! ====
\\
In the following section, we will suppose that you are using MPI version of HDF5. If not, simply replace mpif90 with gfortran when used. Create the working directory:
mkdir HDF5-WORK
cd HDF5-WORK
Create an empty file in it, hello.f90, and past the following code, we will explain later:
PROGRAM hello
USE HDF5
IMPLICIT NONE
integer, parameter :: charsize = 13, ndim = 1
integer(8), dimension(ndim) :: nbcharacter
character(len=charsize) :: message
! HDF5
integer(HID_T) :: file_id,dspace_id,dset_id ! Identifiers
integer :: error
nbcharacter = charsize
message = 'hello world !'
! Initialize FORTRAN interface.
CALL h5open_f (error)
! Create a new file using default properties.
CALL h5fcreate_f('helloworld.h5', H5F_ACC_TRUNC_F, file_id, error)
! Create the dataspace.
CALL h5screate_simple_f(ndim, nbcharacter, dspace_id, error)
! Create the dataset with default properties.
CALL h5dcreate_f(file_id, "message for you", H5T_NATIVE_CHARACTER, dspace_id, dset_id, error)
! Write data for the dataset.
CALL h5dwrite_f(dset_id, H5T_NATIVE_CHARACTER, message, nbcharacter, error)
! End access to the dataset and release resources used by it.
CALL h5dclose_f(dset_id, error)
! Terminate access to the data space.
CALL h5sclose_f(dspace_id, error)
! Terminate access to the file.
CALL h5fclose_f(file_id, error)
! Close FORTRAN interface.
CALL h5close_f(error)
END PROGRAM hello
Now, we need to compile the file. Do not try to compile it in one line. It is possible with HDF5 for C code, but not with Fortran. Remove the -lz if you didn't used zlib when building HDF5 lib.
mpif90 -c -I $HOME/HDF5/include hello.f90
mpif90 hello.o -o a.out -L$HDF5FOLDER/lib -lhdf5_fortran -lhdf5 -lz
./a.out
If you get the following error: "./a.out: error while loading shared libraries: libhdf5_fortran.so.7: cannot open shared object file: No such file or directory", it means your LD_LIBRARY_PATH does not contain the path to the HDF5 libs. Check it using "echo $LD_LIBRARY_PATH". See in precedent section for adding HDF5 libs to LD_LIBRARY_PATH.
Now, we can check the file created using the h5dump tool (again, if system says h5dump does not exist, check your PATH environment variable).
h5dump helloworld.h5
HDF5 "helloworld.h5" {
GROUP "/" {
DATASET "message for you" {
DATATYPE H5T_STRING {
STRSIZE 1;
STRPAD H5T_STR_SPACEPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 13 ) / ( 13 ) }
DATA {
(0): "h", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d", " ", "!"
}
}
}
}
We can observe the 'hello world !' message, which is an 13 size long array of H5T_STRING of size 1 (~character).
Now that everything works, we can take a look inside.
\\
==== HDF5 structure ====
\\
The HDF5 file can be seen as a drive with a file system, mounted in the program.\\
For example, when you plug in an USB key, you can organize data inside with folders, folder's names, etc. And the USB key contains all data inside. An HDF5 file is the same: it has a root '/', and you can add folders (called group by HDF5 standard), like '/Grid/' or '/Fields/Species/' etc. In each folder (group), you can add dataset, which are containers for data to be stored. Each dataset, contain a headers and the data themself. The headers contain information (also called metadata) on the data stored (Type, Size, Organization, Compression Level, etc). This includes DataType and Attribute (which will be seen after). The data are stored in a container called dataspace.\\
The following representation may be clearer:
{{ :software:development:hdf5-folder.png |}}
The similarity with file system here is obvious. The HDF5 contains a hierarchy structure, with Folders called Groups in HDF5 language. Each group can contain multiple groups or/and datasets. Here, to access velocity U data, the address is /Fields/Velocities/U.
Note that each dataset contains a header and a dataspace. Note also that you can add special information in dataset, alone or side to side with the datasapce, called Attributes. Here, /Information contains the time in simulation corresponding to the data, and the copyright for example.\\
Important: this is not a real representation on how it is organized inside the HDF file. Dataspace are linked in reality to the dataset, allowing multiple links to a single entity. However, this representation can be used for a majority of cases: a lot of users will never use more than one link at a time with each entity.
HDF5 files are a container, organized like a file system, and each data can be seen as a package containing the data with information on it. Another comparison can be made with media files: movies and pictures. A movie file contain an header (with all information to read the movie's data: resolution, codec used to compress, frame rate, etc) and the raw data in a data container. Same things for jpeg images (information: resolution, compression level, date of creation, etc., and after a container for the data of the image).
\\
==== Creating a file, navigate in it ====
\\
Going back to code, first thing is to create an HDF5 file, create groups in it, open and close groups, and close the file.
You need to add the USE HDF5 in your program to specify the program to use HDF5 headers. Then, before creating an HDF5 file, you need to load Fortran HDF interface using h5open_f :
CALL h5open_f (error)
If error is different from 0, then the program failed to open interface.
Then, we create the file 'myfile.h5':
CALL h5fcreate_f('myfile.h5', H5F_ACC_TRUNC_F, file_id, error)
* First parameter is the file name, a character.
* Then the file access properties, there are 2 main ways :
* H5F_ACC_TRUNC_F: If the file already exists, the file is opened with read-write access, and new data will overwrite any existing data. If the file does not exist, it is created and opened with read-write access.
* H5F_ACC_EXCL: If the file already exists, H5Fcreate fails. If the file does not exist, it is created and opened with read-write access.
* file_id is the file identifier, like open(file='Dudule.dat',file_id) in Fortran.
* The error flag.
We then create the dataspace that will hold Data. Note that we first create the dataspace, which will be attributed to a dataset later.
CALL h5screate_simple_f(ndim, nbcharacter, dspace_id, error)
* ndim is the number of dimensions the array stored will have. (i.e. : Temp(:) is 1D, so ndim = 1, Temp(:,:) is 2D, so ndim =2, etc).
* nbcharacter is here the number of elements of the array. Note that if ndim = 1, nbcharacter is dimension(1:1), if ndim = 2, nbcharacter is dimension(1:2) etc.
* dspace_id is the identifier of the dataspace.
* The error flag.
Then, we create a dataset on which we attach a dataspace (using it's id previously obtained):
CALL h5dcreate_f(file_id, "message for you", H5T_NATIVE_CHARACTER, dspace_id, dset_id, error)
* file_id is the file identifier of the HDF5 file on which we want to write.
* "message for you" is here the name given to the dataset. It is the name of the data stored, for example: Temperature, Velocity, Pressure, etc.
* H5T_NATIVE_CHARACTER is the format of data stored. The following formats are available :
* H5T_NATIVE_INTEGER : for integer(4) in fortran
* H5T_NATIVE_REAL : for real(4) in fortran
* H5T_NATIVE_DOUBLE : for real(8) in fortran
* H5T_NATIVE_CHARACTER : for character in fortran
* Others formats are available, see in the documentation (String, etc)
* dspace_id is the identifier of the dataspace that will be inside this dataset.
* dset_id is the identifier of this new dataset.
* The error flag.
Now that the container (the dataset, linked to the dataspace) is created, we just need to write the data itself inside it.
CALL h5dwrite_f(dset_id, H5T_NATIVE_CHARACTER, message, nbcharacter, error)
* dset_id is the dataset identifier where the dataspace is.
* H5T_NATIVE_CHARACTER the data format
* message the array to be stored, the data in fortran program
* nbcharacter the number of elements of the array of data stored
* The error flag.
Then, just need to close resources, in reverse order:
CALL h5dclose_f(dset_id, error)
CALL h5sclose_f(dspace_id, error)
CALL h5fclose_f(file_id, error)
And to finish, close HDF5 interface :
CALL h5close_f(error)
That's all. We wrote our first h5 file. Now, let's speak about groups (folders). Groups allow you to organize data into the h5 file. The best example is to create an hdf5 for fluid dynamic: you need to store the grid used for capture or computation, and the physical data like velocities and temperature. We would like to have something like that, for a 2D capture/simulation:
* /: the root
* /Grid: the folder that contains information for the grid
* /Fields: the folder that contains physical data
* /Fields/Velocities: the folder that contains the two velocities component: U (on X) and V (on Y)
Note that to write/read inside a group (folder), you need to open it first, like double-clicking on a folder in your computer to access inside it.
Here is the source, compile it, run it, and h5dump it to visualize the result:
PROGRAM CFDFIELD
USE HDF5
IMPLICIT NONE
integer, parameter :: ndim = 2
character(len=40) :: foldername
! HDF5
integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers
integer :: error
foldername = '/Fields/Velocities'
! Initialize FORTRAN interface.
CALL h5open_f (error)
! Create a new file using default properties.
CALL h5fcreate_f('CFD-DATA.h5', H5F_ACC_TRUNC_F, file_id, error)
! Create all groups (folders) and close them after.
CALL h5gcreate_f(file_id,'/Grid',group_id,error)
CALL h5gclose_f(group_id,error)
CALL h5gcreate_f(file_id,'/Fields',group_id,error)
CALL h5gclose_f(group_id,error)
CALL h5gcreate_f(file_id,'/Fields/Velocities',group_id,error)
CALL h5gclose_f(group_id,error)
! Try to re-open a specific group (folder) and close it after.
CALL h5gopen_f(file_id,trim(adjustl(foldername)), group_id, error)
if(error==0) print *, "Succes in opening the folder :",trim(adjustl(foldername))
CALL h5gclose_f(group_id,error)
! Terminate access to the file.
CALL h5fclose_f(file_id, error)
! Close FORTRAN interface.
CALL h5close_f(error)
END PROGRAM CFDFIELD
When running it, you will (must !) see the following message : 'Succes in opening the folder :/Fields/Velocities'
Result with h5dump :
h5dump CFD-DATA.h5
HDF5 "CFD-DATA.h5" {
GROUP "/" {
GROUP "Fields" {
GROUP "Velocities" {
}
}
GROUP "Grid" {
}
}
}
\\
==== Storing data ====
\\
Now, let's store something in this file. We will open it, and open the /Grid folder to store the grid in it (X positions and Y positions). The open subroutine (to open an existing HDF file) is the following:
CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error)
* First parameter is the file name, a character.
* Then the file access properties, there are 2 main ways :
* H5F_ACC_RDONLY: An existing file is opened with read-only access. If the file does not exist, H5Fopen fails.
* H5F_ACC_RDWR: An existing file is opened with read-write access. If the file does not exist, H5Fopen fails.
* file_id is the file identifier, like open(file='Dudule.dat',file_id) in Fortran.
* The error flag.
The source code is the following:
PROGRAM CFDFIELD2
USE HDF5
IMPLICIT NONE
integer, parameter :: ndim = 1
integer(8), dimension(ndim) :: nbXpoints,nbYpoints
character(len=40) :: foldername
real(8), dimension(:), allocatable :: Xpoints,Ypoints
real(8) :: Dx,Dy ! Delta x and Delta y, space between two points in the artificial grid, not important here
integer :: i
! HDF5
integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers
integer(HID_T) :: dspace_Xgrid_id,dspace_Ygrid_id,dset_Xgrid_id,dset_Ygrid_id
integer :: error
foldername = '/Grid'
nbXpoints(1) = 10
nbYpoints(1) = 6
Dx = 0.1d0
Dy = 0.3d0
allocate(Xpoints(1:nbXpoints(1)),Ypoints(1:nbYpoints(1)))
! Generate artificial grid to be stored
do i=1,nbXpoints(1)
Xpoints(i) = (i-1)*Dx
end do
do i=1,nbYpoints(1)
Ypoints(i) = (i-1)*Dy
end do
! Initialize FORTRAN interface.
CALL h5open_f (error)
! Open a file using default properties.
CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error)
! Try to re-open a specific group (folder).
CALL h5gopen_f(file_id,trim(adjustl(foldername)), group_id, error)
if(error==0) print *, "Succes in opening the folder :",trim(adjustl(foldername))
! Create the dataspaces for X and Y.
CALL h5screate_simple_f(ndim, nbXpoints(1), dspace_Xgrid_id, error)
CALL h5screate_simple_f(ndim, nbYpoints(1), dspace_Ygrid_id, error)
! Create the datasets with default properties.
CALL h5dcreate_f(group_id, "Xpositions", H5T_NATIVE_DOUBLE, dspace_Xgrid_id, dset_Xgrid_id, error)
CALL h5dcreate_f(group_id, "Ypositions", H5T_NATIVE_DOUBLE, dspace_Ygrid_id, dset_Ygrid_id, error)
! Write data for the datasets.
CALL h5dwrite_f(dset_Xgrid_id, H5T_NATIVE_DOUBLE, Xpoints, nbXpoints, error)
CALL h5dwrite_f(dset_Ygrid_id, H5T_NATIVE_DOUBLE, Ypoints, nbYpoints, error)
! End access to the datasets and release resources used by them.
CALL h5dclose_f(dset_Xgrid_id, error)
CALL h5dclose_f(dset_Ygrid_id, error)
! Terminate access to the data spaces.
CALL h5sclose_f(dspace_Xgrid_id, error)
CALL h5sclose_f(dspace_Ygrid_id, error)
! Terminate access to the group.
CALL h5gclose_f(group_id,error)
! Terminate access to the file.
CALL h5fclose_f(file_id, error)
! Close FORTRAN interface.
CALL h5close_f(error)
deallocate(Xpoints,Ypoints)
END PROGRAM CFDFIELD2
And you shall see the following information:
./a.out
Success in opening the folder :/Grid
h5dump CFD-DATA.h5
HDF5 "CFD-DATA.h5" {
GROUP "/" {
GROUP "Fields" {
GROUP "Velocities" {
}
}
GROUP "Grid" {
DATASET "Xpositions" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 10 ) / ( 10 ) }
DATA {
(0): 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
}
}
DATASET "Ypositions" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 6 ) / ( 6 ) }
DATA {
(0): 0, 0.3, 0.6, 0.9, 1.2, 1.5
}
}
}
}
}
You can see that grid information are now stored in the Grid group (like a folder).
We can then add the data Fields, which are 2D fields, of 10x6 points each. (Note that this time, the program is voluntary a little less documented, and some values are passed directly to the subroutines). nbpoints(1:2) is a 2D array that store the number of points per axes (per dimensions).
PROGRAM CFDFIELD3
USE HDF5
IMPLICIT NONE
integer(8), dimension(2) :: nbpoints
character(len=40) :: foldername
real(8), dimension(:,:), allocatable :: U,V,Temp
! HDF5
integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers
integer(HID_T) :: dspace_U_id,dspace_V_id,dspace_Temp_id,dset_U_id,dset_V_id,dset_Temp_id
integer :: error
nbpoints(1) = 10 ! = nbXpoints(1)
nbpoints(2) = 6 ! = nbYpoints(1)
allocate(U(1:nbpoints(1),1:nbpoints(2)),V(1:nbpoints(1),1:nbpoints(2)), &
& Temp(1:nbpoints(1),1:nbpoints(2)) )
! Generate artificial data
U(:,:) = 1.7d0
V(:,:) = 2.5d0
Temp(:,:) = 350.0d0
! Initialize FORTRAN interface.
CALL h5open_f (error)
! Open a file using default properties.
CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error)
! Try to re-open a specific group (folder).
CALL h5gopen_f(file_id,'/Fields/Velocities', group_id, error)
! Create the dataspaces for U and V, with 2 dimensions of size nbpoints(1)*nbpoints(2).
CALL h5screate_simple_f(2, nbpoints, dspace_U_id, error)
CALL h5screate_simple_f(2, nbpoints, dspace_V_id, error)
! Create the datasets with default properties.
CALL h5dcreate_f(group_id, "U", H5T_NATIVE_DOUBLE, dspace_U_id, dset_U_id, error)
CALL h5dcreate_f(group_id, "V", H5T_NATIVE_DOUBLE, dspace_V_id, dset_V_id, error)
! Write data for the datasets.
CALL h5dwrite_f(dset_U_id, H5T_NATIVE_DOUBLE, U, nbpoints, error)
CALL h5dwrite_f(dset_V_id, H5T_NATIVE_DOUBLE, V, nbpoints, error)
! End access to the datasets and release resources used by them.
CALL h5dclose_f(dset_U_id, error)
CALL h5dclose_f(dset_V_id, error)
! Terminate access to the data spaces.
CALL h5sclose_f(dspace_U_id, error)
CALL h5sclose_f(dspace_V_id, error)
! Terminate access to the group.
CALL h5gclose_f(group_id,error)
! Same thing with Temperature data
CALL h5gopen_f(file_id,'/Fields', group_id, error)
CALL h5screate_simple_f(2, nbpoints, dspace_Temp_id, error)
CALL h5dcreate_f(group_id, "Temperature", H5T_NATIVE_DOUBLE, dspace_Temp_id, dset_Temp_id, error)
CALL h5dwrite_f(dset_Temp_id, H5T_NATIVE_DOUBLE, Temp, nbpoints, error)
CALL h5dclose_f(dset_Temp_id, error)
CALL h5sclose_f(dspace_Temp_id, error)
CALL h5gclose_f(group_id,error)
! Terminate access to the file.
CALL h5fclose_f(file_id, error)
! Close FORTRAN interface.
CALL h5close_f(error)
deallocate(U,V,Temp)
END PROGRAM CFDFIELD3
You should obtain something like this:
./a.out
h5dump CFD-DATA.h5
HDF5 "CFD-DATA.h5" {
GROUP "/" {
GROUP "Fields" {
DATASET "Temperature" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) }
DATA {
(0,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(1,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(2,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(3,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(4,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(5,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350
}
}
GROUP "Velocities" {
DATASET "U" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) }
DATA {
(0,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7,
(1,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7,
(2,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7,
(3,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7,
(4,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7,
(5,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7
}
}
DATASET "V" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) }
DATA {
(0,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
(1,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
(2,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
(3,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
(4,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,
(5,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5
}
}
}
}
GROUP "Grid" {
DATASET "Xpositions" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 10 ) / ( 10 ) }
DATA {
(0): 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
}
}
DATASET "Ypositions" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 6 ) / ( 6 ) }
DATA {
(0): 0, 0.3, 0.6, 0.9, 1.2, 1.5
}
}
}
}
}
\\
==== Reading data ====
\\
Last thing to do is to read these data from another program. The subroutine to read data is :
CALL h5dread_f(dset_id, H5T_NATIVE_DOUBLE, U, nbpoints, error)
* dset_id is the dataset identifier where the dataspace is.
* H5T_NATIVE_DOUBLE the data format
* The array where read data will be stored
* nbpoints the number of elements of the array of data stored
* The error flag.
The source code is the following:
PROGRAM CFDFIELD4
USE HDF5
IMPLICIT NONE
integer(8), dimension(2) :: nbpoints
real(8), dimension(:,:), allocatable :: U
! HDF5
integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers
integer :: error,i
nbpoints(1) = 10 ! = nbXpoints(1)
nbpoints(2) = 6 ! = nbYpoints(1)
allocate(U(1:nbpoints(1),1:nbpoints(2)))
! Initialize FORTRAN interface.
CALL h5open_f (error)
! Open a file using default properties.
CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error)
! Try to re-open a specific group (folder).
CALL h5gopen_f(file_id,'/Fields/Velocities', group_id, error)
! Open an existing dataset
CALL h5dopen_f(group_id, 'U', dset_id, error)
! Read the data in dataset.
CALL h5dread_f(dset_id, H5T_NATIVE_DOUBLE, U, nbpoints, error)
! Print data
do i=1,nbpoints(2)
write(*,'(10F6.2)') U(:,i)
end do
! End access to the datasets and release resources used by them.
CALL h5dclose_f(dset_id, error)
! Terminate access to the group.
CALL h5gclose_f(group_id,error)
! Terminate access to the file.
CALL h5fclose_f(file_id, error)
! Close FORTRAN interface.
CALL h5close_f(error)
deallocate(U)
END PROGRAM CFDFIELD4
And this should give this:
./a.out
1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70
1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70
1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70
1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70
1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70
1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70
Data were read correctly.
\\
==== Adding Metadata ====
\\
Last thing to do is to add attributes. It is possible to add information to data using attributes. The definition given in the HDF documentation is clear: "An HDF5 attribute is a small metadata object describing the nature and/or intended usage of a primary data object.". Attribute can be used to add some information on a data (for example time of the simulation corresponding to these data, configuration of the simulation, input information of the program, etc).
The subroutine to open an existing dataset is :
CALL h5dopen_f(group_id, 'Temperature', dset_id, error)
* group_id the id corresponding of the position where the dataset is (must have been open before).
* the name of the dataset
* the id of the dataset
* error flag
Then, first thing to do is to create a dataspace for the attribute.
CALL h5screate_simple_f(ndim, StringNumber, aspace_id, error)
* ndim is the dimension of the dataspace
* StringNumber is the number of values stored in it
* aspace_id is the id of this dataspace
* the error flag
The second thing to do is to create a type to store character in string form. String is not supported natively in fortran, but there is a trick :
CALL h5tcopy_f(H5T_NATIVE_CHARACTER, atype_id, error)
CALL h5tset_size_f(atype_id, StringLenght, error)
The H5T_NATIVE_CHARACTER is locked, the subroutine h5tcopy_f copy the existing datatype. This copy can be manipulated. Then, the new data type is enlarged to StringLength size. This datatype can now be used to store StringLenght size strings.
* H5T_NATIVE_CHARACTER is the existing datatype
* atype_id is the new datatype id
* StringLenght is the size wanted for the new datatype
Then, the attribute must be stored into a specific dataset :
CALL h5acreate_f(dset_id, 'Additional Information', atype_id, aspace_id, attr_id, error)
* dest_it the id of this dataset that will contain this dataspace
* the name of the dataspace
* atype_id the type of data stored in it
* aspace_id the id of the attribute dataspace
* attr_id is the id of the attribute
* the error flag
It is the possible to write the metadata of this attribute :
CALL h5awrite_f(attr_id, atype_id, information, StringNumber, error)
* attr_id is the id of the attribute
* atype_id the type of data stored in it
* the metadata to store
* StringNumber is the number of values stored
* the error flag
The program bellow opens the existing h5 file, and adds metadata to the 'Temperature' dataset.
PROGRAM CFDFIELD5
USE HDF5
IMPLICIT NONE
! HDF5
integer(HID_T) :: file_id,dspace_id,dset_id,group_id,atype_id,aspace_id,attr_id ! Identifiers
integer :: error
integer(8) :: StringLenght
integer(8), dimension(1) :: StringNumber
CHARACTER(LEN=80), DIMENSION(3) :: information
real(8) :: Pressure,Time
Pressure = 1.015d0
Time = 1.58d-5
StringNumber(1) = 3
StringLenght = 80
information(1) = 'Pressure : '
information(2) = 'Time : '
information(3) = 'Dimensions : 2cm x 3cm'
write(information(1),'(A,ES14.7)') trim(adjustl(information(1))),Pressure
write(information(2),'(A,ES14.7)') trim(adjustl(information(2))),Time
! Initialize FORTRAN interface.
CALL h5open_f (error)
! Open a file using default properties.
CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error)
! Try to re-open a specific group (folder).
CALL h5gopen_f(file_id,'/Fields', group_id, error)
! Open an existing dataset.
CALL h5dopen_f(group_id, 'Temperature', dset_id, error)
! Create scalar data space for the attribute.
CALL h5screate_simple_f(1, StringNumber, aspace_id, error)
! Create datatype for the char strings.
CALL h5tcopy_f(H5T_NATIVE_CHARACTER, atype_id, error)
CALL h5tset_size_f(atype_id, StringLenght, error)
! Create dataset attribute.
CALL h5acreate_f(dset_id, 'Additional Information', atype_id, aspace_id, attr_id, error)
! Write the attribute data.
CALL h5awrite_f(attr_id, atype_id, information, StringNumber, error)
! Close the attribute.
CALL h5aclose_f(attr_id, error)
! End access to the dataset and release resources used by it.
CALL h5dclose_f(dset_id, error)
! Terminate access to the data space.
CALL h5sclose_f(aspace_id, error)
! Terminate access to the group.
CALL h5gclose_f(group_id,error)
! Terminate access to the file.
CALL h5fclose_f(file_id, error)
! Close FORTRAN interface.
CALL h5close_f(error)
END PROGRAM CFDFIELD5
And the result :
./a.out
h5dump CFD-DATA.h5
[.........]
GROUP "/" {
GROUP "Fields" {
DATASET "Temperature" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) }
DATA {
(0,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(1,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(2,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(3,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(4,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350,
(5,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350
}
ATTRIBUTE "Additional Information" {
DATATYPE H5T_STRING {
STRSIZE 80;
STRPAD H5T_STR_SPACEPAD;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
DATA {
(0): "Pressure : 1.0150000E+00 ",
(1): "Time : 1.5800000E-05 ",
(2): "Dimensions : 2cm x 3cm "
}
}
}
[.........]
That is all for the serial way of using HDF5. There are many other useful things, to adapt to different cases, but the bases are here.
\\
==== Reading HDF5 with Paraview ====
\\
One of the most famous rendering software is Paraview, an open-source visualization tool. Paraview has 3D acceleration using OpenGL, making it fast, and easy to use. On top of that, in includes many tools like fft, python scripting, etc.
Paraview has the ability to open HDF5 files, using XDMF index. The simpler way to do that is to add three levels of XDMF: Temporal, Spatial, Data.
We will suppose you ran a 4 thread MPI job, and that each of the thread wrote an HDF5 file, at time 0.0s. Each HDF5 file contains the Grid structure and the data of each individual thread.
The first XDMF would be:
**MYCFD.xmf**
Then, the second level at time 0.0 would be :
**MYCFD_st000000.xmf**
The 4 files refer to the HDF5 files written by the 4 threads at time 0.0 of the simulation.
Finally, all of the 4 XDMF files include PATH to the data in the HDF files (assuming the HDF5 files are MYCFD_st000000_p000000.h5, MYCFD_st000000_p000001.h5, MYCFD_st000000_p000002.h5, and MYCFD_st000000_p000003.h5):
**MYCFD_st000000_p000000.xmf**
MYCFD_st000000_p000000.h5:/Grid/Xpositions
MYCFD_st000000_p000000.h5:/Grid/Ypositions
MYCFD_st000000_p000000.h5:/Fields/velocities/U
MYCFD_st000000_p000000.h5:/Fields/velocities/V
MYCFD_st000000_p000000.h5:/Fields/Temperature
Paraview will the load all the data when opening the first file. Note that this is the simplest case. You can find a lot more about GeometryType, AttributeType, TopologyType, etc. on the internet.
\\
===== Using PHDF5 =====
\\
First of all, this section requires a few words. The present performances of the MPI-IO, on which is based PHDF5, are really machine dependant, and need tuning to be efficient. The main advantage of PHDF5 is to get only one file from multiple MPI threads, making it easier to share or post-process. But its performances should be taken into account, and PHDF5 should only be used if the cluster administrator can provides you the tuning parameters for the cluster.
\\
==== Writing data with PHDF5 ====
\\
Writing files using PHDF5 is the same as with HDF5, but with additional steps:
* The file must be created collectively using a list with parallel I/O access.
* Data must be stored into a memory-space before being written.
* An hyperslab corresponding to the area where data will be written need to be defined.
* Data must be written collectively using a list for parallel I/O access.
Create a new property list (h5pcreate_f), and link it to the MPI communicator MPI_COMM_WORLD (h5pset_fapl_mpio_f):
CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error)
CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL, error)
* H5P_FILE_ACCESS_F, the type of the property list. Possible values (others were added with 1.8.0 release):
* H5P_FILE_CREATE_F
* H5P_FILE_ACCESS_F
* H5P_DATASET_CREATE_F
* H5P_DATASET_XFER_F
* H5P_MOUNT_F
* plist_id, the id of the property list
* the error flag
* plist_id, the id of the property list
* the MPI communicator, here MPI_COMM_WORLD
* MPI info object to be used for MPI file open (use MPI_INFO_NULL)
* the error flag
Then, create the file collectively, using the property list freshly defined, and close the property list.
CALL h5fcreate_f('MPI-CFD-DATA.h5', H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
CALL h5pclose_f(plist_id, error)
* the name of the file
* the way to create it (refer to serial section for further information)
* the file id
* the error flag
* the property list, linked to MPI communicator
* the property list id
* the error flag
Then, each thread needs to prepare a dataspace called memory-space, which will be used as a buffer when writing data.
CALL h5screate_simple_f(ndim, datasize, memspace, error)
* the number of dimensions of the data
* the size of data
* the id of this buffer
* the error flag
Next step, the hyperslab. The hyperslab is the position in the dataspace where data of the current thread have to be written. For example, in the code above, if we imagine the 2D array of data using x,y positions, the thread 0 will write its data of size 2*2 at x=0, y=0. The thread 1 with 8*2 data will write at x=2, y=0. The thread 2 with 2*8 data will write at x=0,y=2. And the last thread will complete the array, with 8*8 data to be written at x=2,y=2.
10 10 11 11 11 11 11 11 11 11
10 10 11 11 11 11 11 11 11 11
12 12 13 13 13 13 13 13 13 13
12 12 13 13 13 13 13 13 13 13
12 12 13 13 13 13 13 13 13 13
12 12 13 13 13 13 13 13 13 13
12 12 13 13 13 13 13 13 13 13
12 12 13 13 13 13 13 13 13 13
12 12 13 13 13 13 13 13 13 13
12 12 13 13 13 13 13 13 13 13
So, for thread 0 in blue, data size is 2*2, and offset (offset correspond to the difference from the origin) is 0,0. For thread 1 in red, 8*2 and 2,0. For thread 2 in green, 2*8 and 0,2. And for thread 3 in yellow, 8*8 and 2,2.
Each thread creates an hyperslab when specifying datasize and offset.
CALL h5sselect_hyperslab_f (dataspace, H5S_SELECT_SET_F, offset, datasize, error)
* the dataspace corresponding to the hyperslab
* the way the hyperslab will write over existing data (from documentation):
* H5S_SELECT_SET: Replaces the existing selection with the parameters from this call. Overlapping blocks are not supported with this operator.
* H5S_SELECT_OR: Adds the new selection to the existing selection. (Binary OR)
* H5S_SELECT_AND: Retains only the overlapping portions of the new selection and the existing selection. (Binary AND)
* H5S_SELECT_XOR: Retains only the elements that are members of the new selection or the existing selection, excluding elements that are members of both selections. (Binary exclusive-OR, XOR)
* H5S_SELECT_NOTB: Retains only elements of the existing selection that are not in the new selection.
* H5S_SELECT_NOTA: Retains only elements of the new selection that are not in the existing selection.
* the offset of the data area to be written
* the size of the data area to be written
* the error flag
Last step before writing; create a property list for collective write in the dataset.
CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
* the type of the property list (see at the beginning of this section for information)
* the property list id
* the error flag
* the property list id
* Specify that the property list will correspond to an MPI-IO collective write
* the error flag
Then, data can be written using the same function as serial HDF5, but with additional arguments:
CALL h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, F, datasize, error, file_space_id = dataspace, mem_space_id = memspace, xfer_prp = plist_id)
* dset_id is the dataset identifier where the dataspace is
* H5T_NATIVE_DOUBLE the data format
* the data
* datasize the number of elements of the array of data stored
* the error flag
* the dataspace were data have to be written
* the dataspace used as a buffer, the memory-space
* the property list id to be used
The following program does a simple collective write, as described above.
PROGRAM T5
USE HDF5 ! This module contains all necessary modules
USE mpi
IMPLICIT NONE
integer, parameter :: ndim = 2
integer(HSIZE_T), dimension(ndim) :: Nx
integer(HID_T) :: file_id ! File identifier
integer(HID_T) :: dset_id ! Dataset identifier
integer(HID_T) :: dataspace ! Dataspace identifier
integer(HID_T) :: memspace ! memspace identifier
integer(HID_T) :: plist_id ! Property list identifier
integer(HSIZE_T), dimension(ndim) :: datasize
integer(HSIZE_T), dimension(ndim) :: offset
real(8), allocatable, dimension(:,:) :: F
integer :: mpierror,error ! MPI error flag
integer :: mpi_size, mpi_rank
CALL MPI_INIT(mpierror)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, mpi_size, mpierror)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, mpi_rank, mpierror)
Nx(1) = 10
Nx(2) = 10
! mpi_rank = 0
if(mpi_rank==0) allocate(F(2,2))
if(mpi_rank==1) allocate(F(8,2))
if(mpi_rank==2) allocate(F(2,8))
if(mpi_rank==3) allocate(F(8,8))
F(:,:) = mpi_rank+10.0d0
! Initialize FORTRAN predefined datatypes
CALL h5open_f(error)
! Setup file access property list with parallel I/O access.
CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error)
CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL, error)
! Create the file collectively.
CALL h5fcreate_f('MPI-CFD-DATA.h5', H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
CALL h5pclose_f(plist_id, error)
! Create the data space for the dataset.
CALL h5screate_simple_f(ndim, Nx, dataspace, error)
! Create the dataset with default properties.
CALL h5dcreate_f(file_id, 'Temperature', H5T_NATIVE_DOUBLE, dataspace,dset_id, error)
datasize(1) = size(F,1)
datasize(2) = size(F,2)
CALL h5screate_simple_f(ndim, datasize, memspace, error)
if(mpi_rank==0) then; offset(1) = 0; offset(2) = 0; endif
if(mpi_rank==1) then; offset(1) = 2; offset(2) = 0; endif
if(mpi_rank==2) then; offset(1) = 0; offset(2) = 2; endif
if(mpi_rank==3) then; offset(1) = 2; offset(2) = 2; endif
! Select hyperslab in the file.
CALL h5sselect_hyperslab_f (dataspace, H5S_SELECT_SET_F, offset, datasize, error)
! Create property list for collective dataset write
CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
CALL h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, F, datasize, error, &
file_space_id = dataspace, mem_space_id = memspace, xfer_prp = plist_id)
! Close dataspaces.
CALL h5sclose_f(dataspace, error)
CALL h5sclose_f(memspace, error)
! Close the dataset and property list.
CALL h5dclose_f(dset_id, error)
CALL h5pclose_f(plist_id, error)
! Close the file.
CALL h5fclose_f(file_id, error)
CALL h5close_f(error)
CALL MPI_FINALIZE(mpierror)
END PROGRAM T5
And the result:
mpirun -n 4 ./a.out
h5dump MPI-CFD-DATA.h5
HDF5 "MPI-CFD-DATA.h5" {
GROUP "/" {
DATASET "Temperature" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 10, 10 ) / ( 10, 10 ) }
DATA {
(0,0): 10, 10, 11, 11, 11, 11, 11, 11, 11, 11,
(1,0): 10, 10, 11, 11, 11, 11, 11, 11, 11, 11,
(2,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
(3,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
(4,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
(5,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
(6,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
(7,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
(8,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13,
(9,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13
}
}
}
}