The aim of this tutorial is to help new HDF5 users to find how to use this library, from the source package, through binaries building, to serial and parallel (MPI) write and read.
The following knowledge are required to follow this tutorial :
Note that you can find the official and full documentation here : http://www.hdfgroup.org/HDF5/doc/UG/index.html
You can download sources from the HDF5 web site: http://www.hdfgroup.org/ftp/HDF5/current/src/
We will suppose you do not have root privilege, so library will need to be installed in $HOME directory.
cd mkdir $HOME/HDF5-Build mkdir $HOME/HDF5 cd $HOME/HDF5-Build wget http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8.10/src/hdf5-1.8.10.tar.gz tar xvzf hdf5-1.8.10.tar.gz cd hdf5-1.8.10
First thing to do is to check which compiler we are using to compile the library. Note that after that, the library will be dependent of this compiler, and the code using this library will probably have to be compiled also with the same compiler.
We will consider the gcc compiler, but it is the same with intel. Just replace gcc by icc and gfortran by ifort when needed.
Check that gcc and gfortran are available, with MPI wrappers if you plan to use parallel HDF5 : MPI :
which mpicc; which mpif90 /usr/bin/mpicc /usr/bin/mpif90
Serial :
which gcc; which gfortran /usr/bin/gcc /usr/bin/gfortran
Then, let prepare the makefile.
MPI :
./configure CC=mpicc FC=mpif90 --prefix=$HOME/HDF5 --libdir=$HOME/usr/lib --bindir=$HOME/usr/bin --enable-parallel --enable-shared --with-pic --enable-fortran --with-zlib=/usr/lib
Serial :
./configure CC=gcc FC=gfortran --prefix=$HOME/HDF5 --libdir=$HOME/usr/lib --bindir=$HOME/usr/bin --enable-parallel --enable-shared --with-pic --enable-fortran --with-zlib=/usr/lib
For Ubuntu 12.04 or > based ubuntu systems, replace –with-zlib=/usr/lib with –with-zlib=/usr/lib/x86_64-linux-gnu. Now, make, test (check) and install. These commands can take a lot of time to achieve. If you get an error, try to find it on internet. If you get an error with zlib, consider if you need compression in HDF5. If not, remove the –with-zlib=/usr/lib. Note also that parallel HDF5 (PHDF5) cannot use compression, no need to activate it if you plan to use MPI HDF5 only.
make make check make install cd $HOME rm -R HDF5-Build
(If you get an error at test, see in Libraries section of this website for more information).
You are done for the install. Now, you can load the library using this :
export LD_LIBRARY_PATH=$HOME/HDF5-Build/lib:$LD_LIBRARY_PATH export PATH=$HOME/HDF5-Build/bin:$PATH
Or if you want to make it permanent :
echo 'export LD_LIBRARY_PATH=$HOME/HDF5-Build/lib:$LD_LIBRARY_PATH' >> ~/.bashrc echo 'export PATH=$HOME/HDF5-Build/bin:$PATH' >> ~/.bashrc
In the following section, we will suppose that you are using MPI version of HDF5. If not, simply replace mpif90 with gfortran when used. Create the working directory:
mkdir HDF5-WORK cd HDF5-WORK
Create an empty file in it, hello.f90, and past the following code, we will explain later:
PROGRAM hello USE HDF5 IMPLICIT NONE integer, parameter :: charsize = 13, ndim = 1 integer(8), dimension(ndim) :: nbcharacter character(len=charsize) :: message ! HDF5 integer(HID_T) :: file_id,dspace_id,dset_id ! Identifiers integer :: error nbcharacter = charsize message = 'hello world !' ! Initialize FORTRAN interface. CALL h5open_f (error) ! Create a new file using default properties. CALL h5fcreate_f('helloworld.h5', H5F_ACC_TRUNC_F, file_id, error) ! Create the dataspace. CALL h5screate_simple_f(ndim, nbcharacter, dspace_id, error) ! Create the dataset with default properties. CALL h5dcreate_f(file_id, "message for you", H5T_NATIVE_CHARACTER, dspace_id, dset_id, error) ! Write data for the dataset. CALL h5dwrite_f(dset_id, H5T_NATIVE_CHARACTER, message, nbcharacter, error) ! End access to the dataset and release resources used by it. CALL h5dclose_f(dset_id, error) ! Terminate access to the data space. CALL h5sclose_f(dspace_id, error) ! Terminate access to the file. CALL h5fclose_f(file_id, error) ! Close FORTRAN interface. CALL h5close_f(error) END PROGRAM hello
Now, we need to compile the file. Do not try to compile it in one line. It is possible with HDF5 for C code, but not with Fortran. Remove the -lz if you didn't used zlib when building HDF5 lib.
mpif90 -c -I $HOME/HDF5/include hello.f90 mpif90 hello.o -o a.out -L$HDF5FOLDER/lib -lhdf5_fortran -lhdf5 -lz ./a.out
If you get the following error: “./a.out: error while loading shared libraries: libhdf5_fortran.so.7: cannot open shared object file: No such file or directory”, it means your LD_LIBRARY_PATH does not contain the path to the HDF5 libs. Check it using “echo $LD_LIBRARY_PATH”. See in precedent section for adding HDF5 libs to LD_LIBRARY_PATH. Now, we can check the file created using the h5dump tool (again, if system says h5dump does not exist, check your PATH environment variable).
h5dump helloworld.h5 HDF5 "helloworld.h5" { GROUP "/" { DATASET "message for you" { DATATYPE H5T_STRING { STRSIZE 1; STRPAD H5T_STR_SPACEPAD; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; } DATASPACE SIMPLE { ( 13 ) / ( 13 ) } DATA { (0): "h", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d", " ", "!" } } } }
We can observe the 'hello world !' message, which is an 13 size long array of H5T_STRING of size 1 (~character). Now that everything works, we can take a look inside.
The HDF5 file can be seen as a drive with a file system, mounted in the program.
For example, when you plug in an USB key, you can organize data inside with folders, folder's names, etc. And the USB key contains all data inside. An HDF5 file is the same: it has a root '/', and you can add folders (called group by HDF5 standard), like '/Grid/' or '/Fields/Species/' etc. In each folder (group), you can add dataset, which are containers for data to be stored. Each dataset, contain a headers and the data themself. The headers contain information (also called metadata) on the data stored (Type, Size, Organization, Compression Level, etc). This includes DataType and Attribute (which will be seen after). The data are stored in a container called dataspace.
The following representation may be clearer:
The similarity with file system here is obvious. The HDF5 contains a hierarchy structure, with Folders called Groups in HDF5 language. Each group can contain multiple groups or/and datasets. Here, to access velocity U data, the address is /Fields/Velocities/U.
Note that each dataset contains a header and a dataspace. Note also that you can add special information in dataset, alone or side to side with the datasapce, called Attributes. Here, /Information contains the time in simulation corresponding to the data, and the copyright for example.
Important: this is not a real representation on how it is organized inside the HDF file. Dataspace are linked in reality to the dataset, allowing multiple links to a single entity. However, this representation can be used for a majority of cases: a lot of users will never use more than one link at a time with each entity.
HDF5 files are a container, organized like a file system, and each data can be seen as a package containing the data with information on it. Another comparison can be made with media files: movies and pictures. A movie file contain an header (with all information to read the movie's data: resolution, codec used to compress, frame rate, etc) and the raw data in a data container. Same things for jpeg images (information: resolution, compression level, date of creation, etc., and after a container for the data of the image).
Going back to code, first thing is to create an HDF5 file, create groups in it, open and close groups, and close the file.
You need to add the USE HDF5 in your program to specify the program to use HDF5 headers. Then, before creating an HDF5 file, you need to load Fortran HDF interface using h5open_f :
CALL h5open_f (error)
If error is different from 0, then the program failed to open interface.
Then, we create the file 'myfile.h5':
CALL h5fcreate_f('myfile.h5', H5F_ACC_TRUNC_F, file_id, error)
We then create the dataspace that will hold Data. Note that we first create the dataspace, which will be attributed to a dataset later.
CALL h5screate_simple_f(ndim, nbcharacter, dspace_id, error)
Then, we create a dataset on which we attach a dataspace (using it's id previously obtained):
CALL h5dcreate_f(file_id, "message for you", H5T_NATIVE_CHARACTER, dspace_id, dset_id, error)
Now that the container (the dataset, linked to the dataspace) is created, we just need to write the data itself inside it.
CALL h5dwrite_f(dset_id, H5T_NATIVE_CHARACTER, message, nbcharacter, error)
Then, just need to close resources, in reverse order:
CALL h5dclose_f(dset_id, error) CALL h5sclose_f(dspace_id, error) CALL h5fclose_f(file_id, error)
And to finish, close HDF5 interface :
CALL h5close_f(error)
That's all. We wrote our first h5 file. Now, let's speak about groups (folders). Groups allow you to organize data into the h5 file. The best example is to create an hdf5 for fluid dynamic: you need to store the grid used for capture or computation, and the physical data like velocities and temperature. We would like to have something like that, for a 2D capture/simulation:
Note that to write/read inside a group (folder), you need to open it first, like double-clicking on a folder in your computer to access inside it. Here is the source, compile it, run it, and h5dump it to visualize the result:
PROGRAM CFDFIELD USE HDF5 IMPLICIT NONE integer, parameter :: ndim = 2 character(len=40) :: foldername ! HDF5 integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers integer :: error foldername = '/Fields/Velocities' ! Initialize FORTRAN interface. CALL h5open_f (error) ! Create a new file using default properties. CALL h5fcreate_f('CFD-DATA.h5', H5F_ACC_TRUNC_F, file_id, error) ! Create all groups (folders) and close them after. CALL h5gcreate_f(file_id,'/Grid',group_id,error) CALL h5gclose_f(group_id,error) CALL h5gcreate_f(file_id,'/Fields',group_id,error) CALL h5gclose_f(group_id,error) CALL h5gcreate_f(file_id,'/Fields/Velocities',group_id,error) CALL h5gclose_f(group_id,error) ! Try to re-open a specific group (folder) and close it after. CALL h5gopen_f(file_id,trim(adjustl(foldername)), group_id, error) if(error==0) print *, "Succes in opening the folder :",trim(adjustl(foldername)) CALL h5gclose_f(group_id,error) ! Terminate access to the file. CALL h5fclose_f(file_id, error) ! Close FORTRAN interface. CALL h5close_f(error) END PROGRAM CFDFIELD
When running it, you will (must !) see the following message : 'Succes in opening the folder :/Fields/Velocities' Result with h5dump :
h5dump CFD-DATA.h5 HDF5 "CFD-DATA.h5" { GROUP "/" { GROUP "Fields" { GROUP "Velocities" { } } GROUP "Grid" { } } }
Now, let's store something in this file. We will open it, and open the /Grid folder to store the grid in it (X positions and Y positions). The open subroutine (to open an existing HDF file) is the following:
CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error)
The source code is the following:
PROGRAM CFDFIELD2 USE HDF5 IMPLICIT NONE integer, parameter :: ndim = 1 integer(8), dimension(ndim) :: nbXpoints,nbYpoints character(len=40) :: foldername real(8), dimension(:), allocatable :: Xpoints,Ypoints real(8) :: Dx,Dy ! Delta x and Delta y, space between two points in the artificial grid, not important here integer :: i ! HDF5 integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers integer(HID_T) :: dspace_Xgrid_id,dspace_Ygrid_id,dset_Xgrid_id,dset_Ygrid_id integer :: error foldername = '/Grid' nbXpoints(1) = 10 nbYpoints(1) = 6 Dx = 0.1d0 Dy = 0.3d0 allocate(Xpoints(1:nbXpoints(1)),Ypoints(1:nbYpoints(1))) ! Generate artificial grid to be stored do i=1,nbXpoints(1) Xpoints(i) = (i-1)*Dx end do do i=1,nbYpoints(1) Ypoints(i) = (i-1)*Dy end do ! Initialize FORTRAN interface. CALL h5open_f (error) ! Open a file using default properties. CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error) ! Try to re-open a specific group (folder). CALL h5gopen_f(file_id,trim(adjustl(foldername)), group_id, error) if(error==0) print *, "Succes in opening the folder :",trim(adjustl(foldername)) ! Create the dataspaces for X and Y. CALL h5screate_simple_f(ndim, nbXpoints(1), dspace_Xgrid_id, error) CALL h5screate_simple_f(ndim, nbYpoints(1), dspace_Ygrid_id, error) ! Create the datasets with default properties. CALL h5dcreate_f(group_id, "Xpositions", H5T_NATIVE_DOUBLE, dspace_Xgrid_id, dset_Xgrid_id, error) CALL h5dcreate_f(group_id, "Ypositions", H5T_NATIVE_DOUBLE, dspace_Ygrid_id, dset_Ygrid_id, error) ! Write data for the datasets. CALL h5dwrite_f(dset_Xgrid_id, H5T_NATIVE_DOUBLE, Xpoints, nbXpoints, error) CALL h5dwrite_f(dset_Ygrid_id, H5T_NATIVE_DOUBLE, Ypoints, nbYpoints, error) ! End access to the datasets and release resources used by them. CALL h5dclose_f(dset_Xgrid_id, error) CALL h5dclose_f(dset_Ygrid_id, error) ! Terminate access to the data spaces. CALL h5sclose_f(dspace_Xgrid_id, error) CALL h5sclose_f(dspace_Ygrid_id, error) ! Terminate access to the group. CALL h5gclose_f(group_id,error) ! Terminate access to the file. CALL h5fclose_f(file_id, error) ! Close FORTRAN interface. CALL h5close_f(error) deallocate(Xpoints,Ypoints) END PROGRAM CFDFIELD2
And you shall see the following information:
./a.out Success in opening the folder :/Grid h5dump CFD-DATA.h5 HDF5 "CFD-DATA.h5" { GROUP "/" { GROUP "Fields" { GROUP "Velocities" { } } GROUP "Grid" { DATASET "Xpositions" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 10 ) / ( 10 ) } DATA { (0): 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 } } DATASET "Ypositions" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 6 ) / ( 6 ) } DATA { (0): 0, 0.3, 0.6, 0.9, 1.2, 1.5 } } } } }
You can see that grid information are now stored in the Grid group (like a folder).
We can then add the data Fields, which are 2D fields, of 10×6 points each. (Note that this time, the program is voluntary a little less documented, and some values are passed directly to the subroutines). nbpoints(1:2) is a 2D array that store the number of points per axes (per dimensions).
PROGRAM CFDFIELD3 USE HDF5 IMPLICIT NONE integer(8), dimension(2) :: nbpoints character(len=40) :: foldername real(8), dimension(:,:), allocatable :: U,V,Temp ! HDF5 integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers integer(HID_T) :: dspace_U_id,dspace_V_id,dspace_Temp_id,dset_U_id,dset_V_id,dset_Temp_id integer :: error nbpoints(1) = 10 ! = nbXpoints(1) nbpoints(2) = 6 ! = nbYpoints(1) allocate(U(1:nbpoints(1),1:nbpoints(2)),V(1:nbpoints(1),1:nbpoints(2)), & & Temp(1:nbpoints(1),1:nbpoints(2)) ) ! Generate artificial data U(:,:) = 1.7d0 V(:,:) = 2.5d0 Temp(:,:) = 350.0d0 ! Initialize FORTRAN interface. CALL h5open_f (error) ! Open a file using default properties. CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error) ! Try to re-open a specific group (folder). CALL h5gopen_f(file_id,'/Fields/Velocities', group_id, error) ! Create the dataspaces for U and V, with 2 dimensions of size nbpoints(1)*nbpoints(2). CALL h5screate_simple_f(2, nbpoints, dspace_U_id, error) CALL h5screate_simple_f(2, nbpoints, dspace_V_id, error) ! Create the datasets with default properties. CALL h5dcreate_f(group_id, "U", H5T_NATIVE_DOUBLE, dspace_U_id, dset_U_id, error) CALL h5dcreate_f(group_id, "V", H5T_NATIVE_DOUBLE, dspace_V_id, dset_V_id, error) ! Write data for the datasets. CALL h5dwrite_f(dset_U_id, H5T_NATIVE_DOUBLE, U, nbpoints, error) CALL h5dwrite_f(dset_V_id, H5T_NATIVE_DOUBLE, V, nbpoints, error) ! End access to the datasets and release resources used by them. CALL h5dclose_f(dset_U_id, error) CALL h5dclose_f(dset_V_id, error) ! Terminate access to the data spaces. CALL h5sclose_f(dspace_U_id, error) CALL h5sclose_f(dspace_V_id, error) ! Terminate access to the group. CALL h5gclose_f(group_id,error) ! Same thing with Temperature data CALL h5gopen_f(file_id,'/Fields', group_id, error) CALL h5screate_simple_f(2, nbpoints, dspace_Temp_id, error) CALL h5dcreate_f(group_id, "Temperature", H5T_NATIVE_DOUBLE, dspace_Temp_id, dset_Temp_id, error) CALL h5dwrite_f(dset_Temp_id, H5T_NATIVE_DOUBLE, Temp, nbpoints, error) CALL h5dclose_f(dset_Temp_id, error) CALL h5sclose_f(dspace_Temp_id, error) CALL h5gclose_f(group_id,error) ! Terminate access to the file. CALL h5fclose_f(file_id, error) ! Close FORTRAN interface. CALL h5close_f(error) deallocate(U,V,Temp) END PROGRAM CFDFIELD3
You should obtain something like this:
./a.out h5dump CFD-DATA.h5 HDF5 "CFD-DATA.h5" { GROUP "/" { GROUP "Fields" { DATASET "Temperature" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) } DATA { (0,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (1,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (2,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (3,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (4,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (5,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350 } } GROUP "Velocities" { DATASET "U" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) } DATA { (0,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, (1,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, (2,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, (3,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, (4,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, (5,0): 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7, 1.7 } } DATASET "V" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) } DATA { (0,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, (1,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, (2,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, (3,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, (4,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, (5,0): 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5 } } } } GROUP "Grid" { DATASET "Xpositions" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 10 ) / ( 10 ) } DATA { (0): 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 } } DATASET "Ypositions" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 6 ) / ( 6 ) } DATA { (0): 0, 0.3, 0.6, 0.9, 1.2, 1.5 } } } } }
Last thing to do is to read these data from another program. The subroutine to read data is :
CALL h5dread_f(dset_id, H5T_NATIVE_DOUBLE, U, nbpoints, error)
The source code is the following:
PROGRAM CFDFIELD4 USE HDF5 IMPLICIT NONE integer(8), dimension(2) :: nbpoints real(8), dimension(:,:), allocatable :: U ! HDF5 integer(HID_T) :: file_id,dspace_id,dset_id,group_id ! Identifiers integer :: error,i nbpoints(1) = 10 ! = nbXpoints(1) nbpoints(2) = 6 ! = nbYpoints(1) allocate(U(1:nbpoints(1),1:nbpoints(2))) ! Initialize FORTRAN interface. CALL h5open_f (error) ! Open a file using default properties. CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error) ! Try to re-open a specific group (folder). CALL h5gopen_f(file_id,'/Fields/Velocities', group_id, error) ! Open an existing dataset CALL h5dopen_f(group_id, 'U', dset_id, error) ! Read the data in dataset. CALL h5dread_f(dset_id, H5T_NATIVE_DOUBLE, U, nbpoints, error) ! Print data do i=1,nbpoints(2) write(*,'(10F6.2)') U(:,i) end do ! End access to the datasets and release resources used by them. CALL h5dclose_f(dset_id, error) ! Terminate access to the group. CALL h5gclose_f(group_id,error) ! Terminate access to the file. CALL h5fclose_f(file_id, error) ! Close FORTRAN interface. CALL h5close_f(error) deallocate(U) END PROGRAM CFDFIELD4
And this should give this:
./a.out 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70 1.70
Data were read correctly.
Last thing to do is to add attributes. It is possible to add information to data using attributes. The definition given in the HDF documentation is clear: “An HDF5 attribute is a small metadata object describing the nature and/or intended usage of a primary data object.”. Attribute can be used to add some information on a data (for example time of the simulation corresponding to these data, configuration of the simulation, input information of the program, etc).
The subroutine to open an existing dataset is :
CALL h5dopen_f(group_id, 'Temperature', dset_id, error)
Then, first thing to do is to create a dataspace for the attribute.
CALL h5screate_simple_f(ndim, StringNumber, aspace_id, error)
The second thing to do is to create a type to store character in string form. String is not supported natively in fortran, but there is a trick :
CALL h5tcopy_f(H5T_NATIVE_CHARACTER, atype_id, error) CALL h5tset_size_f(atype_id, StringLenght, error)
The H5T_NATIVE_CHARACTER is locked, the subroutine h5tcopy_f copy the existing datatype. This copy can be manipulated. Then, the new data type is enlarged to StringLength size. This datatype can now be used to store StringLenght size strings.
Then, the attribute must be stored into a specific dataset :
CALL h5acreate_f(dset_id, 'Additional Information', atype_id, aspace_id, attr_id, error)
It is the possible to write the metadata of this attribute :
CALL h5awrite_f(attr_id, atype_id, information, StringNumber, error)
The program bellow opens the existing h5 file, and adds metadata to the 'Temperature' dataset.
PROGRAM CFDFIELD5 USE HDF5 IMPLICIT NONE ! HDF5 integer(HID_T) :: file_id,dspace_id,dset_id,group_id,atype_id,aspace_id,attr_id ! Identifiers integer :: error integer(8) :: StringLenght integer(8), dimension(1) :: StringNumber CHARACTER(LEN=80), DIMENSION(3) :: information real(8) :: Pressure,Time Pressure = 1.015d0 Time = 1.58d-5 StringNumber(1) = 3 StringLenght = 80 information(1) = 'Pressure : ' information(2) = 'Time : ' information(3) = 'Dimensions : 2cm x 3cm' write(information(1),'(A,ES14.7)') trim(adjustl(information(1))),Pressure write(information(2),'(A,ES14.7)') trim(adjustl(information(2))),Time ! Initialize FORTRAN interface. CALL h5open_f (error) ! Open a file using default properties. CALL h5fopen_f('CFD-DATA.h5', H5F_ACC_RDWR_F, file_id, error) ! Try to re-open a specific group (folder). CALL h5gopen_f(file_id,'/Fields', group_id, error) ! Open an existing dataset. CALL h5dopen_f(group_id, 'Temperature', dset_id, error) ! Create scalar data space for the attribute. CALL h5screate_simple_f(1, StringNumber, aspace_id, error) ! Create datatype for the char strings. CALL h5tcopy_f(H5T_NATIVE_CHARACTER, atype_id, error) CALL h5tset_size_f(atype_id, StringLenght, error) ! Create dataset attribute. CALL h5acreate_f(dset_id, 'Additional Information', atype_id, aspace_id, attr_id, error) ! Write the attribute data. CALL h5awrite_f(attr_id, atype_id, information, StringNumber, error) ! Close the attribute. CALL h5aclose_f(attr_id, error) ! End access to the dataset and release resources used by it. CALL h5dclose_f(dset_id, error) ! Terminate access to the data space. CALL h5sclose_f(aspace_id, error) ! Terminate access to the group. CALL h5gclose_f(group_id,error) ! Terminate access to the file. CALL h5fclose_f(file_id, error) ! Close FORTRAN interface. CALL h5close_f(error) END PROGRAM CFDFIELD5
And the result :
./a.out h5dump CFD-DATA.h5 [.........] GROUP "/" { GROUP "Fields" { DATASET "Temperature" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 6, 10 ) / ( 6, 10 ) } DATA { (0,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (1,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (2,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (3,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (4,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350, (5,0): 350, 350, 350, 350, 350, 350, 350, 350, 350, 350 } ATTRIBUTE "Additional Information" { DATATYPE H5T_STRING { STRSIZE 80; STRPAD H5T_STR_SPACEPAD; CSET H5T_CSET_ASCII; CTYPE H5T_C_S1; } DATASPACE SIMPLE { ( 3 ) / ( 3 ) } DATA { (0): "Pressure : 1.0150000E+00 ", (1): "Time : 1.5800000E-05 ", (2): "Dimensions : 2cm x 3cm " } } } [.........]
That is all for the serial way of using HDF5. There are many other useful things, to adapt to different cases, but the bases are here.
One of the most famous rendering software is Paraview, an open-source visualization tool. Paraview has 3D acceleration using OpenGL, making it fast, and easy to use. On top of that, in includes many tools like fft, python scripting, etc.
Paraview has the ability to open HDF5 files, using XDMF index. The simpler way to do that is to add three levels of XDMF: Temporal, Spatial, Data.
We will suppose you ran a 4 thread MPI job, and that each of the thread wrote an HDF5 file, at time 0.0s. Each HDF5 file contains the Grid structure and the data of each individual thread.
The first XDMF would be:
MYCFD.xmf
<?xml version="1.0" ?> <!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []> <Xdmf xmlns:xi="http://www.w3.org/2001/XInclude" Version="2.0"> <Domain> <Grid Name="Solutions" GridType="Collection" CollectionType="Temporal" > <xi:include href="MYCFD_st000000.xmf" xpointer="xpointer(//Xdmf/Domain/Grid)" /> </Grid> </Domain> </Xdmf>
Then, the second level at time 0.0 would be : MYCFD_st000000.xmf
<?xml version="1.0" ?> <!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []> <Xdmf xmlns:xi="http://www.w3.org/2001/XInclude" Version="2.0"> <Domain> <Grid Name="Solutions" GridType="Collection" CollectionType="Spatial" > <Time Value=" 0.0000E+00" /> <xi:include href="MYCFD_st000000_p000000.xmf" xpointer="xpointer(//Xdmf/Domain/Grid)" /> <xi:include href="MYCFD_st000000_p000001.xmf" xpointer="xpointer(//Xdmf/Domain/Grid)" /> <xi:include href="MYCFD_st000000_p000002.xmf" xpointer="xpointer(//Xdmf/Domain/Grid)" /> <xi:include href="MYCFD_st000000_p000003.xmf" xpointer="xpointer(//Xdmf/Domain/Grid)" /> </Grid> </Domain> </Xdmf>
The 4 files refer to the HDF5 files written by the 4 threads at time 0.0 of the simulation. Finally, all of the 4 XDMF files include PATH to the data in the HDF files (assuming the HDF5 files are MYCFD_st000000_p000000.h5, MYCFD_st000000_p000001.h5, MYCFD_st000000_p000002.h5, and MYCFD_st000000_p000003.h5): MYCFD_st000000_p000000.xmf
<?xml version="1.0" ?> <!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []> <Xdmf xmlns:xi="http://www.w3.org/2001/XInclude" Version="2.0"> <Domain> <Grid Name="MYCFD_st000000_p000000" GridType="Uniform"> <Topology TopologyType="2DRectMesh" Dimensions=" 032 032"/> <Geometry GeometryType="VxVy"> <DataItem Dimensions="032" NumberType="Float" Precision="8" Format="HDF"> MYCFD_st000000_p000000.h5:/Grid/Xpositions </DataItem> <DataItem Dimensions="032" NumberType="Float" Precision="8" Format="HDF"> MYCFD_st000000_p000000.h5:/Grid/Ypositions </DataItem> </Geometry> <Attribute Name="U" AttributeType="Scalar" Center="node"> <DataItem Format="HDF" Dimensions=" 032 032" NumberType="Float" Precision="8"> MYCFD_st000000_p000000.h5:/Fields/velocities/U </DataItem> </Attribute> <Attribute Name="V" AttributeType="Scalar" Center="node"> <DataItem Format="HDF" Dimensions=" 032 032" NumberType="Float" Precision="8"> MYCFD_st000000_p000000.h5:/Fields/velocities/V </DataItem> </Attribute> <Attribute Name="Temperature" AttributeType="Scalar" Center="node"> <DataItem Format="HDF" Dimensions=" 032 032" NumberType="Float" Precision="8"> MYCFD_st000000_p000000.h5:/Fields/Temperature </DataItem> </Attribute> </Grid> </Domain> </Xdmf>
Paraview will the load all the data when opening the first file. Note that this is the simplest case. You can find a lot more about GeometryType, AttributeType, TopologyType, etc. on the internet.
First of all, this section requires a few words. The present performances of the MPI-IO, on which is based PHDF5, are really machine dependant, and need tuning to be efficient. The main advantage of PHDF5 is to get only one file from multiple MPI threads, making it easier to share or post-process. But its performances should be taken into account, and PHDF5 should only be used if the cluster administrator can provides you the tuning parameters for the cluster.
Writing files using PHDF5 is the same as with HDF5, but with additional steps:
Create a new property list (h5pcreate_f), and link it to the MPI communicator MPI_COMM_WORLD (h5pset_fapl_mpio_f):
CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error) CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL, error)
Then, create the file collectively, using the property list freshly defined, and close the property list.
CALL h5fcreate_f('MPI-CFD-DATA.h5', H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id) CALL h5pclose_f(plist_id, error)
Then, each thread needs to prepare a dataspace called memory-space, which will be used as a buffer when writing data.
CALL h5screate_simple_f(ndim, datasize, memspace, error)
Next step, the hyperslab. The hyperslab is the position in the dataspace where data of the current thread have to be written. For example, in the code above, if we imagine the 2D array of data using x,y positions, the thread 0 will write its data of size 2*2 at x=0, y=0. The thread 1 with 8*2 data will write at x=2, y=0. The thread 2 with 2*8 data will write at x=0,y=2. And the last thread will complete the array, with 8*8 data to be written at x=2,y=2.
10 10 11 11 11 11 11 11 11 11 10 10 11 11 11 11 11 11 11 11 12 12 13 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13 12 12 13 13 13 13 13 13 13 13
So, for thread 0 in blue, data size is 2*2, and offset (offset correspond to the difference from the origin) is 0,0. For thread 1 in red, 8*2 and 2,0. For thread 2 in green, 2*8 and 0,2. And for thread 3 in yellow, 8*8 and 2,2. Each thread creates an hyperslab when specifying datasize and offset.
CALL h5sselect_hyperslab_f (dataspace, H5S_SELECT_SET_F, offset, datasize, error)
Last step before writing; create a property list for collective write in the dataset.
CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error) CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
Then, data can be written using the same function as serial HDF5, but with additional arguments:
CALL h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, F, datasize, error, file_space_id = dataspace, mem_space_id = memspace, xfer_prp = plist_id)
The following program does a simple collective write, as described above.
PROGRAM T5 USE HDF5 ! This module contains all necessary modules USE mpi IMPLICIT NONE integer, parameter :: ndim = 2 integer(HSIZE_T), dimension(ndim) :: Nx integer(HID_T) :: file_id ! File identifier integer(HID_T) :: dset_id ! Dataset identifier integer(HID_T) :: dataspace ! Dataspace identifier integer(HID_T) :: memspace ! memspace identifier integer(HID_T) :: plist_id ! Property list identifier integer(HSIZE_T), dimension(ndim) :: datasize integer(HSIZE_T), dimension(ndim) :: offset real(8), allocatable, dimension(:,:) :: F integer :: mpierror,error ! MPI error flag integer :: mpi_size, mpi_rank CALL MPI_INIT(mpierror) CALL MPI_COMM_SIZE(MPI_COMM_WORLD, mpi_size, mpierror) CALL MPI_COMM_RANK(MPI_COMM_WORLD, mpi_rank, mpierror) Nx(1) = 10 Nx(2) = 10 ! mpi_rank = 0 if(mpi_rank==0) allocate(F(2,2)) if(mpi_rank==1) allocate(F(8,2)) if(mpi_rank==2) allocate(F(2,8)) if(mpi_rank==3) allocate(F(8,8)) F(:,:) = mpi_rank+10.0d0 ! Initialize FORTRAN predefined datatypes CALL h5open_f(error) ! Setup file access property list with parallel I/O access. CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error) CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL, error) ! Create the file collectively. CALL h5fcreate_f('MPI-CFD-DATA.h5', H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id) CALL h5pclose_f(plist_id, error) ! Create the data space for the dataset. CALL h5screate_simple_f(ndim, Nx, dataspace, error) ! Create the dataset with default properties. CALL h5dcreate_f(file_id, 'Temperature', H5T_NATIVE_DOUBLE, dataspace,dset_id, error) datasize(1) = size(F,1) datasize(2) = size(F,2) CALL h5screate_simple_f(ndim, datasize, memspace, error) if(mpi_rank==0) then; offset(1) = 0; offset(2) = 0; endif if(mpi_rank==1) then; offset(1) = 2; offset(2) = 0; endif if(mpi_rank==2) then; offset(1) = 0; offset(2) = 2; endif if(mpi_rank==3) then; offset(1) = 2; offset(2) = 2; endif ! Select hyperslab in the file. CALL h5sselect_hyperslab_f (dataspace, H5S_SELECT_SET_F, offset, datasize, error) ! Create property list for collective dataset write CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error) CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error) CALL h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, F, datasize, error, & file_space_id = dataspace, mem_space_id = memspace, xfer_prp = plist_id) ! Close dataspaces. CALL h5sclose_f(dataspace, error) CALL h5sclose_f(memspace, error) ! Close the dataset and property list. CALL h5dclose_f(dset_id, error) CALL h5pclose_f(plist_id, error) ! Close the file. CALL h5fclose_f(file_id, error) CALL h5close_f(error) CALL MPI_FINALIZE(mpierror) END PROGRAM T5
And the result:
mpirun -n 4 ./a.out h5dump MPI-CFD-DATA.h5 HDF5 "MPI-CFD-DATA.h5" { GROUP "/" { DATASET "Temperature" { DATATYPE H5T_IEEE_F64LE DATASPACE SIMPLE { ( 10, 10 ) / ( 10, 10 ) } DATA { (0,0): 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, (1,0): 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, (2,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, (3,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, (4,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, (5,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, (6,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, (7,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, (8,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, (9,0): 12, 12, 13, 13, 13, 13, 13, 13, 13, 13 } } } }