
% Thorn documentation template
\documentclass{article}
\begin{document}

\title{IOHDF5}
\author{Thomas Radke}
\date{1999}
\maketitle

\abstract{
Thorn IOHDF5 provides an I/O method to output variables in HDF5 file format.
It also implements checkpointing/recovery functionality using HDF5.
}
%
\section{Purpose}
%
Thorn IOHDF5 uses the standard I/O library HDF5 (Hierarchical Data Format
version 5, see {\tt http://hdf.ncsa.uiuc.edu/whatishdf5.html} for details)
to output any type of grid variables (grid scalars, grid functions, and arrays
of arbitrary dimension) in the HDF5 file format.\\

Output is done by invoking the {\tt IOHDF5} I/O method which thorn IOHDF5
registers with the flesh's I/O interface at startup.\\

You obtain output by either
\begin{itemize}
  \item setting the appropriate I/O parameters in your parameter files, e.g.
\begin{verbatim}
  IOHDF5::out_every = 10
  IOHDF5::out_vars  = "wavetoy::phi"
\end{verbatim}
  \item calling one of the flesh's I/O interface routines in your thorn's
        code, e.g.
\begin{verbatim}
  CCTK_OutputVarByMethod (cctkGH, "wavetoy::phi", "IOHDF5");
\end{verbatim}
\end{itemize}

Data is written into files named {\tt "<varname>.h5"}.
Such datafiles can be used for further postprocessing (e.g.\ visualization)
or fed back into Cactus via the filereader capabilities of thorn IOUtil.


\subsection{Parallel File I/O}

Depending on the output mode parameter settings ({\tt IO::out3D\_mode,
IO::out3D\_unchunked, IO::out3D\_procs}) of thorn IOUtil, thorn IOHDF5
will output distributed data either
\begin{itemize}
  \item in serial into a single unchunked file
\begin{verbatim}
  IO::out3D_mode      = "onefile"
  IO::out3D_unchunked = "yes"
\end{verbatim}
  \item in parallel, that is, into separate files containing chunks of the
        individual processors' patches of the distributed array
\begin{verbatim}
  IO::out3D_mode      = "proc | np"
\end{verbatim}
\end{itemize}
The default is to output data in parallel, in order to get maximum I/O
performance. If needed, you can recombine the resulting chunked datafiles
into a single unchunked file using the recombiner utility program provided
in {\tt IOHDF5/src/util/}.\\

To build the recombiner, run

\begin{verbatim}
  make <configuration>-utils
\end{verbatim}

in the Cactus toplevel directory. The recombiner executable {\tt
hdf5\_recombiner} will be placed in the {\tt exe/<configuration>/}
subdirectory.


\subsection{Checkpointing \& Recovery}

Thorn IOHDF5 can also be used for creating HDF5 checkpoint files and recovering
from such files later on.\\

Checkpoint routines are scheduled at several timebins so that you can save
the current state of your simulation after the initial data phase,
during evolution, or at termination.
A recovery routine is registered with thorn IOUtil in order to restart
a new simulation from a given HDF5 checkpoint.
The very same recovery mechanism is used to implement a filereader
functionality to feed back data into Cactus.\\

Checkpointing and recovery are controlled by the corresponding
checkpoint/recovery parameters of thorn IOUtil (for a description of these
parameters, please refer to the documentation of thorn IOUtil).


\section{Comments}

\subsection{Importing external data into Cactus with IOHDF5}

In order to import external data into Cactus (e.g.\ to initialize some variable)
you first need to convert this data into an HDF5 datafile which can then be
processed by the registered recovery routine of thorn IOHDF5.\\

The following description explains the HDF5 file layout of an unchunked
datafile which thorn IOHDF5 expects in order to restore Cactus variables
from it properly. There is also a well-documented example C program provided
({\tt IOHDF5/doc/CreateIOHDF5datafile.c}) which illustrates how to create
a datafile with IOHDF5 file layout. This working example can be used as a
template for building your own data converter program.\\

\begin{enumerate}
  \item Actual data is stored as multidimensional datasets in an IOHDF5 file.
        There is no nested grouping structure; every dataset is located
        in the root group.\\
        A dataset's name must match the following naming pattern, which
        guarantees unique names:
\begin{verbatim}
  "<full variable name> timelevel <timelevel> at iteration <iteration>"
\end{verbatim}
        IOHDF5's recovery routine parses a dataset's name according to this
        pattern to determine the Cactus variable to restore, along with its
        timelevel. The iteration number is purely informational and is not
        needed for recovery.

  \item The type of your data as well as its dimensions are already
        stored with the dataset itself as metadata. But this is not
        enough for IOHDF5 to safely match it against a specific Cactus variable.
        For that reason, the variable's groupname, its grouptype, and the
        total number of timelevels must be attached to every dataset
        as attributes.

  \item Finally, the recovery routine needs to know how the datafile to
        recover from was created:
        \begin{itemize}
          \item Does the file contain chunked or unchunked data?
          \item How many processors were used to produce the data?
          \item How many I/O processors were used to write the data?
        \end{itemize}
        This information is stored as attributes in a group named
        {\tt "Global Attributes"}. Since we assume unchunked data here,
        the processor information isn't relevant: unchunked data can
        be fed back into a Cactus simulation running on an arbitrary
        number of processors.
\end{enumerate}

The example C program goes through all of these steps and creates a datafile
{\tt x\_3d.h5} in IOHDF5 file layout which contains a single dataset named
{\tt "grid::x timelevel 0 at iteration 0"}, with groupname
{\tt "grid::coordinates"}, grouptype {\tt CCTK\_GF} (thus identifying the
variable as a grid function), and the total number of timelevels set to 1.\\
The global attributes are set to
{\tt "unchunked" $=$ "yes", nprocs $=$ 1,} and {\tt ioproc\_every $=$ 1}.\\

Once you've built and run the program, you can easily verify that it worked
properly with
\begin{verbatim}
  h5dump x_3d.h5
\end{verbatim}
which lists all objects in the datafile along with their values.
It will also dump the contents of the 3D dataset. Since it only contains zeros
it would probably not make much sense to feed this datafile into Cactus for
initializing your x coordinate grid function :-)
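
As an illustration of the dataset naming pattern described in the list above,
the following C sketch splits a dataset name into the variable name, the
timelevel, and the iteration number. It is not taken from thorn IOHDF5: the
function {\tt parse\_dataset\_name} and its error convention are hypothetical,
and the actual recovery routine may parse names differently.
\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of thorn IOHDF5): splits a name of the form
     "<full variable name> timelevel <timelevel> at iteration <iteration>"
   into its three parts.
   Returns 0 on success, -1 if the name does not match the pattern or the
   variable name does not fit into the caller's buffer. */
int parse_dataset_name (const char *name,
                        char *varname, size_t varname_size,
                        int *timelevel, int *iteration)
{
  static const char marker[] = " timelevel ";
  const char *p, *last = NULL;
  size_t len;
  int tl, it, consumed = 0;

  assert (name && varname && timelevel && iteration);

  /* locate the last occurrence of the marker */
  for (p = strstr (name, marker); p; p = strstr (p + 1, marker))
  {
    last = p;
  }
  if (! last)
  {
    return (-1);
  }

  /* everything before the marker is the full variable name */
  len = (size_t) (last - name);
  if (len == 0 || len >= varname_size)
  {
    return (-1);
  }

  /* the remainder must be exactly "timelevel <n> at iteration <m>" */
  if (sscanf (last, " timelevel %d at iteration %d%n",
              &tl, &it, &consumed) != 2 ||
      last[consumed] != '\0' || tl < 0 || it < 0)
  {
    return (-1);
  }

  /* only modify the output arguments once the whole name is validated */
  memcpy (varname, name, len);
  varname[len] = '\0';
  *timelevel = tl;
  *iteration = it;
  return (0);
}
\end{verbatim}
Called with {\tt "grid::x timelevel 0 at iteration 0"}, the name of the dataset
created by the example program, this sketch yields the variable name
{\tt grid::x} with timelevel 0 and iteration 0.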
%
%
\subsection{Other utility programs in IOHDF5}
%
In addition to the HDF5 recombiner program, thorn IOHDF5 also provides
some other utilities which can be built the same way:
%
\begin{itemize}
  \item {\tt hdf5\_convert\_from\_ieeeio.c}\\
    Converts a datafile created by thorn IOFlexIO into an HDF5 datafile.
  \item {\tt hdf5\_merge.c}\\
    Merges a list of HDF5 input files into a single HDF5 output file.
    This can be used to concatenate HDF5 output data created as one file per
    timestep.
  \item {\tt hdf5\_extract.c}\\
    Extracts a given list of named objects (groups or datasets) from an HDF5
    input file and writes them into a new HDF5 output file.
    This is the reverse operation to what {\tt hdf5\_merge.c} does. It is
    useful, e.g., for extracting individual timesteps from a time series HDF5
    datafile.
\end{itemize}
%
All utility programs are self-explanatory: when called without arguments,
each prints a short usage message.
%
% Automatically created from the ccl files
% Do not worry for now.
\include{interface}
\include{param}
\include{schedule}

\end{document}