\documentclass{article}

% Use the Cactus ThornGuide style file
% (Automatically used from Cactus distribution, if you have a 
%  thorn without the Cactus Flesh download this from the Cactus
%  homepage at www.cactuscode.org)
\usepackage{../../../../doc/ThornGuide/cactus}

\begin{document}

% The author of the documentation
\author{Erik Schnetter \textless schnetter@uni-tuebingen.de\textgreater\\
        Christian D. Ott \textless cott@aei.mpg.de\textgreater\\
        Thomas Radke \textless tradke@aei.mpg.de\textgreater}

% The title of the document (not necessarily the name of the Thorn)
\title{CarpetIOHDF5}

% the date your document was last changed, if your document is in CVS, 
% please use:
\date{1 December 2004}

\maketitle

% Do not delete next line
% START CACTUS THORNGUIDE

\newcommand{\ThisThorn}{{\it CarpetIOHDF5}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{abstract}
Thorn \ThisThorn\ provides HDF5-based output for the {\em Carpet} mesh
refinement driver in {\em Cactus}.
This document explains \ThisThorn's usage and contains a specification
of the HDF5 file format that was adapted from John Shalf's FlexIO library.
\end{abstract}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}

Having encountered various problems with the Carpet I/O thorn
{\bf CarpetIOFlexIO} and the underlying FlexIO library,
Erik Schnetter decided to write this thorn \ThisThorn\ which bypasses
any intermediate binary I/O layer and outputs in HDF5\footnote{Hierarchical
Data Format version 5, see {\tt http://hdf.ncsa.uiuc.edu/whatishdf5.html}
for details} file format directly.

\ThisThorn\ provides output for the {\em Carpet} mesh refinement driver
within the Cactus Code. Christian D. Ott added a file reader (analogous to
Erik Schnetter's implementation present in {\bf CarpetIOFlexIO})
as well as checkpoint/recovery functionality to \ThisThorn.
Thomas Radke has taken over maintenance of this I/O thorn and is continuously
working on fixing known bugs and improving the code's functionality and
efficiency.

The \ThisThorn\ I/O method can output any type of CCTK grid variable
(grid scalars, grid functions, and grid arrays of arbitrary dimension);
data is written into separate files named {\tt "<varname>.h5"}.
It implements both serial and full parallel I/O --
data files can be written/read either by processor 0 only or by all processors.
Such data files can be used for further postprocessing (e.g.\ visualisation with
OpenDX or DataVault\footnote{see our VizTools page at \url{http://www.cactuscode.org/VizTools.html}
for details}) or fed back into Cactus via the filereader capabilities of thorn
{\bf IOUtil}.

This document aims to give the user a first handle on how to use
\ThisThorn. It also documents the HDF5 file layout used.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{\ThisThorn\ Parameters}

The parameters controlling the \ThisThorn\ I/O method are listed below;
a short combined example follows the list:

\begin{itemize}
  \item {\tt IOHDF5::out\_every} (steerable)\\
        How often to do periodic \ThisThorn\ output. If this parameter is set
        in the parameter file, it will override the setting of the shared
        {\tt IO::out\_every} parameter. The output frequency can also be set
        for individual variables using the {\tt out\_every} option in an option
        string appended to the {\tt IOHDF5::out\_vars} parameter.

  \item {\tt IOHDF5::out\_dt} (steerable)\\
        output in intervals of that much coordinate time (overrides {\tt IO::out\_dt})

  \item {\tt IOHDF5::out\_criterion} (steerable)\\
        criterion to select output intervals (overrides {\tt IO::out\_criterion})

  \item {\tt IOHDF5::out\_vars} (steerable)\\
        The list of variables to output using the \ThisThorn\ I/O method.
        The variables must be given by their fully qualified variable or group
        name. The special keyword {\it all} requests \ThisThorn\ output for
        all variables. Multiple names must be separated by whitespace.

        Each group/variable name can have an option string attached in which you
        can specify a different output frequency for that individual variable
        or a set of individual refinement levels to be output, e.g.
\begin{verbatim}
  IOHDF5::out_vars = "wavetoy::phi{ out_every = 4 refinement_levels = { 1 2 } }"
\end{verbatim}

  \item {\tt IOHDF5::out\_dir}\\
        The directory in which to place the \ThisThorn\ output files.
        If the directory doesn't exist at startup it will be created.
        If parallel output is enabled and the directory name contains the
        substring {\tt "\%u"}, this substring will be replaced by the processor ID.
        In this way each processor can have its own output directory.\\
        If this parameter is set to an empty string, \ThisThorn\ output will go
        to the standard output directory as specified in {\tt IO::out\_dir}.

  \item {\tt IO::out\_single\_precision} (steerable)\\
        whether to output double-precision data in single precision

\end{itemize}
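
As a quick illustration, the following parameter file excerpt combines several
of the parameters described above (the {\tt wavetoy::phi} and {\tt wavetoy::psi}
variables and the directory name are only placeholders):

\begin{verbatim}
  # HDF5 output every 10th iteration for wavetoy::psi,
  # and every 5th iteration for wavetoy::phi
  IOHDF5::out_every = 10
  IOHDF5::out_vars  = "wavetoy::psi
                       wavetoy::phi{ out_every = 5 }"

  # each processor writes into its own output directory;
  # double-precision data is converted to single precision on output
  IOHDF5::out_dir          = "wavetoy-hdf5-proc%u"
  IO::out_single_precision = "yes"
\end{verbatim}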


\section{Serial versus Parallel Output}

According to the output mode parameter settings ({\tt IO::out\_mode},
{\tt IO::out\_unchunked},\newline{\tt IO::out\_proc\_every}) of thorn
{\bf IOUtil}, thorn \ThisThorn\ will output distributed grid variables either

\begin{itemize}
  \item in serial from processor 0 into a single unchunked file
\begin{verbatim}
  IO::out_mode      = "onefile"
  IO::out_unchunked = "yes"
\end{verbatim}

  \item in serial from processor 0 into a single chunked file
\begin{verbatim}
  IO::out_mode      = "onefile"
  IO::out_unchunked = "no"
\end{verbatim}

  \item in parallel, that is, into separate chunked files (one per processor)
        containing the individual processors' patches of the
        distributed grid variable
\begin{verbatim}
  IO::out_mode      = "proc"
\end{verbatim}
\end{itemize}

For unchunked data all interprocessor ghostzones are excluded from the output,
and the entire grid variable is contained in a single HDF5 dataset.
Chunked output includes all information from all processors as chunks in
separate HDF5 datasets (thus adding some overhead for storing metadata).
Chunked data files probably need to be recombined before visualisation
in order to obtain a global view of the data.

The default is to output distributed grid variables in parallel, each processor
writing a file
{\tt \textless varname\textgreater.file\_\textless processor ID\textgreater.h5}.
Grid scalars
and {\tt DISTRIB = CONST} grid arrays are always output as unchunked data
on processor 0 only.\\
Parallel output in a parallel simulation will ensure maximum I/O
performance. Note that changing the output mode to serial I/O might only be
necessary if the data analysis and visualisation tools cannot deal with
chunked output files. Cactus itself, as well as many of the tools to
visualise Carpet HDF5 data (see \url{http://www.cactuscode.org/VizTools.html}),
can process both chunked and unchunked data. For instance, to visualise parallel
output datafiles with DataVault, you would just send all the individual files
to the DV server: {\tt hdf5todv phi.file\_*.h5}. In OpenDX the {\tt
ImportCarpetIOHDF5} module can be given any filename from the set of parallel
chunked files; the module will determine the total number of files in the set
automatically and read them all.
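
For illustration, a run on three processors writing the grid function {\tt phi}
in the default parallel mode would produce one chunked file per processor
(the variable name is again just a placeholder):

\begin{verbatim}
  phi.file_0.h5
  phi.file_1.h5
  phi.file_2.h5
\end{verbatim}

With serial output ({\tt IO::out\_mode = "onefile"}) the same variable would
instead end up in a single file {\tt phi.h5}.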


\section{Using the flesh I/O API to produce HDF5 output}

Periodic output of grid variables is usually specified via I/O parameters
in the parameter file and then automatically triggered by the flesh scheduler
at each iteration step after analysis. If output should also be triggered
at a different time, one can do that from within an application thorn by
invoking one of the {\tt CCTK\_OutputVar*()} I/O routines provided
by the flesh I/O API (see chapter B8.2 ``IO'' in the Cactus Users Guide).
In this case, the application thorn routine which calls {\tt CCTK\_OutputVar*()}
must be scheduled in level mode.

It should be noted here that -- due to a restriction in the naming scheme of
objects in an HDF5 data file -- \ThisThorn\ can output a given grid variable
with a given refinement level only once per timestep. Attempts by application
thorns to trigger output of the same variable multiple times during an iteration
will result in a runtime warning and have no further effect.
If output for a variable is also required at intermediate timesteps,
this can be achieved by calling {\tt CCTK\_OutputVarAs*()} with a different
{\tt alias} name; output for the same variable is then written into
different HDF5 files based on the {\tt alias} argument.
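
The following is a minimal sketch of such a call from C. The thorn name
{\tt MyThorn}, the routine name, and the alias are hypothetical; it is assumed
that the routine is scheduled in level mode (e.g.\ with {\tt OPTIONS: level} in
the thorn's {\tt schedule.ccl}) and that the \ThisThorn\ I/O method is addressed
by its registered name {\tt "IOHDF5"}:

\begin{verbatim}
#include "cctk.h"
#include "cctk_Arguments.h"
#include "cctk_Parameters.h"

/* Hypothetical level-mode routine which triggers CarpetIOHDF5 output
   of wavetoy::phi under the alias "phi_intermediate" */
void MyThorn_TriggerOutput (CCTK_ARGUMENTS)
{
  DECLARE_CCTK_ARGUMENTS;
  DECLARE_CCTK_PARAMETERS;

  /* request output via the IOHDF5 I/O method only;
     a negative return value indicates failure */
  if (CCTK_OutputVarAsByMethod (cctkGH, "wavetoy::phi",
                                "IOHDF5", "phi_intermediate") < 0)
  {
    CCTK_WARN (1, "Failed to trigger HDF5 output for wavetoy::phi");
  }
}
\end{verbatim}

Calling {\tt CCTK\_OutputVarAs()} without naming a method should instead
trigger output with all registered I/O methods.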


\section{Checkpointing \& Recovery and Importing Data}

Thorn \ThisThorn\ can also be used to create HDF5 checkpoint files and
to recover from such files later on. In addition it can read HDF5 datafiles
back in using the generic filereader interface described in the thorn
documentation of {\bf IOUtil}.

Checkpoint routines are scheduled at several timebins so that you can save
the current state of your simulation after the initial data phase,
during evolution, or at termination. Checkpointing for thorn \ThisThorn\ 
is enabled by setting the parameter {\tt IOHDF5::checkpoint = "yes"}.

A recovery routine is registered with thorn {\bf IOUtil} in order to restart
a new simulation from a given HDF5 checkpoint.
The very same recovery mechanism is used to implement a filereader
functionality to feed back data into Cactus.

Checkpointing and recovery are controlled by corresponding checkpoint/recovery
parameters of thorn {\bf IOUtil} (for a description of these parameters please
refer to this thorn's documentation).


\section{Example Parameter File Excerpts}

\subsection{Serial (unchunked) Output of Grid Variables}

\begin{verbatim}
  # how often to output and where output files should go
  IO::out_every = 2
  IO::out_dir   = "wavetoy-data"

  # request output for wavetoy::psi at every other iteration,
  #                for wavetoy::phi every 4th iteration on refinement levels 1 and 2
  IOHDF5::out_vars = "wavetoy::phi{ out_every = 4 refinement_levels = { 1 2 } }
                      wavetoy::psi"

  # we want unchunked output
  # (because the visualisation tool cannot deal with chunked data files)
  IO::out_mode      = "onefile"
  IO::out_unchunked = "yes"
\end{verbatim}

\subsection{Parallel (chunked) Output of Grid Variables}

\begin{verbatim}
  # how often to output
  IO::out_every = 2

  # each processor writes to its own output directory
  IOHDF5::out_dir = "wavetoy-data-proc%u"

  # request output for wavetoy::psi at every other iteration,
  #                for wavetoy::phi every 4th iteration on refinement levels 1 and 2
  IOHDF5::out_vars = "wavetoy::phi{ out_every = 4 refinement_levels = { 1 2 } }
                      wavetoy::psi"

  # we want parallel chunked output (note that this already is the default)
  IO::out_mode = "proc"
\end{verbatim}

\subsection{Checkpointing \& Recovery}

\begin{verbatim}
  # say how often we want to checkpoint, how many checkpoints should be kept,
  # how the checkpoints should be named, and where they should be written to
  IO::checkpoint_every = 100
  IO::checkpoint_keep  = 2
  IO::checkpoint_file  = "wavetoy"
  IO::checkpoint_dir   = "wavetoy-checkpoints"

  # enable checkpointing for CarpetIOHDF5
  IOHDF5::checkpoint = "yes"

  #######################################################

  # recover from the latest checkpoint found
  IO::recover_file = "wavetoy"
  IO::recover_dir  = "wavetoy-checkpoints"
  IO::recover      = "auto"
\end{verbatim}

\subsection{Importing Grid Variables via Filereader}

\begin{verbatim}
  # which data files to import and where to find them
  IO::filereader_ID_files = "phi psi"
  IO::filereader_ID_dir   = "wavetoy-data"

  # what variables and which timestep to read
  # (if this parameter is left empty, all variables and timesteps found
  #  in the data files will be read)
  IO::filereader_ID_vars  = "WaveToyMoL::phi{ cctk_iteration = 0 }
                             WaveToyMoL::psi"
\end{verbatim}


\iffalse
\section{CarpetIOHDF5's HDF5 file layout}

The HDF5 file layout of {\bf CarpetIOHDF5} is quite simple.
There are no groups besides the standard HDF5 root data object group.

Each dataset is named according to this template:

\begin{verbatim}
  <group::varname> it=<cctk_iteration> tl=<timelevel> [ml=<mglevel>] [m=<map>]
  [rl=<reflevel>] [c=<component>]
\end{verbatim}

where the optional parts only contribute to the name if they vary (i.e.\ if there
is more than one multigrid level, map, refinement level, or component, respectively).

Each HDF5 dataset has the following attributes associated with it:

\begin{itemize}
  \item {\tt level} : Carpet::reflevel
  \item {\tt origin} : 1-D array of length vdim. \\
        origin[d] = CCTK\_ORIGIN\_SPACE(d) + cctk\_lbnd[d] * delta[d]
  \item {\tt delta} : 1-D array of length vdim. \\
        delta[d] = CCTK\_DELTA\_SPACE(d)
  \item {\tt time} : cctk\_time
  \item {\tt timestep} : cctk\_iteration
  \item {\tt iorigin} : 1-D array of length vdim. \\ iorigin[d] = (Carpet::ext.lower() / Carpet::ext.stride())[d]
  \item {\tt name} : CCTK\_FullName(variable index)
  \item {\tt cctk\_bbox} : 1-D array of length 2*Carpet::dim. cctk\_bbox
  \item {\tt cctk\_nghostzones} : 1-D array of length Carpet::dim. cctk\_nghostzones
  \item {\tt carpet\_mglevel} : Carpet::mglevel
  \item {\tt carpet\_reflevel} : Carpet::reflevel
\end{itemize}


\subsection{Attributes needed by the file reader}

The number of attributes needed by the CarpetIOHDF5 file reader is much smaller than the total
number of attributes attached to each dataset:

\begin{itemize}
  \item {\tt name}
  \item {\tt level}
  \item {\tt iorigin}
\end{itemize}

\fi


% Do not delete next line
% END CACTUS THORNGUIDE

\end{document}