author    eschnett <> 2001-03-26 00:27:00 +0000
committer eschnett <> 2001-03-26 00:27:00 +0000
commit    e9020906e0ce2db439522ccdba76f4936ae12bde (patch)
tree      743ce0ed741920216a60e509bef1f4bc5462084f /Carpet/Carpet/doc
parent    4123f52ffccee9997e4c228376ec131f773da364 (diff)
Added documentation.
darcs-hash:20010326002722-f6438-4bab305f4955af4a8712f8a1fd1f24f8926a2652.gz
Diffstat (limited to 'Carpet/Carpet/doc')
-rw-r--r--  Carpet/Carpet/doc/howto.html  536
1 file changed, 536 insertions(+), 0 deletions(-)
diff --git a/Carpet/Carpet/doc/howto.html b/Carpet/Carpet/doc/howto.html
new file mode 100644
index 000000000..911d4b840
--- /dev/null
+++ b/Carpet/Carpet/doc/howto.html
@@ -0,0 +1,536 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- $Header: /home/eschnett/C/carpet/Carpet/Carpet/Carpet/doc/Attic/howto.html,v 1.1 2001/03/26 02:27:22 eschnett Exp $ -->
+<html>
+ <head>
+ <title>Fixed Mesh Refinement in Cactus using Carpet</title>
+ </head>
+
+ <body>
+ <h1>Fixed Mesh Refinement in Cactus using Carpet</h1>
+
+
+
+ <h2>Overview</h2>
+
+ <h3>Fixed Mesh Refinement, aka Box-in-Box</h3>
+
+ <p>Fixed Mesh Refinement (FMR), also known as box-in-box, is a way
+ to increase the local resolution in unigrid applications, while
+ retaining the basic unigrid character of an application. A small
+ number (maybe two or three) of grids with varying resolution
+ overlay each other, where the coarsest grid has the largest
+ extent. This allows the application to benefit from the higher
+ resolution of the smaller grids while keeping the outer boundary
+ far out at the same time. The main advantage of FMR is that it
+ needs far fewer resources than globally increasing the
+ resolution.</p>
+
+ <h3>Carpet</h3>
+
+ <p>Carpet is the name of an FMR driver, i. e. the back end that
+ handles storage allocation for the grid functions, parallelism,
+ I/O, and the various inter-grid operations. Carpet was developed
+ in early summer of 2000 by <a
+ href="mailto:schnetter@uni-tuebingen.de">Erik Schnetter</a>, then
+ a research scholar in the <a
+ href="http://www.astro.psu.edu/">Department for Astronomy and
+ Astrophysics</a> of <a href="http://www.psu.edu/">Penn State
+ University</a>. In spring 2001, Carpet was coupled to Cactus as a
+ drop-in enhancement for the standard unigrid Cactus driver
+ PUGH.</p>
+
+ <h3>Cactus</h3>
+
+ <p>From the <a href="http://www.cactuscode.org">main Cactus web
+ page</a>: "Cactus is an open source problem solving environment
+ designed for scientists and engineers. Its modular structure
+ easily enables parallel computation across different architectures
+ and collaborative code development between different groups.
+ Cactus originated in the academic research community, where it was
+ developed and used over many years by a large international
+ collaboration of physicists and computational scientists."</p>
+
+
+
+ <h2>Introduction</h2>
+
+ <h3>Fixed Mesh Refinement</h3>
+
+ <p>A standard way of solving partial differential equations is
+ finite differences on a regular grid. This is also called
+ "unigrid". Such an application discretises its problem space onto
+ a single, rectangular grid which has the same grid spacing
+ everywhere. This grid might be broken up into several parts for
+ parallelisation purposes, but parallelisation should be
+ transparent to the physics part of the application.</p>
+
+ <p>Increasing the resolution in a unigrid application is somewhat
+ expensive. For example, increasing the resolution by a factor of
+ two requires a factor of eight more storage in three dimensions.
+ Given a constant Courant factor, the calculation time will even go
+ up by a factor of sixteen. This behaviour makes it easy to find
+ problems that cannot be solved on contemporary supercomputers, no
+ matter how big and fast those computers are.</p>
+
+ <p>Apart from physical insight, which then often has to be used to
+ decrease the problem size until it fits the current hardware,
+ there are also numerical and algorithmic methods to decrease the
+ resource requirements of the application. Most applications need
+ the high resolution only in a part of the simulation domain.
+ Discretisation methods that don't require a uniform resolution,
+ such as finite elements, can implement non-uniform resolutions
+ very naturally. One problem with finite elements, however, is
+ that many physicists today are not familiar with them, or shy
+ away from their perceived complexity, or are not willing to adapt
+ existing finite difference code.</p>
+
+ <p>Fixed Mesh Refinement (FMR) is a poor man's way of implementing
+ a non-uniform resolution into a unigrid application with minimal
+ changes to its structure. Instead of only one grid, there are
+ several grids or grid patches with different resolutions. The
+ coarsest grid usually encloses the whole simulation domain.
+ Successively finer grids overlay the coarse grid at those
+ locations where a higher resolution is needed. The coarser grids
+ provide boundary conditions to the finer grids through
+ interpolation.</p>
+
+ <p>Instead of updating only one grid, the application has to
+ update all grids. The usual approach is to first take a step on
+ the coarsest grid, and then recursively take several smaller steps
+ on the finer grids. The Courant criterion requires that the step
+ sizes on the finer grids be smaller than on the coarse grid. The
+ boundary values for the finer grids are found through
+ interpolation in space and time from the coarser grid. In the
+ end, the information on the finer grids is injected into the
+ coarse grids.</p>
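+
+ <p>The recursion just described is compact enough to sketch in
+ code. The following C++ fragment is only a minimal illustration
+ of this time stepping scheme, not Carpet's actual implementation;
+ all names, and the fixed refinement factor of two, are assumptions
+ made for this example.</p>
+
+ <pre>
+ // Hypothetical helpers, assumed to exist for this sketch:
+ void take_step (int level, double t, double dt); // update one level
+ void restrict_fine_to_coarse (int level);        // inject fine data
+ int  num_levels ();                              // number of levels
+
+ // Berger-Oliger style time stepping: one step on a level, then
+ // recursively two half-size steps on the next finer level.
+ void evolve (int level, double t, double dt)
+ {
+   take_step (level, t, dt);
+   if (level+1 &lt; num_levels ()) {
+     // boundary values for the finer level are interpolated in
+     // space and time from this level
+     evolve (level+1, t,        dt/2);
+     evolve (level+1, t + dt/2, dt/2);
+     restrict_fine_to_coarse (level);
+   }
+ }
+ </pre>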
+
+ <p>Strictly speaking there is no need for a coarse grid in the
+ regions covered by the finer grids. But the resources required
+ for treating the overlapping region on the coarse grid are only
+ minimal compared to treating the finer grids, and a coarse grid
+ with a hole often creates complications, so this obvious
+ optimisation is usually left out.</p>
+
+ <h3>Carpet</h3>
+
+ <p>Carpet is a C++ library that provides infrastructure to
+ describe regions of varying resolution in a convenient and
+ efficient way. Carpet contains routines to manage grid
+ hierarchies, which describe the relationships between the
+ components of the grid on the different refinement and
+ convergence levels.
+ Carpet has a notion of simulation time and grid spacing, which are
+ necessary for interpolation, and contains efficient
+ interpolators.</p>
+
+ <p>Carpet can run on several processors in parallel using MPI for
+ communication. Each grid can be broken down into several
+ components, and every component has a home processor. Carpet also
+ contains operators to move certain regions to a different
+ processor, or to synchronise all components of a grid.</p>
+
+ <p>Carpet is also an arrangement of thorns for Cactus,
+ implementing a driver and associated I/O routines for both ASCII
+ and binary I/O. It should be possible to substitute Carpet for
+ the standard Cactus driver PUGH without changes to the application
+ thorns and thus use Carpet as a unigrid driver. Making use of the
+ FMR capabilities of Carpet usually requires some rearranging of
+ the application, comparable in general to the changes necessary
+ for a uniprocessor application to run on multiple processors.</p>
+
+ <p>The driver section of Carpet contains the logic to manage
+ storage for the grid functions, to traverse the grid hierarchy for
+ all scheduled routines, and to automatically apply the necessary
+ inter-grid operators for prolongation (interpolation of the fine
+ grid boundaries) and restriction (injecting the fine grid
+ information back into the coarse grid).</p>
+
+ <p>The ASCII I/O routines use the quasi-standard <a
+ href="gnuplot">gnuplot</a> format. The binary I/O routines use
+ the <a href="FlexIO">FlexIO library</a> written by John Shalf. It
+ allows efficient and platform independent I/O. The FlexIO format
+ is based on <a href="HDF">HDF</a> and also supported by several
+ visualisation packages.</p>
+
+ <p>Carpet is copyrighted by Erik Schnetter, and is available under
+ the LGPL licence from a <a href="CVS">CVS</a> repository.</p>
+
+ <h3>WaveToy</h3>
+
+ <p>Cactus comes with a sample application called "WaveToy", which
+ solves the scalar wave equation with various initial data and
+ boundary conditions. As an example, I have extended WaveToy so
+ that it uses Carpet's FMR capabilities. WaveToy serves both as a
+ test case for Carpet and as an example of how to convert an
+ application to use FMR.</p>
+
+ <p>The equation solved by WaveToy is the well-known scalar wave
+ equation, discretised using the Leapfrog method with three time
+ levels, yielding second order accuracy in space and time. A
+ typical set of initial data is a plane wave, and a typical
+ boundary condition is periodicity. These allow long-term
+ simulations as well as easy and meaningful comparisons with the
+ analytic solution.</p>
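+
+ <p>For reference, the standard Leapfrog discretisation of the
+ scalar wave equation (written here in one spatial dimension for
+ brevity, with wave speed c, grid spacing dx, and time step dt)
+ is</p>
+
+ <center>u<sup>n+1</sup><sub>i</sub> = 2 u<sup>n</sup><sub>i</sub> -
+ u<sup>n-1</sup><sub>i</sub> + (c dt/dx)<sup>2</sup>
+ (u<sup>n</sup><sub>i+1</sub> - 2 u<sup>n</sup><sub>i</sub> +
+ u<sup>n</sup><sub>i-1</sub>) ,</center>
+
+ <p>which involves exactly the three time levels n+1, n, and n-1
+ mentioned above.</p>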
+
+
+
+ <h2>Compiling Cactus With Carpet</h2>
+
+ <p>Carpet has been written in C++, using templates and the STL
+ (Standard Template Library). Both templates and the STL make
+ writing and debugging code a lot easier. Without templates, I
+ would have had to put much effort into making Carpet support all
+ of Cactus' data types. Without the STL, I would have had to spend
+ quite some time implementing basic containers such as lists or
+ sets. I still had to implement a custom vector type, because
+ STL's vector type is optimised for large vectors only, and I
+ needed three-dimensional vectors of integers.</p>
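+
+ <p>As an illustration, such a small fixed-size vector type might
+ look like the following sketch. The names are invented for this
+ example and are not necessarily those used inside Carpet.</p>
+
+ <pre>
+ // Sketch of a small fixed-size vector, e.g. for three-dimensional
+ // integer index arithmetic; not necessarily Carpet's actual class.
+ template&lt;typename T, int D&gt;
+ class vect {
+   T elt[D];
+ public:
+   T&amp; operator[] (int d) { return elt[d]; }
+   T operator[] (int d) const { return elt[d]; }
+   vect operator+ (const vect&amp; a) const {
+     vect r;
+     for (int d=0; d&lt;D; ++d) r.elt[d] = elt[d] + a.elt[d];
+     return r;
+   }
+ };
+
+ typedef vect&lt;int,3&gt; ivect;   // three-dimensional integer vector
+ </pre>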
+
+ <p>The inner loops of Carpet are the inter-grid operators, that
+ is, the routines that copy, restrict, and prolongate between
+ grids. Because Cactus supports mixed-language programming, it was
+ rather easy to write these in Fortran 77, which makes them both
+ fast and portable.</p>
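+
+ <p>Schematically, and reduced to one dimension and a refinement
+ factor of two, these operators do the following. This is a C++
+ paraphrase for illustration only; the actual operators are
+ three-dimensional and written in Fortran 77.</p>
+
+ <pre>
+ // Restriction: inject fine grid values into the coarse grid.
+ // Assumes nf == 2*nc - 1 (vertex-centred refinement by two).
+ void restrict1d (const double* fine, int nf, double* coarse, int nc)
+ {
+   for (int i=0; i&lt;nc; ++i) coarse[i] = fine[2*i];
+ }
+
+ // Prolongation: interpolate coarse grid values onto the fine grid
+ // (linear interpolation at the intermediate points).
+ void prolong1d (const double* coarse, int nc, double* fine, int nf)
+ {
+   for (int i=0; i&lt;nf; ++i) {
+     if (i%2 == 0) fine[i] = coarse[i/2];
+     else          fine[i] = 0.5 * (coarse[i/2] + coarse[i/2+1]);
+   }
+ }
+ </pre>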
+
+ <p>Carpet is an arrangement in Cactus. It can theoretically be
+ compiled without any other external library if you don't include
+ the binary I/O support, which requires FlexIO. I do recommend
+ using FlexIO, so you should install the FlexIO library first.
+ Although FlexIO is already part of Cactus in the thorn called
+ CactusExternal/FlexIO, this seems to be a version that has FMR
+ support disabled and is hence not usable. You will have to
+ install a complete copy of FlexIO by hand.</p>
+
+ <h3>Hurdle 1: FlexIO</h3>
+
+ <p>DESCRIBE INSTALLING FLEXIO HERE. MENTION INSTALLING HDF4
+ FIRST, THEN HDF5, THEN FLEXIO</p>
+
+ <h3>Hurdle 2: STL</h3>
+
+ <p>DESCRIBE HOW TO GET/INSTALL AN STL, OR HOW TO MAKE THE SYSTEM'S
+ C++ COMPILER WORK WITH THE STL</p>
+
+ <h3>Hurdle 3: Templates</h3>
+
+ <p>DESCRIBE HOW TO INSTANTIATE, AND NOT-INSTANTIATE, THE TEMPLATES
+ AS NEEDED.</p>
+
+ <h3>WaveToy</h3>
+
+ <p>CONFIGURING, THORNS, COMPILING</p>
+
+ <p>To be continued...</p>
+
+
+
+ <h2>Running The Example Applications</h2>
+
+ <p>SAMPLE FILES</p>
+
+ <p>CARPET'S OPTIONS</p>
+
+ <p>To be continued...</p>
+
+ <p>Second order convergence requires second order interpolation in
+ time, which requires that at least three time levels are
+ present.</p>
+
+
+
+ <h2>Fold Your Own FMR Application</h2>
+
+ <p>There are three steps to take from a simple unigrid
+ uniprocessor toy application to a full-blown FMR multiprocessor
+ production application. Those steps are almost independent, and I
+ would like to explain them and their implications in some detail
+ below.</p>
+
+ <h3>Multiple Processors</h3>
+
+ <p>Probably the best known of these steps is going from one to
+ several processors, also known as parallelisation. Because many
+ people are already familiar with this step, I will describe it
+ first.</p>
+
+ <p>In a uniprocessor application, it is possible to access every
+ grid point in an arbitrary manner. In order to allow multiple
+ processors to run efficiently in parallel, the grid is broken down
+ into several rectangular components, and each processor is
+ assigned one of these components.</p>
+
+ <p>The components will usually overlap by a few grid points, so as
+ to allow the processors to e. g. calculate spatial derivatives
+ (which require neighbouring grid points) without having to
+ communicate for every grid point. From time to time it is then
+ necessary to synchronise the overlapping region, which is the only
+ time at which communication happens. This allows the application
+ to run almost unchanged, i. e. without invoking communication
+ itself. The synchronisation routine is provided by the driver and
+ not by the application.</p>
+
+ <p>Of course a serial application usually will have to be changed
+ to support multiple processors. In order to do so, all the
+ operations that the application performs have to be classified
+ into one of two categories:</p>
+
+ <p>One category contains the so-called <em>local</em> operations.
+ These are operations that are applied to each and every grid point
+ individually, and that do not depend on any other grid point
+ except nearby neighbours. Each local operation will thus involve
+ a loop over all grid points, and in order to run on multiple
+ processors, after each such loop the synchronisation routine has
+ to be called. An example of a local operation would be
+ calculating a spatial derivative.</p>
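+
+ <p>In code, a local operation might look like the following
+ sketch, a loop over the interior of one component; the array names
+ and the fixed array layout are invented for this illustration.</p>
+
+ <pre>
+ // Schematic local operation: centred first derivative in the
+ // x direction, second order accurate. u and dxu are the storage
+ // of one grid component, including the overlap (ghost) points.
+ void x_derivative (const double* u, double* dxu,
+                    int ni, int nj, int nk, double dx)
+ {
+   for (int k=0; k&lt;nk; ++k)
+     for (int j=0; j&lt;nj; ++j)
+       for (int i=1; i&lt;ni-1; ++i) {
+         const int ind = i + ni*(j + nj*k);
+         dxu[ind] = (u[ind+1] - u[ind-1]) / (2*dx);
+       }
+   // afterwards the overlap region of dxu has to be synchronised
+ }
+ </pre>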
+
+ <p>The other category contains so-called <em>global</em>
+ operations. These operations do not depend on individual grid
+ points, and thus do not involve loops over grid points. The
+ result of a global operation is the same on all processors;
+ therefore global operations don't involve communication and don't
+ require synchronisation. An example of a global operation would
+ be to check how many time steps have been taken, and decide
+ whether the simulation should be terminated.</p>
+
+ <p>Typically most operations can be classified or rewritten to be
+ either local or global. But often there are operations that fit
+ neither category, and these parts of an application are hardest to
+ parallelise. Applying the boundary conditions, to give another
+ example, might seem at first to be neither local nor global. But
+ in a slight (yet completely correct) stretch of the term "applied
+ to all grid points", boundary conditions can be classified as
+ local; they are a local operation that just does nothing to most
+ grid points.</p>
+
+ <p>To give one more example, calculating an error norm does not
+ fit these categories. It is neither local nor global. It is not
+ local because the result involves all grid points (and not only
+ nearby neighbours), and it is not global because it does depend on
+ the grid points. Operations that fit neither category typically
+ require special handling, and often require hand-coded
+ communication in the application. Luckily calculating various
+ norms is such a common case that special routines for this,
+ called <em>reduction operators</em>, are already present.</p>
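+
+ <p>For example, computing the L2 norm of a grid function through
+ the flesh's reduction interface might look like the following
+ sketch. The variable name is made up, and a thorn providing the
+ "norm2" reduction operator has to be active.</p>
+
+ <pre>
+ /* Sketch of calling a reduction operator; error checking and
+    the surrounding scheduled routine are omitted. */
+ const int handle = CCTK_ReductionHandle ("norm2");
+ const int varindex = CCTK_VarIndex ("wavetoy::phi");
+ CCTK_REAL norm;
+ CCTK_Reduce (cctkGH, -1, handle,
+              1, CCTK_VARIABLE_REAL, &amp;norm,
+              1, varindex);
+ /* every processor now holds the same value of norm */
+ </pre>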
+
+ <h3>Multiple Resolution Levels</h3>
+
+ <p>There are several reasons why an application might want to
+ incorporate more than one grid, overlapping and each with a
+ different resolution.</p>
+
+ <p>The most commonly known reason is probably a convergence test,
+ where the very same problem is treated in different resolutions.
+ Differences in the result are then likely caused by insufficient
+ resolution on the coarser (or on all) grids. For a convergence
+ test, the grids are completely independent, and it does not matter
+ whether the simulation runs on all grids simultaneously or
+ sequentially. In order to treat the grids sequentially, the
+ application does not have to be changed at all.</p>
+
+ <p>The reason of interest here is of course FMR. For FMR, the
+ order in which the grids are treated is fixed. As described
+ above, there is first a time step on the coarse grid, and then
+ recursively several smaller steps on the finer grids. This order
+ does require certain changes in the application. The sequence of
+ operations that forms a single time step has to be identified and
+ isolated. (That is to say, there has to be a routine that
+ calculates a complete time step, and nothing else.) It is then
+ the task of the FMR driver to call this routine for the correct
+ grids in the correct order.</p>
+
+ <p>Other reasons for multiple resolution levels are
+ e. g. multigrid algorithms for elliptic equations, which I do not
+ want to discuss here, or shadow hierarchies to determine
+ truncation errors, which I also skip here. Shadow hierarchies are
+ very similar to the convergence levels described above.</p>
+
+ <p>Besides the order in which the operations are performed on the
+ grids, there is one more complication for FMR. The boundary
+ values of the finer grids have to be calculated from the coarser
+ grids through interpolation. And because the time steps on the
+ finer grids are smaller, a corresponding value is not always
+ available on the coarser grids. This makes it necessary to
+ interpolate in time between time steps on the coarser grids. The
+ alternative would be to take smaller steps on the coarser grids,
+ and this would be very expensive.</p>
+
+ <p>These interpolations in time make it necessary that the driver
+ knows which grid function contains values corresponding to what
+ time. The usual way to achieve this is to have several time
+ levels per grid function; three time levels allow for a second
+ order interpolation in time. Only grid functions with enough time
+ levels can be interpolated, i. e. boundary conditions can be
+ calculated only for those.</p>
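+
+ <p>As a concrete example of such an interpolation: with values at
+ the three times n, n-1, and n-2 (in units of the coarse time
+ step), second order Lagrange interpolation to an intermediate time
+ n+s, with -2 &lt;= s &lt;= 0, reads</p>
+
+ <center>u<sup>n+s</sup> = (s+1)(s+2)/2 u<sup>n</sup> - s(s+2)
+ u<sup>n-1</sup> + s(s+1)/2 u<sup>n-2</sup> .</center>
+
+ <p>For s = 0, -1, or -2 this reproduces the stored time levels
+ exactly.</p>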
+
+ <p>Fortunately time levels are rather widespread in applications,
+ so they are no new concept to introduce. Unfortunately they are
+ often abused, so that values corresponding to the wrong time are
+ stored in a time level, usually with the excuse of saving storage.
+ This will in general not work with FMR, because the driver then
+ cannot interpolate in time, leading to incorrect values on the
+ boundaries of the finer grids.</p>
+
+ <h3>Multiple Grid Components</h3>
+
+ <p>Sometimes it is convenient to have a simulation domain that is
+ not a rectangle. It might instead be an L-shaped simulation
+ domain, or a domain that consists of two disconnected rectangular
+ regions. This issue becomes more important with FMR, because
+ there it is often convenient to have several disconnected refined
+ regions. As long as there are enough processors available, each
+ processor can be assigned a region or a part thereof, and no new
+ concept need be introduced. If, however, there are fewer
+ processors than regions, then a new problem arises.</p>
+
+ <p>A common case for that problem might be a simulation containing
+ just two refined regions, and running on a single processor. The
+ refined grid then consists of two components. The problem then is
+ that the two components cannot be treated sequentially: Imagine
+ the time evolution routine working on (say) the first component.
+ It will at some time call the synchronisation routine. At that
+ time there are no values from the second component available,
+ because the second component has not been treated yet. Therefore
+ the synchronisation routine cannot complete. That means in turn
+ that the time evolution routine cannot complete working on the
+ first component, leading to a deadlock. Work on neither component
+ can be completed before work on the other component.</p>
+
+ <p>The solution is to break up the time evolution routine into
+ several smaller routines, each consisting of a single operation,
+ either local or global. ("Local" and "global" have here the exact
+ same meanings that were defined above for parallelisation.) A
+ local operation works, by definition, on individual grid points.
+ Hence the local routines have to be called once for every grid
+ component. A global operation, by definition, does not depend on
+ individual grid points. Hence it has to be called only once per
+ processor, and not once per component. That means that the driver
+ has to be told which category each individual routine is in.</p>
+
+ <h3>Example</h3>
+
+ <p>Let me finish this section with a detailed example. Suppose
+ you want to solve the equation</p>
+
+ <center>d/dt u = f(u) ,</center>
+
+ <p>integrating it using the midpoint rule, i. e. the simplest
+ second-order time integration scheme. Given values at the
+ previous time u<sup>n-1</sup>, one first calculates a first order
+ solution using an Euler step, leading to the intermediate
+ result</p>
+
+ <center>v<sup>n</sup> = u<sup>n-1</sup> + dt f(u<sup>n-1</sup>)
+ .</center>
+
+ <p>The second and final step is then calculated via</p>
+
+ <center>u<sup>n</sup> = u<sup>n-1</sup> + dt f([u<sup>n-1</sup> +
+ v<sup>n</sup>]/2) .</center>
+
+ <p>The corresponding pseudo code would look like this:</p>
+
+ <ol>
+ <li>Calculate Euler step, storing the result into
+ u<sup>n</sup></li>
+
+ <li>Apply boundary conditions to u<sup>n</sup></li>
+
+ <li>Synchronise u<sup>n</sup></li>
+
+ <li>Calculate average of u<sup>n-1</sup> and u<sup>n</sup>,
+ storing the result into v<sup>n</sup></li>
+
+ <li>Calculate second step, storing the result again into
+ u<sup>n</sup></li>
+
+ <li>Apply boundary conditions again to u<sup>n</sup></li>
+
+ <li>Synchronise again u<sup>n</sup></li>
+ </ol>
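+
+ <p>Written out as code, the local steps of this list might look
+ like the following sketch; the function and array names are
+ invented, and the loops run over one grid component.</p>
+
+ <pre>
+ double f (double u);   // right hand side of the equation, given
+
+ // Step 1: Euler step, storing the result into u (time level n);
+ // u_p holds the previous time level n-1.
+ void euler_step (int npoints, const double* u_p, double* u,
+                  double dt)
+ {
+   for (int i=0; i&lt;npoints; ++i)
+     u[i] = u_p[i] + dt * f(u_p[i]);
+ }
+
+ // Step 4: average of time levels n-1 and n, stored into v.
+ void average (int npoints, const double* u_p, const double* u,
+               double* v)
+ {
+   for (int i=0; i&lt;npoints; ++i)
+     v[i] = 0.5 * (u_p[i] + u[i]);
+ }
+
+ // Step 5: second step, storing the result again into u.
+ void second_step (int npoints, const double* u_p, const double* v,
+                   double* u, double dt)
+ {
+   for (int i=0; i&lt;npoints; ++i)
+     u[i] = u_p[i] + dt * f(v[i]);
+ }
+ </pre>
+
+ <p>The boundary condition and synchronisation steps 2, 3, 6, and 7
+ are handled separately, as discussed below.</p>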
+
+ <p>The above algorithm looks a bit different from a naive
+ implementation of the midpoint rule. One difference is that both
+ the first and the second step store their result into
+ u<sup>n</sup>. This is necessary because it would be inconvenient
+ to apply boundary conditions to the intermediate value
+ v<sup>n</sup>. Remember, in order to apply boundary conditions on
+ the finer grids, there have to be several time levels present.
+ With the above scheme, only u needs several time levels. v is
+ used only as a temporary (and could conceivably be completely
+ eliminated).</p>
+
+ <p>Note also that the first step goes all the way from time level
+ n-1 to time level n. The midpoint rule can be rewritten (in fact,
+ is usually written) so that the first step is only a half step,
+ leading to the time level n-1/2. This is not possible for FMR,
+ because interpolating to the time n-1/2 is not possible, and thus
+ there could be no boundary conditions applied after the first
+ step.</p>
+
+ <p>The second thing to note is that the application of the
+ boundary condition and the synchronisation have been separated
+ rather artificially. Normally synchronisation would be considered
+ part of the boundary condition. In this case, however, applying
+ the boundary condition is a local operation, whereas
+ synchronisation counts as a global operation. (It is not obvious
+ that synchronisation should be global, but as the synchronisation
+ routine is a part of Carpet, it was up to me to decide this.) As
+ explained above, local and global operations have to be
+ separated.</p>
+
+ <p>Separating the evolution steps and the boundary condition
+ routines is, on the other hand, just a notational convenience.
+ There could well be a single routine implementing both.</p>
+
+ <p>For Cactus, the order in which to call the individual parts of
+ the time evolution routines is described in the schedule
+ declarations, i. e. in the files called
+ <code>schedule.ccl</code>. By default a
+ routine is assumed to be local; global routines have to be tagged
+ with <code>OPTIONS: GLOBAL</code>.</p>
+
+ <p>The tag <code>SYNC: groupname</code> indicates that the group
+ <code>groupname</code> should be synchronised after the scheduled
+ routine has been called for all grid components. This obviously
+ makes sense only for local routines. Using the <code>SYNC:</code>
+ tag is preferred over calling the synchronisation routine
+ <code>CCTK_SyncGroup</code> directly.</p>
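+
+ <p>A <code>schedule.ccl</code> fragment for the midpoint rule
+ example above might look like the following sketch; the routine
+ and group names are invented and not taken from an actual
+ thorn.</p>
+
+ <pre>
+ SCHEDULE euler_step AT evol
+ {
+   LANG: C
+   SYNC: evolved_vars
+ } "First (Euler) step of the midpoint rule (a local routine)"
+
+ SCHEDULE check_termination AT evol
+ {
+   LANG: C
+   OPTIONS: GLOBAL
+ } "Decide whether to terminate (a global routine)"
+ </pre>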
+
+ <p>The example thorn WaveToy in Carpet's arrangement is a bit
+ simpler than what is described here, because it uses the Leapfrog
+ scheme which consists of only a single step. I would suggest
+ looking at WaveToy as an initial FMR example.</p>
+
+ <p>The thorn SpaceToy is implemented very much in the way
+ described here. It evolves two variables phi and psi, but it is
+ also coupled to the thorn HydroToy. This coupling introduces some
+ additional complications. The thorn HydroToy, on the other hand,
+ uses a predictor-corrector scheme, which is also a two-step scheme
+ and thus more complex than WaveToy. All the coupling between
+ SpaceToy and HydroToy is contained in SpaceToy. I would thus
+ suggest looking at HydroToy first.</p>
+
+ <p>I assume that converting an application to FMR is
+ straightforward once the handling of the time levels has been
+ straightened out.</p>
+
+
+
+ <h2>Carpet Under The Hood</h2>
+
+ <p>To be continued...</p>
+
+
+
+ <h2>Moving Boxes, Adaptive Mesh Refinement</h2>
+
+ <p>To be continued...</p>
+
+
+
+ <hr>
+ <address><a href="mailto:schnetter@uni-tuebingen.de">Erik Schnetter</a></address>
+<!-- Created: Sun Mar 25 15:15:18 CST 2001 -->
+<!-- hhmts start -->
+Last modified: Sun Mar 25 20:26:30 CST 2001
+<!-- hhmts end -->
+ </body>
+</html>