diff options
author | eschnett <> | 2001-03-26 00:27:00 +0000 |
---|---|---|
committer | eschnett <> | 2001-03-26 00:27:00 +0000 |
commit | e9020906e0ce2db439522ccdba76f4936ae12bde (patch) | |
tree | 743ce0ed741920216a60e509bef1f4bc5462084f /Carpet/Carpet/doc | |
parent | 4123f52ffccee9997e4c228376ec131f773da364 (diff) |
Added documentation.
darcs-hash:20010326002722-f6438-4bab305f4955af4a8712f8a1fd1f24f8926a2652.gz
Diffstat (limited to 'Carpet/Carpet/doc')
-rw-r--r-- | Carpet/Carpet/doc/howto.html | 536 |
1 files changed, 536 insertions, 0 deletions
diff --git a/Carpet/Carpet/doc/howto.html b/Carpet/Carpet/doc/howto.html new file mode 100644 index 000000000..911d4b840 --- /dev/null +++ b/Carpet/Carpet/doc/howto.html @@ -0,0 +1,536 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<!-- $Header: /home/eschnett/C/carpet/Carpet/Carpet/Carpet/doc/Attic/howto.html,v 1.1 2001/03/26 02:27:22 eschnett Exp $ --> +<html> + <head> + <title>Fixed Mesh Refinement in Cactus using Carpet</title> + </head> + + <body> + <h1>Fixed Mesh Refinement in Cactus using Carpet</h1> + + + + <h2>Overview</h2> + + <h3>Fixed Mesh Refinement, aka Box-in-Box</h3> + + <p>Fixed Mesh Refinement (FMR), also known as box-in-box, is a way + to increase the local resolution in unigrid applications, while + retaining the basic unigrid character of an application. A small + number (maybe two or three) of grids with varying resolution + overlay each other, where the coarsest grid has the largest + extent. This allows the application to benefit from the higher + resolution of the smaller grids while keeping the outer boundary + far out at the same time. The main advantage of FMR is that it + needs far fewer resources than globally increasing the + resolution.</p> + + <h3>Carpet</h3> + + <p>Carpet is the name of an FMR driver, i. e. the back end that + handles storage allocation for the grid functions, parallelism, + I/O, and the various inter-grid operations. Carpet was developed + in early summer of 2000 by <a + href="mailto:schnetter@uni-tuebingen.de">Erik Schnetter</a>, then + a research scholar in the <a + href="http://www.astro.psu.edu/">Department for Astronomy and + Astrophysics</a> of <a href="http://www.psu.edu/">Penn State + University</a>.
In spring 2001, Carpet was coupled to Cactus as a + drop-in enhancement for the standard unigrid Cactus driver + PUGH.</p> + + <h3>Cactus</h3> + + <p>From the <a href="http://www.cactuscode.org">main Cactus web + page</a>: "Cactus is an open source problem solving environment + designed for scientists and engineers. Its modular structure + easily enables parallel computation across different architectures + and collaborative code development between different groups. + Cactus originated in the academic research community, where it was + developed and used over many years by a large international + collaboration of physicists and computational scientists."</p> + + + + <h2>Introduction</h2> + + <h3>Fixed Mesh Refinement</h3> + + <p>A standard way of solving partial differential equations is + finite differences on a regular grid. This is also called + "unigrid". Such an application discretises its problem space onto + a single, rectangular grid which has the same grid spacing + everywhere. This grid might be broken up into several parts for + parallelisation purposes, but parallelisation should be + transparent to the physics part of the application.</p> + + <p>Increasing the resolution in a unigrid application is somewhat + expensive. For example, increasing the resolution by a factor of + two requires a factor of eight more storage in three dimensions. + Given a constant Courant factor, the calculation time will even go + up by a factor of sixteen. This behaviour makes it easy to find + problems that cannot be solved on contemporary supercomputers, no + matter how big and fast those computers are.</p> + + <p>Apart from physical insight, which then often has to be used to + decrease the problem size until it fits the current hardware, + there are also numerical and algorithmic methods to decrease the + resource requirements of the application. Most applications need + the high resolution only in a part of the simulation domain.
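The scaling argument above can be made explicit with a small sketch. The arithmetic follows directly from the text; the function names are made up for illustration:

```cpp
#include <cassert>

// Purely illustrative: the cost of refining a 3D unigrid run by a
// factor r, as argued in the text.  Storage grows like r^3 (one
// factor of r per dimension); with a constant Courant factor the
// time step also shrinks by r, so the total work grows like r^4.
long storage_factor (long r) { return r*r*r; }
long work_factor    (long r) { return r*r*r*r; }
```

For r = 2 these give the factors of eight and sixteen quoted above.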
+ Discretisation methods that don't require a uniform resolution, + such as finite elements, can implement non-uniform resolutions + very naturally. One problem with finite elements is that many + physicists today are not familiar with them, or shy + away from their perceived complexity, or are not willing to adapt + existing finite difference code.</p> + + <p>Fixed Mesh Refinement (FMR) is a poor man's way of implementing + a non-uniform resolution into a unigrid application with minimal + changes to its structure. Instead of only one grid, there are + several grids or grid patches with different resolutions. The + coarsest grid usually encloses the whole simulation domain. + Successively finer grids overlay the coarse grid at those + locations where a higher resolution is needed. The coarser grids + provide boundary conditions to the finer grids through + interpolation.</p> + + <p>Instead of updating only one grid, the application has to + update all grids. The usual approach is to first take a step on + the coarsest grid, and then recursively take several smaller steps + on the finer grids. The Courant criterion requires that the step + sizes on the finer grids be smaller than on the coarse grid. The + boundary values for the finer grids are found through + interpolation in space and time from the coarser grid. In the + end, the information on the finer grids is injected into the + coarse grids.</p> + + <p>Strictly speaking there is no need for a coarse grid on the + regions covered by the finer grids. But as stated above, the + resources required for treating the overlapping region on the + coarse grid are only minimal compared to treating the finer grids. + And because a coarse grid with a hole often creates complications, + this obvious optimisation is often left out.</p> + + <h3>Carpet</h3> + + <p>Carpet is a C++ library that provides infrastructure to + describe regions of varying resolution in a convenient and + efficient way.
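The recursive update order described above, one coarse step followed by several smaller steps on the finer grids, can be sketched as follows. This is a hypothetical outline under assumed values for the level count and refinement factor, not Carpet's actual code:

```cpp
#include <cassert>
#include <vector>

const int maxlevel = 3; // number of refinement levels (assumed)
const int refine   = 2; // refinement factor in time (assumed)

std::vector<int> steps_taken(maxlevel, 0);

// Take one step on this level, then recursively take "refine"
// smaller steps on the next finer level.  In a real FMR driver the
// fine grid boundaries would be filled by interpolation from this
// level before the recursion, and the fine grid data would be
// injected back into this level afterwards.
void evolve (int level)
{
  ++steps_taken[level];
  if (level+1 < maxlevel) {
    for (int i=0; i<refine; ++i) evolve (level+1);
  }
}
```

A single call to evolve(0) then takes one step on the coarsest level, two on the middle level, and four on the finest, so all levels reach the same simulation time.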
Carpet contains routines to manage grid + hierarchies, which describe the relationships between the components + of the grid on the different refinement and convergence levels. + Carpet has a notion of simulation time and grid spacing, which are + necessary for interpolation, and contains efficient + interpolators.</p> + + <p>Carpet can run on several processors in parallel using MPI for + communication. Each grid can be broken down into several + components, and every component has a home processor. Carpet also + contains operators to move certain regions to a different + processor, or to synchronise all components of a grid.</p> + + <p>Carpet is also an arrangement of thorns for Cactus, + implementing a driver and associated I/O routines for both ASCII + and binary I/O. It should be possible to substitute Carpet for + the standard Cactus driver PUGH without changes to the application + thorns and thus use Carpet as a unigrid driver. Making use of the + FMR capabilities of Carpet usually requires some rearranging of + the application, comparable in general to the changes necessary + for a uniprocessor application to run on multiple processors.</p> + + <p>The driver section of Carpet contains the logic to manage + storage for the grid functions, to traverse the grid hierarchy for + all scheduled routines, and to automatically apply the necessary + inter-grid operators for prolongation (interpolation of the fine + grid boundaries) and restriction (injecting the fine grid + information back into the coarse grid).</p> + + <p>The ASCII I/O routines use the quasi-standard <a + href="gnuplot">gnuplot</a> format. The binary I/O routines use + the <a href="FlexIO">FlexIO library</a> written by John Shalf. It + allows efficient and platform-independent I/O.
The FlexIO format + is based on <a href="HDF">HDF</a> and also supported by several + visualisation packages.</p> + + <p>Carpet is copyrighted by Erik Schnetter, and is available under + the LGPL licence from a <a href="CVS">CVS</a> repository.</p> + + <h3>WaveToy</h3> + + <p>Cactus comes with a sample application called "WaveToy", which + solves the scalar wave equation with various initial data and + boundary conditions. As an example, I have extended WaveToy so + that it uses Carpet's FMR capabilities. WaveToy serves both as a + test case for Carpet, and as an example of how to convert an + application to FMR.</p> + + <p>The equation solved by WaveToy is the well known scalar wave + equation, discretised using the Leapfrog method with three time + levels, yielding second order accuracy in space and time. A + typical set of initial data is a plane wave, and a typical + boundary condition is periodicity. Those allow long term + simulations as well as easy and meaningful comparisons to the + analytic solution.</p> + + + + <h2>Compiling Cactus With Carpet</h2> + + <p>Carpet has been written in C++, using templates and the STL + (Standard Template Library). Both templates and the STL make + writing and debugging code a lot easier. Without templates, I + would have had to put much effort into making Carpet support all + of Cactus' data types. Without the STL, I would have had to spend + quite some time implementing basic containers such as lists or + sets. I still had to implement a custom vector type, because + STL's vector type is optimised for large vectors only, and I + needed three-dimensional vectors of integers.</p> + + <p>The inner loops of Carpet are the inter-grid operators, that is + the routines that copy, restrict, and prolongate between grids. + Thanks to Cactus it was rather easy to write these in Fortran 77, + which makes them both fast and portable.</p> + + <p>Carpet is an arrangement in Cactus.
It can theoretically be + compiled without any other external library, if you don't include + the binary I/O support, which requires FlexIO. I do recommend + using FlexIO, so you should install the FlexIO library first. + Although FlexIO is already part of Cactus in the thorn called + CactusExternal/FlexIO, this seems to be a version that has FMR + support disabled and is hence not usable. You will have to + install a complete copy of FlexIO by hand.</p> + + <h3>Hurdle 1: FlexIO</h3> + + <p>DESCRIBE INSTALLING FLEXIO HERE. MENTION INSTALLING HDF4 + FIRST, THEN HDF5, THEN FLEXIO</p> + + <h3>Hurdle 2: STL</h3> + + <p>DESCRIBE HOW TO GET/INSTALL AN STL, OR HOW TO MAKE THE SYSTEM'S + C++ COMPILER WORK WITH THE STL</p> + + <h3>Hurdle 3: Templates</h3> + + <p>DESCRIBE HOW TO INSTANTIATE, AND NOT-INSTANTIATE, THE TEMPLATES + AS NEEDED.</p> + + <h3>WaveToy</h3> + + <p>CONFIGURING, THORNS, COMPILING</p> + + <p>To be continued...</p> + + + + <h2>Running The Example Applications</h2> + + <p>SAMPLE FILES</p> + + <p>CARPET'S OPTIONS</p> + + <p>To be continued...</p> + + <p>Second order convergence requires second order interpolation in + time, which requires that at least three time levels are + present.</p> + + + + <h2>Fold Your Own FMR Application</h2> + + <p>There are three steps to take from a simple unigrid + uniprocessor toy application to a full-blown FMR multiprocessor + production application. Those steps are almost independent, and I + would like to explain them and their implications in some detail + below.</p> + + <h3>Multiple Processors</h3> + + <p>Probably the best known of these is the step from using one to + using several processors, also known as parallelisation. Because + many people are already familiar with this step, I will describe + it first.</p> + + <p>In a uniprocessor application, it is possible to access every + grid point in an arbitrary manner.
In order to allow multiple + processors to run efficiently in parallel, the grid is broken down + into several rectangular components, and each processor is + assigned one of these components.</p> + + <p>The components will usually overlap by a few grid points, so as + to allow the processors to e. g. calculate spatial derivatives + (which require neighbouring grid points) without having to + communicate for every grid point. From time to time it is then + necessary to synchronise the overlapping region, which is the only + time at which communication happens. This allows the application + to run almost unchanged, i. e. without invoking communication + itself. The synchronisation routine is provided by the driver and + not by the application.</p> + + <p>Of course a serial application will usually have to be changed to + support multiple processors. In order to do so, all the + operations that the application performs have to be classified + into one of two categories:</p> + + <p>One category contains the so-called <em>local</em> operations. + These are operations that are applied to each and every grid point + individually, and that do not depend on any other grid point + except nearby neighbours. Each local operation will thus involve + a loop over all grid points, and in order to run on multiple + processors, after each such loop the synchronisation routine has + to be called. An example of a local operation would be + calculating a spatial derivative.</p> + + <p>The other category contains so-called <em>global</em> + operations. These operations do not depend on individual grid + points, and thus do not involve loops over grid points. The + result of a global operation is the same on all processors; + therefore global operations don't involve communication and don't + require synchronisation.
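The two categories described above can be illustrated with a small sketch; the function names are made up for illustration and are not part of the Cactus API:

```cpp
#include <cassert>
#include <vector>

// A local operation: a centred first derivative.  It loops over
// grid points and uses only nearest neighbours, so on multiple
// processors each component computes its own interior points and
// the driver's synchronisation routine then refreshes the overlap.
std::vector<double> centred_derivative (const std::vector<double>& u,
                                        double dx)
{
  std::vector<double> du(u.size(), 0.0);
  for (std::size_t i=1; i+1<u.size(); ++i)
    du[i] = (u[i+1] - u[i-1]) / (2.0*dx);
  return du;
}

// A global operation: it involves no grid points, so every
// processor obtains the same answer without any communication.
bool should_terminate (int iteration, int max_iterations)
{
  return iteration >= max_iterations;
}
```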
An example of a global operation would + be to check how many time steps have been taken, and decide + whether the simulation should be terminated.</p> + + <p>Typically most operations can be classified or rewritten to be + either local or global. But often there are operations that fit + neither category, and these parts of an application are hardest to + parallelise. Applying the boundary conditions, to give another + example, might seem at first to be neither local nor global. But + in a slight (yet completely correct) stretch of the term "applied + to all grid points", boundary conditions can be classified as + local; they are a local operation that just does nothing to most + grid points.</p> + + <p>To give one more example, calculating an error norm does not + fit these categories. It is neither local nor global. It is not + local because the result involves all grid points (and not only + nearby neighbours), and it is not global because it does depend on + the grid points. All operations that do not fit the two categories + typically require special handling, and often require hand-coded + communication in the application. Luckily calculating various + norms is such a common case that special routines, called + <em>reduction operators</em>, are already present.</p> + + <h3>Multiple Resolution Levels</h3> + + <p>There are several reasons why an application might want to + incorporate more than one grid, overlapping and each with a + different resolution.</p> + + <p>The most commonly known reason is probably a convergence test, + where the very same problem is treated in different resolutions. + Differences in the result are then likely caused by insufficient + resolution on the coarser (or on all) grids. For a convergence + test, the grids are completely independent, and it does not matter + whether the simulation runs on all grids simultaneously or + sequentially.
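A convergence test is usually quantified by estimating the observed order of convergence from results at three resolutions. The following is the generic textbook estimate, not something specific to Carpet:

```cpp
#include <cassert>
#include <cmath>

// For a scheme of order p, the error at grid spacing h behaves like
// C*h^p, so results f computed at spacings h, h/2 and h/4 satisfy
// (f_h - f_h2) / (f_h2 - f_h4) ~ 2^p.  Taking log2 of this ratio
// therefore estimates p.
double convergence_order (double f_h, double f_h2, double f_h4)
{
  return std::log2((f_h - f_h2) / (f_h2 - f_h4));
}
```

For data from a second order scheme, where the errors at the three resolutions are in the ratio 1 : 1/4 : 1/16, this returns 2.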
In order to treat the grids sequentially, the + application does not have to be changed at all.</p> + + <p>The reason of interest here is of course FMR. For FMR, the + order in which the grids are treated is fixed. As described + above, there is first a time step on the coarse grid, and then + recursively several smaller steps on the finer grids. This order + does require certain changes in the application. The sequence of + operations that form a single time step has to be identified and + isolated. (Which is to say that there has to be a routine that + calculates a time step, that is, a complete time step, and nothing + else.) It is then the task of the FMR driver to call this routine + for the correct grids in the correct order.</p> + + <p>Other reasons for multiple resolution levels are + e. g. multigrid algorithms for elliptic equations, which I do not + want to discuss here, or shadow hierarchies to determine + truncation errors, which I also want to skip here. Shadow + hierarchies are very similar to the convergence levels described + above.</p> + + <p>Besides this order in which the operations are performed on the + grids, there is one more complication for FMR. The boundary + values of the finer grids have to be calculated from the coarser + grids through interpolation. And because the time steps on the + finer grids are smaller, there is not always a corresponding value + on the coarser grids available. This makes it necessary to + interpolate in time between time steps on the coarser grids. The + alternative would be to take smaller steps on the coarser grids, + and this would be very expensive.</p> + + <p>These interpolations in time make it necessary that the driver + knows which grid function contains values corresponding to what + time. The usual way to achieve this is to have several time + levels per grid function; three time levels allow for a second + order interpolation in time. Only grid functions with enough time + levels can be interpolated, i.
e. boundary conditions can be + calculated only for those.</p> + + <p>Fortunately time levels are rather widespread in applications, + so they are no new concept to introduce. Unfortunately they are + often abused, so that values corresponding to the wrong time are + stored in a time level, usually with the excuse of saving storage. + This will in general not work with FMR, because the driver then + cannot interpolate in time, leading to incorrect values on the + boundaries of the finer grids.</p> + + <h3>Multiple Grid Components</h3> + + <p>Sometimes it is convenient to have a simulation domain that is + not a rectangle. It might instead be an L-shaped simulation + domain, or a domain that consists of two disconnected rectangular + regions. This issue becomes more important with FMR, because + there it is often convenient to have several disconnected refined + regions. As long as there are enough processors available, each + processor can be assigned a region or a part thereof, and no new + concept need be introduced. If, however, there are fewer + processors than regions, then a new problem arises.</p> + + <p>A common case for that problem might be a simulation containing + just two refined regions, and running on a single processor. The + refined grid then consists of two components. The problem then is + that the two components cannot be treated sequentially: Imagine + the time evolution routine working on (say) the first component. + It will at some time call the synchronisation routine. At that + time there are no values from the second component available, + because the second component has not been treated yet. Therefore + the synchronisation routine cannot complete. That means in turn + that the time evolution routine cannot complete working on the + first component, leading to a deadlock.
Work on neither component + can be completed before work on the other component.</p> + + <p>The solution is to break up the time evolution routine into + several smaller routines, each consisting of a single operation + that is either local or global. ("Local" and "global" have here the exact + same meanings that were defined above for parallelisation.) A + local operation works, by definition, on individual grid points. + Hence the local routines have to be called once for every grid + component. A global operation, by definition, does not depend on + individual grid points. Hence it has to be called only once per + processor, and not once per component. That means that the driver + has to be told which category each individual routine is in.</p> + + <h3>Example</h3> + + <p>Let me finish this section with a detailed example. Suppose + you want to solve the equation</p> + + <center>d/dt u = f(u) ,</center> + + <p>integrating using the midpoint rule, i. e. the simplest + second-order time integration scheme.
Given values at the + previous time u<sup>n-1</sup>, one first calculates a first order + solution using an Euler step, leading to the intermediate + result</p> + + <center>v<sup>n</sup> = u<sup>n-1</sup> + dt f(u<sup>n-1</sup>) + .</center> + + <p>The second and final step is then calculated via</p> + + <center>u<sup>n</sup> = u<sup>n-1</sup> + dt f([u<sup>n-1</sup> + + v<sup>n</sup>]/2) .</center> + + <p>The corresponding pseudo code would look like</p> + + <ol> + <li>Calculate Euler step, storing the result into + u<sup>n</sup></li> + + <li>Apply boundary conditions to u<sup>n</sup></li> + + <li>Synchronise u<sup>n</sup></li> + + <li>Calculate average of u<sup>n-1</sup> and u<sup>n</sup>, + storing the result into v<sup>n</sup></li> + + <li>Calculate second step, storing the result again into + u<sup>n</sup></li> + + <li>Apply boundary conditions again to u<sup>n</sup></li> + + <li>Synchronise again u<sup>n</sup></li> + </ol> + + <p>The above algorithm looks a bit different from a naive + implementation of the midpoint rule. One difference is that both + the first and the second step store their result into + u<sup>n</sup>. This is necessary because it would be inconvenient + to apply boundary conditions to the intermediate value + v<sup>n</sup>. Remember, in order to apply boundary conditions on + the finer grids, there have to be several time levels present. + With the above scheme, only u needs several time levels. v is + used only as a temporary (and could conceivably be completely + eliminated).</p> + + <p>Note also that the first step goes all the way from time level + n-1 to time level n. The midpoint rule can be rewritten (in fact, + is usually written) so that the first step is only a half step, + leading to the time level n-1/2. 
This is not possible for FMR, + because interpolating to the time n-1/2 is not possible, and thus + there could be no boundary conditions applied after the first + step.</p> + + <p>The second thing to note is that the application of the + boundary condition and the synchronisation have been separated + rather artificially. Normally synchronisation would be considered + part of the boundary condition. In this case, however, + applying the boundary condition is a local operation, whereas + synchronisation counts as a global operation. (It is not obvious + that synchronisation should be global, but as the synchronisation + routine is a part of Carpet, it was up to me to decide this.) As + explained above, local and global operations have to be + separated.</p> + + <p>Separating the evolution steps and the boundary condition + routines is, on the other hand, just a notational convenience. + There could well be a single routine implementing both.</p> + + <p>For Cactus, the order in which to call the individual parts of + the time evolution routines is described in the schedule specifications, + i. e. in the files called <code>schedule.ccl</code>. By default a + routine is assumed to be local; global routines have to be tagged + with <code>OPTIONS: GLOBAL</code>.</p> + + <p>The tag <code>SYNC: groupname</code> indicates that the group + <code>groupname</code> should be synchronised after the scheduled + routine has been called for all grid components. This obviously + makes sense only for local routines. Using the <code>SYNC:</code> + tag is preferred over calling the synchronisation routine + <code>CCTK_SyncGroup</code> directly.</p> + + <p>The example thorn WaveToy in Carpet's arrangement is a bit + simpler than what is described here, because it uses the Leapfrog + scheme, which consists of only a single step. I would suggest + looking at WaveToy as an initial FMR example.</p> + + <p>The thorn SpaceToy is implemented very close to the way + described here.
It evolves two variables phi and psi, but it is + also coupled to the thorn HydroToy. This coupling introduces some + additional complications. The thorn HydroToy, on the other hand, + uses a predictor-corrector scheme, which is a two-step scheme + and thus more complex than WaveToy. All the coupling between + SpaceToy and HydroToy is contained in SpaceToy. I would thus + suggest looking at HydroToy first.</p> + + <p>I assume that converting an application to FMR is + straightforward after the handling of the time levels has been + straightened out.</p> + + + + <h2>Carpet Under The Hood</h2> + + <p>To be continued...</p> + + + + <h2>Moving Boxes, Adaptive Mesh Refinement</h2> + + <p>To be continued...</p> + + + + <hr> + <address><a href="mailto:schnetter@uni-tuebingen.de">Erik Schnetter</a></address> +<!-- Created: Sun Mar 25 15:15:18 CST 2001 --> +<!-- hhmts start --> +Last modified: Sun Mar 25 20:26:30 CST 2001 +<!-- hhmts end --> + </body> +</html> |