diff options
author | eschnett <> | 2001-03-26 00:27:00 +0000 |
---|---|---|
committer | eschnett <> | 2001-03-26 00:27:00 +0000 |
commit | e9020906e0ce2db439522ccdba76f4936ae12bde (patch) | |
tree | 743ce0ed741920216a60e509bef1f4bc5462084f /Carpet/Carpet/doc | |
parent | 4123f52ffccee9997e4c228376ec131f773da364 (diff) |
Added documentation.
darcs-hash:20010326002722-f6438-4bab305f4955af4a8712f8a1fd1f24f8926a2652.gz
Diffstat (limited to 'Carpet/Carpet/doc')
-rw-r--r-- | Carpet/Carpet/doc/howto.html | 536 |
1 files changed, 536 insertions, 0 deletions
diff --git a/Carpet/Carpet/doc/howto.html b/Carpet/Carpet/doc/howto.html new file mode 100644 index 000000000..911d4b840 --- /dev/null +++ b/Carpet/Carpet/doc/howto.html @@ -0,0 +1,536 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> +<!-- $Header: /home/eschnett/C/carpet/Carpet/Carpet/Carpet/doc/Attic/howto.html,v 1.1 2001/03/26 02:27:22 eschnett Exp $ --> +<html> + <head> + <title>Fixed Mesh Refinement in Cactus using Carpet</title> + </head> + + <body> + <h1>Fixed Mesh Refinement in Cactus using Carpet</h1> + + + + <h2>Overview</h2> + + <h3>Fixed Mesh Refinement, aka Box-in-Box</h3> + + <p>Fixed Mesh Refinement (FMR), also known as box-in-box, is a way + to increase the local resolution in unigrid applications, while + retaining the basic unigrid character of an application. A small + number (maybe two or three) of grids with varying resolution + overlay each other, where the coarsest grid has the largest + extent. This allows the application to benefit from the higher + resolution of the smaller grids while keeping the outer boundary + far out at the same time. The main advantage of FMR is that it + needs far fewer resources than globally increasing the + resolution.</p> + + <h3>Carpet</h3> + + <p>Carpet is the name of an FMR driver, i. e. the back end that + handles storage allocation for the grid functions, parallelism, + I/O, and the various inter-grid operations. Carpet was developed + in early summer of 2000 by <a + href="mailto:schnetter@uni-tuebingen.de">Erik Schnetter</a>, then + a research scholar in the <a + href="http://www.astro.psu.edu/">Department for Astronomy and + Astrophysics</a> of <a href="http://www.psu.edu/">Penn State + University</a>.
In spring 2001, Carpet was coupled to Cactus as a + drop-in enhancement for the standard unigrid Cactus driver + PUGH.</p> + + <h3>Cactus</h3> + + <p>From the <a href="http://www.cactuscode.org">main Cactus web + page</a>: "Cactus is an open source problem solving environment + designed for scientists and engineers. Its modular structure + easily enables parallel computation across different architectures + and collaborative code development between different groups. + Cactus originated in the academic research community, where it was + developed and used over many years by a large international + collaboration of physicists and computational scientists."</p> + + + + <h2>Introduction</h2> + + <h3>Fixed Mesh Refinement</h3> + + <p>A standard way of solving partial differential equations is + finite differences on a regular grid. This is also called + "unigrid". Such an application discretises its problem space onto + a single, rectangular grid which has the same grid spacing + everywhere. This grid might be broken up into several parts for + parallelisation purposes, but parallelisation should be + transparent to the physics part of the application.</p> + + <p>Increasing the resolution in a unigrid application is somewhat + expensive. For example, increasing the resolution by a factor of + two requires a factor of eight more storage in three dimensions. + Given a constant Courant factor, the calculation time will even go + up by a factor of sixteen. This behaviour makes it easy to find + problems that cannot be solved on contemporary supercomputers, no + matter how big and fast those computers are.</p> + + <p>Apart from physical insight, which then often has to be used to + decrease the problem size until it fits the current hardware, + there are also numerical and algorithmic methods to decrease the + resource requirements of the application. Most applications need + the high resolution only in a part of the simulation domain.
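The scaling argument above can be made explicit with a small sketch. The arithmetic follows directly from the text; the function names are made up for illustration:

```cpp
#include <cassert>

// Purely illustrative: the cost of refining a 3D unigrid run by a
// factor r, as argued in the text.  Storage grows like r^3 (one
// factor of r per dimension); with a constant Courant factor the
// time step also shrinks by r, so the total work grows like r^4.
long storage_factor (long r) { return r*r*r; }
long work_factor    (long r) { return r*r*r*r; }
```

For r = 2 these give the factors of eight and sixteen quoted above.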
+ Discretisation methods that don't require a uniform resolution, + such as finite elements, can implement non-uniform resolutions + very naturally. One problem with finite elements is that many + physicists today are not familiar with them, or shy + away from their perceived complexity, or are not willing to adapt + existing finite difference code.</p> + + <p>Fixed Mesh Refinement (FMR) is a poor man's way of implementing + a non-uniform resolution into a unigrid application with minimal + changes to its structure. Instead of only one grid, there are + several grids or grid patches with different resolutions. The + coarsest grid usually encloses the whole simulation domain. + Successively finer grids overlay the coarse grid at those + locations where a higher resolution is needed. The coarser grids + provide boundary conditions to the finer grids through + interpolation.</p> + + <p>Instead of updating only one grid, the application has to + update all grids. The usual approach is to first take a step on + the coarsest grid, and then recursively take several smaller steps + on the finer grids. The Courant criterion requires that the step + sizes on the finer grids be smaller than on the coarse grid. The + boundary values for the finer grids are found through + interpolation in space and time from the coarser grid. In the + end, the information on the finer grids is injected into the + coarse grids.</p> + + <p>Strictly speaking there is no need for a coarse grid on the + regions covered by the finer grids. But as stated above, the + resources required for treating the overlapping region on the + coarse grid are only minimal compared to treating the finer grids. + And because a coarse grid with a hole often creates complications, + this obvious optimisation is often left out.</p> + + <h3>Carpet</h3> + + <p>Carpet is a C++ library that provides infrastructure to + describe regions of varying resolution in a convenient and + efficient way.
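The recursive update order described above, one coarse step followed by several smaller steps on the finer grids, can be sketched as follows. This is a hypothetical outline under assumed values for the level count and refinement factor, not Carpet's actual code:

```cpp
#include <cassert>
#include <vector>

const int maxlevel = 3; // number of refinement levels (assumed)
const int refine   = 2; // refinement factor in time (assumed)

std::vector<int> steps_taken(maxlevel, 0);

// Take one step on this level, then recursively take "refine"
// smaller steps on the next finer level.  In a real FMR driver the
// fine grid boundaries would be filled by interpolation from this
// level before the recursion, and the fine grid data would be
// injected back into this level afterwards.
void evolve (int level)
{
  ++steps_taken[level];
  if (level+1 < maxlevel) {
    for (int i=0; i<refine; ++i) evolve (level+1);
  }
}
```

A single call to evolve(0) then takes one step on the coarsest level, two on the middle level, and four on the finest, so all levels reach the same simulation time.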
Carpet contains routines to manage grid + hierarchies, which describe the relationships between the components + of the grid on the different refinement and convergence levels. + Carpet has a notion of simulation time and grid spacing, which are + necessary for interpolation, and contains efficient + interpolators.</p> + + <p>Carpet can run on several processors in parallel using MPI for + communication. Each grid can be broken down into several + components, and every component has a home processor. Carpet also + contains operators to move certain regions to a different + processor, or to synchronise all components of a grid.</p> + + <p>Carpet is also an arrangement of thorns for Cactus, + implementing a driver and associated I/O routines for both ASCII + and binary I/O. It should be possible to substitute Carpet for + the standard Cactus driver PUGH without changes to the application + thorns and thus use Carpet as a unigrid driver. Making use of the + FMR capabilities of Carpet usually requires some rearranging of + the application, comparable in general to the changes necessary + for a uniprocessor application to run on multiple processors.</p> + + <p>The driver section of Carpet contains the logic to manage + storage for the grid functions, to traverse the grid hierarchy for + all scheduled routines, and to automatically apply the necessary + inter-grid operators for prolongation (interpolation of the fine + grid boundaries) and restriction (injecting the fine grid + information back into the coarse grid).</p> + + <p>The ASCII I/O routines use the quasi-standard <a + href="gnuplot">gnuplot</a> format. The binary I/O routines use + the <a href="FlexIO">FlexIO library</a> written by John Shalf. It + allows efficient and platform-independent I/O.
The FlexIO format + is based on <a href="HDF">HDF</a> and also supported by several + visualisation packages.</p> + + <p>Carpet is copyrighted by Erik Schnetter, and is available under + the LGPL licence from a <a href="CVS">CVS</a> repository.</p> + + <h3>WaveToy</h3> + + <p>Cactus comes with a sample application called "WaveToy", which + solves the scalar wave equation with various initial data and + boundary conditions. As an example, I have extended WaveToy so + that it uses Carpet's FMR capabilities. WaveToy serves both as a + test case for Carpet, and as an example of how to convert an + application to FMR.</p> + + <p>The equation solved by WaveToy is the well known scalar wave + equation, discretised using the Leapfrog method with three time + levels, yielding second order accuracy in space and time. A + typical set of initial data is a plane wave, and a typical + boundary condition is periodicity. Those allow long term + simulations as well as easy and meaningful comparisons to the + analytic solution.</p> + + + + <h2>Compiling Cactus With Carpet</h2> + + <p>Carpet has been written in C++, using templates and the STL + (Standard Template Library). Both templates and the STL make + writing and debugging code a lot easier. Without templates, I + would have had to put much effort into making Carpet support all + of Cactus' data types. Without the STL, I would have had to spend + quite some time implementing basic containers such as lists or + sets. I still had to implement a custom vector type, because + STL's vector type is optimised for large vectors only, and I + needed three-dimensional vectors of integers.</p> + + <p>The inner loops of Carpet are the inter-grid operators, that is + the routines that copy, restrict, and prolongate between grids. + Thanks to Cactus it was rather easy to write these in Fortran 77, + which makes them both fast and portable.</p> + + <p>Carpet is an arrangement in Cactus.
It can theoretically be + compiled without any other external library, if you don't include + the binary I/O support, which requires FlexIO. I do recommend + using FlexIO, so you should install the FlexIO library first. + Although FlexIO is already part of Cactus in the thorn called + CactusExternal/FlexIO, this seems to be a version that has FMR + support disabled and is hence not usable. You will have to + install a complete copy of FlexIO by hand.</p> + + <h3>Hurdle 1: FlexIO</h3> + + <p>DESCRIBE INSTALLING FLEXIO HERE. MENTION INSTALLING HDF4 + FIRST, THEN HDF5, THEN FLEXIO</p> + + <h3>Hurdle 2: STL</h3> + + <p>DESCRIBE HOW TO GET/INSTALL AN STL, OR HOW TO MAKE THE SYSTEM'S + C++ COMPILER WORK WITH THE STL</p> + + <h3>Hurdle 3: Templates</h3> + + <p>DESCRIBE HOW TO INSTANTIATE, AND NOT-INSTANTIATE, THE TEMPLATES + AS NEEDED.</p> + + <h3>WaveToy</h3> + + <p>CONFIGURING, THORNS, COMPILING</p> + + <p>To be continued...</p> + + + + <h2>Running The Example Applications</h2> + + <p>SAMPLE FILES</p> + + <p>CARPET'S OPTIONS</p> + + <p>To be continued...</p> + + <p>Second order convergence requires second order interpolation in + time, which requires that at least three time levels are + present.</p> + + + + <h2>Fold Your Own FMR Application</h2> + + <p>There are three steps to take from a simple unigrid + uniprocessor toy application to a full-blown FMR multiprocessor + production application. Those steps are almost independent, and I + would like to explain them and their implications in some detail + below.</p> + + <h3>Multiple Processors</h3> + + <p>Probably the best known of these is the step from using one to + using several processors, also known as parallelisation. Because + many people are already familiar with this step, I will describe + it first.</p> + + <p>In a uniprocessor application, it is possible to access every + grid point in an arbitrary manner.
In order to allow multiple + processors to run efficiently in parallel, the grid is broken down + into several rectangular components, and each processor is + assigned one of these components.</p> + + <p>The components will usually overlap by a few grid points, so as + to allow the processors to e. g. calculate spatial derivatives + (which require neighbouring grid points) without having to + communicate for every grid point. From time to time it is then + necessary to synchronise the overlapping region, which is the only + time at which communication happens. This allows the application + to run almost unchanged, i. e. without invoking communication + itself. The synchronisation routine is provided by the driver and + not by the application.</p> + + <p>Of course a serial application will usually have to be changed to + support multiple processors. In order to do so, all the + operations that the application performs have to be classified + into one of two categories:</p> + + <p>One category contains the so-called <em>local</em> operations. + These are operations that are applied to each and every grid point + individually, and that do not depend on any other grid point + except nearby neighbours. Each local operation will thus involve + a loop over all grid points, and in order to run on multiple + processors, after each such loop the synchronisation routine has + to be called. An example of a local operation would be + calculating a spatial derivative.</p> + + <p>The other category contains so-called <em>global</em> + operations. These operations do not depend on individual grid + points, and thus do not involve loops over grid points. The + result of a global operation is the same on all processors; + therefore global operations don't involve communication and don't + require synchronisation.
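The two categories described above can be illustrated with a small sketch; the function names are made up for illustration and are not part of the Cactus API:

```cpp
#include <cassert>
#include <vector>

// A local operation: a centred first derivative.  It loops over
// grid points and uses only nearest neighbours, so on multiple
// processors each component computes its own interior points and
// the driver's synchronisation routine then refreshes the overlap.
std::vector<double> centred_derivative (const std::vector<double>& u,
                                        double dx)
{
  std::vector<double> du(u.size(), 0.0);
  for (std::size_t i=1; i+1<u.size(); ++i)
    du[i] = (u[i+1] - u[i-1]) / (2.0*dx);
  return du;
}

// A global operation: it involves no grid points, so every
// processor obtains the same answer without any communication.
bool should_terminate (int iteration, int max_iterations)
{
  return iteration >= max_iterations;
}
```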
An example of a global operation would + be to check how many time steps have been taken, and decide + whether the simulation should be terminated.</p> + + <p>Typically most operations can be classified or rewritten to be + either local or global. But often there are operations that fit + neither category, and these parts of an application are hardest to + parallelise. Applying the boundary conditions, to give another + example, might seem at first to be neither local nor global. But + in a slight (yet completely correct) stretch of the term "applied + to all grid points", boundary conditions can be classified as + local; they are a local operation that just does nothing to most + grid points.</p> + + <p>To give one more example, calculating an error norm does not + fit these categories. It is neither local nor global. It is not + local because the result involves all grid points (and not only + nearby neighbours), and it is not global because it does depend on + the grid points. All operations that do not fit the two categories + typically require special handling, and often require hand-coded + communication in the application. Luckily calculating various + norms is such a common case that special routines, called + <em>reduction operators</em>, are already present.</p> + + <h3>Multiple Resolution Levels</h3> + + <p>There are several reasons why an application might want to + incorporate more than one grid, overlapping and each with a + different resolution.</p> + + <p>The most commonly known reason is probably a convergence test, + where the very same problem is treated in different resolutions. + Differences in the result are then likely caused by insufficient + resolution on the coarser (or on all) grids. For a convergence + test, the grids are completely independent, and it does not matter + whether the simulation runs on all grids simultaneously or + sequentially.
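A convergence test is usually quantified by estimating the observed order of convergence from results at three resolutions. The following is the generic textbook estimate, not something specific to Carpet:

```cpp
#include <cassert>
#include <cmath>

// For a scheme of order p, the error at grid spacing h behaves like
// C*h^p, so results f computed at spacings h, h/2 and h/4 satisfy
// (f_h - f_h2) / (f_h2 - f_h4) ~ 2^p.  Taking log2 of this ratio
// therefore estimates p.
double convergence_order (double f_h, double f_h2, double f_h4)
{
  return std::log2((f_h - f_h2) / (f_h2 - f_h4));
}
```

For data from a second order scheme, where the errors at the three resolutions are in the ratio 1 : 1/4 : 1/16, this returns 2.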
In order to treat the grids sequentially, the + application does not have to be changed at all.</p> + + <p>The reason of interest here is of course FMR. For FMR, the + order in which the grids are treated is fixed. As described + above, there is first a time step on the coarse grid, and then + recursively several smaller steps on the finer grids. This order + does require certain changes in the application. The sequence of + operations that form a single time step has to be identified and + isolated. (Which is to say that there has to be a routine that + calculates a time step, that is, a complete time step, and nothing + else.) It is then the task of the FMR driver to call this routine + for the correct grids in the correct order.</p> + + <p>Other reasons for multiple resolution levels are + e. g. multigrid algorithms for elliptic equations, which I do not + want to discuss here, or shadow hierarchies to determine + truncation errors, which I also want to skip here. Shadow + hierarchies are very similar to the convergence levels described + above.</p> + + <p>Besides this order in which the operations are performed on the + grids, there is one more complication for FMR. The boundary + values of the finer grids have to be calculated from the coarser + grids through interpolation. And because the time steps on the + finer grids are smaller, there is not always a corresponding value + on the coarser grids available. This makes it necessary to + interpolate in time between time steps on the coarser grids. The + alternative would be to take smaller steps on the coarser grids, + and this would be very expensive.</p> + + <p>These interpolations in time make it necessary that the driver + knows which grid function contains values corresponding to what + time. The usual way to achieve this is to have several time + levels per grid function; three time levels allow for a second + order interpolation in time. Only grid functions with enough time + levels can be interpolated, i.
e. boundary conditions can be + calculated only for those.</p> + + <p>Fortunately time levels are rather widespread in applications, + so they are no new concept to introduce. Unfortunately they are + often abused, so that values corresponding to the wrong time are + stored in a time level, usually with the excuse of saving storage. + This will in general not work with FMR, because the driver then + cannot interpolate in time, leading to incorrect values on the + boundaries of the finer grids.</p> + + <h3>Multiple Grid Components</h3> + + <p>Sometimes it is convenient to have a simulation domain that is + not a rectangle. It might instead be an L-shaped simulation + domain, or a domain that consists of two disconnected rectangular + regions. This issue becomes more important with FMR, because + there it is often convenient to have several disconnected refined + regions. As long as there are enough processors available, each + processor can be assigned a region or a part thereof, and no new + concept need be introduced. If, however, there are fewer + processors than regions, then a new problem arises.</p> + + <p>A common case for that problem might be a simulation containing + just two refined regions, and running on a single processor. The + refined grid then consists of two components. The problem then is + that the two components cannot be treated sequentially: Imagine + the time evolution routine working on (say) the first component. + It will at some time call the synchronisation routine. At that + time there are no values from the second component available, + because the second component has not been treated yet. Therefore + the synchronisation routine cannot complete. That means in turn + that the time evolution routine cannot complete working on the + first component, leading to a deadlock.
Work on neither component + can be completed before work on the other component.</p> + + <p>The solution is to break up the time evolution routine into + several smaller routines, each consisting of a single operation + that is either local or global. ("Local" and "global" have here the exact + same meanings that were defined above for parallelisation.) A + local operation works, by definition, on individual grid points. + Hence the local routines have to be called once for every grid + component. A global operation, by definition, does not depend on + individual grid points. Hence it has to be called only once per + processor, and not once per component. That means that the driver + has to be told which category each individual routine is in.</p> + + <h3>Example</h3> + + <p>Let me finish this section with a detailed example. Suppose + you want to solve the equation</p> + + <center>d/dt u = f(u) ,</center> + + <p>integrating using the midpoint rule, i. e. the simplest + second-order time integration scheme.
Given values at the + previous time u<sup>n-1</sup>, one first calculates a first order + solution using an Euler step, leading to the intermediate + result</p> + + <center>v<sup>n</sup> = u<sup>n-1</sup> + dt f(u<sup>n-1</sup>) + .</center> + + <p>The second and final step is then calculated via</p> + + <center>u<sup>n</sup> = u<sup>n-1</sup> + dt f([u<sup>n-1</sup> + + v<sup>n</sup>]/2) .</center> + + <p>The corresponding pseudo code would look like</p> + + <ol> + <li>Calculate Euler step, storing the result into + u<sup>n</sup></li> + + <li>Apply boundary conditions to u<sup>n</sup></li> + + <li>Synchronise u<sup>n</sup></li> + + <li>Calculate average of u<sup>n-1</sup> and u<sup>n</sup>, + storing the result into v<sup>n</sup></li> + + <li>Calculate second step, storing the result again into + u<sup>n</sup></li> + + <li>Apply boundary conditions again to u<sup>n</sup></li> + + <li>Synchronise again u<sup>n</sup></li> + </ol> + + <p>The above algorithm looks a bit different from a naive + implementation of the midpoint rule. One difference is that both + the first and the second step store their result into + u<sup>n</sup>. This is necessary because it would be inconvenient + to apply boundary conditions to the intermediate value + v<sup>n</sup>. Remember, in order to apply boundary conditions on + the finer grids, there have to be several time levels present. + With the above scheme, only u needs several time levels. v is + used only as a temporary (and could conceivably be completely + eliminated).</p> + + <p>Note also that the first step goes all the way from time level + n-1 to time level n. The midpoint rule can be rewritten (in fact, + is usually written) so that the first step is only a half step, + leading to the time level n-1/2. 
This is not possible for FMR, + because interpolating to the time n-1/2 is not possible, and thus + there could be no boundary conditions applied after the first + step.</p> + + <p>The second thing to note is that the application of the + boundary condition and the synchronisation have been separated + rather artificially. Normally synchronisation would be considered + part of the boundary condition. In this case, however, + applying the boundary condition is a local operation, whereas + synchronisation counts as a global operation. (It is not obvious + that synchronisation should be global, but as the synchronisation + routine is a part of Carpet, it was up to me to decide this.) As + explained above, local and global operations have to be + separated.</p> + + <p>Separating the evolution steps and the boundary condition + routines is, on the other hand, just a notational convenience. + There could well be a single routine implementing both.</p> + + <p>For Cactus, the order in which to call the individual parts of + the time evolution routines is described in the schedule specifications, + i. e. in the files called <code>schedule.ccl</code>. By default a + routine is assumed to be local; global routines have to be tagged + with <code>OPTIONS: GLOBAL</code>.</p> + + <p>The tag <code>SYNC: groupname</code> indicates that the group + <code>groupname</code> should be synchronised after the scheduled + routine has been called for all grid components. This obviously + makes sense only for local routines. Using the <code>SYNC:</code> + tag is preferred over calling the synchronisation routine + <code>CCTK_SyncGroup</code> directly.</p> + + <p>The example thorn WaveToy in Carpet's arrangement is a bit + simpler than what is described here, because it uses the Leapfrog + scheme, which consists of only a single step. I would suggest + looking at WaveToy as an initial FMR example.</p> + + <p>The thorn SpaceToy is implemented very close to the way + described here.
It evolves two variables phi and psi, but it is + also coupled to the thorn HydroToy. This coupling introduces some + additional complications. The thorn HydroToy, on the other hand, + uses a predictor-corrector scheme, which is a two-step scheme + and thus more complex than WaveToy. All the coupling between + SpaceToy and HydroToy is contained in SpaceToy. I would thus + suggest looking at HydroToy first.</p> + + <p>I assume that converting an application to FMR is + straightforward after the handling of the time levels has been + straightened out.</p> + + + + <h2>Carpet Under The Hood</h2> + + <p>To be continued...</p> + + + + <h2>Moving Boxes, Adaptive Mesh Refinement</h2> + + <p>To be continued...</p> + + + + <hr> + <address><a href="mailto:schnetter@uni-tuebingen.de">Erik Schnetter</a></address> +<!-- Created: Sun Mar 25 15:15:18 CST 2001 --> +<!-- hhmts start --> +Last modified: Sun Mar 25 20:26:30 CST 2001 +<!-- hhmts end --> + </body> +</html> |