% *======================================================================*
%  Cactus Thorn template for ThornGuide documentation
%  Author: Ian Kelley
%  Date: Sun Jun 02, 2002
%  $Header: /home/eschnett/C/carpet/Carpet/Carpet/doc/documentation.tex,v 1.2 2002/11/29 10:43:43 schnetter Exp $
%
%  Thorn documentation in the latex file doc/documentation.tex
%  will be included in ThornGuides built with the Cactus make system.
%  The scripts employed by the make system automatically include
%  pages about variables, parameters and scheduling parsed from the
%  relevant thorn CCL files.
%
%  This template contains guidelines which help to assure that your
%  documentation will be correctly added to ThornGuides.  More
%  information is available in the Cactus UsersGuide.
%
%  Guidelines:
%   - Do not change anything before the line
%       % START CACTUS THORNGUIDE
%     except for filling in the title, author, date, etc. fields.
%        - Each of these fields should only be on ONE line.
%        - Author names should be separated with a \\ or a comma.
%   - You may define your own macros, but they must appear after
%     the START CACTUS THORNGUIDE line, and must not redefine standard
%     latex commands.
%   - To avoid name clashes with other thorns, 'labels', 'citations',
%     'references', and 'image' names should conform to the following
%     convention:
%       ARRANGEMENT_THORN_LABEL
%     For example, an image wave.eps in the arrangement CactusWave and
%     thorn WaveToyC should be renamed to CactusWave_WaveToyC_wave.eps.
%   - Graphics should only be included using the graphicx package,
%     more specifically with the "includegraphics" command.  Do
%     not specify any graphic file extensions in your .tex file.  This
%     will allow us (later) to create a PDF version of the ThornGuide
%     via pdflatex.
%   - References should be included with the latex "bibitem" command.
%   - Use \begin{abstract}...\end{abstract} instead of \abstract{...}.
%   - For the benefit of our Perl scripts, and for future extensions,
%     please use simple latex.
%
% *======================================================================*
%
%  Example of including a graphic image:
%    \begin{figure}[ht]
%      \begin{center}
%        \includegraphics[width=6cm]{MyArrangement_MyThorn_MyFigure}
%      \end{center}
%      \caption{Illustration of this and that}
%      \label{MyArrangement_MyThorn_MyLabel}
%    \end{figure}
%
%  Example of using a label:
%    \label{MyArrangement_MyThorn_MyLabel}
%
%  Example of a citation:
%    \cite{MyArrangement_MyThorn_Author99}
%
%  Example of including a reference:
%    \bibitem{MyArrangement_MyThorn_Author99}
%    {J. Author, {\em The Title of the Book, Journal, or periodical}, 1 (1999),
%    1--16.
%    {\tt http://www.nowhere.com/}}
%
% *======================================================================*

% If you are using CVS use this line to give version information
% $Header: /home/eschnett/C/carpet/Carpet/Carpet/doc/documentation.tex,v 1.2 2002/11/29 10:43:43 schnetter Exp $

\documentclass{article}

% Use the Cactus ThornGuide style file
% (Automatically used from the Cactus distribution; if you have a
% thorn without the Cactus Flesh, download this from the Cactus
% homepage at www.cactuscode.org)
\usepackage{../../../doc/ThornGuide/cactus}
\usepackage{hyperref}

\begin{document}

% The author of the documentation
\author{Erik Schnetter \textless schnetter@uni-tuebingen.de\textgreater}

% The title of the document (not necessarily the name of the Thorn)
\title{Carpet}

% the date your document was last changed; if your document is in CVS,
% please use:
\date{$ $Date: 2002/11/29 10:43:43 $ $}

\maketitle

% Do not delete next line
% START CACTUS THORNGUIDE

% Add all definitions used in this documentation here
%   \def\mydef etc

% Add an abstract for this thorn's documentation
\begin{abstract}
This text describes the Carpet arrangement.  Carpet is a mesh
refinement driver for Cactus that can replace PUGH, the standard
unigrid driver.  Carpet supports multiple refinement levels and
multiple grid patches.  Carpet can run in parallel, but not yet very
efficiently.  Carpet does not yet support multiple grid hierarchies,
i.e.\ shadow hierarchies or automatic convergence tests.
\end{abstract}

\section{Overview}

\subsection{Fixed Mesh Refinement, aka Box-in-Box}

Fixed Mesh Refinement (FMR), also known as box-in-box, is a way to
increase the local resolution in unigrid applications while retaining
the basic unigrid character of an application.  A small number (maybe
two or three) of grids with varying resolution overlay each other,
where the coarsest grid has the largest extent.  This allows the
application to benefit from the higher resolution of the smaller grids
while at the same time keeping the outer boundary far out.  The main
advantage of FMR is that it needs far fewer resources than globally
increasing the resolution.

\subsection{Carpet}

Carpet is the name of an FMR driver, i.e.\ the back end that handles
storage allocation for the grid functions, parallelism, I/O, and the
various inter-grid operations.  Carpet was developed in the early
summer of 2000 by Erik Schnetter \cite{Carpet__erik-schnetter}, then a
research scholar in the Department of Astronomy and Astrophysics
\cite{Carpet__astro-psu-edu} of Penn State University
\cite{Carpet__psu-edu}.  In spring 2001, Carpet was coupled to Cactus
as a drop-in enhancement for the standard unigrid Cactus driver PUGH.

\subsection{Cactus}

From the main Cactus web pages \cite{Carpet__cactuscode-org}:

\begin{quote}
Cactus is an open source problem solving environment designed for
scientists and engineers.  Its modular structure easily enables
parallel computation across different architectures and collaborative
code development between different groups.  Cactus originated in the
academic research community, where it was developed and used over many
years by a large international collaboration of physicists and
computational scientists.
\end{quote}

\section{Introduction}

\subsection{Fixed Mesh Refinement}

A standard way of solving partial differential equations is finite
differences on a regular grid.  This is also called \emph{unigrid}.
Such an application discretises its problem space onto a single,
rectangular grid which has the same grid spacing everywhere.
This grid might be broken up into several parts for parallelisation
purposes, but parallelisation should be transparent to the physics part
of the application.

Increasing the resolution in a unigrid application is somewhat
expensive.  For example, increasing the resolution by a factor of two
requires a factor of eight more storage in three dimensions.  Given a
constant Courant factor, the calculation time will even go up by a
factor of sixteen.  This behaviour makes it easy to find problems that
cannot be solved on contemporary supercomputers, no matter how big and
fast those computers are.

Apart from physical insight, which often has to be used to decrease the
problem size until it fits the current hardware, there are also
numerical and algorithmic methods to decrease the resource requirements
of the application.  Most applications need the high resolution only in
a part of the simulation domain.  Discretisation methods that don't
require a uniform resolution, such as finite elements, can implement
non-uniform resolutions very naturally.  One problem with finite
elements is that many physicists today are not familiar with them, shy
away from their perceived complexity, or are not willing to adapt
existing finite difference code.

Fixed Mesh Refinement (FMR) is a poor man's way of implementing a
non-uniform resolution in a unigrid application with minimal changes to
its structure.  Instead of only one grid, there are several grids or
grid patches with different resolutions.  The coarsest grid usually
encloses the whole simulation domain.  Successively finer grids overlay
the coarse grid at those locations where a higher resolution is needed.
The coarser grids provide boundary conditions to the finer grids
through interpolation.

Instead of updating only one grid, the application has to update all
grids.  The usual approach is to first take a step on the coarsest
grid, and then recursively take several smaller steps on the finer
grids.  The Courant criterion requires that the step sizes on the finer
grids be smaller than on the coarse grid.  The boundary values for the
finer grids are found through interpolation in space and time from the
coarser grid.  In the end, the information on the finer grids is
injected into the coarse grids.

Strictly speaking there is no need for a coarse grid in the regions
covered by the finer grids.  But as stated above, the resources
required for treating the overlapping region on the coarse grid are
only minimal compared to treating the finer grids.  And because a
coarse grid with a hole often creates complications, this obvious
optimisation is usually left out.

\subsection{Carpet}

Carpet is a C++ library that provides infrastructure to describe
regions of varying resolution in a convenient and efficient way.
Carpet contains routines to manage grid hierarchies, including the
relationships between the components of the grid on the different
refinement and convergence levels.  Carpet has a notion of simulation
time and grid spacing, which are necessary for interpolation, and
contains efficient interpolators.

Carpet can run on several processors in parallel using MPI for
communication.  Each grid can be broken down into several components,
and every component has a home processor.  Carpet also contains
operators to move certain regions to a different processor, or to
synchronise all components of a grid.

Carpet is also an arrangement of thorns for Cactus, implementing a
driver and associated I/O routines for both ASCII and binary I/O.
It should be possible to substitute Carpet for the standard Cactus
driver PUGH without changes to the application thorns, and thus use
Carpet as a unigrid driver.  Making use of the FMR capabilities of
Carpet usually requires some rearranging of the application, comparable
in general to the changes necessary for a uniprocessor application to
run on multiple processors.

The driver section of Carpet contains the logic to manage storage for
the grid functions, to traverse the grid hierarchy for all scheduled
routines, and to automatically apply the necessary inter-grid operators
for prolongation (interpolation of the fine grid boundaries) and
restriction (injecting the fine grid information back into the coarse
grid).

The ASCII I/O routines use the quasi-standard gnuplot
\cite{Carpet__gnuplot-info} format.  The binary I/O routines use the
FlexIO library \cite{Carpet__FlexIO} written by John Shalf.  It allows
efficient and platform-independent I/O.  The FlexIO format is based on
HDF \cite{Carpet__HDF} and is also supported by several visualisation
packages.

Carpet is copyrighted by Erik Schnetter, and is available under the
LGPL licence from a CVS \cite{Carpet__CVS} repository.

\subsection{WaveToy}

Cactus comes with a sample application called \emph{WaveToy}, which
solves the scalar wave equation with various initial data and boundary
conditions.  As an example, I have extended WaveToy so that it uses
Carpet's FMR capabilities.  WaveToy serves both as a test case for
Carpet and as an example of how to convert an application to using FMR.

The equation solved by WaveToy is the well known scalar wave equation,
discretised using the Leapfrog method with three time levels, yielding
second order accuracy in space and time.  A typical set of initial data
is a plane wave, and a typical boundary condition is periodicity.
Those allow long term simulations as well as easy and meaningful
comparisons to the analytic solution.

\section{Compiling Cactus With Carpet}

Carpet has been written in C++, using templates and the STL (Standard
Template Library).  Both templates and the STL make writing and
debugging code a lot easier.  Without templates, I would have had to
put much effort into making Carpet support all of Cactus' data types.
Without the STL, I would have had to spend quite some time implementing
basic containers such as lists or sets.  I still had to implement a
custom vector type, because the STL's vector type is optimised for
large vectors only, and I needed three-dimensional vectors of integers.

The inner loops of Carpet are the inter-grid operators, that is, the
routines that copy, restrict, and prolongate between grids.  Thanks to
Cactus it was rather easy to write these in \textsc{Fortran 77}, which
makes them both fast and portable.

Carpet is an arrangement in Cactus.  It can theoretically be compiled
without any other external library, if you don't include the binary I/O
support, which requires FlexIO.  I do recommend using FlexIO, so you
should install the FlexIO library first.  Although FlexIO is already
part of Cactus in the thorn CactusExternal/FlexIO, this seems to be a
version that has FMR support disabled and is hence not usable.  You
will have to install a complete copy of FlexIO by hand.

\subsection{Hurdle 1: FlexIO}

DESCRIBE INSTALLING FLEXIO HERE.  MENTION INSTALLING HDF4 FIRST, THEN
HDF5, THEN FLEXIO.

\subsection{Hurdle 2: STL}

DESCRIBE HOW TO GET/INSTALL AN STL, OR HOW TO MAKE THE SYSTEM'S C++
COMPILER WORK WITH THE STL.
\subsection{Hurdle 3: Templates}

DESCRIBE HOW TO INSTANTIATE, AND NOT-INSTANTIATE, THE TEMPLATES AS
NEEDED.

\subsection{WaveToy}

CONFIGURING, THORNS, COMPILING.

To be continued\ldots

\section{Running The Example Applications}

SAMPLE FILES

CARPET'S OPTIONS

To be continued\ldots

Second order convergence requires second order interpolation in time,
which requires that at least three time levels are present.

\section{Fold Your Own FMR Application}

There are three steps to take from a simple unigrid uniprocessor toy
application to a full-blown FMR multiprocessor production application.
Those steps are almost independent, and I would like to explain them
and their implications in some detail below.

\subsection{Multiple Processors}

Probably the best known of these steps is the step from using one
processor to using several, also known as parallelisation.  Because
many people are already familiar with this step, I will describe it
first.

In a uniprocessor application, it is possible to access every grid
point in an arbitrary manner.  In order to allow multiple processors to
run efficiently in parallel, the grid is broken down into several
rectangular components, and each processor is assigned one of these
components.  The components will usually overlap by a few grid points,
so as to allow the processors to e.g.\ calculate spatial derivatives
(which require neighbouring grid points) without having to communicate
for every grid point.  From time to time it is then necessary to
synchronise the overlapping regions, which is the only time at which
communication happens.  This allows the application to run almost
unchanged, i.e.\ without invoking communication itself.  The
synchronisation routine is provided by the driver and not by the
application.

Of course a serial application will usually have to be changed to
support multiple processors.  In order to do so, all the operations
that the application performs have to be classified into one of two
categories:

One category contains the so-called \emph{local} operations.  These are
operations that are applied to each and every grid point individually,
and that do not depend on any other grid point except nearby
neighbours.  Each local operation will thus involve a loop over all
grid points, and in order to run on multiple processors, the
synchronisation routine has to be called after each such loop.  An
example of a local operation is calculating a spatial derivative.

The other category contains the so-called \emph{global} operations.
These operations do not depend on individual grid points, and thus do
not involve loops over grid points.  The result of a global operation
is the same on all processors; therefore global operations don't
involve communication and don't require synchronisation.  An example of
a global operation is checking how many time steps have been taken, and
deciding whether the simulation should be terminated.

Typically most operations can be classified or rewritten to be either
local or global.  But often there are operations that fit neither
category, and these parts of an application are the hardest to
parallelise.  Applying the boundary conditions, to give another
example, might at first seem to be neither local nor global.  But with
a slight (yet completely correct) stretch of the term ``applied to all
grid points'', boundary conditions can be classified as local; they are
a local operation that just does nothing to most grid points.

To give one more example, calculating an error norm does not fit these
categories; it is neither local nor global.  It is not local, because
the result involves all grid points (and not only nearby neighbours),
and it is not global, because it does involve the grid points.  All
operations that do not fit into these two categories typically require
special handling, and often require hand-coded communication in the
application.  Luckily, calculating various norms is such a common case
that there are special routines for it already present, called
\emph{reduction operators}.
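To make the distinction between local and global operations concrete,
here is a minimal sketch in C.  It is not taken from Carpet or from any
example thorn; the function names, the array layout, and the
\texttt{sync\_grid\_function} call are hypothetical stand-ins for
whatever the application and the driver actually provide.

\begin{verbatim}
/* A *local* operation: a centred first derivative in the x direction.
   It is applied to each grid point individually and uses only nearby
   neighbours, so it loops over the points owned by this processor.   */
void calc_x_derivative (int nx, int ny, int nz, double dx,
                        double const *u, double *dxu)
{
  for (int k = 0; k < nz; ++k)
    for (int j = 0; j < ny; ++j)
      for (int i = 1; i < nx - 1; ++i) {
        int const ind = i + nx * (j + ny * k);
        dxu[ind] = (u[ind + 1] - u[ind - 1]) / (2.0 * dx);
      }
  /* After this loop the overlapping (ghost) points of dxu are stale;
     the driver's synchronisation routine has to be called, e.g.
     sync_grid_function (dxu);   (hypothetical name)                  */
}

/* A *global* operation: it does not touch any grid point, its result
   is the same on every processor, and therefore it needs neither
   communication nor synchronisation.                                 */
int simulation_should_terminate (int iteration, int max_iterations)
{
  return iteration >= max_iterations;
}
\end{verbatim}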
\subsection{Multiple Resolution Levels}

There are several reasons why an application might want to incorporate
more than one grid, overlapping and each with a different resolution.
The most commonly known reason is probably a convergence test, where
the very same problem is treated in different resolutions.  Differences
in the result are then likely caused by insufficient resolution on the
coarser (or on all) grids.  For a convergence test, the grids are
completely independent, and it does not matter whether the simulation
runs on all grids simultaneously or sequentially.  In order to treat
the grids sequentially, the application does not have to be changed at
all.

The reason of interest here is of course FMR.  For FMR, the order in
which the grids are treated is fixed.  As described above, there is
first a time step on the coarse grid, and then recursively several
smaller steps on the finer grids.  This order does require certain
changes in the application.  The sequence of operations that forms a
single time step has to be identified and isolated.  (Which is to say
that there has to be a routine that calculates a time step, that is, a
complete time step, and nothing else.)  It is then the task of the FMR
driver to call this routine for the correct grids in the correct order.

Other reasons for multiple resolution levels are e.g.\ multigrid
algorithms for elliptic equations, which I do not want to mention here,
or shadow hierarchies to determine truncation errors, which I also want
to skip here.  Shadow hierarchies are very similar to the convergence
levels described above.

Apart from the order in which the operations are performed on the
grids, there is one more complication for FMR.  The boundary values of
the finer grids have to be calculated from the coarser grids through
interpolation.  And because the time steps on the finer grids are
smaller, there is not always a corresponding value on the coarser grids
available.  This makes it necessary to interpolate in time between time
steps on the coarser grids.  The alternative would be to take smaller
steps on the coarser grids, and this would be very expensive.

These interpolations in time make it necessary that the driver knows
which grid function contains values corresponding to which time.  The
usual way to achieve this is to have several time levels per grid
function; three time levels allow for a second order interpolation in
time.  Only grid functions with enough time levels can be interpolated,
i.e.\ boundary conditions can be calculated only for those.
Fortunately time levels are rather widespread in applications, so they
are not a new concept to introduce.  Unfortunately they are often
abused, so that values corresponding to the wrong time are stored in a
time level, usually with the excuse of saving storage.  This will in
general not work with FMR, because the driver then cannot interpolate
in time, leading to incorrect values on the boundaries of the finer
grids.
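As an illustration of why three time levels give second order accuracy
in time, consider the standard quadratic Lagrange interpolation formula
(given here only as an illustration, not as a description of the exact
operator that Carpet implements): with values $u^{n}$, $u^{n-1}$,
$u^{n-2}$ stored at the coarse grid times $t^{n}$, $t^{n-1}$,
$t^{n-2}$, the value at an intermediate fine grid time $t$ can be
approximated by
\begin{eqnarray}
u(t) & \approx &
  u^{n}\, \frac{(t - t^{n-1})(t - t^{n-2})}{(t^{n} - t^{n-1})(t^{n} - t^{n-2})}
+ u^{n-1}\, \frac{(t - t^{n})(t - t^{n-2})}{(t^{n-1} - t^{n})(t^{n-1} - t^{n-2})}
\nonumber \\
& &
{} + u^{n-2}\, \frac{(t - t^{n})(t - t^{n-1})}{(t^{n-2} - t^{n})(t^{n-2} - t^{n-1})}
\quad .
\end{eqnarray}
Such a formula is only meaningful if each time level really holds data
from its nominal time, which is why the abuse of time levels mentioned
above does not work with FMR.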
\subsection{Multiple Grid Components}

Sometimes it is convenient to have a simulation domain that is not a
rectangle.  It might instead be an L-shaped simulation domain, or a
domain that consists of two disconnected rectangular regions.  This
issue becomes more important with FMR, because there it is often
convenient to have several disconnected refined regions.  As long as
there are enough processors available, each processor can be assigned a
region or a part thereof, and no new concept need be introduced.  If,
however, there are fewer processors than regions, then a new problem
arises.

A common case for this problem might be a simulation containing just
two refined regions, running on a single processor.  The refined grid
then consists of two components.  The problem is that the two
components cannot be treated sequentially: Imagine the time evolution
routine working on (say) the first component.  At some point it will
call the synchronisation routine.  At that time there are no values
from the second component available, because the second component has
not been treated yet.  Therefore the synchronisation routine cannot
complete.  That means in turn that the time evolution routine cannot
complete its work on the first component, leading to a deadlock.  Work
on neither component can be completed before work on the other
component.

The solution is to break up the time evolution routine into several
smaller routines, each consisting of a single operation that is either
local or global.  (``Local'' and ``global'' have here exactly the same
meanings that were defined above for parallelisation.)  A local
operation works, by definition, on individual grid points.  Hence the
local routines have to be called once for every grid component.  A
global operation, by definition, does not depend on individual grid
points.  Hence it has to be called only once per processor, and not
once per component.  That means that the driver has to be told which
category each individual routine is in.

\subsection{Example}

Let me finish this section with a detailed example.  Suppose you want
to solve the equation
\begin{eqnarray}
   \frac{d}{dt} u & = & f(u) \quad ,
\end{eqnarray}
integrating with the midpoint rule, i.e.\ the simplest second-order
time integration scheme.  Given values at the previous time $u^{n-1}$,
one first calculates a first order solution using an Euler step,
leading to the intermediate result
\begin{eqnarray}
   v^n & = & u^{n-1} + dt\; f(u^{n-1}) \quad .
\end{eqnarray}
The second and final step is then calculated via
\begin{eqnarray}
   u^n & = & u^{n-1} + dt\; f\left(\frac{1}{2} \left[u^{n-1} + v^n\right]\right) \quad .
\end{eqnarray}

The corresponding pseudo code looks like this:
\begin{enumerate}
\item Calculate the Euler step, storing the result into $u^n$
\item Apply boundary conditions to $u^n$
\item Synchronise $u^n$
\item Calculate the average of $u^{n-1}$ and $u^n$, storing the result
      into $v^n$
\item Calculate the second step, storing the result again into $u^n$
\item Apply boundary conditions again to $u^n$
\item Synchronise $u^n$ again
\end{enumerate}

The above algorithm looks a bit different from a naive implementation
of the midpoint rule.  One difference is that both the first and the
second step store their result into $u^n$.  This is necessary because
it would be inconvenient to apply boundary conditions to the
intermediate value $v^n$.  Remember, in order to apply boundary
conditions on the finer grids, there have to be several time levels
present.  With the above scheme, only $u$ needs several time levels;
$v$ is used only as a temporary (and could conceivably be eliminated
completely).
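In code, one complete time step of this scheme might look like the
following sketch.  It is deliberately written as a single C routine to
show the arithmetic of the seven steps; it is not taken from WaveToy or
any other thorn, and the array names, the right hand side function
\texttt{f}, and the routines \texttt{apply\_boundary\_conditions} and
\texttt{synchronise} are hypothetical stand-ins for what the
application and the driver provide.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical helpers provided by the application and the driver. */
double f (double u);                               /* right hand side */
void apply_boundary_conditions (double *u, size_t n);
void synchronise (double *u, size_t n);            /* driver sync     */

/* One complete midpoint-rule time step, following the seven steps
   listed above.  u_prev holds time level n-1, u receives time level
   n, and v is a temporary without time levels.                       */
void midpoint_step (double const *u_prev, double *u, double *v,
                    size_t n, double dt)
{
  /* 1. Euler step, result stored into u (time level n) */
  for (size_t i = 0; i < n; ++i)
    u[i] = u_prev[i] + dt * f (u_prev[i]);

  /* 2. and 3. boundary conditions and synchronisation for u */
  apply_boundary_conditions (u, n);
  synchronise (u, n);

  /* 4. average of time levels n-1 and n, stored into v */
  for (size_t i = 0; i < n; ++i)
    v[i] = 0.5 * (u_prev[i] + u[i]);

  /* 5. second step, result stored again into u */
  for (size_t i = 0; i < n; ++i)
    u[i] = u_prev[i] + dt * f (v[i]);

  /* 6. and 7. boundary conditions and synchronisation again */
  apply_boundary_conditions (u, n);
  synchronise (u, n);
}
\end{verbatim}

In a real FMR application, each of the numbered steps would be a
separately scheduled local or global routine, as discussed above, so
that the driver can call the local ones once per grid component.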
Note also that the first step goes all the way from time level $n-1$ to
time level $n$.  The midpoint rule can be rewritten (in fact, it is
usually written) so that the first step is only a half step, leading to
the time level $n - \frac{1}{2}$.  This is not possible for FMR,
because interpolating to the time $n - \frac{1}{2}$ is not possible,
and thus no boundary conditions could be applied after the first step.

The second thing to note is that the application of the boundary
condition and the synchronisation have been separated rather
artificially.  Normally synchronisation would be considered part of the
boundary condition.  In this case, however, applying the boundary
condition is a local operation, whereas synchronisation counts as a
global operation.  (It is not obvious that synchronisation should be
global, but as the synchronisation routine is a part of Carpet, it was
up to me to decide this.)  As explained above, local and global
operations have to be separated.  Separating the evolution steps and
the boundary condition routines is, on the other hand, just a
notational convenience.  There could well be a single routine
implementing both.

For Cactus, the order in which to call the individual parts of the time
evolution routines is described in the schedule declarations, i.e.\ in
the files called \texttt{schedule.ccl}.  By default a routine is
assumed to be local; global routines have to be tagged with
\texttt{OPTIONS: GLOBAL}.  The tag \texttt{SYNC: groupname} indicates
that the group \texttt{groupname} should be synchronised after the
scheduled routine has been called for all grid components.  This
obviously makes sense only for local routines.  Using the
\texttt{SYNC:} tag is preferred over calling the synchronisation
routine \texttt{CCTK\_SyncGroup} directly.

The example thorn WaveToy in Carpet's arrangement is a bit simpler than
what is described here, because it uses the Leapfrog scheme, which
consists of only a single step.  I would suggest looking at WaveToy as
an initial FMR example.  The thorn SpaceToy is implemented very much in
the way described here.  It evolves two variables, phi and psi, but it
is also coupled to the thorn HydroToy.  This coupling introduces some
additional complications.  The thorn HydroToy, on the other hand, uses
a predictor-corrector scheme, which is also a two-step scheme and thus
more complex than WaveToy.  All the coupling between SpaceToy and
HydroToy is contained in SpaceToy.  I would thus suggest looking at
HydroToy first.

I assume that converting an application to FMR is straightforward once
the handling of the time levels has been straightened out.

\section{Carpet Under The Hood}

To be continued\ldots

\section{Moving Boxes, Adaptive Mesh Refinement}

To be continued\ldots

%% \begin{thebibliography}{9}
%% \end{thebibliography}

\bibliographystyle{amsalpha}   % initials + year
\bibliography{carpet}

% Do not delete next line
% END CACTUS THORNGUIDE

\end{document}